Intel has validated its AI product portfolio for the first Meta Llama 3 models (8B and 70B) across Intel Gaudi accelerators, Intel Xeon processors, Intel Core Ultra processors and Intel Arc graphics.
As part of its mission to bring AI everywhere, Intel invests in the software and AI ecosystem to ensure that its products are ready for the latest innovations in the dynamic AI space. In the data centre, Intel Gaudi and Intel Xeon processors with Intel Advanced Matrix Extensions (Intel AMX) acceleration give customers options to meet dynamic and wide-ranging requirements.
Intel Core Ultra processors and Intel Arc graphics products provide both a local development vehicle and deployment across millions of devices. They support comprehensive software frameworks and tools, including PyTorch and Intel Extension for PyTorch for local research and development, and the OpenVINO toolkit for model development and inference.
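As an illustration of that client-side workflow, here is a minimal sketch of Llama 3 8B inference through the OpenVINO toolkit, using the Optimum Intel integration. The model ID, prompt and generation settings are illustrative assumptions rather than Intel's validated configuration, and access to the Meta Llama 3 weights on Hugging Face is gated and must be requested separately.

```python
# Minimal sketch: Llama 3 inference via OpenVINO (pip install optimum[openvino]).
# Model ID and prompt are illustrative assumptions; Llama 3 weights are gated.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed gated model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("What is AI inference?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```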
Intel's initial testing and performance results for the Llama 3 8B and 70B models use open-source software, including PyTorch, DeepSpeed, the Optimum Habana library and Intel Extension for PyTorch, to provide the latest software optimisations.
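For Intel Gaudi, a comparable hedged sketch using the Optimum Habana library might look like the following. The model ID, dtype and generation settings are assumptions for illustration, not Intel's benchmark setup.

```python
# Hedged sketch: Llama 3 on Intel Gaudi via Optimum Habana
# (pip install optimum-habana). Settings below are illustrative assumptions.
import torch
import habana_frameworks.torch.core  # loads the Gaudi (HPU) PyTorch backend
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # patch transformers with Gaudi-optimised code paths

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed gated model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("hpu")

inputs = tokenizer("Explain model quantisation briefly.", return_tensors="pt").to("hpu")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```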
Intel Xeon processors address demanding end-to-end AI workloads, and Intel invests in optimising LLM results to reduce latency. Intel Xeon 6 processors with Performance-cores (code-named Granite Rapids) show a 2x improvement in Llama 3 8B inference latency compared with 4th Gen Intel Xeon processors, and can run larger language models, such as Llama 3 70B, at under 100ms per generated token.
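On Xeon, the Intel AMX units are exercised through bfloat16. A minimal sketch with Intel Extension for PyTorch follows; the model ID and prompt are illustrative assumptions.

```python
# Minimal sketch: CPU inference on Xeon with Intel Extension for PyTorch
# (pip install intel-extension-for-pytorch). bfloat16 exercises the Intel AMX
# units on 4th Gen Xeon and later; the model ID is an assumption.
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Apply IPEX's LLM-specific optimisations (operator fusion, AMX-friendly kernels)
model = ipex.llm.optimize(model, dtype=torch.bfloat16)

inputs = tokenizer("Summarise Llama 3 in one sentence.", return_tensors="pt")
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```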
Intel Core Ultra and Intel Arc graphics deliver impressive performance for Llama 3. In an initial round of testing, Intel Core Ultra processors already generate text faster than typical human reading speed. Further, the Intel Arc A770 GPU has Xe Matrix eXtensions (XMX) AI acceleration and 16GB of dedicated memory, providing exceptional performance for LLM workloads.
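On Arc graphics, the same Intel Extension for PyTorch workflow targets the XPU device. The sketch below assumes an illustrative model ID and float16 precision, and requires the XPU build of the extension.

```python
# Hedged sketch: client-side inference on Intel Arc / Core Ultra graphics using
# the XPU backend of Intel Extension for PyTorch. Model ID and float16 choice
# are illustrative assumptions.
import torch
import intel_extension_for_pytorch as ipex  # XPU build exposes the "xpu" device
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("xpu")
model = ipex.optimize(model, dtype=torch.float16)  # XPU graph/kernel optimisations

inputs = tokenizer("Write a haiku about GPUs.", return_tensors="pt").to("xpu")
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```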