Chatbots, facial recognition, autonomous vehicles, earlier diagnosis of disease: the range of applications that AI could be deployed for seems endless. Whilst it will still take a while to fully understand the capabilities and implications of this new technology, organisations are already preparing their IT infrastructure to meet the needs of AI and machine learning tools. So, what is different about AI networks, and how must cabling, connectivity and network design adapt to cope with the new demands?
AI benefits
The global AI market is valued at $142.3 billion, with a forecast CAGR of 17.3% through 2030. Whilst it is undeniable that many industries will be able to take advantage of AI technology in the long term, it is the healthcare, finance and gaming markets that are driving AI adoption most heavily today. AI can support the development of precision medicine and help speed up the process of bringing new drugs to market; it can improve fraud detection through better identity verification; and it can help advance risk simulation and algorithmic trading. Many companies in these verticals are looking to leverage AI networks and AI models to increase efficiency, reduce costs and boost productivity.
What’s different about AI networks?
The arrival of new and emerging technologies always pushes network bandwidth and speeds higher, and AI is no exception. However, the jump in bandwidth that AI requires is tremendous: server speeds now demand 100Gb/s for text-based generative AI, trained edge inferencing and machine learning; 200/400Gb/s for training models; and even 800Gb/s to 1.6Tb/s for large-scale model training, HPC and quantum computing.
The second game changer with AI networks is power consumption, which increases dramatically due to the power-hungry GPUs (graphics processing units) used in AI nodes. These GPUs require anywhere from 6.5kW to more than 11kW per node, whereas the average power consumption of a fully loaded data centre cabinet is around 7-8kW, with a typical maximum of 15kW for the entire cabinet. To handle the resulting heat, data centres are evaluating more efficient cooling methods, such as direct-to-chip liquid cooling and liquid immersion cooling.
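To put those figures in context, here is a rough back-of-the-envelope sketch using the numbers quoted above. The four-node-per-cabinet density is an illustrative assumption, not a figure from this article:

```python
# Rough power-budget sketch using the figures quoted above.
# The node count per cabinet (4) is an illustrative assumption,
# not a figure from the article.

NODE_POWER_KW = (6.5, 11.0)    # per-node draw quoted for GPU-based AI nodes
TYPICAL_CABINET_MAX_KW = 15.0  # typical maximum for a fully loaded cabinet today

nodes_per_cabinet = 4          # assumed density, for illustration only
low = NODE_POWER_KW[0] * nodes_per_cabinet
high = NODE_POWER_KW[1] * nodes_per_cabinet

print(f"Estimated AI cabinet draw: {low:.0f}-{high:.0f} kW "
      f"vs. a typical {TYPICAL_CABINET_MAX_KW:.0f} kW cabinet maximum")
# -> Estimated AI cabinet draw: 26-44 kW vs. a typical 15 kW cabinet maximum
```

Even at modest densities, an AI cabinet can draw two to three times the typical cabinet maximum, which is exactly why operators are rethinking both cooling and cabinet placement.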
Thirdly, we are seeing that the large majority of AI clusters being built today are based on InfiniBand technology and NVIDIA hardware, which delivers ultra-low-latency, lossless performance. However, Ethernet and InfiniBand are expected to coexist, with Ethernet set to grow, driven by performance enhancements and a desire for a multi-source ecosystem.
Impact on cabling, connectivity and network design
Data centre operators need to be prepared for the demands of AI. This includes considerations around cabling and connectivity, to enable fast and easy migration to higher data speeds, as well as considerations around network design for improved power efficiency.
To handle the power requirements, data centres are making major design changes, including spreading compute cabinets further apart and using end-of-row (EoR) or middle-of-row (MoR) designs, which increase the physical distance between switches and nodes. To facilitate these longer switch-to-node connections, data centre operators may need to deploy more fibre cabling in addition to the typical structured fibre cabling used for switch-to-switch connections. Active optical cables (AOCs) are a popular and cost-effective option for covering these longer distances, and they offer lower power consumption and better latency than transceivers. The trade-offs are sketched below.
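As a rough guide, the following sketch compares the main point-to-point options for switch-to-node links. The reach and relative-power figures are typical industry ballpark values for 400G-class links, assumed here for illustration rather than taken from this article; vendor datasheets should be checked for specifics.

```python
# Illustrative comparison of point-to-point connectivity options for
# switch-to-node links. Reach and relative-power figures are typical
# ballpark values for 400G-class links, assumed for illustration and
# not taken from this article.

options = {
    # option:               (typical reach,  relative power, note)
    "DAC (passive copper)": ("~2-3 m",       "near zero", "in-cabinet links only"),
    "AOC":                  ("up to ~100 m", "low",       "fixed-length assembly"),
    "Transceiver + fibre":  ("100 m to km+", "highest",   "flexible structured cabling"),
}

for name, (reach, power, note) in options.items():
    print(f"{name:22s} reach: {reach:13s} power: {power:9s} ({note})")
```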
Generative AI networks typically require Base-8 MTP fibre cabling to support 100, 200, 400, 800G and 1.6T for switch-to-switch and switch-to-server applications. Both singlemode and OM4 multimode fibre are being used in the AI networks now being deployed by cloud service providers and large enterprises.
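Base-8 trunks map cleanly onto the fibre counts of common parallel optics, which is the arithmetic behind that recommendation. The sketch below uses standard IEEE/MSA fibre counts (for example, 400GBASE-DR4 uses four transmit and four receive fibres); these are general industry figures, not figures from this article:

```python
# Fibre-count sketch: why Base-8 MTP trunks divide cleanly into common
# parallel optics. Fibre counts are standard IEEE/MSA values (e.g.
# 400GBASE-DR4 uses 4 transmit + 4 receive fibres); they are general
# industry figures, not taken from this article.

OPTIC_FIBRES = {
    "100GBASE-DR (duplex)": 2,   # one Base-8 trunk can break out to 4 such links
    "400GBASE-DR4":         8,
    "800G-DR8":             16,
    "1.6T-DR8":             16,
}

BASE = 8  # fibres per Base-8 MTP connector

for optic, fibres in OPTIC_FIBRES.items():
    trunks = -(-fibres // BASE)  # ceiling division
    print(f"{optic:22s} {fibres:2d} fibres -> {trunks} x Base-8 MTP connector(s)")
```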
Siemon recommends AI-ready fibre solutions such as its high-density, end-to-end LightVerse singlemode and multimode MTP fibre systems, which deliver high-performance, ultra-low-loss (ULL) transmission to 800G and beyond for compute and storage fabrics. In addition, direct attach cables (DACs) and active optical cables (AOCs) should be considered for point-to-point, high-speed, low-latency connections within back-end AI clusters for Ethernet, RoCE and InfiniBand networks.