VAST Data unveiled a new AI cloud architecture designed to deliver unprecedented levels of performance, quality of service, zero-trust security and space/cost/power efficiency for the AI factory. Building on NVIDIA BlueField-3 data processing unit (DPU) technology, VAST Data’s parallel system architecture makes it possible to disaggregate the entirety of VAST’s operating system natively into AI computing machinery, transforming supercomputers into AI data engines.
The NVIDIA BlueField networking platform combines robust compute power and integrated hardware accelerators to create secure and software-defined accelerated computing infrastructure for AI. By outfitting each GPU server with a dedicated NVIDIA BlueField DPU running a stateless container that powers the VAST parallel services operating system, this new architecture embeds storage and database processing services directly into AI servers and delivers data services designed to scale linearly to hundreds of thousands of GPUs. Moreover, by removing multiple layers of x86 hardware and networking from VAST’s network-attached Data Platform infrastructure, this new AI factory architecture dramatically reduces the cost, footprint, and power associated with AI data services.
Through its collaboration with NVIDIA and this first-of-its-kind integration, VAST Data is:
- Maximising Data Centre Efficiency: VAST’s Disaggregated, Shared Everything (DASE) architecture leverages the processing power of NVIDIA BlueField-3 to require fewer independent compute and networking resources, reducing the power usage and data centre footprint of VAST infrastructure by 70%. The combined end-to-end solution results in a net energy saving of over 5% compared to deploying NVIDIA-powered supercomputers with the previous VAST distributed data services infrastructure.
- Enabling Unprecedented Quality of Service: By providing each GPU server with a dedicated and truly parallel storage and database container, this new AI factory architecture eliminates contention for data services infrastructure. VAST’s DASE architecture features extreme parallelism such that each NVIDIA BlueField-3 can read and write into shared namespaces of the VAST Data Platform without coordinating IO across containers. In essence, this architecture eliminates infrastructure contention at the most fundamental level. This contention-less architecture is essential for multi-tenant service providers who need to meet the contractual Service Level Objectives of their clients while also maximising the utilisation of all GPU computing assets.
- Enhancing Zero-Trust Security: This new AI factory architecture ensures that data and data management remain protected and isolated from host operating systems. Compared to AI computers that use parallel file system clients (which have an intimate understanding of the data services layer), VAST can eliminate many attack vectors in a multi-tenant environment by hosting industry-standard network-attached services, object services, and database services from NVIDIA BlueField-3 DPUs. These services are delivered via standard client protocols, such as NFS, SMB, S3 and Apache Arrow, that do not expose the underlying Data Platform system topology.
- Delivering Block Storage Services: VAST systems, powered by the NVIDIA DOCA software framework that enables the rapid development of containerised services, now provide block storage services natively to host operating systems. Combined with VAST’s file, object, and database services, this gives high-performance applications a comprehensive set of data presentations.
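To put the two efficiency figures in the first point above in context, the short sketch below works through the arithmetic that relates them. It is an illustration only, not a VAST or NVIDIA calculation. It assumes that the whole net saving comes from the 70% reduction in data services infrastructure power and that GPU server power is otherwise unchanged; the function names are hypothetical.

```python
def implied_infra_share(infra_reduction: float, net_savings: float) -> float:
    """Fraction of total power that data services infrastructure must have
    drawn before the change, ASSUMING all net savings come from shrinking it."""
    if not 0 < infra_reduction <= 1:
        raise ValueError("infra_reduction must be in (0, 1]")
    return net_savings / infra_reduction


def net_energy_savings(infra_share: float, infra_reduction: float) -> float:
    """Net saving across the whole deployment for a given infrastructure share."""
    return infra_share * infra_reduction


# Figures quoted in the announcement: 70% infrastructure reduction, over 5% net saving.
share = implied_infra_share(infra_reduction=0.70, net_savings=0.05)
print(f"Implied infrastructure share of total power: {share:.1%}")
print(f"Check, net saving: {net_energy_savings(share, 0.70):.1%}")
```

Under these assumptions, a net saving of at least 5% implies that data services infrastructure previously accounted for roughly 7% or more of total power draw, with the GPU servers making up most of the remainder.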
“We’re extremely proud to partner with NVIDIA to help industrialise AI computing,” said Jeff Denworth, co-founder at VAST Data. “This new architecture is the perfect showcase to express the parallelism of the VAST Data Platform. With NVIDIA BlueField-3 DPUs, we can now realise the full potential of our vision for disaggregated data centres that we’ve been working toward since the company was founded.”
This new VAST architecture – running VAST software on BlueField DPUs in the AI servers – is being tested and deployed first at CoreWeave, the leading specialised GPU cloud provider. VAST and CoreWeave began partnering in 2023 to build some of the world’s most scalable AI machinery and to help many of the world’s leading LLM builders and blue-chip enterprise customers build their own AI factories.
“With VAST’s operating system, next-generation accelerated computing solutions are paired with next-generation accelerated network infrastructure, enabling enterprises and service providers to benefit from simpler, more secure experiences with high-performance systems,” said Rob Davis, Vice President of Storage Technology at NVIDIA.
“VAST’s revolutionary architecture is a game-changer for CoreWeave, enabling us to fully disaggregate our data centres. We’re seamlessly integrating VAST’s advanced software directly into our GPU clusters,” said Peter Salanki, vice president of Engineering at CoreWeave. “Leveraging NVIDIA BlueField DPUs, we’ve been at the forefront of creating sophisticated, software-defined data centre abstractions. Now, by natively incorporating storage and database services onto BlueField, we’re not just streamlining our infrastructure but we are also elevating the user experience for our customers by removing bottlenecks in the AI data computing pipeline. CoreWeave is not just keeping pace with the future of cloud data management – we are defining it.”
Explore the possibilities with VAST Data at NVIDIA GTC and learn more about the NVIDIA BlueField-3 DPU integration by visiting VAST Data at Booth #1424.