Cloudflare announced that it is collaborating with Microsoft to make it easier for companies to run AI in the location most suitable for their needs. As inference tasks become increasingly distributed, this collaboration will enable businesses to seamlessly deploy AI models across a computing continuum that spans device, network edge, and cloud environments, maximising the benefits of both centralised and distributed computing models. By leveraging ONNX Runtime across these three tiers, Cloudflare and Microsoft can ensure that AI models run wherever processing makes the most sense, from the hyperscale cloud to the hyper-distributed network edge to devices themselves, in whichever location best addresses the bandwidth, latency, connectivity, processing, battery/energy, and data sovereignty and localisation demands of a given application or service.
AI model training requires significant computational and storage resources in close proximity to one another, making centralised cloud platforms the best environment for the intensive calculations needed in model training. While training will continue to be centralised, inference tasks will increasingly be performed in more distributed locations, specifically on devices themselves and on edge networks. For example, some inference tasks (e.g., an autonomous vehicle braking at the sight of a pedestrian) will run on the physical device for the lowest possible latency. However, to navigate device limitations such as compute, storage, and battery power, more and more tasks will need to run on edge networks. Edge networks, in close geographical proximity to end users and devices, will provide an optimal balance of computational resources, speed, and data privacy. Some applications may require moving through all three tiers of this computing continuum, with device, edge network, and cloud environments working together to bring the best experience to the end user.
“Together, Cloudflare and Microsoft will build the railroad tracks that AI traffic and tasks will move on, to tailor AI inference to the exact needs and demands of every organisation,” said Matthew Prince, CEO and co-founder, Cloudflare. “Whether you’re looking for speed or accuracy, dealing with energy or connectivity bandwidth challenges, or complying with regional localisation requirements, Cloudflare and Microsoft can help you find the best location for your AI tasks.”
“As companies explore the best way to harness the power of generative AI in unique ways to meet their needs, the ability to run AI models anywhere is paramount,” said Rashmi Misra, GM of Data, AI, & Emerging Technologies at Microsoft. “With Cloudflare’s global network, combined with Microsoft’s experience in training and deploying the world’s most advanced AI workloads through our Azure cloud, businesses will gain access to a new level of flexibility and performance for AI inference.”
Cloudflare and Microsoft will collaborate to make it easy for companies to run AI in the place most suitable for each workload. There are two pieces to making this happen:
- Microsoft’s ONNX Runtime creates a standardised solution that allows the same models to be deployed regardless of environment, whether on device (Windows, mobile, or in-browser), on the distributed network edge (Cloudflare), or in Azure’s centralised cloud platform.
- Cloudflare can provide the infrastructure for routing traffic across the different environments, depending on connectivity, latency, compliance, or other requirements.
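The routing decision described above can be sketched in Python. The tier names, requirement fields, and thresholds below are illustrative assumptions for this sketch only; they are not part of any actual Cloudflare or Microsoft API.

```python
from dataclasses import dataclass


# Illustrative requirements for a single inference task.
# Field names and thresholds are assumptions, not a real API.
@dataclass
class InferenceRequirements:
    max_latency_ms: float        # hard latency budget for a response
    data_must_stay_local: bool   # sovereignty / localisation constraint
    model_fits_on_device: bool   # device compute and storage limits


def choose_tier(req: InferenceRequirements) -> str:
    """Pick device, edge network, or cloud for an inference task."""
    # Ultra-low-latency tasks (e.g. a vehicle braking at the sight of a
    # pedestrian) must stay on device, if the model fits its constraints.
    if req.max_latency_ms < 10 and req.model_fits_on_device:
        return "device"
    # The edge keeps data geographically close while offloading compute
    # the device cannot supply.
    if req.data_must_stay_local or req.max_latency_ms < 100:
        return "edge"
    # Latency-tolerant, unconstrained tasks can use the centralised cloud.
    return "cloud"


print(choose_tier(InferenceRequirements(5, False, True)))     # device
print(choose_tier(InferenceRequirements(50, True, False)))    # edge
print(choose_tier(InferenceRequirements(500, False, False)))  # cloud
```

In practice the same ONNX model would be deployed to all three tiers, so a router like this only decides *where* to run it, not *what* to run.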
Businesses want to be able to move inference tasks across this continuum of device, edge network, and cloud, depending on the performance, cost, and regulatory requirements they face. Microsoft’s AI capabilities and hyperscale cloud infrastructure combined with Cloudflare’s hyper-distributed edge network will empower businesses to drive innovation and efficiency across the entire AI lifecycle. As a result, businesses will be able to:
- Find the best location for AI tasks: Choose to deploy AI inference wherever processing makes the most sense to achieve the desired outcomes, maximising the benefits of both centralised and distributed computing models. For example, a security camera system could utilise edge networks to run object detection. This overcomes the resource constraints of the device itself, without the latency of sending data to a central server for processing.
- Navigate changing needs: Run models in all three locations and adjust or fall back based on availability, use case, and latency requirements.
- Deploy on Cloudflare in a few clicks: Access easily deployable models and ML tooling capabilities on Workers AI through Microsoft Azure Machine Learning.
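The "adjust or fall back" behaviour above can be sketched as an ordered fallback chain: try the device first, then the edge, then the cloud. The runner functions here are simulated placeholders; in a real deployment each would invoke the same ONNX model in that environment.

```python
from typing import Callable, Optional

# A runner attempts inference in one tier and returns a result,
# None if the tier declines the task, or raises ConnectionError
# if the tier is unreachable. These semantics are assumptions
# made for this sketch.
Runner = Callable[[str], Optional[str]]


def run_with_fallback(task: str, chain: list[tuple[str, Runner]]) -> str:
    """Try each (tier, runner) pair in order until one succeeds."""
    for tier, run in chain:
        try:
            result = run(task)
            if result is not None:
                return f"{tier}: {result}"
        except ConnectionError:
            continue  # tier unavailable; fall back to the next one
    raise RuntimeError("no tier could serve the inference task")


# Simulated runners: the device declines (model too large), the edge succeeds.
def device_runner(task: str) -> Optional[str]:
    return None


def edge_runner(task: str) -> Optional[str]:
    return "label=pedestrian"


def cloud_runner(task: str) -> Optional[str]:
    return "label=pedestrian"


chain = [("device", device_runner), ("edge", edge_runner), ("cloud", cloud_runner)]
print(run_with_fallback("detect-objects", chain))  # edge: label=pedestrian
```

Because the fallback order is just data, an application can reorder the chain per request, e.g. preferring the edge when localisation requirements apply.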