Workers AI
What is it?
Workers AI lets developers run AI inference (using pre-trained machine learning models) directly on Cloudflare's network. Instead of sending AI requests to a centralized GPU cluster (like OpenAI's API), Workers AI runs models on GPUs deployed in Cloudflare data centers, bringing AI compute closer to users.
Workers AI supports a catalog of open-source models including large language models (Meta's Llama), image generation (Stable Diffusion), speech-to-text (Whisper), translation, text embeddings, and more.
What problem does it solve?
- Latency: Sending requests to a centralized AI API adds round-trip latency. Running models at the edge brings inference closer to users.
- Cost: Major AI providers charge significant per-token or per-request fees. Workers AI offers competitive pricing with generous free tiers.
- Privacy: Some applications need to run AI inference without sending data to third-party providers. Workers AI processes data on Cloudflare's network.
- Integration: Developers building on Workers need AI capabilities that integrate natively with the rest of the platform, without managing separate AI infrastructure.
How does it work?
- A developer binds an AI model to their Worker using its `@cf/` model identifier (e.g., `@cf/meta/llama-3.2-3b-instruct`).
- In their Worker code, they call `env.AI.run()` with the model name and input data.
- Cloudflare routes the request to the nearest data center with available GPU capacity.
- The model processes the input and returns the result.
- The Worker can then use the result in its response — for example, generating a chatbot reply, summarizing text, or classifying an image.
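The flow above can be sketched as a small Worker. This is a hedged example, not the canonical API surface: it assumes an AI binding named `AI` configured in `wrangler.toml`, uses the `@cf/meta/llama-3.2-3b-instruct` identifier mentioned above, and assumes the model returns its text in a `response` field. The `generateReply` helper and the query-parameter handling are illustrative choices, not part of the platform.

```typescript
// Minimal sketch of a Worker calling Workers AI.
// Assumes wrangler.toml contains:
//   [ai]
//   binding = "AI"

interface Ai {
  run(model: string, input: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: Ai;
}

// Hypothetical helper: send a chat-style prompt to a text-generation model.
export async function generateReply(env: Env, prompt: string): Promise<string> {
  // env.AI.run() routes the request to the nearest data center
  // with available GPU capacity; the Worker just awaits the result.
  const result = (await env.AI.run("@cf/meta/llama-3.2-3b-instruct", {
    messages: [{ role: "user", content: prompt }],
  })) as { response?: string };
  return result.response ?? "";
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const prompt = new URL(request.url).searchParams.get("prompt") ?? "Hello";
    const reply = await generateReply(env, prompt);
    // Use the inference result directly in the HTTP response.
    return new Response(reply);
  },
};
```

Because the model call is just an awaited method on a binding, the same pattern covers summarization or classification by swapping the model identifier and input shape.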
Available model types:
- Text Generation: Chat, summarization, code generation (Llama, Mistral, Gemma)
- Text Embeddings: Converting text to vectors for semantic search
- Image Generation: Creating images from text prompts (Stable Diffusion)
- Speech-to-Text: Transcription (Whisper)
- Translation: Converting text between languages
- Image Classification: Identifying objects in images
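To make the embeddings row concrete: an embedding model turns text into a numeric vector, and semantic search then compares vectors, commonly with cosine similarity. The sketch below shows only that comparison step; the vectors themselves would come from an embedding model in the catalog (the model names and vector dimensions are assumptions, not specified here).

```typescript
// Cosine similarity between two embedding vectors: 1 means same direction
// (semantically similar under the model), 0 means orthogonal (unrelated).
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// cosineSimilarity([1, 0], [1, 0]) → 1 (identical)
// cosineSimilarity([1, 0], [0, 1]) → 0 (unrelated)
```

In a real search pipeline, you would embed documents once, store the vectors, embed each query at request time, and rank documents by this score.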
Why it matters strategically
Workers AI is Cloudflare's entry point into the AI infrastructure market. As AI becomes central to every application, Cloudflare wants to be the platform developers choose to run their AI workloads — not just their web traffic. It ties directly into Act 4 (the Agentic Web), where AI agents will need low-latency inference everywhere. Workers AI also creates a new revenue stream based on GPU compute, diversifying Cloudflare beyond network services.