
Envoy AI Gateway

Envoy AI Gateway gives you a gateway built on Envoy for routing application traffic to GenAI services. It supports 16+ LLM providers on your own infrastructure.

Open-source AI traffic management, honestly reviewed. This one is for engineering teams, not no-code users.

TL;DR

  • What it is: An open-source (Apache-2.0) AI gateway built on top of Envoy Gateway and Envoy Proxy — a reverse proxy that sits between your applications and LLM providers (OpenAI, Anthropic, AWS Bedrock, etc.) and handles routing, rate limiting, failover, and upstream authentication [1].
  • Who it’s for: Platform engineering teams and DevOps engineers already running Kubernetes who want a unified, production-grade control plane for LLM traffic. Not for non-technical founders. Not for small teams on a shared VPS.
  • Cost angle: This tool doesn’t replace a SaaS subscription. It controls your LLM API spending — rate limiting, token budgets, failover to cheaper providers — so your OpenAI/Anthropic bills don’t spiral out of control at scale.
  • Key strength: Built on battle-tested Envoy Proxy infrastructure; genuinely enterprise-grade routing, observability, and security for teams already in the Envoy ecosystem. Real adopters include Bloomberg and Tencent Cloud [website].
  • Key weakness: Requires Kubernetes. Requires familiarity with Envoy Gateway. At v0.5, it’s still early-stage infrastructure software — expect rough edges. The available independent reviews are thin; most public documentation is the project’s own.

What is Envoy AI Gateway

Envoy AI Gateway is a Kubernetes-native proxy specifically designed for routing traffic between application clients and generative AI services. Built on Envoy Gateway — itself an abstraction over Envoy Proxy — it adds LLM-specific capabilities: token-based rate limiting, multi-provider routing, upstream authentication to AI APIs, and failover between models [1].

The project describes itself as using a two-tier gateway pattern. The Tier One Gateway functions as a centralized entry point handling authentication, top-level routing, and global rate limiting. The Tier Two Gateway handles ingress traffic to a self-hosted model serving cluster, with endpoint picker support for LLM inference optimization [README].

The practical translation: you deploy this on your Kubernetes cluster, point your applications at it instead of directly at OpenAI, and get a single control plane that enforces policies across all your LLM traffic. If one provider goes down, it fails over automatically. If a team is burning too many tokens, you throttle them. If you want to route some requests to a cheaper self-hosted model and others to GPT-4, you configure routing rules [1][5].
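
To make that concrete, here is a rough sketch of an AIGatewayRoute based on the shape the documentation describes. The resource kind and the x-ai-eg-model header are real per the docs; the gateway and backend names are invented for illustration, and because the project is pre-1.0 the exact fields shift between releases. Treat it as a sketch, not copy-paste configuration:

```yaml
# Illustrative sketch only; verify field names against your installed version.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm-route                  # hypothetical name
  namespace: default
spec:
  schema:
    name: OpenAI                   # clients speak the OpenAI API format
  targetRefs:                      # attach to an existing Gateway
    - name: ai-gateway             # hypothetical Gateway name
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    # Route by model: the gateway matches on the x-ai-eg-model header.
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: gpt-4o
      backendRefs:
        - name: openai-backend     # AIServiceBackend for OpenAI
          priority: 0              # primary
        - name: bedrock-backend    # AIServiceBackend for AWS Bedrock
          priority: 1              # fallback when the primary is unhealthy
```

Each backendRef points at an AIServiceBackend resource describing the upstream provider; per the docs, priority ordering is how recent versions express the automatic failover described above.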

As of this review, the project is at v0.5 with 1,443 GitHub stars under the Apache-2.0 license. The CNCF-style community includes weekly Monday meetings and a Slack channel [website]. Real-world adopters on the homepage include Bloomberg, LY Corporation (Line/Yahoo Japan), National Research Platform, Nutanix, Tencent Cloud, and Tetrate [website].


Why people choose it

The honest answer is that the public review record for Envoy AI Gateway is sparse. The five sources provided for this review are all versions of the project’s own official documentation (v0.2 through v0.5 and latest) — there are no independent Trustpilot pages, no user community reviews, no third-party comparison articles in the data available. That’s partly a function of age (v0.5, relatively new) and partly a function of audience: infrastructure proxy tools don’t get the same consumer review coverage that no-code tools do.

What we can synthesize from the documentation and adopter profile:

The Envoy ecosystem argument. Engineers who already run Envoy Proxy or Istio in their clusters know the Envoy configuration model. Envoy AI Gateway extends that model with AI-specific primitives instead of forcing teams to adopt an entirely separate tool for LLM traffic [1][3]. For teams already invested in Envoy, this is the path of least resistance.

Enterprise-grade reliability over convenience. The tool is explicitly designed for organizations where LLM calls are production traffic, not experiments. Bloomberg and Tencent Cloud are not running this for side projects — they need fault-tolerant routing, observability, and policy enforcement at scale [website][1].

Multi-provider risk reduction. Committing your entire application to OpenAI’s API endpoint is a single point of failure and a single vendor’s pricing power. Envoy AI Gateway lets you define fallback routes: if OpenAI is slow or rate-limiting you, automatically route to Azure OpenAI or AWS Bedrock. You write that logic once in a Kubernetes custom resource, not in every application [1][5].

MCP routing — the newest capability. A recent blog post on the project site announces support for Model Context Protocol (MCP), enabling enterprise-grade security and routing for AI agent tool integrations. The post describes full spec compliance, OAuth authentication, and stateful session handling [website]. This is still new enough that production hardening should be assumed rather than guaranteed.


Features

Based on the official documentation across versions:

Traffic routing and management:

  • Unified routing layer across 16 LLM providers: OpenAI, Anthropic, Azure OpenAI, Google Gemini, Vertex AI, AWS Bedrock, Mistral, Cohere, Groq, Together AI, DeepInfra, DeepSeek, Hunyuan, SambaNova, Grok, and Tetrate Agent Router Service [README]
  • Automatic failover between providers when one becomes unavailable [1]
  • Fine-grained routing rules via Kubernetes custom resources (AIGatewayRoute) [1]
  • Benchmarked to 2,000 AIGatewayRoute resources with measured control plane latency, CPU, and memory [website blog]

Rate limiting and policy:

  • Token-based rate limiting (per-token budgets, not just per-request; sketched after this list) [1]
  • Global and per-backend rate limiting [1]
  • Policy framework for usage limiting across teams/applications [1][5]
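
The documented pattern for this comes in two halves, sketched below with the usual pre-1.0 caveat that field names may have moved since v0.5: the AIGatewayRoute extracts token usage from responses into metadata, and an Envoy Gateway BackendTrafficPolicy spends that metadata against a budget instead of counting raw requests.

```yaml
# Sketch of the usage-based limiting pattern from the docs; field names
# are illustrative and should be checked against your installed version.

# 1) On the AIGatewayRoute (fragment of the spec shown earlier):
#    record token usage as request metadata.
llmRequestCosts:
  - metadataKey: llm_total_token     # where the token count lands
    type: TotalToken                 # input + output tokens
---
# 2) On an Envoy Gateway BackendTrafficPolicy: spend that metadata
#    against a per-user hourly budget.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: token-budget                 # hypothetical name
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: ai-gateway               # hypothetical Gateway name
  rateLimit:
    type: Global
    global:
      rules:
        - clientSelectors:
            - headers:
                - name: x-user-id    # hypothetical per-user key
                  type: Distinct     # one bucket per distinct value
          limit:
            requests: 100000         # budget of ~100k tokens
            unit: Hour               # per user, per hour
          cost:
            request:
              from: Number
              number: 0              # the request itself costs nothing
            response:
              from: Metadata         # charge the token count instead
              metadata:
                namespace: io.envoy.ai_gateway
                key: llm_total_token
```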

Security:

  • Upstream authentication to AI provider APIs (manages credentials, not your app code; see the sketch after this list) [1]
  • Fine-grained access control and authorization policies [1][5]
  • MCP support with OAuth authentication for AI agent integrations [website]
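
In practice, upstream auth means the provider key lives in a Kubernetes Secret and the gateway injects it, so application code never sees it. A minimal sketch of the BackendSecurityPolicy shape from the docs, with hypothetical names and the usual pre-1.0 caveat:

```yaml
# Sketch only: the API-key flavor of upstream auth. Cloud providers
# (e.g. AWS Bedrock) use other credential types on the same resource.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: openai-apikey            # hypothetical name
  namespace: default
spec:
  type: APIKey
  apiKey:
    secretRef:
      name: openai-secret        # hypothetical Secret holding the key
# How the policy binds to its AIServiceBackend (a ref on the backend in
# earlier releases, targetRefs on the policy later) varies by version.
```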

Observability:

  • Traffic performance metrics [1]
  • Usage pattern monitoring [1]
  • Cost analytics to track LLM spending across providers [1][5]

Architecture:

  • Kubernetes-native, managed via custom resources [1]
  • Extensible via Envoy’s existing extension framework [1]
  • CNCF Code of Conduct, weekly community meetings [website]

What’s not present: there’s no GUI, no no-code interface, no drag-and-drop configuration. This is YAML and Kubernetes all the way down.


Pricing: SaaS vs self-hosted math

Envoy AI Gateway itself is free and open source under Apache-2.0. There’s no hosted version, no premium tier, no commercial license. The project is community-driven without a company selling enterprise seats [README].

The cost math is therefore different from tools like Zapier or n8n. The question is not “how much does this tool cost” but “how much money does this tool save you on your LLM API bills.”

What it saves:

  • Token-based rate limiting prevents runaway costs when a bug or a bad actor floods your OpenAI endpoint. Without a gateway, you discover the problem on your credit card statement.
  • Multi-provider failover lets you route to cheaper providers under normal load and reserve expensive models for critical paths.
  • Centralized observability means you can actually see which team or application is burning your token budget — data you can’t easily get when every service calls the LLM API directly.

What it costs to run:

  • A Kubernetes cluster — if you don’t already have one, this is a significant prerequisite. Minimum viable Kubernetes on AWS/GCP/Azure runs $70–150/month for the control plane plus node compute.
  • Operational overhead: someone needs to understand Envoy configuration, maintain the deployment, and handle version upgrades as the project evolves toward 1.0.
  • Engineering time for initial setup and policy configuration.

For teams already running Kubernetes at any reasonable scale, the marginal cost of adding Envoy AI Gateway is close to zero. For a founder who doesn’t have a Kubernetes cluster, the entry cost is everything — this is not the right tool.


Deployment reality check

There’s no getting around the prerequisite: Envoy AI Gateway requires Kubernetes. The documentation’s quickstart assumes you have a cluster running and kubectl configured. If you’re evaluating this for a startup running on a single VPS, stop here — this is not the right tool [1].

For teams that have Kubernetes, the deployment path goes through Envoy Gateway first. Envoy AI Gateway is an extension of Envoy Gateway, not a standalone component. The install sequence, with the first two steps sketched after the list, is:

  1. Install Envoy Gateway on your cluster (separate project, separate Helm chart)
  2. Install Envoy AI Gateway on top of it
  3. Configure AIGatewayRoute custom resources to define your routing rules
  4. Configure BackendSecurityPolicy for upstream auth to your LLM providers
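
A sketch of steps 1 and 2 with Helm, following the project quickstart. The chart locations match the docs at the time of writing, but the version placeholders are yours to pin:

```sh
# Sketch following the quickstart; pin real versions before running.

# 1) Install Envoy Gateway (the base project, its own Helm chart)
helm upgrade -i eg oci://docker.io/envoyproxy/gateway-helm \
  --version <envoy-gateway-version> \
  -n envoy-gateway-system --create-namespace

# 2) Install Envoy AI Gateway on top of it
helm upgrade -i aieg oci://docker.io/envoyproxy/ai-gateway-helm \
  --version <ai-gateway-version> \
  -n envoy-ai-gateway-system --create-namespace

# The quickstart then enables the AI extension in Envoy Gateway's config
# and restarts it; steps 3-4 are kubectl apply of AIGatewayRoute and
# BackendSecurityPolicy manifests like the sketches earlier in this review.
```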

What can go sideways:

  • At v0.5, this is pre-1.0 software. APIs and custom resource definitions may change between versions — your configuration from v0.4 may need updates for v0.5 [2][3]. Budget migration effort for each version upgrade.
  • MCP support is brand new. The blog post describing it is from the most recent batch of content on the site — treat it as experimental [website].
  • The control plane scaling benchmark (2,000 AIGatewayRoute resources) is a positive signal, but 2,000 is a modest ceiling for large multi-tenant deployments. The benchmark is also self-reported [website blog].
  • Documentation quality is consistent across versions [1][2][3][4][5] but lacks the depth of operational runbooks — you’ll be reading source code and Slack threads for edge cases.

Realistic setup time for an experienced platform engineer: half a day to have something working, another day or two to properly configure policies for production. For a team new to Envoy, add another day to understand the configuration model.


Pros and Cons

Pros

  • Apache-2.0 license. Genuinely permissive — no commercial restrictions, no “fair code” ambiguity. You can run it, fork it, embed it in your product [README].
  • Built on Envoy Proxy. Envoy is battle-tested at Google, Lyft, and hundreds of enterprises. The proxy layer isn’t experimental — only the AI-specific extensions are [1].
  • Real enterprise adopters. Bloomberg and Tencent Cloud using a tool in production is a meaningful signal. This isn’t purely experimental [website].
  • 16 providers out of the box. Covers the realistic shortlist — OpenAI, Anthropic, AWS Bedrock, Google, Azure — plus a long tail of newer providers [README].
  • Token-based rate limiting. Per-request rate limiting misses the real problem with LLM costs; per-token budgets are the correct abstraction [1][5].
  • MCP routing support. Early but real — if your architecture is building toward AI agents making tool calls, having this handled at the gateway level is architecturally clean [website].
  • CNCF-style community. Weekly meetings, Slack, GitHub issues — not a one-company closed project [website][1].

Cons

  • Requires Kubernetes. This immediately disqualifies it for solo founders, small teams, and anyone on managed VPS. Not a critique — it’s a design decision — but it must be stated clearly.
  • Pre-1.0 software (v0.5). Custom resource definitions, APIs, and behaviors can change. Not production-stable in the way that Envoy Proxy itself is [1][2][3][4][5].
  • No independent public reviews. The only available “reviews” are the project’s own documentation. There’s no community review record to draw on, no Trustpilot equivalent, no forum threads from frustrated or happy users. You’re making decisions with thin public signal.
  • Small star count relative to alternatives. 1,443 GitHub stars versus LiteLLM (~15,000+) suggests a significantly smaller community and ecosystem of worked examples, integrations, and blog posts.
  • No GUI. Everything is YAML and kubectl. If your team expects a dashboard for LLM usage analytics, you’re building it yourself or integrating an observability stack.
  • MCP support is new and untested at scale. The blog post uses careful language (“aligned with the broader Envoy ecosystem,” “stays competitive”) — treat MCP routing as experimental until the community produces production case studies [website].

Who should use this / who shouldn’t

Use Envoy AI Gateway if:

  • You’re a platform engineering team already running Kubernetes and Envoy, and you want LLM traffic to be a first-class concern in your existing infrastructure.
  • You’re building a multi-tenant platform that serves LLM calls for multiple internal teams and you need token budgets, rate limiting, and auditability per team.
  • You’re building applications that need to be independent of any single LLM provider and want failover baked into infrastructure rather than application code.
  • You’re already familiar with Envoy configuration and don’t want to learn an entirely new tool’s paradigm.

Skip it if:

  • You’re a non-technical founder or small team. The Kubernetes prerequisite alone disqualifies this for you.
  • You’re still in prototype/MVP stage. Introducing a gateway layer before you’ve validated your LLM use case is premature optimization.
  • You need a dashboard, a no-code interface, or any kind of visual configuration tool.
  • You want a stable, 1.0, production-hardened tool today. At v0.5, this is still maturing.

Consider LiteLLM instead if:

  • You want multi-provider LLM routing with a management dashboard, without the Kubernetes requirement.
  • Your team is comfortable with Python but not with Envoy’s configuration model.
  • You want a larger community, more blog posts, and more worked examples to draw from.

Alternatives worth considering

  • LiteLLM — the most direct alternative for multi-provider LLM routing. Runs as a standalone Python service, includes a management dashboard, much larger community (~15,000+ GitHub stars). Lower infrastructure bar than Envoy AI Gateway. Less Kubernetes-native.
  • Kong AI Gateway — Kong’s commercial extension for LLM traffic management. More mature enterprise tooling but less open. Relevant if you’re already on Kong.
  • Portkey — managed SaaS for LLM gateway functionality. Zero infrastructure, dashboard included. Relevant if you want the features without self-hosting anything.
  • OpenRouter — managed multi-provider LLM routing as a SaaS API. Not self-hosted at all, but eliminates the provider lock-in problem without any ops work.
  • Traefik AI / custom middleware — teams already on Traefik can build similar routing behavior without adopting Envoy. More DIY, less batteries-included.

For engineering teams deep in the Envoy ecosystem, Envoy AI Gateway is the cleanest architectural choice. For everyone else, LiteLLM is the more accessible starting point.


Bottom line

Envoy AI Gateway is the right tool for a specific profile: platform engineering teams already running Kubernetes, already familiar with Envoy Proxy, building multi-provider LLM applications where token cost control and failover are first-class concerns. For that profile, the Apache-2.0 license, real enterprise adopters, and the backing of the Envoy community make this a credible infrastructure choice.

For everyone else — especially the non-technical founder trying to escape SaaS bills — this is not the tool. The Kubernetes prerequisite is a hard wall, v0.5 means real API churn risk, and the community is small enough that you’ll be mostly on your own when something breaks in production. If you’re running LLM calls in a startup without a dedicated platform team, LiteLLM or a managed gateway like Portkey will get you 90% of the same benefits with a fraction of the operational overhead.

If Kubernetes infrastructure setup is the blocker, that’s what upready.dev handles for clients — one-time setup, production-ready, you own it.


Sources

  1. Envoy AI Gateway Documentation (latest) — Official project documentation covering architecture, key objectives, and project goals. https://aigateway.envoyproxy.io/docs/
  2. Envoy AI Gateway Documentation v0.2 — Archived documentation for version 0.2. https://aigateway.envoyproxy.io/docs/0.2/
  3. Envoy AI Gateway Documentation v0.4 — Documentation for version 0.4. https://aigateway.envoyproxy.io/docs/0.4/
  4. Envoy AI Gateway Documentation v0.3 — Archived documentation for version 0.3. https://aigateway.envoyproxy.io/docs/0.3/
  5. Envoy AI Gateway Documentation v0.5 — Documentation for current release v0.5. https://aigateway.envoyproxy.io/docs/0.5/
