Helicone
Helicone lets you run an LLM logging platform entirely on your own server.
Open-source AI gateway and LLM monitoring, honestly reviewed. No marketing fluff, just what you actually get when you route your API calls through it.
TL;DR
- What it is: Open-source (Apache-2.0) LLM observability platform and AI gateway — a proxy layer that sits between your app and model providers, logging every request with cost, latency, and token usage [3][4].
- Who it’s for: AI engineers and technical founders who want visibility into production LLM behavior without building logging infrastructure from scratch. Works best when you’re already spending real money on tokens and don’t know where it’s going [4][5].
- Cost savings: Compared to commercial observability platforms, the self-hosted version is free beyond VPS costs. The cloud free tier (10K requests/month) covers early-stage experimentation [pricing page].
- Key strength: One-line integration. You change one `baseURL` and your entire OpenAI/Anthropic call history appears in a dashboard, with cost breakdowns and latency percentiles, immediately [3][README].
- Key weakness: The proxy architecture means your LLM calls flow through Helicone’s infrastructure — if Helicone has downtime, your requests fail too. And it only captures what passes through the proxy, so agent steps, retrieval, and business logic stay invisible [1].
- Notable: Helicone was acquired by Mintlify in early 2026. What this means for the roadmap and self-hosted support is still unclear [homepage].
What is Helicone
Helicone is a proxy-based LLM observability platform. The core mechanic is simple: instead of calling https://api.openai.com/v1, you call https://ai-gateway.helicone.ai and add your Helicone API key to the headers. From that point on, every request gets logged — model, tokens in, tokens out, latency, cost, response — and surfaces in a dashboard with filters, search, and time-series charts [README][4].
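To make the mechanic concrete, here is a minimal sketch in TypeScript using the official openai client, following the gateway URL and header approach described above. The Helicone-Auth header name matches Helicone’s docs at the time of writing, but verify the exact header and URL against the current documentation before relying on them.

```typescript
import OpenAI from "openai";

// Point the client at Helicone's gateway instead of the provider directly.
// Every request is then logged (model, tokens, latency, computed cost)
// before being forwarded to the underlying provider.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://ai-gateway.helicone.ai", // was: https://api.openai.com/v1
  defaultHeaders: {
    // Header name per Helicone's docs at time of writing; verify before use.
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello from behind the proxy" }],
});
console.log(response.choices[0].message.content);
```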
Beyond logging, Helicone has expanded into three additional areas: an AI gateway (routing, caching, rate limiting, automatic fallbacks across 100+ models), prompt management (versioning and deploying prompts without code deploys), and datasets (storing request-response pairs for fine-tuning or eval runs) [README][4].
The project started as a Y Combinator W23 company focused purely on observability. In 2025, the team rebuilt their self-hosting architecture from twelve containers down to four after a potential enterprise customer called the original deployment “too onerous” and walked [3]. The GitHub repository sits at 5,279 stars with an Apache-2.0 license, which is a genuinely permissive open-source license — no commercial use restrictions, no “fair-code” asterisks [README].
In early 2026, Helicone announced it was joining Mintlify, a developer documentation company. The homepage now leads with that announcement rather than the product itself, which is an unusual signal to encounter mid-review.
Why people choose it (and why some leave)
The honest answer is: people choose Helicone because the integration friction is almost zero and the alternative is building logging yourself.
If you’re calling OpenAI from a production app and you don’t know how much it costs per user, per prompt, or per day — Helicone solves that problem in under five minutes [README][4]. You change one URL. You add one header. You get a dashboard. That simplicity is the core value proposition, and it’s real.
The comparisons that show up in the wild are mostly Helicone versus Braintrust, LangSmith, and Portkey.
Versus Braintrust. Braintrust’s own review [1] makes a sharp architectural argument against Helicone: the proxy model means your application reliability is coupled to Helicone’s infrastructure uptime. If Helicone’s servers have an incident, your LLM calls fail. Braintrust uses SDK-based tracing that logs asynchronously outside the request path, so observability failures don’t cascade into production failures [1]. That’s a real architectural difference that matters in production. Braintrust also argues that the proxy only captures what passes through it — if your application has retrieval steps, database calls, or multi-step agent logic, Helicone shows you the LLM call but none of the surrounding context [1]. For simple apps hitting one model, that’s fine. For complex agents, it’s a significant blind spot.
Versus Portkey. TrueFoundry’s comparison [5] puts Helicone’s new Rust-based AI gateway at ~8ms P50 latency overhead through edge deployment on Cloudflare Workers. That’s the headline advantage: Portkey routes through centralized infrastructure, Helicone runs at the edge, which helps for latency-sensitive applications. But Portkey processes 2.5 trillion tokens across 650+ organizations and has more mature enterprise features — audit trails, advanced RBAC, guardrails, extensive failover strategies [5]. The TrueFoundry review describes Helicone’s enterprise feature set as “limited” and its operational scope as “narrow” relative to Portkey [5].
Versus building it yourself. This is the comparison that rarely gets written down but is often the real alternative. Logging LLM calls, parsing token counts from provider APIs, computing costs from model pricing tables, building dashboards — all of that is undifferentiated infrastructure work. Helicone handles it, for free at low volume. For solo founders and small teams, that trade-off is usually obvious.
Features
Based on the README and documentation:
Core gateway:
- Proxy routing to 100+ models (OpenAI, Anthropic, Gemini, Mistral, DeepSeek, Together AI, Groq, OpenRouter, and more) through a single API key [README]
- 0% markup on model costs — Helicone charges for its platform, not a percentage of your token spend [4]
- Automatic fallbacks — if a primary model fails, route to a backup [README][4]
- Semantic caching — cache responses for similar prompts to reduce latency and cost (see the header sketch after this list) [4]
- Rate limiting per user, per API key, or custom dimensions [README]
- Load balancing across providers or model versions [4]
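Several of these gateway features are toggled per request through headers rather than code changes. A minimal sketch of enabling the cache, assuming the Helicone-Cache-Enabled header documented for Helicone’s proxy; header names are worth verifying against the current docs:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://ai-gateway.helicone.ai",
  defaultHeaders: { "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}` },
});

// Opt this request into Helicone's response cache. A repeated identical
// prompt should then be answered from cache instead of re-billing tokens.
const completion = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Summarize our refund policy." }],
  },
  { headers: { "Helicone-Cache-Enabled": "true" } },
);
```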
Observability:
- Full request/response logging with timestamps, latency, token counts, and computed cost [README][4]
- Session tracing — group related requests into a session (e.g., a full agent run or multi-turn conversation) [README]
- User analytics — tag requests by user ID and see per-user cost and usage [README]
- Custom properties — attach arbitrary metadata to any request (feature flag, experiment ID, customer tier) and filter by it in the dashboard (see the sketch after this list) [README]
- Alerts for rate limit breaches, unusual patterns, or cost spikes [pricing page][4]
- PostHog export — send logs to PostHog in one line for custom dashboards [README]
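The user analytics, session tracing, and custom properties above are all driven by request headers as well. A sketch under the same assumptions; Helicone-User-Id, Helicone-Session-Id, and Helicone-Property-* follow Helicone’s documented naming at the time of writing:

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://ai-gateway.helicone.ai",
  defaultHeaders: { "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}` },
});

// Attach metadata to one request so it can be filtered in the dashboard.
const reply = await client.chat.completions.create(
  { model: "gpt-4o", messages: [{ role: "user", content: "Hi" }] },
  {
    headers: {
      "Helicone-User-Id": "user_1234",             // per-user cost breakdowns
      "Helicone-Session-Id": "agent-run-42",       // group a multi-step run
      "Helicone-Property-Experiment": "variant-b", // arbitrary custom property
    },
  },
);
```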
Prompt management:
- Version prompts and deploy them through the gateway without a code deploy [README]
- Playground — test prompts against production traces in a UI [README][4]
- Datasets — curate request-response pairs for fine-tuning or running evals [README][4]
Agent tracing:
- Session-based tracing groups multi-step agent runs into a single observable unit [README]
- Human-in-the-loop tracking for workflows with approval steps [README]
Enterprise:
- SOC 2, HIPAA, GDPR compliance on the managed cloud [README][pricing page]
- SAML SSO — Enterprise tier only [pricing page]
- On-premises deployment — Enterprise tier only [pricing page]
- Configurable data retention — up to indefinite on Team and Enterprise [pricing page]
Pricing: SaaS vs self-hosted math
Helicone Cloud:
- Hobby: Free. 10,000 requests/month, 1 GB storage, 1 seat, 7-day log retention [pricing page].
- Pro: $79/month. 10K requests included, then usage-based. Unlimited seats, alerts, reporting, HQL query language, 1-month retention [pricing page].
- Team: $799/month. 10K requests included, then usage-based. 5 organizations, SOC-2 and HIPAA compliance, dedicated Slack channel, 3-month retention [pricing page].
- Enterprise: Custom. Everything in Team plus custom MSA, SAML SSO, on-prem deployment, bulk discounts, indefinite retention [pricing page].
The usage-based overage on top of flat rates is worth understanding before you commit. The pricing calculator on the website shows approximately $0.97/month for 10K requests — at higher volumes, storage and request counts stack on top of the base price [pricing page].
Self-hosted:
- License: $0 (Apache-2.0) [README]
- Infrastructure: Four Docker containers. A t2.medium EC2 instance handles roughly 90% of typical workloads, capable of scaling to one million logs per day, according to the Helicone team [3]. That’s approximately $30–50/month on AWS, or under $10/month on Hetzner or Contabo for equivalent specs.
- At enterprise scale (hundreds of thousands of requests per day): the Helm chart supports Aurora or dedicated ClickHouse clusters for horizontal scaling [3]. The Helicone team suggests this level of scaling becomes relevant around $100,000/month in model spend [3].
Comparison math: If you’re a solo founder running 50,000 LLM requests/month (a realistic number for a small production app with active users), that puts you comfortably in Pro territory on Helicone Cloud at $79/month. Self-hosted on an $8/month Hetzner VPS, you pay $8/month with no per-request charge. Over a year: Cloud ≈ $948, self-hosted ≈ $96. The $850 difference is real, but so is the operational overhead of maintaining four Docker containers on your own server.
Deployment reality check
Helicone’s self-hosting story improved substantially in May 2025 [3]. The original architecture required twelve separate containers with complex configuration — enough that at least one enterprise prospect walked rather than deploy it. After a month of focused engineering, the team rebuilt around four services: main application, ClickHouse for log storage, an auth container, and a mailer [3].
The current install path:
```bash
git clone https://github.com/Helicone/helicone.git
cd helicone/docker       # compose files live in the repo's docker/ directory
cp .env.example .env     # set secrets here before any production use
./helicone-compose.sh helicone up
```
The dashboard is at localhost:3000 [README][3].
What you actually need:
- A Linux VPS with 2+ GB RAM (the team suggests t2.medium as a baseline) [3]
- Docker and docker-compose installed
- A domain and reverse proxy (Caddy or nginx) for HTTPS if you’re using it in production
What can go sideways:
- ClickHouse is not a database most engineers have operated before. At small scale it’s invisible; at large scale (billions of logs) it requires tuning and dedicated infrastructure [3].
- The Helm chart for enterprise-scale deployments requires contacting Helicone’s sales team for access — it’s not in the public repository [README].
- Helicone recently joined Mintlify [homepage]. What this means for the self-hosted offering long-term is unknown. No official statement about continuation of open-source development has been published as of this review.
- The Braintrust comparison [1] notes a structural risk: because the proxy sits in your request path, observability infrastructure downtime equals production LLM call failures. This is worth engineering around (Helicone supports async logging as a mitigation, but the proxy model still creates dependency); a fallback sketch follows this list.
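One way to engineer around that coupling is a client-side fallback: prefer the Helicone-routed client, and call the provider directly if the gateway path errors. A minimal sketch, not an official Helicone pattern, trading observability for availability on failure:

```typescript
import OpenAI from "openai";

// Two clients: one routed through Helicone for observability, one direct.
const monitored = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://ai-gateway.helicone.ai",
  defaultHeaders: { "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}` },
});
const direct = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Prefer the monitored path; if the call through the gateway errors,
// retry against the provider directly and lose only the log entry.
async function chat(params: OpenAI.Chat.ChatCompletionCreateParamsNonStreaming) {
  try {
    return await monitored.chat.completions.create(params);
  } catch (err) {
    console.warn("Helicone gateway call failed, falling back to direct call", err);
    return await direct.chat.completions.create(params);
  }
}
```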
Realistic setup time for a technical user: 20–30 minutes to a working local instance. With domain and HTTPS on a production VPS: 1–2 hours. For a non-technical founder following a guide: budget a half day and consider having someone deploy it once.
Pros and Cons
Pros
- Genuinely one-line integration. Change the `baseURL`, add a header, done. No SDK to learn, no agent to install, no schema to define. Works with any OpenAI-compatible client [README][4].
- Apache-2.0 license. No commercial restrictions, no fair-code asterisks, no license negotiation required to self-host or embed [README]. This matters for teams that need compliance clarity.
- Edge-deployed gateway with ~8ms overhead. The new Rust-based AI Gateway runs on Cloudflare Workers, which means geographic distribution and no cold starts [5]. For latency-sensitive applications this is a meaningful advantage over centralized alternatives.
- ClickHouse for log storage. Fast aggregation at scale. Even with millions of logs, dashboard filters return instantly [3].
- Semantic caching. Not every observability tool ships a caching layer. Helicone’s caching can reduce costs and latency on repeated or similar prompts [4].
- 100+ models through one API. The gateway handles provider-specific authentication and format differences, so switching from GPT-4o to Claude Sonnet is a one-line model string change [README][4].
- Self-hosting went from painful to reasonable. Four containers instead of twelve, three-line start command, no mandatory Supabase dependency [3]. The engineering investment here was real and the improvement is meaningful.
Cons
- Proxy architecture creates reliability coupling. Your LLM calls go through Helicone. If Helicone is down, your calls fail. For production applications with SLA requirements, this is a genuine risk to architect around [1].
- Blind spot on non-LLM steps. If your application does retrieval, tool calls, database queries, or multi-step agent logic, Helicone only shows you what passes through the proxy — the LLM calls. The surrounding context that actually explains why a request behaved a certain way remains invisible [1].
- Limited enterprise governance. No SAML SSO below the Enterprise tier. Audit logs, advanced RBAC, and on-prem deployment all require contacting sales. If you’re evaluating this for a regulated industry, the self-hosted community edition won’t satisfy compliance requirements without purchasing the Enterprise tier [pricing page].
- Volume-based pricing stacks. The Pro and Team tiers advertise usage-based pricing on top of the flat fee. At high request volumes, this can make the cost less predictable than it first appears [pricing page].
- The Mintlify acquisition creates uncertainty. Helicone recently announced it’s joining Mintlify, a documentation company [homepage]. The strategic fit isn’t obvious, and there’s no published commitment to the open-source roadmap post-acquisition. For teams making a long-term infrastructure bet, this deserves a risk flag.
- Evaluation and testing are still early. Scores, datasets, and fine-tuning integrations exist on paper, but the platform’s strength is logging and routing — not eval infrastructure. Teams that need structured evals with CI/CD integration should look at tools purpose-built for that workflow [1].
Who should use this / who shouldn’t
Use Helicone if:
- You’re shipping a production LLM app and don’t know where your token spend is going. This is the primary problem it solves, and it solves it fast.
- You want Apache-2.0 licensed self-hosted observability with no commercial restrictions.
- Your application makes direct LLM API calls (single-turn or chat) and you need latency and cost visibility now.
- You’re comfortable with Docker and want to run this on your own infrastructure to keep logs out of third-party hands.
- Low integration friction matters — your team is moving fast and can’t afford a week to instrument observability properly.
Skip it (consider Braintrust or LangSmith) if:
- You’re building complex multi-step agents where the LLM call is one step in a longer pipeline that includes retrieval, tool use, and business logic. You need traces that span the whole flow, not just the model call [1].
- You need structured evaluations integrated into your CI/CD pipeline, with baseline comparisons and regression detection.
- A proxy sitting in your LLM request path is architecturally unacceptable for your reliability requirements.
Skip it (consider Portkey) if:
- You’re an enterprise team that needs mature RBAC, comprehensive audit trails, guardrails, and policy enforcement [5].
- You’re integrating 250+ models with complex failover strategies and need a battle-tested platform with enterprise support contracts.
Skip it (stay unmonitored for now) if:
- You have fewer than a few hundred LLM calls per day. The operational overhead of any observability layer isn’t worth it at that volume — just log to a file.
Alternatives worth considering
- Braintrust — SDK-based tracing (not proxy-based), so your reliability doesn’t depend on their uptime. Stronger eval infrastructure. Better choice for teams that need structured testing as part of CI [1].
- LangSmith (LangChain) — natural choice if you’re already in the LangChain ecosystem. Purpose-built for tracing chains and agent runs. Closed-source SaaS.
- Portkey — more mature enterprise feature set (RBAC, guardrails, policy enforcement), broader model ecosystem, but centralized rather than edge-deployed [5].
- Phoenix (Arize) — open-source observability with a focus on RAG and LLM evals. Good for teams that want structured evaluation workflows alongside logging.
- OpenLLMetry / Traceloop — OpenTelemetry-based LLM observability. No proxy required; instruments at the SDK level. Good for teams already running OTel infrastructure.
- Self-built logging — for simple applications with a single model provider, streaming logs to your own database with cost computed from provider pricing tables is often the right call at early scale (see the sketch below).
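For a sense of scale, the cost-computation piece of a self-built setup is genuinely small. A sketch in TypeScript; the per-million-token prices below are placeholder values, not current provider rates:

```typescript
// Placeholder per-million-token prices; substitute current provider rates.
const PRICING: Record<string, { inputPerM: number; outputPerM: number }> = {
  "gpt-4o": { inputPerM: 2.5, outputPerM: 10.0 },        // illustrative only
  "claude-sonnet": { inputPerM: 3.0, outputPerM: 15.0 }, // illustrative only
};

// Compute the dollar cost of one call from the token usage the provider
// already returns in its response (e.g. `response.usage` for OpenAI).
function callCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`no pricing entry for model: ${model}`);
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1_000_000;
}

// Example: 1,200 prompt tokens + 300 completion tokens on gpt-4o.
console.log(callCost("gpt-4o", 1200, 300)); // 0.006 with the placeholder rates
```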
Bottom line
Helicone is the fastest path from “I have no idea what my LLM app costs or how it performs” to “I have a dashboard with answers.” That problem is real, the integration is genuinely one line, and the Apache-2.0 license means you can self-host without legal review. For solo founders and small teams shipping quickly, that value is straightforward.
The trade-offs are also real: a proxy in your request path couples your reliability to Helicone’s uptime, and the observability blind spot on non-LLM application steps limits its usefulness as your agents grow in complexity. The Mintlify acquisition adds a question mark over the long-term open-source trajectory. If you need enterprise governance, sophisticated evals, or complete agent tracing, you’ll find the platform’s scope too narrow.
But if you’re an AI engineer who has been meaning to add request logging “eventually” — Helicone is the version of that task that takes 10 minutes instead of two weeks.
Sources
- [1] Braintrust — “Helicone alternative: Why Braintrust is the best pick” (October 28, 2025). https://www.braintrust.dev/articles/helicone-vs-braintrust
- [2] DEV Community (exemplar) — “AI Engineer’s Tool Review: Helicone”. https://dev.to/exemplar/ai-engineers-tool-review-helicone-55ff
- [3] Helicone Blog — “How We Simplified Helicone’s Self-Hosting in 30 Days” (May 7, 2025). https://www.helicone.ai/blog/self-hosting-journey
- [4] AI Agents List — “Helicone Review 2026 | AI Infrastructure & MLOps Tool”. https://aiagentslist.com/agents/helicone
- [5] TrueFoundry — “Helicone vs Portkey: Key Features, Pros, and Cons” (September 10, 2025). https://www.truefoundry.com/blog/helicone-vs-portkey
Primary sources:
- GitHub repository and README: https://github.com/helicone/helicone (5,279 stars, Apache-2.0 license)
- Official website: https://www.helicone.ai
- Pricing page: https://www.helicone.ai/pricing
- Documentation: https://docs.helicone.ai