unsubbed.co

Apache Solr

Apache Solr lets you run an enterprise search platform featuring full-text search entirely on your own server.

Open-source enterprise search, honestly reviewed. Built on Lucene, battle-tested at scale, and not for the faint-hearted.

TL;DR

  • What it is: Apache-2.0 licensed full-text, vector, and geospatial search platform built on Apache Lucene — the same indexing engine under Elasticsearch [README].
  • Who it’s for: Engineering teams building search into production applications — e-commerce, content portals, enterprise document search. Not a tool for non-technical founders who need search in an afternoon [1][4].
  • Cost savings: Algolia’s pay-as-you-go pricing runs around $0.50 per 1,000 search requests and climbs fast. At that rate, a site doing 10M searches/month pays roughly $5,000/mo for search requests alone. Self-hosted Solr on a $20–40/mo VPS can handle that kind of load for a modest index at infrastructure cost only [README].
  • Key strength: Truly Apache-licensed (not “fair-code,” not BSL), mature SolrCloud for distributed deployment, and a feature surface — faceted search, highlighting, spell-check, geospatial, vector — that most hosted search products charge a premium for [1][4][README].
  • Key weakness: The learning curve is real and steep. Schema definition, query tuning, and cluster management all require hands-on expertise. It’s not a tool you configure in a YAML file and call done [4][5].

What is Apache Solr

Apache Solr is an open-source search server. You index documents into it over HTTP (as JSON, XML, or CSV), and it returns ranked, faceted, highlighted results against full-text queries. It also supports geospatial queries and, since Solr 9.0, dense vector similarity search [README][website].

The project began in 2004 at CNET Networks, which donated it to the Apache Software Foundation in 2006. It predates Elasticsearch by six years. Both tools are built on top of Apache Lucene, the Java inverted-index library that is, in some sense, the actual engine for most serious search deployments in the world. What Solr and Elasticsearch both do is wrap Lucene with a REST API, distributed coordination, and operational tooling [4][5].

As of this review, Apache Solr 10.0.0 is the current stable release (shipped March 3, 2026). The GitHub repo sits at 1,592 stars — a number that dramatically understates the project’s real usage, since Solr predates GitHub’s star culture and most major deployments are corporate rather than hobbyist [README][website].

The official description from their own homepage says it best: “Solr powers the search and navigation features of many of the world’s largest internet sites” [website]. The claim holds up: Netflix, eBay, and major retail and media companies have all run Solr at scale.


Why people choose it over Elasticsearch, Algolia, and hosted alternatives

The comparison that matters for anyone reading this is Solr vs Elasticsearch and Solr vs Algolia. The SaaSHub summary [1] and the Appmus head-to-head comparisons [4][5] land in broadly the same place.

Versus Elasticsearch. Both tools sit on top of Lucene. Historically, Solr was the first serious Lucene wrapper and was dominant in enterprise search until Elasticsearch shipped in 2010 with a more developer-friendly REST API and won the startup/cloud-native crowd. Today, the capabilities are largely equivalent. The practical differences:

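  • License: Solr is Apache-2.0, full stop. In 2021 Elastic moved Elasticsearch off Apache-2.0 to a dual SSPL (Server Side Public License) / Elastic License 2.0 model. Both licenses restrict offering it as a hosted service, and that change is what prompted AWS to fork it into OpenSearch. Elastic added AGPLv3 as a third option in 2024, but none of the three is a permissive license. If you need a license you can use inside a SaaS product without legal review, Solr is cleaner [4].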
  • Operational model: Solr’s distributed mode is called SolrCloud, which uses Apache ZooKeeper for cluster coordination. Elasticsearch manages its own cluster state without an external coordinator. Many teams find Elasticsearch’s self-contained clustering simpler to operate; Solr’s ZooKeeper dependency is a genuine operational tax [4][5].
  • Query interface: Elasticsearch’s query DSL is JSON-native and has become the de facto standard for search APIs. Solr’s classic query interface is powerful but older in design: Lucene query syntax mixed with Solr-specific URL parameters. Solr’s JSON Request API and JSON Query DSL narrow the gap, but Elasticsearch still wins on developer ergonomics [5].
  • Community momentum: Elasticsearch’s community, documentation, and third-party tooling have outpaced Solr’s since roughly 2015. If you’re hiring an engineer who knows “search,” odds are they know Elasticsearch first [4][5].
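To make the query-interface difference concrete, here is a minimal Python sketch (standard library only). It expresses one query two ways: as classic URL parameters for the /select handler, and as a JSON body for Solr’s JSON Request API. The `products` collection and the `title` and `in_stock` fields are hypothetical. The code only builds the requests and does not contact a server.

```python
import json
from urllib.parse import urlencode

# Hypothetical collection on a default local install
BASE = "http://localhost:8983/solr/products"


def classic_select_url(text: str, rows: int = 10) -> str:
    """Classic style: Lucene query syntax plus Solr parameters in the URL."""
    params = {
        "q": text,               # main query, contributes to scoring
        "df": "title",           # default field for unqualified terms
        "fq": "in_stock:true",   # filter query: restricts results, not scored
        "rows": rows,
    }
    return f"{BASE}/select?{urlencode(params)}"


def json_request_body(text: str, rows: int = 10) -> str:
    """Same query as a JSON Request API body (to be POSTed to BASE + '/query')."""
    body = {
        # JSON Query DSL: query parser name as the key, its parameters as the value
        "query": {"lucene": {"df": "title", "query": text}},
        "filter": ["in_stock:true"],
        "limit": rows,
    }
    return json.dumps(body)
```

The JSON form composes better as queries grow, for example with nested `bool` clauses or facets written as JSON objects. That is the ergonomic gap described in the query-interface bullet.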

Versus OpenSearch. OpenSearch is the Apache-2.0 fork of Elasticsearch that AWS started in 2021. Since 2024 it has been governed by the OpenSearch Software Foundation under the Linux Foundation. It’s the closest direct competitor to Solr on the license dimension. OpenSearch has better AWS integration, a modern UI (OpenSearch Dashboards), and is the default choice for teams already in the AWS ecosystem. Solr has no cloud provider preference and is vendor-neutral.

Versus Algolia. Algolia is a hosted, developer-focused search API: you push documents, and it handles everything else. The trade-off is cost and data sovereignty. Algolia’s pricing is operation-based. Its Search product starts with a free tier for small usage and moves to per-request billing that becomes significant at scale. At roughly $0.50 per 1,000 search requests, a high-traffic site doing 10M+ requests/month is looking at around $5,000/mo or more. Self-hosted Solr handling the same traffic on a $30–50/mo VPS saves real money, at the cost of engineering time to set it up, tune it, and keep it running [1].

The SaaSHub analysis [1] notes: “Apache Solr is recommended for organizations that need to implement powerful search capabilities, especially those managing large, complex datasets.” That’s an accurate but incomplete description — the key missing word is engineering-capable organizations.


Features

What Solr actually ships with, based on the README and website documentation:

Core search:

  • Full-text search with Lucene’s query parser — phrase, proximity, wildcard, fuzzy, range queries [README][1]
  • Faceted search (counts by category, range, date) — one of Solr’s historically strongest features [1][4]
  • Spell checking and autocomplete (suggest component) [1][4]
  • Result highlighting with configurable snippets [4]
  • Multi-index search across collections [1]
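  • Near real-time indexing — documents become searchable once a soft commit opens a new searcher, typically within a second or so when the soft-commit interval (or a per-request commitWithin) is set that low [4]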
  • Geospatial queries (distance, bounding box, polygon) [README]
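Most of these features are switched on per request through plain parameters on the /select handler. The sketch below combines field faceting, range faceting, and highlighting in one request URL. The collection URL and the `category`, `price`, and `title` fields are hypothetical, and the price bucket boundaries are arbitrary.

```python
from urllib.parse import urlencode


def search_url(base: str, text: str) -> str:
    """Build a /select URL with faceting and highlighting enabled."""
    params = [
        ("q", text),
        ("df", "title"),               # default field for unqualified terms
        # Field facet: document counts per category value
        ("facet", "true"),
        ("facet.field", "category"),
        # Range facet: price buckets of width 100 from 0 to 500
        ("facet.range", "price"),
        ("facet.range.start", "0"),
        ("facet.range.end", "500"),
        ("facet.range.gap", "100"),
        ("facet.mincount", "1"),       # skip facet values with zero matches
        # Highlighting: snippets from the title field with matches marked up
        ("hl", "true"),
        ("hl.fl", "title"),
        ("hl.snippets", "2"),
    ]
    return f"{base}/select?{urlencode(params)}"
```

In the JSON response, facet counts appear under `facet_counts` and highlight snippets under `highlighting`, keyed by document id.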

Vector and AI search:

  • Dense vector search (approximate nearest neighbor, HNSW-backed) — added in Solr 9.0 [README]
  • Hybrid search combining full-text BM25 ranking with vector similarity [README]
  • This positions Solr for RAG pipelines and semantic search without bolt-on tooling [README]
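Vector queries go through Solr’s `knn` query parser, with the query vector written inline. The sketch below builds a top-K vector query and a hybrid variant that ORs a lexical query with a vector query via the `bool` parser. `embedding` is a hypothetical DenseVectorField whose dimension must match the vector you send, and `title` is a hypothetical text field. This additive combination is only one way to do hybrid search. BM25 and vector similarity scores live on different scales, so production setups often rerank or normalize instead.

```python
from typing import List, Tuple


def knn_query(field: str, vector: List[float], top_k: int = 10) -> str:
    """Local-params string for the knn query parser."""
    vec = "[" + ", ".join(repr(float(x)) for x in vector) + "]"
    return f"{{!knn f={field} topK={top_k}}}{vec}"


def hybrid_params(text: str, vector: List[float], top_k: int = 10) -> List[Tuple[str, str]]:
    """Match lexically OR by vector; scores of matching clauses are summed."""
    return [
        ("q", "{!bool should=$lexicalQuery should=$vectorQuery}"),
        ("lexicalQuery", f"{{!edismax qf=title}}{text}"),
        ("vectorQuery", knn_query("embedding", vector, top_k)),
        ("fl", "id,title,score"),
    ]
```

Passing `hybrid_params(...)` through `urllib.parse.urlencode` yields the query string for a /select request.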

Distributed operation (SolrCloud):

  • Distributed indexing and querying across a cluster of nodes [website][1]
  • Replication and load-balanced querying [website]
  • Automated failover and recovery [website]
  • Centralized configuration management via ZooKeeper [2][website]
  • The Solr Operator enables running SolrCloud on Kubernetes with the same operational model [README][website]
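In SolrCloud, collections are created through the Collections API rather than by hand on each node. The sketch below builds a CREATE call. The collection name, the `products_conf` configset (which would already need to be uploaded to ZooKeeper), and the shard and replica counts are assumptions for illustration.

```python
from urllib.parse import urlencode


def create_collection_url(solr: str, name: str, config: str,
                          shards: int, replicas: int) -> str:
    """Build a Collections API CREATE request URL."""
    if shards < 1 or replicas < 1:
        raise ValueError("need at least one shard and one replica")
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": shards,             # how the index is partitioned
        "replicationFactor": replicas,   # copies of each shard across the cluster
        "collection.configName": config, # configset stored in ZooKeeper
    }
    return f"{solr}/admin/collections?{urlencode(params)}"
```

With 2 shards and a replication factor of 3, SolrCloud places 2 × 3 = 6 cores across the live nodes. Failover only helps if replicas of the same shard land on different nodes, so check placement after creation.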

Data ingestion:

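  • Data Import Handler: pull data from RDBMS, LDAP, XML/RSS feeds [1]. Note that DIH was deprecated in Solr 8.6 and removed from the distribution in 9.0. It survives only as a separately maintained community package, so a 10.x deployment should plan on its own ingestion code.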
  • POST tool for JSON, XML, CSV, rich documents (PDF, Word, HTML via Apache Tika) [README]
  • REST-like JSON API for programmatic indexing [1][4]
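The sketch below shows the JSON indexing path, assuming a hypothetical `products` collection. Documents are POSTed as a JSON array to the collection’s /update handler. The `commitWithin` parameter (in milliseconds) asks Solr to make them searchable within that window, which is one common way to get near real-time visibility per update. The code only builds the request; `urllib.request.urlopen(req)` or any HTTP client would send it.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base: str, docs: List[Dict],
                         commit_within_ms: int = 1000) -> Request:
    """POST a JSON array of documents to /update with a commitWithin deadline."""
    url = f"{base}/update?{urlencode({'commitWithin': commit_within_ms})}"
    data = json.dumps(docs).encode("utf-8")
    return Request(url, data=data, method="POST",
                   headers={"Content-Type": "application/json"})
```

Batching many documents per request and leaving commits to `commitWithin` (or the autocommit settings) is generally cheaper than committing after every document.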

Admin and operations:

  • Web-based Admin UI at localhost:8983/solr/ for schema inspection, query testing, core management [README]
  • JMX metrics for integration with Prometheus/Grafana
  • bin/solr script for service lifecycle management [2]
  • Docker official image; Helm charts for Kubernetes via the Solr Operator [README][website]

Pricing: SaaS vs self-hosted math

Solr itself has no pricing — it’s Apache-2.0 open-source software with no vendor, no cloud tier, no feature gates [README]. The cost calculation is infrastructure-only.

What self-hosting Solr actually costs:

  • A single-node development/small-production instance: 2–4 vCPU, 4–8 GB RAM minimum — a $15–25/mo VPS on Hetzner, OVH, or DigitalOcean
  • A SolrCloud cluster (3 nodes minimum for HA, each with 4–8 GB RAM): $50–150/mo depending on provider and data volume
  • Your engineering time — this is where self-hosting gets expensive if you’re not already a search practitioner

Algolia for comparison (the hosted alternative for production sites):

  • Free tier: limited operations, good for prototyping
  • Grow plan: pay-as-you-go, around $0.50 per 1k search requests
  • At 10M searches/month: ~$5,000/mo in search requests alone, before record charges
  • Enterprise: pricing by negotiation

Elastic Cloud (Elasticsearch-as-a-service) for comparison:

  • Development: starts ~$16/mo (0.5 GB RAM)
  • Production-ready (8 GB RAM, 3 zones): $200–400/mo
  • Enterprise features (security, ML): higher tiers

Concrete savings math: An e-commerce site with 5M product searches/month on Algolia sits around $2,500/mo at the per-request rate above. A 3-node SolrCloud on Hetzner (3 × $25/mo servers) can handle comparable load for $75/mo. That’s $2,425/month, or roughly $29,000/year, but this math only holds if you have an engineer to run it. If you don’t, the cost of a one-time setup engagement plus basic ops documentation comes straight out of those savings.
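The arithmetic above is simple enough to parameterize. The sketch below reproduces it under stated assumptions: a flat $0.50 per 1,000 search requests with no free allowance or record charges, and the $25/node figure for a 3-node cluster. The engineering-time inputs are hypothetical knobs for your own numbers.

```python
def algolia_search_cost(searches_per_month: int, usd_per_1k: float = 0.50) -> float:
    """Per-request SaaS cost; ignores free allowances and record charges."""
    return searches_per_month / 1000 * usd_per_1k


def solr_infra_cost(nodes: int = 3, usd_per_node: float = 25.0) -> float:
    """Flat VPS cost for a small SolrCloud cluster."""
    return nodes * usd_per_node


def monthly_savings(searches_per_month: int, eng_hours: float = 0.0,
                    usd_per_eng_hour: float = 0.0) -> float:
    """SaaS bill minus servers minus the engineering time spent running them."""
    return (algolia_search_cost(searches_per_month)
            - solr_infra_cost()
            - eng_hours * usd_per_eng_hour)
```

With the figures above, `monthly_savings(5_000_000)` returns 2425.0. Adding a hypothetical 10 hours of ops work a month at $100/hour drops it to 1425.0, which is the “only holds if you have an engineer” caveat expressed in numbers.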


Deployment reality check

This is where Solr demands an honest reckoning. The official deployment guide [2] walks through installing Solr as a Linux service — and it’s a real walkthrough, not a two-command installer.

What the install actually involves [2]:

  • Download the distribution tarball
  • Extract the service installation script (install_solr_service.sh) from the tarball and run it as root
  • The script creates a solr system user, installs to /opt/solr, writes data to /var/solr
  • Configure the Java heap via SOLR_HEAP in solr.in.sh (/etc/default/solr.in.sh for a service install). Solr is a JVM application, so you need to size it correctly
  • For SolrCloud: install and configure ZooKeeper separately, or use the embedded ZooKeeper for single-node testing only
  • Set up a reverse proxy (nginx/Caddy) for HTTPS access
  • Create and configure your first collection with a schema (or use schemaless mode with trade-offs)
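The heap and ZooKeeper steps both end up as variables in solr.in.sh. The sketch below renders that fragment. `SOLR_HEAP` and `ZK_HOST` are real solr.in.sh variables. The hostnames, the `/solr` chroot, and the 4g heap are placeholders, not sizing or topology recommendations.

```python
from typing import Optional, Sequence


def solr_in_sh(heap: str, zk_hosts: Optional[Sequence[str]] = None,
               chroot: str = "/solr") -> str:
    """Render a solr.in.sh fragment; all values here are placeholders."""
    lines = [f'SOLR_HEAP="{heap}"']  # bin/solr uses this for both -Xms and -Xmx
    if zk_hosts:
        # SolrCloud: comma-separated ensemble, chroot appended once at the end
        ensemble = ",".join(zk_hosts)
        lines.append(f'ZK_HOST="{ensemble}{chroot}"')
    return "\n".join(lines) + "\n"
```

A single-node install needs only the heap line. Passing the three ensemble members adds the `ZK_HOST` line that switches the node into SolrCloud mode against an external ZooKeeper.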

What can go sideways:

  • Schema pain: Solr in schemaless mode is forgiving to start but produces poorly typed fields that degrade relevance. Proper schema design — field types, tokenizers, analyzers — requires knowing what you’re doing [4][5].
  • ZooKeeper dependency: SolrCloud requires a ZooKeeper ensemble (typically 3 nodes). That’s three more processes to run, monitor, and keep alive. Elasticsearch has never needed an external coordinator (its current built-in cluster coordination dates from 7.0); Solr still does [4].
  • Java heap tuning: Solr’s performance degrades sharply if the JVM heap is under-sized or misconfigured. Out-of-the-box defaults are not production-ready [2].
  • Query relevance tuning: Getting search results to feel good — not just technically correct — requires iterating on field boosts, query parsers, and sometimes custom Lucene analyzers. This is work [4][5].
  • Security is not default-on: Solr does not enable authentication out of the box. Since 9.0 it listens only on localhost by default, which limits accidental exposure. Authentication (for example the Basic Authentication plugin), TLS, and network isolation all still require deliberate configuration [2].
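The “typically 3 nodes” figure in the ZooKeeper bullet follows from ZooKeeper’s majority-quorum rule: an ensemble of n servers stays available only while a strict majority is up. The sketch below shows that arithmetic. It also explains why even-sized ensembles add cost without adding fault tolerance.

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of an n-server ZooKeeper ensemble."""
    if n < 1:
        raise ValueError("an ensemble needs at least one server")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers can fail while a majority remains."""
    return n - quorum_size(n)
```

`tolerated_failures(3)` and `tolerated_failures(4)` both return 1, while 5 servers tolerate 2. A fourth ZooKeeper node is one more process to run with no resilience gain.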

Realistic time estimates: A technical developer who knows Docker and Linux: 2–4 hours to a single-node instance answering queries. A production SolrCloud cluster with proper schema, security, and monitoring: 1–2 weeks including testing. For a non-technical founder with no search background: this is not a weekend project. Hire someone.


Pros and Cons

Pros

  • Genuinely Apache-2.0 licensed. No BSL, no SSPL, no “fair-code” asterisks. You can embed it in your SaaS, run it in a commercial product, or build a managed hosting business on it — no vendor conversation required [README][1].
  • Multi-modal search in one package. Full-text + vector + geospatial from a single deployment. Competitors often require bolt-on services or separate tiers for vector search [README].
  • SolrCloud is production-proven. Distributed indexing, replication, failover, and load-balanced querying have been running at enterprise scale for over a decade [website][1].
  • Rich feature set out of the box. Faceted search, highlighting, spell-check, and auto-suggest: features Algolia charges for, Solr ships with [1][4].
  • Strong documentation. The Solr Reference Guide is comprehensive and well-maintained through major versions [2][1].
  • Kubernetes-native with Solr Operator. Official Helm-based operator for running SolrCloud on Kubernetes — not an afterthought, an officially supported deployment path [README][website].
  • No per-query cost. Unlike hosted search APIs, there’s no meter running. Bulk re-indexing, high-volume autocomplete, analytics queries — all zero marginal cost [README].

Cons

  • Steep learning curve. Schema definition, query tuning, ZooKeeper management, JVM sizing — these aren’t optional. They determine whether your search works well or just technically works [4][5].
  • ZooKeeper dependency for clustering. SolrCloud requires a separate ZooKeeper ensemble. That’s more operational surface area than Elasticsearch’s self-managed cluster state [4].
  • Lost community momentum. Elasticsearch dominates the search job market and tooling ecosystem. Most monitoring integrations, tutorials, and StackOverflow answers target Elasticsearch first. Solr is second [4][5].
  • Older query interface. Solr’s URL-based query parameter syntax is less ergonomic than Elasticsearch’s JSON DSL for complex query composition. It works, but it shows its age [5].
  • Resource-intensive. JVM-based, requires tuning for production performance. A small Solr instance that “works” in development may fall over under production load without proper heap and cache configuration [4][2].
  • Debugging is genuinely hard. Complex relevance issues, performance bottlenecks, and SolrCloud split-brain scenarios are difficult to diagnose without search engineering experience [4][5].
  • No first-party Solr-as-a-service. The Apache project runs no cloud offering. Lucidworks Fusion, built on Solr, is the enterprise option [3], and it is expensive and aimed at large organizations. Third-party managed Solr hosts (SearchStax, for example) exist but form a much thinner market than Elastic Cloud or Algolia. Most teams either self-host or go with an alternative.

Who should use this / who shouldn’t

Use Apache Solr if:

  • You’re an engineering team building search into a product that will do significant query volume, and the Algolia/Elastic Cloud bill is becoming a line item you can justify eliminating.
  • You need Apache-2.0 licensing specifically — building a SaaS product, reselling a solution, or operating under legal constraints that make SSPL/BSL problematic.
  • You’re already in a Java-heavy stack and Solr’s JVM-native deployment fits your operational tooling.
  • You need SolrCloud’s battle-tested distributed search for large document collections and you have engineers who know how to run it.
  • You want multi-modal search (text + vectors + geospatial) without paying a vendor per-feature.

Skip it (pick OpenSearch or Elasticsearch) if:

  • You want the same Lucene-based power but better developer ergonomics, a modern JSON query DSL, and a larger community of documentation and tooling.
  • You’re already in the AWS ecosystem — OpenSearch integrates natively with IAM, CloudWatch, and other AWS services.
  • Your team knows Elasticsearch from prior work. The switching cost doesn’t justify it unless the license matters to you.

Skip it (pick Typesense or Meilisearch) if:

  • You need search that a non-technical person can configure, tune, and maintain.
  • You’re a smaller team building a product with a standard search-as-you-type use case and you don’t need faceting, geospatial, or vector search.
  • You want fast setup, sensible defaults, and relevance that works without a week of tuning.

Skip it (stay on Algolia) if:

  • Your search volume is low enough that Algolia’s free or entry-tier pricing covers you.
  • You don’t have engineering time to operate infrastructure, and the SaaS cost is cheaper than the engineering cost of self-hosting.
  • Your team needs managed uptime, compliance, and zero operational overhead.

Alternatives worth considering

  • Elasticsearch — the most direct competitor. More developer-friendly API, larger community, source-available SSPL/Elastic License 2.0 terms with AGPLv3 as an option since 2024 (check whether that matters for your use case). The obvious first comparison [4][5].
  • OpenSearch — Elasticsearch fork, Apache-2.0, started by AWS and now governed by the OpenSearch Software Foundation (Linux Foundation). Better choice than Elasticsearch if you care about licensing and are in AWS. Roughly equivalent feature set.
  • Typesense — simpler, faster to deploy, better defaults for typical application search. Worse at complex enterprise scenarios but excellent for product search, documentation search, and autocomplete.
  • Meilisearch — developer-friendly, fast, excellent defaults. MIT-licensed core. Better for smaller datasets and teams new to search infrastructure.
  • Algolia — hosted, zero ops, per-operation pricing. The right answer when engineering time is the scarce resource and search volume is moderate.
  • Manticore Search — MySQL-compatible alternative for teams coming from a relational background.
  • ZincSearch (formerly Zinc) — lightweight Elasticsearch-compatible alternative with much lower resource requirements.

For a non-technical founder, the realistic shortlist is Algolia vs Meilisearch vs Typesense — Solr doesn’t belong on that list. For an engineering team with scale requirements and a license constraint, the shortlist is Solr vs OpenSearch.


Bottom line

Apache Solr is what enterprise search looked like before search became a SaaS product, and that’s both its strength and its limitation. It’s genuinely capable, genuinely free (Apache-2.0, no catches), and battle-tested at traffic levels that would bankrupt most Algolia budgets. But it demands real engineering investment: schema design, JVM tuning, ZooKeeper operations, and query relevance work that doesn’t happen automatically. The teams that should seriously evaluate Solr are those building search as a product feature at scale. They need engineers who either know search infrastructure or are willing to learn it, and a license requirement that rules out Elasticsearch’s non-permissive terms. For everyone else, especially non-technical teams evaluating self-hosted options to cut a SaaS bill, Meilisearch or Typesense will get you 90% of the results with 10% of the operational complexity. Solr earns its place in serious production deployments. It just doesn’t earn a spot in the “set it up on a Sunday afternoon” category.


Sources

  1. SaaSHub — Apache Solr Reviews and Details. https://www.saashub.com/apache-solr
  2. Apache Solr Reference Guide — Taking Solr to Production (v10.0). https://solr.apache.org/guide/solr/latest/deployment-guide/taking-solr-to-production.html
  3. Lucidworks — Apache Solr Support Policy (updated January 2024). https://doc.lucidworks.com/docs/policies/apache-solr-support-policy
  4. Appmus — Apache Solr vs Elasticsearch Comparison (2026). https://appmus.com/vs/apache-solr-vs-elasticsearch
  5. Appmus — Elasticsearch vs Apache Solr Comparison (2026). https://appmus.com/vs/elasticsearch-vs-apache-solr

Primary sources: