unsubbed.co

txtdot

txtdot lets you run an HTTP proxy that parses only the text of a page, entirely on your own server.

Open-source web reader proxy, honestly reviewed. No marketing fluff, just what you get when you self-host it.

TL;DR

  • What it is: An MIT-licensed HTTP proxy that strips any web page down to text, links, and images — no ads, no trackers, no heavy JavaScript — and serves it back as a clean, minimal HTML page [README].
  • Who it’s for: Developers and privacy-conscious users on slow or metered connections who want a server-side “reader mode” they control. Not for non-technical founders looking for a SaaS replacement — this is a utility tool, not a platform.
  • Cost savings: No SaaS to replace. The closest commercial analog is an Instapaper or Pocket Premium subscription (roughly $36–$60/yr at the monthly rates listed under Pricing). Self-hosted txtdot runs on any $5/mo VPS. The real savings are bandwidth: performance tests show Mobile PageSpeed jumping from 21% to 100% on proxied pages [README].
  • Key strength: Server-side processing means zero client JavaScript, which matters when you’re on 2G or behind a corporate firewall. The extensible plugin architecture (engines + middlewares) means you can write domain-specific parsers for sites where generic Readability fails [docs].
  • Key weakness: V1 development is formally discontinued. The project is rebuilding on a V2 branch with no stated release timeline. Only 206 GitHub stars. No active community or documented case studies. You’re self-hosting beta-adjacent software [README].

What is txtdot

txtdot is a server-side reader proxy. You give it a URL. It fetches the page, runs Mozilla’s Readability library against the HTML, strips everything except the useful content — text, links, images, tables — and returns a minimal HTML page with no scripts and no tracking [docs].

The architecture is deliberately straightforward: a Fastify web server receives requests, fetches the target page (with Axios), applies an engine (Readability by default, or a domain-specific parser if one exists), optionally runs middlewares (syntax highlighting, image compression via Sharp), and renders the result with EJS templates or returns raw JSON via /api/parse [docs][README].
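The request flow described above can be modeled as a chain of plain functions. This is an illustrative Python sketch, not txtdot's code (the project itself runs on Node.js). Every name in it (Page, handle, toy_engine, strip_blank_lines, fake_fetch) is hypothetical, and the fetch step is stubbed so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Page:
    """Simplified result of processing one URL (hypothetical shape)."""
    title: str
    content: str


# An engine turns raw HTML into a Page; a middleware post-processes a Page.
Engine = Callable[[str], Page]
Middleware = Callable[[Page], Page]


def handle(url: str,
           fetch: Callable[[str], str],
           engine: Engine,
           middlewares: Iterable[Middleware] = ()) -> Page:
    """Run the stages in order: fetch, then engine, then each middleware."""
    raw_html = fetch(url)
    page = engine(raw_html)
    for middleware in middlewares:
        page = middleware(page)
    return page


def toy_engine(raw_html: str) -> Page:
    """Stand-in for Readability: the first line is the title, the rest is the body."""
    title, _, body = raw_html.partition("\n")
    return Page(title=title.strip(), content=body)


def strip_blank_lines(page: Page) -> Page:
    """Example middleware: drop empty lines from the content."""
    lines = [line for line in page.content.splitlines() if line.strip()]
    return Page(title=page.title, content="\n".join(lines))


def fake_fetch(url: str) -> str:
    """Offline stub standing in for the HTTP fetch step."""
    return "Hello\n\nfirst paragraph\n\nsecond paragraph\n"


result = handle("https://example.com/post", fake_fetch, toy_engine, [strip_blank_lines])
```

Keeping engines and middlewares as separate callables mirrors the separation the documentation describes: the engine decides what the content is, and middlewares only transform content that has already been extracted.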

What separates txtdot from a browser extension like Firefox Reader Mode is that the processing happens server-side. Your end device — phone on 3G, Raspberry Pi, old laptop — downloads a stripped HTML file rather than a full page with megabytes of tracking scripts and ad bundles. The official instance (txt.dc09.ru) demonstrates this: the project’s own benchmarks show a Medium article jumping from 44% PageSpeed on desktop and 36% on mobile to 100% in both cases after proxying [README].

The project lives under the TempoWorks GitHub organization, which appears to be a small independent team. The codebase is organized as a monorepo with three packages: the main server, @txtdot/sdk for writing plugins, and @txtdot/plugins for the bundled engine/middleware collection [README].

Important caveat before you commit to it: the README leads with a prominent warning — V1 development is discontinued and active work has moved to a V2 branch. What you’re deploying from main is a frozen, no-longer-actively-maintained codebase. V2 is under development but has no documented release date [README].


Why people choose it

Honest answer: there aren’t independent reviews of txtdot to synthesize. It has 206 stars and doesn’t appear in the review press. What exists is the project’s own documentation and the community of people who self-host similar tools.

The use case is a niche one: you want server-side reader mode rather than a browser extension. Why server-side?

  • Works on any client: dumb TV browser, terminal w3m, old phone without extension support, headless curl scripts — if it can make HTTP requests, it gets clean pages.
  • Bandwidth savings are real: the performance tests in the README aren’t marketing copy, they’re screenshots from PageSpeed Insights on Habr, Medium, and Nginx Blog. Even the weakest desktop outcome in the table still goes from 56% to 99%, and mobile results improve from 21–36% to 100% every time [README].
  • No JavaScript required on the client: this matters for security-conscious deployments, IoT devices, and anyone who runs a locked-down browser profile.
  • Integrated search: the SearXNG integration means you can search the web and read results all through the same proxy interface without touching a JavaScript-heavy search frontend [README].
  • API-first: /api/parse returns structured JSON (title, lang, content), which means you can wire txtdot as a preprocessing step in any pipeline — feed it a URL, get clean article text, pass to an LLM, summarize [README][docs].
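As a rough illustration of the API-first point, the sketch below builds a request for /api/parse and keeps the three fields named above. Two things are assumptions rather than facts from the sources: that the target page is passed as a `url` query parameter, and that the response is a flat JSON object with title, lang, and content keys. Check both against your own instance before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def parse_endpoint(base_url: str, target_url: str) -> str:
    """Build the /api/parse request URL.

    ASSUMPTION: the target page is passed as a `url` query parameter.
    """
    return f"{base_url.rstrip('/')}/api/parse?{urlencode({'url': target_url})}"


def extract_fields(payload: str) -> dict:
    """Keep the three fields the review mentions; missing keys become ''.

    ASSUMPTION: the response is a flat JSON object.
    """
    data = json.loads(payload)
    return {key: data.get(key, "") for key in ("title", "lang", "content")}


def fetch_article(base_url: str, target_url: str, timeout: float = 10.0) -> dict:
    """Call a running instance (needs network access) and return the parsed fields."""
    with urlopen(parse_endpoint(base_url, target_url), timeout=timeout) as resp:
        return extract_fields(resp.read().decode("utf-8"))
```

With an instance running, a call such as fetch_article("http://localhost:8080", "https://example.com/post") would return the dictionary to pass downstream. The host and port here are placeholders, not documented defaults.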

The awesome-selfhosted community [3] lists dozens of tools in the same general space (readers, proxies, archivers), which signals active demand for this category even if txtdot itself is a niche player within it.


Features

Based on the README and documentation:

Core proxy:

  • Server-side page simplification via Mozilla’s Readability.js [README]
  • Returns clean HTML (browser-renderable) or structured JSON via /api/parse [docs]
  • Pure HTML response via /api/raw-html for downstream processing [docs]
  • No client JavaScript — the result page is static HTML [README]
  • Material Design 3 UI (minimal, text-optimized) [README]

Media handling:

  • Image proxy — images are served through txtdot rather than directly from origin [README]
  • Image compression with Sharp — reduces image payload size [README]
  • Media proxy also prevents trackers embedded in image URLs from firing [docs]

JavaScript rendering:

  • Client-side app support via webder — can render Vanilla JS, React, Vue apps before extraction [README]
  • This matters for paywalled or JavaScript-gated content that Readability alone can’t handle

Search:

  • SearXNG integration for search within the txtdot interface [README]

Plugin system:

  • @txtdot/sdk for writing custom engines (domain-specific parsers) [README][docs]
  • @txtdot/plugins for bundled engines and middlewares [README]
  • Engine registration by domain pattern — e.g., all StackOverflow URLs use a custom engine instead of Readability [docs]
  • Middleware support with JSX (code highlighting, transforms, output post-processing) [docs]
  • Hot-reload support for local plugin development [README]
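Engine registration by domain pattern can be pictured as a lookup table keyed on the host name. The sketch below is a generic Python illustration, not the @txtdot/sdk API: the registry, the glob-style patterns, and the engine names are all hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical registry of (host pattern, engine name). The first match wins.
ENGINE_PATTERNS = [
    ("stackoverflow.com", "stackoverflow"),
    ("*.stackoverflow.com", "stackoverflow"),
]
DEFAULT_ENGINE = "readability"


def pick_engine(url: str) -> str:
    """Return the name of the first engine whose pattern matches the URL's host."""
    host = (urlparse(url).hostname or "").lower()
    for pattern, engine in ENGINE_PATTERNS:
        if fnmatchcase(host, pattern):
            return engine
    return DEFAULT_ENGINE
```

Listing the bare domain and the wildcard subdomain separately keeps look-alike hosts such as notstackoverflow.com on the default engine.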

Deployment:

  • Docker + Docker Compose [README]
  • npm-based production build (npm run build && npm run start) [README]
  • Documented reverse proxy setup [docs]
  • Environment variable configuration [docs]

What’s missing:

  • No authentication or user accounts — it’s a public proxy by design
  • No read-later / bookmarking — it’s a real-time proxy, not an archiver
  • No browser extension — you manually paste URLs or build a redirect bookmarklet
  • Official instance is rate-limited to 2 requests/second [docs]
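If you script against the official instance, the 2 requests/second limit is easiest to respect on the client side. The following is a minimal throttle sketch in Python; the class and its parameters are illustrative, and how the server actually enforces its limit (per IP, bursts allowed or not) is not something the sources spell out.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space calls so that at most `rate` of them start per second."""

    def __init__(self, rate: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        # Earliest time at which the next call may start (None before the first call).
        self.next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
```

Create one Throttle(2) per script and call wait() immediately before each request. The clock and sleep parameters exist only so the behavior can be checked without real delays.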

Pricing: SaaS vs self-hosted math

txtdot has no SaaS offering and no paid tier. The official instance at txt.dc09.ru is free but rate-limited.

Self-hosted:

  • Software: $0 (MIT license) [README]
  • VPS: $5–10/mo (any Linux server with Docker)
  • Domain (optional): $10–15/yr

What are you replacing?

If you’re using txtdot as a read-later replacement with a reading interface, the comparison is:

  • Pocket Premium: $4.99/mo ($60/yr)
  • Instapaper Premium: $2.99/mo ($36/yr)
  • Wallabag.it hosted: €9/yr

Self-hosted txtdot on a $5 VPS: $60/yr, which is not cheaper if you’re renting the VPS just for this. But if you’re already running a homelab or VPS with other services, the marginal cost is near zero.
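The annual figures above are the monthly prices multiplied by twelve and rounded to the nearest dollar. A short Python check of that arithmetic, using only the USD prices quoted in this section (Wallabag.it is left out because it is priced in euros):

```python
MONTHS_PER_YEAR = 12

# Monthly prices as quoted in this review (USD).
monthly_usd = {
    "Pocket Premium": 4.99,
    "Instapaper Premium": 2.99,
    "txtdot on a $5 VPS": 5.00,
}

annual_usd = {
    name: round(price * MONTHS_PER_YEAR, 2)
    for name, price in monthly_usd.items()
}

# Print the options from cheapest to most expensive per year.
for name, cost in sorted(annual_usd.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.2f}/yr")
```

On these numbers a VPS rented only for txtdot costs about the same as Pocket Premium and more than Instapaper Premium, which is why the marginal-cost argument matters more than the headline price.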

The more honest value proposition isn’t saving money against Pocket — it’s eliminating the dependency on a read-later service entirely, and gaining a scriptable API proxy that Pocket doesn’t offer.

If you’re using txtdot for bandwidth reduction (the stated primary use case), the comparison is your mobile data plan cost versus a small VPS. On a 2GB/month data cap with heavy article reading, proxying through txtdot could meaningfully extend how long that data lasts.
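To make the data-cap argument concrete, the sketch below counts how many pages fit into a 2 GB cap at two page weights. Both page weights are hypothetical placeholders chosen for illustration. They are not measurements from the README or from any test, so substitute sizes you observe on the sites you actually read.

```python
def pages_per_cap(cap_mb: float, page_mb: float) -> int:
    """Return how many whole pages of a given size fit into a data cap."""
    if page_mb <= 0:
        raise ValueError("page_mb must be positive")
    return int(cap_mb // page_mb)


CAP_MB = 2 * 1024        # the 2 GB/month cap from the text
FULL_PAGE_MB = 2.5       # HYPOTHETICAL weight of an unproxied article page
PROXIED_PAGE_MB = 0.25   # HYPOTHETICAL weight of the same page via txtdot

direct = pages_per_cap(CAP_MB, FULL_PAGE_MB)
proxied = pages_per_cap(CAP_MB, PROXIED_PAGE_MB)
```

The ratio between the two counts is simply the ratio of the page weights, so the benefit scales with how heavy the original pages are.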


Deployment reality check

The documentation is organized and complete enough for a technical user. Four documented paths: npm dev, npm production build, Docker Compose, and reverse proxy setup [docs].

Docker Compose is the recommended path:

docker compose up -d

The Docker Compose setup is the fastest route to a running instance: no dependency management, no Node.js version conflicts.
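Once the container is up, a quick smoke test confirms the instance is answering HTTP requests. This is a generic Python sketch and not part of txtdot. The base URL you pass in depends on your Compose file and port mapping, which the sources do not specify.

```python
from urllib.error import URLError
from urllib.request import urlopen


def is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the instance answers the base URL with a 2xx status."""
    try:
        with urlopen(base_url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses.
        return False
```

For example, is_up("http://localhost:8080/") would be the call if your instance is published on port 8080. That port is a placeholder, not a documented default.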

What you actually need:

  • A Linux server or VPS (512MB RAM is probably fine for low traffic)
  • Docker with the Compose plugin (the docker compose command used above)
  • Optionally: a domain and reverse proxy for HTTPS

What can go sideways:

The V1 freeze is the largest risk. The README states plainly: “V1 version development is discontinued. Check current development on v2 branch.” This means:

  • Security vulnerabilities discovered in V1 dependencies won’t get patches
  • The codebase uses older package versions (Axios, Fastify, etc.) that may have known CVEs by the time you deploy it
  • Feature requests and bug reports on V1 are dead ends

The webder dependency (for JavaScript rendering) is a separate project from the same organization — if webder has issues, JavaScript-heavy sites won’t proxy correctly, and there’s no upstream support for V1.

Realistic estimate for a technical user: 15–30 minutes to a running Docker instance. If you want HTTPS with a reverse proxy and your own domain, add another 30 minutes. No database, no external services required.

For a non-technical founder: this probably isn’t the right tool — it’s a developer utility, and there’s no managed hosting option or setup wizard.


Pros and Cons

Pros

  • MIT license — use it, fork it, embed it in your product, no commercial restrictions [README].
  • Genuinely useful performance gains. The PageSpeed screenshots in the README aren’t cherry-picked: every tested site improves dramatically on mobile, where slow connections make the difference most [README].
  • Plugin architecture is well-designed. The engine + middleware separation is clean. Writing a custom domain parser means you can handle sites that Readability butchers (paywalled content, dynamic JavaScript apps via webder) [docs].
  • API-first design. /api/parse returning JSON makes txtdot composable — it’s not just a browser tool but a preprocessing pipeline step for any content extraction workflow [docs].
  • No client JavaScript. The output works on dumb clients, in curl, in w3m, on IoT devices [README].
  • Integrated SearXNG search means the whole privacy-first search-and-read workflow lives in one proxy [README].
  • Image compression via Sharp reduces payload further than just stripping scripts [README].

Cons

  • V1 is discontinued. This is the most important fact about txtdot today. You’re deploying software the maintainers no longer patch [README].
  • 206 stars. Small project, thin community, few documented real-world deployments. If something breaks, you’re debugging alone.
  • No authentication. Deploy this on a public IP without a firewall and anyone can use your proxy. Not a flaw for a personal homelab setup, but worth knowing.
  • No archiving. Txtdot is a real-time proxy, not a read-later archive. Close the tab and the reading session is gone. Wallabag or Shiori solve a different problem.
  • Official instance is rate-limited to 2 req/second [docs] — fine for personal use, painful if you’re trying to script batch processing.
  • JavaScript rendering requires webder as a separate service — not bundled, not documented exhaustively, adds operational complexity if you need it [README].
  • No extension or bookmarklet documented — you manually copy-paste URLs into the txtdot interface, which creates friction compared to a browser extension approach.
  • V2 timeline is unknown. Development is happening on the v2 branch but no release date is documented anywhere in the available sources.

Who should use this / who shouldn’t

Use txtdot if:

  • You’re a developer or sysadmin who wants a scriptable, API-accessible text extraction proxy you can compose into other tools.
  • You’re on a slow or metered connection and want a proxy that provably reduces page payload — the benchmarks back this up [README].
  • You’re building a content pipeline (scraper → clean text → LLM → output) and want an extraction layer built on the battle-tested Readability library that you host yourself instead of calling an external API.
  • You already run a VPS and the marginal cost is zero.
  • You understand that V1 is frozen and you’re willing to pin dependencies and accept the maintenance burden.
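For the content-pipeline case, the step between txtdot and an LLM is usually turning the cleaned HTML into plain text. The sketch below does that with Python's standard library. It assumes the content you receive is an HTML fragment (an assumption about the response, not something the sources state), and the class and function names are illustrative.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, inserting line breaks at block-level tags."""

    BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Start a new line whenever a block-level element opens.
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Convert an HTML fragment to plain text with one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each line, then drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

This does no special handling of script or style elements. That is acceptable here only because the proxied output is described as script-free; raw pages would need extra filtering.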

Skip it if:

  • You want a read-later service with bookmarks, tags, and sync across devices — use Wallabag instead.
  • You want active maintenance and security patching — wait for V2 or choose a more active project.
  • You’re non-technical and need a UI-first product — this isn’t that.
  • You want a browser extension that works with one click — install Firefox Reader Mode or uBlock Origin for the ad-blocking part.
  • You need it to reliably handle heavy JavaScript SPAs — the webder dependency for that use case adds complexity and is not well-documented for self-hosters.

Alternatives worth considering

  • Wallabag — the self-hosted read-later application. Adds bookmarking, tagging, offline sync, browser extensions. More feature-complete but heavier (PHP + database). Use if you want archiving, not just a real-time proxy.
  • Shiori — another self-hosted read-later tool, simpler than Wallabag, written in Go. Better maintained and more actively developed.
  • Outline — similar server-side reader proxy concept, also open-source. Fewer stars than Wallabag but more aligned with txtdot’s “proxy mode” use case.
  • Firefox Reader Mode / Safari Reader — free, zero maintenance, works on any page Reader Mode supports. Choose this if you only need personal browser reading, not a networked API.
  • Mercury Parser (Postlight) — open-source text extraction library, similar to Readability. You’d build the server yourself, but the extraction quality is comparable.
  • Diffbot — commercial API for structured content extraction. More powerful than Readability for complex sites, but $299+/mo. Only relevant if you’re processing at scale for a product.
  • 12ft.io / archive.ph — hosted services that do similar things. Zero maintenance, but you hand your URLs to a third party and have no control.

For the specific use case txtdot targets — self-hosted, bandwidth-conscious, API-accessible, no-JavaScript reading proxy — the realistic comparison is Wallabag (if you want archiving) or just running Readability directly in your own service (if you want control). Txtdot’s plugin architecture is genuinely better designed than rolling your own, which is its strongest argument against the DIY route.


Bottom line

txtdot does one thing and does it well: it proxies web pages through Readability server-side and returns clean, scriptable output. The performance numbers are real, the MIT license is clean, and the plugin architecture is more thoughtful than a project this small has any right to be. But the V1 freeze is a hard blocker for anyone who cares about long-term maintenance — this is software that its own authors consider superseded. If you deploy it today, you’re betting on V2 arriving before your dependencies rot. For a homelab utility where you’re comfortable pinning and auditing dependencies, that’s an acceptable bet. For anything production-facing or user-exposed, wait for V2 or pick Wallabag for archiving and Firefox Reader Mode for browser reading. The good news: if V2 ships and the plugin ecosystem carries over, this could be a genuinely useful tool in the self-hosted content pipeline space.


Sources

  1. txtdot README — TempoWorks GitHub. https://github.com/tempoworks/txtdot
  2. txtdot Documentation — Getting Started, Engines, Middlewares. https://tempoworks.github.io/documentation
  3. Awesome-Selfhosted — Docker platform listing. https://awesome-selfhosted.net/platforms/docker.html
