ArchiveBox
Self-hosted web archiving tool that saves pages as HTML, PDF, screenshots, and WARC files from bookmarks, history, or RSS feeds.
Overview
Open source, self-hosted web archiving. ArchiveBox takes URLs, browser history, bookmarks, Pocket, Pinboard, and similar sources, and saves HTML, JS, PDFs, media, and more. The project has 27K+ GitHub stars and is licensed under MIT.
Key Features
Source: GitHub README
- Free & open source, own your own data & maintain your privacy by self-hosting
- Powerful CLI with modular dependencies and support for Google Drive/NFS/SMB/S3/B2/etc.
- Comprehensive documentation, active development, and rich community
- Extracts a wide variety of content out-of-the-box: media (yt-dlp), articles (readability), code (git), etc.
- Supports scheduled/realtime importing from many types of sources
- Uses standard, durable, long-term formats like HTML, JSON, PDF, PNG, MP4, TXT, and WARC
- Saves all pages to archive.org as well by default for redundancy (can be disabled for local-only mode)
- Advanced users: support for archiving content requiring login/paywall/cookies (see wiki security caveats!)
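Two of the features above (scheduled importing and local-only mode) map onto CLI commands shown in the ArchiveBox README. The following is a minimal sketch, assuming an already initialized data directory; the function names `archivebox_local_only` and `archivebox_schedule_feed` and the feed URL are hypothetical, and option names may differ between ArchiveBox versions.

```shell
#!/bin/sh
# Sketch only: wraps two ArchiveBox CLI commands in functions so that nothing
# runs until you call them from inside an initialized data directory.
# The function names are illustrative and not part of ArchiveBox.

# Turn off submission to archive.org (local-only mode).
archivebox_local_only() {
    archivebox config --set SAVE_ARCHIVE_DOT_ORG=False
}

# Re-import a feed on a schedule.
# $1 = feed URL (placeholder), $2 = interval, defaulting to "day".
archivebox_schedule_feed() {
    feed_url="$1"
    every="${2:-day}"
    archivebox schedule --every="$every" --depth=1 "$feed_url"
}
```

For example, `archivebox_schedule_feed 'https://example.com/feed.rss' hour` would ask ArchiveBox to re-import that placeholder feed every hour.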
Getting Started
Source: GitHub README
pip install archivebox
mkdir -p ~/archivebox/data && cd ~/archivebox/data
archivebox init --install
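Once the data directory is initialized, a typical next step is to archive a first URL and open the web UI. This is a hedged sketch, not the project's official quickstart: the function names `archivebox_add_url` and `archivebox_serve` are hypothetical, the URL is a placeholder, and the bind address `0.0.0.0:8000` is an assumed default chosen for illustration.

```shell
#!/bin/sh
# Sketch: first steps after `archivebox init`, to be run from inside the
# data directory. Function names are illustrative, not part of ArchiveBox.

# Archive a single URL (pass any URL as $1).
archivebox_add_url() {
    archivebox add "$1"
}

# Start the web UI; $1 is an optional bind address (assumed default below).
archivebox_serve() {
    archivebox server "${1:-0.0.0.0:8000}"
}
```

Calling `archivebox_add_url 'https://example.com'` followed by `archivebox_serve` would snapshot the page and then serve the archive index on port 8000.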
Normalized Features
Source: tool-features-normalized.json
apt, brew, desktop app, docker, docker compose, ldap, npm, pip, plugins, portainer, rest api, sqlite, sso, unraid, webhooks, yunohost.
Features
Authentication & Access
- LDAP / Active Directory
- Single Sign-On (SSO)
Integrations & APIs
- Plugin / Extension System
- REST API
- Webhooks
Mobile & Desktop
- Desktop App
Related Archiving & Preservation Tools
CKAN (5K)
CKAN is a self-hosted archiving & preservation replacement for Socrata.
Wayback (2.2K)
For archiving & preservation, Wayback is a self-hosted solution that provides a toolkit for archiving webpages to the Internet Archive, archive.today, IPFS,...
Open Archiver (1.8K)
Open Archiver lets you run an email archiving solution with full-text search and eDiscovery search features entirely on your own server.
mail-archiver (1.7K)
Mail-archiver is a C#-based application that provides a web application for archiving.
Bichon (1.5K)
Bichon lets you run a lightweight e-mail archiver entirely on your own server.
Ganymede (928)
Ganymede is a Go-based application that provides a Twitch VOD and live stream archiving platform. Includes a rendered chat for each archive.