Automated Browser-Based Crawling at Scale

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder, designed to make web archiving easier and more accessible for all!

A screenshot of Browsertrix's Watch Crawl page with two crawl windows visible. The app is crawling Webrecorder's website.

Browser-Based Crawling

Define the scope of your crawl workflow with an extensive array of options. Watch crawls as they run in real time to diagnose issues and ensure you are capturing exactly the pages and content you want.

Signed, Sealed, Authenticated

Crawl outputs are digitally signed to ensure a provable chain of custody.

Always On Schedule

Schedule workflows to run crawls on a recurring basis and automatically collect snapshots of a website.

Browser Profiles

Sign into websites and archive them exactly as they appear when logged in.

Live Exclusion Editing

Stop runaway crawls from getting bogged down in crawler traps without restarting the entire crawl.
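To illustrate how pattern-based exclusions can cut off a crawler trap, the sketch below assumes that exclusions are regular expressions matched anywhere in a URL, and that adding one mid-crawl drops any matching URLs still waiting in the queue. Browsertrix's actual matching rules may differ. The patterns, URLs, and the `is_excluded` and `filter_queue` helpers are all hypothetical.

```python
import re

# Hypothetical exclusion patterns, for illustration only.
EXCLUSIONS = [
    r"/calendar/\d{4}/\d{2}",  # an endless calendar, a classic crawler trap
    r"[?&]sessionid=",         # session IDs that make every URL look new
]


def is_excluded(url: str, patterns: list[str]) -> bool:
    """Return True if any exclusion pattern matches anywhere in the URL."""
    return any(re.search(p, url) for p in patterns)


def filter_queue(queue: list[str], patterns: list[str]) -> list[str]:
    """Drop already-queued URLs that match a (possibly newly added) exclusion."""
    return [url for url in queue if not is_excluded(url, patterns)]


queue = [
    "https://example.com/about",
    "https://example.com/calendar/2024/05",
    "https://example.com/news?sessionid=abc123",
]
print(filter_queue(queue, EXCLUSIONS))  # prints ['https://example.com/about']
```

Filtering the pending queue, rather than restarting, is what lets a crawl continue with the pages it has already captured.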

ArchiveWeb.page Integration

Send archived items directly to Browsertrix from the ArchiveWeb.page browser extension.

Create Complete Collections

Four archived items in a list: three are checked and added to a collection, while one (an incomplete crawl that was stopped by the user) has been omitted.

Combine archived items created through automated crawling, ArchiveWeb.page, and other tools into collections for viewing and export.

Single Collaborative Workspace

Work together with colleagues to create, manage, and organize crawls.

Upload Existing Archives

Bring your existing WACZ files along!

In-App Browser-Based Replay

A screenshot of an archived item being viewed from within Browsertrix.

View archived webpages directly in the browser, exactly as they appeared when crawled.

Export to Standard Files

Export your collections to a single packaged WACZ file.
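A WACZ file is a ZIP package that bundles WARC data (`archive/`), indexes (`indexes/`), page lists (`pages/`), and a `datapackage.json` manifest at the root. The minimal sketch below uses only Python's standard library to group a WACZ's entries by top-level folder and read its manifest. The toy package it builds in memory, and the `make_toy_wacz`, `list_wacz`, and `read_manifest` helpers, are hypothetical and far simpler than a real export.

```python
import io
import json
import zipfile


def make_toy_wacz() -> bytes:
    """Build a tiny, hypothetical WACZ-shaped ZIP in memory for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("archive/data.warc.gz", b"")
        zf.writestr("indexes/index.cdx", "")
        zf.writestr("pages/pages.jsonl", json.dumps({"format": "json-pages-1.0"}) + "\n")
        zf.writestr("datapackage.json", json.dumps({"profile": "data-package", "resources": []}))
    return buf.getvalue()


def list_wacz(data: bytes) -> dict[str, list[str]]:
    """Group the files inside a WACZ (a ZIP package) by top-level directory."""
    groups: dict[str, list[str]] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            top = name.split("/", 1)[0] if "/" in name else "(root)"
            groups.setdefault(top, []).append(name)
    return groups


def read_manifest(data: bytes) -> dict:
    """Parse the datapackage.json manifest stored at the root of the WACZ."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return json.loads(zf.read("datapackage.json"))


wacz = make_toy_wacz()
print(list_wacz(wacz))
print(read_manifest(wacz)["profile"])  # prints data-package
```

To inspect a real export, pass the bytes of a downloaded file instead, for example `open("my-collection.wacz", "rb").read()` (the filename is hypothetical).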

Embed In Your Own Content

Embed archives into your own content using ReplayWeb.page.

Browsertrix Cloud Hosting

Individual

$30 per month

  • Community forum support
  • API access
  • Up to 2000 pages per crawl
  • 100GB base disk space
  • 1 concurrent crawl
  • 180 minutes of base crawling time

Pro

Pricing based on requirements

  • Dedicated support
  • API access
  • Increased crawl page limits
  • 500GB (minimum) base disk space
  • 2+ concurrent crawls
  • Increased base crawling time

Self Hosted

Browse our deployment documentation to get started with your own instances of Browsertrix.

Browsertrix is open source software! Browse our source code, make your own updates, and submit changes on GitHub.

Get Help

Support contracts for self-hosted instances are available on a case-by-case basis.