How It Works

Two-Stage Extraction

Content extraction runs in two stages, each removing a different kind of noise:

  1. Stage A - Readability: We use Mozilla's Readability algorithm to extract the main content, removing navigation, headers, footers, and other boilerplate.
  2. Stage B - LLM Refinement: A large language model then cleans the Stage A output, removing cookie notices, share buttons, related posts, and any remaining boilerplate.
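
To make the shape of the two stages concrete, here is a minimal sketch of the pipeline using only the Python standard library. It is an illustration, not the service's implementation: stage_a is a crude stand-in for Readability (it drops a fixed list of tags, where Readability scores content), and stage_b is a keyword filter standing in for the LLM. The names run_pipeline, stage_a, stage_b, SKIP_TAGS, and BOILERPLATE_HINTS are all hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable, List

# Tags whose contents the Stage A stand-in discards (assumption: a fixed
# list; the real Readability algorithm scores candidate nodes instead).
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}

# Phrases the Stage B stand-in treats as leftover boilerplate (assumption:
# the real stage uses an LLM, not keyword matching).
BOILERPLATE_HINTS = ("cookie", "share this", "related posts")


class _MainTextParser(HTMLParser):
    """Collects text that is not nested inside any SKIP_TAGS element."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def stage_a(html: str) -> List[str]:
    """Stage A stand-in: keep text blocks outside navigation-like tags."""
    parser = _MainTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


def stage_b(blocks: List[str]) -> List[str]:
    """Stage B stand-in: drop blocks that look like residual boilerplate."""
    return [
        block
        for block in blocks
        if not any(hint in block.lower() for hint in BOILERPLATE_HINTS)
    ]


def run_pipeline(
    html: str,
    first: Callable[[str], List[str]] = stage_a,
    second: Callable[[List[str]], List[str]] = stage_b,
) -> str:
    """Run Stage A, then feed its output to Stage B."""
    return "\n".join(second(first(html)))
```

The point of the split is that Stage A removes structural boilerplate cheaply, so Stage B only has to judge the smaller set of blocks that survive.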

Getting Started

  1. Create an account and log in
  2. Generate an API token in your dashboard
  3. Send HTML to our /v1/extract endpoint
  4. Receive clean, structured JSON
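
Steps 3 and 4 might look like the sketch below. Only the /v1/extract path and the use of an API token come from the steps above; the base URL, the Bearer authorization scheme, and the "html" field name in the request body are assumptions made for illustration, so check the API reference for the actual values.

```python
import json
import urllib.request

# Placeholder host (assumption): substitute the service's real base URL.
BASE_URL = "https://api.example.com"


def build_extract_request(
    html: str, token: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST request for /v1/extract without sending it."""
    # Assumption: the endpoint accepts a JSON body with an "html" field.
    body = json.dumps({"html": html}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/extract",
        data=body,
        method="POST",
        headers={
            # Assumption: the token is sent as a Bearer credential.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def call_extract(html: str, token: str) -> dict:
    """Send the request and decode the JSON response."""
    request = build_extract_request(html, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect or log the request before spending one of your daily calls.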

Free Tier

Every account includes one free token, limited to 100 requests per day. That is enough for testing and small projects, and no credit card is required.
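
If you want to stay under the daily limit from the client side, a small counter is enough. The sketch below is one way to do it; the DailyQuota class is hypothetical, and it assumes the limit resets when the local calendar date changes, which may not match how the service counts a day.

```python
from datetime import date
from typing import Optional

DAILY_LIMIT = 100  # free-tier requests per day, as stated above


class DailyQuota:
    """Client-side counter that refuses requests once the limit is used."""

    def __init__(self, limit: int = DAILY_LIMIT) -> None:
        self.limit = limit
        self._day: Optional[date] = None
        self._used = 0

    def try_acquire(self, today: Optional[date] = None) -> bool:
        """Reserve one request; return False if today's quota is spent."""
        today = today or date.today()
        # Assumption: the quota resets when the calendar date changes.
        if today != self._day:
            self._day = today
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True
```

Call try_acquire() before each request and skip or queue the request when it returns False.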