How It Works
Two-Stage Extraction
Content extraction runs in two stages, each removing a different kind of noise:
- Stage A - Readability: We use Mozilla's Readability algorithm to extract the main content, removing navigation, headers, footers, and other boilerplate.
- Stage B - LLM Refinement: An AI model further cleans the content, removing cookie notices, share buttons, related posts, and any remaining boilerplate.
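The two stages above can be pictured as a small pipeline. The following is only a rough sketch, not the service's implementation: Stage A is approximated with a naive tag filter from Python's standard library (the real stage uses Mozilla's Readability, which is far more thorough), and Stage B is a pluggable function where a keyword filter stands in for the AI model. All function names and the tag and keyword lists here are hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable, List

# Assumption: a crude stand-in for Stage A. Readability scores content
# heuristically; this sketch only skips text inside obvious boilerplate tags.
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class _MainTextParser(HTMLParser):
    """Collects text chunks that are not nested inside a boilerplate tag."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def stage_a_extract(html: str) -> List[str]:
    """Stage A (approximation): keep text outside nav/header/footer-like tags."""
    parser = _MainTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


def keyword_refiner(blocks: List[str]) -> List[str]:
    """Stage B (placeholder): a keyword filter standing in for the AI model."""
    noise = ("cookie", "share this", "related posts")
    return [b for b in blocks if not any(n in b.lower() for n in noise)]


def extract(html: str, refine: Callable[[List[str]], List[str]] = keyword_refiner) -> List[str]:
    """Run Stage A, then hand its output to Stage B."""
    return refine(stage_a_extract(html))


if __name__ == "__main__":
    page = (
        "<html><body><nav>Home</nav><article><p>Real content.</p>"
        "<p>We use cookies.</p><div>Share this</div></article>"
        "<footer>Copyright</footer></body></html>"
    )
    print(extract(page))
```

Keeping Stage B behind a plain function argument means the placeholder filter can be swapped for a model call without touching Stage A.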
Getting Started
- Create an account and log in
- Generate an API token in your dashboard
- Send HTML to our /v1/extract endpoint
- Receive clean, structured JSON
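As a rough illustration of the last two steps, a request to /v1/extract might look like the sketch below. Only the endpoint path comes from this page. The base URL, the Bearer authorization scheme, the "html" field name in the request body, and the function names are all assumptions made for the example, so check your dashboard or the API reference for the actual values.

```python
import json
import urllib.request


def build_extract_request(base_url: str, token: str, html: str) -> urllib.request.Request:
    """Build a POST request for /v1/extract without sending it.

    Assumptions (not confirmed by this page): Bearer token auth and a
    JSON body of the form {"html": "..."}.
    """
    body = json.dumps({"html": html}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/extract",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def extract(base_url: str, token: str, html: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_extract_request(base_url, token, html)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder values: replace with the real base URL and your own token.
    req = build_extract_request("https://api.example.com", "YOUR_API_TOKEN", "<p>Hello</p>")
    print(req.get_method(), req.full_url)
```

Separating request construction from sending makes it possible to inspect the URL, headers, and body before spending one of your daily requests.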
Free Tier
Every account gets one free token, limited to 100 requests per day. Perfect for testing and small projects. No credit card required.