Frequently Asked Questions

What HTML should I send?

Send the complete HTML source of the page containing the article. Include the full <html> document with <head> and <body> sections. Avoid sending truncated or partial HTML.
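
As a rough illustration, a request carrying the full document might be built as follows. Only the /v1/extract path and the existence of a token come from this FAQ; the host name, the "html" body field, and the Bearer authorization scheme are assumptions made for the sketch and may differ from the real API.

```python
import json
import urllib.request

# Placeholder host: the FAQ names only the /v1/extract path.
API_BASE = "https://api.example.com"


def build_extract_request(html: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the complete HTML document."""
    # The "html" field name is an assumption, not a documented parameter.
    body = json.dumps({"html": html}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/v1/extract",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The Bearer scheme is assumed; the FAQ only says a token is required.
            "Authorization": f"Bearer {token}",
        },
    )
```

The request can then be sent with urllib.request.urlopen(request). Building it separately keeps the payload easy to inspect before any network call is made.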

What does the API return?

The API returns JSON with the extracted title and clean article content as plain text, with all navigation, ads, and boilerplate removed. Optional fields include excerpt, author, publishedAt, and language.
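
A minimal sketch of reading such a response is shown below. It assumes the JSON is a flat object whose required keys are named "title" and "content"; the optional key names (excerpt, author, publishedAt, language) are taken from the answer above. The Extraction class and parse_extraction function are hypothetical helpers, not part of the API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Extraction:
    title: str
    content: str
    excerpt: Optional[str] = None
    author: Optional[str] = None
    published_at: Optional[str] = None
    language: Optional[str] = None


def parse_extraction(raw: str) -> Extraction:
    """Parse a response body, treating the optional fields as possibly absent."""
    data = json.loads(raw)
    return Extraction(
        title=data["title"],        # assumed key name
        content=data["content"],    # assumed key name
        excerpt=data.get("excerpt"),
        author=data.get("author"),
        published_at=data.get("publishedAt"),
        language=data.get("language"),
    )
```

Using dict.get for the optional fields means a response without, say, an author parses cleanly and leaves that attribute as None.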

Does it work with all websites?

The API works well with many article-based websites, but extraction quality varies by site structure. Well-structured HTML with semantic markup typically yields better results. We recommend testing with your target sites.

How accurate is the extraction?

Extraction accuracy depends on the HTML structure. Well-structured articles with semantic HTML (using <article>, <main>, proper headings) typically yield better results. The API uses enhanced Readability algorithms and AI processing to improve accuracy.

What is the maximum HTML size?

The maximum HTML size per request is 2MB. If your HTML is larger, consider preprocessing or splitting the content.
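
A size check before sending might look like the sketch below. It assumes that "2MB" means 2 * 1024 * 1024 bytes and that the limit applies to the UTF-8 encoded HTML; the API may measure size differently. The preprocessing helper is one possible approach only: it uses a regular expression, which is fragile on unusual markup, and removing script blocks can also discard embedded metadata such as JSON-LD.

```python
import re

# Assumption: "2MB" is read as 2 MiB of UTF-8 encoded HTML.
MAX_HTML_BYTES = 2 * 1024 * 1024

# Matches <script>...</script> and <style>...</style> blocks, case-insensitively.
_SCRIPT_STYLE = re.compile(
    r"<(script|style)\b[^>]*>.*?</\1\s*>", re.DOTALL | re.IGNORECASE
)


def html_size_bytes(html: str) -> int:
    """Return the size of the HTML once encoded as UTF-8."""
    return len(html.encode("utf-8"))


def fits_size_limit(html: str, limit: int = MAX_HTML_BYTES) -> bool:
    """Return True when the encoded HTML is within the limit."""
    return html_size_bytes(html) <= limit


def strip_scripts_and_styles(html: str) -> str:
    """Drop script and style blocks to shrink an oversized document."""
    return _SCRIPT_STYLE.sub("", html)
```

A caller could apply strip_scripts_and_styles only when fits_size_limit returns False, so documents already under the limit are sent unmodified.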

How do I handle errors?

Check the HTTP status code and the error message in the response. A 400 status means the request format is invalid, 401 means the token is invalid, and 402 means the quota is exceeded. See the errors documentation for details.
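
One way to turn these status codes into exceptions is sketched below. The three code meanings come from the answer above; the shape of the error body (a JSON object with an "error" key) is an assumption, so the sketch falls back to the raw body text when that key is absent. ExtractError and check_response are hypothetical names.

```python
import json

# Meanings taken from the FAQ; other codes are reported generically.
STATUS_MEANINGS = {
    400: "invalid request format",
    401: "invalid token",
    402: "quota exceeded",
}


class ExtractError(Exception):
    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def _error_detail(body: str) -> str:
    """Pull a message out of the body; the "error" key is an assumed field name."""
    try:
        data = json.loads(body)
    except ValueError:
        return body.strip()
    if isinstance(data, dict) and isinstance(data.get("error"), str):
        return data["error"]
    return body.strip()


def check_response(status: int, body: str) -> None:
    """Raise ExtractError for any non-2xx status; return None otherwise."""
    if 200 <= status < 300:
        return
    meaning = STATUS_MEANINGS.get(status, "unexpected error")
    detail = _error_detail(body)
    raise ExtractError(status, f"{meaning} ({detail})" if detail else meaning)
```

Keeping the status on the exception lets callers branch on it, for example stopping a batch job when a 402 arrives.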

Can I use this for web scraping?

This API only processes HTML that you provide; it does not fetch URLs or crawl websites. You need to obtain the HTML through other means, respecting robots.txt and each site's terms of service.
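
Since fetching is left to you, a robots.txt check before downloading a page could be sketched with Python's standard urllib.robotparser module, as below. The allowed_by_robots helper and the "my-bot" user agent are illustrative names only, and this check does not cover a site's terms of service.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, with the rules "User-agent: *" and "Disallow: /private/", a URL under /private/ is rejected while an article URL elsewhere on the site is allowed.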

Do credits expire?

No, credits never expire. Once purchased, credits remain available until used.

What counts as a request?

Each API call to /v1/extract counts as one request, regardless of HTML size (up to the 2MB limit) or extraction quality. Requests that fail with an error (4xx or 5xx) don't consume credits.

Can I use this for commercial purposes?

Yes, you can use the API for commercial purposes. Make sure you have rights to process the content you're extracting and comply with applicable terms of service and laws.