Powerful

Tested on many sites with millions of pages. Crawl what you want, where you want. Manipulate extracted content at will.

Open Source

Licensed under the GNU General Public License v3.0 and maintained on GitHub.
Community and commercial support are available.

Easy to Run

Fully documented and works out of the box, with sample configurations you can modify to suit your needs.

Flexible

Integrates with virtually any search engine or other repository. Easily add new functionality or replace existing features.

Cross-Platform

100% Java-based. Runs anywhere. Test on one operating system (e.g. Windows) and deploy to another (e.g. Linux).

Feature-Rich

From supporting robot rules to detecting document deletions, you will be pleased with its long list of features.
