Norconex Importer

Open-source document text extractor and transformer

Getting Started Download 2.9.0

Content importer

Norconex Importer is a Java library and command-line application that parses a file and extracts its content as plain text, whatever its native format (HTML, PDF, Word, etc.). It also lets you manipulate the extracted text before importing or using it in your own service or application.
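The two stages described above (extract plain text, then manipulate it) can be pictured with a small, self-contained Java sketch. It does not use the Norconex Importer API: the class `ImportSketch` and its methods `extractText` and `transform` are hypothetical names invented for illustration, and the regex-based HTML stripping is a deliberately naive stand-in for real format-aware parsing.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Conceptual sketch only: hypothetical names, NOT the Norconex Importer API.
class ImportSketch {

    // "Parse/extract" stage: reduce a small HTML string to plain text.
    // Assumption: naive tag stripping is good enough for this illustration.
    static String extractText(String html) {
        return html.replaceAll("<[^>]*>", " ")  // replace each tag with a space
                   .replaceAll("\\s+", " ")     // collapse runs of whitespace
                   .trim();
    }

    // "Manipulate" stage: apply each text transformation in order.
    static String transform(String text, List<UnaryOperator<String>> steps) {
        String result = text;
        for (UnaryOperator<String> step : steps) {
            result = step.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Hello</h1><p>Importer  demo</p></body></html>";
        List<UnaryOperator<String>> steps = List.of(
                s -> s.toLowerCase(),
                s -> s.replace("demo", "example"));
        // prints "hello importer example"
        System.out.println(transform(extractText(html), steps));
    }
}
```

Keeping extraction and transformation as separate steps mirrors the description above: the text is first normalized to plain text regardless of source format, and any number of manipulations can then be chained before the result is handed to your own application.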

A typical use, though not the only one, is to import crawled content for use by a search engine. For this purpose, consider one of the Norconex Collectors, which rely on Norconex Importer.

Have a look at the supported file formats.

Latest news

Google Cloud Search Committer now available
Google contributes their own Committer for Google Cloud Search.

Norconex Neo4j Committer 1.0.0 released
New Committer for the Neo4j graph database.

Norconex HTTP Collector 2.8.1 released
Maintenance release. Fixes and minor updates.

Norconex SQL Committer 2.0.0 released
More stable and flexible release. Better control of table/field creation.

Norconex Collector Core 1.9.1 released
Maintenance release.

Copyright © 2013-2019 Norconex Inc.