Norconex Importer

Open-source document text extractor and transformer


Content importer

Norconex Importer is a Java library and command-line application that parses a file in its native format (HTML, PDF, Word, etc.) and extracts its content as plain text. It also lets you manipulate the extracted text before importing or using it in your own service or application.
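
To make the extract-then-transform idea concrete, here is a minimal, self-contained Java sketch. It is a conceptual illustration only and does not use the Norconex Importer API: the class `ImportPipelineSketch` and the methods `extractText`, `transform`, and `importContent` are hypothetical names, and the regex-based tag stripping is a crude stand-in for a real format-aware parser.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical illustration: none of these names belong to the Norconex Importer API.
public class ImportPipelineSketch {

    // Stand-in "parser": replaces each HTML tag with a space to obtain plain text.
    // A real parser handles many formats (PDF, Word, ...) and far more edge cases.
    public static String extractText(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Applies each transformation, in order, to the extracted text.
    public static String transform(String text, List<UnaryOperator<String>> steps) {
        String result = text;
        for (UnaryOperator<String> step : steps) {
            result = step.apply(result);
        }
        return result;
    }

    // Full pipeline: extract plain text first, then transform it.
    public static String importContent(String html) {
        List<UnaryOperator<String>> steps = List.of(
                s -> s.replaceAll("\\s+", " "),   // collapse runs of whitespace
                s -> s.trim(),                    // drop leading/trailing whitespace
                s -> s.toLowerCase(Locale.ROOT)   // normalize case, locale-independent
        );
        return transform(extractText(html), steps);
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Hello</h1><p>Importer   World</p></body></html>";
        System.out.println(importContent(html)); // prints "hello importer world"
    }
}
```

Keeping the transformations as an ordered list of small functions means the text can be reshaped in any way before it reaches your own service or application, which is the role the manipulation step plays in the description above.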

A typical use, though not the only one, is to "import" crawled content for use by a search engine. For that purpose, consider one of the Norconex Collectors, which rely on Norconex Importer.

Have a look at the supported file formats.

Latest news

Norconex Neo4j Committer 1.0.0 released
2019-01-09
New Committer for the Neo4j graph database.

Norconex HTTP Collector 2.8.1 released
2018-08-17
Maintenance release. Fixes and minor updates.

Norconex SQL Committer 2.0.0 released
2018-08-16
More stable and flexible release. Better control of table/field creation.

Norconex Collector Core 1.9.1 released
2018-07-29
Maintenance release.

Norconex Importer 2.9.0 released
2018-06-17
New PDFPageSplitter and a few fixes.

Copyright © 2013-2019 Norconex Inc.