Norconex Importer 2.2.0 released

Posted on June 15, 2015 by in Latest Releases

This release of Norconex Importer brings many fixes, increased stability, and nice new features. The following highlights some of the additions with XML configuration or Java code samples.

Retrieve a document Length

Thanks to the new DocumentLengthTagger, you can now store a document byte length in a metadata field of your choice. The length can be obtained at any document processing stage.  For instance, it can be obtained before any transformation took place, or after it was parsed.


Add the current date to a document

The new CurrentDateTagger allows to add the current date to a metadata field and date format of your choice. This can be useful to indicate when a document was actually processed by the Importer.


Filter documents on numeric or date range

NumericMetadataFilter and DateMetadataFilter now allow you to filter documents based on metadata field numeric or date values, respectively. You can define both closed ranges and open-ended ranges.


Use external parsers

Wrapping a Tika class of the same name, the new ExternalParser allows Java programmers to point to external command-line applications to parse documents. One example can be for using “pdftotext” to parse PDFs instead of the default PDF parser based on PDFBox, which is much slower (but does a better job overall).


Other improvements

There are more changes under the hood, like upgrading to Apache Tika 1.8, as well as the fixing of OutOfMemory errors and document parsing sometimes never returning. You can find the complete list of changes in the release notes.

Several of these improvements were made possible thanks to the great feedback of the open-source community. Keep doing so: you make a difference.

Useful links


Pascal Essiembre has been a successful Enterprise Application Developer for several years before founding Norconex in 2007 and remaining its president to this day. Pascal has been responsible for several successful Norconex enterprise search projects across North America. Pascal is also heading the Product Division of Norconex and leading Norconex Open-Source initiatives.