Norconex Filesystem Collector

Open-Source Enterprise Filesystem Crawler

Getting Started Download 2.8.0

Crawl Filesystems

Norconex Filesystem Collector is a flexible crawler that collects, parses, and manipulates data from sources ranging from local hard drives to network locations, then stores it in data repositories such as search engines.


Norconex Filesystem Collector shares common features with other Norconex Collectors. The following is a non-exhaustive list of the features it supports:

  • Multi-threaded.
  • Extracts text from many file formats (HTML, PDF, Word, etc.).
  • Extracts metadata associated with documents.
  • Language detection.
  • OCR support on images and PDFs.
  • Translation support.
  • Dynamic title generation.
  • Allows easy text transformation and metadata manipulation.
  • Filters unwanted documents.
  • Detects modified and deleted documents.
  • Easy to add your own features.
  • Can be used from the command line or embedded in your Java application.
  • Can treat embedded documents as distinct documents.
  • Can split a formatted document into multiple documents.
  • Can store crawled URLs in different database engines.
  • Fires crawler event types for custom listeners.
  • Date parsers/formatters to match your source/target repository dates.
  • Can create hierarchical fields.
  • Supports scripting languages for manipulating documents.
  • References XML/HTML elements using simple DOM tree navigation.
  • Extracts document ACLs from SMB/CIFS file systems.
  • Many others.
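
To make the "detects modified and deleted documents" feature more concrete, here is a minimal Java sketch of the general idea: compare a snapshot from the previous crawl with the current one. This is not the Collector's own implementation or API. The class `ChangeDetectionSketch`, the method `diff`, and the "path to checksum" maps are hypothetical names chosen for illustration, and it uses only the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of any Norconex library.
public class ChangeDetectionSketch {

    /** Paths grouped by how they changed between two crawls. */
    public static final class Changes {
        public final Set<String> added = new TreeSet<>();
        public final Set<String> modified = new TreeSet<>();
        public final Set<String> deleted = new TreeSet<>();
    }

    /**
     * Compares two snapshots, each mapping a file path to a checksum
     * (assumed non-null; any stable fingerprint of the file would do).
     */
    public static Changes diff(Map<String, String> previous,
                               Map<String, String> current) {
        Changes changes = new Changes();
        // Present now: either new, or changed if the checksum differs.
        for (Map.Entry<String, String> entry : current.entrySet()) {
            String oldChecksum = previous.get(entry.getKey());
            if (oldChecksum == null) {
                changes.added.add(entry.getKey());
            } else if (!oldChecksum.equals(entry.getValue())) {
                changes.modified.add(entry.getKey());
            }
        }
        // Present before but missing now: deleted.
        for (String path : previous.keySet()) {
            if (!current.containsKey(path)) {
                changes.deleted.add(path);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> previous = new HashMap<>();
        previous.put("/docs/a.pdf", "size=10;mtime=100");
        previous.put("/docs/b.doc", "size=20;mtime=200");

        Map<String, String> current = new HashMap<>();
        current.put("/docs/a.pdf", "size=10;mtime=150"); // changed since last crawl
        current.put("/docs/c.txt", "size=5;mtime=300");  // new since last crawl

        Changes c = diff(previous, current);
        System.out.println("added=" + c.added
                + " modified=" + c.modified
                + " deleted=" + c.deleted);
    }
}
```

In a sketch like this, a crawler would send only the added and modified paths for re-processing and issue delete requests to the target repository for the deleted ones.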

Supported Filesystems

  • Local files
  • HDFS
  • RAM
  • RES
  • SFTP
  • Temp
  • WebDAV
  • Zip, Jar and Tar

Latest news

Google Cloud Search Committer now available
Google contributes their own Committer for Google Cloud Search.

Norconex Neo4j Committer 1.0.0 released
New Committer for Neo4j graph database.

Norconex HTTP Collector 2.8.1 released
Maintenance release. Fixes and minor updates.

Norconex SQL Committer 2.0.0 released
More stable and flexible release. Better control of table/field creation.

Norconex Collector Core 1.9.1 released
Maintenance release.

Copyright © 2013-2019 Norconex Inc.