Posts tagged ‘HTTP Collector’

How to run Norconex Collector in Docker

Docker is popular because it makes it easy to package and deliver programs. This article will show you how to run the Java-based, open-source crawler Norconex HTTP Collector and the Elasticsearch Committer in Docker to crawl a website and index ... Read More...


Posted on February 10, 2018 in Latest Articles


Norconex HTTP Collector 2.8.0 released

Norconex is proud to announce the release of Norconex HTTP Collector version 2.8.0. This release is accompanied by new releases of many related Norconex open-source products (Filesystem Collector, Importer, Committers, etc.), and together they bring dozens of new features and ... Read More...


Posted on November 26, 2017 in Latest Releases


Diagrams for Norconex Crawlers

Norconex just made it easier to understand the inner workings of its crawlers by creating clickable flow diagrams. Those diagrams are now available as part of both the Norconex HTTP Collector and Norconex Filesystem Collector websites. Clicking on a shape will ... Read More...


Posted on May 15, 2017 in Latest Articles


Indexing to an AWS CloudSearch Domain

Amazon Web Services (AWS) has been all the rage lately, used by many organizations, companies and even individuals. This rise in popularity can be attributed to the sheer number of services provided by AWS, such as Elastic Compute Cloud (EC2), Elastic ... Read More...


Posted on May 4, 2017 in Latest Articles


Norconex HTTP and Filesystem Collector 2.7.0 released

Norconex released version 2.7.0 of both its HTTP Collector and Filesystem Collector. This update, along with related component updates, introduces several interesting features. HTTP Collector changes: The following items are specific to the HTTP Collector. For changes applying to both the ... Read More...


Posted on April 26, 2017 in Latest Releases


Norconex HTTP Collector 2.6.0 released

Norconex has released version 2.6.0 of its HTTP Collector web crawler! Among the new features, an upgrade of its Importer module brings new document parsing and manipulation capabilities. Some of the changes highlighted here also benefit the Norconex Filesystem Collector. New ... Read More...


Posted on August 25, 2016 in Latest Releases


Norconex HTTP Collector 2.5.0 released

Norconex has released Norconex HTTP Collector version 2.5.0! This new version of our open-source web crawler was released to help minimize your re-crawling frequencies and download delays, and it allows you to specify a locale for date parsing/formatting. The ... Read More...


Posted on June 3, 2016 in Latest Releases


Norconex HTTP Collector 2.2.0 Now Available

The latest release of Norconex HTTP Collector provides more content transformation capabilities, canonical URL support, increased stability, and other new features. As the Internet grows, so does the demand for better ways to extract and process web data. Several ... Read More...


Posted on July 22, 2015 in Latest Releases


New release of Norconex Collectors

Optical character recognition (OCR), content translation, title generation, detection and text extraction from more file formats are among the new features now part of your favorite crawlers: Norconex HTTP Collector 2.1.0 and Norconex Filesystem Collector 2.1.0. They are both available ... Read More...


Posted on April 8, 2015 in Latest Releases


Create a website broken links checker

This tutorial will show you how to extend Norconex HTTP Collector using Java to create a link checker to ensure all URLs in your web pages are valid. The link checker will crawl your target site(s) and create a report ... Read More...


Posted on February 10, 2015 in Latest Articles