After attending the Enterprise Search Summit in New York City on May 13 and 14, 2014 (#ESDNY), one theme became very evident: enterprise search should no longer be considered simply a means to enable users to find documents, (more…)

Release 1.3 of Norconex HTTP Collector is now available. Among the new features added to our open-source web crawler, you can expect the following:

  • Now supports NTLM authentication. Experimental support added for SPNEGO and Kerberos.
  • Document checksums are added to each document's metadata.
  • Refactoring of HTTPClient creation with many new configuration options added (connection timeout, charset, maximum redirects, and several more).
  • Can now optionally trust all SSL certificates.
  • Integrates new features of Norconex Importer 1.2.0 such as support for WordPerfect document parsing, new filter and transformers, etc.
  • Integrates new features of Norconex Committer 1.2.0 such as defining multiple committers, retrying upon commit failure, etc.
  • Other third-party library upgrades.

Download it now!

Norconex Importer 1.2.0 was just released along with a new website for it.

New features:

  • Now supports text extraction from WordPerfect documents.
  • New transformer to reduce consecutive instances of the same string to only one instance.
  • New transformer to perform search and replace on document content using regular expressions.
  • New filter to exclude/include documents with no data for one or more specified metadata properties.
  • Now attempts to detect the character encoding of a character stream by looking at the Content-Type metadata. If none is present, defaults to UTF-8.
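
Two of the behaviors listed above are easy to picture with a few lines of plain Java. The sketch below is only an illustration and is not taken from Norconex Importer: the class `ImporterSketch` and the methods `detectCharset` and `reduceConsecutive` are hypothetical names, and the Importer's own classes and configuration may differ. It assumes the Content-Type value is available as a string and that the repeated string is matched literally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Norconex Importer API.
public class ImporterSketch {

    // Matches the charset parameter of a Content-Type value,
    // e.g. "text/html; charset=ISO-8859-1" or charset="utf-8".
    private static final Pattern CHARSET_PARAM = Pattern.compile(
            "charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in contentType, or UTF-8 when none is usable. */
    public static Charset detectCharset(String contentType) {
        if (contentType != null) {
            Matcher m = CHARSET_PARAM.matcher(contentType);
            if (m.find()) {
                try {
                    return Charset.forName(m.group(1));
                } catch (IllegalArgumentException e) {
                    // Unknown or malformed charset name: fall through to the default.
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Collapses back-to-back repetitions of token into a single occurrence. */
    public static String reduceConsecutive(String text, String token) {
        if (token.isEmpty()) {
            return text; // nothing to collapse
        }
        // Quote the token so it is matched literally, not as a regular expression.
        String repeated = "(?:" + Pattern.quote(token) + "){2,}";
        return text.replaceAll(repeated, Matcher.quoteReplacement(token));
    }

    public static void main(String[] args) {
        System.out.println(detectCharset("text/html; charset=ISO-8859-1"));
        System.out.println(detectCharset(null));
        System.out.println(reduceConsecutive("a    b", " "));
    }
}
```

Falling back to UTF-8 whenever the header is missing or names an unsupported charset keeps the sketch in line with the default described in the list; whether the Importer treats an unsupported charset the same way is an assumption.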

Download it now!

Web site: /collectors/importer/

Norconex Committer and all its current concrete implementations (Solr, Elasticsearch, IDOL) have been upgraded, and their web sites have been redesigned. Committers are libraries responsible for posting data to various repositories (typically search engines). They are used in other products or projects, such as Norconex HTTP Collector. (more…)

Norconex Commons Lang 1.3.0 was just released along with a new website for it.

New features:

  • New YearMonthDay class for a local date without time.
  • New YearMonthDayInterval class for a local date range without time.
  • New FileMonitor and IFileChangeListener to be notified of file changes.
  • New methods on FileUtil to visit empty directories or delete empty directories older than a date.

Grab it while it is still warm!

Web site: /product/commons-lang/

Happy coding!

As a Search Expert at Norconex, I am often assigned the task of integrating web accessibility standards within a search user interface for our customers in the government sector; these customers look to web accessibility to improve the overall experience of their current enterprise search offering. We have also noticed a growing demand for search user interfaces that respond to various screen sizes on various devices.

This post will look at the strategies I have used to design a web-accessible search user interface and explain why it is equally important to tackle responsive web design. To illustrate an accessible, responsive web design, I will use the Elections.ca website project that I recently worked on with the Google Search Appliance (GSA). (more…)

HP Autonomy users, take control over your web crawling. Norconex recently released an HP Autonomy IDOL Committer module for its open-source web crawler, Norconex HTTP Collector. You can now enjoy the features of Norconex crawler and experience the freedom of open-source when crawling your sites for indexing into IDOL. (more…)

For a client last year, we had to upgrade some old Lucene code to Lucene 4. Lucene 4 was a rather large release, and there are many aspects to be aware of when upgrading non-trivial code. Let’s take a look at some of them.

(more…)

Norconex just released version 1.2 of Norconex HTTP Collector, its open-source web crawler. Along with it comes a complete product web site redesign and a new logo: a lovely web crawling spider wearing a Norconex hat.

Some changes in this feature release: (more…)