
During a recent client project, I was required to crawl several websites with specific requirements for each. For example, one of the websites required:

  • the content of a meta tag to be used in place of the actual URL,
  • the header, footer and any repetitive content to be excluded from each page,
  • robots.txt to be ignored, since it is meant for external crawlers only (Google, Bing, etc.), and
  • the pages to be indexed in LucidWorks.

The LucidWorks built-in web crawler is based on Aperture. It is great for basic web crawls, but I needed more advanced features that it could not provide. I had to configure LucidWorks with an external crawler that had more advanced built-in capabilities and the ability to create new functionality.
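To make the first two requirements concrete, here is a minimal sketch in Python using only the standard library. It is not the LucidWorks, Aperture or Norconex API. The meta tag name `canonical-url`, the function `clean_page` and the set of skipped tags are assumptions made for illustration. Ignoring robots.txt and sending the result to the index are left out, since they depend on the crawler and search engine in use.

```python
from html.parser import HTMLParser


class PageCleaner(HTMLParser):
    """Collects page text outside repetitive sections and reads one meta tag."""

    # Assumption: repetitive content lives in these elements.
    SKIPPED = {"header", "footer", "nav", "script", "style"}

    def __init__(self, meta_name):
        super().__init__()
        self.meta_name = meta_name      # hypothetical meta tag holding the URL
        self.reference_url = None
        self.text_parts = []
        self._skip_depth = 0            # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == self.meta_name:
            self.reference_url = attrs.get("content")
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside the skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def clean_page(html, fetched_url, meta_name="canonical-url"):
    """Returns the URL to index (meta tag value if present) and the kept text."""
    parser = PageCleaner(meta_name)
    parser.feed(html)
    parser.close()
    return {
        "url": parser.reference_url or fetched_url,
        "text": " ".join(parser.text_parts),
    }
```

Calling `clean_page(html, fetched_url)` on a crawled page returns a dictionary whose `url` falls back to the fetched URL when the meta tag is absent, so pages without the tag are still indexed under their real address.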

(more…)

2013/06/05

Say hello to Norconex HTTP Collector! At Norconex, we have always recognized the value open-source brings to software development, and to a greater extent, the world. It benefits us when building custom solutions for our customers and ourselves. As long-time consumers of open-source, it is time for us to give back.

As a result, Norconex is proud to announce the open-sourcing of a handful of its libraries and products, so that the community can save time and money as open-source has done for us. The Norconex HTTP Collector is an HTTP crawler meant to give developers and integrators the greatest flexibility possible. (more…)

Over the last few weeks, I have had the opportunity to work and play around with the Google Search Appliance Version 7.0, and I must say that it’s an interesting piece of technology. Varying opinions, such as “Yeah, GSA is cool, but it can’t do X, Y, or Z,” have prevented many people from adopting the Google offering as a real Enterprise Search solution. Thanks to the new version, most of those limitations are a thing of the past. The platform is much more mature than previous versions. You can sense that Google is really making the effort required to improve its search platform so that it properly addresses the needs of the Enterprise market. It’s not just about web content anymore. (more…)

Over the past few years we’ve seen a drastic improvement in search interfaces. The development of ugly search boxes and incomprehensible search result screens should now (hopefully) be a thing of the past. New search UIs that are crisp, clean, well thought out, and really help guide users to the information they’re looking for are designed and developed daily. Search-based concepts and technologies are finally starting to be properly recognized as being as important as the content they’re put in place to help discover. A clean, well thought out search user interface not only makes a large difference in helping users navigate site information, but also goes a long way towards bringing users back for more. (more…)

I’ll say it: enterprise search is expensive and takes time to do properly. The more repository types you have, the more complex your architecture is, and the more in-house solutions or archaic systems you have in place, the more it will cost you in time and money. Seems obvious, doesn’t it? Then why do so many enterprise search projects go over budget? Possibly because enterprise search is often treated as an “add-on” to an existing infrastructure, or even as a necessary evil in some cases. Many will treat search implementations like any other “simple” project. To them, the requirements gathering, vendor evaluation, and procurement process are the most daunting tasks. Actual implementation and ongoing maintenance are too often neglected. (more…)

Series Introduction

Regardless of how many satisfied enterprise search customers we meet, it seems there are always “irritants”. Many enterprise search customers have shared with us their frustrations with their current search engine solution. For some, frustration grows with their needs. For others, it seems as though no search engine works quite like they want it to. (more…)

Series Introduction

Over the past several years, we’ve helped clients design, implement and maintain unique search implementations. A natural side effect of our search implementation work has been exposure to hundreds, if not thousands, of different approaches to implementing various forms of search functionality. The organizations we’ve worked with usually have a pretty good understanding of how they want their search system to look and work. On the other hand, they have often not been exposed to search user interface design techniques that could help them greatly improve their search. To help enterprise search customers understand different search user interface approaches, we’ve decided to start a series of articles showcasing well thought out search user interface component implementations. In each one of these articles we’ll showcase 5 different public search implementations, focusing on one search UI component. We’ll outline what we like about the UI approach, and how it helps improve the search experience. Search UI implementations will be highlighted from organizations varying in size and intended audience, and will in no way be exhaustive. These examples represent just a few of the many great implementations of a specific search UI component out there. (Note: This series is not intended to be a showcase for Norconex accomplishments; we are intentionally not using search examples taken from sites or applications we’ve worked on. Please refer to our portfolio for implementations performed by Norconex.) (more…)

Norconex officially launches its Enterprise Search Discussion Group, based on the great Zendesk platform. Reserved for federal employees, this discussion group is the place to share your enterprise search experiences and learn from others. Find out who else is responsible for enterprise search in other departments. Are their challenges similar to yours? Seek advice while helping others. As an independent enterprise search company, Norconex will also be sharing its insights in order to add value to appropriate discussions. There is a lot to talk about:

  • Search vendor-related questions
  • Infrastructure costs evaluation
  • Meta-data publishing strategies
  • Shared experiences
  • Design advice
  • Comparative reviews
  • Much more!

Register for free now!

This discussion group is no longer supported. For questions, please contact Norconex.

It is no secret that Norconex is very fond of search analytics. While you may understand the importance of gathering good search metrics, it may be more difficult to grasp how a product such as Norconex’s very own Search Analytics would integrate with your enterprise search application.

While Norconex Search Analytics can be integrated with ANY enterprise search product (Autonomy, Endeca, Exalead, Solr, etc.), in this post we have chosen to demonstrate its integration with Microsoft FAST Search for SharePoint 2010. (more…)

Your organization has implemented a new search engine and things seem to be running well. Staff are able to search for and find documents that help them get their job done. Employee feedback has been positive, and people are overall quite happy with how the search implementation has turned out. Now that the first phase of your search overhaul has been completed, however, what’s next? How about this: look into ways your search engine can be used to find not only documents, but also the human experts who can provide you with appropriate answers.

At Norconex, we’ve been quite lucky to be able to work with numerous different types of clients, ranging in size from start-ups to Fortune 10 companies. One of the most interesting benefits of working with a wide variety of clients comes from observing problematic behaviour patterns within organizations. This post focuses on one of these patterns: not being able to find who has particular knowledge within your organization. You may not realize it, but the solution to this problem is likely closer than you think. (more…)