An Open-Source Crawler That Feeds an SQL Database

Norconex has released an SQL Committer for its open-source crawlers (Norconex Collectors). It lets you store crawled content in an SQL database of your choice.

To define an SQL database as your crawler’s target repository, follow these steps:

  1. Download the SQL Committer.
  2. Follow the install instructions.
  3. Add this minimal configuration snippet to your Collector configuration file. It uses the H2 database as an example only; replace the values with your own settings:
    <committer class="com.norconex.committer.sql.SQLCommitter">
      <driverPath>/path/to/driver/h2.jar</driverPath>
      <driverClass>org.h2.Driver</driverClass>
      <connectionUrl>jdbc:h2:file:///path/to/db/h2</connectionUrl>
      <tableName>test_table</tableName>
      <createMissing>true</createMissing>
    </committer>
  4. Get familiar with the additional Committer configuration options. For instance, while the example above creates the table and fields for you, you can also use an existing table or provide the CREATE statement used to create the table.
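
Once a crawl has run with a configuration like the one above, one simple way to confirm that documents reached the database is to count the rows in the target table. The following is a minimal sketch using plain JDBC, not part of the Committer itself. The class name InspectCrawlTable and its helper methods are hypothetical. It assumes the H2 driver jar is on the classpath, that the database accepts a connection without credentials, and that the table is named test_table as in the example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for inspecting the table filled by the crawler.
class InspectCrawlTable {

    // Builds the COUNT query. Table names cannot be bound as JDBC
    // parameters, so the name is validated before being concatenated.
    static String countQuery(String tableName) {
        if (tableName == null || !tableName.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Unsafe table name: " + tableName);
        }
        return "SELECT COUNT(*) FROM " + tableName;
    }

    // Runs the COUNT query on an open connection and returns the row count.
    static long countRows(Connection conn, String tableName) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(countQuery(tableName))) {
            return rs.next() ? rs.getLong(1) : 0L;
        }
    }

    public static void main(String[] args) throws SQLException {
        // Assumption: same placeholder URL as in the Committer configuration.
        // Pass your own JDBC URL as the first argument to override it.
        String url = args.length > 0 ? args[0] : "jdbc:h2:file:///path/to/db/h2";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Rows in test_table: " + countRows(conn, "test_table"));
        }
    }
}
```

If your database requires a username and password, use the three-argument form DriverManager.getConnection(url, user, password) instead.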

For further information:

Pascal Essiembre was a successful Enterprise Application Developer for several years before founding Norconex in 2007, and he remains its president to this day. Pascal has been responsible for several successful Norconex enterprise search projects across North America. He also heads the Product Division of Norconex and leads Norconex Open-Source initiatives.