public class CountMatchesTagger extends AbstractStringTagger
Counts the number of matches of a given string (or string pattern) and stores the resulting value in the field specified by "toField".
If no "fromField" is specified, the document content is used. If the "toField" already exists before counting begins, it is overwritten with the match count. If the same "toField" is repeated within this tagger, the sum of all counts is stored. If the "fromField" has multiple values, the total count of all matches is stored as a single value.
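The counting rules above can be modeled in plain Java. The following is a minimal sketch, not the tagger's implementation: it assumes matches are counted without overlap (as `Matcher.find()` does), and the class, method, and field names (`CountSemanticsSketch`, `combinedCount`, `fruitCount`) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative model of the documented counting rules. This is NOT the
// CountMatchesTagger implementation; all names below are hypothetical.
class CountSemanticsSketch {

    // Counts non-overlapping matches of a regex in one value (assumption).
    static int countMatches(String regex, String value) {
        Matcher m = Pattern.compile(regex).matcher(value);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Multi-valued source field: all values contribute to a single total.
    static int countInValues(String regex, String... values) {
        int total = 0;
        for (String value : values) {
            total += countMatches(regex, value);
        }
        return total;
    }

    // Two rules writing to the same target field: the counts are summed.
    static int combinedCount(String regexA, String regexB, String... values) {
        return countInValues(regexA, values) + countInValues(regexB, values);
    }

    public static void main(String[] args) {
        Map<String, Integer> metadata = new LinkedHashMap<>();
        metadata.put("fruitCount", 99); // pre-existing value, overwritten below

        String[] fromField = {"apple pie, apple tart", "green apple"};
        metadata.put("fruitCount", combinedCount("apple", "pie", fromField));

        System.out.println(metadata); // prints {fruitCount=4}
    }
}
```

Here "apple" matches three times across the two values and "pie" once, so the single stored value is 4, replacing the earlier 99.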
When matching against document content, it can be used as a pre-parse tagger on text documents only. When "fromField" is used, it can be used as either a pre-parse or post-parse handler.
<tagger class="com.norconex.importer.handler.tagger.impl.CountMatchesTagger"
    sourceCharset="(character encoding)"
    maxReadSize="(max characters to read at once)" >

  <restrictTo caseSensitive="[false|true]"
      field="(name of header/metadata field name to match)">
    (regular expression of value to match)
  </restrictTo>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->

  <countMatches
      fromField="(optional source field)"
      toField="(target field)"
      caseSensitive="[false|true]"
      regex="[false|true]">
    (text to match or regular expression)
  </countMatches>
  <!-- multiple countMatches tags allowed -->
</tagger>
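To make the "caseSensitive" and "regex" attributes of "countMatches" more concrete, the sketch below shows one plausible way such flags could map onto `java.util.regex`. This page does not show how the tagger builds its matcher, so the mapping (literal text via `Pattern.quote`, case-insensitivity via `Pattern.CASE_INSENSITIVE`) is an assumption, and `MatchOptionsSketch` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the two countMatches flags. The real tagger
// may build its pattern differently.
class MatchOptionsSketch {

    // regex=false treats the text literally; caseSensitive=false ignores case.
    static Pattern toPattern(String text, boolean regex, boolean caseSensitive) {
        String expression = regex ? text : Pattern.quote(text);
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE;
        return Pattern.compile(expression, flags);
    }

    // Counts non-overlapping matches of the configured text in the input.
    static int count(String text, boolean regex, boolean caseSensitive, String input) {
        Matcher m = toPattern(text, regex, caseSensitive).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "A.b a.b axb";
        System.out.println(count("a.b", false, true, input));  // prints 1
        System.out.println(count("a.b", false, false, input)); // prints 2
        System.out.println(count("a.b", true, true, input));   // prints 2
        System.out.println(count("a.b", true, false, input));  // prints 3
    }
}
```

With the same text "a.b", the count changes from 1 to 3 depending only on the two flags, which is why they are worth setting explicitly on each "countMatches" entry.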
The following will count the number of segments in a URL:
<tagger class="com.norconex.importer.handler.tagger.impl.CountMatchesTagger">
  <countMatches fromField="document.reference" toField="urlSegmentCount" regex="true">
    /[^/]+
  </countMatches>
</tagger>
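To see what the expression in this example counts, here is a small standalone sketch that applies the same regular expression with `java.util.regex` directly. It does not call the tagger; `UrlSegmentCountSketch` and the sample URLs are hypothetical, and it assumes non-overlapping matching as performed by `Matcher.find()`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone illustration of the "/[^/]+" expression from the example above.
// It uses java.util.regex only and is not the tagger itself.
class UrlSegmentCountSketch {

    private static final Pattern SEGMENT = Pattern.compile("/[^/]+");

    // Counts occurrences of a slash followed by one or more non-slash characters.
    static int countSegments(String url) {
        Matcher m = SEGMENT.matcher(url);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSegments("http://example.com/a/b/c")); // prints 4
        System.out.println(countSegments("http://example.com/"));      // prints 1
    }
}
```

Note that in this sketch the host name is counted as well, because the second slash of "//" followed by "example.com" also satisfies the expression; a trailing slash with nothing after it is not counted.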
See Also: Pattern
Modifier and Type | Class and Description |
---|---|
static class | CountMatchesTagger.MatchDetails |
Constructor and Description |
---|
CountMatchesTagger() |
Modifier and Type | Method and Description |
---|---|
void | addMatchDetails(CountMatchesTagger.MatchDetails matchDetails) Adds match details. |
boolean | equals(Object other) |
List<CountMatchesTagger.MatchDetails> | getMatchesDetails() |
int | hashCode() |
protected void | loadStringTaggerFromXML(org.apache.commons.configuration.XMLConfiguration xml) Loads configuration settings specific to the implementing class. |
void | removeMatchDetails(CountMatchesTagger.MatchDetails matchDetails) |
protected void | saveStringTaggerToXML(EnhancedXMLStreamWriter writer) Saves configuration settings specific to the implementing class. |
protected void | tagStringContent(String reference, StringBuilder content, ImporterMetadata metadata, boolean parsed, int sectionIndex) |
String | toString() |
Methods inherited from AbstractStringTagger and its superclasses:
getMaxReadSize, loadCharStreamTaggerFromXML, saveCharStreamTaggerToXML, setMaxReadSize, tagTextDocument
getSourceCharset, loadHandlerFromXML, saveHandlerToXML, setSourceCharset, tagApplicableDocument
tagDocument
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
protected void tagStringContent(String reference, StringBuilder content, ImporterMetadata metadata, boolean parsed, int sectionIndex) throws ImporterHandlerException
Specified by: tagStringContent in class AbstractStringTagger
Throws: ImporterHandlerException
public List<CountMatchesTagger.MatchDetails> getMatchesDetails()
public void removeMatchDetails(CountMatchesTagger.MatchDetails matchDetails)
public void addMatchDetails(CountMatchesTagger.MatchDetails matchDetails)
Parameters: matchDetails - the match details
protected void loadStringTaggerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
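Description copied from class: AbstractStringTagger
Loads configuration settings specific to the implementing class.
Specified by: loadStringTaggerFromXML in class AbstractStringTagger
Parameters: xml - xml configuration
Throws: IOException - could not load from XML
protected void saveStringTaggerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException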
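Description copied from class: AbstractStringTagger
Saves configuration settings specific to the implementing class.
Specified by: saveStringTaggerToXML in class AbstractStringTagger
Parameters: writer - the xml writer
Throws: XMLStreamException - could not save to XML
public boolean equals(Object other)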
Overrides: equals in class AbstractStringTagger
public int hashCode()
Overrides: hashCode in class AbstractStringTagger
public String toString()
Overrides: toString in class AbstractStringTagger
Copyright © 2009–2021 Norconex Inc. All rights reserved.