public class DocumentLengthTagger extends AbstractDocumentTagger
Adds the document length (i.e., number of bytes) to the specified field. The length is the length of the document content at its current processing stage. For instance, if you place this tagger after a transformer that modifies the content, the obtained length will be that of the modified content, not the original length. To obtain a document's length before any modification is made to it, use this tagger as one of the first handlers in your pre-parse handlers.
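To illustrate why the tagger's position in the handler chain matters, here is a minimal standalone sketch in plain Java (not Norconex code). `stripWhitespace` is a hypothetical stand-in for any content-modifying transformer, and UTF-8 is assumed only to turn the sample strings into bytes. Because the length is a byte count, a non-ASCII character can contribute more than one unit.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: the length depends on which version of the content the tagger sees.
public class LengthStageDemo {

    // Number of bytes in the content, assuming UTF-8 encoding for this sketch.
    static int byteLength(String content) {
        return content.getBytes(StandardCharsets.UTF_8).length;
    }

    // Hypothetical transformer that modifies the content (removes all whitespace).
    static String stripWhitespace(String content) {
        return content.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        // 9 characters, 10 bytes in UTF-8 (the accented e takes 2 bytes).
        String original = "caf\u00e9 menu";
        String transformed = stripWhitespace(original);

        // Tagger placed before the transformer: sees the original bytes.
        System.out.println("before transformer: " + byteLength(original));    // 10
        // Tagger placed after the transformer: sees the modified bytes.
        System.out.println("after transformer:  " + byteLength(transformed)); // 9
    }
}
```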
If the field already has one or more values, the length will be added to the list of existing values, unless "overwrite" is set to true.
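The append-versus-overwrite rule can be sketched as follows, assuming metadata is modeled as a map from field names to lists of values. `storeLength` is a hypothetical helper written for illustration, not part of the Importer API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "overwrite" behavior; not the library's implementation.
public class OverwriteDemo {

    // Adds the length to the field's values, first clearing them when overwrite is true.
    static void storeLength(Map<String, List<String>> metadata,
                            String field, long length, boolean overwrite) {
        List<String> values = metadata.computeIfAbsent(field, k -> new ArrayList<>());
        if (overwrite) {
            values.clear(); // overwrite=true: discard existing values
        }
        values.add(Long.toString(length));
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new HashMap<>();
        metadata.put("docSize", new ArrayList<>(List.of("1024")));

        storeLength(metadata, "docSize", 2048, false);
        System.out.println(metadata.get("docSize")); // [1024, 2048]

        storeLength(metadata, "docSize", 4096, true);
        System.out.println(metadata.get("docSize")); // [4096]
    }
}
```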
Can be used either as a pre-parse or post-parse handler.
XML configuration usage:
<tagger class="com.norconex.importer.handler.tagger.impl.DocumentLengthTagger"
    field="(mandatory target field)"
    overwrite="[false|true]" >
  <restrictTo caseSensitive="[false|true]"
      field="(name of header/metadata field to match)">
    (regular expression of value to match)
  </restrictTo>
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
</tagger>
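The `restrictTo` semantics can be sketched in plain Java as below. This is an illustration under assumptions, not the library's code: `Restriction` and `isApplicable` are hypothetical names, the regular expression is assumed to have to match the whole metadata value, and a tagger with no restriction is assumed to apply to every document. As the usage comment states, a single matching restriction is enough.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of <restrictTo> matching; not the library's implementation.
public class RestrictToDemo {

    // One <restrictTo> element: the metadata field to check, a regex, and case sensitivity.
    static final class Restriction {
        final String field;
        final Pattern pattern;

        Restriction(String field, String regex, boolean caseSensitive) {
            this.field = field;
            this.pattern = Pattern.compile(regex, caseSensitive ? 0 : Pattern.CASE_INSENSITIVE);
        }

        // True if any value of the field matches the regex (assumption: whole-value match).
        boolean matches(Map<String, List<String>> metadata) {
            for (String value : metadata.getOrDefault(field, List.of())) {
                if (pattern.matcher(value).matches()) {
                    return true;
                }
            }
            return false;
        }
    }

    // Applies when no restriction is set (assumption) or when at least one matches.
    static boolean isApplicable(Map<String, List<String>> metadata, List<Restriction> restrictions) {
        return restrictions.isEmpty()
                || restrictions.stream().anyMatch(r -> r.matches(metadata));
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = Map.of("Content-Type", List.of("text/HTML"));
        List<Restriction> restrictions = List.of(
                new Restriction("Content-Type", "application/pdf", false),
                new Restriction("Content-Type", "text/html", false));
        System.out.println(isApplicable(metadata, restrictions)); // true (second restriction)
    }
}
```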
XML example: the following stores the document length into a "docSize" field.
<tagger class="com.norconex.importer.handler.tagger.impl.DocumentLengthTagger" field="docSize" />
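To show how the attributes of such an element map to the tagger's two settings, here is a sketch that reads them with the JDK's DOM parser. It is only an illustration: the class itself loads its settings through `loadHandlerFromXML` with Apache Commons Configuration. `Settings` and `parse` are hypothetical, and treating a missing `overwrite` as `false` and a missing `field` as an error are assumptions drawn from the usage template.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical sketch: reading "field" and "overwrite" from a <tagger> element.
public class TaggerConfigDemo {

    // Holder for the two settings (hypothetical).
    static final class Settings {
        final String field;
        final boolean overwrite;

        Settings(String field, boolean overwrite) {
            this.field = field;
            this.overwrite = overwrite;
        }
    }

    static Settings parse(String xml) throws Exception {
        Element tagger = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        // getAttribute returns "" when the attribute is absent.
        String field = tagger.getAttribute("field");
        if (field.isBlank()) {
            throw new IllegalArgumentException("\"field\" is mandatory");
        }
        // Absent or anything other than "true" yields false (assumed default).
        boolean overwrite = Boolean.parseBoolean(tagger.getAttribute("overwrite"));
        return new Settings(field, overwrite);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<tagger class=\"com.norconex.importer.handler.tagger.impl.DocumentLengthTagger\""
                + " field=\"docSize\" />";
        Settings s = parse(xml);
        System.out.println(s.field + " overwrite=" + s.overwrite); // docSize overwrite=false
    }
}
```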
Constructor and Description |
---|
DocumentLengthTagger() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
String | getField() |
int | hashCode() |
boolean | isOverwrite() |
protected void | loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml): Loads configuration settings specific to the implementing class. |
protected void | saveHandlerToXML(EnhancedXMLStreamWriter writer): Saves configuration settings specific to the implementing class. |
void | setField(String field) |
void | setOverwrite(boolean overwrite) |
protected void | tagApplicableDocument(String reference, InputStream document, ImporterMetadata metadata, boolean parsed) |
String | toString() |
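Methods inherited from class AbstractDocumentTagger: tagDocument
Methods inherited from class AbstractImporterHandler: addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML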
protected void tagApplicableDocument(String reference, InputStream document, ImporterMetadata metadata, boolean parsed) throws ImporterHandlerException
Specified by: tagApplicableDocument in class AbstractDocumentTagger
Throws: ImporterHandlerException
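`tagApplicableDocument` receives the document content as an `InputStream`. One straightforward way to get a byte length from such a stream is to read it to the end and count the bytes, as in this sketch. Whether the library computes the length exactly this way is not stated here, and `countBytes` is a hypothetical helper. Note that reading consumes the stream, so a real handler would also need the content to remain readable by later handlers, which this sketch does not address.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: counting the bytes of a document stream (hypothetical helper).
public class StreamLengthDemo {

    // Reads the stream to the end and returns the total number of bytes read.
    static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream document = new ByteArrayInputStream(new byte[12_345]);
        System.out.println(countBytes(document)); // 12345
    }
}
```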
public String getField()
public void setField(String field)
public boolean isOverwrite()
public void setOverwrite(boolean overwrite)
protected void loadHandlerFromXML(org.apache.commons.configuration.XMLConfiguration xml) throws IOException
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
Specified by: loadHandlerFromXML in class AbstractImporterHandler
Parameters: xml - xml configuration
Throws: IOException - could not load from XML
protected void saveHandlerToXML(EnhancedXMLStreamWriter writer) throws XMLStreamException
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
Specified by: saveHandlerToXML in class AbstractImporterHandler
Parameters: writer - the xml writer
Throws: XMLStreamException - could not save to XML
public String toString()
Overrides: toString in class AbstractImporterHandler
public boolean equals(Object other)
Overrides: equals in class AbstractImporterHandler
public int hashCode()
Overrides: hashCode in class AbstractImporterHandler
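Saving the configuration amounts to writing the two settings back as attributes of the `tagger` element from the usage template. The sketch below does this with the standard StAX `XMLStreamWriter`. It stands in for Norconex's `EnhancedXMLStreamWriter`, whose own convenience methods are not assumed here, and `toXml` is a hypothetical helper.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch: serializing the tagger settings with plain StAX.
public class TaggerXmlWriterDemo {

    static String toXml(String field, boolean overwrite) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        // Empty <tagger/> element carrying the class name and both settings as attributes.
        writer.writeEmptyElement("tagger");
        writer.writeAttribute("class", "com.norconex.importer.handler.tagger.impl.DocumentLengthTagger");
        writer.writeAttribute("field", field);
        writer.writeAttribute("overwrite", Boolean.toString(overwrite));
        writer.writeEndDocument();
        writer.flush();
        writer.close(); // does not close the underlying StringWriter
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(toXml("docSize", false));
    }
}
```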
Copyright © 2009–2021 Norconex Inc. All rights reserved.