org.apache.lucene.analysis
abstract public class: CharTokenizer
java.lang.Object
   org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
         org.apache.lucene.analysis.Tokenizer
            org.apache.lucene.analysis.CharTokenizer

All Implemented Interfaces:
    Closeable

Direct Known Subclasses:
    LetterTokenizer, RussianLetterTokenizer, WhitespaceTokenizer, LowerCaseTokenizer, ArabicLetterTokenizer

An abstract base class for simple, character-oriented tokenizers.
Fields inherited from org.apache.lucene.analysis.Tokenizer:
input
Constructors:
 public CharTokenizer(Reader input)
 public CharTokenizer(AttributeSource source, Reader input)
 public CharTokenizer(AttributeFactory factory, Reader input)
Methods from org.apache.lucene.analysis.CharTokenizer (summary):
end,   incrementToken,   isTokenChar,   normalize,   reset
Methods from org.apache.lucene.analysis.Tokenizer:
close,   correctOffset,   reset
Methods from org.apache.lucene.analysis.TokenStream:
close,   end,   incrementToken,   reset
Methods from org.apache.lucene.util.AttributeSource:
addAttribute,   addAttributeImpl,   captureState,   clearAttributes,   cloneAttributes,   equals,   getAttribute,   getAttributeClassesIterator,   getAttributeFactory,   getAttributeImplsIterator,   hasAttribute,   hasAttributes,   hashCode,   restoreState,   toString
Methods from java.lang.Object:
clone,   equals,   finalize,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Methods from org.apache.lucene.analysis.CharTokenizer (detail):
 public final void end()
 public final boolean incrementToken() throws IOException 
 abstract protected boolean isTokenChar(char c)
    Returns true if and only if the character should be included in a token. The tokenizer emits each maximal run of adjacent characters that satisfy this predicate as one token. Characters for which it returns false mark token boundaries and are not included in any token.
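
    The contract above can be illustrated with a small standalone sketch. It does not use the Lucene classes: TokenCharSketch, TokenCharPredicate, and tokenize are hypothetical names introduced only for this example, and the loop is a simplified stand-in for what a character-oriented tokenizer does, not the actual CharTokenizer implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone illustration of the isTokenChar contract; does not depend on Lucene. */
final class TokenCharSketch {

    /** Hypothetical stand-in for the role of CharTokenizer.isTokenChar(char). */
    interface TokenCharPredicate {
        boolean isTokenChar(char c);
    }

    /** Collects each maximal run of characters accepted by the predicate. */
    static List<String> tokenize(String text, TokenCharPredicate predicate) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (predicate.isTokenChar(c)) {
                // The character belongs to the token being built.
                current.append(c);
            } else if (current.length() > 0) {
                // A boundary character ends the token and is itself dropped.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            // Flush the last token when the input ends inside a run.
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Letters form tokens; punctuation and spaces act as boundaries.
        System.out.println(tokenize("Hello, wide world!", c -> Character.isLetter(c)));
    }
}
```

    With a letters-only predicate, the comma, spaces, and exclamation mark act purely as separators, so the input yields the three tokens Hello, wide, and world.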
 protected char normalize(char c) 
    Called on each token character to normalize it before it is added to the token. The default implementation returns the character unchanged. Subclasses may override this to, for example, lowercase tokens.
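
    The interplay between isTokenChar and normalize can be sketched as follows. This is a standalone illustration that does not depend on Lucene: NormalizeSketch, its tokenize helper, and the nested Letters and LowerCaseLetters classes are hypothetical names chosen for the example, and the loop is a simplified model, not the library's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone illustration of the normalize hook; does not depend on Lucene. */
abstract class NormalizeSketch {

    /** Mirrors the role of CharTokenizer.isTokenChar(char). */
    protected abstract boolean isTokenChar(char c);

    /** Mirrors the default behavior: the character passes through unchanged. */
    protected char normalize(char c) {
        return c;
    }

    /** Hypothetical helper: splits text into tokens, normalizing each kept character. */
    final List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                // Only characters accepted by isTokenChar are normalized and kept.
                current.append(normalize(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Letters form tokens; normalize is not overridden, so case is preserved. */
    static final class Letters extends NormalizeSketch {
        @Override
        protected boolean isTokenChar(char c) {
            return Character.isLetter(c);
        }
    }

    /** Letters form tokens and each character is lowercased on the way in. */
    static final class LowerCaseLetters extends NormalizeSketch {
        @Override
        protected boolean isTokenChar(char c) {
            return Character.isLetter(c);
        }

        @Override
        protected char normalize(char c) {
            return Character.toLowerCase(c);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Letters().tokenize("Quick BROWN fox"));
        System.out.println(new LowerCaseLetters().tokenize("Quick BROWN fox"));
    }
}
```

    Keeping normalization as a separate overridable hook means a subclass can change how characters are stored without touching the boundary logic, which is the same split of responsibilities the two protected methods describe.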
 public void reset(Reader input) throws IOException