lucene-3.0.1-src » org.apache » lucene » analysis
org.apache.lucene.analysis
public class: LetterTokenizer
java.lang.Object
   org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
         org.apache.lucene.analysis.Tokenizer
            org.apache.lucene.analysis.CharTokenizer
               org.apache.lucene.analysis.LetterTokenizer

All Implemented Interfaces:
    Closeable

Direct Known Subclasses:
    LowerCaseTokenizer, ArabicLetterTokenizer

A LetterTokenizer is a tokenizer that divides text at non-letters. That is, it defines tokens as maximal strings of adjacent letters, where a letter is any character for which the java.lang.Character.isLetter() predicate returns true. Note: this works reasonably well for most European languages, but poorly for some Asian languages, where words are not separated by spaces.
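The splitting rule described above can be sketched without any Lucene dependency. The class below is a hypothetical helper (LetterRunSketch is not part of Lucene) that only mimics the "maximal runs of letters" rule on a String; it ignores the Reader-based input, buffering, offsets, and attribute handling that the real tokenizer inherits from its superclasses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it is NOT a Lucene class.
final class LetterRunSketch {

    // Returns the maximal runs of characters for which Character.isLetter is true.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);              // still inside a run of letters
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // a non-letter closes the run
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());     // flush a run that reaches end of input
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Apostrophes and digits are non-letters, so they split tokens.
        System.out.println(tokenize("Don't stop, R2D2!")); // [Don, t, stop, R, D]

        // Four CJK ideographs with no separators: every character is a letter,
        // so the whole phrase comes back as a single token.
        System.out.println(tokenize("\u6211\u7231\u5317\u4EAC").size()); // 1
    }
}
```

The second call illustrates the caveat about languages that do not separate words with spaces: since no non-letter ever occurs, the rule has nothing to split on.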
Fields inherited from org.apache.lucene.analysis.Tokenizer:
input
Constructor:
 public LetterTokenizer(Reader in) 
    Construct a new LetterTokenizer.
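 public LetterTokenizer(AttributeSource source, Reader in)
    Construct a new LetterTokenizer using a given AttributeSource.
 public LetterTokenizer(AttributeFactory factory, Reader in)
    Construct a new LetterTokenizer using a given AttributeFactory.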
Method from org.apache.lucene.analysis.LetterTokenizer Summary:
isTokenChar
Methods from org.apache.lucene.analysis.CharTokenizer:
end,   incrementToken,   isTokenChar,   normalize,   reset
Methods from org.apache.lucene.analysis.Tokenizer:
close,   correctOffset,   reset
Methods from org.apache.lucene.analysis.TokenStream:
close,   end,   incrementToken,   reset
Methods from org.apache.lucene.util.AttributeSource:
addAttribute,   addAttributeImpl,   captureState,   clearAttributes,   cloneAttributes,   equals,   getAttribute,   getAttributeClassesIterator,   getAttributeFactory,   getAttributeImplsIterator,   hasAttribute,   hasAttributes,   hashCode,   restoreState,   toString
Methods from java.lang.Object:
clone,   equals,   finalize,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Method from org.apache.lucene.analysis.LetterTokenizer Detail:
 protected boolean isTokenChar(char c)
    Collects only characters that satisfy java.lang.Character.isLetter(char).
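To show how an isTokenChar hook can combine with the normalize hook listed under CharTokenizer, here is a minimal sketch using hypothetical stand-in classes (CharRunSketch, LetterSketch, and LowerCaseLetterSketch are invented for illustration and are not the Lucene classes). It assumes that the base class drives the scan and calls the two hooks per character, and that a subclass such as LowerCaseTokenizer keeps the letter test and lower-cases each accepted character; neither detail is spelled out on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a character-driven tokenizer base class.
abstract class CharRunSketch {

    // Hook: decides whether c belongs inside a token.
    protected abstract boolean isTokenChar(char c);

    // Hook: applied to every accepted character; identity by default.
    protected char normalize(char c) {
        return c;
    }

    // Scans the text once, delegating both decisions to the hooks.
    final List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(normalize(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

// Accepts exactly the characters for which Character.isLetter is true.
class LetterSketch extends CharRunSketch {
    @Override
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }
}

// Assumed behavior of a lower-casing subclass: same letter test, lower-cased output.
class LowerCaseLetterSketch extends LetterSketch {
    @Override
    protected char normalize(char c) {
        return Character.toLowerCase(c);
    }
}
```

With these stand-ins, new LetterSketch().tokenize("Foo-Bar 42") yields [Foo, Bar], while new LowerCaseLetterSketch().tokenize("Foo-Bar 42") yields [foo, bar]; only the overridden hook differs between the two.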