Splits the input into tokens and passes
each token through a sequence of filters.
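The tokenize-then-filter flow described above can be sketched in plain Java. This is a minimal illustration, not the TokenFilter implementation: the class `TokenPipelineSketch`, the `run` method, the use of `UnaryOperator<String>` as a filter, and the convention that a filter returns null to drop a token are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the tokenize-then-filter flow; not the TokenFilter API.
class TokenPipelineSketch {

    // Splits the input into line tokens, then runs each token through the filters in order.
    // A filter that returns null drops the token (an assumption of this sketch).
    static List<String> run(String input, List<UnaryOperator<String>> filters) {
        List<String> out = new ArrayList<>();
        for (String token : input.split("\n")) {
            String current = token;
            for (UnaryOperator<String> filter : filters) {
                current = filter.apply(current);
                if (current == null) {
                    break; // token rejected, skip the remaining filters
                }
            }
            if (current != null) {
                out.add(current);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> filters = new ArrayList<>();
        filters.add(t -> t.trim().isEmpty() ? null : t); // drop blank tokens
        filters.add(t -> t.replace("foo", "bar"));       // plain string replacement
        System.out.println(run("foo one\n\nfoo two", filters)); // prints [bar one, bar two]
    }
}
```

The order of the filters matters here: the blank check runs before the replacement, so a token is only rewritten once it has survived the earlier filters.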
add
public void add(TokenFilter.Filter filter)
Adds an arbitrary filter.
filter
- the filter to add
add
public void add(Tokenizer tokenizer)
Adds an arbitrary tokenizer.
tokenizer
- the tokenizer to add; only one is allowed
addContainsRegex
public void addContainsRegex(TokenFilter.ContainsRegex filter)
Adds a contains-regex filter.
filter
- the contains regex filter
addContainsString
public void addContainsString(TokenFilter.ContainsString filter)
Adds a contains-string filter.
filter
- the contains string filter
addFileTokenizer
public void addFileTokenizer(TokenFilter.FileTokenizer tokenizer)
Adds a file tokenizer.
tokenizer
- the file tokenizer
addIgnoreBlank
public void addIgnoreBlank(TokenFilter.IgnoreBlank filter)
Adds an ignore-blank filter.
filter
- the ignore blank filter
addLineTokenizer
public void addLineTokenizer(LineTokenizer tokenizer)
Adds a line tokenizer. This is the default tokenizer.
tokenizer
- the line tokenizer
addReplaceRegex
public void addReplaceRegex(TokenFilter.ReplaceRegex filter)
Adds a replace-regex filter.
filter
- the replace regex filter
addReplaceString
public void addReplaceString(TokenFilter.ReplaceString filter)
Adds a replace-string filter.
filter
- the replace string filter
addStringTokenizer
public void addStringTokenizer(TokenFilter.StringTokenizer tokenizer)
Adds a string tokenizer.
tokenizer
- the string tokenizer
chain
public final Reader chain(Reader reader)
Creates a new TokenFilter using the passed-in
Reader for instantiation.
Specified by: chain in interface ChainableReader
reader
- A Reader object providing the underlying stream.
Returns: a new filter based on this configuration
convertRegexOptions
public static int convertRegexOptions(String flags)
Converts regex option flag characters to regex options:
g - Regexp.REPLACE_ALL
i - Regexp.MATCH_CASE_INSENSITIVE
m - Regexp.MATCH_MULTILINE
s - Regexp.MATCH_SINGLELINE
flags
- the string containing the flags
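The flag mapping above amounts to OR-ing one option bit per flag character. The following is a minimal sketch of that idea, not the library's code: the class `RegexFlagSketch`, the method `convert`, and the numeric constant values are placeholders chosen for the example, and treating null as "no options" and skipping unknown characters are assumptions.

```java
// Hypothetical sketch; the constant values are placeholders, not taken from Ant's Regexp.
class RegexFlagSketch {
    static final int MATCH_CASE_INSENSITIVE = 0x1;
    static final int MATCH_MULTILINE = 0x2;
    static final int MATCH_SINGLELINE = 0x4;
    static final int REPLACE_ALL = 0x8;

    // ORs together one option bit per recognised flag character.
    // Null input yields 0 and unknown characters are skipped (assumptions of this sketch).
    static int convert(String flags) {
        if (flags == null) {
            return 0;
        }
        int options = 0;
        for (char c : flags.toCharArray()) {
            switch (c) {
                case 'g': options |= REPLACE_ALL; break;
                case 'i': options |= MATCH_CASE_INSENSITIVE; break;
                case 'm': options |= MATCH_MULTILINE; break;
                case 's': options |= MATCH_SINGLELINE; break;
                default: break; // not a documented flag character
            }
        }
        return options;
    }

    public static void main(String[] args) {
        int options = convert("gi");
        System.out.println(options == (REPLACE_ALL | MATCH_CASE_INSENSITIVE)); // prints true
    }
}
```

Because the result is a bit set, the order of the flag characters and any repeats do not change the outcome: "gi", "ig" and "gig" all produce the same value in this sketch.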
read
public int read()
throws IOException
Returns the next character in the filtered stream, that is, the
input after it has been split into tokens and passed through the
configured filters.
Returns: the next character in the resulting stream, or -1
if the end of the resulting stream has been reached
resolveBackSlash
public static String resolveBackSlash(String input)
XML does not do C-like interpretation of backslash
escapes in strings (\n, \r, \t, etc.).
This method processes \n, \r, \t, \f and \\,
and also substitutes \s with " \n\r\t\f".
A trailing '\' is ignored.
input
- raw string with possible embedded '\'s
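The escape handling described for resolveBackSlash can be sketched as a single pass with a "backslash seen" flag. This is an illustration under stated assumptions rather than the actual implementation: the class `BackslashSketch` and the method `resolve` are hypothetical names, and passing an unrecognised escaped character through unchanged is an assumption the description does not cover.

```java
// Hypothetical sketch of the documented escape rules; not the library's code.
class BackslashSketch {

    static String resolve(String input) {
        StringBuilder out = new StringBuilder();
        boolean escaped = false; // true when the previous character was an unconsumed '\'
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!escaped) {
                if (c == '\\') {
                    escaped = true;
                } else {
                    out.append(c);
                }
                continue;
            }
            switch (c) {
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'f': out.append('\f'); break;
                case 's': out.append(" \n\r\t\f"); break; // \s expands to the whitespace set
                default: out.append(c); // covers \\ and, by assumption, unknown escapes
            }
            escaped = false;
        }
        // A backslash still pending at the end of the input is dropped.
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal below is the raw text: a\tb\\c\
        System.out.println(resolve("a\\tb\\\\c\\").equals("a\tb\\c")); // prints true
    }
}
```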
setDelimOutput
public void setDelimOutput(String delimOutput)
Sets the output delimiter.
delimOutput
- replaces the delim string returned by the
tokenizer, if present.
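The effect of an output delimiter can be sketched as follows. This is a hedged illustration, not the actual behaviour of the class: `DelimOutputSketch` and `join` are hypothetical names, the sketch reads "if present" as "if delimOutput has been set", and it assumes the replacement is written after every token, including one whose original delimiter was empty.

```java
// Hypothetical sketch of delimiter replacement on output; not the library's code.
class DelimOutputSketch {

    // tokens[i] was followed in the input by delims[i] (empty if there was none).
    // When delimOutput is non-null it is written instead of the original delimiter.
    static String join(String[] tokens, String[] delims, String delimOutput) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            out.append(tokens[i]);
            out.append(delimOutput != null ? delimOutput : delims[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"a", "b"};
        String[] delims = {"\n", ""};
        System.out.println(join(tokens, delims, ","));                 // prints a,b,
        System.out.println(join(tokens, delims, null).equals("a\nb")); // prints true
    }
}
```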