public class StringNormalizer extends ArrayList<StringNormalizer.NormalizerRule>
The normalize(String) method applies the entire set of transformations
(the rules held in this list) to a given string.
For example, you can build a string normalizer that replaces all
sequences of one or more whitespace characters with a single space
character, trims any leading or trailing space, and converts a
string to lower case. This class provides a number of predefined
transformations in the StringNormalizer.StandardRule enumeration.
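
As a rough illustration of the three transformations just described, here is a plain-Java sketch that does not use StringNormalizer at all. The class and method names (WhitespaceSketch, normalize) are hypothetical, and the library's own rules may differ in detail, for instance in what they treat as whitespace.

```java
import java.util.Locale;

// Hypothetical stand-in for illustration; not part of the StringNormalizer API.
class WhitespaceSketch {
    // Collapse whitespace runs to one space, trim the ends, then lower-case.
    static String normalize(String s) {
        String collapsed = s.replaceAll("\\s+", " ");
        String trimmed = collapsed.trim();
        return trimmed.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Brackets make any leftover leading or trailing space visible.
        System.out.println("[" + normalize("  Hello \t\n  World  ") + "]"); // prints "[hello world]"
    }
}
```
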
Some examples:
    // An "identity" transformation that does nothing:
    StringNormalizer norm1 = new StringNormalizer();
    // norm1.normalize(...) returns its argument unchanged

    // A "lower case" normalizer:
    StringNormalizer norm2 = new StringNormalizer(
        StringNormalizer.StandardRule.IGNORE_CAPITALIZATION);
    // norm2.normalize(...) returns a lower case version of its argument

    // A normalizer that ignores both capitalization and punctuation:
    StringNormalizer norm3 = new StringNormalizer(
        StringNormalizer.StandardRule.IGNORE_CAPITALIZATION,
        StringNormalizer.StandardRule.IGNORE_PUNCTUATION);

    // A "standard" normalizer:
    StringNormalizer norm4 = new StringNormalizer(true);
    // norm4.normalize(...) returns its argument with all punctuation
    // characters removed, all letters converted to lower case, all
    // whitespace sequences replaced by single spaces, all MS-DOS or
    // Mac line terminators replaced by "\n", and all leading and
    // trailing whitespace removed.
Note that a string normalizer containing multiple rules applies those
rules in order (i.e., in the order added, which is the List order of
this class). Because each rule operates on the output of the previous
one, the same rules added in a different order may produce different
results, so take care when you add your rules.
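
To make the ordering caveat concrete, the following sketch applies the same two rules in both orders. It is a simplified stand-in, not the library's implementation: OrderedRulesSketch, STRIP_PUNCTUATION and COLLAPSE_WHITESPACE are hypothetical names, and the regular expressions are assumptions about what such rules might do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for an ordered rule list; not the library's code.
class OrderedRulesSketch extends ArrayList<UnaryOperator<String>> {
    static final UnaryOperator<String> STRIP_PUNCTUATION =
        s -> s.replaceAll("\\p{Punct}", "");
    static final UnaryOperator<String> COLLAPSE_WHITESPACE =
        s -> s.replaceAll("\\s+", " ");

    @SafeVarargs
    OrderedRulesSketch(UnaryOperator<String>... rules) {
        super(Arrays.asList(rules));
    }

    // Apply every rule in list order, feeding each result to the next rule.
    String normalize(String content) {
        String result = content;
        for (UnaryOperator<String> rule : this) {
            result = rule.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        OrderedRulesSketch punctFirst =
            new OrderedRulesSketch(STRIP_PUNCTUATION, COLLAPSE_WHITESPACE);
        OrderedRulesSketch spaceFirst =
            new OrderedRulesSketch(COLLAPSE_WHITESPACE, STRIP_PUNCTUATION);
        System.out.println("[" + punctFirst.normalize("a , b") + "]"); // prints "[a b]"
        System.out.println("[" + spaceFirst.normalize("a , b") + "]"); // prints "[a  b]" (two spaces)
    }
}
```

In the second ordering, removing the comma after whitespace has already been collapsed leaves two adjacent spaces behind, which is the kind of surprise the note warns about.
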
Modifier and Type | Class and Description |
---|---|
static interface | StringNormalizer.NormalizerRule - This interface defines what it means to be a normalizer rule: an object having an appropriate StringNormalizer.NormalizerRule.normalize(String) method. |
static class | StringNormalizer.RegexNormalizerRule - A highly reusable concrete implementation of StringNormalizer.NormalizerRule that applies a series of regular expression substitutions. |
static class | StringNormalizer.StandardRule - This enumeration defines the set of predefined transformation rules. |
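
The idea of a rule that "applies a series of regular expression substitutions" can be sketched as follows. This page does not show the constructors or methods of StringNormalizer.RegexNormalizerRule, so everything here (the class RegexRuleSketch, its substitute method, and the two example patterns) is a hypothetical stand-in rather than the real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in; the real RegexNormalizerRule API is not shown in this page.
class RegexRuleSketch {
    // One compiled pattern paired with its replacement text.
    private static final class Substitution {
        final Pattern pattern;
        final String replacement;

        Substitution(String regex, String replacement) {
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }
    }

    private final List<Substitution> substitutions = new ArrayList<>();

    // Register a substitution; substitutions run in the order they are added.
    RegexRuleSketch substitute(String regex, String replacement) {
        substitutions.add(new Substitution(regex, replacement));
        return this;
    }

    // Apply each substitution in turn to the running result.
    String normalize(String content) {
        String result = content;
        for (Substitution s : substitutions) {
            result = s.pattern.matcher(result).replaceAll(s.replacement);
        }
        return result;
    }

    public static void main(String[] args) {
        RegexRuleSketch rule = new RegexRuleSketch()
            .substitute("\\r\\n?", "\n")   // MS-DOS or old Mac line ends become "\n"
            .substitute("[ \\t]+", " ");   // runs of spaces and tabs become one space
        System.out.println(rule.normalize("a\r\nb \t c\rd"));
    }
}
```
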
Fields inherited from class java.util.AbstractList: modCount
Constructor and Description |
---|
StringNormalizer() - Creates a new StringNormalizer object containing no rules (the "identity" normalizer). |
StringNormalizer(boolean useStandardRules) - Creates a new StringNormalizer object, optionally containing the standard set of rules. |
StringNormalizer(Collection<? extends StringNormalizer.NormalizerRule> rules) - Creates a new StringNormalizer object containing the given set of rules. |
StringNormalizer(StringNormalizer.NormalizerRule... rules) - Creates a new StringNormalizer object containing the given set of rules. |
StringNormalizer(StringNormalizer.StandardRule... rules) - Creates a new StringNormalizer object containing the given set of rules. |
Modifier and Type | Method and Description |
---|---|
boolean | add(StringNormalizer.NormalizerRule rule) - Add the specified rule. |
void | add(StringNormalizer.StandardRule rule) - Add the specified standard rule, as defined in StringNormalizer.StandardRule. |
void | addStandardRules() - Add the standard set of rules. |
String | normalize(String content) - Normalize a string by applying a set of normalization rules (transformations). |
void | remove(StringNormalizer.StandardRule rule) - Remove the specified standard rule, as defined in StringNormalizer.StandardRule. |
static StringNormalizer.NormalizerRule | standardRule(StringNormalizer.StandardRule rule) - Retrieve a standard rule by name. |
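Methods inherited from class java.util.ArrayList: add, addAll, addAll, clear, clone, contains, ensureCapacity, forEach, get, indexOf, isEmpty, iterator, lastIndexOf, listIterator, listIterator, remove, remove, removeAll, removeIf, removeRange, replaceAll, retainAll, set, size, sort, spliterator, subList, toArray, toArray, trimToSize
Methods inherited from class java.util.AbstractList: equals, hashCode
Methods inherited from class java.util.AbstractCollection: containsAll, toString
Methods inherited from class java.lang.Object: finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.util.List: containsAll, equals, hashCode
Methods inherited from interface java.util.Collection: parallelStream, stream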
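public StringNormalizer()
Creates a new StringNormalizer object containing no rules (the "identity" normalizer).

public StringNormalizer(boolean useStandardRules)
Creates a new StringNormalizer object, optionally containing the standard set of rules: all rules defined in StringNormalizer.StandardRule except the OPT_* rules.
useStandardRules - If true, the set of standard (non-OPT_*) rules will be used. If false, an "identity" normalizer will be produced instead.

public StringNormalizer(StringNormalizer.StandardRule... rules)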
Creates a new StringNormalizer object containing the given set of rules.
rules - a (variable-length) comma-separated sequence of rules to add

public StringNormalizer(StringNormalizer.NormalizerRule... rules)
Creates a new StringNormalizer object containing the given set of rules.
rules - a (variable-length) comma-separated sequence of rules to add

public StringNormalizer(Collection<? extends StringNormalizer.NormalizerRule> rules)
Creates a new StringNormalizer object containing the given set of rules.
rules - a collection of rules to add (could be another StringNormalizer, or any other kind of collection)

public String normalize(String content)
Normalize a string by applying a set of normalization rules (transformations).
content - The string to transform

public void addStandardRules()
Add the standard set of rules: all rules defined in StringNormalizer.StandardRule except the OPT_* rules.

public void add(StringNormalizer.StandardRule rule)
Add the specified standard rule, as defined in StringNormalizer.StandardRule. Note that you can also use the inherited List.add(Object) method to add custom NormalizerRule objects.
rule - The rule to add

public boolean add(StringNormalizer.NormalizerRule rule)
Add the specified rule.
Specified by: add in interface Collection<StringNormalizer.NormalizerRule>
Specified by: add in interface List<StringNormalizer.NormalizerRule>
Overrides: add in class ArrayList<StringNormalizer.NormalizerRule>
rule - The rule to add

public void remove(StringNormalizer.StandardRule rule)
Remove the specified standard rule, as defined in StringNormalizer.StandardRule. Note that you can also use the inherited List.remove(Object) method to remove other kinds of NormalizerRule objects.
rule - The rule to remove

public static StringNormalizer.NormalizerRule standardRule(StringNormalizer.StandardRule rule)
Retrieve a standard rule by name.
rule - the rule to retrieve
StringNormalizer.NormalizerRule
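
The notes above on using the inherited List.add(Object) and List.remove(Object) methods for custom rules can be illustrated with plain java.util types. This is only a sketch under assumptions: Rule is a hypothetical stand-in for StringNormalizer.NormalizerRule (assumed here to have a single normalize(String) method, so that lambdas can implement it), and applyAll is a hypothetical helper, not a library method.

```java
import java.util.ArrayList;
import java.util.List;

class CustomRuleSketch {
    // Hypothetical stand-in for StringNormalizer.NormalizerRule.
    interface Rule {
        String normalize(String content);
    }

    // Hypothetical helper: apply the rules in list order.
    static String applyAll(List<Rule> rules, String content) {
        String result = content;
        for (Rule rule : rules) {
            result = rule.normalize(result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        Rule stripDigits = s -> s.replaceAll("\\d", "");
        rules.add(stripDigits);   // custom rule added through List.add
        rules.add(String::trim);  // a second custom rule

        System.out.println("[" + applyAll(rules, " abc123 ") + "]"); // prints "[abc]"

        // List.remove(Object) relies on equals(); a lambda only equals itself,
        // so keep a reference to any custom rule you may want to remove later.
        rules.remove(stripDigits);
        System.out.println("[" + applyAll(rules, " abc123 ") + "]"); // prints "[abc123]"
    }
}
```
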