CS 1705 Library

net.sf.webcat
Class StringNormalizer

java.lang.Object
  extended by java.util.AbstractCollection<E>
      extended by java.util.AbstractList<E>
          extended by java.util.ArrayList<StringNormalizer.NormalizerRule>
              extended by net.sf.webcat.StringNormalizer
All Implemented Interfaces:
Serializable, Cloneable, Iterable<StringNormalizer.NormalizerRule>, Collection<StringNormalizer.NormalizerRule>, List<StringNormalizer.NormalizerRule>, RandomAccess

public class StringNormalizer
extends ArrayList<StringNormalizer.NormalizerRule>

This class represents a programmable string "normalizing" engine that converts strings into a canonical form, for example before comparing them for equality. A normalizer is a list of zero or more rules (transformations). The normalize(String) method applies the entire list of transformations to a given string.

For example, you can build a string normalizer that replaces all sequences of one or more whitespace characters by a single space character, trims any leading or trailing space, and converts a string to lower case. This class provides a number of predefined transformations in the StringNormalizer.StandardRule enumeration. Some examples:

  // An "identity" transformation that does nothing:
  StringNormalizer norm1 = new StringNormalizer();
  // norm1.normalize(...) returns its argument unchanged

  // A "lower case" normalizer:
  StringNormalizer norm2 = new StringNormalizer(
      StringNormalizer.StandardRule.IGNORE_CAPITALIZATION);
  // norm2.normalize(...) returns a lower case version of its argument

  // A normalizer that ignores both capitalization and punctuation:
  StringNormalizer norm3 = new StringNormalizer(
      StringNormalizer.StandardRule.IGNORE_CAPITALIZATION,
      StringNormalizer.StandardRule.IGNORE_PUNCTUATION);

  // A "standard" normalizer:
  StringNormalizer norm4 = new StringNormalizer(true);
  // norm4.normalize(...) returns its argument with all punctuation
  // characters removed, all letters converted to lower case, all
  // whitespace sequences replaced by single spaces, all MS-DOS or
  // Mac line terminators replaced by "\n"'s, and all leading and
  // trailing whitespace removed.
  

Note that a string normalizer containing multiple rules applies those rules in order (i.e., in the order added, which is the List order of this class). The same rules added in a different order can produce different results, so choose the order deliberately when you add your rules.
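
The effect of rule order can be illustrated with a minimal sketch. It does not use net.sf.webcat.StringNormalizer itself: the class RuleOrderSketch, the rules TRIM and STRIP_PUNCTUATION, and the helper applyInOrder are hypothetical stand-ins built on java.util.function.UnaryOperator, and the sketch assumes each rule receives the output of the rule before it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

public class RuleOrderSketch {
    // Hypothetical stand-in rules; these are NOT the library's StandardRule values.
    static final UnaryOperator<String> TRIM = s -> s.trim();
    static final UnaryOperator<String> STRIP_PUNCTUATION =
        s -> s.replaceAll("\\p{Punct}", "");

    // Applies each rule in list order, feeding each result to the next rule
    // (an assumption about how an ordered rule list behaves).
    static String applyInOrder(List<UnaryOperator<String>> rules, String content) {
        String result = content;
        for (UnaryOperator<String> rule : rules) {
            result = rule.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "hello !";
        // Trimming first leaves the space that preceded the removed "!".
        String trimFirst = applyInOrder(Arrays.asList(TRIM, STRIP_PUNCTUATION), input);
        // Trimming last removes that space as well.
        String trimLast = applyInOrder(Arrays.asList(STRIP_PUNCTUATION, TRIM), input);
        System.out.println("[" + trimFirst + "]"); // prints "[hello ]"
        System.out.println("[" + trimLast + "]");  // prints "[hello]"
    }
}
```

In this sketch the two orderings disagree only by a trailing space, which is exactly the kind of difference that makes a later equality comparison fail unexpectedly.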

Version:
$Id: StringNormalizer.java,v 1.2 2007/09/15 02:04:16 stedwar2 Exp $
Author:
Stephen Edwards
See Also:
Serialized Form

Nested Class Summary
static interface StringNormalizer.NormalizerRule
          This interface defines what it means to be a normalizer rule: an object having an appropriate StringNormalizer.NormalizerRule.normalize(String) method.
static class StringNormalizer.RegexNormalizerRule
          A highly reusable concrete implementation of StringNormalizer.NormalizerRule that applies a series of regular expression substitutions.
static class StringNormalizer.StandardRule
          This enumeration defines the set of predefined transformation rules.
 
Constructor Summary
StringNormalizer()
          Creates a new StringNormalizer object containing no rules (the "identity" normalizer).
StringNormalizer(boolean useStandardRules)
          Creates a new StringNormalizer object, optionally containing the standard set of rules.
StringNormalizer(Collection<? extends StringNormalizer.NormalizerRule> rules)
          Creates a new StringNormalizer object containing the given set of rules.
StringNormalizer(StringNormalizer.NormalizerRule... rules)
          Creates a new StringNormalizer object containing the given set of rules.
StringNormalizer(StringNormalizer.StandardRule... rules)
          Creates a new StringNormalizer object containing the given set of rules.
 
Method Summary
 boolean add(StringNormalizer.NormalizerRule rule)
          Add the specified rule.
 void add(StringNormalizer.StandardRule rule)
          Add the specified standard rule, as defined in StringNormalizer.StandardRule.
 void addStandardRules()
          Add the standard set of rules.
 String normalize(String content)
          Normalize a string by applying a set of normalization rules (transformations).
 void remove(StringNormalizer.StandardRule rule)
          Remove the specified standard rule, as defined in StringNormalizer.StandardRule.
static StringNormalizer.NormalizerRule standardRule(StringNormalizer.StandardRule rule)
          Retrieve a standard rule by name.
 
Methods inherited from class java.util.ArrayList
add, addAll, addAll, clear, clone, contains, ensureCapacity, get, indexOf, isEmpty, lastIndexOf, remove, remove, set, size, toArray, toArray, trimToSize
 
Methods inherited from class java.util.AbstractList
equals, hashCode, iterator, listIterator, listIterator, subList
 
Methods inherited from class java.util.AbstractCollection
containsAll, removeAll, retainAll, toString
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface java.util.List
containsAll, equals, hashCode, iterator, listIterator, listIterator, removeAll, retainAll, subList
 

Constructor Detail

StringNormalizer

public StringNormalizer()
Creates a new StringNormalizer object containing no rules (the "identity" normalizer).


StringNormalizer

public StringNormalizer(boolean useStandardRules)
Creates a new StringNormalizer object, optionally containing the standard set of rules. The standard set is all those in StringNormalizer.StandardRule except the OPT_* rules.

Parameters:
useStandardRules - If true, the set of standard (non-OPT_*) rules will be used. If false, an "identity" normalizer will be produced instead.

StringNormalizer

public StringNormalizer(StringNormalizer.StandardRule... rules)
Creates a new StringNormalizer object containing the given set of rules.

Parameters:
rules - a (variable-length) comma-separated sequence of rules to add

StringNormalizer

public StringNormalizer(StringNormalizer.NormalizerRule... rules)
Creates a new StringNormalizer object containing the given set of rules.

Parameters:
rules - a (variable-length) comma-separated sequence of rules to add

StringNormalizer

public StringNormalizer(Collection<? extends StringNormalizer.NormalizerRule> rules)
Creates a new StringNormalizer object containing the given set of rules.

Parameters:
rules - a collection of rules to add (could be another StringNormalizer, or any other kind of collection)

Method Detail

normalize

public String normalize(String content)
Normalize a string by applying a set of normalization rules (transformations).

Parameters:
content - The string to transform
Returns:
The result after all rules have been applied

addStandardRules

public void addStandardRules()
Add the standard set of rules. The standard set is all those in StringNormalizer.StandardRule except the OPT_* rules.


add

public void add(StringNormalizer.StandardRule rule)
Add the specified standard rule, as defined in StringNormalizer.StandardRule. Note that you can also use the inherited List.add(Object) method to add custom NormalizerRule objects.

Parameters:
rule - The rule to add

add

public boolean add(StringNormalizer.NormalizerRule rule)
Add the specified rule. For efficiency, the rule is added only if it is not already present in this normalizer.

Specified by:
add in interface Collection<StringNormalizer.NormalizerRule>
Specified by:
add in interface List<StringNormalizer.NormalizerRule>
Overrides:
add in class ArrayList<StringNormalizer.NormalizerRule>
Parameters:
rule - The rule to add
Returns:
True if the rule was added, or false if it is already present
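
The add-only-if-absent contract described above can be sketched on a plain ArrayList subclass. This is not the library's source: DedupListSketch is a hypothetical class, and it assumes that "already present" is decided by contains(), which in turn relies on equals().

```java
import java.util.ArrayList;

// Hypothetical illustration of an ArrayList whose add() skips duplicates.
public class DedupListSketch<E> extends ArrayList<E> {
    private static final long serialVersionUID = 1L;

    // Returns true if the element was appended, false if an equal element
    // was already in the list (presence is checked with contains()).
    @Override
    public boolean add(E element) {
        if (contains(element)) {
            return false;
        }
        return super.add(element);
    }

    public static void main(String[] args) {
        DedupListSketch<String> rules = new DedupListSketch<>();
        System.out.println(rules.add("lowercase")); // prints "true"
        System.out.println(rules.add("lowercase")); // prints "false"
        System.out.println(rules.size());           // prints "1"
    }
}
```

Only add(E) is overridden in this sketch, so add(int, E) and addAll would still accept duplicates; whether StringNormalizer guards those inherited methods is not stated on this page.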

remove

public void remove(StringNormalizer.StandardRule rule)
Remove the specified standard rule, as defined in StringNormalizer.StandardRule. Note that you can also use the inherited List.remove(Object) method to remove other kinds of NormalizerRule objects.

Parameters:
rule - The rule to remove

standardRule

public static StringNormalizer.NormalizerRule standardRule(StringNormalizer.StandardRule rule)
Retrieve a standard rule by name.

Parameters:
rule - the rule to retrieve
Returns:
The corresponding StringNormalizer.NormalizerRule

Last updated: Wed, Apr 1, 2009 • 12:29 AM EDT

Copyright © 2009 Virginia Tech.