CS 2104 Problem Solving in Computer Science                    OOC Assignment 9
-------------------------------------------------------------------------------

1.	[10 points] The set of all 10-digit telephone numbers, like 540.231.5605, 
    formatted in precisely that manner; match all and only telephone numbers 
    that are preceded and followed by one or more spaces.  (You do not have 
    to take into account any restrictions on what are actually valid area 
    codes or prefixes.)
    
    This is pretty straightforward, especially if you make good use of the
    relevant repetition syntax:
    
      ' ([0-9]{3}\.){2}[0-9]{4} '
      
    I didn't mention '\s', but you might have come across it in the reading.
    Technically, the following is not valid, since it would also match tabs:
    
      '\s([0-9]{3}\.){2}[0-9]{4}\s'
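
    As a rough check of the difference between the two patterns, here is a
    minimal sketch using Python's re module as a stand-in for grep.  The
    sample strings and the helper name found() are made up for illustration;
    neither pattern uses grep-specific syntax, so both are used as written.

```python
import re

# Python's re module stands in for grep here (an assumption of this sketch).
SPACES_ONLY = re.compile(r' ([0-9]{3}\.){2}[0-9]{4} ')
ANY_WHITESPACE = re.compile(r'\s([0-9]{3}\.){2}[0-9]{4}\s')


def found(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


# Hypothetical sample lines, made up for illustration.
samples = {
    "spaces": "call 540.231.5605 now",
    "tabs": "call\t540.231.5605\tnow",
    "dashes": "call 540-231-5605 now",
    "extra digit": "call 5540.231.5605 now",
}

for label, text in samples.items():
    # Columns: label, space-delimited pattern, \s-delimited pattern.
    print(label, found(SPACES_ONLY, text), found(ANY_WHITESPACE, text))
```

    Only the tab-delimited sample separates the two patterns: the '\s'
    version accepts it, while the version with literal spaces does not.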
	
    
2.	[10 points] The set of all strings that consist of three or four lower-case 
    letters, where the first character cannot be a vowel, and both ends of the 
    string must be adjacent to space characters.  (Be careful about the 
    requirement that the string only contains letters.)
    
    The only real challenge here is the restriction that the first character
    cannot be a vowel.  Here is one solution:
    
      ' \<[b-df-hj-np-tv-z][a-z]{2,3}\> '
    
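    And here is a seductive but incorrect answer:
    
      ' \<[^aeiou][a-z]{2,3}\> '
    
    The problem with this one is that [^aeiou] matches any character that's
    not a lower-case vowel, not just lower-case consonants.  After \<, that
    still lets through an upper-case letter, a digit or an underscore.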
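
    The difference between the two character classes can be sketched in
    Python.  This is only an approximation of the grep patterns: Python's re
    has no \< or \> anchors, so \b is swapped in for both, and the sample
    strings and the helper found() are invented for the example.

```python
import re

# Assumption: \b replaces grep's \< and \> because Python's re has no
# separate start-of-word and end-of-word anchors.
CONSONANT_FIRST = re.compile(r' \b[b-df-hj-np-tv-z][a-z]{2,3}\b ')
NOT_A_VOWEL = re.compile(r' \b[^aeiou][a-z]{2,3}\b ')


def found(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


# Hypothetical test strings; each keeps the surrounding spaces.
for text in [" dog ", " ant ", " Dog ", " 7up ", " doggy "]:
    # Columns: string, explicit-consonant class, negated-vowel class.
    print(repr(text), found(CONSONANT_FIRST, text), found(NOT_A_VOWEL, text))
```

    Both patterns accept " dog " and reject " ant " and " doggy ", but only
    the negated class accepts " Dog " and " 7up ".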
    
	
3.	[20 points] The set of all lines in a file that begin and end with the word 
    "the".  (There are exactly 64 such lines in the Moby Dick file from the 
    Gutenberg Project.)
    
    This was also straightforward.  You must specify that both occurrences of 
    "the" are matched as words, and force the first to be at the beginning of
    the line and the last to be at the end of the line.  What comes in the
    middle of the line is of no importance, but you must still allow for it
    explicitly (here, with .*):
    
      ^\<the\>.*\<the\>$
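
    A small Python sketch of the same line filter, assuming \b as a stand-in
    for \< and \> (which Python's re does not support).  The sample lines
    and the helper select() are invented here; they are not taken from the
    Moby Dick file.

```python
import re

# Assumption: \b approximates grep's \< and \> word anchors.
THE_TO_THE = re.compile(r'^\bthe\b.*\bthe\b$')


def select(pattern, text):
    """Return the lines of text that the pattern matches, grep-style."""
    return [line for line in text.splitlines() if pattern.search(line)]


# Hypothetical sample lines, made up for illustration.
sample = "\n".join([
    "the harpooneer looked at the",
    "then he turned toward the",               # starts with "then", not "the"
    "the ship sailed on without the captain",  # does not end with "the"
    "the whale seemed to breathe",             # "breathe" only ends in "the"
    "the",                                     # one "the" cannot fill both ends
    "the sea met the sky, and so did the",
])

for line in select(THE_TO_THE, sample):
    print(line)
```

    Only the first and last sample lines are selected.  Note that a line
    consisting of the single word "the" is not, since the pattern requires
    two separate occurrences.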
	
    
4.	[20 points] The set of all lines (in a file) that contain the place name 
    "New England" or "Spain".  (There are exactly 10 such lines in the Moby 
    Dick file from the Gutenberg Project.)
    
    Again, both names must be matched as words, and we must find all lines
    that contain one or the other (or both), so we need the OR operator:
    
      '\<New England\>|\<Spain\>'
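
    The alternation can be tried out with a short Python sketch.  As before,
    \b is an assumed substitute for \< and \>, and the sample lines and the
    helper select() are hypothetical.

```python
import re

# Assumption: \b approximates grep's \< and \> word anchors.
PLACE = re.compile(r'\bNew England\b|\bSpain\b')


def select(pattern, text):
    """Return the lines of text that the pattern matches, grep-style."""
    return [line for line in text.splitlines() if pattern.search(line)]


# Hypothetical sample lines, made up for illustration.
sample = "\n".join([
    "a whaler out of New England",
    "the coast of Spain, they say",
    "every New Englander knows it",    # "Englander" fails the word match
    "somewhere in new england",        # matching is case-sensitive
    "between Spain and New England",   # both names: the line is listed once
])

for line in select(PLACE, sample):
    print(line)
```

    A line containing both names is still reported only once, which is why
    the expected count is a count of lines rather than of matches.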
	
    
5.	[20 points] The set of all lines (in a file) that include a word that 
    contains two (or more) consecutive occurrences of the letter 'a' or two 
    (or more) consecutive occurrences of the letter 'u'.  (There are exactly 6 
    such lines in the Moby Dick file from the Gutenberg Project.)
    
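    Aside from needing word matches, we must allow arbitrary word characters
    before and after the "aa" or "uu".  Using \w* rather than .* keeps the
    whole match inside a single word; .* would select the same lines, but
    only because it can run across word boundaries:
    
      '\<\w*aa\w*\>|\<\w*uu\w*\>'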
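
    A Python sketch of the same idea, for experimenting with which word
    triggers the match.  Two assumptions are made here: \b stands in for
    \< and \>, and \w* is used for the characters around the doubled letter
    so that the match stays inside a single word.  The sample lines and the
    helper doubled_word() are invented.

```python
import re

# Assumptions: \b approximates \< and \>; \w* keeps the match within a word.
DOUBLED = re.compile(r'\b\w*(aa|uu)\w*\b')


def doubled_word(line):
    """Return the first word containing "aa" or "uu", or None if there is none."""
    match = DOUBLED.search(line)
    return match.group(0) if match else None


# Hypothetical sample lines, made up for illustration.
for line in [
    "the bazaar was crowded",
    "a vacuum of sound",
    "a banana and an umbrella",   # a's and u's, but never two in a row
    "Baal was worshipped",
]:
    print(repr(line), "->", doubled_word(line))
```

    The third line is rejected even though it has many a's: the two a's in
    "banana and" are separated by a space, so they are not consecutive.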
      
	
6.	[20 points] The set of all lines (in a file) that contain the word "the" at 
    least five times.  (There are exactly 4 such lines in the Moby Dick file 
    from the Gutenberg Project.)
    
    The key elements are that the content before, between and after the 
    occurrences of "the" is arbitrary; that "the" must be matched as a word;
    and that there must be 5 or more matches of "the":
    
      (.*\<the\>){5}
	
    Note that it's OK to search for only 5 matches, since that will include
    all lines with more than 5 matches.
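
    The claim that {5} also covers lines with more than 5 matches can be
    cross-checked against a direct count.  The sketch below is in Python,
    with \b assumed as a substitute for \< and \>; the sample lines and the
    helper names are hypothetical.

```python
import re

# Assumption: \b approximates grep's \< and \> word anchors.
FIVE_THE = re.compile(r'(.*\bthe\b){5}')
THE_WORD = re.compile(r'\bthe\b')


def has_five(line):
    """True if the repetition pattern matches somewhere in the line."""
    return FIVE_THE.search(line) is not None


def count_the(line):
    """Count occurrences of "the" as a whole word."""
    return len(THE_WORD.findall(line))


# Hypothetical sample lines, made up for illustration.
four = "the mate, the cook, the boy and the captain"
five = "the mate, the cook, the boy, the captain and the whale"
six = five + " saw the sea"
decoy = "then there they bathe; the the the the"  # only 4 whole-word matches

for line in [four, five, six, decoy]:
    print(count_the(line), has_five(line))
```

    On these samples the pattern matches exactly when the count is 5 or
    more, including the line with 6 occurrences.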