CS 2204: Homework #2. Common Mistakes

Write egrep commands that operate on an input file and do the following:

  1. (2 points) Find all lines containing US postal abbreviations for states (you can assume that a postal abbreviation is two uppercase letters). Leaving off the spaces before and after [A-Z]{2} lets the pattern match any two adjacent uppercase letters, so lines containing SSN and GBCB are matched as well.
  2. (1 point) Find all lines listing US city references. A US city reference is like 'Portland, OR', i.e., a word, followed by a comma, a space, and then a two-letter state abbreviation. Using [A-Z][a-z]* instead of [A-Z][a-z]+ allows the line with "TX, VA, NY" to be matched, which is a list of states and not a City, State reference.
  3. (2 points) Find all lines listing university courses. A university course is a word, followed by a space, followed by exactly four digits. There needs to be a restriction on what comes after [0-9]{4}; otherwise the pattern also matches five or more digits in a row.
  4. (1 point) Find all lines containing formatted dollar amounts. These begin with a $, followed by a whole dollar amount (i.e., a succession of one or more digits), then a ".", and finally, exactly two digits denoting the cents. There are two ways to correctly match the $ character: '\$' or "[\$]". If \$ is used inside double quotes, the shell removes the backslash, so egrep receives a bare $ and treats it as a metacharacter (the end-of-line anchor).
  5. (2 points) Find all Russian last names in sample.txt. There needs to be a restriction on what comes before and after "in" and "off"; otherwise, lines containing the ordinary words "in" and "off" will be matched.
  6. (1 point) Find all blank lines, i.e., lines containing nothing. Technically, a line containing nothing has no characters on it, and a space is a character too. However, most solutions that checked for spaces allowed any number of them, including zero, so they still matched the truly empty lines required by the correct solution.
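
The mistakes described in problems 1, 2, 3, 4, and 6 can be reproduced with a short script. This is only a sketch under stated assumptions, not the official solution: the sample lines, the temporary file, and the function names (states, city_refs, courses, dollars, blank_count) are all hypothetical, and the boundary groups used in states and courses are just one possible way to restrict what surrounds the match. It uses grep -E, which accepts the same extended regular expressions as egrep.

```shell
#!/bin/sh
# Hypothetical sample lines; the actual homework input file is not reproduced here.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' \
  'Portland, OR' \
  'TX, VA, NY' \
  'SSN on file' \
  'CS 2204' \
  'ID 22045' \
  'Total: $12.50' \
  'Price: $5' \
  '   ' \
  '' > "$sample"

# Problem 1 (assumed boundary style): two uppercase letters that are not
# part of a longer uppercase run, so "SSN" does not count.
states() { grep -E '(^|[^A-Z])[A-Z]{2}([^A-Z]|$)' "$1"; }

# Problem 2: [a-z]+ requires at least one lowercase letter in the city word,
# so "TX, VA, NY" is rejected.
city_refs() { grep -E '[A-Z][a-z]+, [A-Z]{2}' "$1"; }

# Problem 3: the trailing group forbids a fifth digit after [0-9]{4}.
courses() { grep -E '[A-Za-z]+ [0-9]{4}([^0-9]|$)' "$1"; }

# Problem 4: single quotes keep the backslash, so grep sees \$ (a literal $).
dollars() { grep -E '\$[0-9]+\.[0-9]{2}([^0-9]|$)' "$1"; }

# Problem 6: count lines with no characters at all.
blank_count() { grep -cE '^$' "$1"; }

city_refs "$sample"      # prints: Portland, OR
courses "$sample"        # prints: CS 2204
dollars "$sample"        # prints: Total: $12.50
blank_count "$sample"    # prints: 1

# Mistakes from the list above, for comparison:
grep -E '[A-Z][a-z]*, [A-Z]{2}' "$sample"    # also prints: TX, VA, NY
grep -E '[A-Za-z]+ [0-9]{4}' "$sample"       # also prints: ID 22045
grep -cE "\$[0-9]+\.[0-9]{2}" "$sample"      # prints 0: grep receives a bare $ anchor
grep -cE '^ *$' "$sample"                    # prints 2: the spaces-only line counts too
```

Problem 5 is left out because the exact pattern depends on the contents of sample.txt, which is not shown here.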