Home  |   Notes  |   Homework  |   Labs  |   Programs
Program 5: Web Log Analyzer

Due midnight the evening of 11/16

Goal

In your fifth programming assignment, you will be writing a variation of a program example discussed in the textbook: the web log analyzer from Chapter 4. The application in B&K performs an hour-based analysis of web site accesses stored in a simplified log format. In this case, you will be using real data from the CS department's web server in its native format. Additionally, you will be accumulating access statistics by both day of the week and hour of the day.

 
Learning Objectives
  •  Exposure to interfaces
  •  Exposure to multi-dimensional arrays
  •  Exposure to using library classes
  •  Familiarity with control constructs
  •  Familiarity with writing test cases
  •  Familiarity with loops
  •  Mastery of the Web-CAT Grader

    Reading Log Files

    Just as in the B&K web log analyzer application, you are being provided with two support classes that will help in processing log files. Make sure to update your cs1705.jar file so that you have these two classes.

    The two support classes, LogReader and LogEntry, are in the package cs1705.weblog.

    Refer to the javadoc documentation for these classes in the cs1705 package API on-line for more information about the methods they provide. Do not forget to import cs1705.weblog.* in your client class.

    Requirements for Your Solution

    Rather than extending a base class, your solution must implement an interface. It must implement the cs1705.weblog.LogAnalyzer interface:

    public interface LogAnalyzer
    {
        public void accumulateLogData( BufferedReader inStream );
        public void accumulateLogDataFromFile( String file );
        public void accumulateLogDataFromURL( String url );
    
        public int accessCountsForDayHour( int day, int hour );
        public int accessCountsForDay( int day );
        public int accessCountsForHour( int hour );
    }
    

    When you create your solution, after the class name you state that it "implements LogAnalyzer" instead of saying that it extends some base class. When a class implements an interface, it must provide all of the operations described in the interface with exactly the same signatures (i.e., the same return types and the same number and types of parameters). Note that in this case, none of the methods in the LogAnalyzer interface throws any exceptions. In particular, this means that none of the accumulate...() methods provided by your class can throw I/O exceptions.
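    As a rough illustration of the "no exceptions" rule, the sketch below catches I/O exceptions inside each method instead of declaring them. It does not use the real cs1705.weblog classes: LineAccumulator is a hypothetical stand-in for part of the LogAnalyzer interface, and LineCounter is a hypothetical class that merely counts lines rather than parsing log entries.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical stand-in for part of the LogAnalyzer interface, declared here
// only so that this sketch compiles without cs1705.jar.
interface LineAccumulator
{
    void accumulateLogData( BufferedReader inStream );
    void accumulateLogDataFromFile( String file );
}

// Hypothetical class: it counts lines instead of analyzing log entries.
class LineCounter implements LineAccumulator
{
    private int lines = 0;

    public void accumulateLogData( BufferedReader inStream )
    {
        try
        {
            while ( inStream.readLine() != null )
            {
                lines++;
            }
        }
        catch ( IOException e )
        {
            // The interface method declares no exceptions, so the
            // IOException has to be handled here rather than rethrown.
            System.err.println( "Could not read log data: " + e.getMessage() );
        }
    }

    public void accumulateLogDataFromFile( String file )
    {
        // try-with-resources (Java 7 or later) closes the reader for us.
        try ( BufferedReader in = new BufferedReader( new FileReader( file ) ) )
        {
            accumulateLogData( in );
        }
        catch ( IOException e )
        {
            System.err.println( "Could not open " + file + ": " + e.getMessage() );
        }
    }

    public int lineCount()
    {
        return lines;
    }
}
```

    Reporting the problem on System.err is just one possible policy. What matters for matching the interface is that no checked exception escapes from either method.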

    The LogAnalyzer methods are fairly simple in behavior. Your log analyzer should maintain an internal count of the number of web accesses for each hour of the day in each day of the week. The three accumulate...() methods simply use a LogReader object to process all of the log entries in the given source and add each log entry in turn to your running totals. The three accessCountsFor...() methods return information about the number of web accesses recorded for a specified time period.

    In addition, you must name your log analyzer class VTLogAnalyzer and give it a constructor that takes no parameters.

    Implementing a Log Analyzer

    Because you must be able to return access counts for a specific hour on a specific day of the week, it is natural to think of maintaining the data internally in a matrix, with one row for each day of the week and one column for each hour of the day. You can then simply add each log entry to the appropriate cell in this matrix. One easy way to implement such a matrix is to use a two-dimensional array.
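    The matrix idea might be sketched roughly as follows. AccessMatrix and its method names are hypothetical and are not part of the required interface. The sketch also assumes zero-based row indices (0 through 6) for days, which is a design choice and not something the assignment dictates.

```java
// Hypothetical helper class; the names here are illustrative only.
class AccessMatrix
{
    private static final int DAYS  = 7;
    private static final int HOURS = 24;

    // counts[day][hour]: one row per day (0-6), one column per hour (0-23).
    // Java initializes every int cell to zero.
    private final int[][] counts = new int[DAYS][HOURS];

    // Adds one access to the cell for the given day row and hour column.
    public void record( int dayIndex, int hour )
    {
        counts[dayIndex][hour]++;
    }

    public int countFor( int dayIndex, int hour )
    {
        return counts[dayIndex][hour];
    }

    // Sums across one row: all hours of a single day.
    public int totalForDay( int dayIndex )
    {
        int total = 0;
        for ( int hour = 0; hour < HOURS; hour++ )
        {
            total += counts[dayIndex][hour];
        }
        return total;
    }

    // Sums down one column: the same hour on every day.
    public int totalForHour( int hour )
    {
        int total = 0;
        for ( int day = 0; day < DAYS; day++ )
        {
            total += counts[day][hour];
        }
        return total;
    }
}
```

    With this layout, a per-day total is a sum over one row and a per-hour total is a sum over one column, so no separate running totals are needed.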

    In addition, you will need to extract day and hour information from each LogEntry that you process. Unlike the LogEntry class from the textbook example, the cs1705.weblog.LogEntry class has one method to retrieve the access time called accessTime(). This method returns a java.util.Calendar object representing the localized time associated with the given log entry.

    For the purposes of this assignment, the most important feature of the Calendar class is that it provides an accessor called get() that allows you to inspect most of the information about a calendar date. The get() method takes a single int parameter that is used to indicate what kind of information you want to retrieve. Fortunately, the Calendar class provides a large set of predefined constants you can use. For example:

    Calendar myDate = ...;     // get a calendar date somehow
    
    int day  = myDate.get( Calendar.DAY_OF_WEEK );
    int hour = myDate.get( Calendar.HOUR_OF_DAY );
    

    Be careful about choosing the parameters you use for get() (e.g., Calendar.HOUR is in 12-hour format instead of 24, and Calendar.DAY_OF_MONTH or Calendar.DAY_OF_YEAR return day numbers on different scales). Also, note that the Calendar class defines constants Calendar.SUNDAY (1) through Calendar.SATURDAY (7) representing the possible day numbers--they start at one instead of zero, which you must account for in your design. The accessCountsFor...() methods that take day values must accept the appropriate Calendar constant values for days of the week. You can review the Calendar javadoc on-line if you want to learn more about the other features of this class.
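    To make the one-based day numbers and the 12-hour pitfall concrete, here is a small sketch using a fixed date. CalendarFields and dayIndex() are hypothetical names, and subtracting Calendar.SUNDAY is only one possible way to account for the offset.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical helper class; the names here are illustrative only.
class CalendarFields
{
    // Converts Calendar.SUNDAY (1) .. Calendar.SATURDAY (7) to an index 0..6.
    static int dayIndex( int calendarDay )
    {
        if ( calendarDay < Calendar.SUNDAY || calendarDay > Calendar.SATURDAY )
        {
            throw new IllegalArgumentException(
                "Not a day of the week: " + calendarDay );
        }
        return calendarDay - Calendar.SUNDAY;
    }

    public static void main( String[] args )
    {
        // Wednesday, November 12, 2003, at 3:46:14 pm local time.
        Calendar when =
            new GregorianCalendar( 2003, Calendar.NOVEMBER, 12, 15, 46, 14 );

        System.out.println( when.get( Calendar.DAY_OF_WEEK ) );  // 4 (Calendar.WEDNESDAY)
        System.out.println( when.get( Calendar.HOUR_OF_DAY ) );  // 15
        System.out.println( when.get( Calendar.HOUR ) );         // 3 (12-hour clock)
        System.out.println( dayIndex( when.get( Calendar.DAY_OF_WEEK ) ) );  // 3
    }
}
```

    Keeping the conversion in one small method means public methods can still accept the Calendar constants while any internal storage uses zero-based positions.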

    Testing Your Analyzer

    The following URL contains a sample log file you can use for analysis:

    http://courses.cs.vt.edu/~cs1705/Fall03/programs/sample.log
    

    After you begin processing statistics for this sample, feel free to cross-check your work with your peers on the discussion board--look for the thread called "sample.log stats for testing". If no one has posted any information, feel free to post what you've come up with so far. Then check back to make sure you have values consistent with your classmates'. You can use these values in writing specific test cases for your log analyzer.

    If you have mastered the sample access log and are looking for a bigger challenge, you might also consider trying this log:

    http://courses.cs.vt.edu/~cs1705/Fall03/programs/log.zip
    

    This zip file contains the actual log for one week of accesses to http://www.cs.vt.edu/ during October of this year. Be careful--the zip file is about 2.6MB, and the uncompressed log file it contains is about 62MB in size (!). As a result, you will need to uncompress it yourself before running your analyzer on it. Feel free to start a new thread on the course discussion forum if you generate statistics from this log.

    Submit Your Solution

    Program submissions work just like lab submissions. On BlueJ's main menu, click Tools->Submit.... Click on "Browse...", double-click to open the "CS 1705 Programs" folder, and select Program 5. Click "OK". Click "Submit". Click on the link provided in the submission response in order to view the results of the automated phase of program grading.

    If no "Program 5" entry is visible on BlueJ's submission menu, then the Web-CAT Grader is not yet accepting submissions for this assignment. Wait for a message posted to the course web site that submissions are being accepted, and try again.

    If any errors, warnings or suggestions are indicated, you can fix them and resubmit. You are expected to fix all such issues in your code. You may resubmit as many times as you like, up until the deadline. Be careful as the due time approaches--if you submit just over the deadline, a late penalty will be assessed.


    copyright © 2003 Virginia Tech, ALL RIGHTS RESERVED
    Last modified: November 12, 2003, 8:46:14 am EST, by Stephen Edwards <edwards@cs.vt.edu>