| Home | Notes | Homework | Labs | Programs |
| Program 5: Web Log Analyzer |
| Goal |
In your fifth programming assignment, you will write a variation of a program example discussed in the textbook: the web log analyzer from Chapter 4. The application in B&K performs an hour-based analysis of web site accesses stored in a simplified log format. In this case, you will use real data from the CS department's web server in its native format. Additionally, you will accumulate access statistics by both day of the week and hour of the day.
| Reading Log Files |
Just as in the B&K web log analyzer application, you are being provided with two support classes that will help in processing log files. Make sure to update your cs1705.jar file so that you have these two classes.
The two support classes are in the package cs1705.weblog:
LogReader plays the same role as the LogfileReader class in the B&K example. The main difference is that it supports reading log entries from a BufferedReader stream.
LogEntry plays the same role as the LogEntry class in the B&K example. The main differences are that it provides access to time information using the java.util.Calendar class, and that it also provides other information about each log entry (the URL being requested, the result code sent back to the requester, the browser being used, and so on).
Refer to the javadoc documentation for these classes in the
cs1705 package API on-line for more information about the methods
they provide. Do not forget to import cs1705.weblog.*
in your client class.
| Requirements for Your Solution |
Rather than extending a base class, your solution must implement
an interface. It must implement the cs1705.weblog.LogAnalyzer
interface:
    public interface LogAnalyzer
    {
        public void accumulateLogData( BufferedReader inStream );
        public void accumulateLogDataFromFile( String file );
        public void accumulateLogDataFromURL( String url );
        public int accessCountsForDayHour( int day, int hour );
        public int accessCountsForDay( int day );
        public int accessCountsForHour( int hour );
    }
When you create your solution, after the class name you state that
it "implements LogAnalyzer" instead of saying that it
extends some base class. When a class implements an interface, it must
provide all of the operations described in the interface with exactly
the same signatures (i.e., the same return types and the same number and
types of parameters). Note that in this case, none of the methods
in the LogAnalyzer interface throws any exceptions. In
particular, this means that none of the accumulate...()
methods provided by your class can throw I/O exceptions.
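One way to meet the no-exceptions constraint is to catch any IOException inside each method rather than declaring it. The sketch below is only an illustration under stated assumptions: the class StreamOpeningSketch and its method names are hypothetical stand-ins (not part of cs1705.weblog), it counts lines instead of using LogReader (whose methods are described in the javadoc), and the choice to stop quietly on an error is an assumption rather than a stated requirement.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;

// Hypothetical stand-in class; these names are not part of cs1705.weblog.
public class StreamOpeningSketch
{
    private int lines = 0;

    // Stand-in for accumulateLogData(): reads the whole stream without
    // declaring any checked exception.
    public void accumulate( BufferedReader inStream )
    {
        try
        {
            while ( inStream.readLine() != null )
            {
                lines++;
            }
        }
        catch ( IOException e )
        {
            // Assumption: stop quietly and keep whatever was counted so far.
        }
    }

    // Opens the named file, then delegates to the stream version.
    public void accumulateFromFile( String file )
    {
        try ( BufferedReader in = new BufferedReader( new FileReader( file ) ) )
        {
            accumulate( in );
        }
        catch ( IOException e )
        {
            // A missing or unreadable file leaves the totals unchanged.
        }
    }

    // Opens the URL's content, then delegates to the stream version.
    public void accumulateFromURL( String url )
    {
        try ( BufferedReader in = new BufferedReader(
                  new InputStreamReader( URI.create( url ).toURL().openStream() ) ) )
        {
            accumulate( in );
        }
        catch ( IOException | IllegalArgumentException e )
        {
            // A malformed or unreachable URL leaves the totals unchanged.
        }
    }

    public int lineCount()
    {
        return lines;
    }
}
```

Having the file and URL versions delegate to the stream version keeps the counting logic in one place, so only the code that opens the source differs between the three methods.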
The LogAnalyzer methods are fairly simple in behavior.
Your log analyzer should maintain an internal count of the number of
web accesses for each hour of the day in each day of the week. The
three accumulate...() methods simply use a LogReader
object to process all of the log entries in the given source and
add each log entry in turn to your running totals. The three
accessCountsFor...() methods return information about the
number of web accesses recorded for a specified time period.
accessCountsForDayHour() retrieves the number of accesses occurring at the specified hour (24-hour format) on the specified day of the week.
accessCountsForDay() retrieves the number of accesses occurring at any hour on the specified day of the week.
accessCountsForHour() retrieves the number of accesses occurring at the specified hour on any day of the week.
In addition, you must name your log analyzer class
VTLogAnalyzer and give it a constructor that takes no
parameters.
| Implementing a Log Analyzer |
Because you must be able to return access counts for a specific hour on a specific day of the week, it is natural to think of maintaining the data internally in a matrix, with one row for each day of the week and one column for each hour of the day. You can then simply add each log entry to the appropriate cell in this matrix. One easy way to implement such a matrix is to use a two-dimensional array.
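The matrix idea can be sketched as follows. This is a minimal illustration, not the required VTLogAnalyzer: the class DayHourMatrix and its method names are hypothetical, and it uses zero-based day indices (0 through 6) for simplicity.

```java
// Hypothetical helper illustrating the matrix idea; not the required class.
public class DayHourMatrix
{
    private static final int DAYS = 7;
    private static final int HOURS = 24;

    // counts[d][h]: accesses on day index d (0-6) during hour h (0-23).
    // Java initializes every cell of a new int array to zero.
    private final int[][] counts = new int[DAYS][HOURS];

    // Adds one access to the cell for the given day index and hour.
    public void record( int dayIndex, int hour )
    {
        counts[dayIndex][hour]++;
    }

    // Returns the count stored in a single cell.
    public int cell( int dayIndex, int hour )
    {
        return counts[dayIndex][hour];
    }

    // Sums one row: all hours of a single day.
    public int rowTotal( int dayIndex )
    {
        int total = 0;
        for ( int hour = 0; hour < HOURS; hour++ )
        {
            total += counts[dayIndex][hour];
        }
        return total;
    }

    // Sums one column: a single hour across all seven days.
    public int columnTotal( int hour )
    {
        int total = 0;
        for ( int day = 0; day < DAYS; day++ )
        {
            total += counts[day][hour];
        }
        return total;
    }
}
```

With this layout, a per-day total is a row sum and a per-hour total is a column sum, so no separate counters are needed for the day-only and hour-only queries.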
In addition, you will need to extract day and hour information from
each LogEntry that you process. Unlike the
LogEntry class from the textbook example, the
cs1705.weblog.LogEntry class has one method to
retrieve the access time called accessTime(). This
method returns a java.util.Calendar object representing
the localized time associated with the given log entry.
For the purposes of this assignment, the most important feature of the
Calendar class is that it provides an accessor called
get() that allows you to inspect most of the information
about a calendar date. The get() method takes a single
int parameter that is used to indicate what kind of information you
want to retrieve. Fortunately, the Calendar class provides
a large set of predefined constants you can use. For example:
    Calendar myDate = ...; // get a calendar date somehow
    int day = myDate.get( Calendar.DAY_OF_WEEK );
    int hour = myDate.get( Calendar.HOUR_OF_DAY );
Be careful about choosing the parameters you use for get()
(e.g., Calendar.HOUR is in 12-hour format instead of 24,
and Calendar.DAY_OF_MONTH or Calendar.DAY_OF_YEAR
return day numbers on different scales). Also, note that the
Calendar class defines constants
Calendar.SUNDAY (1) through Calendar.SATURDAY
(7) representing the possible day numbers--they start at one instead of
zero, which you must account for in your design.
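To see these differences concretely, here is a small sketch that builds a fixed date with java.util.GregorianCalendar and inspects several fields. The class name CalendarFieldsDemo and the chosen date are arbitrary and only for illustration; in the assignment, the Calendar would come from a log entry instead.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical demo class; only java.util.Calendar comes from the assignment.
public class CalendarFieldsDemo
{
    // Returns Wednesday, October 15, 2003 at 15:30:00 (default time zone).
    public static Calendar sampleTime()
    {
        return new GregorianCalendar( 2003, Calendar.OCTOBER, 15, 15, 30, 0 );
    }

    public static void main( String[] args )
    {
        Calendar when = sampleTime();
        System.out.println( when.get( Calendar.DAY_OF_WEEK ) );  // 4 (Calendar.WEDNESDAY)
        System.out.println( when.get( Calendar.HOUR_OF_DAY ) );  // 15
        System.out.println( when.get( Calendar.HOUR ) );         // 3
        System.out.println( when.get( Calendar.DAY_OF_MONTH ) ); // 15
    }
}
```

Note that Calendar.HOUR reports 3 for a 3:30 PM access, which would be indistinguishable from 3:30 AM in an hourly tally.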
The accessCountsFor...() methods that take day values must
accept the appropriate Calendar constant values for
days of the week.
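One possible way to account for the one-based day constants is to convert them to zero-based array indices in a single helper. The sketch below is an assumption-laden illustration: the class DayIndexSketch and the method toDayIndex() are hypothetical, and throwing IllegalArgumentException for an out-of-range day is a choice made here, since the assignment does not say how invalid values should be handled.

```java
import java.util.Calendar;

// Hypothetical helper; the names and the error handling are assumptions.
public class DayIndexSketch
{
    // Maps Calendar.SUNDAY (1) .. Calendar.SATURDAY (7) onto 0 .. 6.
    public static int toDayIndex( int day )
    {
        if ( day < Calendar.SUNDAY || day > Calendar.SATURDAY )
        {
            throw new IllegalArgumentException( "not a day of the week: " + day );
        }
        return day - Calendar.SUNDAY;
    }

    public static void main( String[] args )
    {
        System.out.println( toDayIndex( Calendar.SUNDAY ) );   // 0
        System.out.println( toDayIndex( Calendar.SATURDAY ) ); // 6
    }
}
```

An alternative design is to allocate eight rows and leave row zero unused, which trades a little memory for using the Calendar constants directly as indices.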
You can review the Calendar javadoc on-line if you want to learn more about the other features of this class.
| Testing Your Analyzer |
The following URL contains a sample log file you can use for analysis:
http://courses.cs.vt.edu/~cs1705/Fall03/programs/sample.log
After you begin processing statistics for this sample, feel free to cross-check your work with your peers on the discussion board--look for the thread called "sample.log stats for testing". If no one has posted any information, feel free to post what you've come up with so far. Then check back to make sure you have values consistent with your classmates'. You can use these values in writing specific test cases for your log analyzer.
If you have mastered the sample access log and are looking for a bigger challenge, you might also consider trying this log:
http://courses.cs.vt.edu/~cs1705/Fall03/programs/log.zip
This zip file contains the actual log for one week of accesses to http://www.cs.vt.edu/ during October of this year. Be careful--the zip file is about 2.6MB, and the uncompressed log file it contains is about 62MB in size (!). As a result, you will need to uncompress it yourself before running your analyzer on it. Feel free to start a new thread on the course discussion forum if you generate statistics from this log.
| Submit Your Solution |
Program submissions work just like lab submissions.
On BlueJ's main menu, click Tools->Submit.... Click on
"Browse...", double-click to open the
"CS 1705 Programs" folder, and select
Program 5. Click "OK".
Click "Submit". Click on the link provided
in the submission response in order to view the results of the
automated phase of program grading.
If no "Program 5" entry is visible on BlueJ's submission menu, then the Web-CAT Grader is not yet accepting submissions for this assignment. Wait for a message posted to the course web site that submissions are being accepted, and try again.
If any errors, warnings or suggestions are indicated, you can fix them and resubmit. You are expected to fix all such issues in your code. You may resubmit as many times as you like, up until the deadline. Be careful as the due time approaches--if you submit just over the deadline, a late penalty will be assessed.