CS 4604 Project Assignment 2

Assigned on September 16, 2008. Hardcopy due at the start of class on Tuesday, September 23, 2008.

This assignment will consist of designing your schema in PostgreSQL and populating the database. You can utilise your PostgreSQL account on postdb.cs.vt.edu for all aspects of this assignment. If you do not have an account for yourself and for your group, please contact us.

(30 points) Design a set of relations to model your database. For each relation, specify its schema: the name of the relation, its attributes, and for each attribute, its type. Describe using a sentence or two what each relation will store.
Note: Your design should have anywhere from 4-5 relations (or more). If you have fewer than that, then your application is not complex enough for a CS 4604 project. You should talk to me about how to extend its scope. However, do not cook up relations just to meet this goal. We will carefully vet each relation and penalise relations we think are unnecessary. We will not penalise you if you have more than five relations. However, be warned that later steps of the project may become quite complex.
(30 points) Write an SQL database schema for your application, using the CREATE TABLE commands described in class. Pick suitable datatypes for each attribute. Consult the online PostgreSQL manual to determine the principal options that are available for types (your class notes may or may not cover the ones that you require and your class notes may or may not contain the right declaration). Do not forget to specify at least one key for each relation (using the PRIMARY KEY construct). Specify other choices for keys via the UNIQUE construct. Justify your choice of key(s) for each relation. If some attribute should not be NULL, use NOT NULL to specify this aspect. Also provide DEFAULT values wherever applicable. An example of a complete schema definition for a relation is:
```
 CREATE TABLE
Students (
    pid CHAR(6) PRIMARY KEY,
    ssn CHAR(9) UNIQUE,
    name CHAR(30),
    birthdate DATE,
    gender CHAR(1) NOT NULL,
    address VARCHAR(255)
);
```
Do not start loading these definitions into the database just yet!
(10 points) For each of the relations that you defined, indicate any foreign key (referential integrity) constraints that are expected to hold for that relation. Include the declaration of foreign key constraints for any appropriate relations. Also mention any domain specific constraints that you might require of your attributes, using the CHECK construct.
(30 points) Develop a substantial amount of data for your application and load it into your relations using the bulk loading facility provided in PostgreSQL: you can use either the \copy command in the psql command-line client described in class (check if its deprecated? if yes, use the newer version) or the COPY command.

We are looking at data in the order of 25-50 tuples for each relation in your application. To create the data, either get it from some web source or information repository (making sure that it is public first, of course) or write a program in any scripting/programming language (like Perl, TCL or C) that creates large files of records in a format acceptable to the bulk loader. Either way, realize that you may need to transform data from one form to another to be acceptable for use in PostgreSQL.
Sometimes, some of your relations might require only 4--5 tuples. For example, if you are modeling the real-estate business, there might be only a few realtors in a small town like Blacksburg, in which case it is okay to have just a few tuples in that particular relation. In such cases, where you have only a few tuples, state explicitly why you have only a few tuples.
Precautions to Take if you "Cook up" your Data: In parts (1) and (2), you have have specified key(s) (and possibly foreign keys) for each relation. Make sure that while cooking up data, your fabricated data indeed confirms to such "key restrictions". If you declare that "StudentID" is the key, then do not generate data for different students having the same ID! Notice that relations will definitely share attributes (if there are no common attributes at all in your schema, then your project is in grave danger, see the instructor immediately!), so make sure that they "agree" on the common attributes. For example, if there are two relations that share an attribute called "name", then make sure that the names do indeed tally. Do not list the name "Mark" as "MARK" in one and "maRK" in another and "Mark" in yet another because then they would be three different names. The bottom line is to be consistent. When two things are meant to be the same, make sure your data reflects this fact. If you do not ensure so at this stage, you will spend more time debugging your application than otherwise!

Now execute some sample INSERT, DELETE and UPDATE commands on every one of your relations. Convince yourself that things are working fine. Report any interesting observations that you make.

What to turn in: A paper copy that details the following:

A list of your relation schemas.
A list of your SQL CREATE TABLE commands (with PRIMARY KEY, NOT NULL, DEFAULT, FOREIGN KEY, and CHECK constraints).
A list of the number of tuples for each relation in your application.
Samples of INSERT, DELETE, and UPDATE commands that you tried out and that worked.
The source of your data. Examples of possible answers are "We got it from this web address", "We got the data from this reference book on apartment homes", "We imported it from this document", or "We cooked it up". Whatever your answer, write a short paragraph on how you transformed the data from its native form to the form required by PostgreSQL. If you wrote a special purpose program or script (in Perl/TCL/C or some other language), include that script. Explain how you took special care to make sure that key restrictions and common attribute constraints are not violated. Explain why some relations have only a few tuples (if applicable).
Samples of tuples for each relation (about 6--10 each), just to give us an idea of how they look.

What NOT to turn in:

Pages and pages of raw data or printouts of all the data in your database. There is no way we can comb this stuff manually! You will have ample opportunity to present this stuff later on in the semester.

Precautions and Recommendations:

Please SAVE a copy of your SQL statements (and data) in your home directory or some other safe place. While this precaution may seem obvious, a surprising number of people miss it.