CS 4604 Project Assignment 2
Assigned on September 16, 2008. Hardcopy due at the start of class on Tuesday,
September 23, 2008.
This assignment will consist of designing your schema in PostgreSQL and populating
the database. You can utilise your PostgreSQL account on postdb.cs.vt.edu for
all aspects of this assignment. If you do not have an account for yourself and
for your group, please contact us.
- (30 points) Design a set of relations to model your database. For each relation,
specify its schema: the name of the relation, its attributes, and for each
attribute, its type. Describe using a sentence or two what each relation will
store.
Note: Your design should have anywhere from 4-5 relations
(or more). If you have fewer than that, then your application is not complex
enough for a CS 4604 project. You should talk to me about how to extend its
scope. However, do not cook up relations just to meet this goal. We will carefully
vet each relation and penalise relations we think are unnecessary. We will
not penalise you if you have more than five relations. However, be warned
that later steps of the project may become quite complex.
- (30 points) Write an SQL database schema for your application, using the
CREATE TABLE commands described in class. Pick suitable datatypes
for each attribute. Consult the online PostgreSQL
manual to determine the principal options that are available for types
(your class notes may or may not cover the ones that you require and your
class notes may or may not contain the right declaration). Do not forget to
specify at least one key for each relation (using the PRIMARY KEY
construct). Specify other choices for keys via the UNIQUE construct.
Justify your choice of key(s) for each relation. If some attribute should
not be NULL, use NOT NULL to specify this aspect. Also
provide DEFAULT values wherever applicable. An example of a complete
schema definition for a relation is:
CREATE TABLE
Students (
pid CHAR(6) PRIMARY KEY,
ssn CHAR(9) UNIQUE,
name CHAR(30),
birthdate DATE,
gender CHAR(1) NOT NULL,
address VARCHAR(255)
);
Do not start loading these definitions into the database just yet!
- (10 points) For each of the relations that you defined, indicate any foreign
key (referential integrity) constraints that are expected to hold for that
relation. Include the declaration of foreign key constraints for any appropriate
relations. Also mention any domain specific constraints that you might require
of your attributes, using the CHECK construct.
- (30 points) Develop a substantial amount of data for your application and
load it into your relations using the bulk loading facility provided in PostgreSQL:
you can use either the \copy command
in the psql command-line client described in class (check if its deprecated?
if yes, use the newer version) or the COPY command.
We are looking at data in the order of 25-50 tuples for each relation in your
application. To create the data, either get it from some web source or information
repository (making sure that it is public first, of course) or write a program
in any scripting/programming language (like Perl, TCL or C) that creates large
files of records in a format acceptable to the bulk loader. Either way, realize
that you may need to transform data from one form to another to be acceptable
for use in PostgreSQL. Sometimes, some of your relations might require
only 4--5 tuples. For example, if you are modeling the real-estate business,
there might be only a few realtors in a small town like Blacksburg, in which
case it is okay to have just a few tuples in that particular relation. In
such cases, where you have only a few tuples, state explicitly why you
have only a few tuples. Precautions to Take if you "Cook up" your
Data: In parts (1) and (2), you have have specified key(s) (and possibly
foreign keys) for each relation. Make sure that while cooking up data, your
fabricated data indeed confirms to such "key restrictions". If you declare
that "StudentID" is the key, then do not generate data for different students
having the same ID! Notice that relations will definitely share attributes
(if there are no common attributes at all in your schema, then your project
is in grave danger, see the instructor immediately!), so make sure that they
"agree" on the common attributes. For example, if there are two relations
that share an attribute called "name", then make sure that the names do indeed
tally. Do not list the name "Mark" as "MARK" in one and "maRK" in another
and "Mark" in yet another because then they would be three different names.
The bottom line is to be consistent. When two things are meant to be the same,
make sure your data reflects this fact. If you do not ensure so at this stage,
you will spend more time debugging your application than otherwise!
Now execute some sample INSERT, DELETE
and UPDATE commands on every one of your relations. Convince yourself
that things are working fine. Report any interesting observations that you
make.
What to turn in: A paper copy that details the following:
- A list of your relation schemas.
- A list of your SQL CREATE TABLE commands (with
PRIMARY KEY, NOT NULL, DEFAULT,
FOREIGN KEY, and CHECK constraints).
- A list of the number of tuples for each relation in your
application.
- Samples of INSERT, DELETE, and
UPDATE commands that you tried out and that worked.
- The source of your data. Examples of possible answers are "We got it
from this web address", "We got the data from this reference book on
apartment homes", "We imported it from this document", or "We cooked it
up". Whatever your answer, write a short paragraph on how you
transformed the data from its native form to the form required by
PostgreSQL. If you wrote a special purpose program or script (in
Perl/TCL/C or some other language), include that script. Explain how you
took special care to make sure that key restrictions and common
attribute constraints are not violated. Explain why some relations have
only a few tuples (if applicable).
- Samples of tuples for each relation (about 6--10 each), just to
give us an idea of how they look.
What NOT to turn in:
- Pages and pages of raw data or printouts of all the data in your
database. There is no way we can comb this stuff manually! You
will have ample opportunity to present this stuff later on in the
semester.
Precautions and Recommendations:
- Please SAVE a copy of your SQL statements (and
data) in your home directory or some other safe place. While this
precaution may seem obvious, a surprising number of people miss it.