CS 4604 Project Assignment 3
Released on Mar 8, 2013. Hardcopy due at the start of
class on Mar 22, 2013.
- (0 points) Modify your database schema to address all
our comments on the solution you turned in for Project Assignment
2. Itemize all the changes you made. It is enough if you
explain in words, e.g., "We created a table called Pubs
to store all keys and titles of all the publications in the
database. We added constraints to create foreign key references to
this table from the
tables Articles, InProceedings,
...". We will not grade Assignment 3
unless you list the changes made to your schema. If you did not
make any changes, explain why.
- (10 points) Conferences are becoming increasingly important
venues in computer science. Journals, on the other hand, are becoming
less and less important. Nevertheless, scientists continue to first
publish a paper in a conference and then submit a full version (with
the same title) to a journal. Count the number of publications that
first appeared in a conference (the type of such a publication
is inproceedings) and later appeared with the same title
in a journal (the type of such a publication is article).
- (15 points) Write a query to find the names of the 10 most
prolific authors, i.e., the 10 authors who have written the most
publications. In this query and the remaining queries, the rank of
an author on the list of authors of a publication does not
matter. Ignore editorships in this and the remaining queries as
well, i.e., if an author has edited a publication, do not count
this publication toward that author's total. Return the author name and
the number of publications written by the author, for the 10 most
prolific authors.
- (20 points) Some authors like to work alone, but can still be
very prolific. Find the 10 authors who have written the largest number
of single-author papers. Return both the author name and the number
of such papers.
- (25 points) Other authors are highly collaborative. A shining
example is the Hungarian
mathematician Paul
Erdös, who has published the largest number of papers in
mathematics and has worked with hundreds of collaborators. As a side
note, "The
Man Who Loved Only Numbers" is a fascinating biography of this
great mathematician. For this query, you are required to find the 10
scientists with the highest number of collaborators. Two scientists
have collaborated if they have written at least one publication together;
the number of papers they have written together does not matter. The
number of collaborators of a scientist is the number of other
scientists he/she has written at least one publication with. Return
the author name and the number of other authors he/she has
collaborated with, for the 10 authors with the highest number of
collaborators.
- (30 points) Keeping to the theme of Paul Erdös,
mathematicians amuse themselves by computing
their Erdös
numbers. Briefly, if you have written a publication with
Erdös, your Erdös number is 1. The number is defined
inductively for other scientists. If your Erdös number is
not k or less and if you have written a publication with a
scientist whose Erdös number is k, then your Erdös
number is k + 1. The DBLP database does not store
publications by Erdös, so let us use the highly prolific computer
scientist 'Philip S. Yu' in his stead, and define a Yu number
analogous to the Erdös number. Write a query to find the number
of authors whose Yu number is 2.
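The collaborator and Yu-number definitions above can be made concrete with a small sketch. The assignment asks for SQL queries against your own schema; the Python below only illustrates what the definitions mean, assuming each publication has been reduced to a list of author names (editors excluded). The toy records and the names `PUBLICATIONS`, `coauthor_graph`, `collaborator_counts`, and `distance_numbers` are hypothetical and are not part of DBLP or of any required schema. Because a Yu number is the shortest coauthorship distance from 'Philip S. Yu', breadth-first search computes it directly.

```python
from collections import defaultdict, deque

# Hypothetical toy data (invented, not DBLP): each publication is reduced to
# the list of its authors. Editors are deliberately not included.
PUBLICATIONS = [
    ["Philip S. Yu", "Alice"],
    ["Alice", "Bob", "Carol"],
    ["Bob"],             # a single-author paper adds no collaborators
    ["Carol", "Dave"],
    ["Eve", "Frank"],    # a group with no path to Philip S. Yu
]


def coauthor_graph(publications):
    """Map each author to the set of *other* authors they have written with."""
    graph = defaultdict(set)
    for authors in publications:
        unique = set(authors)
        for author in unique:
            # Updating with an empty set still registers single authors.
            graph[author].update(unique - {author})
    return graph


def collaborator_counts(graph):
    """Distinct collaborators per author; repeated joint papers count once."""
    return {author: len(peers) for author, peers in graph.items()}


def distance_numbers(graph, source):
    """Breadth-first search from `source`, returning each reachable author's number.

    BFS reaches authors in order of increasing distance, so the first time an
    author is seen fixes the smallest k, matching the "not k or less" clause.
    The source itself gets 0; unreachable authors are absent from the result.
    """
    numbers = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for peer in graph.get(current, ()):
            if peer not in numbers:
                numbers[peer] = numbers[current] + 1
                queue.append(peer)
    return numbers


graph = coauthor_graph(PUBLICATIONS)
counts = collaborator_counts(graph)
yu_numbers = distance_numbers(graph, "Philip S. Yu")
yu_number_2 = sorted(a for a, k in yu_numbers.items() if k == 2)

print(yu_number_2)      # → ['Bob', 'Carol']
print(counts["Alice"])  # → 3
```

In this toy data, Alice has Yu number 1, Bob and Carol have Yu number 2 (through Alice), Dave has Yu number 3, and Eve and Frank have no Yu number at all. Alice has 3 collaborators even though she appears on two papers, because only distinct coauthors count.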
While grading your solutions, we will pay attention to the quality of
your queries, e.g., whether they are correct, the number of tables they
reference, and the running time. Please desist from creating
massive new tables to support answering these queries!
What you should turn in:
A paper copy that details the
following:
- The name of your project and the names of the students in your
group.
- The changes you made to your database based on our suggestions
for Assignment 2.
- A list of your defined SQL schemas; these schemas are to remind us
about your design.
- For each of the problems listed above:
  - Your SQL query.
  - The result you obtained.
  - The time your query took (use the \timing command
    in psql to obtain the time).
Last Updated: Sat, Mar 2, 3:30pm EDT, 2013