CS 4604: Project Assignment 4


Released on Mar 22, 2013. Hardcopy due at the start of class on Mar 29, 2013.

  1. (100 points) Design an E/R diagram for the DBLP database. Here is a complete description of what your E/R diagram should model. Read this description carefully, since it differs in some details from the description in Project Assignment 2. The DBLP dataset contains information about approximately 1.4 million publications in the computer science literature. Each publication has a unique string called the dblp_key that identifies it. It also has a title, a year of publication, and one or more authors. Some types of publications do not have authors: they have editors instead (see below). The order in which authors appear in a publication is important and must be recorded as a rank. In each publication, each author appears at most once. The rank of an author is unique within the publication. Within a publication, ranks must start at 1 and be consecutive. For some publications, the authors have not been recorded. A publication may also have a URL and a Digital Object Identifier (DOI). Each publication belongs to one of the following categories:
    article
    This type corresponds to a journal article. The publication will have an associated journal name, a volume and a number specifying the issue of the journal, page numbers, and a publisher of the journal.
    book
    As the name indicates, this type of publication is a book. It also has a publisher and an ISBN.
    incollection
    This type indicates a publication contained within a collection. An example of a collection is a book that contains different chapters written by different authors (note that not every book is a collection). Each chapter in a collection will have the type incollection. Each chapter will have its own page numbers and authors. The entire collection itself is considered a separate publication and has its own title, a list of editors, and a publisher. It is not possible for a person to be an author of a collection, i.e., collections only have editors. Within a single collection, an editor appears at most once. Within a single collection, ranks of editors are also unique and consecutively numbered starting at 1. A chapter in a collection has a cross reference to the collection it was published in.
    inproceedings
    This type indicates a paper published in the proceedings of a scientific conference. It is very similar to a publication of type "incollection". The conference proceedings is itself a separate publication with its own title, editors, and publisher. Editors and their ranks for a "proceedings" have the same function and constraints as for a "collection". A publication of type "inproceedings" has a cross reference to the proceedings it was published in.
    mastersthesis
    This publication is a Master's thesis, with a specific author, department and/or university, and year.
    phdthesis
    This publication is a PhD thesis, with a specific author, department and/or university, and year.
    www
    This type of "publication" is just a pointer to a web page, possibly with a title and one or more authors. It must have a URL.

    Each publication can cite one or more publications (this is the list of references that appears at the end of a typical publication). In addition, each publication can be associated with one or more topics. Topics are themselves arranged hierarchically, e.g., see the Computing Classification System. A topic can be a sub-topic of more than one "parent" topic and can itself have one or more specialised topics as "children".
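
To make the topic hierarchy concrete, here is a minimal Python sketch of a structure in which a topic may have several parents and several children, so the hierarchy is a directed graph rather than a tree. The class name `TopicHierarchy`, its methods, and the topic names used below are illustrative assumptions, not part of the DBLP data or of the Computing Classification System, and this is not a prescribed design for your diagram.

```python
from collections import defaultdict


class TopicHierarchy:
    """Hypothetical helper: topics linked by a many-to-many sub-topic relation."""

    def __init__(self):
        # child topic -> set of its parent topics
        self.parents = defaultdict(set)
        # parent topic -> set of its child topics
        self.children = defaultdict(set)

    def add_subtopic(self, parent, child):
        """Record that `child` is a sub-topic of `parent`."""
        self.parents[child].add(parent)
        self.children[parent].add(child)

    def ancestors(self, topic):
        """Return every topic reachable by following parent links upward."""
        seen = set()
        stack = list(self.parents.get(topic, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen


# Illustrative topic names only (assumed, not taken from any real classification).
hierarchy = TopicHierarchy()
hierarchy.add_subtopic("Computing", "Databases")
hierarchy.add_subtopic("Computing", "Machine Learning")
hierarchy.add_subtopic("Databases", "Data Mining")
hierarchy.add_subtopic("Machine Learning", "Data Mining")
```

In this sketch "Data Mining" has two parents, which is exactly the case a strict tree could not represent.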


What to turn in (100 points):
Hard copies of the E/R design. Identify your group by your group name and the team members. In a section titled "Explanation", for each entity set and relationship, write a short description in plain English of what it represents or models. One or two sentences per entity set and relationship is enough. These descriptions are primarily to help us understand that you are modeling the DBLP database correctly. 30% of the grade will be for the explanation. Discuss and identify any constraints and restrictions that the domain poses. For constraints that E/R diagrams cannot model, write in plain English what these constraints are, in a section titled "Notes".
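
As one illustration of a constraint you may find belongs in "Notes", consider the rule that ranks within a publication start at 1, are consecutive, and that each person appears at most once. The Python sketch below states that rule precisely as a check over one publication's list of (person, rank) pairs. The function name `ranks_are_valid` and the input format are assumptions made for this example; the assignment does not ask you to submit any code.

```python
def ranks_are_valid(ranked_people):
    """Check the rank rules for a single publication.

    `ranked_people` is a list of (person, rank) pairs, where the people are
    either the authors or the editors of that one publication (assumed format).
    """
    people = [person for person, _ in ranked_people]
    ranks = sorted(rank for _, rank in ranked_people)

    # Each person appears at most once in the publication.
    no_duplicate_people = len(people) == len(set(people))

    # Ranks are exactly 1, 2, ..., n, which also implies they are unique.
    consecutive_from_one = ranks == list(range(1, len(ranks) + 1))

    return no_duplicate_people and consecutive_from_one
```

An empty list passes the check, which matches the case of a publication whose authors have not been recorded.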

Note: As the E/R diagram may be large, here is a useful program that you can use for creating E/R diagrams (and flowcharts, etc.). Email Qianzhou if you have any problems using it.
Common Mistakes to avoid in design:
Last Updated: Thur, Mar 14, 10:30pm EDT, 2013