                        CALL FOR PARTICIPATION

                       TEXT RETRIEVAL CONFERENCE

                      January 2000 - November 2000


                            Conducted by:
          National Institute of Standards and Technology (NIST)

                            With support from:
            Defense Advanced Research Projects Agency (DARPA)

The Text Retrieval Conference (TREC) workshop series encourages research in information retrieval from large text collections by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. Now in its ninth year, the conference has become the major experimental effort in the field. Participants in the previous TREC conferences have examined a wide variety of retrieval techniques, including methods using automatic thesauri, sophisticated term weighting, natural language techniques, relevance feedback, and advanced pattern matching. Other related problems such as cross-language retrieval, retrieval of recorded speech, and question answering have also been studied. Details about TREC can be found at the TREC web site, http://trec.nist.gov .

You are invited to participate in TREC-9. TREC-9 will consist of a set of seven parallel tasks known as "tracks". Each track focuses on a particular subproblem or variant of the retrieval task as described below. Organizations may choose to participate in any or all of the tracks. For most tracks, training and test materials are available from NIST; a few tracks will use special collections that are available from other organizations for a nominal fee. For all tracks, NIST will collect and analyze the retrieval results.

Dissemination of TREC work and results other than in the (publicly available) conference proceedings is welcomed, but the conditions of participation preclude specific advertising claims based on TREC results. As before, the workshop in November will be open only to participating groups that submit results and to selected government personnel from sponsoring agencies.

Schedule:
---------
  By February 1, 2000 -- submit the application described
  below to NIST.
        Returning an application will add you to the active
        participants' mailing list.  On Feb 1, NIST will
        announce a new password for the "active participants"
        portion of the TREC web site.  Included in this portion
        of the web site is information regarding the permission
        forms needed to obtain the TREC document disks.
  Beginning February 8 -- document disks distributed to those new
        participants who have returned the required forms.  There
        is a total of 5 CD-ROMs containing about 5 gigabytes of
        text.  In addition, 450 training topics (questions) and
        relevance judgments are available from NIST.  Please
        note that no disks will be shipped before February 8.
  August  2   -- earliest results submission deadline.
  August 30   -- latest results submission deadline.
       (Results deadlines vary by track.  The Web track
        deadline will be August 2.  Deadlines for other tracks
        are still to be determined, but will be sometime in
        August.)
  September 7 -- speaker proposals due at NIST.
  October 5   -- relevance judgments and individual evaluation
        scores due back to participants.
  Nov. 13-16  -- TREC-9 conference at NIST in Gaithersburg, Md.


Task Description:
-----------------

Below is a brief summary of the tasks. Complete descriptions of tasks performed in previous years are included in the Overview papers in each of the TREC proceedings (in the Publications section of the web site).

For most tracks, the exact definition of the tasks to be performed in the track for TREC-9 is still being formulated. Track discussion takes place on the track mailing list. To be added to a track mailing list, send a request to the Mailing List Contact Address listed below. For questions about the track, send mail to the track coordinator (or post the question to the track mailing list once you join).

Cross-Language Track -- a track that investigates the ability of retrieval systems to find documents that pertain to a topic regardless of the language in which the document is written.
In previous TRECs, the cross-language track involved documents written in English, German, French, or Italian. Starting in 2000, the investigation of cross-language retrieval for European languages will have its own evaluation known as CLEF (for Cross-Language Evaluation Forum). More details about CLEF can be found at the CLEF web site, http://www.iei.pi.cnr.it/DELOS/CLEF .
In TREC-9, the cross-language track will use English and Mandarin documents and English topics. Depending on data availability, the track may also involve Tamil and Malay documents.
Track coordinator: Donna Harman, donna.harman@nist.gov
Mailing list contact address: erika.ashburn@nist.gov

Filtering Track -- A task in which the user's information need is stable (and some relevant documents are known) but there is a stream of new documents. For each document, the system must make a binary decision as to whether the document should be retrieved (as opposed to forming a ranked list).
Track coordinators: David Hull, david.hull@xrce.xerox.com and Steve Robertson, ser@microsoft.com
Mailing list contact address: lewis@research.att.com
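
To make the binary-decision nature of the filtering task concrete, here is a minimal sketch in Python of a threshold-based filtering rule. It is purely illustrative: the profile-building, scoring function, and threshold below are our own assumptions, not part of the track guidelines.

    # Minimal illustrative sketch of a filtering decision rule: build a term
    # profile from the known relevant documents, score each arriving document
    # against it, and turn the score into a binary retrieve/discard decision.
    from collections import Counter

    def build_profile(relevant_docs):
        """Aggregate term counts over the known relevant documents."""
        profile = Counter()
        for doc in relevant_docs:
            profile.update(doc.lower().split())
        return profile

    def score(profile, document):
        """Simple term-overlap score between the profile and a new document."""
        terms = document.lower().split()
        return sum(profile[t] for t in terms) / (len(terms) or 1)

    def filter_stream(profile, stream, threshold=0.5):
        """Yield a binary retrieve/discard decision for each document in arrival order."""
        for doc in stream:
            yield score(profile, doc) >= threshold

    # Hypothetical example: two known relevant documents, then a small stream.
    profile = build_profile(["adaptive filtering of news", "filtering track evaluation"])
    print(list(filter_stream(profile, ["filtering news stories today", "weekend weather report"])))
    # -> [True, False] with this toy data and threshold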

Interactive Track -- A track studying user interaction with text retrieval systems. This year's track will use the Web document collection and a task similar (but not identical) to the Question Answering Track. All participating groups follow a common experimental protocol that provides insights into user searching.
Track coordinator: Bill Hersh, hersh@ohsu.edu
Mailing list contact address: hersh@ohsu.edu

Query Track -- A track designed to foster research on the effects of query variability and analysis on retrieval performance. Each participant constructs several different versions of existing TREC topics. All groups then run all versions of the topics.
Track coordinator: Chris Buckley, chrisb@sabir.com
Mailing list contact address: chrisb@sabir.com

Question Answering Track -- A track designed to take a step closer to *information* retrieval rather than *document* retrieval. For each of a set of 500 questions, systems produce a text extract that answers the question. Different runs will have different limits on the maximum length of the extract, including a short phrase (a few words), 50 bytes, and 250 bytes.
Track coordinators: Amit Singhal, singhal@research.att.com and Tomek Strzalkowski, strzalkowski@crd.ge.com
Mailing list contact address: singhal@research.att.com
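
Since the run conditions above differ only in the maximum length of the returned extract, the following small Python sketch shows one way to enforce a byte limit on a candidate answer string. The function name and the sample answer are our own; only the 50-byte and 250-byte limits come from the track description.

    def truncate_answer(answer, max_bytes):
        """Trim a candidate answer so its UTF-8 encoding fits within max_bytes."""
        encoded = answer.encode("utf-8")
        if len(encoded) <= max_bytes:
            return answer
        # Cut at the byte limit, then drop any partially cut multi-byte character.
        return encoded[:max_bytes].decode("utf-8", errors="ignore")

    candidate = "a hypothetical sentence extracted from a document that answers the question"
    print(truncate_answer(candidate, 50))    # 50-byte run condition
    print(truncate_answer(candidate, 250))   # 250-byte run condition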

Spoken Document Retrieval Track -- A track that investigates the effects of speech recognition errors on retrieval performance. The task to be performed in TREC-9 is still to be determined. Please contact the track coordinator as soon as possible if you are interested in this track.
Track coordinator: John Garofolo, john.garofolo@nist.gov
Mailing list contact address: john.garofolo@nist.gov

Web Track -- A track featuring ad hoc search tasks on a document set that is a snapshot of the World Wide Web. The main focus of the track will be to form a Web test collection using pooled relevance judgments. The document set will be a 10GB subsample of the existing VLC2 document set. Topics will be created at NIST by taking queries from search engine logs and retro-fitting topic statements around them. (Thus, the true web query will be there, but there will also be a narrative explaining how it will be judged.) Relevance judgments will then be made using the traditional TREC pooling methodology, with NIST assessors doing the judging.
Track coordinator: David Hawking, David.Hawking@cmis.csiro.au
Mailing list contact address: David.Hawking@cmis.csiro.au
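
The pooling step mentioned above can be summarized in a few lines of Python: the documents sent to the assessors for a topic are the union of the top-ranked documents from every submitted run. The pool depth and the run and document identifiers below are made up for illustration; the actual depth is set by NIST.

    def build_pool(runs, depth=100):
        """runs maps a run name to its ranked list of document ids for one topic."""
        pool = set()
        for ranked_docs in runs.values():
            pool.update(ranked_docs[:depth])   # only the top 'depth' documents are judged
        return pool

    runs = {
        "groupA.run1": ["docA", "docB", "docC"],
        "groupB.run1": ["docB", "docD"],
    }
    print(sorted(build_pool(runs, depth=2)))   # -> ['docA', 'docB', 'docD']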

Conference Format:
------------------

The conference itself will be used as a forum both for presentation of results (including failure analyses and system comparisons), and for more lengthy system presentations describing retrieval techniques used, experiments run using the data, and other issues of interest to researchers in information retrieval. As there is a limited amount of time for these presentations, the program committee will determine which groups are asked to speak and which groups will present in a poster session. Groups that are interested in having a speaking slot during the workshop will submit a 200-300 word abstract in September describing the experiments they performed. The program committee will use these abstracts to select speakers.

As some organizations may not wish to describe their proprietary algorithms, TREC defines two categories of participation.

Category A: Full participation. Participants will be expected to present full details of system algorithms and various experiments run using the data, either in a talk or in a poster session.

Category C: Evaluation only. Participants in this category will be expected to submit results for common scoring and tabulation, and present their results in a poster session. They will not be expected to describe their systems in detail, but will be expected to report on time and effort statistics.

Data:
-----
The existing TREC English collections (documents, topics, and relevance judgments) are available for training purposes and will also be used in some of the tracks. Parts of the training collection (Disks 1-3) were assembled from Linguistic Data Consortium text, and a signed User Agreement will be required from all participants. The documents are an assorted collection of newspapers, newswires, journals, and technical abstracts. A separate Agreement is needed for the remaining disks (4-5). All documents are typical of those seen in a real-world situation (i.e. there will not be arcane vocabulary, but there may be missing pieces of text or typographical errors). The relevance judgments against which each system's output will be scored will be made by experienced relevance assessors based on the output of all TREC participants using a pooled relevance methodology.
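
As a rough illustration of how a ranked result list is scored against relevance judgments, the Python sketch below computes non-interpolated average precision for a single topic. It is a toy example with made-up document ids; the official scoring at NIST uses the standard TREC evaluation software rather than this code.

    def average_precision(ranked_docs, relevant):
        """ranked_docs: doc ids in rank order; relevant: set of judged-relevant doc ids."""
        if not relevant:
            return 0.0
        hits, precision_sum = 0, 0.0
        for rank, doc in enumerate(ranked_docs, start=1):
            if doc in relevant:
                hits += 1
                precision_sum += hits / rank   # precision at this relevant document's rank
        return precision_sum / len(relevant)

    run = ["d3", "d7", "d1", "d9"]        # one system's ranking for one topic
    qrels = {"d3", "d1", "d5"}            # relevance judgments for that topic
    print(round(average_precision(run, qrels), 4))   # -> 0.5556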

Response format and submission details:
----------------------------------------
Organizations wishing to participate in TREC-9 should respond to this call for participation by submitting an application. An application consists of four parts: contact information, a one-paragraph description of your retrieval approach, whether you will participate as a Category A or a Category C group, and a list of tracks that you are likely to participate in. Contact information includes a full regular address, voice and fax telephone numbers, and an email address of the one person in the organization who will be the main TREC contact. Please note that email is the only method of communication in TREC. Participants in TREC-8 who will participate in TREC-9 should also submit an application.

All responses should be submitted by February 1, 2000 to Ellen Voorhees, TREC project leader, at ellen.voorhees@nist.gov . Any questions about conference participation, response format, etc. should be sent to the same address.

Program Committee
-----------------
Ellen Voorhees, NIST, chair
James Allan, University of Massachusetts, Amherst
Nick Belkin, Rutgers University
Chris Buckley, Sabir Research, Inc.
Jamie Callan, Carnegie Mellon University
Susan Dumais, Microsoft
Donna Harman, NIST
David Hawking, CSIRO, Australia
Bill Hersh, Oregon Health Sciences University
Darryl Howard, U.S. Department of Defense
David Hull, Xerox Research Center Europe
John Prange, U.S. Department of Defense
Steve Robertson, Microsoft
Amit Singhal, AT&T Labs Research
Karen Sparck Jones, Cambridge University, UK
Tomek Strzalkowski, GE
Ross Wilkinson, CSIRO, Australia