CALL FOR PARTICIPATION

TEXT RETRIEVAL CONFERENCE
January 2000 - November 2000

Conducted by: National Institute of Standards and Technology (NIST)
With support from: Defense Advanced Research Projects Agency (DARPA)
The Text Retrieval Conference (TREC) workshop series encourages research in information retrieval from large text collections by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. Now in its ninth year, the conference has become the major experimental effort in the field. Participants in previous TREC conferences have examined a wide variety of retrieval techniques, including methods using automatic thesauri, sophisticated term weighting, natural language techniques, relevance feedback, and advanced pattern matching. Other related problems, such as cross-language retrieval, retrieval of recorded speech, and question answering, have also been studied. Details about TREC can be found at the TREC web site, http://trec.nist.gov .
You are invited to participate in TREC-9. TREC-9 will consist of a set of seven parallel tasks known as "tracks". Each track focuses on a particular subproblem or variant of the retrieval task as described below. Organizations may choose to participate in any or all of the tracks. For most tracks, training and test materials are available from NIST; a few tracks will use special collections that are available from other organizations for a nominal fee. For all tracks, NIST will collect and analyze the retrieval results.
Dissemination of TREC work and results other than in the (publicly available) conference proceedings is welcomed, but the conditions of participation preclude specific advertising claims based on TREC results. As before, the workshop in November will be open only to participating groups that submit results and to selected government personnel from sponsoring agencies.
Schedule:
---------

By February 1, 2000 -- Submit the application described below to NIST. Returning an application will add you to the active participants' mailing list. On February 1, NIST will announce a new password for the "active participants" portion of the TREC web site. That portion of the site includes information regarding the permission forms needed to obtain the TREC document disks.

Beginning February 8 -- Document disks distributed to those new participants who have returned the required forms. There are a total of 5 CD-ROMs containing about 5 gigabytes of text. In addition, 450 training topics (questions) and relevance judgments are available from NIST. Please note that no disks will be shipped before February 8.

August 2 -- Earliest results submission deadline.

August 30 -- Latest results submission deadline. (Results deadlines vary by track. The Web track deadline will be August 2; deadlines for the other tracks are still to be determined, but will be sometime in August.)

September 7 -- Speaker proposals due at NIST.

October 5 -- Relevance judgments and individual evaluation scores due back to participants.

Nov. 13-16 -- TREC-9 conference at NIST in Gaithersburg, Md.

Task Description:
-----------------
Below is a brief summary of the tasks. Complete descriptions of tasks performed in previous years are included in the Overview papers in each of the TREC proceedings (in the Publications section of the web site).
For most tracks, the exact definition of the tasks to be performed in the track for TREC-9 is still being formulated. Track discussion takes place on the track mailing list. To be added to a track mailing list, send a request to the Mailing List Contact Address listed below. For questions about the track, send mail to the track coordinator (or post the question to the track mailing list once you join).
Cross-Language Track -- a track that investigates the ability of
retrieval systems to find documents that pertain to a topic
regardless of the language in which the document is written.
In previous TRECs, the cross-language track involved documents written in English, German, French, or Italian. Starting in 2000, the investigation of cross-language retrieval for European languages will have its own evaluation known as CLEF (for Cross-Language Evaluation Forum). More details about CLEF can be found at the CLEF web site, http://www.iei.pi.cnr.it/DELOS/CLEF .
In TREC-9, the cross-language track will use English and Mandarin documents and English topics. Depending on data availability, the track may also involve Tamil and Malay documents.
Track coordinator: Donna Harman, email@example.com
Mailing list contact address: firstname.lastname@example.org
Filtering Track -- A task in which the user's information need is
stable (and some relevant documents are known) but there is a stream
of new documents. For each document, the system must make a binary
decision as to whether the document should be retrieved (as opposed
to forming a ranked list).
Track coordinators: David Hull, email@example.com and Steve Robertson, firstname.lastname@example.org
Mailing list contact address: email@example.com
Interactive Track -- A track studying user interaction with text
retrieval systems. This year's track will use the Web document
collection and a task similar (but not identical) to the Question
Answering track. All participating groups follow a common
experimental protocol that provides insights into user searching.
Track coordinator: Bill Hersh, firstname.lastname@example.org
Mailing list contact address: email@example.com
Query Track -- A track designed to foster research on the effects of
query variability and analysis on retrieval performance. Each
participant constructs several different versions of existing
TREC topics. All groups then run all versions of the topics.
Track coordinator: Chris Buckley, firstname.lastname@example.org
Mailing list contact address: email@example.com
Question Answering Track -- A track designed to take a step closer
to *information* retrieval rather than *document* retrieval. For
each of a set of 500 questions, systems produce a text extract
that answers the question. Different runs will have different
limits on the maximum length of the extract, including a short
phrase (a few words), 50 bytes, and 250 bytes.
Track coordinators: Amit Singhal, firstname.lastname@example.org and Tomek Strzalkowski, email@example.com
Mailing list contact address: firstname.lastname@example.org
Spoken Document Retrieval Track -- A track that investigates the
effects of speech recognition errors on retrieval performance.
The task to be performed in TREC-9 is still to be determined.
Please contact the track coordinator as soon as possible if
you are interested in this track.
Track coordinator: John Garofolo, email@example.com
Mailing list contact address: firstname.lastname@example.org
Web Track -- A track featuring ad hoc search tasks on a document
set that is a snapshot of the World Wide Web. The main focus of
the track will be to form a Web test collection using
relevance judgments. The document set will be a 10GB subset of
the existing VLC2 document set. Topics will be created at NIST
by taking queries from search engine logs and building topic
statements around them. (Thus, the true web query will be there,
but there will also be a narrative explaining how documents will
be judged.) Relevance judgments will then be made using the
traditional TREC pooling methodology, with NIST assessors doing
the judging.
Track coordinator: David Hawking, David.Hawking@cmis.csiro.au
Mailing list contact address: David.Hawking@cmis.csiro.au
The conference itself will be used as a forum both for presentation of results (including failure analyses and system comparisons), and for more lengthy system presentations describing retrieval techniques used, experiments run using the data, and other issues of interest to researchers in information retrieval. As there is a limited amount of time for these presentations, the program committee will determine which groups are asked to speak and which groups will present in a poster session. Groups that are interested in having a speaking slot during the workshop will submit a 200-300 word abstract in September describing the experiments they performed. The program committee will use these abstracts to select speakers.
As some organizations may not wish to describe their proprietary algorithms, TREC defines two categories of participation.
Category A: Full participation
Participants will be expected to present full details of system algorithms and various experiments run using the data, either in a talk or in a poster session.
Category C: Evaluation only
Participants in this category will be expected to submit results for common scoring and tabulation, and to present their results in a poster session. They will not be expected to describe their systems in detail, but will be expected to report time and effort statistics.
The existing TREC English collections (documents, topics, and relevance judgments) are available for training purposes and will also be used in some of the tracks. Parts of the training collection (Disks 1-3) were assembled from Linguistic Data Consortium text, and a signed User Agreement will be required from all participants. The documents are an assorted collection of newspapers, newswires, journals, and technical abstracts. A separate Agreement is needed for the remaining disks (4-5). All documents are typical of those seen in a real-world situation (i.e. there will not be arcane vocabulary, but there may be missing pieces of text or typographical errors). The relevance judgments against which each system's output will be scored will be made by experienced relevance assessors based on the output of all TREC participants using a pooled relevance methodology.
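The pooled relevance methodology mentioned above can be illustrated with a short sketch: the top-ranked documents from every participating system's submitted run are merged into a single "pool" per topic, and assessors judge only the pooled documents. This sketch is illustrative only (it is not part of any TREC software); the run names and the pool depth are hypothetical.

```python
def build_pool(runs, depth=100):
    """Union of the top-`depth` document IDs from each system's ranked run.

    `runs` maps a run name to a list of document IDs in rank order.
    A depth of 100 is a common choice, but the actual value varies.
    """
    pool = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool

# Hypothetical ranked results from three systems for a single topic.
runs = {
    "sysA": ["d1", "d2", "d3"],
    "sysB": ["d2", "d4"],
    "sysC": ["d5", "d1"],
}

# Pool to depth 2: assessors judge only these documents; anything
# outside the pool is treated as not relevant when scoring.
pool = build_pool(runs, depth=2)
```

Because every system contributes its top-ranked documents, the pool tends to contain most of the relevant documents any reasonable system would retrieve, which is what makes the resulting judgments reusable as a test collection.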
Response Format and Submission Details:
---------------------------------------
Organizations wishing to participate in TREC-9 should respond to this call for participation by submitting an application. An application consists of four parts: contact information, a one-paragraph description of your retrieval approach, whether you will participate as a Category A or a Category C group, and a list of tracks that you are likely to participate in. Contact information includes a full regular address, voice and fax telephone numbers, and an email address of the one person in the organization who will be the main TREC contact. Please note that email is the only method of communication in TREC. Participants in TREC-8 who will participate in TREC-9 should also submit an application.
All responses should be submitted by February 1, 2000 to Ellen Voorhees, TREC project leader, at email@example.com . Any questions about conference participation, response format, etc. should be sent to the same address.
Program Committee
-----------------
Ellen Voorhees, NIST, chair
James Allan, University of Massachusetts, Amherst
Nick Belkin, Rutgers University
Chris Buckley, Sabir Research, Inc.
Jamie Callan, Carnegie Mellon University
Susan Dumais, Microsoft
Donna Harman, NIST
David Hawking, CSIRO, Australia
Bill Hersh, Oregon Health Sciences University
Darryl Howard, U.S. Department of Defense
David Hull, Xerox Research Center Europe
John Prange, U.S. Department of Defense
Steve Robertson, Microsoft
Amit Singhal, AT&T Labs Research
Karen Sparck Jones, Cambridge University, UK
Tomek Strzalkowski, GE
Ross Wilkinson, CSIRO, Australia