Project Suggestion: PetaPlex
Title: PetaPlex super storage system
- Number of people: 10 projects, each with 2 or more people
- Goal: for VT-PetaPlex-1 to go into production use on campus
- Contact information: Robert Akscyn, rma@ks.com, President of Knowledge
Systems Incorporated, developer of PetaPlex technology; the instructor;
and others listed in the subprojects below
- Required background: ability to write networked programs in C and C++
- Description:
This is an exciting project to help
deploy the Virginia Tech super storage cluster for campus use.
The system consists of a powerful RS/6000 front end (1 GB RAM, 4 processors)
and the main PetaPlex unit itself, which has 100 nodes, each with a 233 MHz
Pentium, 64 MB of memory, and a 25 GB disk, for a total capacity of 2.5 terabytes.
Robert Akscyn
will work with project groups during the semester.
Documentation should be available directly at
http://ks.com/vt/50.html
or indirectly from
ks.com/vt
under "Documents".
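
As a quick check of the hardware figures in the description above, the
aggregate disk capacity can be recomputed from the per-node numbers. This is
only a sketch: the constant and function names are made up for illustration,
decimal units (1 TB = 1000 GB) are assumed, and the estimate ignores any space
needed for indexes, replication, or file system overhead.

```python
# Figures taken from the project description; names are illustrative only.
NODES = 100            # PetaPlex nodes
DISK_GB_PER_NODE = 25  # disk per node, in GB


def total_disk_tb(nodes=NODES, disk_gb=DISK_GB_PER_NODE):
    """Raw disk capacity in TB, assuming decimal units (1 TB = 1000 GB)."""
    return nodes * disk_gb / 1000


def fraction_of_capacity(dataset_tb, nodes=NODES, disk_gb=DISK_GB_PER_NODE):
    """Share of raw capacity that a dataset of the given size would occupy."""
    return dataset_tb / total_disk_tb(nodes, disk_gb)


if __name__ == "__main__":
    print("Total raw capacity (TB):", total_disk_tb())
    # The 1 terabyte text collection mentioned in the retrieval subproject.
    print("Share used by 1 TB of text:", fraction_of_capacity(1.0))
```

Under these assumptions, the 1 terabyte text collection mentioned in the
information retrieval subproject would occupy well under half of the raw disk
space, before counting the index itself.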
Particular efforts include (i.e., you can work on one or more of):
- Help Ohm Sornil (osornil@vt.edu) and extend his PhD work, which should soon
lead to a number of publications (which students may co-author) and probably
another dissertation. One effort is to integrate his inverted file technology
with MARIAN and get that running on the PetaPlex as a production service.
Another is to carry out information retrieval experiments
on 1 terabyte of text, to get performance figures
and tune the algorithms.
See O. Sornil, "A Distributed Inverted Index for a Large-Scale,
Dynamic Digital Library," Virginia Tech Computer Science, Blacksburg,
Ph. D. Dissertation, to be defended 1/25/2000 in McB104c at 9am.
- Develop support for video on PetaPlex, working with Paul Mather
(paul@csgrad.cs.vt.edu),
who is funded by IBM to connect their VideoCharger software to PetaPlex.
- Develop a Web server atop PetaPlex. Work on this proceeded well with
a CS6604 project group in Fall 2000:
- Rohit Gupta, rogupta@csgrad.cs.vt.edu
- Palash Jain, pjain@csgrad.cs.vt.edu
- Abhishek Ram, aram@vt.edu
It may be possible to adapt their code as a backend to a standard
server like Apache, using rewrite rulesets, to
quickly achieve full functionality.
- Develop NFS and FTP services for PetaPlex.
Assistance can be provided by Sumedh Sawant, ssawant@vt.edu, who is
carrying out an Independent Study, CS5974, related to this
in Spring 2001.
- Develop quota and security controls for PetaPlex, so various applications
can run without interfering with one another.
Assistance can be provided by Sumedh Sawant, ssawant@vt.edu, who is
carrying out an Independent Study, CS5974, related to this in Spring 2001.
- Adapt a Web spider to PetaPlex, so we can index the Web.
Divide the sites to be indexed randomly among the 100 nodes,
and have each node index one complete site at a time.
See, for example, the spider code from the University of Arizona.
- Adapt bioinformatics algorithms to the PetaPlex, running on MPI.
One interesting
possibility is the work of Professor David Bevan (drbevan@vt.edu),
who has a particular interest in the parallel version of the Assisted
Model Building with Energy Refinement (AMBER) software. See also the separate
project about the BLAST tool.
- Connect PetaPlex with the Internet 2 Distributed Storage Initiative.
Students working on this include:
Julia Lee (jewlz_77@yahoo.com),
Dzung T. Dang (ddang@vt.edu),
Matt Weber (mweber@vt.edu)
- Set up PetaPlex to run the Storage Resource Broker (SRB) software from the
San Diego Supercomputer Center, so it can support collections that interoperate
with other SRB installations around the world. See http://www.npaci.edu/dice/srb and
http://srb.npaci.edu/
- Develop support for images and GIS on PetaPlex.
The best contact on this is Professor Carstensen,
carstens@vt.edu, x1-2600.
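
For the Web spider subproject above, the idea of dividing sites randomly among
the nodes and having each node index one complete site at a time can be
sketched as follows. This is a minimal illustration under stated assumptions,
not the University of Arizona spider or any PetaPlex interface: the function
names, the round-robin dealing after the shuffle, and the example host names
are all hypothetical.

```python
import random
from collections import defaultdict

NUM_NODES = 100  # node count from the project description


def assign_sites(sites, num_nodes=NUM_NODES, seed=None):
    """Randomly divide sites among nodes; each site goes to exactly one node.

    Returns a dict mapping node number -> list of sites for that node.
    Nodes that receive no sites do not appear in the dict.
    """
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(sites)
    rng.shuffle(shuffled)
    queues = defaultdict(list)
    # Deal the shuffled sites out round-robin, so that the number of sites
    # per node differs by at most one (site sizes are not considered).
    for i, site in enumerate(shuffled):
        queues[i % num_nodes].append(site)
    return dict(queues)


def crawl_order(queue):
    """Yield a node's sites in turn: one complete site before the next."""
    for site in queue:
        yield site


if __name__ == "__main__":
    # Hypothetical site list, for illustration only.
    sites = [f"site{i}.example.org" for i in range(250)]
    plan = assign_sites(sites, seed=42)
    for site in crawl_order(plan[0]):
        print("node 0 would index", site)
```

Assigning whole sites, rather than individual pages, keeps every page of a
site on a single node. The trade-off is that one very large site can leave its
node busy long after the others have finished, so a real deployment would
probably need some way to rebalance.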