REDDnet: Enabling Data Intensive Science in the Wide Area
REDDnet (Research and Education Data Depot network) is an NSF-funded infrastructure project designed to provide a large distributed storage facility for data-intensive collaboration among the nation's researchers and educators in a wide variety of application areas. Its mission is to provide "working storage" to help manage the logistics of moving and staging large amounts of data in the wide area network, for example among collaborating researchers who are either trying to move data from one collaborator (person or institution) to another or who want to share large data sets for limited periods of time (ranging from a few hours to a few months) while they work on them. REDDnet is not designed or intended to be a replacement for reliable archival or long-term personal storage. Users must make separate arrangements to ensure that the data they share via REDDnet's "best effort" storage is also preserved independently with stronger guarantees.
One example comes from the CMS collaboration, a high energy physics experiment that will be taking data soon at the Large Hadron Collider (LHC) at CERN. Groups of researchers, distributed across the country and the world, will want to use data products derived from the raw data produced by collisions in the LHC for a variety of tasks, ranging from calibrating the detector to searching for new physics. They will want the newest data products available for anywhere from a month to a few months, after which those products can be archived to make way for the next batch of data. Although all the data will be stored long term at CERN and Fermilab, these researchers would benefit greatly if this data could be made more readily available for processing on their distributed computing infrastructure, especially on the Open Science Grid. REDDnet is the kind of resource needed to deal with the data logistics of this application.
Another example, from the AmericaView project, might occur in the aftermath of an earthquake in California or a hurricane on the Gulf Coast, where researchers across the country will want access to the geospatial image data from satellites covering the affected region. For a few months after the event, this data could be uploaded to REDDnet and made available to this community with much higher levels of performance and availability.
Initially, REDDnet will deploy more than 700 terabytes (TB) of distributed storage with an emphasis on scalability, speed and fault tolerance. As of Spring 2008, roughly 160 TB are deployed. As an example of speed, at the Supercomputing 2006 Conference in Tampa, Florida, REDDnet demonstrated sustained transfers at a rate of 10 gigabits per second between Caltech and the convention floor. These transfers were limited by the bandwidth of the network connection. At the same conference, REDDnet demonstrated fault tolerance by striping data across thirty depots and then successfully reading the data even after turning off nine of these depots.
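The fault tolerance demonstration can be illustrated with a small simulation. The sketch below is only an illustration and does not describe REDDnet's actual software: it assumes simple replication (three copies of each block, an assumed value) and a hypothetical round-robin placement, and the names `stripe` and `read` are invented for this example. It shows how data striped across thirty depots can still be read after nine of them are turned off, provided at least one copy of every block sits on a depot that is still running.

```python
NUM_DEPOTS = 30   # matches the depot count in the demonstration
COPIES = 3        # assumed replication factor (not stated in the source)
BLOCK_SIZE = 4    # bytes per block; tiny so the example is easy to follow


def stripe(data, num_depots=NUM_DEPOTS, copies=COPIES, block_size=BLOCK_SIZE):
    """Split data into blocks and store each block on `copies` distinct depots."""
    depots = [dict() for _ in range(num_depots)]   # depot -> {block index: bytes}
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    spacing = num_depots // copies                 # spread the copies evenly
    for index, block in enumerate(blocks):
        for c in range(copies):
            depots[(index + c * spacing) % num_depots][index] = block
    return depots, len(blocks)


def read(depots, num_blocks, offline):
    """Reassemble the data using only depots that are not in `offline`."""
    out = []
    for index in range(num_blocks):
        for depot_id, depot in enumerate(depots):
            if depot_id not in offline and index in depot:
                out.append(depot[index])
                break
        else:
            # Every depot holding this block is offline.
            raise IOError(f"block {index} has no surviving replica")
    return b"".join(out)


payload = bytes(range(200))
depots, num_blocks = stripe(payload)

# Turn off nine of the thirty depots and read the data back.
recovered = read(depots, num_blocks, offline=set(range(9)))
print(recovered == payload)
```

Under this assumed placement, not every set of failures is survivable: turning off depots 0, 10 and 20 removes all three copies of the first block, and `read` raises an error. A real system would choose its replication or coding scheme according to the failures it needs to tolerate.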
Research Projects Using REDDnet
- AmericaView - Satellite remote sensing data and technologies in support of applied research, K-16 education, workforce development, and technology transfer.
- Structural Biology - Image reconstruction of large macromolecular assemblies through a collaborative effort of Vanderbilt and Lawrence Berkeley National Laboratory researchers.
- Terascale Supernova Initiative - A multidisciplinary collaboration to develop models for core collapse supernovae and related enabling technologies.
- National Geospatial Digital Archive (NGDA) - A collecting network for the archiving of geospatial images and data.
- Retinopathy - Diabetic eye disease screening in Peru and Bolivia.
|Vanderbilt||Tennessee||S. F. Austin||ORNL||Nevoa Networks||N. C. State||Delaware|
Collaborating Host Institutions
|São Paulo||Rio de Janeiro||Michigan||Florida||Fermilab||Caltech|
|AMPATH||FIU||Library of Congress||SDSC||Stanford||UCSB|
This work is supported by NSF Grant PHY-0619847 and by the Vanderbilt Center for the Americas.