AmericaView

From ReddNet
Revision as of 13:40, 17 July 2008 by Prblackwell

AmericaView REDDnet Implementation Plan

Background

The Research and Education Data Depot Network (REDDnet) “will create a wide area storage facility for data intensive collaboration, consisting of a set of eight large storage nodes, strategically positioned across the nation’s high performance research networks, and configured with a distributed storage management technology, called Logistical networking (LN), expressly designed to attack major problems of data logistics.” The hardware infrastructure project, funded by the NSF Major Research Instrumentation (MRI) program, includes three application groups. One of these is remote sensing data archiving and distribution, represented by the AmericaView Remote Sensing Consortium.

TexasView, the AmericaView member consortium for Texas, has been working for several years to harness Logistical Networking for remote sensing data distribution. Initial work involved adapting the Logistical Download Network (LoDN) for use with the TexasView GloVis archive responder (download manager). The resulting Advanced Automated Archive Responder Gadget accepted scene selections from GloVis and offered either immediate or deferred LoDN data delivery, as well as conventional HTTP delivery. The limited success of this effort was attributed to two factors:

First, the “best effort” nature of IBP (Internet Backplane Protocol) depots meant that data could not be left on the depots permanently and had to be uploaded from the archive on demand. The IBP remedy, replicating multiple copies across several depots, seemed unsuited to permanent storage of remote sensing data, and the storage consumed by multiple copies of large datasets seemed impractical. As a result, the time required to complete the upload portion of each transaction largely offset the time saved by the faster transfer rate.

Second, inherent instability in the LoDN server caused user frustration. After a failed transfer, most users abandoned the LoDN option and switched to the more pedestrian but more reliable HTTP download for future transactions.

REDDnet

The AmericaView REDDnet project addresses both of these shortcomings. First, with L-Store technology, storage is guaranteed, so a single copy of each data file can be stored reliably on the REDDnet depots. In addition, with 320 TB of IBP-enabled storage available in the REDDnet network, storage capacity will no longer be a limiting factor.

Second, the instability of the LoDN server will be addressed by replacing it with an integrated facility that hands exnodes and a download client to the user.

Objectives

The goal of the AmericaView REDDnet project is to demonstrate the application of logistical networking technology to archiving and distributing a national archive of remote sensing data. To achieve this goal, the following objectives are identified:

  • Develop an ingest software tool that automates uploading remote sensing archives to REDDnet and archives exnodes in a GloVis-compatible storage system.
  • Develop GloVis archive responder software (download manager) that accepts a standard GloVis scene list, retrieves the appropriate exnode(s), delivers the exnodes to the client, and downloads and launches software to retrieve and assemble the original file.
  • Load all available AmericaView remote sensing data into REDDnet and expose those data to the general public.

Software Requirements

Upload Tool

The Upload Tool walks an archive data structure, identifies “deliverable units”, uploads these to REDDnet, receives an exnode for each, and stores the exnode for later use. This process should be merged with the existing code used to generate GloVis scene lists and to download GloVis browse images and metadata. Requirements for the Upload Tool include:

  1. Ability to autonomously walk a directory structure and identify which files constitute a “deliverable unit”. This should work across a variety of directory structures and file collections, although some restrictions will probably have to be imposed.
  2. Integrate exnodes with the existing GloVis data structure. (This step is rendered moot because L-Store archives exnodes by filename.)
  3. Incorporate existing browse and metadata download functionality.
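The directory walk in requirement 1 could start from something like the following Python sketch. It assumes, hypothetically, that every file belonging to a scene begins with a 21-character Landsat-style scene ID followed by "_" or "."; real StateView archives will need their own patterns, and files that do not match are simply skipped. `find_deliverable_units` is an illustrative name, not existing TexasView code.

```python
import os
import re
from collections import defaultdict

# Hypothetical naming rule: a 21-character Landsat-style scene ID
# (e.g. LT50260391987123XXX02) followed by "_" or "." starts every file name.
SCENE_ID = re.compile(r"^(L[A-Z0-9]{20})(?=[_.])")


def find_deliverable_units(root):
    """Walk `root` and group files that share a scene ID into deliverable units.

    Returns a dict mapping (directory, scene_id) to a sorted list of file paths.
    Files whose names do not match SCENE_ID are skipped.
    """
    units = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            match = SCENE_ID.match(name)
            if match:
                units[(dirpath, match.group(1))].append(os.path.join(dirpath, name))
    return dict(units)
```

Grouping per directory as well as per scene ID keeps two copies of the same scene in different folders from being merged into one unit; whether that is desirable is one of the restrictions the requirement anticipates.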

Download Manager

The Archive Responder will accept a standard GloVis scene list, extract the appropriate L-Store descriptors, pass the descriptors to the client along with an exnode retrieval tool, and launch the tool. Requirements include:

  1. Ability to be “branded” by the hosting entity.
  2. Interaction with both a previously downloaded retrieval client and a download-on-demand retrieval client.
  3. A very simple, easy-to-use interface.
  4. Possibly, client TCP/IP stack tuning to improve performance (the current LoDN download tool does not do this).
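For requirement 4, the usual starting point for TCP tuning is the bandwidth-delay product (BDP): to keep a path with bandwidth B bits/s and round-trip time RTT full, roughly B × RTT / 8 bytes must be in flight, so the socket buffers must be at least that large. The sketch below computes the BDP and requests matching buffers through the standard `setsockopt` options; the helper names and the example link figures are assumptions, and the operating system may cap or adjust the value actually granted.

```python
import socket


def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product in bytes: bits/s * seconds / 8 bits per byte."""
    return bandwidth_bps * rtt_ms // (8 * 1000)


def tuned_socket(bandwidth_bps, rtt_ms):
    """Create a TCP socket whose send and receive buffers are sized to the BDP.

    Call this before connect(): TCP window scaling is negotiated during the
    handshake, so enlarging the receive buffer afterwards may not help.
    The kernel may round or cap the request (Linux doubles it and limits it
    to net.core.rmem_max / wmem_max), so the granted size can differ.
    """
    size = bdp_bytes(bandwidth_bps, rtt_ms)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    return sock
```

For example, a 1 Gbit/s path with a 60 ms round trip needs about 7.5 MB of buffer (`bdp_bytes(1_000_000_000, 60)` returns 7500000), far more than typical default socket buffers. On Linux, setting `SO_RCVBUF` explicitly also disables receive-buffer autotuning for that socket, so a real client might prefer raising the system limits instead.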

AmericaView participation

The initial work on the AmericaView REDDnet project will be completed by TexasView with volunteer assistance from other StateViews. Once the system is operational, all StateViews will be given the opportunity to load their data into REDDnet and/or host a REDDnet-GloVis instance. Hosting an instance will let each StateView “brand” both the GloVis interface and the download interface. Each StateView will appear to host the entire AmericaView archive, while in fact each merely provides a window into the REDDnet archive.

Loading data from each StateView will be much simpler if some limitations are placed on file formats and band combinations. We might consider defining a standard set of formats, but this is problematic because so many different formats are in use. One approach would be to offer only one file per scene. That file might be a complete NLAPS dataset (zipped), a full dataset in TIFF format (zipped), or a set of three 3-band JPEG composites (zipped). In all cases, only one zip file would be delivered to the client.
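The one-file-per-scene option could be implemented by zipping each deliverable unit during upload. This sketch uses Python's standard `zipfile` module; the `<scene_id>.zip` naming and the single top-level folder inside the archive are assumptions made for illustration.

```python
import os
import zipfile


def package_scene(scene_id, files, out_dir):
    """Bundle every file of one scene into a single <scene_id>.zip in out_dir.

    Members are stored under one top-level folder named after the scene, so
    the client sees the same layout whatever the source directory structure.
    """
    zip_path = os.path.join(out_dir, scene_id + ".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            zf.write(path, arcname=scene_id + "/" + os.path.basename(path))
    return zip_path
```

The same function works whether the scene's files are NLAPS, TIFF, or JPEG composites, which is the point of the one-zip rule: the download side only ever handles a single archive per scene.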

However this issue is resolved, the upload procedure must be relatively simple if we are to achieve widespread adoption by the AmericaView community. The download procedure must be even more straightforward.

Comments:

Please include comments, ideas, suggestions, etc. here.


AmericaView/REDDnet Use Scenarios

I. Emergency Response

Background

The Center for Earth Resources Observation and Science (EROS) is a data management, systems development, and research field center for the U.S. Geological Survey's (USGS) Geography Discipline. EROS manages the operational functions of most non-commercial, earth observing satellite systems in the United States. One of the critical functions EROS serves is to provide satellite remote sensing data to researchers and responders during times of emergency.

Scenario

A Category 5 hurricane is approaching the upper Gulf Coast of Texas. When a major hurricane struck this coast in 1900, the city of Galveston, Texas, virtually disappeared overnight. Today, the proliferation of offshore oil production, petrochemical plants, and population makes this area even more vital, and more vulnerable, than it was then.

Pre-Landfall

As the storm approaches, special LUNs are created on REDDnet to accommodate data for the storm response. Access permissions are established and priority status is assigned for this activity. EROS selects 25 scenes of the coastal region from the Landsat archive, opting for the most recent cloud-free coverage available, and uploads them to REDDnet. These datasets average around 300 megabytes per scene, for a total of 7.5 gigabytes of data. Exnodes for these data are distributed to AmericaView and to other researchers at The University of Texas Center for Space Research (CSR), the NASA/UL Lafayette Regional Application Center (RAC) at the University of Louisiana at Lafayette, the Columbia Regional Geospatial Service Center (CRGSC) at Stephen F. Austin State University, and other regional research and government agencies. REDDnet allows efficient, simultaneous download to multiple sites. These data are used for pre-landfall planning and staging. Results are relayed to the Texas Governor’s Division of Emergency Management (GDEM), the Texas Department of Public Safety, Texas Military Forces (TMF), and other state and local agencies.

EROS also coordinates distribution of other datasets that will be useful before landfall. Light Detection and Ranging (LiDAR) data for the coastal region has been collected by the University of Texas Bureau of Economic Geology (BEG). This dataset provides highly accurate topographic data for the region; because of that detail, it is large, totaling several terabytes. CSR facilitates uploading of the LiDAR dataset to REDDnet. Exnodes are distributed to researchers working on the SURA Coastal Ocean Observing and Prediction (SCOOP) project, where the data are used to produce predictive models of waves and storm surge. Because of the collaboration between SURAgrid and REDDnet, SCOOP will be able to access data directly on REDDnet, eliminating the need to download large datasets. The model results are uploaded to REDDnet, distributed, and used to develop response plans, including pre-deployment of equipment and personnel. In addition to helping emergency response personnel make difficult decisions under crisis conditions, this activity will save lives by allowing more accurate and timely evacuation determinations.

Post Landfall

After landfall, a large-scale, high-resolution aerial acquisition is commissioned, covering the entire area affected by the storm. These data are captured at 1-meter ground resolution and total approximately 3 terabytes. This imagery is needed to help guide emergency resources to the areas where they will do the most good. Once again, REDDnet allows this large dataset to be distributed to multiple researchers and agencies simultaneously. The only practical alternative is to load the data onto USB hard drives and deliver them by vehicle courier.

Meanwhile, researchers at the Columbia Regional Geospatial Service Center at Stephen F. Austin State University are using electric grid and water supply data to map outage areas and recommend where emergency resources will have the most impact. The results of this work are relayed to the State Operations Center and to national labs that are modeling the impact of storm activity on critical facilities for the Department of Energy (DOE).

Significance

The role of REDDnet in this scenario is to provide a dependable mechanism for distributing large datasets to multiple sites simultaneously, rapidly, and reliably. In addition, REDDnet does this in a generic, easily accessible way, making data available to a large contingent of users. Potential users range from national science labs, to grid-based modelers, to State Operations Centers, to university researchers, to local law enforcement personnel. REDDnet has real potential to improve response, speed up recovery, and save lives.

II. LDCM Data Distribution

Background

The Landsat Data Continuity Mission (LDCM) is the follow-on to the Landsat program, continuing more than 30 years of imaging the Earth from space. LDCM is the next-generation Landsat satellite and is projected to be launched during the summer of 2011. It will ensure the continued acquisition and availability of Landsat-like data well into the future. As with the current Landsat missions, the operational phase of LDCM will be run by the USGS EROS National Center. Unlike previous Landsat data, LDCM data will be distributed free of charge under current EROS plans. LDCM archives are expected to grow at a rate of approximately 520 GB per day. EROS has issued an RFP for researchers interested in high-volume ingest of LDCM data. The current definition of “high-volume” is 25 or more Landsat scenes per day (approximately 32.5 GB). Many AmericaView state consortiums, including TexasView, could qualify as high-volume users. At present, EROS plans to distribute data to high-volume users only on high-capacity hard media.

Scenario

EROS installs a standard deployment of 8 REDDnet depots at the National Center in Sioux Falls, South Dakota. LDCM data is loaded into the EROS depots daily, and the exnodes are published through a LoDN-like web service. Users access the exnodes and download the data on demand. Data expires after 30 days and is automatically purged from the system. At 520 GB per day, the 30-day window holds approximately 15.6 TB, leaving a comfortable margin.
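The sizing and expiry rules in this scenario reduce to simple arithmetic and a date comparison. The sketch below assumes a hypothetical catalog mapping each scene ID to its upload time; `steady_state_tb` reproduces the 15.6 TB figure (520 GB/day × 30 days, using 1 TB = 1000 GB) and `expired` selects the scenes due for purging.

```python
from datetime import datetime, timedelta

DAILY_INGEST_GB = 520   # projected LDCM archive growth cited above
RETENTION_DAYS = 30     # expiry window in this scenario


def steady_state_tb(daily_gb=DAILY_INGEST_GB, days=RETENTION_DAYS):
    """Volume held once the window is full: daily ingest x retention (1 TB = 1000 GB)."""
    return daily_gb * days / 1000


def expired(uploads: dict[str, datetime], now: datetime, days: int = RETENTION_DAYS) -> list[str]:
    """Return, sorted, the scene IDs uploaded more than `days` days before `now`.

    `uploads` is a hypothetical catalog mapping scene ID -> upload datetime.
    """
    cutoff = now - timedelta(days=days)
    return sorted(scene for scene, uploaded in uploads.items() if uploaded < cutoff)
```

A daily purge job would call `expired` with the current time and release the returned scenes, keeping the depots near the steady-state volume.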

Standing orders for high-volume data users leverage data logistics to improve performance. For example, LDCM data for Texas is automatically pre-staged on the REDDnet depots at Stephen F. Austin State University (SFA) upon upload.
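A standing order can be modeled as a set of WRS-2 path/row footprints plus the depot site that should receive matching scenes. This sketch assumes LDCM scene IDs keep the current Landsat layout, in which characters 4 through 9 hold the path and row (so `LT50260391987123XXX02` gives path 26, row 39). The order contents and site names in the test values are placeholders, not actual Texas coverage or REDDnet configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandingOrder:
    site: str               # depot site that should receive a pre-staged copy
    footprints: frozenset   # WRS-2 (path, row) pairs the subscriber wants


def path_row(scene_id):
    """Extract (path, row) from a Landsat-style ID such as LT50260391987123XXX02."""
    return int(scene_id[3:6]), int(scene_id[6:9])


def staging_targets(scene_id, orders):
    """Return, sorted, the sites whose standing orders cover this scene."""
    footprint = path_row(scene_id)
    return sorted(order.site for order in orders if footprint in order.footprints)
```

At upload time the ingest process would call `staging_targets` for each new scene and replicate it to every returned site, so the data is already near the subscriber when the download starts.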

Significance

REDDnet can benefit the transfer of LDCM data from EROS to the end user by pre-staging data close to the destination and increasing transfer throughput to the end user. In addition, REDDnet makes possible the use of Internet transfer for high-volume users, reducing the workload at EROS and saving money for the users.

III. AmericaView Repository Data Distribution

Background

AmericaView is a state-based, university-led consortium dedicated to the advancement of satellite remote sensing technology. Among the missions embraced by AmericaView, each state maintains a repository of publicly available satellite remote sensing data, providing access to these data for universities, agencies, and the public.

Many states have sophisticated mechanisms for fulfilling this mission. Even so, because of the size of these datasets, all AmericaView consortiums struggle to deliver the data efficiently and in a timely manner. At the same time, some AmericaView consortia lack the resources or expertise to establish practical data archive, discovery, and distribution systems of their own. Solving these problems is the first goal of the AmericaView/REDDnet initiative.

Scenario

TexasView and other states install local REDDnet depots. AmericaView data is loaded into LUNs defined on a subset of these depots, which is dedicated to persistent storage of AmericaView data. During the upload process, scene lists are developed containing pointers to exnodes on L-Store directory servers. These scene lists are integrated with the GloVis discovery tool and distributed to all AmericaView GloVis sites. Meanwhile, replicas are made across REDDnet and distributed as needed. Users visit any of the AmericaView GloVis sites to search for and identify satellite images that meet their specifications. GloVis hands the request to an application that launches an applet on the user’s computer. The applet uses the scene list to retrieve the exnode and then downloads the data from REDDnet to the user’s computer.
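The applet's lookup step can be sketched as follows. The scene list format here, one `scene_id,exnode_path` pair per line with `#` comment lines, is a hypothetical stand-in for the actual GloVis/L-Store scene list, and the paths in the test values are illustrative only. Fetching each exnode and its data would then be handed to an L-Store client, which is not shown.

```python
import csv
import io


def parse_scene_list(text):
    """Parse a hypothetical scene list of `scene_id,exnode_path` lines.

    Blank lines and lines starting with '#' are ignored.
    """
    entries = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith("#"):
            continue
        entries[row[0].strip()] = row[1].strip()
    return entries


def resolve_selection(selected_ids, scene_list):
    """Map the user's GloVis selection to exnode paths, reporting unknown IDs."""
    found = [scene_list[s] for s in selected_ids if s in scene_list]
    missing = [s for s in selected_ids if s not in scene_list]
    return found, missing
```

Returning the unresolved IDs separately lets the applet tell the user which scenes are not yet in REDDnet instead of failing the whole request.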

Significance

This implementation is functional today in the form of the TexasView prototype. A significant feature for AmericaView is that GloVis instances can be hosted anywhere, regardless of whether a REDDnet depot is present locally. This allows every AmericaView member consortium to offer the service under its own branding, and each GloVis instance has access to all AmericaView datasets. Thus, a member consortium that cannot host an archive of its own can publish its data holdings on the AmericaView/REDDnet system and even host a GloVis instance to access them.


AmericaView/REDDnet Free Satellite Remote Sensing Data Distribution

BACKGROUND: Recent decisions by the United States Geological Survey (USGS) have fundamentally altered the manner in which government-owned remote sensing data is distributed. Previously, data was offered for sale by the USGS. Once purchased by one individual or entity, data was available to be used by anyone else desiring to do so. AmericaView member consortiums have taken advantage of this licensing model by building archives of purchased data and making those data available to the public through state archives. USGS has announced that in the future, all government-owned remote sensing data will be made available to the public for free. The transition to free data distribution has already begun, and many datasets are now offered for free download through the USGS GloVis facility (http://glovis.usgs.gov). The AmericaView/REDDnet project will develop facilities to make free USGS data available through the REDDnet system. This will make access to newly released free USGS data easier and more convenient for many users. The facility will provide a customized GloVis interface featuring the newest free data releases from USGS for the United States. Users will be able to browse these datasets and download them using the fast REDDnet multi-stream protocol.

METHODOLOGY: The system will work by maintaining a list of all data currently available for free from USGS and comparing that list with the master USGS GloVis scene list. Whenever new data appears in the master list, it will be downloaded from USGS, uploaded to REDDnet, and added to the REDDnet GloVis scene list. The new data will be kept in REDDnet for a limited period before being removed to make room for additional new data. Thus, a cache of the latest free data will be available through REDDnet at all times.

SIGNIFICANCE: This project will demonstrate the effectiveness of logistical networking for large-scale geospatial data distribution. It will reduce the time and effort required to identify new data of interest and download those data to local systems. Further, it will off-load a portion of demand from USGS data distribution facilities. Finally, it paves the way for further USGS projects involving Landsat Data Continuity Mission (LDCM) and emergency response data distribution.

IMPLEMENTATION: Initial work on this project began in July 2008. A working prototype, including the GloVis interface, automated discovery and L-Store upload, and L-Store data delivery, is expected to be completed by the end of August 2008.
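The comparison described under METHODOLOGY amounts to set arithmetic on scene IDs. In this sketch the three inputs (the master GloVis list, the list of free scenes, and the scenes already cached on REDDnet) are assumed to be plain collections of scene ID strings; how to obtain them from USGS is still an open question.

```python
def find_new_free_scenes(master_ids, free_ids, cached_ids):
    """Scene IDs to fetch: in the master GloVis list, free, and not yet cached.

    Returned sorted so that repeated runs process scenes in a stable order.
    """
    return sorted((set(master_ids) & set(free_ids)) - set(cached_ids))
```

Intersecting with the master list first means a scene reported as free but not yet published in GloVis is skipped until it appears there.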

ACTION ITEMS:

  1. Get current GloVis contacts from the USGS EROS National Center.
  2. Download the latest GloVis source code.
  3. Re-establish a Linux GloVis development instance.

ISSUES: The following issues have been identified as important for the success of this project:

  1. Exnodes must be removed from the L-Store directory structure when the data they represent is removed from the depots.
  2. How can we determine from the USGS GloVis metadata which datasets are available for free?
  3. How do we download datasets programmatically from USGS data stores?
  4. What are the implications of the different data types available from USGS?
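Issue 1 suggests that eviction from the rolling cache should remove the exnode and the depot data as one operation. The sketch below keeps a first-in, first-out cache bounded by total size in GB; `remove_exnode` and `release_data` are hypothetical callbacks standing in for the L-Store directory and depot operations. It deletes the exnode before freeing the data so that no client can resolve an exnode whose data is already gone.

```python
from collections import OrderedDict


class SceneCache:
    """FIFO cache of scenes held on REDDnet, bounded by total size in GB.

    `remove_exnode` and `release_data` are hypothetical callbacks standing in
    for the L-Store directory and depot operations.
    """

    def __init__(self, capacity_gb, remove_exnode, release_data):
        self.capacity_gb = capacity_gb
        self.remove_exnode = remove_exnode
        self.release_data = release_data
        self.scenes = OrderedDict()  # scene_id -> size_gb, oldest first

    def used_gb(self):
        return sum(self.scenes.values())

    def add(self, scene_id, size_gb):
        """Record a new scene, evicting the oldest scenes until it fits.

        Returns the evicted scene IDs in eviction order.
        """
        evicted = []
        while self.scenes and self.used_gb() + size_gb > self.capacity_gb:
            old_id, _size = self.scenes.popitem(last=False)
            # Drop the directory entry first so the scene cannot be resolved,
            # then free its depot allocations.
            self.remove_exnode(old_id)
            self.release_data(old_id)
            evicted.append(old_id)
        self.scenes[scene_id] = size_gb
        return evicted
```

A scene larger than the whole capacity would empty the cache and still be added; a production version would reject it instead.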