REDDnet user scenarios
One way to inform decisions about the design, configuration, and policy regime of REDDnet is to develop scenarios that characterize how researchers plan to use it to advance their work. This page is intended to provide a place for those scenarios to be developed and recorded.
Questions for developing REDDnet user scenarios
This set of questions is intended to help people generate or fill out scenarios which exemplify expected patterns of usage for REDDnet. It is meant to be suggestive, not exhaustive. In some cases it raises similar issues from different points of view.
- Follow the data flow – What size? How often? Where’s it going?
- How much data is being produced?
- How often is new data collected/generated? e.g., is new data produced every night, week, or month?
- Is it preprocessed for distribution?
- Does everyone need/want the same data? If not, how is it broken up?
- Do users or groups generate derived data products which they then need to distribute?
- User Community – Size and distribution of the user community and their resources
- How many users in the community will be using the data?
- How many sites are expected?
- Are all the sites on fast networks or are some user sites on slower links, relative to the amount of data?
- Where are the computing resources they’ll be using? Where does the computation take place in relation to the data, e.g. on a local cluster, on a remote supercomputer (SC), or on the Grid?
- Is there a significant secondary community, e.g. a consumer community who want to use the products of the first-order community? Example: county planners and agriculture agents who want to access processed satellite imagery for their area.
- Do the user sites have sufficient “working storage”?
- Do user sites archive the data, or some subset of it? Do they or will they provide archival services for other sites?
- Is the community expected to grow? How fast?
- How large is the group that wants to distribute content relative to the group that only wants to consume it? Are most people just downloading or streaming, or do most people want to upload as well?
- Usage Patterns
- Are there constraints on the ways the different members of the community can use it? For example, do the users need streaming because they don’t have adequate storage at their site?
- Will users be providing longer term storage for some of the data they want easier/faster access to?
- Do they need to open the data directly from a program? Which applications are used, and what are they used for?
- How long do they need easy/fast access to the same data? For example, do they need access to the most recent data, which can go into the archive as soon as the next batch of data from the source is made available?
- Is there a difference between the policy for writers and the policy for readers, e.g. for content distribution?
- Is there some data that is freely available, and some that people will need to authenticate to see?
- Will there be any storage that users can use without authenticating? Read from? Write to?
- Does the data have commercial value?
- Will REDDnet need to be a single virtual community?
- Will REDDnet users need OSG, TeraGrid, or other credentials?
Applications and Communities
Currently, 1 TB of simulation data needs to be shared among 3 different researchers at locations on opposite sides of the country (e.g. Raleigh and UC Davis). The data is produced by simulation runs at ORNL or NERSC. At one step in the process, the data needs to be moved to UC Davis for volume rendering. The current rate for that step is 5 MB/s, which means that it takes about a week of continuous operation to move a TB there. This group shares this kind of data about every month or two, but that is partly because moving the data is such a laborious process. Much more data than this is generated, but it isn't shared and stays at ORNL.
[The description above is just a stub based on an earlier email from John Blondin. I plan to fill it out in the near future by questioning John further. Terry]
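As a rough check of the transfer figures in this scenario, the Python sketch below relates data volume, transfer rate, and elapsed time. It assumes decimal units (1 TB = 10^12 bytes, 1 MB/s = 10^6 bytes/s) and an uninterrupted transfer; the function names are hypothetical and purely illustrative. Under those assumptions, 1 TB at a constant 5 MB/s takes roughly 2.3 days, while moving 1 TB in a week corresponds to an average of about 1.65 MB/s. This suggests that the quoted week includes interruptions or overhead, or that the sustained rate is lower than 5 MB/s, a point that may be worth confirming when the scenario is filled out.

```python
# Back-of-the-envelope transfer arithmetic for the UC Davis scenario.
# Assumption: decimal units, i.e. 1 TB = 10**12 bytes and 1 MB/s = 10**6 bytes/s.
# Function names are hypothetical, chosen only for this illustration.

SECONDS_PER_DAY = 86_400
TB = 10**12
MB = 10**6


def transfer_days(volume_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to move volume_bytes at a constant, uninterrupted rate."""
    return volume_bytes / rate_bytes_per_s / SECONDS_PER_DAY


def effective_rate(volume_bytes: float, elapsed_days: float) -> float:
    """Average throughput in bytes/s implied by moving volume_bytes in elapsed_days."""
    return volume_bytes / (elapsed_days * SECONDS_PER_DAY)


if __name__ == "__main__":
    # 1 TB at a sustained 5 MB/s: about 2.3 days of pure transfer time.
    print(f"{transfer_days(1 * TB, 5 * MB):.2f} days")
    # 1 TB moved in about a week implies an average of roughly 1.65 MB/s.
    print(f"{effective_rate(1 * TB, 7) / MB:.2f} MB/s")
```

The same two helpers can be applied to the "What size? How often?" questions for other scenarios, for example to estimate the sustained rate needed to deliver a monthly data release within a given time window.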