August 9, 2006
REDDnet Steering Committee Meeting
- Blondin (NCState)
- Hagewood (Nevoa)
- Beck, Moore (UTK)
- Sheldon, Stewart, Tackett (VU)
- Blackwell (SFA)
- Cavanaugh (Florida)
- Swany (UDel/CERN)
- Discuss with committee how best to engage the applications that were part of the REDDnet proposal: how to get them using the facility.
- Items from the floor
Report from Alan on REDDnet implementation plans for the next three years.
Trying to evaluate hardware possibilities.
The building blocks we are looking at run about $1K/TB.
Hopefully for SC06 we will put together 10-20 of these for initial tests; we think we might be able to saturate as much as 20 Gb/s with 20 of these boxes.
This would be ~80 TB.
After SC06, start to deploy these around the sites and test, then deploy more (1/3 of the hardware this year?).
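The hardware figures above imply some quick back-of-the-envelope numbers (a sketch; the ~4 TB-per-box size is inferred from the ~80 TB / 20-box and $1K/TB figures, not stated directly in the notes):

```python
# Back-of-the-envelope check of the SC06 deployment figures above.
# Assumption: ~80 TB across 20 boxes implies ~4 TB per building block.
boxes = 20
tb_per_box = 80 / boxes        # ~4 TB per box (inferred, not stated)
cost_per_tb = 1_000            # dollars, from the $1K/TB figure
aggregate_gbps = 20            # target saturation with all 20 boxes

total_tb = boxes * tb_per_box          # ~80 TB total
total_cost = total_tb * cost_per_tb    # ~$80K of storage hardware
per_box_gbps = aggregate_gbps / boxes  # ~1 Gb/s sustained per box
```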
Software: At SC06 we will have a baseline system with most things, including erasure coding but without distributed metadata: a version 1 of the software.
Question for John: how much is the TSI using the LN tools? Answer: not much right now, but the collaboration is moving to a much more collaborative mode and we will need these tools. Data sizes run from a few hundred GB to several TB for one simulation (done at Oak Ridge). The main collaboration sites are NC State, UCSD, ORNL, SUNY Stony Brook, and Florida Atlantic.
Phoebe: data sets up to several hundred GB now; 2-10 GB is the data that we would want to share. We do image processing, so we would have several "copies" of the data that are slightly different. We have proposed work with LBNL, but right now it is just us locally at Vanderbilt; we just need space.
Alan: probably in the first quarter of next year we will have tape integrated...
John: REDDnet would store working copies, archived copies at ORNL. A given simulation will be actively looked at for several months, needs to persist for that long.
Micah: could we use the tape that is part of REDDnet to support stability, as opposed to long-term archiving?
John: at any one point in time, we will need a few TBs of space at least.
Terry: but maybe you want more than one copy? 3-4 copies? So ~15 TB?
Micah: you will be running on Jaguar at ORNL, how do you get data off of that?
John: working with the people at ORNL, feedback is that this will be improved. Things are much better than PHENIX in terms of network I/O.
Micah: I have been working with Scott Klasky, moving data off Jaguar to the ewok systems, then out to the world.
John: Lustre is supposed to be working much better (according to ORNL folks)... maybe even collective I/O...
Terry: are you still having All-Hands meetings? It would be nice to meet with you, San Diego, and so on all at once to start to talk about this with your group all at once.
At this point we are putting together slightly different solutions with each group, and we are also trying to decide on deployment... we probably need to meet with groups on their own turf, if you will, this year...
Phoebe: (answering a question from Terry) there are many other labs doing work similar to mine. (In response to a question from Alan:) We definitely want to archive our 700 GB of data. We may share the output data, but that is only 2 GB. Right now I only move data around on campus; I haven't had to move it from one institution to another. As for the collaboration with Berkeley, they would only need some of the data, not all 700 GB.
What is REDDnet? Can I read/write natively from my application, or is it like a depot?
Alan: may be able to link some of Micah's tools to read/write natively, but what we plan to offer is the depot. You would use a command very like scp to move data in and out.
Micah: the tools we have that treat LN like a filesystem all come from the fusion energy group and Scott Klasky, who said that is what they need to make it useful. Some users don't like extra steps.
Phoebe: I would like to treat it like a big disk; I need to constantly read from it.
My favorite program is called free-align... I read a chunk of the large input data (one image out of 700,000 or so). It would be nice to have 10 processors, each working on one chunk... check in and get a little bit of data... I have access to the source, so I could re-compile...
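The access pattern Phoebe describes can be sketched as a pool of workers that each seek to one fixed-size chunk of a large input and read only that piece. This is an illustrative sketch only: the chunk size, file layout, and `process_chunk` logic are stand-ins, not the real free-align format or algorithm.

```python
# Sketch of chunked parallel reads: many fixed-size chunks (one "image"
# each, in Phoebe's terms), with a pool of workers that each check out
# one chunk at a time. CHUNK_BYTES is an illustrative stand-in.
from multiprocessing import Pool
import os

CHUNK_BYTES = 1024  # stand-in for one image's size


def process_chunk(args):
    path, index = args
    with open(path, "rb") as f:
        f.seek(index * CHUNK_BYTES)   # jump straight to this chunk
        data = f.read(CHUNK_BYTES)    # read only the piece we need
    return index, len(data)           # real code would align the image here


def run(path, workers=10):
    n_chunks = os.path.getsize(path) // CHUNK_BYTES
    with Pool(workers) as pool:
        return pool.map(process_chunk, [(path, i) for i in range(n_chunks)])
```

The same pattern would apply whether the chunks come from local disk or, eventually, from direct reads against a REDDnet depot.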
Micah: what format is the file?
John: (response to question from Terry): we are using NetCDF for I/O (some HDF5).
Micah: writing NetCDF costs you half your performance... over writing raw binaries...
John: we are flexible.
For the Terascale people: meet with them at their next meeting? At ORNL...
For Phoebe: we will deploy some campus depots in about a month.
Phoebe: we would be most interested in the direct reads...
Micah: I can offer the graduate student who wrote the direct I/O stuff... he could try to help you port your code to use this. The difference is that it would use a different directory. Use that to test performance; then we could work on getting the L-Store directory part implemented. I can send him over; sooner would be better than later? Later next week? (after the ACCRE power outage?) I may try to come with him...
Micah: I was talking to the Internet2 people, who have become very interested in Tennessee (now that it looks like we might be going the NLR route...). They offered space for a side meeting at their next meeting, in Chicago, right downtown (Hyatt), December 4-7. Should we take them up on this offer and have a REDDnet meeting?
Terry: AmericaView sends people to this meeting, often have breakout meetings.
Micah: this would serve as a workshop/collaboration meeting; we could try to get the Brazilians... could we get our NSF program manager to come? We have to play it up to leverage the MRI? Have a ribbon-cutting ceremony/kick-off? We could have tutorials on using the I/O libraries, etc.
Speculative... they have competition now; NLR is competing with them. They are looking for things to differentiate themselves other than just circuit switching. Martin is their only faculty member fellow... could we leverage this somehow?
So, look at your calendars. The meeting would be the day before (Sunday the 3rd) or after (Friday the 8th).
Terry: we will put together something for I2 and send it to them.
Everyone: the response seems to be very positive. Especially good if we can get the application groups.
Consider this a kick-off/all-hands meeting. We should have some preliminary results and can leverage material we put together for SC06.
John will let us know about the TSI ORNL meeting. (new name is prism).