Feb 29, 2008: Difference between revisions

Latest revision as of 11:48, 29 February 2008

Bi-Monthly Collaboration Meeting

Coordinates

  • Feb 29, 2008 -- 11:00ET/10:00CT/8:00PT
  • Call 510-665-5437
  • Meeting ID is 7333

Attending

  • Santi, Alan, Larry, Bobby (Vanderbilt ACCRE)
  • Paul (Vanderbilt)
  • PR and Diane (SFASU)
  • John Cobb (ORNL)
  • Hunter (Nevoa)
  • Terry and Chris (UTK)

Agenda

Status Report on Production REDDnet Deployment

New keys are on their way out to Caltech and UFL. UMich's will be sent out today. Still waiting to hear from UCSD. All sites are willing to help with re-imaging. As depots are re-imaged (as soon as next week) they can be brought online.

Once all existing depots are online, Nagios will be back up.

Each system has a separate partition that will hopefully allow us to install rescue software to help with recovery of lost nodes. But this project still needs some work to complete, and it probably won't be looked at until the end of March.

AmericaView Status

They are routing over I1 and not I2, so that is one particular problem with download speeds. Investigations are going on at SFASU.

There is a timeout issue with the metadata server... but the timeouts are at Vanderbilt... traceroute timing out at Vanderbilt? It could be Vanderbilt doing "traffic shaping".

They will keep us posted.

Everything is uploaded to the default LUN. It's there and coming down fine (modulo the network issues).

They will consider augmenting to SFASU for a persistent copy and then augmenting to default, which will spread the data out to everyone for use.

LODN Status

New depot code is required. It should now work with both the warmer and the LODN code, but this exposed another issue with storecore, so the new depot code will not be distributed until that issue is resolved. Nevoa is looking into this, and once that is working ACCRE will roll out the depot to the rest of REDDnet.

UTK will try to install a local depot so they can do testing. Alan has to package it up a bit better first.

And Chris has fixed a problem locally at UTK (not with LODN code but a campus problem) that was also causing trouble.

Everything works fine now (LODN and warmer) on the test depots at Vanderbilt, so we are optimistic it will work fine on the production depots.

TeraGrid Status/ORNL

David Giles has joined ORNL as a systems person; John has him half time. There is some hope now of getting the UTK depots up.

Around the end of March or early April, Bobby and others from ACCRE should stop by ORNL to meet with David and John to discuss getting the ORNL REDDnet depots up.

Library of Congress Status

To find out how things were going, Terry contacted Andy Boyko at the LC. Here is what Andy said:

"Hi, Terry. I'm putting together the summary email now, but the summary summary is: transfer rate never exceeded 40-50Mbps, which is lower than I hoped for but should still be fine; a bigger concern is that the transfer ended up retrieving only 53439 out of the 56712 files named in the manifest, without any indication of a failure. Hoping we can talk about the best approach to using the client on our call later today..."

Santi and Alan will call Andy at the LC before the LC call this afternoon to discuss technical details.

We need to find out, for example, whether the problem with missing files was on the upload or the download.

And network speed can be worked on.
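
One way to narrow down where the files went missing is to compare the manifest against a listing of what actually arrived. The following is only a minimal sketch, not part of the LC client or the LoDN tools: the function names are hypothetical, and it assumes the manifest is a plain text file with one relative file path per line, which may not match the real manifest format.

```python
from pathlib import Path


def read_manifest(manifest_path):
    """Read expected file names, assuming one relative path per line."""
    with open(manifest_path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def list_retrieved(download_dir):
    """List files found under download_dir as POSIX-style relative paths."""
    root = Path(download_dir)
    return [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]


def missing_files(manifest_names, retrieved_names):
    """Return manifest entries that were never retrieved, sorted by name."""
    return sorted(set(manifest_names) - set(retrieved_names))


def summarize(expected_count, retrieved_count):
    """Return (number missing, percentage of the manifest missing)."""
    missing = expected_count - retrieved_count
    return missing, 100.0 * missing / expected_count


if __name__ == "__main__":
    # Counts taken from Andy's message above.
    missing, pct = summarize(56712, 53439)
    print(f"{missing} files missing ({pct:.1f}% of the manifest)")
```

With the counts Andy reported, this gives 3273 missing files, about 5.8% of the manifest. Running `missing_files` once against a listing taken after the upload and once against the downloaded directory would indicate which of the two steps dropped them.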

OSG Status

Dan was not present and could not report, but he is making progress. For example, his gridftp server works completely now: you can upload or download a file via gridftp into or out of REDDnet, and the files appear on the LODN website.

CMS Status

Dan was not present.