Operations and Deployment

From ReddNet

Revision as of 09:45, 31 January 2008

Deployment

  • Bring current deployment up-to-date (estimate complete by mid-Feb)
    • Build new image for 2GB internal USB memory
      • may be done by All-Hands
    • Design and implement a depot recovery process
      • This process will be vetted on current deployed hardware
      • Initial process may be done by All-Hands or soon thereafter
      • Begin with SFASU for initial vetting
    • Send recovery keys out to sites and update the depots
      • early February
      • need to recruit one person at each site to assist
    • Set Nagios back up
      • need a more stable system before this makes sense
      • turned off during this transition period to avoid flood of diagnostics
      • needs a day or two of time, could be done now depending on priority
  • Prepare additional existing hardware for deployment (19 nodes)
    • Update image on internal USB (will use for testing of the above recovery process)
    • Send 6 depots to SFASU with additional PDU
    • Find new collaborators/sites for remaining 13.
      • Ideas?
  • Develop MOU for current deployment (timescale uncertain?)
    • Longer term project
  • Define a standard set of software tools for depots
    • The following also exist
      • iperf
      • Nagios
      • mtr
    • other tools to be added?
      • investigating new tools now
  • Gain experience with existing deployment
    • See "Validation Framework" below.
    • make frequent reports in weekly REDDnet meetings
    • review deployment experience end of March
  • Proposed multi-tiered system for sites for discussion
    • Tier 1: Sites that run their own LServer and Chord ring
    • Tier 2: Sites that manage their own REDDnet depots
    • Tier 3: Sites that use their own storage resources as depots
    • Develop MOU for each tier
  • Investigate new management tools over the next two months
    • rsync or similar (short term)
    • Perceus (long term)

Monitoring

  • Use StorCore, Nagios, iperf, and visualization tools from SC07
    • Have a statistics page that gathers information from tests and presents it cleanly
      • Long term project - 6 months to a year?
  • What is required to provide adequate support for REDDnet
    • Want feedback on this.
    • Setting expectations?
    • Actively talk to users over the next couple of months - 6 months - ongoing
    • Develop an initial plan by mid-Feb, then review every 3 months and tweak
  • Create a REDDnet status site, using Google Maps
    • short term just get green dots on a map (what is the priority for this?)
    • longer term - expected to evolve, integrate with the vis site
  • Create an RT site to resolve users' issues
    • needs to happen quickly. mid-Feb.
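
As an illustration of the short-term "green dots on a map" goal for the status site, here is a minimal Python sketch that turns per-depot check results into marker records a map page could plot. The `DepotCheck` type, its field names, the site names, and the placeholder coordinates are all assumptions made for this example; how the checks would actually be gathered (for instance from Nagios) is not specified in these notes.

```python
import json
from dataclasses import dataclass


@dataclass
class DepotCheck:
    """Result of one depot check (hypothetical structure, not an existing REDDnet type)."""
    site: str
    latitude: float
    longitude: float
    reachable: bool


def to_markers(checks):
    """Convert check results into marker dicts: green if reachable, red otherwise."""
    return [
        {
            "site": c.site,
            "lat": c.latitude,
            "lng": c.longitude,
            "color": "green" if c.reachable else "red",
        }
        for c in checks
    ]


if __name__ == "__main__":
    # Placeholder sites and coordinates, for illustration only.
    sample = [
        DepotCheck("site-a", 0.0, 0.0, True),
        DepotCheck("site-b", 1.0, 1.0, False),
    ]
    # The JSON output could be served to a map page that draws one dot per entry.
    print(json.dumps(to_markers(sample), indent=2))
```

Keeping the marker generation separate from the checks themselves would let the same output feed a simple map now and the longer-term visualization site later.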

Validation Framework

  • Stress and WAN testing on Production REDDnet
    • Automated testing with Clyde
      • exercise system prior to heavy real world use
      • unfortunately longer term - first of April?
      • this testing will move to test deployment eventually
    • Real world use (happening now, although not heavy)
  • QA testing on Test REDDnet required before moving into production REDDnet
    • A stringent set of tests covering the hardware, OS, IBP, and LStore as thoroughly as possible (primarily Clyde)
    • Allow users to test using this system