CMS


Goals

  • L-Store plugin for ROOT, so that the ROOT I/O layer can read (and perhaps write) L-Store files (see the sketch after this list)
  • CMS then uses REDDnet as temporary storage for user analysis (Tier 3 and below)
  • Other CMS applications are possible; begin with the above.
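
A rough sketch of how such a plugin could look. The class name TLStoreFile, the lstore: URL scheme, and the ls_* client calls are illustrative assumptions rather than the real L-Store API; only the TFile machinery is genuine ROOT, which remote-file plugins such as TNetFile and TDCacheFile extend in the same way by overriding TFile's Sys* primitives.

 // Sketch only: TLStoreFile, the "lstore:" scheme, and the ls_* calls
 // are assumptions, not the real L-Store API.
 #include "TFile.h"
 #include <cstdio>   // SEEK_SET, SEEK_CUR

 // Assumed L-Store client interface (NOT the real API):
 extern "C" {
    void *ls_open (const char *url);
    int   ls_read (void *h, void *buf, int len, long long off);
    void  ls_close(void *h);
 }

 class TLStoreFile : public TFile {
 public:
    TLStoreFile(const char *url, Option_t * /*option*/ = "READ")
       : TFile(url, "NET"), fHandle(0), fPos(0)   // "NET": skip local open
    {
       fHandle = ls_open(url);        // fetch exNode / contact depots
       if (fHandle) Init(kFALSE);     // read the ROOT header via SysRead()
       else         MakeZombie();
    }
    virtual ~TLStoreFile() { if (fHandle) ls_close(fHandle); }

 protected:
    virtual Int_t SysRead(Int_t, void *buf, Int_t len)
    {
       Int_t n = ls_read(fHandle, buf, len, fPos);   // IBP block reads
       if (n > 0) fPos += n;
       return n;
    }
    virtual Long64_t SysSeek(Int_t, Long64_t offset, Int_t whence)
    {
       if      (whence == SEEK_SET) fPos = offset;
       else if (whence == SEEK_CUR) fPos += offset;
       else                         fPos = fEND + offset;   // SEEK_END
       return fPos;
    }
    virtual Int_t SysClose(Int_t) { return 0; }

 private:
    void     *fHandle;   // opaque L-Store/LoRS session handle
    Long64_t  fPos;      // current logical read offset
 };

 // Once registered with ROOT's plugin manager (see General Testing below),
 // TFile::Open("lstore://host/path/file.root") dispatches to this class.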

Benchmarks

  • IBP --> CMSSW streaming tests
    • CMSSW_1_2_0
    • ROOT/L plugin
      • extends the TFile class
    • input: 1.5 GB ROOT file, 100 events
  • Running on VAMPIRE (2 GHz CPU):
Vanderbilt has a 300 MByte/s connection to the outside
Data Source               | # Depots | URL                           | Ping (ms) | iperf in | iperf out | File download | CMSSW 100 events (min)
--------------------------|----------|-------------------------------|-----------|----------|-----------|---------------|-----------------------
local gpfs                | 0        | /gpfs2/                       |           |          |           |               | 12
vanderbilt reddnet depots | 10       | vudepot1.accre.vanderbilt.edu | 0.162     |          |           |               | 1.5
across-campus ibp depot   | 1        | vpac12.phy.vanderbilt.edu     | 0.458     |          |           |               | 4.5
remote reddnet depots     | 5        | ounce.cs.utk.edu              | 12.7      |          |           |               | 21
                          |          | pound.cs.utk.edu              | 12.8      |          |           |               |
                          |          | acre.cs.utk.edu               | 12.7      |          |           |               |
                          |          | umich-depot01.ultralight.org  | 82.7      |          |           |               |
                          |          | ibp.its.uiowa.edu             | 35.1      |          |           |               |
  • Running at Caltech (2.4 GHz Opteron):
Caltech has a 10 Gbit/s connection to UltraLight
Data Source               | # Depots | URL                           | Ping (ms) | iperf in | iperf out | File download | CMSSW 100 events (min)
--------------------------|----------|-------------------------------|-----------|----------|-----------|---------------|-----------------------
local disk                | 0        | /dev/hda3                     |           |          |           |               | 1
vanderbilt reddnet depots | 10       | vudepot1.accre.vanderbilt.edu | 100       |          |           |               |
campus ibp depot          | 1        |                               |           |          |           |               |
remote reddnet depots     | 5        | ounce.cs.utk.edu              |           |          |           |               |
                          |          | pound.cs.utk.edu              |           |          |           |               |
                          |          | acre.cs.utk.edu               |           |          |           |               |
                          |          | umich-depot01.ultralight.org  |           |          |           |               |
                          |          | ibp.its.uiowa.edu             |           |          |           |               |

Current Work in Progress

  • Figure out how to get the necessary code included in CMSSW
    • Talk to Bill Tannenbaum, Philippe Canal, ...
    • Include the L-Store code in the CMSSW distribution so it is built correctly on each platform for use with the rest of the CMS software.
    • That way users have no software to download themselves, no configuration scripts to change, etc.
    • How do we test and validate before checking in?
    • How do we check the code in?
  • Figure out all issues needed to integrate with the CMS data distribution model
    • PhEDEx, TFile, DBS/DLS, ...
  • Switch the ROOT plugin to use the L-Store version of libxio

Demos

Demos at the March 2007 OSG Grid Meeting (UC San Diego)

Vladimir's or Dmitri's analysis can be used for all of the demos below.

Interactive Root

  • First upload a file to depots spread across the WAN, using LORSView to show where the pieces go and how.
  • Then read the file back in ROOT to show that it works (see the session sketch below).
  • Mainly an introduction to the issues.
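
A sketch of what the interactive half could look like, assuming the plugin is registered for an lstore: scheme. The URL is a hypothetical placeholder; "Events" is the standard CMSSW tree name.

 root [0] TFile *f = TFile::Open("lstore://vudepot1.accre.vanderbilt.edu/demo/events.root")
 root [1] TTree *events = (TTree*) f->Get("Events")
 root [2] events->GetEntries()
 root [3] events->Draw("...")   // any visually interesting quantity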

100 Node ACCRE Run

  • Each node reads its own file from the WAN set of depots.
  • Show the speed versus local copies of the file (data-tethered analysis).

100 CPU Grid Job

  • Similar to the ACCRE run, but each job reads its own file from the WAN depots.
  • Jobs are distributed across Open Science Grid sites.
  • Demonstrates the complete absence of a data tether.

To Do To Get Ready

  • Run all of the above many times before the actual demo!
  • Get the LORSView upload working
  • Figure out how to submit the 100 CPU Grid Job.
  • Do we want to run all 100 ACCRE jobs simultaneously? Need to work with ACCRE on that...

Get Rick Cavanaugh to run his analysis

  • Needs most of the pieces required for the "Summer 2007 demo", though maybe not all fully in place.
  • He runs the analysis himself.
  • Work with him so he understands the full functionality available.
  • Work with him to develop ideas for better implementing the Summer 2007 demo
    • what docs are needed
    • best approach to getting users using it
    • etc.

Summer 2007 Demo

  • A "scratch space" demo for CMS users.
  • Use deployed REDDnet resources which should become available June 2007
  • Load REDDnet with "hot" data files, convince a few users to try them out
  • Must have L-Store code fully integrated with CMS software

General Testing

verify ROOT/L works

  • package up the plugin for the CMS L-Store test community (see the registration sketch below)
  • gain experience via benchmarking
  • finalize the API (add write access?)
  • check the plugin in to the ROOT CVS repository
    • it will take a while for this ROOT addition to propagate into CMSSW
  • explore CMSSW procedures for check-in of LoRS and/or L-Store
    • it will take months for L-Store to be available for check-in
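
Packaging could then amount to the shared library plus a one-line plugin registration. A sketch for a rootlogon.C, reusing the hypothetical TLStoreFile class and LStoreROOT library names from the Goals section; the same registration can live in .rootrc as a Plugin.TFile entry.

 // rootlogon.C sketch -- class and library names are assumptions
 {
    gROOT->GetPluginManager()->AddHandler(
       "TFile",                                // base class to extend
       "^lstore:",                             // URL scheme to match
       "TLStoreFile",                          // plugin class
       "LStoreROOT",                           // shared library to load
       "TLStoreFile(const char*,Option_t*)");  // constructor prototype
 }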

increasing levels of stress tests:

(validate and benchmark)

do various combinations of the following (timed with a small harness like the sketch after this list):

  • single jobs vs simultaneous jobs
  • many jobs: at one cluster vs across the grid
  • simultaneous jobs hitting same ibp depot accessing one or many files
  • simultaneous jobs hitting same file at one depot or striped across many depots
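
A minimal timing harness for these combinations might look like the following; the depot URL is a hypothetical placeholder, and the lstore: plugin is assumed to be registered. "Events" is the standard CMSSW tree name.

 // Minimal streaming-benchmark sketch; the URL is hypothetical.
 #include "TFile.h"
 #include "TTree.h"
 #include "TStopwatch.h"
 #include <algorithm>
 #include <cstdio>

 int main()
 {
    TStopwatch timer;
    timer.Start();
    TFile *f = TFile::Open("lstore://depot.example.org/test/sample.root");
    if (!f || f->IsZombie()) { std::fprintf(stderr, "open failed\n"); return 1; }
    TTree *events = (TTree *) f->Get("Events");   // standard CMSSW tree
    if (!events) return 1;
    Long64_t n = std::min(events->GetEntries(), (Long64_t) 100);
    for (Long64_t i = 0; i < n; ++i)
       events->GetEntry(i);                       // stream event i over IBP
    timer.Stop();
    std::printf("%lld events, %.1f MB read, %.1f s wall\n",
                n, f->GetBytesRead() / 1e6, timer.RealTime());
    f->Close();
    return 0;
 }

Running the same binary singly or in many simultaneous instances, at one cluster or across grid sites, against one depot or a striped set, covers the combinations above.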

Also need to benchmark and profile various types of jobs:

  • I/O-intensive skims
  • CPU-intensive jobs
  • show benchmarks demonstrating which jobs work well with L-Store and which (if any) do not; have thorough benchmarks for the worst-case scenario.
  • gather numbers to discuss the impact on bandwidth as L-Store usage takes off.
  • will people feel more free to do unnecessary computations?

assemble interactive analysis demos:

  • host variety of interesting datasets
    • need to identify these datasets
  • make a wiki
    • with instructions
    • links to necessary data catalogs
      • L-Store
      • DBS/DLS
  • gather visually interesting ROOT Macros
    • event/detector displays
    • histograms/results
  • any FWLite tools (even development versions) to try

assist user-analysis batch production:

  • identify and host a wide variety of datasets
    • calibration datasets
    • various backgrounds, pileup
    • variety of signal samples
  • populate catalogs to find datasets
  • web tools to assist this
    • how to find datasets
    • how to upload results
    • how to register results in catalogs
    • how to coordinate with L-Store and DBS/DLS

provide info on joining CMS/L (via L-Store? via REDDnet? via UltraLight?)

  • how to add an IBP depot


Long Term Needs

  • L-Store version of libxio
  • Stable, production REDDnet depot deployment
    • including request-tracker support!
  • L-Store software fully integrated with the CMS software and being distributed.
    • This means the source code for the L-Store version of libxio must be checked in to the CMS distribution.

SRM interface

  • In principle, this is important for CMS usage.
  • Need to get the new support person on board with this and up to speed.