
PhEDEx: a novel approach to robust Grid data management

Tim Barrass, Dave Newbold and Lassi Tuura
All Hands Meeting, Nottingham, UK
22 September 2005
What is PhEDEx?
• A data distribution management system
  - Used by the Compact Muon Solenoid (CMS) High Energy Physics (HEP) experiment at CERN, Geneva
• Blends traditional HEP data distribution practice with more recent technologies
  - Grid and peer-to-peer file sharing
• Scalable infrastructure for managing dataset replication (see the sketch after this list)
  - Automates low-level activity
  - Allows managers to work with high-level dataset concepts rather than low-level file operations
• Technology agnostic
  - Overlies Grid components
  - Currently couples LCG, OSG, NorduGrid and standalone sites
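
A minimal sketch of what the dataset-level view means in practice, assuming a toy replica catalogue; the names (Subscription, expand_to_transfers) are hypothetical and not part of the actual PhEDEx code:

```python
# Hypothetical sketch: a manager subscribes a site to a dataset, and
# the system expands that into per-file transfer tasks, so the manager
# never deals with individual files. Names are illustrative only.
from dataclasses import dataclass


@dataclass
class Subscription:
    dataset: str        # high-level dataset name, e.g. "/Raw/Run2005A"
    destination: str    # site name, e.g. "T1_RAL"
    priority: int = 1   # relative priority for conflict resolution


def expand_to_transfers(sub, catalogue):
    """Turn one dataset subscription into per-file transfer tasks.

    `catalogue` maps dataset name -> list of (file, source_site);
    in a real system this would come from a replica catalogue.
    """
    return [
        {"file": f, "from": src, "to": sub.destination, "prio": sub.priority}
        for f, src in catalogue[sub.dataset]
    ]


catalogue = {"/Raw/Run2005A": [("f1.root", "T0_CERN"), ("f2.root", "T0_CERN")]}
tasks = expand_to_transfers(Subscription("/Raw/Run2005A", "T1_RAL"), catalogue)
```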



The HEP environment
• HEP collaborations are quite large
  - Order of 1000 collaborators, globally distributed
  - CMS is just one of the four Large Hadron Collider (LHC) experiments being built at CERN
• Resources are typically globally distributed
  - Organised in tiers of decreasing capacity
  - Tier 0: the detector facility
  - Tier 1: large regional centres
  - Tier 2+: smaller sites: universities, groups, individuals…
  - Raw data is partitioned between sites; highly processed, ready-for-analysis data is available everywhere
• LHC computing demands are large
  - Order 10 petabytes per year created for CMS alone
  - A similar volume of simulated data
  - Plus analysis and user data



CMS distribution use cases

• Two principal use cases: push and pull of data
  - Raw data is pushed onto the regional centres
  - Simulated and analysis data is pulled to a subscribing site
  - Actual transfers are third party: the handshake between active components is what matters, not push or pull
• Maintain end-to-end multi-hop transfer state (see the sketch after this list)
  - Online buffers at the detector can only be cleaned when data is safe at Tier 1
• Policy must be used to resolve these two use cases
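
To make the end-to-end state tracking concrete, here is a minimal sketch assuming invented hop names; the FileTransfer class is hypothetical, not PhEDEx's actual schema:

```python
# Hypothetical sketch of multi-hop transfer state, under the rule
# stated above: the online buffer at the detector may only be
# cleaned once the data is known safe at a Tier 1 site.
HOPS = ["T0_buffer", "T0_export", "T1_import", "T1_tape"]


class FileTransfer:
    def __init__(self, filename):
        self.filename = filename
        self.completed = set()  # hops confirmed done so far

    def confirm(self, hop):
        """Record that one hop in the chain has completed."""
        assert hop in HOPS
        self.completed.add(hop)

    def may_clean_buffer(self):
        # Only end-to-end success counts: partial progress through
        # intermediate hops is not enough to release the buffer.
        return "T1_tape" in self.completed


t = FileTransfer("run2005A_001.root")
t.confirm("T0_export")
assert not t.may_clean_buffer()  # not yet safe at Tier 1
t.confirm("T1_tape")
assert t.may_clean_buffer()      # now the online buffer can be cleaned
```
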
PhEDEx design
• Assume every operation is going to fail!
• Keep complex functionality in discrete agents
  - Handover between agents is minimal
  - Agents are persistent, autonomous, stateless and distributed
  - System state is maintained using a modified blackboard architecture (see the sketch after this list)
• Layered abstractions make the system robust
• Keep local information local where possible
  - Enables site administrators to maintain local infrastructure
  - Robust in the face of most local changes
  - Deletion and accidental loss require attention
• Draws inspiration from agent systems, “autonomic” computing and peer-to-peer computing
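
A minimal sketch of the agent pattern these bullets describe, assuming the blackboard is a shared task queue (in practice a database); every name here is invented for illustration:

```python
# Hypothetical sketch of a PhEDEx-style agent: persistent (runs
# forever), stateless (all state lives on the shared blackboard),
# and pessimistic (every operation is expected to fail).
import time


def claim_task(blackboard):
    """Atomically take one pending task, or None if there is none."""
    return blackboard.pop() if blackboard else None


def run_agent(blackboard, do_work, poll_seconds=30):
    while True:                       # persistent: never exits
        task = claim_task(blackboard)
        if task is None:
            time.sleep(poll_seconds)  # nothing to do; poll again later
            continue
        try:
            do_work(task)             # the only agent-specific code
        except Exception:
            # Failure is assumed normal: put the task back on the
            # blackboard so this or another agent retries it later.
            blackboard.append(task)
            time.sleep(poll_seconds)
```

Because agents hand over work only through the blackboard, any one of them can be restarted or relocated without losing system state.
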
Transfer workflow overview



Production performance



Service challenge performance



Future directions
• Contractual file routing (see the sketch after this list)
  - Cost-based offers for a given transfer
• Peer-to-peer data location
  - Using Kademlia to partition replica location information
• Semi-autonomy
  - Agents are governed by many small tuning parameters
  - Should they self-modify, or use more intelligent protocols?
• Advanced policies for priority conflict resolution
  - Need to ensure that raw data is always flowing
  - A difficult real-time scheduling problem
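
A minimal sketch of what cost-based offers could look like, with an invented cost model (queue drain time plus transfer time); none of these names come from PhEDEx:

```python
# Hypothetical sketch of contractual file routing: candidate source
# sites make cost-based offers for a transfer, and the contract goes
# to the cheapest offer.
from dataclasses import dataclass


@dataclass
class Offer:
    site: str
    cost: float  # here: estimated hours until the file has arrived


def make_offer(site, file_size_gb, rate_gb_per_h, queue_h):
    # Toy cost: time to drain the site's queue plus transfer time.
    return Offer(site, queue_h + file_size_gb / rate_gb_per_h)


def route(offers):
    """Award the transfer contract to the lowest-cost offer."""
    return min(offers, key=lambda o: o.cost)


offers = [
    make_offer("T1_RAL", file_size_gb=2.0, rate_gb_per_h=40.0, queue_h=1.0),
    make_offer("T1_FNAL", file_size_gb=2.0, rate_gb_per_h=80.0, queue_h=2.5),
]
best = route(offers)  # T1_RAL wins: 1.05 h versus 2.525 h
```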



Summary
• PhEDEx enables dataset-level replication for the CMS HEP experiment
  - Currently manages 200+ TB of data, globally distributed
  - Real-life performance of 1 TB per day sustained per site
  - Challenge performance of over 10 TB per day
• Not CMS-specific, or indeed HEP-specific
• Well placed to meet future challenges
  - Ramping up to O(10) PB per year
  - 10-100 TB per day
  - Data starts flowing for real in the next two years



Extra information
• PhEDEx and CMS
  - http://cms-project-phedex.web.cern.ch/cms-project-phedex/
  - cms-phedex-developers@cern.ch : feel free to subscribe!
  - CMS Computing Model: http://www.gridpp.ac.uk/eb/ComputingModels/cms_computing_model.pdf
• Agent frameworks
  - JADE: http://jade.tilab.com/
  - DiaMONDs: http://diamonds.cacr.caltech.edu/
  - FIPA: http://www.fipa.org
• Peer-to-peer
  - Kademlia: http://citeseer.ist.psu.edu/529075.html
  - Kenosis: http://sourceforge.net/projects/kenosis
• Autonomic computing
  - http://www.research.ibm.com/autonomic/
• General agents and blackboards
  - Where should complexity go? http://www.cs.bath.ac.uk/~jjb/ftp/wrac01.pdf
  - Agents and blackboards: http://dancorkill.home.comcast.net/pubs/



Issues
• Most issues are fabric-related
  - Most low-level components are experimental or not production-hardened
• Tools are typically unreliable under load
• MSS access is a serious handicap
  - PhEDEx plays very fair, keeping within request limits and ordering requests by tape when possible (see the sketch after this list)
• The main problem is keeping in touch with the O(3) people at each site involved in deploying fabric, administration etc.
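
A minimal sketch of the tape-friendly behaviour described above: keep requests for the same tape adjacent so the mass storage system does not remount tapes, and stay within a request limit. The field names and the limit are illustrative only:

```python
# Hypothetical sketch of MSS-friendly staging. `requests` is a list
# of dicts like {"file": ..., "tape": ...}; the function returns at
# most `max_in_flight` of them, sorted so that requests for the same
# tape are adjacent and each tape is mounted only once per batch.
def order_requests(requests, max_in_flight):
    ordered = sorted(requests, key=lambda r: r["tape"])
    return ordered[:max_in_flight]


reqs = [
    {"file": "a.root", "tape": "VOL002"},
    {"file": "b.root", "tape": "VOL001"},
    {"file": "c.root", "tape": "VOL002"},
]
print(order_requests(reqs, max_in_flight=2))
# -> [{'file': 'b.root', 'tape': 'VOL001'}, {'file': 'a.root', 'tape': 'VOL002'}]
```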



Deployment
• 8 regional centres, 16 smaller sites
• 110 TB, replicated roughly twice
• 1 TB per day sustained
  - Over the standard Internet



Testing and scalability



PhEDEx architecture

