
PhEDEx: a novel approach to robust Grid data management

Tim Barrass, Dave Newbold and Lassi Tuura
All Hands Meeting, Nottingham, UK
22 September 2005
What is PhEDEx?
• A data distribution management system
  - Used by the Compact Muon Solenoid (CMS) High Energy Physics (HEP) experiment at CERN, Geneva
• Blends traditional HEP data distribution practice with more recent technologies
  - Grid and peer-to-peer file sharing
• Scalable infrastructure for managing dataset replication (see the sketch after this list)
  - Automates low-level activity
  - Allows managers to work with high-level dataset concepts rather than low-level file operations
• Technology agnostic
  - Overlies Grid components
  - Currently couples LCG, OSG, NorduGrid and standalone sites
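
A minimal sketch of what the dataset-level view means in practice, assuming a toy replica catalogue; the names (Subscription, expand_to_transfers) are hypothetical and not part of the actual PhEDEx code:

```python
# Hypothetical sketch: a manager subscribes a site to a dataset, and
# the system expands that into per-file transfer tasks, so the manager
# never deals with individual files. Names are illustrative only.
from dataclasses import dataclass


@dataclass
class Subscription:
    dataset: str        # high-level dataset name, e.g. "/Raw/Run2005A"
    destination: str    # site name, e.g. "T1_RAL"
    priority: int = 1   # relative priority for conflict resolution


def expand_to_transfers(sub, catalogue):
    """Turn one dataset subscription into per-file transfer tasks.

    `catalogue` maps dataset name -> list of (file, source_site);
    in a real system this would come from a replica catalogue.
    """
    return [
        {"file": f, "from": src, "to": sub.destination, "prio": sub.priority}
        for f, src in catalogue[sub.dataset]
    ]


catalogue = {"/Raw/Run2005A": [("f1.root", "T0_CERN"), ("f2.root", "T0_CERN")]}
tasks = expand_to_transfers(Subscription("/Raw/Run2005A", "T1_RAL"), catalogue)
```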



The HEP environment
• HEP collaborations are quite large
  - Order of 1000 collaborators, globally distributed
  - CMS is just one of the four Large Hadron Collider (LHC) experiments being built at CERN
• Resources are typically globally distributed
  - Organised in tiers of decreasing capacity
  - Tier 0: the detector facility
  - Tier 1: large regional centres
  - Tier 2+: smaller sites: universities, groups, individuals…
  - Raw data is partitioned between sites; highly processed, ready-for-analysis data is available everywhere
• LHC computing demands are large
  - Order 10 petabytes per year created for CMS alone
  - A similar volume of simulated data
  - Plus analysis and user data



CMS distribution use cases

• Two principal use cases: push and pull of data
  - Raw data is pushed onto the regional centres
  - Simulated and analysis data is pulled to a subscribing site
  - Actual transfers are third party: the handshake between active components is what matters, not push or pull
• Maintain end-to-end multi-hop transfer state (see the sketch after this list)
  - Online buffers at the detector can only be cleaned when data is safe at Tier 1
• Policy must be used to resolve these two use cases
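
To make the end-to-end state tracking concrete, here is a minimal sketch assuming invented hop names; the FileTransfer class is hypothetical, not PhEDEx's actual schema:

```python
# Hypothetical sketch of multi-hop transfer state, under the rule
# stated above: the online buffer at the detector may only be
# cleaned once the data is known safe at a Tier 1 site.
HOPS = ["T0_buffer", "T0_export", "T1_import", "T1_tape"]


class FileTransfer:
    def __init__(self, filename):
        self.filename = filename
        self.completed = set()  # hops confirmed done so far

    def confirm(self, hop):
        """Record that one hop in the chain has completed."""
        assert hop in HOPS
        self.completed.add(hop)

    def may_clean_buffer(self):
        # Only end-to-end success counts: partial progress through
        # intermediate hops is not enough to release the buffer.
        return "T1_tape" in self.completed


t = FileTransfer("run2005A_001.root")
t.confirm("T0_export")
assert not t.may_clean_buffer()  # not yet safe at Tier 1
t.confirm("T1_tape")
assert t.may_clean_buffer()      # now the online buffer can be cleaned
```
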
PhEDEx design
• Assume every operation is going to fail!
• Keep complex functionality in discrete agents
  - Handover between agents is minimal
  - Agents are persistent, autonomous, stateless and distributed
  - System state is maintained using a modified blackboard architecture (see the sketch after this list)
• Layered abstractions make the system robust
• Keep local information local where possible
  - Enables site administrators to maintain local infrastructure
  - Robust in the face of most local changes
  - Deletion and accidental loss require attention
• Draws inspiration from agent systems, “autonomic” computing and peer-to-peer computing
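
A minimal sketch of the agent pattern these bullets describe, assuming the blackboard is a shared task queue (in practice a database); every name here is invented for illustration:

```python
# Hypothetical sketch of a PhEDEx-style agent: persistent (runs
# forever), stateless (all state lives on the shared blackboard),
# and pessimistic (every operation is expected to fail).
import time


def claim_task(blackboard):
    """Atomically take one pending task, or None if there is none."""
    return blackboard.pop() if blackboard else None


def run_agent(blackboard, do_work, poll_seconds=30):
    while True:                       # persistent: never exits
        task = claim_task(blackboard)
        if task is None:
            time.sleep(poll_seconds)  # nothing to do; poll again later
            continue
        try:
            do_work(task)             # the only agent-specific code
        except Exception:
            # Failure is assumed normal: put the task back on the
            # blackboard so this or another agent retries it later.
            blackboard.append(task)
            time.sleep(poll_seconds)
```

Because agents hand over work only through the blackboard, any one of them can be restarted or relocated without losing system state.
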
Transfer workflow overview



Production performance



Service challenge performance



Future directions
• Contractual file routing (see the sketch after this list)
  - Cost-based offers for a given transfer
• Peer-to-peer data location
  - Using Kademlia to partition replica location information
• Semi-autonomy
  - Agents are governed by many small tuning parameters
  - Should they self-modify, or use more intelligent protocols?
• Advanced policies for priority conflict resolution
  - Need to ensure that raw data is always flowing
  - A difficult real-time scheduling problem
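
A minimal sketch of what cost-based offers could look like, with an invented cost model (queue drain time plus transfer time); none of these names come from PhEDEx:

```python
# Hypothetical sketch of contractual file routing: candidate source
# sites make cost-based offers for a transfer, and the contract goes
# to the cheapest offer.
from dataclasses import dataclass


@dataclass
class Offer:
    site: str
    cost: float  # here: estimated hours until the file has arrived


def make_offer(site, file_size_gb, rate_gb_per_h, queue_h):
    # Toy cost: time to drain the site's queue plus transfer time.
    return Offer(site, queue_h + file_size_gb / rate_gb_per_h)


def route(offers):
    """Award the transfer contract to the lowest-cost offer."""
    return min(offers, key=lambda o: o.cost)


offers = [
    make_offer("T1_RAL", file_size_gb=2.0, rate_gb_per_h=40.0, queue_h=1.0),
    make_offer("T1_FNAL", file_size_gb=2.0, rate_gb_per_h=80.0, queue_h=2.5),
]
best = route(offers)  # T1_RAL wins: 1.05 h versus 2.525 h
```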



Summary
• PhEDEx enables dataset-level replication for the CMS HEP experiment
  - Currently manages 200+ TB of data, globally distributed
  - Real-life performance of 1 TB per day sustained per site
  - Challenge performance of over 10 TB per day
• Not CMS-specific, or indeed HEP-specific
• Well placed to meet future challenges
  - Ramping up to O(10) PB per year
  - 10-100 TB per day
  - Data starts flowing for real in the next two years



Extra information
• PhEDEx and CMS
  - http://cms-project-phedex.web.cern.ch/cms-project-phedex/
  - cms-phedex-developers@cern.ch : feel free to subscribe!
  - CMS Computing Model: http://www.gridpp.ac.uk/eb/ComputingModels/cms_computing_model.pdf
• Agent frameworks
  - JADE: http://jade.tilab.com/
  - DiaMONDs: http://diamonds.cacr.caltech.edu/
  - FIPA: http://www.fipa.org
• Peer-to-peer
  - Kademlia: http://citeseer.ist.psu.edu/529075.html
  - Kenosis: http://sourceforge.net/projects/kenosis
• Autonomic computing
  - http://www.research.ibm.com/autonomic/
• General agents and blackboards
  - Where should complexity go? http://www.cs.bath.ac.uk/~jjb/ftp/wrac01.pdf
  - Agents and blackboards: http://dancorkill.home.comcast.net/pubs/



Issues
• Most issues are fabric-related
  - Most low-level components are experimental or not production-hardened
• Tools are typically unreliable under load
• MSS access is a serious handicap
  - PhEDEx plays very fair, keeping within request limits and ordering requests by tape when possible (see the sketch after this list)
• The main problem is keeping in touch with the O(3) people at each site involved in deploying fabric, administration etc.
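
A minimal sketch of the tape-friendly behaviour described above: keep requests for the same tape adjacent so the mass storage system does not remount tapes, and stay within a request limit. The field names and the limit are illustrative only:

```python
# Hypothetical sketch of MSS-friendly staging. `requests` is a list
# of dicts like {"file": ..., "tape": ...}; the function returns at
# most `max_in_flight` of them, sorted so that requests for the same
# tape are adjacent and each tape is mounted only once per batch.
def order_requests(requests, max_in_flight):
    ordered = sorted(requests, key=lambda r: r["tape"])
    return ordered[:max_in_flight]


reqs = [
    {"file": "a.root", "tape": "VOL002"},
    {"file": "b.root", "tape": "VOL001"},
    {"file": "c.root", "tape": "VOL002"},
]
print(order_requests(reqs, max_in_flight=2))
# -> [{'file': 'b.root', 'tape': 'VOL001'}, {'file': 'a.root', 'tape': 'VOL002'}]
```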



Deployment
• 8 regional centres, 16 smaller sites
• 110 TB, replicated roughly twice
• 1 TB per day sustained
  - Over the standard Internet



Testing and scalability



PhEDEx architecture

