tftge dec 02

Published on June 18, 2007

Author: CoolDude26

Source: authorstream.com

Content

U.S. ATLAS Grid Production Experience

Kaushik De, University of Texas at Arlington
Troubleshooting and Fault Tolerance in Grid Environments, Chicago, December 11, 2002

U.S. ATLAS Testbed

- U.S. ATLAS testbed launched February 2001

Fabric Testing

Testbed Production

- Goals: demonstrate distributed ATLAS data production, access and analysis using grid middleware and tools developed by the testbed group
- Production (and testing) experience so far:
  - Fast simulation (Atlfast)
    - Short jobs, 5 sites used (all 8 sites certified)
    - Generated ~10 million events during two weeks in July 2002; 6000 files fully catalogued and accessible through the grid
  - Data Challenge production (Atlsim), Phase 1
    - CPU intensive: ~14 hours per job/output file
    - 3 heterogeneous sites participated: 15, 30, 300 nodes; Condor (2) and LSF; 300-1500 MHz
    - Generated 200k events, 5000 files in August 2002
  - DC Phase 2
    - ~25 hours per job, 50-60k events in January 2003
    - Pre-production testing started

General Remarks

- Tackled a large number of complex issues
  - repackaging of applications (by hand)
  - software deployment (PACMAN)
  - site verification (GridView)
  - production tools (GRAT, Grappa)
  - data management (magda)
  - VO management (BNL tools)
  - ...
- Troubleshooting
  - ignore & resubmit, check log files, databases
  - most of the troubleshooting done by tool developers - not a robust operations model!
- Fault tolerance
  - redundancy, independent verification process, concatenated logs, error handling
- Not a production environment yet - still a development testbed doing production!

Databases Used in U.S. DC1

- MySQL databases play a central role in U.S. DC1 production scripts
- Production database
  - used to track job status (filename, submitting site, processing site, job id, time started, time finished, temporary and final file locations…)
  - information is updated periodically during job
- Data management
  - to transfer input and output files using GridFTP
  - to register file locations in Magda catalogue
- Virtual Data Catalogue
  - used to define job (transformation)
  - store job parameters, random numbers
- Metadata catalogue
  - store post-production summary information
  - data provenance, physics summary...

GRAT Software

- ~50 independently executable modular scripts based on Globus and magda
- Minimal requirement on grid production site
  - Globus & Magda installed on gatekeeper
  - shared $ATLAS_SCRATCH disk for all nodes
- Automatic job submission under full user control
  - One, many or infinite sequence of jobs
  - at one or many sites, using grid even for local submits
  - Any user from any site can submit production jobs
- Independent data management scripts to check consistency of production semi-automatically
  - query production database
  - check Globus for job completion status
  - check data catalog (magda) for output files
  - recover from many possible production failures
- Data management using magda: moving and registering output files to BNL HPSS and at replica locations on the grid

GRAT Execution Model

1. Resource Discovery
2. Partition Selection
3. Job Creation
4. Pre-stage
5. Batch Submission
6. Job Parameterization
7. Simulation
8. Post-stage
9. Cataloging
10. Monitoring

GRAT Job Scheduling

[Diagram. Labels: Create job script module; Replica storage select module; Site select module; Stage software on atlas_scratch; Move files/cleanup module; Execute Atlsim Job; Query environment; Partition select module; Scheduler; Gatekeeper; Queue; Node; ATLAS_SCRATCH; Virtual Data Catalogue; Register; Production; Magda Database]

DC1 Jobs on U.S. Grid

DC1 Production Experience

- Grid production requires robust software
- During 18 days of grid production (in August), every system died at least once
- Local experts were not always accessible (many of them on vacation)
- Examples:
  - scheduling machines died 5 times (thrice power failure, twice system hung)
  - Long network outages - multiple times
  - Gatekeeper - died at every site at least 2-3 times
  - Three databases used - production, magda and virtual data. Each inaccessible at least once!
  - Scheduled maintenance - HPSS, Magda server, LBNL hardware, LBNL Raid array…
- These outages should be expected on the grid, as we include many more sites
- We managed > 100 files/day (~75% efficiency) in spite of these stoppages!

Future Plans

- Continue production/development
  - Pileup data production (data - not cpu intensive)
  - other production/analysis use cases
- GRAT improvements
  - Use Condor-G for job submission
    - detailed plan developed working with Condor team
    - need database publication of Condor log files
    - 1 month time-scale
  - Use DAGMan for pileup production
    - nice use case - hundreds of nodes to be managed over many days or many weeks
    - 3 month time-scale
  - Migrate to Chimera
    - 6 month time-scale
  - MDS integration (using GLUE & Pippy schema)
  - Implement resource broker
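
Illustrative sketch (not part of the original slides): the GRAT slides describe independent scripts that query the production database, check the data catalog for output files, and recover from failures by resubmitting. The Python below is one rough way such a consistency check might look. It is only a sketch under stated assumptions: an in-memory SQLite database stands in for the MySQL production database, a plain set of file names stands in for the Magda catalogue, and the table layout, the file and site names, the `hours_running` column, the 28-hour cutoff and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical schema, loosely modelled on the job-status fields the slides
# list for the production database. The real tables are not shown there.
SCHEMA = """
CREATE TABLE jobs (
    job_id          INTEGER PRIMARY KEY,
    filename        TEXT NOT NULL,
    submitting_site TEXT,
    processing_site TEXT,
    status          TEXT NOT NULL,
    hours_running   REAL NOT NULL
)
"""


def make_production_db(rows):
    """Build an in-memory stand-in for the production database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def jobs_to_resubmit(conn, catalogued_files, max_hours=28.0):
    """Return the ids of jobs that look lost, in ascending order.

    A job is flagged when it is marked failed, when it is marked finished
    but its output file is missing from the catalogue, or when it has been
    running longer than max_hours (an assumed cutoff, not from the slides).
    """
    flagged = []
    query = ("SELECT job_id, filename, status, hours_running "
             "FROM jobs ORDER BY job_id")
    for job_id, filename, status, hours in conn.execute(query):
        if status == "failed":
            flagged.append(job_id)
        elif status == "finished" and filename not in catalogued_files:
            flagged.append(job_id)
        elif status == "running" and hours > max_hours:
            flagged.append(job_id)
    return flagged


# Made-up example data: five jobs spread over three placeholder sites.
conn = make_production_db([
    (1, "evt_0001.out", "site_a", "site_a", "finished", 14.0),
    (2, "evt_0002.out", "site_a", "site_b", "finished", 13.5),
    (3, "evt_0003.out", "site_b", "site_b", "running", 6.0),
    (4, "evt_0004.out", "site_b", "site_c", "running", 40.0),
    (5, "evt_0005.out", "site_c", "site_c", "failed", 2.0),
])
catalogued = {"evt_0001.out"}  # stand-in for a catalogue lookup

print(jobs_to_resubmit(conn, catalogued))  # prints [2, 4, 5]
```

Keeping the check separate from job submission mirrors the "independent verification process" listed under fault tolerance: the script trusts neither the recorded job status nor the catalogue alone, and flags a job whenever the two disagree.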
