peters HTC BlueGene CondorWeek

Published on September 19, 2007

Author: Haggrid

Source: authorstream.com

Content

High Throughput Computing on Blue Gene:
IBM Rochester: Amanda Peters, Tom Budnik
With contributions from:
  IBM Rochester: Mike Mundy, Greg Stewart, Pat McCarthy
  IBM Watson Research: Alan King, Jim Sexton
  UW-Madison Condor: Greg Thain, Miron Livny, Todd Tannenbaum

Agenda:
Blue Gene Architecture Overview
High Throughput Computing (HTC) on Blue Gene
Condor and IBM Blue Gene Collaboration
Exploratory Application Case Studies for Blue Gene HTC
Questions and Web resource links

Blue Gene/L Overview:
Packaging hierarchy (figure labels):
  Chip: 2 processors; 2.8/5.6 GF/s
  Compute node: 2 chips; 5.6/11.2 GF/s; 1.0 GB
  Node card: 32 chips (16 compute, 0-2 IO cards); 90/180 GF/s; 16 GB
  Rack: 32 node cards (1,024 chips); 2.8/5.6 TF/s; 512 GB
  System: 64 racks (65,536 chips); 180/360 TF/s; 32 TB
Scalable from 1 rack to 64 racks
Rack has 2,048 processors with 512 MB or 1 GB DRAM/node
Blue Gene has 5 independent networks (Torus, Collective, Control (JTAG), Global barrier, and Functional 1 Gb Ethernet)

Blue Gene System Architecture:
Diagram labels: Functional Gigabit Ethernet; I/O Node 0 through I/O Node 1023 (Linux, ciod, fs client); Control Gigabit Ethernet; IDo chip; I2C; Resource Scheduler; System Console; Control System; DB2; app (compute nodes)

HPC vs. HTC Comparison:
High Performance Computing (HPC) Model
  Parallel, tightly coupled applications
  Single Instruction, Multiple Data (SIMD) architecture
  Programming model: typically MPI
  Apps need tremendous amount of computational power over short time period
High Throughput Computing (HTC) Model
  Large number of independent tasks
  Multiple Instruction, Multiple Data (MIMD) architecture
  Programming model: non-MPI
  Apps need large amount of computational power over long time period
  Traditionally run on large clusters
HTC and HPC modes co-exist on Blue Gene
  Determined when resource pool (partition) is allocated

Why Blue Gene for HTC?:
High processing capacity with minimal floor space
  High compute node density: 2,048 processors in one Blue Gene rack
  Scalability from 1 to 64 racks (2,048 to 131,072 processors)
Resource consolidation
  Multiple HTC and HPC workloads on a single system
  Optimal use of compute resources
Low power consumption
  #1 on Green500 list @ 112 MFlops/Watt (www.green500.org/CurrentLists.html)
  Twice the performance per watt of a high frequency microprocessor
Low cooling requirements enable extreme scale-up
Centralized system management
  Blue Gene Navigator

Generic HTC Flow on Blue Gene:
One or more dispatcher programs are started on front end/service node
  Dispatcher will manage HTC work request queue
A pool (partition) of compute nodes is booted on Blue Gene
  Every compute node has a launcher program started on it that connects back to the designated HTC dispatcher
  New pools of resources can be added dynamically as workload increases
External work requests are routed to HTC dispatcher queue
  Single or multiple work requests from each source
HTC dispatcher finds available HTC client and forwards the work request
HTC client runs executable on compute node
  A launcher program on each compute node handles work request sent to it by the dispatcher
  When work request completes, the launcher program is reloaded and client is ready to handle another work request
Executable exit status is reported back to dispatcher

Generic HTC Flow on Blue Gene:
[slide without transcribed text]

Node Resiliency for HTC:
In HPC mode a single failing node in a partition (pool of compute nodes) causes termination of all nodes in the partition
  Expected behavior for parallel MPI type apps, but unacceptable for HTC apps
  HTC mode partition handles this situation
In HTC mode Blue Gene can recover from soft node failures
  For example parity errors
  If failure is not related to network hardware, a software reboot will recover the node
  Other nodes in the partition are unaffected and continue to run jobs
  Job on failed node is terminated and must be resubmitted by dispatcher
If the partition is started in HTC mode, the Control System will poll at regular intervals looking for nodes in the reset state
  Nodes in the reset state will be rebooted and launcher restarted on them

Condor and IBM Blue Gene Collaboration:
Both IBM and Condor teams engaged in adapting code to bring Condor and Blue Gene technologies together
Initial Collaboration (Blue Gene/L)
  Prototype/research Condor running HTC workloads on Blue Gene/L
  Condor developed dispatcher/launcher running HTC jobs
  Prototype work for Condor being performed on Rochester On-Demand Center Blue Gene system
Mid-term Collaboration (Blue Gene/L)
  Condor supports HPC workloads along with HTC workloads on Blue Gene/L
Long-term Collaboration (Next Generation Blue Gene)
  I/O Node exploitation with Condor
  Partner in design of HTC services for Next Generation Blue Gene
    Standardized launcher, boot/allocation services, job submission/tracking via database, etc.
  Study ways to automatically switch between HTC/HPC workloads on a partition
  Data persistence (persisting data in memory across executables)
  Data affinity scheduling
  Petascale environment issues

Condor Architecture:
Diagram labels: Submit Machine; Execute Machine; Central Manager; Submit; Schedd; Shadow; Starter; Startd; Collector; Negotiator

Condor with Blue Gene/L:
Diagram labels: Submit Machine; Blue Gene I/O Node; Central Manager; Blue Gene Compute Nodes; Submit; Schedd; Shadow; Starter; Startd; Collector; Negotiator; mpirun; etc.

Exploratory Application Case Studies for Blue Gene HTC:
Case Study #1: Financial overnight risk calculation for trading portfolio
  Large number of calculations to be completed by market opening
  Algorithm is Monte Carlo simulation
    Easy to distribute and robust to resource failure (fewer simulations just gives less accurate result)
  Grid middleware bundles tasks into relatively long-running jobs (45 minutes)
  Limiting resource is number of CPUs
  In some cases power density (KW/sq foot) is critical
Case Study #2: Molecular docking code for virtual drug screening
  Docking simulation algorithm for screening large databases of potential drugs against targets
  Large number of independent calculations to determine the minimization energy between the target and each potential candidate, and subsequently find the strongest leads

Exploratory Application Case Studies for Blue Gene HTC:
Experience results:
Demonstrated scalable task dispatch to thousands of processors
Successfully verified multiple dispatcher architecture
Discovered optimal ratio of dispatcher to partition (pool) size is 1:64 or less
  Latencies increase as ratio increases above this level, possibly due to launcher contention for socket resource as scaling increases (still investigating in this area)
  May depend on task duration and arrival rates
Running in HTC mode changes the I/O patterns
  Typical MPI programs read and write to the file system with small buffer sizes
  HTC requires loading the full executable into memory and sending it to compute node
    Launcher is cached on IO Node but not the executable
  Experiments with delaying dispatch proportional to executable size for effective task distribution across partitions were successful
    Due to IO Node to Compute Node bandwidth
  To achieve the fastest throughput a low compute node to I/O node ratio is desirable

Questions?:
Web resources:
  http://www.ibm.com/servers/deepcomputing/bluegene.html
  http://www.research.ibm.com/bluegene
  http://www.redbooks.ibm.com/cgi-bin/searchsite.cgi?query=blue+gene

Backup Slides:

Blue Gene Software Stack:
[slide without transcribed text]

Slide20:
Diagram actors: Submitter; Dispatcher; Launcher
Diagram labels: Connect to Dispatcher; Dispatch task N; Start task N; Reboot Launcher; Connect to Dispatcher and send task N status; Exit task N; Boot Launcher; Write task N status; Read task N; Submit task N to Work Queue; Read task N status off Results Queue

Slide21:
Node Resiliency
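The generic HTC flow and the node resiliency behavior described in the slides can be sketched as a small in-process simulation. This is only an illustrative sketch, not the Blue Gene or Condor implementation: the names Task, Launcher, Dispatcher, fail_on, and run_until_empty are hypothetical, the "network" is replaced by direct method calls, and a soft node failure is modeled by a launcher returning None for one attempt. It shows a dispatcher draining a work queue, handing each request to an available launcher, recording the exit status, and resubmitting only the task that was running on a failed node.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: int
    command: str
    attempts: int = 0


@dataclass
class Launcher:
    """Hypothetical stand-in for the launcher program on one compute node."""
    node_id: int
    # Task ids that trigger one simulated soft failure on this node.
    fail_on: set = field(default_factory=set)

    def run(self, task):
        """Return an exit status, or None if the node went into reset."""
        if task.task_id in self.fail_on:
            self.fail_on.discard(task.task_id)  # node recovers after reboot
            return None
        return 0  # pretend the executable exited cleanly


class Dispatcher:
    """Hypothetical dispatcher managing a work queue and idle launchers."""

    def __init__(self, launchers):
        self.work_queue = deque()
        self.idle = deque(launchers)
        self.results = {}  # task_id -> (exit_status, node_id, attempts)

    def submit(self, task):
        self.work_queue.append(task)

    def run_until_empty(self):
        while self.work_queue:
            task = self.work_queue.popleft()
            launcher = self.idle.popleft()  # find an available client
            task.attempts += 1
            status = launcher.run(task)
            if status is None:
                # Soft failure: only this job is lost, so resubmit it.
                self.work_queue.append(task)
            else:
                self.results[task.task_id] = (status, launcher.node_id, task.attempts)
            # Launcher is reloaded (or the node rebooted) and reconnects.
            self.idle.append(launcher)
        return self.results


# Usage: two nodes, four tasks; node 0 fails once while running task 3.
dispatcher = Dispatcher([Launcher(0, fail_on={3}), Launcher(1)])
for i in range(1, 5):
    dispatcher.submit(Task(i, f"./simulate --seed {i}"))
results = dispatcher.run_until_empty()
for task_id in sorted(results):
    print(task_id, results[task_id])
```

In this sketch the other tasks finish on their first attempt while the task that hit the simulated failure is retried, mirroring the slide's point that other nodes in an HTC partition continue to run jobs. A real deployment would run launchers concurrently and communicate over sockets; the sequential loop here only keeps the example short and deterministic.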
