GSRC 12 10 2006 Eisley

Information about GSRC 12 10 2006 Eisley

Published on November 29, 2007

Author: Hannah

Source: authorstream.com

Content

In-Network Cache Coherence:  In-Network Cache Coherence Noel Eisley*, Li-Shiuan Peh*, Li Shang** *Department of Electrical Engineering Princeton University {eisley, peh}@princeton.edu **Department of Electrical and Computer Engineering Queen’s University [email protected] Chip-Multiprocessors (CMPs):  Chip-Multiprocessors (CMPs) CMPs reduce complexity and offer scalability Higher frequencies and more transistors  higher power consumption Higher frequencies  many-cycle chip traversal CMPs offer a large amount of distributed on-chip storage On-chip data access is much faster than off-chip (~10-100x) An important challenge is to manage the distributed on-chip data using a scalable approach Then Now Future… Intel 80-core   Background: Interconnection Networks:  Background: Interconnection Networks Arbiter Link Traversal Crossbar Traversal Virtual Channel Arbitration Physical Channel Arbitration Routing Router Pipeline Ejection Injection South North East West I/O buffers On-Chip Data Management -- Cache coherence is the first step:  On-Chip Data Management -- Cache coherence is the first step Finding valid data Does it exist? Where is it? Optimal: go directly to the nearest sharer Updating/invalidating data Avoid unnecessary traffic (i.e. broadcast) Optimal: invalidate sharers directly A A A A A A A A Current: Directory-Based Protocol:  Current: Directory-Based Protocol Home node (H) keeps a list (or partial list) of sharers Read: Round trip overhead Reads don’t take advantage of locality Write: Up to two round trip overhead Invalidations inefficient H Our Approach: Virtual Network:  Our Approach: Virtual Network Network is not just for getting from A to B It can keep track of sharers and maintain coherence How? With trees stored within routers Tree connecting home node with sharers Tree stored within routers On the way to home node, when requests “bump” into trees, routers will re-route them towards sharers. H Bump into tree Add to tree Outline:  Outline Motivation and introduction In-network cache coherence protocol Methodology and results Conclusions Reads Locate Data Efficiently:  H Reads Locate Data Efficiently Legend H: Home node Sharer node Tree node (no data) Read request/reply Write request/reply Teardown request Acknowledgement Read Request injected into the network Tree constructed as read reply is sent back New read injected into the network Request is redirected by the network Data is obtained from sharer and returned Writes Invalidate Data Efficiently:  Writes Invalidate Data Efficiently H Write Request injected into the network In-transit invalidations Acknowledgements spawned at leaf nodes Wait for acknowledgements Send out write reply Legend H: Home node Sharer node Tree node (no data) Read request/reply Write request/reply Teardown request Acknowledgement Virtual Trees - Structure:  Virtual Trees - Structure Node types Home: same as for directory protocol Root: first sharer Internal: all others Characteristics: Directional link always points towards the root node No loops No disjoint trees H R Router Microarchitecture:  Router Microarchitecture Tree Cache Link Traversal Crossbar Traversal Virtual Channel Arbitration Physical Channel Arbitration Tree Cache Routing Router Pipeline Tag Data Entry H Router Microrchitecture: Virtual Tree Cache:  Router Microrchitecture: Virtual Tree Cache Each cache line in the virtual tree cache contains 2 bits (To, From root node) for each direction (NSEW) 1 bit if the local node contains valid data 1 busy bit 1 Outstanding request bit Tag Data Entry T T T T F F F F N S E W D B R Virtual Tree Cache Outline:  Outline Motivation and introduction In-network cache coherence protocol Methodology and results Conclusions Methodology:  Methodology Benchmarks: SPLASH-2 Multithreaded simulator: Bochs running 16 or 64-way Network and virtual tree simulator Trace-driven Full-system simulation becomes prohibitive with many cores Motivation: explore scalability Cycle-accurate Each message is modeled Models network contention (arbitration) Calculates average request completion times Bochs Network Simulator Application Memory traces Memory access latency performance Tree Cache Configuration:  Tree Cache Configuration Good configuration arrived at through detailed design-space analysis Cache access parameters derived from Cacti Tree Cache Configuration Simulation Timing Summary Performance Scalability:  Performance Scalability Compare in-network virtual tree protocol to standard directory protocol Good improvement 4x4 Mesh: Avg. 35.5% read latency reduction, 41.2% write latency reduction Scalable 8x8 Mesh: Avg. 35% read and 48% write latency reduction 4x4 Mesh 8x8 Mesh Storage Scalability:  Storage Scalability Directory protocol: O(# Nodes) Virtual tree protocol: O(# Ports) ~O(1) Storage overhead 4x4 mesh: 56% more storage bits 8x8 mesh: 58% fewer storage bits 16 64 # of Nodes Tree Cache Line Directory Cache Line 28 bits 28 bits 18 bits 66 bits Related Work:  Related Work Mizrahi et al. [1989] Goodman and Woest [1988] LimitLess (Chaiken, et al.) [1991] SCI (Gjessing, et al.) [1992] GLOW (Kaxiras and Goodman) [1996] Chang, et al. [1999] Iyer, et al. [2000] Cheng, et al. [2006] Conclusions:  Conclusions Want to address the scalability of on-chip data management (cache coherence) for CMPs Embedding cache coherence within the network fabric allows the network to optimize memory traffic ~35% and ~41% read and write latency reduction for 4x4 meshes ~35% and ~48% reduction for 8x8 meshes Storage overhead scalability Overhead when compared to directory protocol for a 4x4 mesh Reduction for an 8x8 mesh Slide20:  Thank you! Results (cont.):  Results (cont.) Effect of embedding the connectivity (i.e. tree cache) within the network 31% Avg. read latency reduction 49.1% Avg. write latency reduction Network Template:  Network Template H Legend H: Home node Sharer node Tree node(no data) Read request/reply Write request/reply Teardown Acknowledgement Tough Questions:  Tough Questions Wouldn’t it be faster to use look-ahead routing (e.g. 2-cycle hop) and accept more hops? Scalability:  Scalability Performance From previous slide, avg. read and write latencies remain constant or decrease (relative to directory) from 4x48x8 mesh Storage overhead 4x4 mesh: 56% more storage bits 8x8 mesh: 58% fewer storage bits 4x4: 8x8: Tree Full-Map Lim. Dir. Motivation: Coherence-Aware Networks:  Motivation: Coherence-Aware Networks Network is not just for getting from A to B Storing more information allows for in-transit optimizations Give routers the ability to source and sink coherence messages Virtual Tree - Protocol:  Virtual Tree - Protocol On capacity miss Tear down the tree and begin a new one On write request Tear down the tree and begin a new one Proactive evictions of tree cache Even at nodes other than the home node Tree must be torn down completely before a new one can be created A B C D H

Related presentations


Other presentations created by Hannah

Brief History Ballet
14. 03. 2008
0 views

Brief History Ballet

victor board meeting
28. 09. 2007
0 views

victor board meeting

lecture6B
09. 11. 2007
0 views

lecture6B

Sci Method AS
10. 12. 2007
0 views

Sci Method AS

WTC Training revb
06. 11. 2007
0 views

WTC Training revb

gene finding 1
16. 11. 2007
0 views

gene finding 1

jdbc1
21. 11. 2007
0 views

jdbc1

inaugural
26. 11. 2007
0 views

inaugural

Harbulot
28. 12. 2007
0 views

Harbulot

Thesaurus
03. 12. 2007
0 views

Thesaurus

uabwf
30. 12. 2007
0 views

uabwf

10 Predation
01. 01. 2008
0 views

10 Predation

Wetlands Terrain Modeler IAP
04. 01. 2008
0 views

Wetlands Terrain Modeler IAP

Wastewater Treatment Albq
04. 01. 2008
0 views

Wastewater Treatment Albq

LegWkshp 5 99 final
07. 01. 2008
0 views

LegWkshp 5 99 final

KMW gtsug2004
07. 11. 2007
0 views

KMW gtsug2004

September 06 IR Presentation
03. 01. 2008
0 views

September 06 IR Presentation

11 10 04 Biddle Military Power
26. 02. 2008
0 views

11 10 04 Biddle Military Power

lawbreakers
28. 02. 2008
0 views

lawbreakers

fa19
06. 03. 2008
0 views

fa19

Matthews NACCHO 02 15 06
10. 03. 2008
0 views

Matthews NACCHO 02 15 06

Oct 2005 Michael Hunter
12. 03. 2008
0 views

Oct 2005 Michael Hunter

Hungary
18. 03. 2008
0 views

Hungary

aries 2008
07. 04. 2008
0 views

aries 2008

irses br oct07 v2
30. 03. 2008
0 views

irses br oct07 v2

student10demo1
04. 10. 2007
0 views

student10demo1

sym12 37g
13. 04. 2008
0 views

sym12 37g

New York City 2003
27. 09. 2007
0 views

New York City 2003

13 9 00 A
29. 12. 2007
0 views

13 9 00 A

mos ship
07. 11. 2007
0 views

mos ship

cioccolato stroppiana
20. 11. 2007
0 views

cioccolato stroppiana

Colin Butler SD Oct2
07. 12. 2007
0 views

Colin Butler SD Oct2