S10 Processor Performance

Published on September 17, 2007

Author: Techy_Guy

Source: authorstream.com

Content

How do we evaluate computer architectures? Think of 5 characteristics that differentiate computers. Can some processors compute things that others can’t?

Single-Cycle Performance: Last time we saw a MIPS single-cycle datapath and control unit. Today, we’ll explore the factors that contribute to a processor’s execution time, looking specifically at the performance of the single-cycle machine. Next time, we’ll explore how to improve on the single-cycle machine’s performance using pipelining.

Three Components of CPU Performance: For a program P running on a machine X,

CPU time_{X,P} = Instructions executed_P * CPI_{X,P} * Clock cycle time_X

where CPI is the number of cycles per instruction.

Instructions Executed: We are not interested in the static instruction count, or how many lines of code are in a program. Instead we care about the dynamic instruction count, or how many instructions are actually executed when the program runs. There are three lines of code below, but the number of instructions executed would be 2001.

         li   $a0, 1000
Ostrich: sub  $a0, $a0, 1
         bne  $a0, $0, Ostrich

CPI: The average number of clock cycles per instruction, or CPI, is a function of the machine and program. The CPI depends on the actual instructions appearing in the program; a floating-point intensive application might have a higher CPI than an integer-based program. It also depends on the CPU implementation. For example, a Pentium can execute the same instructions as an older 80486, but faster. In CS231, we assumed each instruction took one cycle, so we had CPI = 1. The CPI can be >1 due to memory stalls and slow instructions. The CPI can be <1 on machines that execute more than 1 instruction per cycle (superscalar).
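The difference between static and dynamic instruction counts can be checked with a small simulation. The sketch below is not part of the original slides: it mirrors the three-line countdown loop in Python, and the function name `count_dynamic_instructions` is a hypothetical helper chosen for illustration.

```python
def count_dynamic_instructions(initial: int) -> int:
    """Count the instructions executed by the three-line countdown loop.

    Assumes initial >= 1, as in the slide's example (initial = 1000).
    """
    executed = 0

    a0 = initial              # li  $a0, initial
    executed += 1

    while True:
        a0 -= 1               # sub $a0, $a0, 1
        executed += 1
        executed += 1         # bne $a0, $0, Ostrich (counted whether or not it is taken)
        if a0 == 0:           # branch falls through once $a0 reaches zero
            break

    return executed


STATIC_COUNT = 3                              # lines of code in the program
dynamic_count = count_dynamic_instructions(1000)
print(STATIC_COUNT, dynamic_count)            # 1 li + 1000 * (sub + bne) = 2001
```

The loop body runs 1000 times with two instructions per iteration, plus the single `li`, which is where the 2001 comes from.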
Clock cycle time: One “cycle” is the minimum time it takes the CPU to do any work. The clock cycle time or clock period is just the length of a cycle. The clock rate, or frequency, is the reciprocal of the cycle time. Generally, a higher frequency is better. Some examples illustrate typical frequencies.
A 500MHz processor has a cycle time of 2ns.
A 2GHz (2000MHz) CPU has a cycle time of just 0.5ns (500ps).

Execution time, again:

CPU time_{X,P} = Instructions executed_P * CPI_{X,P} * Clock cycle time_X

The easiest way to remember this is to match up the units:

seconds/program = instructions/program * cycles/instruction * seconds/cycle

Make things faster by making any component smaller! It is often easy to reduce one component by increasing another.

Example 1: ISA-compatible processors: Let’s compare the performance of two x86-based processors.
An 800MHz AMD Duron, with a CPI of 1.2 for an MP3 compressor.
A 1GHz Pentium III with a CPI of 1.5 for the same program.
Compatible processors implement identical instruction sets and will use the same executable files, with the same number of instructions. But they implement the ISA differently, which leads to different CPIs.

CPU time_{AMD,P} = Instructions_P * CPI_{AMD,P} * Cycle time_AMD = ___ = ___
CPU time_{P3,P} = Instructions_P * CPI_{P3,P} * Cycle time_P3 = ___ = ___

Example 2: Comparing across ISAs: Intel’s Itanium (IA-64) ISA is designed to facilitate executing multiple instructions per cycle. If an Itanium processor achieves an average CPI of 0.3 (about 3 instructions per cycle), how much faster is it than a Pentium 4 (which uses the x86 ISA) with an average CPI of 1?
Itanium is three times faster
Itanium is one third as fast
Not enough information

The single-cycle design from last time: A control unit (not shown) generates all the control signals from the instruction’s “op” and “func” fields.
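The cycle-time conversions and the Example 1 comparison above can be worked through numerically. This is a minimal sketch, not taken from the slides: the helper names `cycle_time_ns` and `cpu_time_ns` and the instruction count `N` are assumptions made for illustration (the slides do not give an instruction count, and any value works because both processors run the same executable).

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds: the reciprocal of the clock rate."""
    return 1000.0 / clock_mhz


def cpu_time_ns(instructions: int, cpi: float, clock_mhz: float) -> float:
    """CPU time = instructions executed * CPI * clock cycle time."""
    return instructions * cpi * cycle_time_ns(clock_mhz)


# Cycle times quoted on the slide.
print(cycle_time_ns(500))    # 500MHz processor
print(cycle_time_ns(2000))   # 2GHz processor

# Example 1: same program, same instruction count, different CPI and clock.
N = 1_000_000                # hypothetical instruction count (assumption)
duron_ns = cpu_time_ns(N, cpi=1.2, clock_mhz=800)
p3_ns = cpu_time_ns(N, cpi=1.5, clock_mhz=1000)
print(duron_ns, p3_ns)
```

Under these numbers the Duron's lower CPI is offset by its longer 1.25ns cycle, so the two CPU times come out essentially equal. The same function cannot settle Example 2 on its own, because the two ISAs need not execute the same number of instructions and no clock rates are given.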
The example add from last time: Consider the instruction add $s4, $t1, $t2. Assume $t1 and $t2 initially contain 1 and 2 respectively. Executing this instruction involves several steps.
The instruction word is read from the instruction memory, and the program counter is incremented by 4.
The sources $t1 and $t2 are read from the register file.
The values 1 and 2 are added by the ALU.
The result (3) is stored back into $s4 in the register file.

How the add goes through the datapath: [Figure: the single-cycle datapath annotated for add $s4, $t1, $t2. Instruction bits I[25-21] = 01001 and I[20-16] = 01010 select the two source registers, I[15-11] = 10100 selects the destination register, and RegWrite is asserted. The register values 00...01 and 00...10 are added to give 00...11, and the PC is updated to PC+4.]

Performance of Single-cycle Design:

CPU time_{X,P} = Instructions executed_P * CPI_{X,P} * Clock cycle time_X

Edge-triggered state elements: In an instruction like add $t1, $t1, $t2, how do we know $t1 is not updated until after its original value is read? We’ll assume that our state elements are positive edge triggered, and are updated only on the positive edge of a clock signal. The register file and data memory have explicit write control signals, RegWrite and MemWrite. These units can be written to only if the control signal is asserted and there is a positive clock edge. In a single-cycle machine the PC is updated on each clock cycle, so we don’t bother to give it an explicit write control signal.

The datapath and the clock:
1. On a positive clock edge, the PC is updated with a new address.
2. A new instruction can then be loaded from memory. The control unit sets the datapath signals appropriately so that registers are read, ALU output is generated, data memory is read or written, and branch target addresses are computed.
3. Several things happen on the next positive clock edge. The register file is updated for arithmetic or lw instructions. Data memory is written for a sw instruction. The PC is updated to point to the next instruction.
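The edge-triggered behaviour described above can be modelled in a few lines. The sketch below is an illustrative software model, not the hardware design from the slides: the class `RegisterFile` and its methods `read`, `request_write`, and `clock_edge` are hypothetical names. It shows why add $t1, $t1, $t2 reads the old value of $t1 even though the same instruction writes $t1.

```python
class RegisterFile:
    """Toy model of a positive-edge-triggered register file (assumed design)."""

    def __init__(self):
        self.regs = {}
        self._pending = None   # (register, value) waiting for the next clock edge

    def read(self, name: str) -> int:
        # Reads are combinational: they always see the currently stored value.
        return self.regs.get(name, 0)

    def request_write(self, name: str, value: int, reg_write: bool) -> None:
        # The write is only armed when the RegWrite control signal is asserted.
        self._pending = (name, value) if reg_write else None

    def clock_edge(self) -> None:
        # Stored state changes only here, on the positive clock edge.
        if self._pending is not None:
            name, value = self._pending
            self.regs[name] = value
            self._pending = None


rf = RegisterFile()
rf.regs.update({"$t1": 1, "$t2": 2})

# During the cycle: add $t1, $t1, $t2
result = rf.read("$t1") + rf.read("$t2")     # the ALU sees the old $t1
rf.request_write("$t1", result, reg_write=True)
before_edge = rf.read("$t1")                 # still the original value

rf.clock_edge()                              # next positive clock edge
after_edge = rf.read("$t1")                  # now holds the sum

print(before_edge, after_edge)
```

Separating `request_write` from `clock_edge` is the design choice that matters here: the new value exists during the cycle but cannot overwrite the register until the edge arrives.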
In a single-cycle datapath everything in Step 2 must complete within one clock cycle, before the next positive clock edge. How long is that clock cycle?

The slowest instruction...: If all instructions must complete within one clock cycle, then the cycle time has to be large enough to accommodate the slowest instruction. For example, lw $t0, -4($sp) needs 8ns, assuming the delays shown here.
[Figure: the single-cycle datapath annotated with component delays of 2 ns, 2 ns, 2 ns, and 1 ns, with 0 ns for the remaining elements.]

...determines the clock cycle time: If we make the cycle time 8ns then every instruction will take 8ns, even if it doesn’t need that much time. For example, the instruction add $s4, $t1, $t2 really needs just __ns.

How bad is this?: With these same component delays, a sw instruction would need 7ns, and beq would need just 5ns. Let’s consider the gcc instruction mix from p. 189 of the textbook. With a single-cycle datapath, each instruction would require 8ns. But if we could execute instructions as fast as possible, the average time per instruction for gcc would be:

(48% x 6ns) + (22% x 8ns) + (11% x 7ns) + (19% x 5ns) = 6.36ns

The single-cycle datapath is about 1.26 times slower!

It gets worse...: We’ve made very optimistic assumptions about memory latency.
Main memory accesses on modern machines take >50ns. For comparison, an ALU on the Pentium 4 takes ~0.3ns.
Our worst-case cycle (loads/stores) includes 2 memory accesses.
A modern single-cycle implementation would be stuck at <10MHz.
Caches will improve common-case access time, not worst case.
Tying the frequency to the worst-case path violates the first law of performance!

Summary: Performance is one of the most important criteria in judging systems. Here we’ll focus on execution time. Our main performance equation explains how performance depends on several factors related to both hardware and software.

CPU time_{X,P} = Instructions executed_P * CPI_{X,P} * Clock cycle time_X

It can be hard to measure these factors in real life, but this is a useful guide for comparing systems and designs. A single-cycle CPU has two main disadvantages.
The cycle time is limited by the worst-case latency.
It isn’t efficiently using its hardware.
Next time, we’ll see how this can be rectified with pipelining.
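
The cost of the worst-case cycle time can be reproduced from the numbers on the “How bad is this?” and “It gets worse...” slides. The following is a rough sketch: the per-class latencies and percentages are the ones in the gcc formula, but the class labels in `MIX` are assumptions inferred by matching those latencies to the lw (8ns), sw (7ns), and beq (5ns) figures, and the 50ns memory latency is the slide's lower bound rather than a measured value.

```python
SINGLE_CYCLE_NS = 8.0   # cycle time forced by the slowest instruction (lw)

# (label, fraction of executed instructions, time actually needed in ns)
# Labels are assumed; fractions and latencies come from the gcc formula.
MIX = [
    ("arithmetic", 0.48, 6.0),
    ("load",       0.22, 8.0),
    ("store",      0.11, 7.0),
    ("branch",     0.19, 5.0),
]

# Average time per instruction if each could run as fast as it needs.
ideal_ns = sum(fraction * latency for _, fraction, latency in MIX)

# How much slower the fixed 8ns cycle is than that ideal.
slowdown = SINGLE_CYCLE_NS / ideal_ns
print(round(ideal_ns, 2), round(slowdown, 2))

# "It gets worse": a load or store cycle contains two memory accesses
# (instruction fetch plus data access), each taking more than 50ns.
MEMORY_ACCESS_NS = 50.0                       # lower bound from the slide
worst_cycle_ns = 2 * MEMORY_ACCESS_NS
max_clock_mhz = 1000.0 / worst_cycle_ns       # upper bound on the clock rate
print(max_clock_mhz)
```

Because the memory latency is only a lower bound, the computed clock rate is an upper bound, which is consistent with the slide's claim that such a design would be stuck below 10MHz.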
