127 barto p

Published on January 16, 2008

Author: Tibald

Source: authorstream.com

Content

Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Rod Barto
NASA/GSFC Office of Logic Design
Spacecraft Digital Electronics
3312 Moonlight
El Paso, Texas 79904

Reconfigurable Computing is…
A design methodology by which computational components can be arranged in several ways to perform various computing tasks
Two types of reconfigurable computing:
  Static, i.e., the computing system is configured before launch
  Dynamic, i.e., the computing system can be reconfigured after launch

Static Reconfigurability
Several examples exist, e.g., Cray
Typically processing modules connected by an intercommunication mechanism, e.g., Ethernet
Goals are:
  To reduce system development costs
  To provide higher performance computing

Dynamic Reconfigurability (DR)
Processing modules that can be reconfigured in flight
Goal is to provide processing support, using reduced amounts of hardware, for algorithms that do not map well onto general purpose computers

Outline of Paper
Discuss the computation of a series of algorithms on general purpose, special purpose, and DR computers
Calculate the execution time of an image processing algorithm on a concept DR computer
Compare the reconfiguration time of a Xilinx FPGA with the algorithm execution time calculated in Section 2
Obtain an extremely rough estimate of image processing algorithm execution time on a flight computer
Conclude that the DR computer described offers higher performance than does the flight computer

Section 1: Algorithm Execution on General Purpose (GP), Special Purpose (SP), and DR Computers

Processing example
A computing function is the composition of n algorithms executed serially
Can be executed on a general purpose computer (GP) or a special purpose computer (SP)

Execution on a GP Computer
Processing time of each stage = ti, i=1..n
Total processing time = latency time = t1 + t2 + … + tn
GP computer must execute processing stages sequentially, and cannot exploit parallelism in the overall computing function

Processing on an SP Processor
Each stage is an independently operating processor designed specifically for the algorithm it executes
Processing time of each stage = ti, i=1..n
Results appear at a rate of one per max(ti), i=1..n
Latency time = t1 + t2 + … + tn
Performance increase comes from two factors:
  Pipelining of constituent algorithms, exploiting parallelism
  Processors being designed specifically for their algorithms

Processing on a DR Computer
Two processing elements alternately process and reconfigure, i.e., fodd executes one algorithm while feven reconfigures for the next algorithm, etc.
[Figure: Input, processing elements fodd and feven, Output]

DR Computer Processing Flow
Performance increase comes from configuring processors specifically for the algorithm they are executing
Do not get an increase from exploiting parallelism

Section 2: Execution Time of an Image Processing Algorithm on a Concept DR Computer

DR Computer Concept
RAM0 is source for FPGA0, destination for FPGA1, etc.
Processing elements are implemented in FPGAs
FPGA0 and FPGA1 alternately process and reconfigure, as previously discussed
Input and output not shown
[Figure: FPGA0, FPGA1, RAM0, RAM1]

Algorithm Example: 3x3 Image Convolution
One row at a time is shifted in pixel-serially from the source RAM, rows are shifted in parallel up into the upper three row registers, and the rows in those registers are shifted circularly through the convolution processor.
All the row registers and processing are inside the FPGA.
The results are written to the destination RAM after a latency of 3 row reads.
[Figure labels: image width in pixels; rows i-1, i, i+1, i+2; parallel shift rows up; circular shift rows through convolution processor; 3x3 convolution processor; Source RAM; Destination RAM; one pixel]

Convolution Operation
Used, for example, to compute the intensity gradient (derivative) at pixel (i,j)
Result = P(i-1,j-1)*m11 + P(i-1,j)*m12 + P(i-1,j+1)*m13 + … + P(i+1,j+1)*m33
[Figure: pixel array and convolution mask]

Convolution Calculation
Arithmetic processing may require some pipelining
[Figure: computation of Result(i,j)]

Convolution Timing
Total time = latency + processing = 20.971 msec
This assumes we can get pixels into the FPGA at a 20 nsec/pixel rate
Latency = time to read 3 rows: 1024 pixels * 3 rows * 20 nsec/pixel = 61 usec
Processing = time to stream the remaining 1021 rows through and process them: 1024 * 1021 * 20 nsec = 20.910 msec
Larger convolutions (e.g., 7x7) have longer latencies, but the same computation time
Calculation is for a mono image; a stereo image would take twice as long

Section 3: Comparing the Reconfiguration Time of a Xilinx FPGA With the Algorithm Execution Time Calculated in Section 2
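The comparison in this section rests on the Section 2 convolution timing, so it may help to reproduce that arithmetic first. The following is a minimal sketch, assuming the 1024 x 1024 image implied by the slide's figures and the 20 nsec/pixel input rate; the function and constant names are illustrative and do not come from the slides.

```python
# Timing model for the streaming 3x3 convolution described in Section 2.
# Assumptions taken from the slide arithmetic: 1024 x 1024 pixel image,
# one pixel entering the FPGA every 20 nsec.

PIXEL_TIME_NS = 20    # pixel input rate assumed on the slide
IMAGE_WIDTH = 1024    # pixels per row
IMAGE_HEIGHT = 1024   # rows per image
KERNEL_ROWS = 3       # rows that must be read before the first result


def convolution_timing_ns(width=IMAGE_WIDTH, height=IMAGE_HEIGHT,
                          kernel_rows=KERNEL_ROWS, pixel_ns=PIXEL_TIME_NS):
    """Return (latency, processing, total) in nanoseconds."""
    # Latency: time to read the first kernel_rows rows.
    latency = width * kernel_rows * pixel_ns
    # Processing: time to stream the remaining rows through the FPGA.
    processing = width * (height - kernel_rows) * pixel_ns
    return latency, processing, latency + processing


latency_ns, processing_ns, total_ns = convolution_timing_ns()
print(f"latency    = {latency_ns / 1e3:.2f} usec")
print(f"processing = {processing_ns / 1e6:.3f} msec")
print(f"total      = {total_ns / 1e6:.3f} msec")
```

Under these assumptions the total equals width * height * 20 nsec, so it does not depend on the mask height; a larger mask only moves time from the streaming phase into the latency phase.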
DR Computer Processing Element: Virtex-4 LX FPGA
Eight versions: XC4VLX15, -25, -40, -60, -80, -100, -160, -200
Logic hierarchically arranged:
  2 flip-flops per slice
  4 slices per CLB

Time to Configure FPGA
[Figure: FPGA configuration sequence showing PROG_B, INIT_B, CCLK, and DONE, with intervals Tpl and Tconfig making up the total configuration time]

Configuration Timing: Tpl
Tpl = 0.5 usec/frame
A “frame” is a unit of configuration RAM
The Tpl period clears configuration RAM

Configuration Timing: Tconfig
FPGA programmed by bitstream
CCLK (programming CLK) can run at 100 MHz
Parallel mode loads 8 bits per CCLK

Total Configuration Time
Total configuration time = Tpl + Tconfig, plus some extra time amounting to a few CCLK cycles (@ 10 nsec each)

Processing and Reconfiguration Time Comparison
Convolution execution is faster than reconfiguration
  Convolution = 21 msec mono, 42 msec stereo
  Reconfiguration = 81 msec, assuming the -200 device
Processing shown is well within the FPGA’s capabilities
More complex algorithms may require use of FPGA performance features:
  Much higher internal clock rates
  Large internal RAM
  Dedicated arithmetic support in the -SX series
What this shows is that it’s reasonable to consider alternating execution and reconfiguration of two FPGAs

Section 4: An Extremely Rough Estimate of Image Processing Algorithm Execution Time on a Flight Computer

GP Computing Performance Estimate
DANGER: really rough estimate!
Based on data from this paper: “Stereo Vision and Rover Navigation Software for Planetary Exploration”, Steven B.
Goldberg, Indelible Systems; Mark Maimone, Larry Matthies, JPL; 2002 IEEE Aerospace Conference
Available at robotics.jpl.nasa.gov/people/mwm/visnavsw/aero.pdf
Describes processing and algorithms to be used on 2004 Rover missions, and Rover requirements

Published Vision Algorithm Timing
Timed on Pentium III 700 MHz CPU, 32K L1 cache, 256K L2 cache, 512M RAM, Win2K
Algorithms explicitly timed (names from paper):
[Table not preserved in this transcript]
The Gaussian and most vision algorithms involve neighborhood operations that are comparable to an image convolution of some size

Flight Computer Performance
Flight processor is RAD6000
GESTALT Navigation algorithm timed on 3 processors:
[Table not preserved in this transcript]
Assume that the RAD6000 takes 7 times as long as the 500 MHz Pentium

Final Performance Estimate
Assume RAD6000 time = 7 times the 500 MHz Pentium time
Assume 500 MHz Pentium time = 7/5 = 1.4 times the 700 MHz Pentium time
Then, RAD6000 time is 1.4 * 7 = 9.8 times the 700 MHz Pentium time
Vision algorithm timing can be estimated as follows:
[Table not preserved in this transcript]
Remember: This is a really rough estimate!!

Section 5: Conclusions

What We Have Shown
We have shown that the concept DR computer presented executes a 3x3 neighborhood-type algorithm “a lot” faster than it appears that a RAD6000 executes what are probably a bunch of neighborhood algorithms.
The reader is cautioned not to try to quantify what “a lot” means based on the data given here.
But it’s a good enough estimate to tell us that this is worth looking into in more detail.

Conclusions
Xilinx-based DR computer shows promise for performance enhancement of a vision system
By extension, the DR computer shows promise for the performance enhancement of other algorithms
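The alternating scheme from Section 1 can be explored numerically with the Section 3 figures (21 msec mono convolution, 81 msec reconfiguration). The sketch below is only a rough model under an assumption the slides do not state: an element begins reconfiguring for its next algorithm at the moment the other element begins executing. The function names and the four-stage workload are hypothetical.

```python
# Rough schedule model for two alternating processing elements.
# Assumption (not from the slides): reconfiguration of the idle element
# starts when the other element starts executing, so each step costs
# max(execution time, next reconfiguration time).

def dr_total_time_ms(exec_ms, reconfig_ms):
    """Total time to run the algorithms in order on two alternating elements.

    exec_ms[i]     -- execution time of algorithm i
    reconfig_ms[i] -- time to configure an element for algorithm i
    """
    if len(exec_ms) != len(reconfig_ms):
        raise ValueError("need one reconfiguration time per algorithm")
    if not exec_ms:
        return 0.0
    # The first element must be configured before anything can run.
    total = reconfig_ms[0]
    for i, t_exec in enumerate(exec_ms):
        if i + 1 < len(exec_ms):
            # Configuration for the next algorithm overlaps this execution.
            total += max(t_exec, reconfig_ms[i + 1])
        else:
            total += t_exec
    return total


def single_element_total_time_ms(exec_ms, reconfig_ms):
    """Same sequence on one element: reconfigure, then execute, each time."""
    return sum(exec_ms) + sum(reconfig_ms)


# Four hypothetical stages, each using the figures quoted in Section 3.
exec_times = [21.0, 21.0, 21.0, 21.0]
reconfig_times = [81.0, 81.0, 81.0, 81.0]

print("two alternating elements:", dr_total_time_ms(exec_times, reconfig_times), "msec")
print("single element:          ", single_element_total_time_ms(exec_times, reconfig_times), "msec")
```

In this model, whenever execution is shorter than reconfiguration, the reconfiguration time sets the pace and alternation hides only the execution time of all but the last stage.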
