dw olap

Information about dw olap

Published on June 18, 2007

Author: FunSchool

Source: authorstream.com

Content

Data Warehousing andOLAP:  Data Warehousing and OLAP Hector Garcia-Molina Stanford University Warehousing:  Warehousing Growing industry: $8 billion in 1998 Range from desktop to huge: Walmart: 900-CPU, 2,700 disk, 23TB Teradata system Lots of buzzwords, hype slice andamp; dice, rollup, MOLAP, pivot, ... Outline:  Outline What is a data warehouse? Why a warehouse? Models andamp; operations Implementing a warehouse Future directions What is a Warehouse?:  What is a Warehouse? Collection of diverse data subject oriented aimed at executive, decision maker often a copy of operational data with value-added data (e.g., summaries, history) integrated time-varying non-volatile What is a Warehouse?:  What is a Warehouse? Collection of tools gathering data cleansing, integrating, ... querying, reporting, analysis data mining monitoring, administering warehouse Warehouse Architecture:  Warehouse Architecture Metadata Why a Warehouse?:  Why a Warehouse? Two Approaches: Query-Driven (Lazy) Warehouse (Eager) Query-Driven Approach:  Query-Driven Approach Advantages of Warehousing:  Advantages of Warehousing High query performance Queries not visible outside warehouse Local processing at sources unaffected Can operate when sources unavailable Can query data not stored in a DBMS Extra information at warehouse Modify, summarize (store aggregates) Add historical information Advantages of Query-Driven:  Advantages of Query-Driven No need to copy data less storage no need to purchase data More up-to-date data Query needs can be unknown Only query interface needed at sources May be less draining on sources OLTP vs. OLAP:  OLTP vs. OLAP OLTP: On Line Transaction Processing Describes processing at operational sites OLAP: On Line Analytical Processing Describes processing at warehouse OLTP vs. OLAP:  OLTP vs. OLAP Mostly updates Many small transactions Mb-Tb of data Raw data Clerical users Up-to-date data Consistency, recoverability critical Mostly reads Queries long, complex Gb-Tb of data Summarized, consolidated data Decision-makers, analysts as users OLTP OLAP Data Marts:  Data Marts Smaller warehouses Spans part of organization e.g., marketing (customers, products, sales) Do not require enterprise-wide consensus but long term integration problems? Warehouse Models & Operators:  Warehouse Models andamp; Operators Data Models relations stars andamp; snowflakes cubes Operators slice andamp; dice roll-up, drill down pivoting other Star:  Star Star Schema:  Star Schema Terms:  Terms Fact table Dimension tables Measures Dimension Hierarchies:  Dimension Hierarchies store sType city region è snowflake schema è constellations Cube:  Cube Fact table view: Multi-dimensional cube: dimensions = 2 3-D Cube:  3-D Cube dimensions = 3 Multi-dimensional cube: Fact table view: ROLAP vs. MOLAP:  ROLAP vs. MOLAP ROLAP: Relational On-Line Analytical Processing MOLAP: Multi-Dimensional On-Line Analytical Processing Aggregates:  Aggregates Add up amounts for day 1 In SQL: SELECT sum(amt) FROM SALE WHERE date = 1 81 Aggregates:  Aggregates Add up amounts by day In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date Another Example:  Another Example Add up amounts by day, product In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date, prodId drill-down rollup Aggregates:  Aggregates Operators: sum, count, max, min, median, ave 'Having' clause Using dimension hierarchy average by region (within store) maximum by month (within date) Cube Aggregation:  Cube Aggregation day 2 day 1 129 . . . Example: computing sums Cube Operators:  Cube Operators day 2 day 1 129 . . . sale(c1,*,*) sale(*,*,*) sale(c2,p2,*) Extended Cube:  Extended Cube day 2 day 1 * sale(*,p2,*) Aggregation Using Hierarchies:  Aggregation Using Hierarchies customer region country (customer c1 in Region A; customers c2, c3 in Region B) Pivoting:  Pivoting Multi-dimensional cube: Fact table view: Implementing a Warehouse:  Implementing a Warehouse Monitoring: Sending data from sources Integrating: Loading, cleansing,... Processing: Query processing, indexing, ... Managing: Metadata, Design, ... Monitoring:  Monitoring Source Types: relational, flat file, IMS, VSAM, IDMS, WWW, news-wire, … Incremental vs. Refresh Monitoring Techniques:  Monitoring Techniques Periodic snapshots Database triggers Log shipping Data shipping (replication service) Transaction shipping Polling (queries to source) Screen scraping Application level monitoring è Advantages andamp; Disadvantages!! Monitoring Issues:  Monitoring Issues Frequency periodic: daily, weekly, … triggered: on 'big' change, lots of changes, ... Data transformation convert data to uniform format remove andamp; add fields (e.g., add date to get history) Standards (e.g., ODBC) Gateways Integration:  Integration Data Cleaning Data Loading Derived Data Data Cleaning:  Data Cleaning Migration (e.g., yen ð dollars) Scrubbing: use domain-specific knowledge (e.g., social security numbers) Fusion (e.g., mail list, customer merging) Auditing: discover rules andamp; relationships (like data mining) Loading Data:  Loading Data Incremental vs. refresh Off-line vs. on-line Frequency of loading At night, 1x a week/month, continuously Parallel/Partitioned load Derived Data:  Derived Data Derived Warehouse Data indexes aggregates materialized views (next slide) When to update derived data? Incremental vs. refresh Materialized Views:  Materialized Views Define new warehouse relations using SQL expressions Processing:  Processing ROLAP servers vs. MOLAP servers Index Structures What to Materialize? Algorithms ROLAP Server:  ROLAP Server Relational OLAP Server tools Special indices, tuning; Schema is 'denormalized' MOLAP Server:  MOLAP Server Multi-Dimensional OLAP Server multi-dimensional server M.D. tools could also sit on relational DBMS Index Structures:  Index Structures Traditional Access Methods B-trees, hash tables, R-trees, grids, … Popular in Warehouses inverted lists bit map indexes join indexes text indexes Inverted Lists:  Inverted Lists . . . age index inverted lists data records Using Inverted Lists:  Using Inverted Lists Query: Get people with age = 20 and name = 'fred' List for age = 20: r4, r18, r34, r35 List for name = 'fred': r18, r52 Answer is intersection: r18 Bit Maps:  Bit Maps . . . age index bit maps data records Using Bit Maps:  Using Bit Maps Query: Get people with age = 20 and name = 'fred' List for age = 20: 1101100000 List for name = 'fred': 0100000001 Answer is intersection: 010000000000 Good if domain cardinality small Bit vectors can be compressed Join:  Join 'Combine' SALE, PRODUCT relations In SQL: SELECT * FROM SALE, PRODUCT Join Indexes:  Join Indexes join index What to Materialize?:  What to Materialize? Store in warehouse results useful for common queries Example: day 2 day 1 129 . . . total sales materialize Materialization Factors:  Materialization Factors Type/frequency of queries Query response time Storage cost Update cost Cube Aggregates Lattice:  Cube Aggregates Lattice city, product, date city, product city, date product, date city product date all 129 use greedy algorithm to decide what to materialize Dimension Hierarchies:  Dimension Hierarchies all state city Dimension Hierarchies:  Dimension Hierarchies city, product city, product, date city, date product, date city product date all state, product, date state, date state, product state not all arcs shown... Interesting Hierarchy:  Interesting Hierarchy all years quarters months days weeks conceptual dimension table Design:  Design What data is needed? Where does it come from? How to clean data? How to represent in warehouse (schema)? What to summarize? What to materialize? What to index? Tools:  Tools Development design andamp; edit: schemas, views, scripts, rules, queries, reports Planning andamp; Analysis what-if scenarios (schema changes, refresh rates), capacity planning Warehouse Management performance monitoring, usage patterns, exception reporting System andamp; Network Management measure traffic (sources, warehouse, clients) Workflow Management 'reliable scripts' for cleaning andamp; analyzing data Current State of Industry:  Current State of Industry Extraction and integration done off-line Usually in large, time-consuming, batches Everything copied at warehouse Not selective about what is stored Query benefit vs storage andamp; update cost Query optimization aimed at OLTP High throughput instead of fast response Process whole query before displaying anything

Related presentations


Other presentations created by FunSchool

Got Discipline PowerPoint
18. 06. 2007
0 views

Got Discipline PowerPoint

InterAccess
30. 04. 2008
0 views

InterAccess

lockin1
28. 04. 2008
0 views

lockin1

IIEF MCX
22. 04. 2008
0 views

IIEF MCX

MapAccNov01
18. 04. 2008
0 views

MapAccNov01

ICW Presentation
17. 04. 2008
0 views

ICW Presentation

Lecture 9 Macro model
16. 04. 2008
0 views

Lecture 9 Macro model

ATE12
14. 04. 2008
0 views

ATE12

TeAM NEF ESF Report
13. 04. 2008
0 views

TeAM NEF ESF Report

SAEA06Robinson
10. 04. 2008
0 views

SAEA06Robinson

Larose
09. 04. 2008
0 views

Larose

dairybreeds
19. 10. 2007
0 views

dairybreeds

hex
18. 09. 2007
0 views

hex

infocom2001sfb
18. 09. 2007
0 views

infocom2001sfb

2003 Student Info System IBM
18. 09. 2007
0 views

2003 Student Info System IBM

vfpweb
18. 09. 2007
0 views

vfpweb

Fine Art and Literature
13. 10. 2007
0 views

Fine Art and Literature

cell3
15. 10. 2007
0 views

cell3

opel
19. 10. 2007
0 views

opel

horrocks
23. 10. 2007
0 views

horrocks

yang45
15. 10. 2007
0 views

yang45

GGFpart2
28. 11. 2007
0 views

GGFpart2

Sunken backbone game
10. 10. 2007
0 views

Sunken backbone game

BrandtPadua
16. 10. 2007
0 views

BrandtPadua

Lectures4 5 Ch2
07. 11. 2007
0 views

Lectures4 5 Ch2

vugia emerging
23. 10. 2007
0 views

vugia emerging

1a co clustering
19. 11. 2007
0 views

1a co clustering

mobopts 5
01. 12. 2007
0 views

mobopts 5

majoranastatus
10. 12. 2007
0 views

majoranastatus

Northern Renaissance Art
14. 12. 2007
0 views

Northern Renaissance Art

world Hunger
13. 08. 2007
0 views

world Hunger

unconscious Origins
13. 08. 2007
0 views

unconscious Origins

Vending Ala Carte
13. 08. 2007
0 views

Vending Ala Carte

weaning mice
13. 08. 2007
0 views

weaning mice

Xiushi Yang
13. 08. 2007
0 views

Xiushi Yang

LarsonWelcome
16. 10. 2007
0 views

LarsonWelcome

Ken stellar halo
15. 11. 2007
0 views

Ken stellar halo

African Union 2050
23. 12. 2007
0 views

African Union 2050

frfin
12. 10. 2007
0 views

frfin

Bhutan Hunger presentation RCC
04. 01. 2008
0 views

Bhutan Hunger presentation RCC

10 31 05 chaps 15 16
02. 11. 2007
0 views

10 31 05 chaps 15 16

HAtrash
15. 10. 2007
0 views

HAtrash

gti pmgti
24. 10. 2007
0 views

gti pmgti

Uniform Wear
18. 09. 2007
0 views

Uniform Wear

Lsn 6 Maya and Inca
20. 11. 2007
0 views

Lsn 6 Maya and Inca

monster
21. 11. 2007
0 views

monster

campusmap
28. 12. 2007
0 views

campusmap

wolson presentation
17. 10. 2007
0 views

wolson presentation

thurston
09. 10. 2007
0 views

thurston

compfpm flood plain functions
03. 01. 2008
0 views

compfpm flood plain functions

jeff kephart 11 03
18. 09. 2007
0 views

jeff kephart 11 03

LACEApresJZ1
26. 10. 2007
0 views

LACEApresJZ1

Wedding PP Presntation attach
27. 11. 2007
0 views

Wedding PP Presntation attach

RHCh1
20. 02. 2008
0 views

RHCh1

MATERIAL HANDLING PREVIEW
26. 02. 2008
0 views

MATERIAL HANDLING PREVIEW

unaids
13. 08. 2007
0 views

unaids

tgs04b
18. 09. 2007
0 views

tgs04b

about
28. 09. 2007
0 views

about

CPPStudyPhysicalProt ection
19. 11. 2007
0 views

CPPStudyPhysicalProt ection

nliwiSCS
12. 10. 2007
0 views

nliwiSCS

TurfBMP81704
14. 02. 2008
0 views

TurfBMP81704

CAlbala
22. 10. 2007
0 views

CAlbala

LT SLIDE show
11. 03. 2008
0 views

LT SLIDE show

marie curie jenam june 2005
13. 03. 2008
0 views

marie curie jenam june 2005

volcanoes group5
25. 03. 2008
0 views

volcanoes group5

gusev
15. 10. 2007
0 views

gusev

Parent Presentation predators
01. 01. 2008
0 views

Parent Presentation predators

Williams Tanzania
13. 08. 2007
0 views

Williams Tanzania

PrelimI
07. 10. 2007
0 views

PrelimI

personalities
12. 10. 2007
0 views

personalities

MeetingFreightDataCh allenges
28. 02. 2008
0 views

MeetingFreightDataCh allenges

04 19 SW
29. 10. 2007
0 views

04 19 SW

APP The American Experience WK3
17. 12. 2007
0 views

APP The American Experience WK3

062507
04. 03. 2008
0 views

062507

wedekind
08. 10. 2007
0 views

wedekind

IZMO CONCIERGE
30. 10. 2007
0 views

IZMO CONCIERGE

eh wellseptic
07. 11. 2007
0 views

eh wellseptic

20070114 sanog9 apnic update
27. 03. 2008
0 views

20070114 sanog9 apnic update

ATCNewswireCatalogue EN Q207
02. 10. 2007
0 views

ATCNewswireCatalogue EN Q207

db pres okutani whois
09. 10. 2007
0 views

db pres okutani whois

hotchips 2004 motes
18. 06. 2007
0 views

hotchips 2004 motes

gww sid july27
18. 06. 2007
0 views

gww sid july27

eolson AUV2004
18. 06. 2007
0 views

eolson AUV2004

EMS Stake holders
18. 06. 2007
0 views

EMS Stake holders

edinburgh condor tutorial
18. 06. 2007
0 views

edinburgh condor tutorial

dztalk 3
18. 06. 2007
0 views

dztalk 3

Digital Photos Hitchcock
18. 06. 2007
0 views

Digital Photos Hitchcock

DGov transform wo notes web
18. 06. 2007
0 views

DGov transform wo notes web

defense
18. 06. 2007
0 views

defense

palais 04
18. 09. 2007
0 views

palais 04

chop06
29. 10. 2007
0 views

chop06

MISR images Aug2001
21. 10. 2007
0 views

MISR images Aug2001

PresentacionGobCorp2 003
22. 10. 2007
0 views

PresentacionGobCorp2 003

cmc2q06
18. 09. 2007
0 views

cmc2q06

AfricanAmericans A Z
03. 10. 2007
0 views

AfricanAmericans A Z

Proper KeyBoarding Technique
15. 06. 2007
0 views

Proper KeyBoarding Technique

Dinosaurs
15. 06. 2007
0 views

Dinosaurs

Memories
15. 06. 2007
0 views

Memories

Like - Having Different Hobbies
15. 06. 2007
0 views

Like - Having Different Hobbies

humanity
15. 06. 2007
0 views

humanity

Volcanoes are Hot Stuff
15. 06. 2007
0 views

Volcanoes are Hot Stuff

China2005NoAnimation
23. 10. 2007
0 views

China2005NoAnimation

BTL Statistics2005
18. 09. 2007
0 views

BTL Statistics2005

Dillon
18. 06. 2007
0 views

Dillon

infovis03 talk slides
18. 09. 2007
0 views

infovis03 talk slides

Block 6 Basler Ausschuss
16. 10. 2007
0 views

Block 6 Basler Ausschuss

scarlett 28oct05
18. 09. 2007
0 views

scarlett 28oct05

barrier
18. 09. 2007
0 views

barrier

SNIMA BTP
24. 10. 2007
0 views

SNIMA BTP

familyweek3
24. 02. 2008
0 views

familyweek3

rutkowska bheurope2006
18. 09. 2007
0 views

rutkowska bheurope2006

ATA2007 US MA MReyna
23. 10. 2007
0 views

ATA2007 US MA MReyna

P Jonson AIC
18. 10. 2007
0 views

P Jonson AIC

Denver 05b
18. 06. 2007
0 views

Denver 05b

6 WP4 TRT
20. 03. 2008
0 views

6 WP4 TRT

Enigma1
31. 12. 2007
0 views

Enigma1

P416 Lec5 S07
27. 09. 2007
0 views

P416 Lec5 S07

mulligan
18. 09. 2007
0 views

mulligan