BasevsDIStudio

Information about BasevsDIStudio

Published on February 5, 2008

Author: Berta

Source: authorstream.com

Content

Base SAS® vs. SAS® Data Integration Studio:  Base SAS® vs. SAS® Data Integration Studio Greg Nelson and Danny Grasse Slide2:  Outline Overview of Understanding ETL What SAS approaches do we have? 38 Best Practices Key Areas for Comparison Design and data profiling. Source data extraction. Transformation and loading. Change data capture. Quality review, auditing and exception handling/management. Integration with the production environment and business process components. Summary and Conclusions Overview:  Overview ETL Data Warehousing 101 Data Integration Studio “Consistent” version of the truth Credible information versus Data quality Corporate Information Factory:  Corporate Information Factory Ralph Kimball History Excellence:  Ralph Kimball History Excellence Father of “dimensional” data warehouse design The Data Warehouse Toolkit (I and II) The Data Warehouse Lifecycle Toolkit The Data Warehouse ETL Toolkit The Data Integration Process:  The Data Integration Process 38 Subsystems:  38 Subsystems 38 Sub-systems define your ETL strategy Design and data profiling Source data extraction Transformation and loading Change data capture Quality review, auditing and exception handling/management Integration with the production environment and business process components 38 Subsystems: Category 1:  Design and Data Profiling Transformation and loading Change data capture Source data extraction Quality and exception handling 38 Subsystems: Category 1 Productionizatation Design and Data Profiling:  Design and Data Profiling Design is often played by “soft skills” in the SAS ecosystem Data profiling Base SAS – frequencies, crosstabs, macros and toolkits DataFlux software – data profiling on steroids DI Studio – data profiling currently supported through generalized metadata Data Profiling is an analysis exercise, not a technical one 38 Subsystems: Category 2:  Design and Data Profiling Transformation and loading Change data capture Source data extraction Quality and exception handling 38 Subsystems: Category 2 Productionizatation Source Data Extraction:  Source Data Extraction Source data adapters (including data conversions and filters) SAS/Access products Data Step, SQL and DI Studio Push/pull/dribble How we move the data Filtering & Sorting How we select the data to be moved Data staging (versus accessing) How we land the data 38 Subsystems: Category 3:  Design and Data Profiling Transformation and loading Change data capture Source data extraction Quality and exception handling 38 Subsystems: Category 3 Productionizatation Transformation and Loading:  Transformation and Loading 5. Data conformer. 9. Surrogate key creation system. 12. Fixed hierarchy dimension builder. 13. Variable hierarchy dimension builder. 14. Multivalued dimension bridge table builder. 15. Junk dimension builder. 16. Transaction grain fact table loader. 17. Periodic snapshot grain fact table loader. 18. Accumulating snapshot grain fact table loader. 19. Surrogate key pipeline. 20. Late arriving fact handler. 21. Aggregate builder. 22. Multidimensional cube builder. 23. Real-time partition builder. 24. Dimension manager system. 25. Fact table provider system. Transformation and Loading:  Transformation and Loading Conforming dimensions and facts Good old SAS code (Data Step, SQL, Formats) – all available in DI Studio Good design Creation of surrogate keys Hand coded in Base SAS “Automagic” in DI Studio Building summary tables and cubes Just another “target” in DI Studio 38 Subsystems: Category 4:  Design and Data Profiling Transformation and loading Change data capture Source data extraction Quality and exception handling 38 Subsystems: Category 4 Productionizatation Change Data Capture:  Change Data Capture Change Data Capture:  Change Data Capture When we see new data coming in from the operational system, we have to make a decision about how to handle that change. We have three options: We can overwrite or update the old value (Type I) We can create a new record and create some mechanism for recreating historical references to that data – depending on the date for the report that is being requested (Type II). We can retain both as alternative “realities”. For the latter, we usually create a new column and put the old value in the new column to allow for alternatives to reporting. (Type III) Change Data Capture:  Change Data Capture SAS Approaches Base SAS – very robust using macros Can control everything about the load DI Studio has limited coverage SAS does support CRC-based record comparisons (MD5 function) DI Studio 3 types of loading techniques: update, refresh, append Type I & II are dropdowns; Type II SCD is a transform Doesn’t support Type 3 outside of transform code 38 Subsystems: Category 5:  Design and Data Profiling Transformation and loading Change data capture Source data extraction Quality and exception handling 38 Subsystems: Category 5 Productionizatation Quality Handling:  Quality Handling Quality Handling:  Quality Handling Detecting errors Handling them Providing audit records Quality Management:  Quality Management Detecting errors SAS errors versus data errors DataFlux Data rationalization At the point of data entry Base SAS If then else routines (lookup tables, formats) DI Studio Not much other than BASE SAS Audit trail:  Audit trail Base SAS Log parsing routines DI Studio Workspace server logs Event System Detailed logs, summary logs and event triggers Exception Handling:  Exception Handling Base SAS Macros, put statements in log file DI Studio Simple email, exception tables and log file Event System Subscribe to events Responds to errors, warnings, notes and custom assertions 38 Subsystems: Category 6:  Design and Data Profiling Transformation and loading Change data capture Source data extraction Quality and exception handling 38 Subsystems: Category 6 Productionizatation Productionization of SAS ETL:  Productionization of SAS ETL 26. Job scheduler. 27. Workflow monitor. 28. Recovery and restart system. 29. Parallelizing/pipelining system. 30. Problem escalation system. 31. Version control system. 32. Version migration system. 33. Lineage and dependency analyzer. 34. Compliance reporter. 35. Security system. 36. Backup system. 37. Metadata repository manager. 38. Project management system. Productionization of SAS:  Productionization of SAS Version control, change control, promotion, backup and recovery None is available in BASE SAS Version control – minimal for multi-developer access Change management –partially available in DI Studio Automated promotion – weak and/or not available Backup – metadata server can be backed up (but no rollback feature) Productionization of SAS:  Productionization of SAS Scheduling, dependency management and restartability, including parallelization. Provided by LSF Scheduler Managed by person doing the scheduling not writing the code LSF provides parallelization, but also 'grid' computing with the associated 'pipelining' of steps Productionization of SAS:  Productionization of SAS Metadata management and impact analysis. Very good in DI Studio Productionization of SAS:  Productionization of SAS Project management and problem escalation. Not in scope for DI Studio Summary:  Summary Di Studio is a code generator Can do everything Base SAS can do Metadata and security is the key for why we want to use DI Studio DI Studio writes “better” code in some cases Challenges: change control and what happens when things go bad Slide32:  ThotWave Technologies Thinking Data Danny Grasse Senior Consultant [email protected] Greg Nelson CEO and Founder [email protected]

Related presentations


Other presentations created by Berta

CII PRESENTATION
09. 01. 2008
0 views

CII PRESENTATION

John Wilkes Booth
10. 01. 2008
0 views

John Wilkes Booth

Abdala Amphetamines in Russia
10. 01. 2008
0 views

Abdala Amphetamines in Russia

terzalezione scritturanarrativa
12. 01. 2008
0 views

terzalezione scritturanarrativa

Colonialism
15. 01. 2008
0 views

Colonialism

Evolution for Beginners
17. 01. 2008
0 views

Evolution for Beginners

The Analysis Gatsby
04. 02. 2008
0 views

The Analysis Gatsby

OCTranspo
05. 02. 2008
0 views

OCTranspo

mhd day2005
16. 01. 2008
0 views

mhd day2005

nut info labels etiquettes e
07. 02. 2008
0 views

nut info labels etiquettes e

american revolution
12. 02. 2008
0 views

american revolution

moorthy ms
17. 01. 2008
0 views

moorthy ms

leadinglearning
20. 02. 2008
0 views

leadinglearning

izellerbauhauscli2005
26. 02. 2008
0 views

izellerbauhauscli2005

PersuasiveComp
28. 02. 2008
0 views

PersuasiveComp

ADA overview
07. 03. 2008
0 views

ADA overview

wipo smes uln 07 www 89154
14. 03. 2008
0 views

wipo smes uln 07 www 89154

latair
22. 01. 2008
0 views

latair

Presentation UWIHARP Final
19. 01. 2008
0 views

Presentation UWIHARP Final

Stuart Anderson
03. 04. 2008
0 views

Stuart Anderson

venkatesh
08. 04. 2008
0 views

venkatesh

identifiers
09. 01. 2008
0 views

identifiers

2007112679352641
14. 04. 2008
0 views

2007112679352641

Marketing of Evil book review
16. 04. 2008
0 views

Marketing of Evil book review

21431
17. 04. 2008
0 views

21431

IR3001 Middle East security
23. 04. 2008
0 views

IR3001 Middle East security

DSRDSept19
24. 04. 2008
0 views

DSRDSept19

TNG
02. 05. 2008
0 views

TNG

Gavin Mooney DRS Sept 06
14. 02. 2008
0 views

Gavin Mooney DRS Sept 06

STD11
02. 05. 2008
0 views

STD11

CLOCgonzalez
15. 01. 2008
0 views

CLOCgonzalez

9 10 07Watson Presentation2
30. 01. 2008
0 views

9 10 07Watson Presentation2

f3
13. 01. 2008
0 views

f3

gep2004slideshow
24. 01. 2008
0 views

gep2004slideshow

LuisLobopresentation
17. 01. 2008
0 views

LuisLobopresentation

adcas
16. 01. 2008
0 views

adcas

Weber Paul
14. 01. 2008
0 views

Weber Paul

BoraBora
29. 01. 2008
0 views

BoraBora

swga 004
05. 03. 2008
0 views

swga 004

MSThesisPresentation
07. 02. 2008
0 views

MSThesisPresentation