assoc2

Information about assoc2

Published on February 7, 2008

Author: Savin

Source: authorstream.com

Content

Association Rules: Advanced Topics: Association Rules: Advanced Topics Apriori Adv/Disadv: Apriori Adv/Disadv Advantages: Uses large itemset property. Easily parallelized Easy to implement. Disadvantages: Assumes transaction database is memory resident. Requires up to m database scans. Vertical Layout: Vertical Layout Rather than have Transaction ID – list of items (Transactional) We have Item – List of transactions (TID-list) Now to count itemset AB Intersect TID-list of itemA with TID-list of itemB All data for a particular item is available Eclat Algorithm: Eclat Algorithm Dynamically process each transaction online maintaining 2-itemset counts. Transform Partition L2 using 1-item prefix Equivalence classes - {AB, AC, AD}, {BC, BD}, {CD} Transform database to vertical form Asynchronous Phase For each equivalence class E Compute frequent (E) Asynchronous Phase: Asynchronous Phase Compute Frequent (E_k-1) For all itemsets I1 and I2 in E_k-1 If (I1 ∩ I2 >= minsup) add I1 and I2 to L_k Partition L_k into equivalence classes For each equivalence class E_k in L_k Compute_frequent (E_k) Properties of ECLAT Locality enhancing approach Easy and efficient to parallelize Few scans of database (best case 2) Max-patterns: Max-patterns Frequent pattern {a1, …, a100}  (1001) + (1002) + … + (110000) = 2100-1 = 1.27*1030 frequent sub-patterns! Max-pattern: frequent patterns without proper frequent super pattern BCDE, ACD are max-patterns BCD is not a max-pattern Min_sup=2 Frequent Closed Patterns: Frequent Closed Patterns Conf(acd)=100%  record acd only For frequent itemset X, if there exists no item y s.t. every transaction containing X also contains y, then X is a frequent closed pattern “acd” is a frequent closed pattern Concise rep. of freq pats Reduce # of patterns and rules N. Pasquier et al. In ICDT’99 Min_sup=2 Mining Various Kinds of Rules or Regularities: Mining Various Kinds of Rules or Regularities Multi-level, quantitative association rules, correlation and causality, ratio rules, sequential patterns, emerging patterns, temporal associations, partial periodicity Classification, clustering, iceberg cubes, etc. Multiple-level Association Rules: Multiple-level Association Rules Items often form hierarchy Flexible support settings: Items at the lower level are expected to have lower support. Transaction database can be encoded based on dimensions and levels explore shared multi-level mining ML/MD Associations with Flexible Support Constraints: ML/MD Associations with Flexible Support Constraints Why flexible support constraints? Real life occurrence frequencies vary greatly Diamond, watch, pens in a shopping basket Uniform support may not be an interesting model A flexible model The lower-level, the more dimension combination, and the long pattern length, usually the smaller support General rules should be easy to specify and understand Special items and special group of items may be specified individually and have higher priority Multi-dimensional Association: Multi-dimensional Association Single-dimensional rules: buys(X, “milk”)  buys(X, “bread”) Multi-dimensional rules:  2 dimensions or predicates Inter-dimension assoc. rules (no repeated predicates) age(X,”19-25”)  occupation(X,“student”)  buys(X,“coke”) hybrid-dimension assoc. rules (repeated predicates) age(X,”19-25”)  buys(X, “popcorn”)  buys(X, “coke”) Multi-level Association: Redundancy Filtering: Multi-level Association: Redundancy Filtering Some rules may be redundant due to “ancestor” relationships between items. Example milk  wheat bread [support = 8%, confidence = 70%] 2% milk  wheat bread [support = 2%, confidence = 72%] We say the first rule is an ancestor of the second rule. A rule is redundant if its support is close to the “expected” value, based on the rule’s ancestor. Multi-Level Mining: Progressive Deepening: Multi-Level Mining: Progressive Deepening A top-down, progressive deepening approach: First mine high-level frequent items: milk (15%), bread (10%) Then mine their lower-level “weaker” frequent itemsets: 2% milk (5%), wheat bread (4%) Different min_support threshold across multi-levels lead to different algorithms: If adopting the same min_support across multi-levels then toss t if any of t’s ancestors is infrequent. If adopting reduced min_support at lower levels then examine only those descendents whose ancestor’s support is frequent/non-negligible. Interestingness Measure: Correlations (Lift): Interestingness Measure: Correlations (Lift) play basketball  eat cereal [40%, 66.7%] is misleading The overall percentage of students eating cereal is 75% which is higher than 66.7%. play basketball  not eat cereal [20%, 33.3%] is more accurate, although with lower support and confidence Measure of dependent/correlated events: lift Constraint-based Data Mining: Constraint-based Data Mining Finding all the patterns in a database autonomously? — unrealistic! The patterns could be too many but not focused! Data mining should be an interactive process User directs what to be mined using a data mining query language (or a graphical user interface) Constraint-based mining User flexibility: provides constraints on what to be mined System optimization: explores such constraints for efficient mining—constraint-based mining Constrained Frequent Pattern Mining: A Mining Query Optimization Problem: Constrained Frequent Pattern Mining: A Mining Query Optimization Problem Given a frequent pattern mining query with a set of constraints C, the algorithm should be sound: it only finds frequent sets that satisfy the given constraints C complete: all frequent sets satisfying the given constraints C are found A naïve solution First find all frequent sets, and then test them for constraint satisfaction More efficient approaches: Analyze the properties of constraints comprehensively Push them as deeply as possible inside the frequent pattern computation. Anti-Monotonicity in Constraint-Based Mining: Anti-Monotonicity in Constraint-Based Mining Anti-monotonicity When an intemset S violates the constraint, so does any of its superset sum(S.Price)  v is anti-monotone sum(S.Price)  v is not anti-monotone Example. C: range(S.profit)  15 is anti-monotone Itemset ab violates C So does every superset of ab TDB (min_sup=2) Which Constraints Are Anti-Monotone?: Which Constraints Are Anti-Monotone? Monotonicity in Constraint-Based Mining: Monotonicity in Constraint-Based Mining Monotonicity When an intemset S satisfies the constraint, so does any of its superset sum(S.Price)  v is monotone min(S.Price)  v is monotone Example. C: range(S.profit)  15 Itemset ab satisfies C So does every superset of ab TDB (min_sup=2) Which Constraints Are Monotone?: Which Constraints Are Monotone? Succinctness: Succinctness Succinctness: Given A1, the set of items satisfying a succinctness constraint C, then any set S satisfying C is based on A1 , i.e., S contains a subset belonging to A1 Idea: Without looking at the transaction database, whether an itemset S satisfies constraint C can be determined based on the selection of items min(S.Price)  v is succinct sum(S.Price)  v is not succinct Optimization: If C is succinct, C is pre-counting pushable Which Constraints Are Succinct?: Which Constraints Are Succinct? The Apriori Algorithm — Example: The Apriori Algorithm — Example Database D Scan D C1 L1 L2 C2 C2 Scan D C3 L3 Scan D Naïve Algorithm: Apriori + Constraint : Naïve Algorithm: Apriori + Constraint Database D Scan D C1 L1 L2 C2 C2 Scan D C3 L3 Scan D Constraint: Sum{S.price < 5} Pushing the constraint deep into the process : Pushing the constraint deep into the process Database D Scan D C1 L1 L2 C2 C2 Scan D C3 L3 Scan D Constraint: Sum{S.price < 5} Push a Succinct Constraint Deep : Push a Succinct Constraint Deep Database D Scan D C1 L1 L2 C2 C2 Scan D C3 L3 Scan D Constraint: min{S.price <= 1 } Converting “Tough” Constraints: Converting “Tough” Constraints Convert tough constraints into anti-monotone or monotone by properly ordering items Examine C: avg(S.profit)  25 Order items in value-descending order <a, f, g, d, b, h, c, e> If an itemset afb violates C So does afbh, afb* It becomes anti-monotone! TDB (min_sup=2) Convertible Constraints: Convertible Constraints Let R be an order of items Convertible anti-monotone If an itemset S violates a constraint C, so does every itemset having S as a prefix w.r.t. R Ex. avg(S)  v w.r.t. item value descending order Convertible monotone If an itemset S satisfies constraint C, so does every itemset having S as a prefix w.r.t. R Ex. avg(S)  v w.r.t. item value descending order Strongly Convertible Constraints: Strongly Convertible Constraints avg(X)  25 is convertible anti-monotone w.r.t. item value descending order R: <a, f, g, d, b, h, c, e> If an itemset af violates a constraint C, so does every itemset with af as prefix, such as afd avg(X)  25 is convertible monotone w.r.t. item value ascending order R-1: <e, c, h, b, d, g, f, a> If an itemset d satisfies a constraint C, so does itemsets df and dfa, which having d as a prefix Thus, avg(X)  25 is strongly convertible What Constraints Are Convertible?: What Constraints Are Convertible? Combing Them Together—A General Picture: Combing Them Together—A General Picture Classification of Constraints: Classification of Constraints Convertible anti-monotone Convertible monotone Strongly convertible Inconvertible Succinct Antimonotone Monotone Mining With Convertible Constraints: Mining With Convertible Constraints C: avg(S.profit)  25 List of items in every transaction in value descending order R: <a, f, g, d, b, h, c, e> C is convertible anti-monotone w.r.t. R Scan transaction DB once remove infrequent items Item h in transaction 40 is dropped Itemsets a and f are good TDB (min_sup=2) Can Apriori Handle Convertible Constraint?: Can Apriori Handle Convertible Constraint? A convertible, not monotone nor anti-monotone nor succinct constraint cannot be pushed deep into the an Apriori mining algorithm Within the level wise framework, no direct pruning based on the constraint can be made Itemset df violates constraint C: avg(X)>=25 Since adf satisfies C, Apriori needs df to assemble adf, df cannot be pruned But it can be pushed into frequent-pattern growth framework! Mining With Convertible Constraints: Mining With Convertible Constraints C: avg(X)>=25, min_sup=2 List items in every transaction in value descending order R: <a, f, g, d, b, h, c, e> C is convertible anti-monotone w.r.t. R Scan TDB once remove infrequent items Item h is dropped Itemsets a and f are good, … Projection-based mining Imposing an appropriate order on item projection Many tough constraints can be converted into (anti)-monotone TDB (min_sup=2) Handling Multiple Constraints: Handling Multiple Constraints Different constraints may require different or even conflicting item-ordering If there exists an order R s.t. both C1 and C2 are convertible w.r.t. R, then there is no conflict between the two convertible constraints If there exists conflict on order of items Try to satisfy one constraint first Then using the order for the other constraint to mine frequent itemsets in the corresponding projected database

Related presentations


Other presentations created by Savin

Soil Experiment
20. 01. 2008
0 views

Soil Experiment

GFACJan2006
08. 05. 2008
0 views

GFACJan2006

2010 Tourism Organising Plan
02. 05. 2008
0 views

2010 Tourism Organising Plan

bejingppt
02. 05. 2008
0 views

bejingppt

ZZS Motor Scooters
24. 04. 2008
0 views

ZZS Motor Scooters

EFF2007
22. 04. 2008
0 views

EFF2007

ic 06 v13
15. 04. 2008
0 views

ic 06 v13

Presentation Jim Pateman 2of2
14. 04. 2008
0 views

Presentation Jim Pateman 2of2

normalization
09. 01. 2008
0 views

normalization

ENGLAND
07. 02. 2008
0 views

ENGLAND

Russia Sergei Tikhonov
08. 01. 2008
0 views

Russia Sergei Tikhonov

Aynak MDSG 2006
10. 01. 2008
0 views

Aynak MDSG 2006

Period 6 CHapter 27
10. 01. 2008
0 views

Period 6 CHapter 27

levin
12. 01. 2008
0 views

levin

SomeshwarRao
12. 01. 2008
0 views

SomeshwarRao

Lesson31 Chandelles
13. 01. 2008
0 views

Lesson31 Chandelles

Tourist Attractions in Fermanagh
14. 01. 2008
0 views

Tourist Attractions in Fermanagh

ap
17. 01. 2008
0 views

ap

Byzantine
19. 01. 2008
0 views

Byzantine

JacobsenControlPotato Fusar
22. 01. 2008
0 views

JacobsenControlPotato Fusar

SSA EducationTutorial Jan2006
12. 02. 2008
0 views

SSA EducationTutorial Jan2006

Unicode and Windows XP c
21. 01. 2008
0 views

Unicode and Windows XP c

Khadi Presentation
25. 01. 2008
0 views

Khadi Presentation

weddedbliss wbw 07
28. 01. 2008
0 views

weddedbliss wbw 07

MeteorsMars
24. 01. 2008
0 views

MeteorsMars

Sand Dune Formation
13. 02. 2008
0 views

Sand Dune Formation

Civil War Power point
18. 02. 2008
0 views

Civil War Power point

SeamanDavid
20. 02. 2008
0 views

SeamanDavid

11Incineration
11. 02. 2008
0 views

11Incineration

6 s111 costello s
28. 02. 2008
0 views

6 s111 costello s

Genomics
25. 02. 2008
0 views

Genomics

week12
08. 03. 2008
0 views

week12

CairoIonsphericA
12. 02. 2008
0 views

CairoIonsphericA

Ask Moss Adams Excess Benefits
20. 03. 2008
0 views

Ask Moss Adams Excess Benefits

LT1001N Lecture 5 2006 7
19. 03. 2008
0 views

LT1001N Lecture 5 2006 7

Malaria prophylaxis engl
31. 03. 2008
0 views

Malaria prophylaxis engl

CubaPublicHealth
03. 04. 2008
0 views

CubaPublicHealth

Ghana presentation
03. 04. 2008
0 views

Ghana presentation

MEPAG Astro meyer
24. 01. 2008
0 views

MEPAG Astro meyer

Monica Brand
10. 04. 2008
0 views

Monica Brand

2006 11 29 larcenter karlskoga
06. 02. 2008
0 views

2006 11 29 larcenter karlskoga

OralHealthGrade3
06. 02. 2008
0 views

OralHealthGrade3

problem 050830
07. 02. 2008
0 views

problem 050830

FEC Lynn 5 11 07
23. 01. 2008
0 views

FEC Lynn 5 11 07

CDN Webcast
16. 01. 2008
0 views

CDN Webcast

CEOs for Cities Talk
21. 03. 2008
0 views

CEOs for Cities Talk

homepage
04. 02. 2008
0 views

homepage

SCI1010 C3
22. 02. 2008
0 views

SCI1010 C3

ULF fÃredrag okt2005
08. 02. 2008
0 views

ULF fÃredrag okt2005

customizing
14. 01. 2008
0 views

customizing

CNM GED Orientation EnglishAug07
04. 02. 2008
0 views

CNM GED Orientation EnglishAug07

AV2005BW
21. 02. 2008
0 views

AV2005BW

09 17 2006 ROTS 1
29. 01. 2008
0 views

09 17 2006 ROTS 1

SBII 2u14 3
26. 03. 2008
0 views

SBII 2u14 3

Martyn Johnson
08. 04. 2008
0 views

Martyn Johnson

hayashi
15. 01. 2008
0 views

hayashi

sofie performance 22 nov 2006
17. 01. 2008
0 views

sofie performance 22 nov 2006

GravityShape 4
22. 01. 2008
0 views

GravityShape 4

Cultural O Syd Beane
15. 01. 2008
0 views

Cultural O Syd Beane