tjea

Information about tjea

Published on February 7, 2008

Author: Susann

Source: authorstream.com

Content

Basic concepts of Data Mining, Clustering and Genetic Algorithms:  Basic concepts of Data Mining, Clustering and Genetic Algorithms Tsai-Yang Jea Department of Computer Science and Engineering SUNY at Buffalo Data Mining Motivation:  Data Mining Motivation Mechanical production of data need for mechanical consumption of data Large databases = vast amounts of information Difficulty lies in accessing it KDD and Data Mining:  KDD and Data Mining KDD: Extraction of knowledge from data “non-trivial extraction of implicit, previously unknown & potentially useful knowledge from data” Data Mining: Discovery stage of the KDD process Data Mining Techniques:  Data Mining Techniques Query tools Statistical techniques Visualization On-line analytical processing (OLAP) Clustering Classification Decision trees Association rules Neural networks Genetic algorithms Any technique that helps to extract more out of data is useful What’s Clustering:  What’s Clustering Clustering is a kind of unsupervised learning. Clustering is a method of grouping data that share similar trend and patterns. Clustering of data is a method by which large sets of data is grouped into clusters of smaller sets of similar data. Example: Thus, we see clustering means grouping of data or dividing a large data set into smaller data sets of some similarity. After clustering: The usage of clustering:  The usage of clustering Some engineering sciences such as pattern recognition, artificial intelligence have been using the concepts of cluster analysis. Typical examples to which clustering has been applied include handwritten characters, samples of speech, fingerprints, and pictures. In the life sciences (biology, botany, zoology, entomology, cytology, microbiology), the objects of analysis are life forms such as plants, animals, and insects. The clustering analysis may range from developing complete taxonomies to classification of the species into subspecies. The subspecies can be further classified into subspecies. Clustering analysis is also widely used in information, policy and decision sciences. The various applications of clustering analysis to documents include votes on political issues, survey of markets, survey of products, survey of sales programs, and R & D. A Clustering Example:  A Clustering Example Income: High Children:1 Car:Luxury Income: Low Children:0 Car:Compact Car: Sedan and Children:3 Income: Medium Income: Medium Children:2 Car:Truck Cluster 1 Cluster 2 Cluster 3 Cluster 4 Different ways of representing clusters:  Different ways of representing clusters (b) g i f e c b K Means Clustering (Iterative distance-based clustering):  K Means Clustering (Iterative distance-based clustering) K means clustering is an effective algorithm to extract a given number of clusters of patterns from a training set. Once done, the cluster locations can be used to classify patterns into distinct classes. K means clustering (Cont.):  K means clustering (Cont.) Select the k cluster centers randomly. Store the k cluster centers. Loop until the change in cluster means is less the amount specified by the user. The drawbacks of K-means clustering:  The drawbacks of K-means clustering The final clusters do not represent a global optimization result but only the local one, and complete different final clusters can arise from difference in the initial randomly chosen cluster centers. (fig. 1) We have to know how many clusters we will have at the first. Drawback of K-means clustering (Cont.):  Drawback of K-means clustering (Cont.) Figure 1 Clustering with Genetic Algorithm:  Clustering with Genetic Algorithm Introduction of Genetic Algorithm Elements consisting GAs Genetic Representation Genetic operators Introduction of GAs:  Introduction of GAs Inspired by biological evolution. Many operators mimic the process of the biological evolution including Natural selection Crossover Mutation Elements consisting GAs:  Elements consisting GAs Individual (chromosome): feasible solution in an optimization problem Population Set of individuals Should be maintained in each generation Elements consisting GAs:  Elements consisting GAs Genetic operators. (crossover, mutation…) Define the fitness function. The fitness function takes a single chromosome as input and returns a measure of the goodness of the solution represented by the chromosome. Genetic Representation:  Genetic Representation The most important starting point to develop a genetic algorithm Each gene has its special meaning Based on this representation, we can define fitness evaluation function, crossover operator, mutation operator. Genetic Representation (Cont.):  Genetic Representation (Cont.) Examples 1 Gene Allele value Genetic Representation (Cont.):  Genetic Representation (Cont.) Examples 2 ( In clustering problem) Each chromosome represents a set of clusters; each gene represents an object; each allele value represents a cluster. Genes with the same allele value are in the same cluster. 1 2 1 4 3 5 5 A B C D E F G Crossover:  Crossover Exchange features of two individuals to produce two offspring (children) Selected mates may have good properties to survive in next generations So, we can expect that exchanging features may produce other good individuals Crossover (cont.):  Crossover (cont.) Single-point Crossover Two-point Crossover Uniform Crossover 1 1 0 1 1 0 0 1 0 0 0 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1 0 1 0 0 1 0 0 0 1 1 0 1 1 0 0 1 0 0 0 0 1 0 1 0 1 1 1 0 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 1 0 1 1 1 0 1 1 0 0 1 0 0 0 0 1 0 1 0 1 1 0 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0 1 1 0 0 1 Crossover template Mutation:  Mutation Usually change a single bit in a bit string This operator should happen with very low probability. Mutation point (random) Typical Procedures:  Typical Procedures Crossover mates are probabilistically selected based on their fitness value. Crossover point randomly selected old generation new generation Mutation point (random) Probabilistically select individuals How to apply GA on a clustering problem:  Preparing the chromosomes Defining genetic operators Fusion: takes two unique allele values and combines them into a single allele value, combining two clusters into one. Fission: takes a single allele value and gives it a different random allele value, breaking a cluster apart. Defining fitness functions How to apply GA on a clustering problem Example: (Cont.):  Example: (Cont.) Crossover Mutation Fusion Fission Old generation New generation Select the chromosomes according to the fitness function. Finally…:  Finally…

Related presentations


Other presentations created by Susann

Athletic Footwear Industry
31. 03. 2008
0 views

Athletic Footwear Industry

radioactivity 1
07. 02. 2008
0 views

radioactivity 1

Africa Unit 3
02. 04. 2008
0 views

Africa Unit 3

Market Research
03. 03. 2008
0 views

Market Research

module 1 intro
08. 05. 2008
0 views

module 1 intro

zly
07. 05. 2008
0 views

zly

031016 inhale montreal
02. 05. 2008
0 views

031016 inhale montreal

new leg
02. 05. 2008
0 views

new leg

JM UWA Luncheon Oct2007
30. 04. 2008
0 views

JM UWA Luncheon Oct2007

Tutorial
24. 04. 2008
0 views

Tutorial

NSW China Briefing
22. 04. 2008
0 views

NSW China Briefing

Weber Fall06
18. 04. 2008
0 views

Weber Fall06

SharksPP
17. 04. 2008
0 views

SharksPP

Dennill Managing Your Risk
09. 01. 2008
0 views

Dennill Managing Your Risk

AAS Presentation
10. 01. 2008
0 views

AAS Presentation

GANG Powerpoint
11. 01. 2008
0 views

GANG Powerpoint

885
15. 01. 2008
0 views

885

Chapt 05
16. 01. 2008
0 views

Chapt 05

Class11
09. 01. 2008
0 views

Class11

Mercantilism
18. 01. 2008
0 views

Mercantilism

ppe p sp
20. 01. 2008
0 views

ppe p sp

training docs
21. 01. 2008
0 views

training docs

eukaryoticorgs
22. 01. 2008
0 views

eukaryoticorgs

LimitedBrandsPresent ation
04. 02. 2008
0 views

LimitedBrandsPresent ation

MLA outreach presentation
15. 01. 2008
0 views

MLA outreach presentation

CenturyTheatre Ad Sept06 gg
15. 01. 2008
0 views

CenturyTheatre Ad Sept06 gg

LutzWalter
22. 01. 2008
0 views

LutzWalter

secondary aluminum
12. 02. 2008
0 views

secondary aluminum

2 3 Ben Sekamatte
25. 01. 2008
0 views

2 3 Ben Sekamatte

Clara Qualizza
19. 01. 2008
0 views

Clara Qualizza

Session 4 Oper vs Aux
28. 01. 2008
0 views

Session 4 Oper vs Aux

Ch5 6 10 14
29. 01. 2008
0 views

Ch5 6 10 14

Rites of Passage
29. 01. 2008
0 views

Rites of Passage

Jonah Presentation
30. 01. 2008
0 views

Jonah Presentation

firesafety
07. 02. 2008
0 views

firesafety

Wiriya Suwannet
10. 01. 2008
0 views

Wiriya Suwannet

s Oceans
13. 02. 2008
0 views

s Oceans

Freeman
20. 02. 2008
0 views

Freeman

Forage Diseases
27. 02. 2008
0 views

Forage Diseases

SlaneP constraints
22. 01. 2008
0 views

SlaneP constraints

EPD2 Present and Future
28. 02. 2008
0 views

EPD2 Present and Future

INSOMNIA DrJeanGrenier Nov 2007
28. 02. 2008
0 views

INSOMNIA DrJeanGrenier Nov 2007

SAAS Gianessi
28. 01. 2008
0 views

SAAS Gianessi

5th Grade FCAT Review
14. 03. 2008
0 views

5th Grade FCAT Review

talent engine discussion guide
16. 03. 2008
0 views

talent engine discussion guide

Lecture14 TheEarth
11. 03. 2008
0 views

Lecture14 TheEarth

GEO205 powerpoint 12
27. 03. 2008
0 views

GEO205 powerpoint 12

Parasitology Basic 07
28. 03. 2008
0 views

Parasitology Basic 07

world geography
15. 04. 2008
0 views

world geography

Irwin
16. 04. 2008
0 views

Irwin

BuildingWordnets
05. 02. 2008
0 views

BuildingWordnets

Douglas Morgan
16. 01. 2008
0 views

Douglas Morgan

Wien
05. 03. 2008
0 views

Wien

NSO 2007
14. 01. 2008
0 views

NSO 2007

GeneralOutreachKansas
29. 01. 2008
0 views

GeneralOutreachKansas

Rita Tucker
25. 01. 2008
0 views

Rita Tucker

Bob Moseley Kawagebo
05. 02. 2008
0 views

Bob Moseley Kawagebo

michigan0314b
15. 01. 2008
0 views

michigan0314b

Canning Fruits
06. 02. 2008
0 views

Canning Fruits

English Labofa EGO Nordic
05. 03. 2008
0 views

English Labofa EGO Nordic

SlidesWithNotes
14. 02. 2008
0 views

SlidesWithNotes

GMkk
24. 01. 2008
0 views

GMkk

20030909152346 EVI
13. 01. 2008
0 views

20030909152346 EVI

07nrm
22. 01. 2008
0 views

07nrm

praca5
03. 03. 2008
0 views

praca5

majingweili
14. 04. 2008
0 views

majingweili

Module3Presentation
25. 02. 2008
0 views

Module3Presentation

Jones NDDA
20. 01. 2008
0 views

Jones NDDA

Thornton CreatingAClimate
17. 01. 2008
0 views

Thornton CreatingAClimate

FirstHalfStudyGuide
15. 02. 2008
0 views

FirstHalfStudyGuide