ic 06 v13

Information about ic 06 v13

Published on April 15, 2008

Author: Savin

Source: authorstream.com

Content

Multimedia and Graph mining:  Multimedia and Graph mining Christos Faloutsos CMU CONGRATULATIONS!:  CONGRATULATIONS! Outline:  Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting Conclusions Motivation:  Motivation Data mining: ~ find patterns (rules, outliers) How do detached cat retinas evolve? How do real graphs look like? How do (numerical) streams look like? ViVo: cat retina mining:  ViVo: cat retina mining with Ambuj Singh, Mark Verardo, Vebjorn Ljosa, Arnab Bhattacharya (UCSB) Jia-Yu Tim Pan, HJ Yang (CMU) Detachment Development:  Detachment Development Normal 1 day after detachment 3 days after detachment 7 days after detachment 28 days after detachment 3 months after detachment Data and Problem:  Data and Problem (Problem) What happens in retina after detachment? What tissues (regions) are involved? How do they change over time? How will a program convey this info? More than classification “we want to learn what classifier learned” Main idea:  Main idea extract characteristic visual ‘words’ Equivalent to characteristic keywords, in a collection of text documents Visual vocabulary?:  Visual vocabulary? Visual vocabulary?:  Visual vocabulary? news: president, minister, economic sports: baseball, score, penalty Visual Vocabulary (ViVo) generation:  Visual Vocabulary (ViVo) generation Step 1: Tile image Step 2: Extract tile features Step 3: ViVo generation Visual vocabulary Feature 1 Feature 2 8x12 tiles Biological interpretation:  Biological interpretation Which tissue is significant on 7-day?:  Which tissue is significant on 7-day? FEMine: Mining Fly Embryos:  FEMine: Mining Fly Embryos With:  With Eric Xing (CMU CS) Bob Murphy (CMU – Bio) Tim Pan (CMU -> Google) Andre Balan (U. Sao Paulo) Outline:  Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting Conclusions Graphs - why should we care?:  Graphs - why should we care? Graphs - why should we care?:  Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01] Joint work with:  Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.) Problem: network and graph mining:  Problem: network and graph mining How does the Internet look like? How does the web look like? What constitutes a ‘normal’ social network? What is ‘normal’/‘abnormal’? which patterns/laws hold? Graph mining:  Graph mining Are real graphs random? Laws and patterns:  Laws and patterns NO!! Diameter in- and out- degree distributions other (surprising) patterns Laws – degree distributions:  Laws – degree distributions Q: avg degree is ~3 - what is the most probable degree? degree count ?? 3 Laws – degree distributions:  Laws – degree distributions Q: avg degree is ~3 - what is the most probable degree? degree Solution::  Solution: The plot is linear in log-log scale [FFF’99] freq = degree (-2.15) O = -2.15 Exponent = slope Outdegree Frequency Nov’97 -2.15 But::  But: Q1: How about graphs from other domains? Q2: How about temporal evolution? The Peer-to-Peer Topology:  The Peer-to-Peer Topology Frequency versus degree Number of adjacent peers follows a power-law [Jovanovic+] More power laws::  More power laws: citation counts: (citeseer.nj.nec.com 6/2001) log(#citations) log(count) Ullman Slide29:  Swedish sex-web Nodes: people (Females; Males) Links: sexual relationships Liljeros et al. Nature 2001 4781 Swedes; 18-74; 59% response rate. Albert Laszlo Barabasi http://www.nd.edu/~networks/ Publication%20Categories/ 04%20Talks/2005-norway-3hours.ppt More power laws::  More power laws: web hit counts [w/ A. Montgomery] Web Site Traffic log(in-degree) log(count) Zipf ``ebay’’ epinions.com:  epinions.com who-trusts-whom [Richardson + Domingos, KDD 2001] (out) degree count trusts-2000-people user A famous power law: Zipf’s law:  A famous power law: Zipf’s law Bible - rank vs frequency (log-log) similarly, in many other languages; for customers and sales volume; city populations etc etc log(rank) log(freq) Olympic medals (Sidney’00, Athens’04)::  Olympic medals (Sidney’00, Athens’04): log( rank) log(#medals) More power laws: areas – Korcak’s law:  More power laws: areas – Korcak’s law Scandinavian lakes area vs complementary cumulative count (log-log axes) log(count( >= area)) log(area) But::  But: Q1: How about graphs from other domains? Q2: How about temporal evolution? Time evolution:  Time evolution with Jure Leskovec (CMU) and Jon Kleinberg (Cornell) (‘best paper’ KDD05) Evolution of the Diameter:  Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data? Evolution of the Diameter:  Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data? Diameter shrinks over time As the network grows the distances between nodes slowly decrease Diameter – ArXiv citation graph:  Diameter – ArXiv citation graph Citations among physics papers 1992 –2003 One graph per year time [years] diameter Diameter – “Autonomous Systems”:  Diameter – “Autonomous Systems” Graph of Internet One graph per day 1997 – 2000 number of nodes diameter Diameter – “Affiliation Network”:  Diameter – “Affiliation Network” Graph of collaborations in physics – authors linked to papers 10 years of data time [years] diameter Diameter – “Patents”:  Diameter – “Patents” Patent citation network 25 years of data time [years] diameter Temporal Evolution of the Graphs:  Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t) Temporal Evolution of the Graphs:  Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t) A: over-doubled! But obeying the ``Densification Power Law’’ Densification – Physics Citations:  Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69 Densification – Physics Citations:  Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69 1: tree Densification – Physics Citations:  Densification – Physics Citations Citations among physics papers 2003: 29,555 papers, 352,807 citations N(t) E(t) 1.69 clique: 2 Densification – Patent Citations:  Densification – Patent Citations Citations among patents granted 1999 2.9 million nodes 16.5 million edges Each year is a datapoint N(t) E(t) 1.66 Densification – Autonomous Systems:  Densification – Autonomous Systems Graph of Internet 2000 6,000 nodes 26,000 edges One graph per day N(t) E(t) 1.18 Densification – Affiliation Network:  Densification – Affiliation Network Authors linked to their publications 2002 60,000 nodes 20,000 authors 38,000 papers 133,000 edges N(t) E(t) 1.15 Graphs - Conclusions:  Graphs - Conclusions Real graphs obey some surprising patterns which can help us spot anomalies / outliers A lot of interest from web searching companies recommendation systems link spamming trust propagation HUGE graphs (Millions and Billions of nodes) Outline:  Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting Conclusions Why care about streams?:  Why care about streams? Why care about streams?:  Why care about streams? Sensor devices Temperature, weather measurements Road traffic data Geological observations Patient physiological data sensor-Andrew project Embedded devices Network routers Co-evolving time sequences:  Co-evolving time sequences Joint work with Jimeng Sun (CMU) Dr. Spiros Papadimitriou (CMU/IBM) Dr. Yasushi Sakurai (NTT) Prof. Jeanne VanBriesen (CMU/CEE) Motivation:  Motivation water distribution network normal operation sensors near leak sensors away from leak Motivation:  Motivation water distribution network normal operation major leak chlorine concentrations sensors near leak sensors away from leak Motivation:  Motivation actual measurements (n streams) k hidden variable(s) spot: “hidden (latent) variables” Motivation:  Motivation chlorine concentrations Phase 1 Phase 1 Phase 2 Phase 2 actual measurements (n streams) k hidden variable(s) k = 2 spot: “hidden (latent) variables” Motivation:  Motivation chlorine concentrations Phase 1 Phase 1 Phase 2 Phase 2 Phase 3 Phase 3 actual measurements (n streams) k hidden variable(s) k = 1 spot: “hidden (latent) variables” SPIRIT / InteMon:  SPIRIT / InteMon http://warsteiner.db.cs.cmu.edu/demo/intemon.jsp http://localhost:8080/demo/graphs.jsp self-* storage system (PDL/CMU) 1 PetaByte storage self-monitoring, self-healing: self-* with Jimeng Sun (CMU/CS) Evan Hoke (CMU/CS-ug) Prof. Greg Ganger (CMU/CS/ECE) John Strunk (CMU/ECE) Related project:  Related project Anomaly detection in network traffic (Zhang, Xie) Conclusions:  Conclusions Biological images, graphs & streams pose fascinating problems self-similarity, fractals and power laws work, when other methods fail! Books:  Books Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 (Probably the BEST book on fractals!) Contact info:  Contact info [email protected] www.cs.cmu.edu/~christos Wean Hall 7107 Ph#: x8.1457 and, again WELCOME!

Related presentations


Other presentations created by Savin

Soil Experiment
20. 01. 2008
0 views

Soil Experiment

GFACJan2006
08. 05. 2008
0 views

GFACJan2006

2010 Tourism Organising Plan
02. 05. 2008
0 views

2010 Tourism Organising Plan

bejingppt
02. 05. 2008
0 views

bejingppt

ZZS Motor Scooters
24. 04. 2008
0 views

ZZS Motor Scooters

EFF2007
22. 04. 2008
0 views

EFF2007

Presentation Jim Pateman 2of2
14. 04. 2008
0 views

Presentation Jim Pateman 2of2

normalization
09. 01. 2008
0 views

normalization

ENGLAND
07. 02. 2008
0 views

ENGLAND

Russia Sergei Tikhonov
08. 01. 2008
0 views

Russia Sergei Tikhonov

Aynak MDSG 2006
10. 01. 2008
0 views

Aynak MDSG 2006

Period 6 CHapter 27
10. 01. 2008
0 views

Period 6 CHapter 27

levin
12. 01. 2008
0 views

levin

SomeshwarRao
12. 01. 2008
0 views

SomeshwarRao

Lesson31 Chandelles
13. 01. 2008
0 views

Lesson31 Chandelles

Tourist Attractions in Fermanagh
14. 01. 2008
0 views

Tourist Attractions in Fermanagh

ap
17. 01. 2008
0 views

ap

Byzantine
19. 01. 2008
0 views

Byzantine

JacobsenControlPotato Fusar
22. 01. 2008
0 views

JacobsenControlPotato Fusar

SSA EducationTutorial Jan2006
12. 02. 2008
0 views

SSA EducationTutorial Jan2006

Unicode and Windows XP c
21. 01. 2008
0 views

Unicode and Windows XP c

Khadi Presentation
25. 01. 2008
0 views

Khadi Presentation

weddedbliss wbw 07
28. 01. 2008
0 views

weddedbliss wbw 07

MeteorsMars
24. 01. 2008
0 views

MeteorsMars

Sand Dune Formation
13. 02. 2008
0 views

Sand Dune Formation

Civil War Power point
18. 02. 2008
0 views

Civil War Power point

SeamanDavid
20. 02. 2008
0 views

SeamanDavid

11Incineration
11. 02. 2008
0 views

11Incineration

6 s111 costello s
28. 02. 2008
0 views

6 s111 costello s

Genomics
25. 02. 2008
0 views

Genomics

week12
08. 03. 2008
0 views

week12

CairoIonsphericA
12. 02. 2008
0 views

CairoIonsphericA

Ask Moss Adams Excess Benefits
20. 03. 2008
0 views

Ask Moss Adams Excess Benefits

LT1001N Lecture 5 2006 7
19. 03. 2008
0 views

LT1001N Lecture 5 2006 7

Malaria prophylaxis engl
31. 03. 2008
0 views

Malaria prophylaxis engl

CubaPublicHealth
03. 04. 2008
0 views

CubaPublicHealth

Ghana presentation
03. 04. 2008
0 views

Ghana presentation

MEPAG Astro meyer
24. 01. 2008
0 views

MEPAG Astro meyer

Monica Brand
10. 04. 2008
0 views

Monica Brand

2006 11 29 larcenter karlskoga
06. 02. 2008
0 views

2006 11 29 larcenter karlskoga

OralHealthGrade3
06. 02. 2008
0 views

OralHealthGrade3

problem 050830
07. 02. 2008
0 views

problem 050830

FEC Lynn 5 11 07
23. 01. 2008
0 views

FEC Lynn 5 11 07

CDN Webcast
16. 01. 2008
0 views

CDN Webcast

CEOs for Cities Talk
21. 03. 2008
0 views

CEOs for Cities Talk

homepage
04. 02. 2008
0 views

homepage

SCI1010 C3
22. 02. 2008
0 views

SCI1010 C3

ULF fÃredrag okt2005
08. 02. 2008
0 views

ULF fÃredrag okt2005

customizing
14. 01. 2008
0 views

customizing

CNM GED Orientation EnglishAug07
04. 02. 2008
0 views

CNM GED Orientation EnglishAug07

AV2005BW
21. 02. 2008
0 views

AV2005BW

09 17 2006 ROTS 1
29. 01. 2008
0 views

09 17 2006 ROTS 1

SBII 2u14 3
26. 03. 2008
0 views

SBII 2u14 3

Martyn Johnson
08. 04. 2008
0 views

Martyn Johnson

hayashi
15. 01. 2008
0 views

hayashi

sofie performance 22 nov 2006
17. 01. 2008
0 views

sofie performance 22 nov 2006

assoc2
07. 02. 2008
0 views

assoc2

GravityShape 4
22. 01. 2008
0 views

GravityShape 4

Cultural O Syd Beane
15. 01. 2008
0 views

Cultural O Syd Beane