MAHA Talk

Information about MAHA Talk

Published on October 29, 2007

Author: Peppar

Source: authorstream.com

Content

Data Indexing and Sharing in Large-scale Distributed Systems:  Data Indexing and Sharing in Large-scale Distributed Systems Maha Abdallah LIP6, Paris VI DBMS : A Brief History:  DBMS : A Brief History Application Files DB Server DBMS Connections (ODBC, JDBC) Application Application A Client-Server Architecture Why a DBMS ?:  Why a DBMS ? Simplified Data Management & Access Data Definition Language (DDL) Data Manipulation Language (DML) Specific DBMS Functionalities Query Engine Query Optimizer Transaction Management CREATE TABLE Accounts ( AccountID varchar(30) PRIMARY KEY, CustomerID varchar(30) NOT NULL, Branch varchar(10) NOT NULL, Balance Decimal(10, 2), …) INSERT INTO Accounts VALUES (‘200’, ‘JONES’, ‘New York’, 3500.84) Relational Data Representation:  Relational Data Representation 333.00 New York KHAN 456 500.00 San Francisco ONO 400 340.14 San Francisco GREEN 350 23.17 New York SMITH 345 200.00 San Francisco GRAY 324 1000.00 New York JONES 200 Balance Branch CustomerID AccountID . . . New York San Francisco Primary Index Index Distributed DBMS Design:  Distributed DBMS Design Top-down Approach Global schema definition Data fragmentation Local sites for fragments storage Transparent Distributed DBMS Design:  Distributed DBMS Design Horizontal Fragmentation (Account) branch = ‘New York’ Vertical Fragmentation (Account) accountID, balance Distributed DBMS Design:  Distributed DBMS Design Bottom-up Approaches Integration of preexisting legacy systems Mediated schema approach Others: Ontology-based approach, … Global As View (GAV) Global schema as view over Source ones query(G) = query(f(S1,…, Sn)) Local As View (LAV) Sources as views over global schema query(G) = query(f1-1 (S1),…, fn-1 (Sn)) Integrated Schema User view Integration Mediator OODBMS ORDBMS Heterogeneous Systems RDBMS Adapter Adapter Adapter Distributed DBMSs…:  Distributed DBMSs… Assessment in large-scale environments Central Mediation Schema Centralized / Replicated Global Index Issue : Current DB technologies DO NOT SCALE Can we Bring P2P systems’ scalability to the DB world ? Why P2P ?:  Why P2P ? A distributed system architecture Typically many nodes, unreliable and heterogeneous No centralized control Nodes are symmetric in function Harness distributed, shared resources (bandwidth, CPU, storage) on peer nodes Fault-tolerant, self-organizing Dynamic environment with frequent join and leave P2P Challenge : Locating Content:  P2P Challenge : Locating Content Unstructured Systems Napster : Centralized global index Gnutella : No global index Flooding who has file F ? 1 Node 1 Napster central server Napster Gnutella P2P Challenge : Locating Content:  P2P Challenge : Locating Content Structured Systems Distributed Hash Tables Efficient : hash-based indexing Scalable : distributed global index ... x y z hash table hash bucket insert (key, data) lookup (key) → data 0 1 2 ... node insert (key, data) lookup (key) → data N-1 Hash Tables Associate data with keys Key is hashed to find bucket in hash table DHTs Associate data with keys Key is hashed to find peer node in the system Internet Scale DHTs : A Case Study :  Internet Scale DHTs : A Case Study Content Addressable Networks (CAN) Interface - insert(key,value) - value = retrieve(key) DHTs : Dealing with System Dynamicity:  DHTs : Dealing with System Dynamicity Consistent Hashing Fixed hash space shared by all peer nodes 2-dimensional space 1 By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN : Sharing Hash Space:  CAN : Sharing Hash Space 1 2 By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN : Sharing Hash Space:  CAN : Sharing Hash Space 1 2 3 By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN : Sharing Hash Space:  1 2 3 4 CAN : Sharing Hash Space By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN : Sharing Hash Space:  CAN : Sharing Hash Space By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: Example:  CAN: Example I By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: Example:  CAN: Example node I::insert(K,V) I By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: Example:  (1) a = hx(K) CAN: Example x = a node I::insert(K,V) I By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: Example:  (1) a = hx(K) b = hy(K) CAN: Example x = a y = b node I::insert(K,V) I By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: Example:  (1) a = hx(K) b = hy(K) CAN: Example (2) route(K,V) -> (a,b) node I::insert(K,V) I By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: Example:  CAN: Example (2) route(K,V) -> (a,b) (3) (a,b) stores (K,V) (K,V) node I::insert(K,V) I (1) a = hx(K) b = hy(K) By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: Example:  CAN: Example (2) route “retrieve(K)” to (a,b) (K,V) (1) a = hx(K) b = hy(K) node J::retrieve(K) J By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: routing table:  CAN: routing table A node only maintains state for its immediate neighboring nodes By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: routing:  CAN: routing (a,b) (x,y) By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: node insertion:  CAN: node insertion Bootstrap node Discover some node “I” already in CAN new node By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: node insertion:  CAN: node insertion I new node By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: node insertion:  CAN: node insertion 2) pick random point in space I (p,q) new node By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: node insertion:  CAN: node insertion (p,q) I routes to (p,q), discovers node J I J new node By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks CAN: node insertion:  CAN: node insertion new J Split J’s zone in half… new owns one half By S.Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker ICIR, U.C.Berkeley, Tahoe Networks DHTs…:  DHTs… Assessment in large-scale environments Coarse-grained indexing by document ID File level Content and semantic unawareness Random graph construction No “practical” relation between neighbors, as… - Semantic proximity - Geographical proximity, … Read-only data sharing Problem: do not adapt to complex data sharing fine-grained data semantically rich data Read/Write data sharing Bridging the Gap…:  Bridging the Gap… Can we bring DBMSs’ technologies to the P2P world ? DBMS Technologies P2P Technologies scalable Fine-grained Sharing of Semantically Rich Data Targeted Applications : Scientific data sharing among research communities Fine-Grained Indexing : Basic Idea:  Fine-Grained Indexing : Basic Idea DHT-based approach Indexing granularity Relations, attributes Key-words Content-based look-up & access Strong query language expressiveness Relational algebra operators Relational Data Indexing:  Relational Data Indexing Peers support relational data No central mediated schema Global index distributed over peers Meta-Data for semantic/structural content description Relations set of keywords Attributes set of keywords SELECT treatment FROM Sickness WHERE Name = “Cancer” Typical query Q Slide36:  Relational Data Indexing Slide37:  Relational Data Indexing Relation & Attribute indexing (key, value) hash index, ∀ key ∈ {relation name, relation keywords} value = node [email protected] Sample hash index : Relations entries h(sickness) Peer3 h(Illness) Peer2 Relational Data Indexing:  Relational Data Indexing Sample hash index : Attributes entries h(sickness||name) Peer3 h(sickness||treatment) Peer1 Peer1 issues query Q SELECT treatment SELECT Attribute FROM Sickness FROM Relation WHERE name = “Cancer” WHERE Predicate h(sickness||name) Peer3 & h(Illness||name) Peer2 h(sickness||treatment) Peer1 & h(Illness||treatment) Peer4 Send request Q to Peers 1, 2 & 3 only Slide39:  Relational Data Indexing Assessment Scalability No global mediation schema Attribute-level distributed indexing, look-up and access Highly expressive query languages Current Extensions Store value intervals in index Support for range queries SELECT Attribute1 FROM Relation WHERE Attribute2 BETWEEN 350 AND 500 Still a lot to do… Semantic Clustering & Indexing:  Semantic Clustering & Indexing Meta-Data for Content Description General core meta-data schema Theme-specialized schema Hierarchical semantic representation Semantic/content characterization Documents : keyword vectors Document indexing Index placement Peers : semantic vectors/matrices Overlay construction Semantic node clustering Semantic Groups/Subgroups Slide41:  Semantic Clustering & Indexing Vector construction from meta-data Vector values represent relevance degree for content description Semantic Proximity Node vectors <theme1, theme2, theme3,…> <sub-theme1, sub-theme2, sub-theme3,…>, etc. Semantic Distance d, where d = cosine of angle between vectors Neighbors chosen wrt d Slide42:  K V K V K V K V K V K V K V K V K V K V K V Semantic Clustering & Indexing Document vectors : <keyword1, keyword2, …, keywordN> Document indexing : keywords hashing Slide43:  Semantic Clustering & Indexing Direct Issues Group specification - Completeness - Validity - Evolution Group management Semantic distance between groups Index placement wrt group semantics Other Issues Beyond semantic grouping … Confidence work spheres Multi criteria grouping And still… Data consistency and updates Schema mappings

Related presentations


Other presentations created by Peppar

Financial Statement Analysis
10. 04. 2008
0 views

Financial Statement Analysis

Dr PH Presentation
07. 08. 2007
0 views

Dr PH Presentation

Burj Al Arab
22. 04. 2008
0 views

Burj Al Arab

cd4 hiv dr a singh
17. 04. 2008
0 views

cd4 hiv dr a singh

Philip Scott
17. 04. 2008
0 views

Philip Scott

RMIT 25July01 Pres
14. 04. 2008
0 views

RMIT 25July01 Pres

060125
13. 04. 2008
0 views

060125

Blomqvist
09. 04. 2008
0 views

Blomqvist

Rocks and Weathering
20. 09. 2007
0 views

Rocks and Weathering

Your First RSS Feed
29. 09. 2007
0 views

Your First RSS Feed

Switzerland
15. 10. 2007
0 views

Switzerland

t infantil y legislacion
22. 10. 2007
0 views

t infantil y legislacion

14501
07. 10. 2007
0 views

14501

Stylish Sentences
02. 11. 2007
0 views

Stylish Sentences

15 GardnerHarris
19. 11. 2007
0 views

15 GardnerHarris

cryptorchidism
19. 11. 2007
0 views

cryptorchidism

geometry and art P2
22. 11. 2007
0 views

geometry and art P2

Green Bldgs and WQ
31. 12. 2007
0 views

Green Bldgs and WQ

paper2
03. 01. 2008
0 views

paper2

Earthquakes Chap 5
20. 09. 2007
0 views

Earthquakes Chap 5

Ch08
20. 09. 2007
0 views

Ch08

El Paso Electric lowres pics
07. 08. 2007
0 views

El Paso Electric lowres pics

AG presentation Infrastructure
07. 08. 2007
0 views

AG presentation Infrastructure

GCRA Presentation 2005 1
07. 08. 2007
0 views

GCRA Presentation 2005 1

Asian Alphabet Book 04 17 06
07. 08. 2007
0 views

Asian Alphabet Book 04 17 06

pres maldives
07. 08. 2007
0 views

pres maldives

Tsunami Effects
07. 08. 2007
0 views

Tsunami Effects

Maldives
07. 08. 2007
0 views

Maldives

Global gs pp 0207
22. 10. 2007
0 views

Global gs pp 0207

feist ch14McCrae
06. 08. 2007
0 views

feist ch14McCrae

measurement and geometry
07. 08. 2007
0 views

measurement and geometry

arguments
15. 11. 2007
0 views

arguments

Tidal Energy Overview7
07. 08. 2007
0 views

Tidal Energy Overview7

camoa presse spip 01
07. 08. 2007
0 views

camoa presse spip 01

PE Rocks Igneous
20. 09. 2007
0 views

PE Rocks Igneous

ROCKS and how to identify them
20. 09. 2007
0 views

ROCKS and how to identify them

RIPARWIN Presentation
03. 01. 2008
0 views

RIPARWIN Presentation

ioag9 jaxa briefing
03. 01. 2008
0 views

ioag9 jaxa briefing

social structure
19. 02. 2008
0 views

social structure

investing in the future
04. 03. 2008
0 views

investing in the future

Secondary Math Handout
07. 08. 2007
0 views

Secondary Math Handout

Roundtable
26. 10. 2007
0 views

Roundtable

NicholasEberstadt
15. 10. 2007
0 views

NicholasEberstadt

YTB 052007
12. 03. 2008
0 views

YTB 052007

raicevic
18. 03. 2008
0 views

raicevic

sbp 07
25. 03. 2008
0 views

sbp 07

purchasing sp presentation
20. 09. 2007
0 views

purchasing sp presentation

CSAPA Awareness2005
07. 08. 2007
0 views

CSAPA Awareness2005

FinalReport
22. 10. 2007
0 views

FinalReport

IJCDlineNojiri
09. 10. 2007
0 views

IJCDlineNojiri

DFT KeyChallengesNicNewm an
05. 10. 2007
0 views

DFT KeyChallengesNicNewm an

ILejarraga
07. 08. 2007
0 views

ILejarraga

EP Tecon 0405
19. 06. 2007
0 views

EP Tecon 0405

famigliaim presaanto 3
18. 06. 2007
0 views

famigliaim presaanto 3

Bulldogs Best Books 1211
18. 06. 2007
0 views

Bulldogs Best Books 1211

btw e 008 moriresearch
18. 06. 2007
0 views

btw e 008 moriresearch

BPL Sanyo JV Pressrelease
18. 06. 2007
0 views

BPL Sanyo JV Pressrelease

bite overview
18. 06. 2007
0 views

bite overview

biogas
18. 06. 2007
0 views

biogas

BILBAO
18. 06. 2007
0 views

BILBAO

Benef oport
18. 06. 2007
0 views

Benef oport

formazrete
18. 06. 2007
0 views

formazrete

lect 4 1113 Class Ig Rx1
20. 09. 2007
0 views

lect 4 1113 Class Ig Rx1

ebs2 elearn07
19. 06. 2007
0 views

ebs2 elearn07

boiron
18. 06. 2007
0 views

boiron

fiscal
18. 06. 2007
0 views

fiscal

12 productivity quiz
16. 06. 2007
0 views

12 productivity quiz

10 1
16. 06. 2007
0 views

10 1

215 Pics With Captions
16. 06. 2007
0 views

215 Pics With Captions

2007 Power Stroke
16. 06. 2007
0 views

2007 Power Stroke

20070422
16. 06. 2007
0 views

20070422

20070225
16. 06. 2007
0 views

20070225

2005AM LC 2C
16. 06. 2007
0 views

2005AM LC 2C

2004 4082OPH1 01 Tiefer
16. 06. 2007
0 views

2004 4082OPH1 01 Tiefer

1kanrap
16. 06. 2007
0 views

1kanrap

1kanpres
16. 06. 2007
0 views

1kanpres

1kanintro
16. 06. 2007
0 views

1kanintro

1kancom
16. 06. 2007
0 views

1kancom

19 Flake 071706
16. 06. 2007
0 views

19 Flake 071706

OIF Presentation Final Sept07
05. 01. 2008
0 views

OIF Presentation Final Sept07

12nightacts
16. 06. 2007
0 views

12nightacts

Fri 0830 RegionalAQ liao 1 pc
16. 10. 2007
0 views

Fri 0830 RegionalAQ liao 1 pc

bullismo
18. 06. 2007
0 views

bullismo

ARVs friends or foes McCoy
28. 12. 2007
0 views

ARVs friends or foes McCoy

Rock types silica sat
20. 09. 2007
0 views

Rock types silica sat

Threats to our Water
29. 02. 2008
0 views

Threats to our Water

SIVplanmeet svg3c Shakir
07. 08. 2007
0 views

SIVplanmeet svg3c Shakir

Rajesh Mehta
07. 08. 2007
0 views

Rajesh Mehta

FER 9 und 20
18. 06. 2007
0 views

FER 9 und 20

georgevidor
16. 11. 2007
0 views

georgevidor

beo presentation
18. 06. 2007
0 views

beo presentation

EVACAR Otesis
18. 06. 2007
0 views

EVACAR Otesis

Calvin Dude
07. 08. 2007
0 views

Calvin Dude

PRACTICAL4
29. 12. 2007
0 views

PRACTICAL4

cuamcfarland
20. 09. 2007
0 views

cuamcfarland

UHART
24. 11. 2007
0 views

UHART

News CIRIA document
01. 01. 2008
0 views

News CIRIA document

EverythingDigital
07. 08. 2007
0 views

EverythingDigital

ntra Master B
23. 10. 2007
0 views

ntra Master B

FSGDevCon20060419
15. 11. 2007
0 views

FSGDevCon20060419

Context06
15. 10. 2007
0 views

Context06

4 brittle I
20. 09. 2007
0 views

4 brittle I

WBTi
07. 08. 2007
0 views

WBTi

BTSA Technology Bytes
18. 06. 2007
0 views

BTSA Technology Bytes

FCP05 PD Flaum
06. 08. 2007
0 views

FCP05 PD Flaum

14 20H 10 1103 Dr SEARO
07. 08. 2007
0 views

14 20H 10 1103 Dr SEARO

echt Zeit
19. 06. 2007
0 views

echt Zeit

dortmund
19. 06. 2007
0 views

dortmund