1015 1

Information about 1015 1

Published on November 19, 2007

Author: Arkwright26

Source: authorstream.com

Content

Authoritative Sources in a Hyperlinked Environment:  Authoritative Sources in a Hyperlinked Environment Hui Han CSE dept, PSU 10/15/01 Main Idea:  Main Idea Return more authoritative information, using the algorithm based on link structure analysis, esp. the relationship between a set of relevant authoritative pages and the set of “hub” pages that join them together in the link structure The types of queries:  The types of queries Specific queries – scarcity problem “Does Netscape support the JDK 1.1 code-signing API” Broad-topic queries – abundance problem “Find information about the Java programming language.” Solution: find a small set of the most “authoritative” relevant pages. Similar-page queries “Find pages ‘similar’ to java.sun.com Linked Based Analysis:  Linked Based Analysis Limitations of text based analysis Text-based ranking function Eg. Could www.harvard.edu be recognized as one of the most authoritative pages, since many other webpages contain “harvard” more often. Pages are not sufficiently self – descriptive Usually the term “search engine” doesn’t appear on search engine web pages Limitations of Basic Link-based Analysis:  Limitations of Basic Link-based Analysis Basic model: Of all pages containing the query string, return those with the greatest number of in-links Large number of links are created primarily for navigational purposes Difficulty in finding a balance between relevance and popularity Popular sites like www.yahoo.com would be considered as highly authoritative on any query strings it contained. P Q Conferred Authority Hub-Authority method:  Hub-Authority method A link-based model for the conferral of authority, using the method that consistently identifies relevant, authoritative WWW pages for broad search topics. Difference from Clustering Distinguishing pages related to different meanings or senses of a query term Step1: Constructing a Focused Subgraph of the WWW:  Step1: Constructing a Focused Subgraph of the WWW Properties of a ideal collection S of pages S is relatively small S is rich in relevant pages S contains most(or many) of the strongest authorities Construction of S Collect the t highest-ranked pages for the query  from a text-based search engine (Altavista, hotbot) as root set R Expand R to base set S a strong authority is quite likely to be pointed to by at least one page in R Keep only transverse links Step2: Compute Hubs and Authorities:  Step2: Compute Hubs and Authorities Authoritative pages should have large in-degree have a considerable overlap in the sets of pages that point to them, since they share a common topic “Hub” pages “pull together”(link to) authorities on a common topic An Iterative Algorithm:  An Iterative Algorithm G : the subgraph induced on the pages in S x(p) : authority weight of Page P y(p): hub weight of Page P Normalize: pS (x(p))2 = 1 pS (y(p))2 = 1 I operation: x(p) = q : (q,p)  Ey(p) O operation: y(p) = q : (q,p)  Ex(p) Hub and Authority exhibit “mutually reinforcing relationship” An Iterative Algorithm(cont.):  An Iterative Algorithm(cont.) {x(p) } is a vector x with authority weight for each page in G {y(p) } is a vector y with hub weight for each page in G Filter out the top c authorities and top c hubs Basic knowledge of Matrix:  Basic knowledge of Matrix M: symmetric n*n matrix  :vector : a number If for some vector , M  = , we say, The set of all such  is a subspace of Rn Eigenspace associated with ; These 1(M), 2(M), … are eigenvalues, while 1(M), 2(M), … are eigenvectors i(M) belongs to the subspace of i(M) If we assume |1(M) > 2(M)|, we refer to 1(M) as the principal eigenvector, and all other i(M) as non-principal eigenvector. Convergence Proof of Iterate Procedure:  Convergence Proof of Iterate Procedure Theorem1. The sequences x1, x2, x3, … and y1, y2, y3, … converge to x* and y* respectively. Proof: G=(V,E); V={p1, p2, …, pn}; A is the adjacency matrix of graph G; Aij = 1 if (pi, pj) is an edge of G. I & O operations can be written as: x  ATy y  Ax K loops, So, x (1) AT Ax (0); x(0) = AT z x*  … x (k) (AT A)k-1 AT z y*  … y (k) (AAT)k z “if  is a vector not orthogonal to the principle eigenvector 1(M), the unit vector in the direction of Mk converges to 1(M) as k increases without bound” Convergence Proof of Iterate Procedure(cont.):  Convergence Proof of Iterate Procedure(cont.) A is called an orthogonal matrix if AAT = AT A = E. Theorem2: x* is the principal eigenvector of ATA, and y* is the principal eigenvector of AAT. Experiment finds that k=20 is sufficient for the convergence of vectors. Experimental results:  Experimental results Most of these authorities do not contain any occurrences of the initial query string , except http://www.roadhead.com Similar-Page Queries:  Similar-Page Queries The strongest authorities in the local region of the link structure near p, can server as a broad-topic summary of the pages related to p. A small difference in initial request to the search engine: “Find t pages pointing to p”, to get R Multiple sets of Hubs and Authorities:  Multiple sets of Hubs and Authorities Why? The query string  may have several very different meanings. Eg.“java” The string may arise as a term in the context of multiple technical communities. Eg. “randomized algorithms” The string may refer to a highly polarized issue, involving groups that are not likely to link to one another. Eg. “abortion” Idea: The NON-principle eigenvectors of ATA and AAT provide us with a natural way to extract additional densely linked collections of hubs and authorities from the base set S. Multiple sets of Hubs and Authorities Experimental result1:  Multiple sets of Hubs and Authorities Experimental result1 For the query “jaguar*”, the strongest collections of authoritative sources concerned the Atari Jaguar product, the NFL football team from Jacksonville, and the automobile. Multiple sets of Hubs and Authorities Experimental result2:  Multiple sets of Hubs and Authorities Experimental result2 For the query “randomized algorithms”, none of the strongest collections of hubs and authorities are precisely on the query topic. They include home pages of theoretical computer scientists, compendia of mathesmatical software and pages on wavelets. Multiple sets of Hubs and Authorities Experimental result3:  Multiple sets of Hubs and Authorities Experimental result3 For the query “abortion”, there is a natural separation. Conclusion:  Conclusion A technique for locating high-quality information related to a broad search topic on the www, based on a structural analysis of the link topology surrounding “authoritative” pages on the topic. Related work. Standing, influence in social networks, scientific citations, etc. Hypertext and WWW rankings …

Related presentations


Other presentations created by Arkwright26

transportation
07. 11. 2007
0 views

transportation

wonderful world
19. 06. 2007
0 views

wonderful world

2006911155950435
28. 04. 2008
0 views

2006911155950435

dietrich
17. 04. 2008
0 views

dietrich

ME Individual DM JG 2006
16. 04. 2008
0 views

ME Individual DM JG 2006

H106n
14. 04. 2008
0 views

H106n

DM GlobalFDI Movements240306
13. 04. 2008
0 views

DM GlobalFDI Movements240306

may30
10. 04. 2008
0 views

may30

Ulad using crop residues
09. 04. 2008
0 views

Ulad using crop residues

coral reef and climate change
07. 04. 2008
0 views

coral reef and climate change

Ian Brinkley DtF 07 06
30. 03. 2008
0 views

Ian Brinkley DtF 07 06

Temperature
14. 02. 2008
0 views

Temperature

AP Review 1400 1800
20. 02. 2008
0 views

AP Review 1400 1800

New Sony
03. 10. 2007
0 views

New Sony

Literary Vocabulary Rhyme
10. 10. 2007
0 views

Literary Vocabulary Rhyme

Chapter1McMurry
13. 10. 2007
0 views

Chapter1McMurry

FinanceTransition
16. 10. 2007
0 views

FinanceTransition

rexcor baker
15. 10. 2007
0 views

rexcor baker

kakande
28. 11. 2007
0 views

kakande

bernsteintwo
16. 10. 2007
0 views

bernsteintwo

What Is Internal Control
29. 10. 2007
0 views

What Is Internal Control

11 40 063
07. 11. 2007
0 views

11 40 063

Il Nazismo
14. 11. 2007
0 views

Il Nazismo

kryukov 20041004
12. 10. 2007
0 views

kryukov 20041004

AI 120 Examples
17. 10. 2007
0 views

AI 120 Examples

galaxy physics
01. 12. 2007
0 views

galaxy physics

Qualitative tools
29. 11. 2007
0 views

Qualitative tools

pannebecker
03. 01. 2008
0 views

pannebecker

infectious
05. 01. 2008
0 views

infectious

b e flows
07. 01. 2008
0 views

b e flows

CSI pres
07. 10. 2007
0 views

CSI pres

1A Quality of our Water
02. 01. 2008
0 views

1A Quality of our Water

OPVII AldusEquity
01. 10. 2007
0 views

OPVII AldusEquity

wheat 1
04. 10. 2007
0 views

wheat 1

Lexical Semantics II
21. 11. 2007
0 views

Lexical Semantics II

Forklift Standard 12 14 99
27. 02. 2008
0 views

Forklift Standard 12 14 99

manuel scott powerpoint
25. 03. 2008
0 views

manuel scott powerpoint

subspace
19. 06. 2007
0 views

subspace

skos ecoterm 2006
19. 06. 2007
0 views

skos ecoterm 2006

services
19. 06. 2007
0 views

services

Working with Automatic PGA
19. 06. 2007
0 views

Working with Automatic PGA

wider context
19. 06. 2007
0 views

wider context

weinberg wfi
19. 06. 2007
0 views

weinberg wfi

VS Mod Presentation
19. 06. 2007
0 views

VS Mod Presentation

Unicode from a distance
19. 06. 2007
0 views

Unicode from a distance

Unicode AndIndia
19. 06. 2007
0 views

Unicode AndIndia

tunable abw
19. 06. 2007
0 views

tunable abw

Tsunefum Mizuno sep14 05
19. 06. 2007
0 views

Tsunefum Mizuno sep14 05

tlstut
19. 06. 2007
0 views

tlstut

synergy redesign demo
19. 06. 2007
0 views

synergy redesign demo

acadien
19. 06. 2007
0 views

acadien

y report
19. 06. 2007
0 views

y report

Tom Worthington
19. 06. 2007
0 views

Tom Worthington

Millennials
14. 07. 2007
0 views

Millennials

vienna a6
19. 06. 2007
0 views

vienna a6

unit armorer sustainment
28. 02. 2008
0 views

unit armorer sustainment

SCI1010 C2
13. 11. 2007
0 views

SCI1010 C2

Slides 2006 fin year web3
19. 06. 2007
0 views

Slides 2006 fin year web3

MNEaula07
28. 12. 2007
0 views

MNEaula07

OUR SCAVENGER HUNT edited
16. 11. 2007
0 views

OUR SCAVENGER HUNT edited

ImplicationsResearch
03. 01. 2008
0 views

ImplicationsResearch

Jonh Roberts
31. 07. 2007
0 views

Jonh Roberts

wstechnology
19. 06. 2007
0 views

wstechnology

DiapoAnglaisdÃf
23. 10. 2007
0 views

DiapoAnglaisdÃf

QM chip
15. 10. 2007
0 views

QM chip

zend talk
19. 06. 2007
0 views

zend talk

seminarpresent
24. 02. 2008
0 views

seminarpresent

High School Counsellor Session
23. 11. 2007
0 views

High School Counsellor Session

xml cop feb05
19. 06. 2007
0 views

xml cop feb05

Value of Org RWG
19. 06. 2007
0 views

Value of Org RWG

Promo wkshp Downes
13. 03. 2008
0 views

Promo wkshp Downes

yw
17. 10. 2007
0 views

yw

WP1b
15. 10. 2007
0 views

WP1b

Regency Traffic111508 3 1
11. 03. 2008
0 views

Regency Traffic111508 3 1

Superstar
19. 06. 2007
0 views

Superstar

BUS 400
05. 10. 2007
0 views

BUS 400

950321
11. 10. 2007
0 views

950321

perry presentation
04. 03. 2008
0 views

perry presentation

vergados 1
20. 11. 2007
0 views

vergados 1

vlad
19. 06. 2007
0 views

vlad

Darstellung des HH AZM
15. 11. 2007
0 views

Darstellung des HH AZM