CS20L Lecture Set 14 Hashing

Information about CS20L Lecture Set 14 Hashing

Published on January 17, 2008

Author: Raffaele

Source: authorstream.com

Content

Hashing:  Hashing The process of mapping a key value to a position in a table. A hash function maps key values to positions. It is denoted by h. A hash table is an array that holds the records. It is denoted by T. The hash table has M slots, indexed from 0 to M - 1 Hashing:  Hashing For any value K in the key range and some hash function h, h(K) =i; 0  i < M, such that key(T[i]) = K. Load factor The ratio of the number of items in the table to the table size Load factor = Number of items / Table size Hash Tables:  Hash Tables Direct Address Tables If we have a collection of n elements whose keys are unique integers in (1 ... m), where m >= n, then we can store the items in a direct address table, T[m], where Ti is either empty or contains one of the elements of our collection. Hash Tables:  Hash Tables Hash Tables:  Hash Tables Direct Address Tables Searching a direct address table operation for a key, k, we access Tk, if it contains an element, return it, if it doesn't then return a NULL. There are two constraints here: the keys must be unique, and the range of the key must be severely bounded. Hash Tables:  Hash Tables Direct Address Tables If the keys are not unique, then we can simply construct a set of m lists and store the heads of these lists in the direct address table. Hash Tables:  Hash Tables Direct Address Tables The range of the key determines the size of the direct address table and may be too large to be practical. Hash Tables:  Hash Tables Direct Address Tables Direct addressing is easily generalized to the case where there is a function, h(k) => (1,m) which maps each value of the key, k, to the range (1,m). In this case, we place the element in T[h(k)] rather than T[k]. Hash Tables:  Hash Tables Mapping functions The direct address approach requires that the function, h(k), is a one-to-one mapping from each k to integers in (1,m). Such a function is known as a perfect hashing function. The PHF maps each key to a distinct integer within some manageable range. Hash Tables:  Hash Tables Mapping functions Finding a perfect hashing function is not always possible. More realistically, a hash function, h(k), maps most of the keys onto unique integers, but maps some other keys on to the same integer. Hash Tables:  Hash Tables Mapping functions This clashing is called Collision. If the number of collisions is sufficiently small, then the hash tables work quite well. Hash Tables:  Hash Tables Collision Resolution Chaining Overflow areas Re-hashing Using neighbouring slots (linear probing) Quadratic probing Random probing. Collision Resolution:  Collision Resolution Chaining Chain all collisions in lists attached to the appropriate slot. This allows an unlimited number of collisions to be handled and doesn't require a priori knowledge of how many elements are contained in the collection. The tradeoff is the same as with linked lists versus array implementations of collections: linked list overhead in space and, to a lesser extent, in time. Collision Resolution:  Collision Resolution 2. Overflow areas Divide the pre-allocated table into two sections: the primary area to which keys are mapped an area for collisions, normally termed the overflow area. Collision Resolution:  Collision Resolution 2. Overflow areas When a collision occurs, a slot in the overflow area is used for the new element and a link from the primary slot established as in a chained system. Collision Resolution:  Collision Resolution Re-hashing Use a second hashing operation when there is a collision. If there is a further collision, re-hash until an empty “slot” in the table is found. The re-hashing function can either be a new function or a re-application of the original one. As long as the functions are applied to a key in the same order, then a sought key can always be located. Collision Resolution:  Collision Resolution Re-hashing Collision Resolution:  Collision Resolution 4. Using neighbouring slots - linear probing One of the simplest re-hashing functions is +1 (or -1) i.e. on a collision, look in the neighbouring slot in the table. This calculates the new address extremely quickly.. Collision Resolution:  Collision Resolution 4. Using neighbouring slots - linear probing Linear probing is subject to a clustering phenomenon. Re-hashes from one location occupy a block of slots in the table which “grows” towards slots to which other keys hash. This exacerbates the collision problem and the number of re-hashed can become large. Collision Resolution:  Collision Resolution Quadratic probing Better behaviour is usually obtained with quadratic probing, where the secondary hash function depends on the re-hash index: address = h(key) + c i2 on the tth re-hash. (A more complex function of i may also be used.) Collision Resolution:  Collision Resolution Quadratic probing Since keys which are mapped to the same value by the primary hash function follow the same sequence of addresses, quadratic probing shows secondary clustering. However, secondary clustering is not nearly as severe as the clustering shown by linear probes. Collision Resolution:  Collision Resolution Quadratic probing Re-hashing schemes use the originally allocated table space and thus avoid linked list overhead, but require advance knowledge of the number of items to be stored. However, the collision elements are stored in slots to which other key values map directly, thus the potential for multiple collisions increases as the table becomes full. Collision Resolution:  Collision Resolution 6. (Pseudo) Random probing Collision Resolution:  Collision Resolution Example: Using the hash function: h (k) = 2k mod 10 Insert the numbers: 2, 20, 7, 15, 3, 0, 4, 14 into a one-dimensional array of 10 elements using linear probing to resolve collisions. Collision Resolution:  Collision Resolution h (k) = 2k mod 10 2, 20, 7, 15, 3, 0, 4, 14

Related presentations


Other presentations created by Raffaele

Biomes freshwater
08. 01. 2008
0 views

Biomes freshwater

phase transformation
09. 01. 2008
0 views

phase transformation

SummerSafety
10. 01. 2008
0 views

SummerSafety

matt nauth
10. 01. 2008
0 views

matt nauth

PhotoReactionIronOxa late
11. 01. 2008
0 views

PhotoReactionIronOxa late

APA 2004
13. 01. 2008
0 views

APA 2004

guineapigs
13. 01. 2008
0 views

guineapigs

Lecture8 HyperTexture
14. 01. 2008
0 views

Lecture8 HyperTexture

Heuristicsstudent
15. 01. 2008
0 views

Heuristicsstudent

Laughing Matter1
17. 01. 2008
0 views

Laughing Matter1

pictorial dictionary
17. 01. 2008
0 views

pictorial dictionary

ETM Solar
21. 01. 2008
0 views

ETM Solar

51english
22. 01. 2008
0 views

51english

Lambda RED
22. 01. 2008
0 views

Lambda RED

NeillReid012005
25. 01. 2008
0 views

NeillReid012005

Rhyne
30. 01. 2008
0 views

Rhyne

stmgtsupv
07. 02. 2008
0 views

stmgtsupv

reel theory
12. 02. 2008
0 views

reel theory

urbinodek2
25. 02. 2008
0 views

urbinodek2

um rule transitional services
03. 03. 2008
0 views

um rule transitional services

Educ in SA PPt
11. 01. 2008
0 views

Educ in SA PPt

Melnik ASSE 2005 Powerpoint
07. 03. 2008
0 views

Melnik ASSE 2005 Powerpoint

Multi Site Presentation
10. 03. 2008
0 views

Multi Site Presentation

wk11 Antennas
24. 03. 2008
0 views

wk11 Antennas

fifty
28. 01. 2008
0 views

fifty

AGellmann BOPL 060704
08. 04. 2008
0 views

AGellmann BOPL 060704

EURPRE FROMTHEPOUNDTOTHEEURO
15. 04. 2008
0 views

EURPRE FROMTHEPOUNDTOTHEEURO

TomSchaller
09. 01. 2008
0 views

TomSchaller

WholeBodyVibration2
17. 04. 2008
0 views

WholeBodyVibration2

History of Swimming
23. 04. 2008
0 views

History of Swimming

Dr Kathie Cooper
24. 04. 2008
0 views

Dr Kathie Cooper

Dirk
07. 05. 2008
0 views

Dirk

Ch 03
02. 05. 2008
0 views

Ch 03

fourthgrade
22. 01. 2008
0 views

fourthgrade

SWP02
19. 01. 2008
0 views

SWP02

NextGenVisAids
04. 02. 2008
0 views

NextGenVisAids

Emma Bartle
11. 02. 2008
0 views

Emma Bartle

Eurodad INFID presentation 2005
14. 01. 2008
0 views

Eurodad INFID presentation 2005