sudeshna

Published on December 6, 2007

Author: Joshua

Source: authorstream.com

Content

DB2 Net Search Extender
Presenter: Sudeshna Banerji (CIS 595: Bioinformatics)

Slide2: Topics to discuss
- Information retrieval
- Text-indexing
- DB2 Text Extenders
- DB2 Net Search Extender
- References
- Questions

A Little Background…: Information Retrieval (IR)
- Extraction of “relevant” information from huge volumes of data scattered across different databases.
- Examples: textual search, image search, video search, etc.
- The efficiency (time and speed) of IR depends on the INDEXING technology used. Indexing improves system performance.
- An example of an indexing technology: text-indexing, used for textual search.

A Little Background…: Text-Indexing
- The process of deciding what will be used to represent a given document.
- A text index consists of significant terms extracted from the text documents, each term stored together with information about the document that contains it.
- A search is then handled as a query that looks up the index.

A Little Background…: Text-Indexing (continued)
Text-indexing involves the following:
- Parsing the documents to recognize their structure, e.g. title, date, and other fields.
- Scanning for word tokens, handling numbers, special characters, hyphenation, capitalization, etc.
- Stopword removal, based on a short list of common words such as “the”, “and”, “or”.

Slide6: Indexing only significant terms

DB2 Extenders
- Products in the IBM DB2 family that provide support for data beyond the traditional character and numeric data types.
- Extenders are available for images, voice, video, complex documents (full-text search), spatial objects, etc.
- Trial and beta versions are available for testing.
- Link for extenders: http://www-3.ibm.com/software/data/db2/extenders/index.html

DB2 Text Extenders
To meet the increasing demands of content management, IBM has introduced three full-text retrieval applications for DB2 Universal Database (DB2 UDB):
- DB2 Net Search Extender
- DB2 Text Information Extender
- DB2 Text Extender
When to use what? Link for a comparison of the above: http://www-3.ibm.com/software/data/db2/extenders/fulltextcomparison.html

DB2 Net Search Extender: Some important features
Replaces DB2 Text Information Extender Version 7.2.
- Indexing speed of about 1 GB per hour.
- Different text formats: ASCII plain text, HTML, XML, GPP.
- Base support for 37 languages, including English, Spanish, French, Japanese and Chinese.
- Sub-second search response times.
- No decrease in search performance with up to 1000 concurrent queries per second.

DB2 Net Search Extender: Some text-search capabilities
- Searches can be performed using SQL (a fourth-generation language, almost like an English query).
- Searches can include:
  - Boolean operations.
  - Proximity search for words in the same sentence or paragraph (for HTML, XML and GPP).
  - “Fuzzy” search for words spelled similarly to the search term, e.g. Andrew and Andru.
  - Thesaurus-related search.
  - Restricting the search to sections within documents.
- The user can limit the search results with a “hit count” and can also specify how the results are to be sorted.

DB2 Net Search Extender: System requirements
- DB2 Version 8.1
- Java Runtime Environment (JRE) Version 1.3.1
Windows installation:
- Administrative rights are required.
- Call db2text start to start the DB2 Net Search Extender Instance Services.

DB2 Net Search Extender: Simple example with the SQL queries
The following steps are required to do a basic textual search in DB2 Net Search Extender:
1. Creating a database
2. Enabling a database for text search
3. Creating a table
4. Creating a full-text index
5. Loading sample data
6. Synchronizing the text index
7. Searching with the text index

DB2 Net Search Extender: Steps 1 and 2
1. Creating a database:
   db2 "create database sample"
2. Enabling a database for text search:
   To start the Net Search Extender service:
   db2text "START"
   To prepare the database for use with DB2 Net Search Extender:
   db2text "ENABLE DATABASE FOR TEXT CONNECT TO sample"

DB2 Net Search Extender: Steps 3 and 4
3. Creating a table:
   db2 "CREATE TABLE books (isbn VARCHAR(18) not null PRIMARY KEY, author VARCHAR(30), story LONG VARCHAR, year INTEGER)"
4. Creating a full-text index:
   db2text "CREATE INDEX db2ext.myTextIndex FOR TEXT ON books (story) CONNECT TO sample"

DB2 Net Search Extender: Steps 5 and 6
5. Loading sample data:
   db2 "INSERT INTO books VALUES ('0-13-086755-1', 'John', 'A man was running down the street.', 2001)"
   db2 "INSERT INTO books VALUES ('0-13-086755-2', 'Mike', 'The cat hunts some mice.', 2000)"
6. Synchronizing the text index:
   db2text "UPDATE INDEX db2ext.myTextIndex FOR TEXT CONNECT TO sample"

DB2 Net Search Extender: Step 7
7. Searching with the text index, using the CONTAINS scalar search function:
   db2 "SELECT author, story FROM books WHERE CONTAINS (story, '\"cat\"') = 1 AND year >= 2000"
The following result table is returned:
   AUTHOR  STORY
   Mike    The cat hunts some mice.
NOTE: To create a text index, the text column must have one of the following data types: CHAR, VARCHAR, LONG VARCHAR, CLOB.

DB2 Net Search Extender: Thesaurus support
A thesaurus is structured like a network of nodes linked together by relations:
- Associative relations: RELATED_TO
- Synonym relations: SYNONYM_OF
- Hierarchical relations: LOWER_THAN, HIGHER_THAN
Creating and compiling a thesaurus:
1. Create a thesaurus definition file (explained below).
2. Compile the definition file into a thesaurus dictionary using the DB2EXTTH utility.

DB2 Net Search Extender: Creating a thesaurus definition file
Define the content of the thesaurus in a definition file using a text editor. Example of some definition groups:
   :WORDS
     football
     .RELATED_TO goal
     .SYNONYM_OF soccer
   :WORDS
     chapel
     .LOWER_THAN skyscraper
     .HIGHER_THAN house

DB2 Net Search Extender: An example of the structure of a thesaurus
[Diagram: a thesaurus network with the nodes Game, Ball Game, Tennis, Soccer and Football, linked by HIGHER_THAN relations and one SYNONYM_OF relation.]

DB2 Net Search Extender: References
- http://www-3.ibm.com/cgibin/db2www/data/db2/udb/winos2unix/support/document.d2w/report?fn=desu9m03.htm#ToC
- Information retrieval site containing good lecture slides: http://ciir.cs.umass.edu/cmpsci646/
- Net Search Extender Administration and User’s Guide, Version 8.1 (can be downloaded with the software)

Slide21: Any questions?
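
Supplementary sketch: Text-indexing in miniature
The text-indexing slides describe extracting significant terms, dropping stopwords, and answering a search as an index lookup. The Python sketch below illustrates that idea on the two sample rows from the books table. It is a toy illustration only and says nothing about how DB2 Net Search Extender builds its indexes internally. The names STOPWORDS, tokenize, build_index and search are made up for this example, and the stopword list is an assumption that extends the three words named on the slide.

```python
import re
from collections import defaultdict

# Assumed stopword list; the slides only name "the", "and", "or".
STOPWORDS = {"the", "and", "or", "a", "was", "some"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each significant term to the set of document keys that contain it."""
    index = defaultdict(set)
    for key, text in documents.items():
        for term in tokenize(text):
            if term not in STOPWORDS:
                index[term].add(key)
    return index


def search(index, term):
    """Answer a one-word query by looking the term up in the index."""
    return sorted(index.get(term.lower(), set()))


# The two sample rows from the slides, keyed by ISBN.
books = {
    "0-13-086755-1": "A man was running down the street.",
    "0-13-086755-2": "The cat hunts some mice.",
}

index = build_index(books)
print(search(index, "cat"))  # ['0-13-086755-2']
print(search(index, "the"))  # [] because stopwords are never indexed
```

The lookup for "cat" returns the key of Mike's row, which loosely mirrors the CONTAINS query in step 7. A real full-text engine adds much more on top of this, such as language-specific processing, proximity information and fuzzy matching.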
