Published on December 6, 2007
DB2 Net Search Extender
Presenter: Sudeshna Banerji (CIS 595: Bioinformatics)

Slide2: Topics to discuss
- Information retrieval
- Text-indexing
- DB2 Text Extenders
- DB2 Net Search Extender
- References
- Questions

A Little Background…
Information Retrieval (IR):
- Extraction of "relevant" information from huge volumes of data scattered across different databases.
- Examples: textual search, image search, video search, etc.
- The efficiency (time and speed) of IR depends on the INDEXING technology used.
- Indexing increases the performance of the system.
- An example of an indexing technology: text-indexing, used for textual search.

A Little Background…
Text-Indexing:
- The process of deciding what will be used to represent a given document.
- A text index consists of significant terms extracted from the text documents, each term stored together with information about the document that contains it.
- A search is then handled as a query that looks up the index.

A Little Background…
Text-Indexing (continued). Involves the following:
- Parsing the documents to recognize their structure, e.g. title, date, other fields.
- Scanning for word tokens: numbers, special characters, hyphenation, capitalization, etc.
- Stopword removal, based on a short list of common words like "the", "and", "or".

Slide6: Indexing only Significant Terms

DB2 Extenders
- Products of the IBM family that provide support for data beyond the traditional character and numeric data types.
- Extenders are available for images, voice, video, complex documents (full-text search), spatial objects, etc.
- Trial and beta versions are available for testing.
- Link for extenders: http://www-3.ibm.com/software/data/db2/extenders/index.html

DB2 Text Extenders
To meet the increasing demands of content management, IBM has introduced three full-text retrieval applications for DB2 Universal Database (DB2 UDB):
- DB2 Net Search Extender
- DB2 Text Information Extender
- DB2 Text Extender
When to use what? Link for a comparison of the above: http://www-3.ibm.com/software/data/db2/extenders/fulltextcomparison.html

DB2 Net Search Extender
Replaces DB2 Text Information Extender Version 7.2.
Some important features:
- Indexing speed of about 1 GB per hour.
- Different text formats: ASCII plain text, HTML, XML, GPP.
- Base support for 37 languages, including English, Spanish, French, Japanese and Chinese.
- Sub-second search response times.
- No decrease in search performance with up to 1000 concurrent queries per second.

DB2 Net Search Extender
Some text-search capabilities:
- Searches can be performed using SQL (a fourth-generation language, almost like an English query).
- Searches can include:
  - Boolean operations.
  - Proximity search for words in the same sentence or paragraph (for HTML, XML and GPP).
  - "Fuzzy" searches for words with a spelling similar to the search term, e.g. Andrew and Andru.
  - Thesaurus-related search.
  - Restricting the search to sections within documents.
- The user can limit the search results with a "hit count" and can also specify how the results are to be sorted.

DB2 Net Search Extender
System requirements:
- DB2 Version 8.1
- Java Runtime Environment (JRE) Version 1.3.1
Windows installation:
- Administrative rights are required.
- Call db2text start to start the DB2 Net Search Extender Instance Services.

DB2 Net Search Extender
Simple example with SQL queries. The following steps are required to do a basic textual search in DB2 Net Search Extender:
1. Creating a database
2. Enabling a database for text search
3. Creating a table
4. Creating a full-text index
5. Loading sample data
6. Synchronizing the text index
7. Searching with the text index

DB2 Net Search Extender
1. Creating a database:
    db2 "create database sample"
2. Enabling a database for text search:
   To start the Net Search Extender service:
    db2text "START"
   To prepare the database for use with DB2 Net Search Extender:
    db2text "ENABLE DATABASE FOR TEXT CONNECT TO sample"

DB2 Net Search Extender
3. Creating a table:
    db2 "CREATE TABLE books (isbn VARCHAR(18) not null PRIMARY KEY, author VARCHAR(30), story LONG VARCHAR, year INTEGER)"
4. Creating a full-text index:
    db2text "CREATE INDEX db2ext.myTextIndex FOR TEXT ON books (story) CONNECT TO sample"

DB2 Net Search Extender
5. Loading sample data:
    db2 "INSERT INTO books VALUES ('0-13-086755-1', 'John', 'A man was running down the street.', 2001)"
    db2 "INSERT INTO books VALUES ('0-13-086755-2', 'Mike', 'The cat hunts some mice.', 2000)"
6. Synchronizing the text index:
    db2text "UPDATE INDEX db2ext.myTextIndex FOR TEXT CONNECT TO sample"

DB2 Net Search Extender
7. Searching with the text index, using the CONTAINS scalar search function:
    db2 "SELECT author, story FROM books WHERE CONTAINS (story, '\"cat\"') = 1 AND year >= 2000"
   The following result table is returned:
    AUTHOR  STORY
    Mike    The cat hunts some mice.
NOTE: To create a text index, the text column must be of one of the following data types: CHAR, VARCHAR, LONG VARCHAR, CLOB.

DB2 Net Search Extender
Thesaurus support:
A thesaurus is structured like a network of nodes linked together by relations:
- Associative relations: RELATED_TO
- Synonym relations: SYNONYM_OF
- Hierarchical relations: LOWER_THAN, HIGHER_THAN
Creating and compiling a thesaurus:
1. Create a thesaurus definition file (explained below).
2. Compile the definition file into a thesaurus dictionary using the DB2EXTTH utility.

DB2 Net Search Extender
Create a thesaurus definition file.
Define its content in a definition file using a text editor.
Example of some definition groups:
    :WORDS
    football
    .RELATED_TO goal
    .SYNONYM_OF soccer

    :WORDS
    chapel
    .LOWER_THAN skyscraper
    .HIGHER_THAN house

DB2 Net Search Extender
An example of the structure of a thesaurus:
[Figure: a thesaurus network with the nodes Game, Ball Game, Tennis, Soccer and Football, linked by HIGHER_THAN and SYNONYM_OF relations.]

DB2 Net Search Extender
References:
- http://www-3.ibm.com/cgibin/db2www/data/db2/udb/winos2unix/support/document.d2w/report?fn=desu9m03.htm#ToC
- Information retrieval site containing good lecture slides: http://ciir.cs.umass.edu/cmpsci646/
- Net Search Extender Administration and User's Guide, Version 8.1 (can be downloaded with the software)

Slide21: ANY QUESTIONS?
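
Text-indexing sketch
The text-indexing steps from the background slides (scanning for word tokens, stopword removal, and handling a search as an index lookup) can be sketched in a few lines of Python. This is only an illustration of the general idea, not how DB2 Net Search Extender builds its index. The stopword list and the function names `tokenize`, `build_index` and `search` are assumptions made for this example; the two documents are the sample rows from the books table.

```python
import re
from collections import defaultdict

# Hypothetical short stopword list, in the spirit of "the", "and", "or".
STOPWORDS = {"the", "and", "or", "a", "was", "some"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each significant term to the set of document keys containing it."""
    index = defaultdict(set)
    for key, text in documents.items():
        for term in tokenize(text):
            if term not in STOPWORDS:  # stopword removal
                index[term].add(key)
    return index


def search(index, term):
    """Handle a search as a lookup in the index; returns sorted document keys."""
    return sorted(index.get(term.lower(), set()))


# The two sample rows from the books table, keyed by ISBN.
books = {
    "0-13-086755-1": "A man was running down the street.",
    "0-13-086755-2": "The cat hunts some mice.",
}

index = build_index(books)
print(search(index, "cat"))  # prints ['0-13-086755-2']
```

Because only significant terms are stored, a lookup for a stopword such as "the" returns no documents, while "cat" finds the same row as the CONTAINS query in step 7.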