Digital Library

Information about Digital Library

Published on August 28, 2008

Author: rupak

Source: authorstream.com

Content

Information Retrieval and Digital Libraries
Lee-Feng Chien (簡立峰)
Institute of Information Science, Academia Sinica

Outline
  Trends of information retrieval
  IR in digital libraries
  IR development in NDAP
  Selected IR research at IIS
    Term clustering & thesaurus construction
    Query translation using Web mining

I. Trends of Information Retrieval

Information Retrieval
  Definition
    A research field that explores information storage, classification, extraction, indexing and browsing techniques for retrieval from unstructured databases such as textual documents
  Related conferences & journals
    SIGIR, TREC, CIKM, AIRS (originally IRAL), NTCIR
    ACM TOIS, JASIST, IP&M, IRJ, ACM TALIP
  Conventional IR
    Text indexing, search, ranking, relevance feedback, classification, clustering, keyword extraction, thesaurus construction
    (Modern Information Retrieval, Baeza-Yates, R. and Ribeiro-Neto, B., Addison-Wesley, 1999)

Modern Information Retrieval
  Web IR
    Global resource collection by robots, serving millions of users per day, scalable search, distributed search, multilingual and multicultural content, ranking based on social information & user behavior
    (A Survey on Web Information Retrieval Technologies, Lan Huang, at http://citeseer.nj.nec.com/336617.html)
  Multimedia Retrieval
    Retrieving multimedia content such as speech, audio, music, images and video (from content-based to concept-based)
  New Research Topics
    Question answering, text mining, summarization for IR, filtering of e-mail spam & sensitive URLs, XML search, P2P search, semantic Web search, etc.

Spectrum of IR Research
  Media
    Text, e.g., Web texts, documents, bibliographic data
    Audio, e.g., music, speech, sound effects, songs, broadcast news
    Image, e.g., pictures, computer graphics
    Video, e.g., films
  Scale
    Personal, intranet, internet, P2P network, wireless network
    General or specific languages/subjects
    Scale: thousands, millions or billions (documents, users, queries)
  Structure
    Unstructured (full text), semi-structured (XML/metadata), structured (RDBMS)
  Interface
    Web-based, mobile-phone-based, voice-based

II. IR in Digital Libraries

Digital Libraries
  Content in digital libraries
    Heterogeneous data formats and distributed archiving
    With well-formed metadata
  IR demand
    Deep search, effective ranking, distributed processing
  vs. Web IR
    Similar but slightly different
    Well-organized data (exchangeable)
    More professional users
    More management issues

IR in Digital Libraries (vs. IR in Web)
  Union catalog, e.g., OAI (like Yahoo)
  Federated search (like Google)
  Harvesting (or crawling) & caching (like a spider)
  Thesaurus-based & concept-based search (none)
  Metadata annotation/generation (like semantic Web)
  Data protection (not seriously concerned)

OAI-based Union Catalog Services

IR in Digital Libraries (vs. IR in Web)
  Other advanced issues
    Language (developing)
      Cross-language search
    Media issues (developing)
      Multimedia search
    Presentation & interactive issues (bandwidth & cost problem)
      VR and search

III. IR Development in NDAP

IR Development in NDAP
  Taiwan Digital Archives
  Union catalog
    OAI-based tool kit and union catalog of NDAP
  Multimedia search
  Multimedia presentation
  Language-based IR
    Retrieval of "missing" characters, esp. Chinese
    Chinese word segmentation
    Cross-language IR
  Ongoing
    Federated search, digital rights tracking
    Digital library caching
    Web page spider; will develop database wrappers

Slide 14: Chinese Word Segmentation and Unknown Word Detection System (中文斷詞暨未知詞偵測系統) [link]

Slide 15: Sample segmenter output for a Chinese news report on the death of Soong Mei-ling (Madame Chiang Kai-shek) in New York at the age of 106. Each word is followed by its part-of-speech tag.

蔣宋美齡(Nb) 紐約(Nc) 去世(VH) 享年(VJ) 106歲(DM)

王良芬(Nb) /(FW) 紐約(Nc) 廿四日(DM) 電(Na)

跨越(VCL) 三個(DM) 世紀(Na) 的(DE) 傳奇(Na) 人物(Na) 、(PAUSECATEGORY) 「(PARENTHESISCATEGORY) 永遠(VH) 的(DE) 第一(DM) 夫人(Na) 」(PARENTHESISCATEGORY) 蔣宋美齡(Nb) 女士(Na) ,(COMMACATEGORY) 於(P) 紐約(Nc) 時間(Na) 十月廿三日(DM) 晚間(Nd) 十一點十七分(DM) ((PARENTHESISCATEGORY) 台北(Nc) 時間(Na) 二十四日(DM) 上午(Nd) 十一點十七分(DM) )(PARENTHESISCATEGORY) ,(COMMACATEGORY) 在(P) 曼哈頓(Nc) 上(Ncd) 東(Ncd) 城(Na) 的(DE) 寓所(Na) 與世長辭(VH) ,(COMMACATEGORY) 享年(VJ) 一百零六歲(DM) 。(PERIODCATEGORY)

外甥女(Na) 孔(Na) 令(VL) 儀(b) 與(Caa) 夫婿(Na) 黃雄盛(Nb) ,(COMMACATEGORY) 以及(Caa) 曾孫(Na) 蔣友(Nb) 常(D) 都(D) 隨侍在側(VA) 。(PERIODCATEGORY)

臨終(VH) 前後(Ng) 家人(Na) 一直(D) 為(P) 她(Nh) 讀(VC) 聖經(Nb) ,(COMMACATEGORY) 以及(Caa) 不斷(VH) 禱告(VA) ,(COMMACATEGORY) 祈願(VK) 她(Nh) 歸主(Na) 天國(Nc) 。(PERIODCATEGORY)

蔣(Nb) 夫人(Na) 生前(Nd) 在(P) 意識(Na) 清醒(VH) 的(DE) 時候(Na) ,(COMMACATEGORY) 曾(D) 對(P) 身旁(Nc) 的(DE) 親人(Na) 說(VE) 過(Di) ,(COMMACATEGORY) 她(Nh) 能(D) 活到(VH) 一百多歲(DM) 是(SHI) 上帝(Na) 的(DE) 賜福(VB) ,(COMMACATEGORY) 心(Na) 中(Ng) 充滿(VJ) 喜樂(Na) ,(COMMACATEGORY) 她(Nh) 把(P) 一切(Neqa) 都(D) 交給(VD) 上帝(Na) ,(COMMACATEGORY) 沒有(VJ) 任何(Neqa) 憂愁(VK) 和(Caa) 懼怕(VJ) 。(PERIODCATEGORY)

蔣(Nb) 夫人(Na) 辭世(VH) 之後(Ng) ,(COMMACATEGORY) 遺體(Na) 已(D) 從(P) 寓所(Na) 移到(VC) 一家(DM) 位於(VCL) 麥迪遜(Nb) 大道(Na) 和(Caa) 第八十一街(DM) 交口(Nc) 的(DE) 殯儀館(Nc) ,(COMMACATEGORY) 這(Nep) 是(SHI) 紐約(Nc) 最(Dfa) 高級(VH) 的(DE) 殯儀館(Nc) 之一(Nc) ,(COMMACATEGORY) 曾(D) 辦過(VC) 許多(Neqa) 名流(Na) 的(DE) 後事(Na) 。(PERIODCATEGORY)

家屬(Na) 並(D) 將(D) 遵照(VC) 其(Nep) 生前(Nd) 交代(VE) ,(COMMACATEGORY) 將(P) 她(Nh) 安葬(VC) 在(P) 紐約(Nc) 上州(DM) 威徹斯特郡(Nc) 的(DE) 芬克里夫(Nb) 墓園(Nc) ((PARENTHESISCATEGORY) Ferncliff(FW) Cemetery(FW) )(PARENTHESISCATEGORY) ,(COMMACATEGORY) 而(Cbb) 不會(D) 移靈(VCL) 回(VCL) 台灣(Nc) 和(Caa) 在(P) 大溪(Nc) 慈湖(Nc) 的(DE) 蔣公(Nb) 合葬(VC) ,(COMMACATEGORY) 同時(Nd) 也(D) 完全(D) 排除(VC) 了(Di) 安葬(VC) 在(P) 大陸(Nc) 故土(Nc) 的(DE) 可能性(Na) 。(PERIODCATEGORY)

Segmentation result (斷詞結果), unknown word list (未知詞列表):
  王良芬 Nb 1
  黃雄盛 Nb 1
  蔣友 Nb 1
  歸主 Na 1
  麥迪遜 Nb 1
  交口 Nc 1
  威徹斯特郡 Nc 1
  芬克里夫 Nb 1

IV. Selected IR Research at IIS

Related IR Research at IIS
  Thesaurus-based & concept-based search
    Livethesaurus & liveconcept
  Metadata annotation/generation
    Liveclassifier & image annotation
  Cross-language search
    LiveTrans
  Speech retrieval, video caption retrieval

Term Clustering and Thesaurus Construction

Term Clustering
  Sample query terms:
    勞委會 (Council of Labor Affairs), 長榮 (Evergreen), 金庸 (Jin Yong), 武俠小說 (martial arts novels), 職訓局 (Bureau of Employment and Vocational Training), 就業 (employment), 泡麵 (instant noodles), dbt, 武俠 (martial arts fiction), 青輔會 (National Youth Commission), 自傳 (autobiography), 人力銀行 (job bank), 長榮航空 (EVA Air), 找工作 (job hunting), 履歷表 (résumé), 求職 (job seeking), 求才 (recruiting), 占卜 (divination), 徵才 (hiring), 人力資源 (human resources), 104人力銀行 (104 Job Bank), 塔羅牌 (tarot cards), 算命 (fortune-telling), 紫微斗數 (Zi Wei Dou Shu astrology), 命理 (fortune reading), 姓名學 (name divination), 心理測驗 (psychological tests), 星座 (zodiac signs), 愛情 (love), 航空公司 (airline), 航空 (aviation), 華航 (China Airlines, abbreviated), 中華航空 (China Airlines), 補帖 (pirated software), 大補帖 (pirated software compilation), 黃易 (Huang Yi), Eva
  Classify terms into classes with similar topics
  Can be applied to thesaurus construction, taxonomy generation, query expansion, understanding user interests (ICDM'02)

Term Clustering through Web Mining (ICDM'02)
  Hierarchical clustering

CS Terms Clustering

Paper Title Categorization

Thesaurus Construction from Query Log
  Query logs provide representative terms for DL usage
  Taxonomy generation from query logs
    Query clustering
    Query categorization
    Document categorization

Term Clustering
  Feature Extraction
    Use co-occurring seed terms extracted from the top retrieved pages
  Term Vector
    Each query term is assigned a term vector
    Record the co-occurring feature terms and their frequencies in the retrieved documents
  Term Similarity
    TF*IDF-based cosine measure
  Hierarchical Term Clustering
    Cluster popular query terms in the log into initial categories
    Query terms with similar features are grouped into clusters
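
The term vector and TF*IDF cosine steps of the Term Clustering slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy feature lists and the smoothed IDF formula log(1 + N/df) are all assumptions made for illustration.

```python
import math
from collections import Counter

def build_term_vectors(contexts):
    """Build a TF*IDF vector for each query term.

    contexts maps a query term to the list of feature terms that
    co-occurred with it in its top retrieved pages (toy data here).
    """
    n_terms = len(contexts)
    # Document frequency: in how many query-term contexts a feature appears.
    df = Counter()
    for features in contexts.values():
        df.update(set(features))
    vectors = {}
    for term, features in contexts.items():
        tf = Counter(features)
        # Smoothed IDF (an assumption) so shared features keep a nonzero weight.
        vectors[term] = {f: c * math.log(1 + n_terms / df[f]) for f, c in tf.items()}
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical co-occurrence features for three query terms.
contexts = {
    "job hunting": ["job", "resume", "hiring", "job"],
    "resume": ["job", "resume", "hiring"],
    "tarot": ["fortune", "cards", "horoscope"],
}
vectors = build_term_vectors(contexts)
sim_related = cosine(vectors["job hunting"], vectors["resume"])
sim_unrelated = cosine(vectors["job hunting"], vectors["tarot"])
print(sim_related > sim_unrelated)
```

Query terms whose retrieved pages share many feature terms get a high cosine value, which is the signal the clustering step then groups on.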

Slide 26: Term Similarity

Hierarchical Term Clustering
  Agglomerative hierarchical clustering (AHC)
    1. Compute the similarity between all pairs of clusters
         Estimate similarity between all pairs of composed terms
         Use the lowest term similarity value as the cluster similarity value
    2. Merge the most similar (closest) two clusters
         Complete linkage method
    3. Update the cluster vector of the new cluster
    4. Repeat steps 2 and 3 until only a single cluster remains

Clustering Results

Application – Concept Search

Other Research

Cross-Language Web Search
  LiveTrans

Q&A
  Thanks!
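
The agglomerative hierarchical clustering steps listed in the Hierarchical Term Clustering slide can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the authors' code: the function names and the toy similarity values are hypothetical, and instead of maintaining a cluster vector (step 3) it recomputes cluster similarity directly from the member terms, which gives the same complete-linkage result for a fixed pairwise similarity.

```python
from itertools import combinations

def cluster_similarity(a, b, sim):
    """Complete linkage: the lowest similarity over all cross-cluster term pairs."""
    return min(sim(x, y) for x in a for y in b)

def complete_linkage_ahc(terms, sim):
    """Merge clusters bottom-up until one remains; return the merge history."""
    clusters = [frozenset([t]) for t in terms]
    merges = []
    while len(clusters) > 1:
        # Pick the most similar (closest) pair of clusters.
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: cluster_similarity(pair[0], pair[1], sim))
        merges.append((set(a), set(b), cluster_similarity(a, b, sim)))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Hypothetical pairwise term similarities (e.g. cosine values between term vectors).
PAIR_SIM = {
    frozenset(["job", "resume"]): 0.9,
    frozenset(["tarot", "horoscope"]): 0.8,
    frozenset(["job", "tarot"]): 0.1,
    frozenset(["resume", "tarot"]): 0.2,
    frozenset(["job", "horoscope"]): 0.05,
    frozenset(["resume", "horoscope"]): 0.1,
}

def toy_sim(x, y):
    return PAIR_SIM[frozenset([x, y])]

merges = complete_linkage_ahc(["job", "resume", "tarot", "horoscope"], toy_sim)
for left, right, score in merges:
    print(sorted(left), sorted(right), score)
```

Reading the merge history from first to last gives the hierarchy: tightly related terms merge early at high similarity, and the final merge joins the two unrelated topic groups at the lowest value.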
