Published on March 12, 2008
Exploring the Deep Web: University of Utah Government Documents Librarians Amy Brunvand, Kate Holvoet, Peter Kraus, David Morrison
What is the Deep Web?: The deep Web is the hidden part of the Web, containing a huge volume of content that is inaccessible to conventional search engines and, consequently, to most users.
How big is the Deep Web?: 550 billion documents; 500 times the content of the surface Web; Google has identified 1.2 billion documents; an Internet search typically searches 0.03% (1/3000) of available content.
What’s in the Deep Web?: Searchable databases; downloadable files and spreadsheets; image and multimedia files; data sets; various file formats such as .pdf; lots of government information.
Why use the Deep Web?: Higher-quality sources; selected and organized by subject experts; dynamic display; customized data sets; some data is visual and not word searchable; regular search engines miss vast resources available in the Deep Web.
Why are we talking about Government Sites in the Deep Web?: Governments have the mandate and the capacity to gather information that individuals don’t; most government information is copyright free; government information is authoritative; governments have the financial and human resources to maintain Deep Web sites.
The Deep Web for Federal Information: Peter L. Kraus, Federal Documents Librarian, Marriott Library – University of Utah
The Web Today: Web sites from the federal government occupy only about 1% of the entire global web, yet they hold 85% of “The Deep Web”. The content of these web sites includes items in either .html or .pdf format (reports, records, data sets, etc.), a wide diversity of files. There is little standardization or uniformity; a common term for this content is “Grey Literature”.
Definition of “Grey Literature”: “That which is produced on all levels of government, academics, business and industry in print and electronic formats, but which is not controlled by commercial publishers”
Growth and Life of Federal Information: On federal web sites, the amount of information grew 13-fold between 1992 and 2003; the average life expectancy of a federal web resource is 4 months (2003).
What can libraries do?: LOCKSS-DOCS project, an archival project (BYU and UU are members); cooperative efforts in specific subject areas (Western Waters Digital Library); individual institutional initiatives, such as institutional repositories, reflecting the institution's research productivity (information often funded by federal grants).
The Deep Web for Health and Science Information: Amy Brunvand – Government Information Librarian, Marriott Library – University of Utah
Slide25: Finding Naked People - Forsyth, Fleck (1996) (Correct) (54 citations) This paper demonstrates an automatic system for telling whether there are naked people present in an image. The approach combines color and texture properties to obtain a mask for skin regions, which is shown to be effective for a wide range of shades and colors of skin. http.cs.berkeley.edu/~daf/newo2.ps.Z
Slide26: Graph showing number of citations to “Finding Naked People”
Slide28: Arches National Park: NASA Landsat 7, 10/3/99
Slide31: Development and Evaluation of Stitched Sandwich Panels. Larry E. Stanley; Daniel O. Adams. NASA Langley Research Center. NASA/CR-2001-211025, June 2001; 20010702 …..
test panels were produced initially at the University of Utah and later at NASA Langley Research Center…… http://techreports.larc.nasa.gov/ltrs/PDF/2001/cr/NASA-2001-cr211025.pdf
Slide37: Marriott Library, Salt Lake City, Utah, United States, 9/18/2003 (TerraServer)
Slide39: Utah Seismic Hazards (National Atlas)
The Deep Web for International Information: Kate Holvoet – Interim Head, Government Documents and Microforms, Marriott Library – University of Utah
International Deep Web Resources: International organizations collect an amazing amount of data; statistical data is often best organized in database and spreadsheet format; like the US Government, individual countries post data files and databases; this information may not be available in print sources in schools and libraries.
United Nations Official Documents System: http://documents.un.org/
Why use the ODS?: Full-text official United Nations documents (1993 -) online, free; retrospective digitization in process; highly relevant material for almost any international topic; timely and authoritative.
United Nations Statistical Databases: Value of the information: authoritative, comparative, time series, compact. Database topics include: commodity trade, demographics, disability statistics, social indicators, statistics on men and women.
Slide48: http://unstats.un.org/unsd/databases.htm
Individual Country Statistics: http://www.census.gov/main/www/stat_int.html
Why use this kind of information?:
Aggregate statistical sources are often not as up to date; individual countries are often more specific in their indicators than aggregate sources; information in databases, spreadsheets, and downloadable files is usually NOT searchable by web crawlers.
Patents, Trademarks and the Deep Web: Dave Morrison, Documents and Microforms Division, Marriott Library - University of Utah
Slide81: For Further Information: USPTO Information Line 800-PTO-9199; Marriott Library, University of Utah 801-581-8394; www.lib.utah.edu/documents
Slide82: Any Questions?
Thanks!