Searching Outline

Searching
© Copyright 2006 Haim Levkowitz
Outline
• Goals and Objectives
• Topic Headlines
• Introduction
• Directories
• Open Directory Project
• Search Engines
• Metasearch Engines
• Search Techniques
• Intelligent Agents
• Invisible Web
• Summary
Goals and Objectives
• Goals
• Understand searching
• find relevant information fast
• know what / how / where to search
• Objectives …
Objectives
• Subject Directories
• Open Directory Project
• Search and metasearch engines
• Search techniques
• Intelligent agents
• The visible web
• The invisible web
• Search techniques for the invisible web
Topic Headlines
• Introduction
• directory / search engines
• Directories
• browse subject tree manually / use search engine
• Open Directory Project
• Search Engines
• ranking search results
• How do search engines do their job
• Metasearch Engines
• multiple search engines at once
• Search internet more effectively
• Search Techniques
• Intelligent Agents
• Invisible Web
Introduction
• What / how / where to search
• Search results → web pages
• Main tools
• Directory – subject guide organized by major topics and subtopics
• Search engine – “crawler / bot”
• Each → database
• Directory’s → compiled by humans
• Engine’s → generated automatically
Directories
• Human-powered search engines
• Organize information in a hierarchical tree by subject
• General → specific
• Two ways to search a directory
• Manual – browse subjects hierarchically
• Search engine – enter search terms
• Example – Yahoo Directory
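The two ways of searching a directory can be sketched in a few lines of Python. This is only an illustration: the subject tree, the `.example` site names, and the helper names `browse` and `search` are all hypothetical and not taken from any real directory.

```python
# Hypothetical subject tree: subjects run from general to specific,
# and the leaves are lists of sites chosen by human editors.
DIRECTORY = {
    "Computers": {
        "Internet": {
            "Searching": ["search-tips.example"],
        },
    },
    "Science": {
        "Biology": ["biology.example"],
    },
}

def browse(path):
    """Manual search: walk the tree one subject at a time."""
    node = DIRECTORY
    for subject in path:
        node = node[subject]
    return node

def search(term, node=DIRECTORY, path=()):
    """Engine search: return the path of every subject whose name contains term."""
    hits = []
    for subject, child in node.items():
        here = path + (subject,)
        if term.lower() in subject.lower():
            hits.append(here)
        if isinstance(child, dict):
            hits.extend(search(term, child, here))
    return hits
```

Browsing requires knowing where a subject sits in the hierarchy, while the term search finds it wherever it is filed.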
Open Directory Project
• Search results ranked
• Human editors rank web pages
• As the number of pages on a topic increases, ranking becomes more time-consuming and costly
• Open Directory Project
• Opens the ranking system to users
• Users become editors and evaluate web sites in their area of expertise
• → a lot more content
• http://dmoz.org
Search Engines
• Examples
• Google google.com
• Yahoo yahoo.com
• Ask ask.com
• MSN Search search.msn.com
• AOL Search search.aol.com
• Answers answers.com
• Tips on using search engines and much more: http://www.searchenginewatch.com
Search Engines
• based
• Three important parts
• “Spider / crawler / robot”: follow links in databases
• Indexer: identify web page content and store it in a database
• Searcher: sift through the engine’s index to find matches for the query and rank the matches
• Relevance ranking algorithm is crucial
• Different engines → different results
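The three parts listed above can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not how any real engine works: the in-memory `PAGES` "web", the `.example` URLs, and the ranking rule (count of query-word occurrences) are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example": ("search engines rank web pages", ["b.example", "c.example"]),
    "b.example": ("directories are compiled by human editors", ["c.example"]),
    "c.example": ("search engines crawl and index web pages", []),
}

def crawl(start):
    """Spider: follow links from a start page; return URLs in visit order."""
    seen, queue = [], [start]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in PAGES:
            continue
        seen.append(url)
        queue.extend(PAGES[url][1])
    return seen

def build_index(urls):
    """Indexer: map each word to the pages that contain it, with counts."""
    index = defaultdict(Counter)
    for url in urls:
        for word in PAGES[url][0].lower().split():
            index[word][url] += 1
    return index

def search(index, query):
    """Searcher: score pages by occurrences of query words, best first."""
    scores = Counter()
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Ties are broken alphabetically by URL so the order is deterministic.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

Swapping in a different scoring rule inside `search` changes the order of the results, which is one reason different engines return different results for the same query.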
Metasearch Engines
• Multi-engine search
• Skip engine that is down
• No database of their own
• Examples:
• http://www.dogpile.com
• http://www.metacrawler.com
• http://www.profusion.com
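A metasearch engine's behaviour can be sketched as follows. The three engine functions are hypothetical stand-ins that return canned results; a real metasearch engine would send the query to remote engines over the network instead.

```python
# Hypothetical stand-ins for remote search engines.
def engine_a(query):
    return ["x.example", "y.example"]

def engine_b(query):
    raise ConnectionError("engine is down")

def engine_c(query):
    return ["y.example", "z.example"]

def metasearch(query, engines):
    """Send one query to several engines and merge the results.

    The metasearch engine keeps no database of its own: it only
    combines what the underlying engines return.
    """
    merged = []
    for engine in engines:
        try:
            results = engine(query)
        except ConnectionError:
            continue  # skip an engine that is down
        for url in results:
            if url not in merged:  # drop duplicates, keep first-seen order
                merged.append(url)
    return merged
```

Calling `metasearch("web searching", [engine_a, engine_b, engine_c])` returns the combined, de-duplicated list from the two engines that responded.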
Search Techniques
• Searching guidelines:
• Change query to improve results
• Search string = keywords
• not exact phrase
• Advanced searching techniques:
• Words and exact phrase
• Boolean search – AND, OR, NOT
• Title search – web page title
• Site search – limit search to particular site
• URL search, link search
• Wildcard (fuzzy) search – *
• Features search – special features of engines
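Boolean and wildcard search from the list above map naturally onto set operations. The sketch below assumes a tiny hypothetical document collection `DOCS` and a hypothetical helper `docs_with`; real engines each have their own query syntax for these features.

```python
from fnmatch import fnmatch

# Hypothetical document collection: ID -> text.
DOCS = {
    1: "open directory project editors",
    2: "metasearch engines query several search engines",
    3: "search engines rank pages",
}

def docs_with(term):
    """Return IDs of documents containing a word that matches term.

    A '*' in term acts as a wildcard, e.g. 'meta*' matches 'metasearch'.
    """
    return {
        doc_id
        for doc_id, text in DOCS.items()
        if any(fnmatch(word, term) for word in text.split())
    }

both = docs_with("search") & docs_with("engines")         # search AND engines
either = docs_with("directory") | docs_with("rank")       # directory OR rank
without = docs_with("engines") - docs_with("metasearch")  # engines NOT metasearch
wild = docs_with("meta*")                                 # wildcard search
```

AND narrows the result set, OR widens it, and NOT removes unwanted matches, which is why changing the operators is a quick way to improve a query.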
Intelligent Agents
• Three retrieval paradigms:
• Statistical – correlations of word counts in documents
• Semantic – natural language processing and artificial intelligence
• Contextual – use thesaurus and encoded relationships
• Intelligent agent: gathers information / performs tasks based on human input
• E.g., spider part of search engine
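As one possible illustration of the statistical paradigm, the sketch below compares two texts by the cosine similarity of their word counts. Cosine similarity is an assumption chosen for the example; the slides do not say which correlation measure is meant, and the function name is hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Compare two texts using only their word counts.

    Returns a value near 1.0 for texts with the same word distribution
    and 0.0 for texts that share no words.
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)
```

The measure looks only at which words occur and how often, not at what they mean, which is what separates the statistical paradigm from the semantic and contextual ones.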
Intelligent Agents
• Advantages:
• More intelligent search
• Create and update own knowledge database
• Perform tasks quicker
• Communicate and co-operate with other agents
• Customizable
• Continuously scan internet for information
• Free user from mundane tasks
Invisible Web
• hidden web content
• Database contents
• Dynamically generated pages
• Estimated to be larger than the visible web
• Searching the invisible web:
• Directories (Invisible Web Catalogue)
• Databases
Summary
• Search engines
• Directories
• Open Directory Project
• most popular search engines
• Metasearch engines
• Intelligent agents
• Invisible web