Chapter-2 : Literature survey and scope of research

CHAPTER - 2
Literature survey and scope of research
2.1 INTRODUCTION
This chapter reviews prior studies covering initiatives for web search,
search engines, search engine optimization techniques, limitations of
existing search engines, meta-search engines, meta-search engine
optimization techniques, the difference between search engines and
meta-search engines, limitations of existing meta-search engines, the need
for a new model of meta-search engine, and the scope of research in
meta-search engines for retrieving specific information in an efficient
manner.
2.2 LITERATURE SURVEY
2.2.1 History of the web surfing for web search
The roots of web search engine technology are in Information Retrieval (IR)
systems, which can be traced back to the work of Luhn at IBM during the late
1950s. IR has been an active field within information science, and has been
given a big boost since the 1990s by the new requirements that the Web has
brought. [11]
Many methods used by current search engines can be traced back to the
developments in IR during the 1970s and 1980s. Especially influential is the
SMART (System for the Mechanical Analysis and Retrieval of Text) retrieval
system, initially developed by Gerard Salton and his collaborators at Cornell
University during the early 1970s. [11]
Prior to 1990, there was no way to search the Web. At that time there were
only a small number of websites. Most sites contained collections of files
that users could download, and a user could find a file only by already
knowing that it was on a specific site. Then came a tool called Archie. It
was the first program to search the contents of sites all over the world.
It was not actually a search engine but, like Yahoo, a searchable list of
files. The information seeker needed to know the exact name of the file he
or she was looking for. Given that information, Archie would advise from
which website it was possible to download the file.
Designing Model for Meta-Search Engine
2.2.2 Initiative of search engine development
The students Alan Emtage, Peter J. Deutsch, and Bill Heelan at McGill
University in Montreal, Canada, produced the first search engine in 1990.
This tool was called Archie, short for archive. The program searched the
names of files, not individual pages.
If Archie was the grandfather of all search engines, then Veronica was the
grandmother. Developed by the University of Nevada Computing Services, it
searched Gopher servers for files. A Gopher server stores plain-text
documents, while an FTP server also stores other kinds of files (images,
programs, etc.). Jughead performed functions similar to Veronica. [59]
By 1993, the Web was beginning to change. Rather than being populated
mainly by FTP sites, Gopher sites, and e-mail servers, web sites began to
grow. In response to this change, Matthew Gray introduced World Wide Web
Wanderer. The program was a series of robots that hunted down web URLs
and listed them in a database called Wandex. [59]
Also around 1993, ALIWEB was developed as the web page equivalent to
Archie and Veronica. Instead of cataloging files or text documents,
webmasters would submit a special index file with site information. [59]
The next development in cataloging the web came late in 1993 with spiders.
Like robots, spiders scoured the web for web page information. These early
versions looked at the titles of the web pages, the header information, and
the URL as a source for key words. The database techniques used by these
early search engines were primitive.
For instance, a search would return hits (lists of URLs / links) in the
order in which they were stored in the database. Only one of these search
engines made an effort to rank the hits according to the website’s
relationship to the key words.
The first popular search engine, Excite, has its roots in these early days of
web cataloging. The Excite project was begun by a group of Stanford
undergraduates. It was released for general use in 1994. [59]
One of the earliest search engines to be built was Lycos, founded in January
1994, operational in June 1994, and a publicly traded company in April
1996. Lycos was born from a research project at Carnegie Mellon University
by Dr. Michael Mauldin. [59]
Again in 1994, two Stanford Ph.D. students posted web pages with links on
them. They called these pages Yahoo!. As the number of links began to grow,
they developed a hierarchical listing. As the pages became more popular,
they developed a way to search through all of the links. Yahoo! became the
first popular searchable directory. It was not considered a search engine
because all the links on the pages were updated manually rather than
automatically by spider or robot and the search feature searched only those
links. [59]
Another search engine, WebCrawler, went online in spring 1994. It was also
started as a research project, at the University of Washington, by Brian
Pinkerton. [19]
WebCrawler was the first full-text search engine. It began as an
undergraduate seminar project at the University of Washington. It became
so popular that it virtually shut down the University of Washington's
network because of the amount of traffic it generated. Eventually, AOL
bought it and operated it on their own network. Later, Excite bought
WebCrawler from AOL but AOL still uses it in their NetFind feature. At
Home Corp. currently owns Webcrawler (as well as Excite and Blue
Mountain Cards). [59]
The next search engine to appear on the web was Lycos. It was named for
the wolf spider (Lycosidae lycosa) because the wolf spider pursues its prey.
According to Michael Mauldin in "Lycos: Design choices in an Internet search
service" (1997), by 1997 Lycos had indexed more than 60,000,000 web
pages and ranked 1st on Netscape's list of search engines. [59]
The next major player in the search engine wars was Infoseek. The Infoseek
search engine itself was unremarkable and showed
little innovation beyond Webcrawler and Lycos. What made this search
engine stand out was its deal with Netscape to become the browser's default
search engine replacing Yahoo. [59]
By 1995, Digital Equipment Corporation (DEC) introduced AltaVista. This
search engine contained some innovations that set it apart from the others.
First, it ran on a group of DEC Alpha-based computers. At the time, these
were among the most powerful processors in existence. This meant that the
search engine could run even under very high traffic, hardly slowing down.
(The DEC Alpha processor ran a version of UNIX. From its inception, UNIX
had been designed for such heavy multi-use loads.) It also featured the
ability for the user to ask a question rather than enter key words. This
innovation made it easier for the average user to find the results needed. It
was also the first to implement the use of Boolean operators (and, or, but,
not) to help in refining searches. [59]
Next came HotBot, a project from the University of California at Berkeley,
designed as the most powerful search engine. [59] Hotbot was owned by
Wired, had funky colors, fast results, and a cool name that sounded geeky,
but died off not long after Lycos bought it and ignored it. [60] Its current
owner, Wired Magazine, claims that it can index more than 10,000,000 pages
a day. Wired claims that HotBot should be able to update its entire index
daily making it contain the most up-to-date information of any major search
engine. [59]
Google, developed at Stanford University around 1998, used the concept of
link popularity and PageRank as its main ranking algorithm. Yahoo, launched
in 1994 by Stanford University students, started out as a listing of
personal favourite websites with the URL and a description of each page.
MSN Search is a search engine owned by Microsoft; it was launched in 1999
and was powered by results from Looksmart and Inktomi until 2004, after
which it used its own crawler-based index. [59]
Since the birth of the modern Internet in the early 1990s, the need for IR
has led to the growth, dominance and decline of various search engines like
Wandex, Aliweb, Excite, Webcrawler, Lycos, AltaVista, Inktomi, AskJeeves and
Northern Light. [15]
However, the majority (80%) of Internet users rely on three search
engines: Google, Yahoo and MSN Search. [15]
2.2.3 Brief about working of search engine
Search engines are tools designed to search for information on the Web. The
results are generally displayed in a vertical sequence on pages often
referred to as search results pages. The links / URLs listed on those pages
are referred to as hits.
A search engine basically works in steps such as web crawling, indexing and
searching. It retrieves information about web pages from the Web and stores
that information in databases.
A search engine takes advantage of the hyperlinks that connect Web sites on
the Internet. A software program called a Web crawler automatically browses
the Web in a systematic way and sends out inquiries that “crawl” from site
to site. [2]
Since a crawler is a software program, it can be given different
instructions on different computers. For instance, WebCrawler, a program
launched in 1994, was the first software to index entire web sites rather
than just page titles. Search engine crawlers operate within different sets
of instructions or parameters, such as searching titles and first
paragraphs only, or searching entire documents, including metadata. [2]
The information that the crawler software collects is automatically put
into an index, and a query submitted to a search engine is answered from
that search engine’s index. Each search engine has its own index. Thus the
index searched by Google is not the same as the one searched by Yahoo or
MSN (Microsoft Network). [2]
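The crawl, index and search steps described in this section can be illustrated with a minimal sketch. The page texts and URLs are hypothetical stand-ins for what a crawler might have collected, and the small in-memory inverted index is only an assumption about what such an index could look like; the cited sources do not describe the data structures of any particular engine.

```python
import re
from collections import defaultdict

# Hypothetical output of a crawl: URL -> page text.
PAGES = {
    "http://example.org/a": "Meta search engines merge results",
    "http://example.org/b": "Search engines crawl and index the web",
    "http://example.org/c": "Travel deals and flights",
}


def tokenize(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Indexing step: map every word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Searching step: return URLs that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(PAGES)
print(search(index, "search engines"))
```

Because each engine builds its index from its own crawl, two engines running the same query over different `PAGES` collections would naturally return different hits.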
2.2.4 Optimization techniques used by existing search engines
The aim of SEO (Search Engine Optimization) is to obtain a higher position
for links in organic listings. A set of techniques is used to move towards
the top of search engine listings. SEO has conceptually expanded to include
all likely ways of promoting web traffic. There are two main approaches for
listing results on screen: organic (natural listing) and pay per click
(paid listing).
Following are some examples of search engines that use Pay Per Click
Strategy [76]:
i. Google: The Google AdWords program places paid listings within Google's
search results, as well as on some other sites that carry its listings.
ii. Overture: Overture is the oldest major paid placement search engine. It
distributes its listings to a wide range of search engines, including
that of its owner, Yahoo. Overture launched as GoTo in 1997 and
incorporated the former University of Colorado-based World Wide Web
Worm. In February 1998, it shifted to its pay-for-placement model.
The company changed its name from GoTo to Overture in October
2001. It was purchased by Yahoo in October 2003.
Some more of the most popular paid search engines are listed below [77]:
i. Yahoo! Search Marketing (formerly Overture) is the oldest pay per click
search engine, which produces relevant results. It incorporates its paid
listings into some of the major search engines.
ii. FindWhat.com is an important pay-per-click search engine with
results incorporated into many metacrawlers search results. Bids
start at $0.01.
iii. Kanoodle offers paid search listings with distribution to a large
network of other search engines and search box providers. Bids start
at $0.01.
iv. Sprinks: Pay-per-click searching service provided by About.com that
sends links to some meta search engines, and the Sprinks site itself.
v. Search123: Pay-per-click search engine that incorporates its paid
listings on the sites of some traffic partners.
vi. Xuppa is a paid search placement service with distribution on some
metacrawlers. Previously named as Bay9.
vii. Ah-ha.com powered by FAST Search, allows paid listings to appear at
the top of its results.
viii. ePilot.com is a pay-per-click search engine which distributes its
results on many search partners. Bids start at $0.01.
ix. ValleyAlley: Pay-per-click searching service that sends links to some
meta crawlers. Bids start at $0.01.
x. Win4Win: A paid search engine, which provides top listing for
advertisers in its results.
xi. theInfoDepot: Pay-per-click search engine that uses Open Directory
Project database.
xii. eFind.com allows advertisers to bid for the top of listing with search
results.
Organic search result listings appear as search results without the
payment of a special charge to the search engine provider. The pay per
click strategy, by contrast, is used to generate revenue for the company.
On and Off Page Components of Search Engine Optimization
SEO is mainly achieved through the combination of two main groups of
factors: on page and off page factors. [75]
1. On page factors
On page optimization refers to the text and content on the pages of a web
site. It acts as the foundation for the ongoing SEO process. It involves
work on the website and its content so that the search engine can find the
web page when searching web sites for a particular keyword. This has a
significant impact on search engine results. [75]
Some of the on page factors include:
i. Search engine friendly web page URLs in the site. The URLs of the inner
pages of the site consist of the domain followed by a description of the
content. [75]
ii. Optimization of all meta tags, which mainly include title, keywords
and description. [75]
iii. Internal linking between the pages in the site. Internal linking must be
done wherever required, in such a way that Google does not treat it as
spam. The major pages of the site must be linked to the homepage. [75]
iv. Creation of a sitemap is important so that all web pages are indexed by
search engines. [75]
v. Good quality content is liked by most of the search engines. Content
should be information rich, relevant and inspiring, yet should not
forget the spiders. [75]
vi. Make sure that the HTML code is free of errors and warnings. [75]
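To make the on page factors in item ii concrete, the following sketch shows how a program might read the title, keywords and description of a page. The sample HTML and the class name `OnPageFactors` are hypothetical, and the sketch uses only the standard-library HTML parser; it is an illustration of where these tags live in a page, not a description of how any search engine spider actually works.

```python
from html.parser import HTMLParser

# Hypothetical page used only for illustration.
PAGE = """<html><head>
<title>Meta-Search Engine Model</title>
<meta name="keywords" content="meta-search, ranking">
<meta name="description" content="A sample page about meta-search engines.">
</head><body><p>Hello</p></body></html>"""


class OnPageFactors(HTMLParser):
    """Collects the <title> text and the keywords/description meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name in ("keywords", "description"):
            self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title>...</title> is kept.
        if self._in_title:
            self.title += data


parser = OnPageFactors()
parser.feed(PAGE)
print(parser.title)
print(parser.meta)
```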
Some of the on page factors to be avoided:
i. Hidden links or text: the use of hidden or invisible text to get listed
on search engines, by using a font color similar to the background color,
must be avoided because it will affect page rankings. [75]
ii. Cloaking mechanisms: never show two different versions of the site, one
for the search engines and a completely different page for real users.
This puts the site at risk of being penalized. [75]
iii. Duplicate content is to be avoided. There is no substitute for unique,
original and useful content. [75]
2. Off page factors
As the name indicates off page optimization is the work that needs to be
done off the pages of the website. [75]
Some of the off page factors include:
i. Use of anchor text in the links wherever required according to
relevancy. Also the text surrounding the links should not be
ignored. [75]
ii. Building quality links for link building purposes, which includes
relevance, page rank and authority sites. [75]
iii. Link popularity can be attained by using social networking sites, blog
commenting, forum postings, article/press release promotions,
directory submissions, link baiting, posting classified ads, link
exchange with relevant sites and so on. [75]
On page and off page factors are two different aspects of SEO efforts which
work towards getting qualified traffic which provides the path that leads
towards conversion. [75] Compared to off page factors, on page optimization
is relatively easy to achieve. [75]
Elements of Search Engine Optimization
The major elements of SEO are:
i. Keyword-rich text: Design matters to the search engines so that they
can access the keyword-rich text, and it matters to human visitors so
that they can easily find that keyword-rich text once they arrive at
the site. [74]
ii. Site and page architecture: A robust site and page architecture plays
a vital role in getting optimized results from a search engine. [74]
iii. Link development: This is one of the most overlooked components
of successful SEO. Link development can be defined as the collection
of links to the site from other web sites in order to improve the
search engine ranking of the web site. [74]
2.2.5 Limitations of existing search engines
When discussing web search engines, in most cases one arrives quickly at a
discussion of Google. In fact, Google is often seen as synonymous with web search.
[68] It may be irritating to see that many search engines claiming to search the
‘whole of the web’ are available on the market; however, only a few of them have
their own, web-scale index. Outside of these few, most search engines license
search results from other search engines, the most famous example being Yahoo
using results from Microsoft’s Bing search engine (Microsoft, 2009). [25]
Another point to consider is the market shares of different search engines.
While there may be at least a small variety of web search engines, users’
acceptance of these choices differs greatly among them. When discussing
the search engine market, it is often forgotten that while search engines are
surely commercial enterprises, they also serve as facilitators of information,
and therefore they serve the interest of the public. When considering
that mainly one search engine is used, one has to ask whether this search
engine does indeed serve these interests. [25]
The size of the Internet in terms of data continues to grow exponentially.
No single search engine indexes more than about one-third of the ‘indexable
web’, and combining the results of 6 search engines yields about 3.5 times
as many documents on average as the results from only one engine.
Search engines do not index sites equally, and no engine indexes more than
about 16% of the web. Major engines index less than half of the web, and the
average overlap between engines is very small. The indexable web is
approximately 11.5 billion pages. The intersection of the Google, MSN, Ask
and Yahoo indexes is 28.85%, or about 2.7 billion pages, and their union is
about 9.36 billion pages. Even if two search engines use the same databases,
search results may vary, because each search engine uses its own ranking
algorithm. The non-indexable web often contains a large amount of data, the
major part of which is not available through traditional search engines. A
combination of retrieval paradigms brings improvements in information
retrieval results. Coverage limitations, non-uniform user interfaces, query
limitations and duplicates lower the effectiveness of search engines. This
has led to the development of meta-search engines. [15]
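The coverage argument above can be sketched with ordinary set operations. The three "indexes" below are hypothetical sets of page identifiers on a hypothetical web of 20 pages; they are not measurements. As a sanity check on the quoted figures, if the 28.85% is read as a share of the union, then 0.2885 x 9.36 billion is about 2.70 billion pages, which agrees with the number given.

```python
# Hypothetical indexes: each engine has crawled a different slice of a
# hypothetical 20-page web (pages are numbered 0..19).
WEB_SIZE = 20
INDEXES = {
    "engine_a": set(range(0, 8)),
    "engine_b": set(range(5, 12)),
    "engine_c": set(range(6, 16)),
}


def coverage_report(indexes, web_size):
    """Compare the best single engine with the union of all engines."""
    union = set().union(*indexes.values())
    common = set.intersection(*indexes.values())
    best_single = max(len(pages) for pages in indexes.values())
    return {
        "best_single_share": best_single / web_size,
        "union_share": len(union) / web_size,
        "overlap_share_of_union": len(common) / len(union),
    }


report = coverage_report(INDEXES, WEB_SIZE)
print(report)
```

In this toy setting the best single engine covers half of the web while the union covers four-fifths of it, which is the gap a meta-search engine tries to close.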
It is true that no single search tool indexes the entire web. In the late 1990s,
the web had between 6 and 8 billion web pages. At that time, Google indexed
2.4 billion, AllTheWeb 2.1 billion and AltaVista about 1 billion. Meta-search
engines were designed to fill in the gaps by searching many search engines
simultaneously. [58]
2.2.6 Initiative of meta-search engine development
During 1995, a novel type of search engine, called the meta-search engine,
was introduced. The idea was simple. The meta-search engine would take key
words from the user and then forward them to all of the most important
search engines. These search engines would send the hits (URLs / links)
back to the meta-search engine, and the meta-search engine would arrange
the hits all on a single page for concise viewing.
The first of these meta-search engines was Metacrawler. Metacrawler took
the output of the search engines but not the advertising banners that users
of the search engines see, reducing the advertising revenues of the search
engine companies. Metacrawler finally relented and began including the
banner ads with each set of search results. [59]
Besides Metacrawler, other major meta-search engines exist including
ProFusion, Dogpile, Ask Jeeves, and C-Net's Search.com. [59]
Examples of traditional meta-search engines are: Mamma.com, Dogpile.com,
Metasearch, Ixquick.com, Clusty, Hotbot, etc. [58]
The first domain-specific meta-search engine, focusing on travel, was
launched in 2000. The earliest versions were not web based; users had to
download special software on to their desktop. These sites moved to the web
and became more mainstream in late 2003 and throughout 2004. Several
sites now play in this space, like Kayak, Mobissimo, Travelzoo, etc. [58]
It is a known fact that most meta-search engines draw their search results
from multiple other search engines, then combine and re-rank those results.
This was a useful feature back when search engines were less adept at
crawling the web and each engine had a significantly unique index. [60]
Unlike most meta-search engines, Hotbot only pulls results from one search
engine at a time from the web. Currently Dogpile, owned by Infospace, is
probably the most popular meta-search engine on the market, but like all
other meta-search engines, it has limited market share. [60]
2.2.7 Optimization techniques used by existing meta-search engines
Like those of search engines, the optimization techniques used by
meta-search engines are advertisement driven. Another strategy is the pay
per click strategy, which is based on paid listings.
2.2.8 Difference between meta-search engines and search engines
The major differences between meta-search engines and search engines are:
i. Meta-search engines do not have their own database of web pages as
search engines do, and thus do not need indexing.
ii. Meta-search engines provide search results by forwarding the user's
search text to other search engines and merging the results returned
by the different search engines.
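The forwarding and merging behaviour in item ii can be sketched as follows. The two engine functions are hypothetical stubs that return canned URLs instead of contacting real services, and the round-robin interleaving is just one simple merging choice assumed for illustration; the sources cited here do not prescribe a particular merge order.

```python
from itertools import zip_longest


def engine_a(query):
    """Hypothetical engine stub: returns a canned, ranked list of URLs."""
    return [f"http://a.example/1?q={query}", f"http://shared.example/{query}"]


def engine_b(query):
    """Second hypothetical engine stub with a partly overlapping result list."""
    return [
        f"http://shared.example/{query}",
        f"http://b.example/1?q={query}",
        f"http://b.example/2?q={query}",
    ]


def metasearch(query, engines):
    """Forward the query to every engine and interleave the returned hits.

    The meta-search engine keeps no index of its own: everything it shows
    comes from the engines it queries.
    """
    result_lists = [engine(query) for engine in engines]
    merged = []
    # Take the first hit of every engine, then the second, and so on.
    for row in zip_longest(*result_lists):
        for url in row:
            if url is not None and url not in merged:
                merged.append(url)
    return merged


print(metasearch("jaguar", [engine_a, engine_b]))
```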
In the area of search engines and meta-search engines, the following types
of work have been done:
i. The well-known web search engine: whenever the user enters text in the
search box, the web search engine provides information about many web
pages, which are retrieved from the Web itself.
ii. Search optimization techniques have been developed.
iii. Meta-search engine search tools have been developed, which provide
information by searching a fixed number of search engines and finally
returning an aggregate result. Some of them are domain specific.
However, there are some problems related to time-outs.
2.2.9 Existing meta-search engines
Through the literature survey, information about various existing
meta-search engines was collected and is summarized below:
1. Mamma: Known as the mother of meta-search engines; it has a time-out
problem.
2. Blingo: Retrieves search results from a single search engine, namely
Yahoo. It earns its revenue like any other search engine, when the user
clicks on a sponsored link on the result page.
3. Yippi: Retrieves search results for conservative values.
4. DeeperWeb: It offers integration with the Google search engine and
retrieves results from Google only.
5. Dogpile: It fetches search results from a fixed set of three search
engines: Google, Yahoo and Yandex.
6. Excite: It fetches search results from a fixed set of three search
engines: Google, Yahoo and Yandex.
7. Harvester42: It is for information related to genes and proteins from
several species.
8. HotBot: Not functioning properly through the user interface.
9. Info.com: It fetches search results from a fixed set of search
engines: Google, Yahoo, Bing and Yandex. It uses the pay per click
strategy.
10. Kayak: It is a travel meta-search engine. It cannot be used for other
information related searches.
11. Metacrawler: It fetches search results from search engines like
Google, Yahoo and Yandex.
12. Mobissimo: It is a travel meta-search engine. It cannot be used for
searching other information.
13. Otalo: It is also a travel meta-search engine and cannot be used for
other information searches.
14. Ixquick: It returns search results from multiple search engines. It
uses stars to rank its results, giving a result one star for every
search engine that has returned it.
15. PCH Search and Win: Retrieves search results from Google and
Yahoo.
16. SideStep: Meta-search engine for travel.
17. WebCrawler: Retrieves search results from Google and Yahoo.
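The star scheme described for Ixquick in item 14 can be sketched as a simple count of how many engines returned each result. The engine names and URLs below are hypothetical, and breaking ties alphabetically is an assumption made only to keep the output deterministic; how Ixquick orders results with equal stars is not stated in the survey.

```python
def star_rank(results_by_engine):
    """Give each URL one star per engine that returned it, most stars first."""
    stars = {}
    for hits in results_by_engine.values():
        # set() ensures an engine contributes at most one star per URL.
        for url in set(hits):
            stars[url] = stars.get(url, 0) + 1
    return sorted(stars.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from three engines.
RESULTS = {
    "engine_a": ["u1", "u2"],
    "engine_b": ["u2", "u3"],
    "engine_c": ["u2", "u1"],
}

print(star_rank(RESULTS))  # [('u2', 3), ('u1', 2), ('u3', 1)]
```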
2.2.10 Difference between existing meta-search engines
Meta-search engines differ in the functionalities they offer, in particular
in the way they transmit the user query to the search engines and in the way
they collect and present the obtained results. For instance, some
meta-search engines simply append the obtained results without performing
any processing on those results. Some of them directly present parts of the
pages returned by the search engines.
Existing meta-search engines use a fixed number of search engines. Some of
them, like Blingo and WebCrawler, retrieve results from only one or two
search engines. Some of them are domain specific; in particular, Kayak,
Mobissimo, Otalo and SideStep are for travel-based search. The mother of
meta-search engines, popularly known as mamma, faces a time-out problem
during the search process. Most meta-search engines use various marketing
strategies to earn their revenue.
2.2.11 Limitations of existing meta-search engines
One of the major problems with meta-search in general is that most meta-search engines tend to mix pay per click ads in their organic search results, and for
some commercial queries 70% or more of the search results may be paid results.
[60]
It is known that meta-search engines group results from different search
engines and display them on screen based on their own ranking algorithm.
Meta-search engine optimization therefore requires a proper ranking
algorithm that orders results in an organic (natural) way.
The problem of meta-search is known as the rank aggregation problem,
where a meta-search engine submits a query to multiple search engines, and
then has to combine the individual ranked lists returned into a single
ranked list, which is to be presented to the user. One of the problems that a
meta-search engine has to solve, when combining results, is that of
detecting and removing duplicate web pages that are returned by several
search engines. [11]
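The rank aggregation problem and the duplicate removal step described above can be sketched with a Borda count, a classical voting rule used here only as an illustrative aggregation method; the survey does not say that any particular meta-search engine uses it. The URL normalization rule (ignore scheme, a leading "www." and a trailing slash) is a crude assumption about what counts as a duplicate, and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Crude duplicate key: host without 'www.' plus path without trailing '/'."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def borda_merge(ranked_lists):
    """Merge ranked lists: a hit at position p of an n-item list earns n - p points."""
    scores = {}
    first_seen = {}
    for hits in ranked_lists:
        n = len(hits)
        seen_in_list = set()
        for position, url in enumerate(hits):
            key = normalize(url)
            if key in seen_in_list:
                continue  # count a page once per engine
            seen_in_list.add(key)
            scores[key] = scores.get(key, 0) + (n - position)
            first_seen.setdefault(key, url)
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return [first_seen[k] for k in ordered]


ENGINE_A = ["http://www.example.org/a", "http://example.org/b", "http://example.org/c"]
ENGINE_B = ["https://example.org/b/", "http://example.org/d", "http://example.org/a"]

print(borda_merge([ENGINE_A, ENGINE_B]))
```

Here page b collects 2 + 3 points and page a collects 3 + 1, so b is listed first, and the six input hits collapse to four distinct pages.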
Moreover, meta-search tools have no database of their own, but send the
same enquiry to a variety of search engines. [26]
One can summarize the main limitations of meta-search engines as below:
i. Existing meta-search engines are subject to time-outs when search
processing takes too long; in that case they retrieve only a few of the
required hits from each search engine, and the total number of hits
retrieved may be considerably less than the total found by doing a
direct search on one of the search engines. The mother of meta-search
engines, popularly known as “mamma”, has this problem.
ii. Existing meta-search engines work on a fixed number of search
engines. Some of them use only one or two search engines.
iii. Some of the existing meta-search engines are domain specific.
iv. The optimization technique used by some meta-search engines is the
pay per click strategy.
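The time-out limitation in item i can be illustrated with a small sketch that queries several engines in parallel and keeps only the answers that arrive before a deadline. The engine functions are hypothetical stubs (the slow one simply sleeps), and the thread-pool approach is an assumption chosen for illustration, not a description of how Mamma or any other engine is implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fast_engine(query):
    """Hypothetical engine that answers immediately."""
    return [f"https://fast.example/{query}/1", f"https://fast.example/{query}/2"]


def slow_engine(query):
    """Hypothetical engine that takes too long to answer."""
    time.sleep(0.6)
    return [f"https://slow.example/{query}/1"]


def metasearch_with_timeout(query, engines, timeout_s):
    """Return (hits per engine that answered in time, names that timed out)."""
    pool = ThreadPoolExecutor(max_workers=len(engines))
    futures = {pool.submit(fn, query): name for name, fn in engines.items()}
    done, not_done = wait(futures, timeout=timeout_s)
    hits = {futures[f]: f.result() for f in done}
    timed_out = sorted(futures[f] for f in not_done)
    pool.shutdown(wait=False)  # do not block on the slow engines
    return hits, timed_out


hits, timed_out = metasearch_with_timeout(
    "jaguar", {"fast": fast_engine, "slow": slow_engine}, timeout_s=0.2
)
print(hits)
print(timed_out)
```

The hits of the slow engine are simply missing from the merged result, which is why a meta-search can return fewer hits than a direct search on one engine.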
2.2.12 Need of ranking method in meta-search engine
A meta-search engine has the advantage of being lightweight, since there is
no need for crawling and large-scale indexing. [11]
Meta-search engines often have only limited information about the relevance
of the web pages returned for a search query. In many cases all that the
meta-search engine has to go on is a ranked ordering of the returned
results, and a summary of each of the web pages included in the results.
Despite this, some meta-search engines rely on relevance scores to combine
the results, which means that they need to infer the scores in some way,
while other meta-search engines combine the results based solely on the
ranked results obtained from the search engines queried. [11] Meta-search
engine ranking algorithms may differ in how they generate the aggregate list
of search results: they may require training data to update the position of
a search result based on its rank, and they may also need to learn about the
search engines they are querying. [11]
A meta-search engine which uses relevance scores can store a
representative of each search engine, giving an indication of the contents of
the search engine’s index. The index of representatives could be built as the
meta-engine is queried, so that it is compact and represents user queries
rather than the full set of keywords in the underlying search engine’s index.
The meta-index enables a standard normalization of relevance scores across
all the search engines deployed. In order to get the relevance information
about the web pages returned, the meta-search engine can simply download
these pages before merging the results, but this will, obviously, slow down
the response time for the query. [11] Hence, there is a need for a good
ranking method in a meta-search engine.
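The idea of normalizing relevance scores across engines before merging can be sketched as follows. Min-max scaling followed by summing the scaled scores (a scheme commonly called CombSUM) is used here purely as an illustrative choice and is not claimed to be the method of [11]; the engines, URLs and raw scores are hypothetical.

```python
def min_max(scores):
    """Rescale one engine's raw scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {url: 1.0 for url in scores}
    return {url: (s - lo) / (hi - lo) for url, s in scores.items()}


def comb_sum(scores_by_engine):
    """Sum the normalized scores of each URL over all engines, best first."""
    combined = {}
    for scores in scores_by_engine.values():
        for url, s in min_max(scores).items():
            combined[url] = combined.get(url, 0.0) + s
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical raw scores: engine_a scores on 0..100, engine_b on 0..1.
RAW = {
    "engine_a": {"u1": 100, "u2": 50, "u3": 0},
    "engine_b": {"u2": 0.9, "u4": 0.1},
}

for url, score in comb_sum(RAW):
    print(url, score)
```

Without the normalization step, the 0..100 scale of the first engine would dominate the 0..1 scale of the second, so the merged order would reflect the scales rather than the relevance judgments.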
2.2.13 Ranking methods
2.2.13.1 About ranking
A ranking is a relationship between a set of items such that, for any two
items, the first is either ‘ranked higher than’, ‘ranked lower than’ or ‘ranked
equal to’ the second. The web search engine may rank the pages it finds
according to an estimation of their relevance, making it possible for the user
quickly to select the pages they are likely to want to see. [86]
2.2.13.2 Strategies for assigning rankings
It is not always possible to assign rankings uniquely. For example, in a race
or competition two (or more) entrants might tie for a place in the ranking. When
computing an ordinal measurement, two (or more) of the quantities being ranked
might measure equal. In these cases, one of the strategies shown below for
assigning the rankings may be adopted. [86]
A common shorthand way to distinguish these ranking strategies is by the
ranking numbers that would be produced for four items, with the first item
ranked ahead of the second and third (which compare equal) which are both
ranked ahead of the fourth. [86]
Ranking strategies
Standard competition ranking ("1224" ranking)
In competition ranking, items that compare equal receive the same ranking
number, and then a gap is left in the ranking numbers. The number of
ranking numbers that are left out in this gap is one less than the number of
items that compared equal. Equivalently, each item's ranking number is 1
plus the number of items ranked above it. This ranking strategy is
frequently adopted for competitions, as it means that if two (or more)
competitors tie for a position in the ranking, the position of all those ranked
below them is unaffected (i.e., a competitor only comes second if exactly one
person scores better than them, third if exactly two people score better than
them, fourth if exactly three people score better than them, etc.). [86]
Thus if A ranks ahead of B and C (which compare equal) which are both
ranked ahead of D, then A gets ranking number 1 ("first"), B gets ranking
number 2 ("joint second"), C also gets ranking number 2 ("joint second") and
D gets ranking number 4 ("fourth"). [86]
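The rule "1 plus the number of items ranked above it" can be sketched in a few lines of Python. This is only an illustrative sketch: the function name competition_rank and the sample scores are assumptions made for this example, and a higher score is assumed to rank first.

```python
def competition_rank(scores):
    """Standard competition ("1224") ranking; a higher score ranks first."""
    # Each item's rank is 1 plus the number of items with a strictly
    # better score, so tied items share a rank and a gap follows them.
    return [1 + sum(other > s for other in scores) for s in scores]


# Hypothetical scores for A, B, C and D, where B and C tie.
print(competition_rank([90, 80, 80, 70]))  # [1, 2, 2, 4]
```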
Modified competition ranking ("1334" ranking)
Sometimes, competition ranking is done by leaving the gaps in the ranking
numbers before the sets of equal ranking items (rather than after them as in
standard competition ranking). The number of ranking numbers that are left
out in this gap remains one less than the number of items that compared
equal. Equivalently, each item's ranking number is equal to the number of
items ranked equal to it or above it. This ranking ensures that a competitor
only comes second if they score higher than all but one of their
opponents. [86]
Thus if A ranks ahead of B and C (which compare equal) which are both
ranked ahead of D, then A gets ranking number 1 ("first"), B gets ranking
number 3 ("joint third"), C also gets ranking number 3 ("joint third") and D
gets ranking number 4 ("fourth"). In this case, nobody would get ranking
number 2 ("second") and that would be left as a gap. [86]
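As a minimal sketch, the rule "the number of items ranked equal to it or above it" translates directly into Python. The function name modified_competition_rank and the sample scores are assumptions made for illustration, with a higher score assumed to rank first.

```python
def modified_competition_rank(scores):
    """Modified competition ("1334") ranking; a higher score ranks first."""
    # Each item's rank is the count of items scoring at least as well,
    # including the item itself, so the gap falls before a tied group.
    return [sum(other >= s for other in scores) for s in scores]


# Hypothetical scores for A, B, C and D, where B and C tie.
print(modified_competition_rank([90, 80, 80, 70]))  # [1, 3, 3, 4]
```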
Dense ranking ("1223" ranking)
In dense ranking, items that compare equal receive the same ranking
number, and the next item(s) receive the immediately following ranking
number. Equivalently, each item's ranking number is 1 plus the number of
items ranked above it that are distinct with respect to the ranking order.
[86]
Thus if A ranks ahead of B and C (which compare equal) which are both
ranked ahead of D, then A gets ranking number 1 ("first"), B gets ranking
number 2 ("joint second"), C also gets ranking number 2 ("joint second") and
D gets ranking number 3 ("third"). [86]
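Dense ranking can be sketched by numbering the distinct scores in order. The function name dense_rank and the sample scores below are assumptions for illustration only, and a higher score is assumed to rank first.

```python
def dense_rank(scores):
    """Dense ("1223") ranking; a higher score ranks first."""
    # Number the distinct scores 1, 2, 3, ... from best to worst, then
    # give every item the number of its own score, leaving no gaps.
    distinct = sorted(set(scores), reverse=True)
    position = {score: index + 1 for index, score in enumerate(distinct)}
    return [position[s] for s in scores]


# Hypothetical scores for A, B, C and D, where B and C tie.
print(dense_rank([90, 80, 80, 70]))  # [1, 2, 2, 3]
```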
Ordinal ranking ("1234" ranking)
In ordinal ranking, all items receive distinct ordinal numbers, including
items that compare equal. The assignment of distinct ordinal numbers to
items that compare equal can be done at random, or arbitrarily, but it is
generally preferable to use a system that is arbitrary but consistent, as this
gives stable results if the ranking is done multiple times. An example of an
arbitrary but consistent system would be to incorporate other attributes into
the ranking order (such as alphabetical ordering of the competitor's name)
to ensure that no two items exactly match. [86]
With this strategy, if A ranks ahead of B and C (which compare equal) which
are both ranked ahead of D, then A gets ranking number 1 ("first") and D
gets ranking number 4 ("fourth"), and either B gets ranking number 2
("second") and C gets ranking number 3 ("third") or C gets ranking number 2
("second") and B gets ranking number 3 ("third"). [86]
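An "arbitrary but consistent" tie-break can be sketched by sorting on the score first and on the competitor's name second. The function name ordinal_rank, the (name, score) input layout and the sample scores are assumptions made for this example.

```python
def ordinal_rank(entries):
    """Ordinal ("1234") ranking for a list of (name, score) pairs.

    A higher score ranks first; ties are broken alphabetically by name,
    so repeated runs always produce the same ranking.
    """
    ordered = sorted(entries, key=lambda entry: (-entry[1], entry[0]))
    return {name: index + 1 for index, (name, _score) in enumerate(ordered)}


# Hypothetical scores, where B and C tie; C is listed before B on purpose.
ranks = ordinal_rank([("A", 90), ("C", 80), ("B", 80), ("D", 70)])
print(ranks)  # {'A': 1, 'B': 2, 'C': 3, 'D': 4}
```

With the alphabetical tie-break, B always receives ranking number 2 and C ranking number 3, regardless of the order in which they are supplied.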
Fractional ranking ("1 2.5 2.5 4" ranking)
Items that compare equal receive the same ranking number, which is
the mean of the numbers they would have under ordinal ranking. Equivalently,
an item's ranking number is 1 plus the number of items ranked above it plus
half the number of other items equal to it. This strategy has the property
that the sum of the ranking numbers is the same as under ordinal ranking. [86]
Thus if A ranks ahead of B and C (which compare equal) which are both
ranked ahead of D, then A gets ranking number 1 ("first"), B and C each get
ranking number 2.5 (average of "joint second/third") and D gets ranking
number 4 ("fourth"). [86]
For example, suppose the data set is 1 1 2 3 3 4 5 5 5. It contains five
distinct values, so there are five different ranks. If the two 1's were
actually different numbers, they would occupy ranks 1 and 2. Since they are
the same number, their rank is the average of those positions:
(1 + 2) / 2 = 1.5. The next number in the data set, 2, is thus assigned the
rank of 3 (positions 1 and 2 are taken up by the two 1's). The two 3's in
the set would occupy ranks 4 and 5 if they were different numbers, so their
average rank is (4 + 5) / 2 = 4.5. The 4 then gets the rank of 6 (because
positions 4 and 5 are taken up by the two 3's). There are three 5's in the
data set, which would occupy ranks 7, 8 and 9. Their average rank is
(7 + 8 + 9) / 3 = 8. [86]
Resultant ranks would be: 1.5 1.5 3 4.5 4.5 6 8 8 8 [86]
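The worked example can be checked with a short Python sketch. The function name fractional_rank is an assumption made for illustration; as in the data set above, smaller values are assumed to take the earlier ranks.

```python
def fractional_rank(values):
    """Fractional ranking; smaller values take the earlier ranks."""
    # Record the ordinal positions (1, 2, 3, ...) each value would occupy
    # in sorted order, then give every value the mean of its positions.
    positions = {}
    for index, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(index)
    mean_position = {v: sum(p) / len(p) for v, p in positions.items()}
    return [mean_position[v] for v in values]


ranks = fractional_rank([1, 1, 2, 3, 3, 4, 5, 5, 5])
print(ranks)  # [1.5, 1.5, 3.0, 4.5, 4.5, 6.0, 8.0, 8.0, 8.0]
print(sum(ranks))  # 45.0, the same as 1 + 2 + ... + 9 under ordinal ranking
```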
2.2.13.3 Ranking methods in search engines
Search engine ranking methods are closely guarded secrets, for at least two
reasons: search engine companies want to protect their methods from their
competitors, and they also want to make it difficult for web site owners to
manipulate their rankings.
A specific page's relevance ranking for a specific query currently depends on
three factors:
i. Its relevance to the words and concepts in the query.
ii. Its overall link popularity.
iii. Whether or not it is being penalized for excessive search engine
optimization (SEO).
2.2.13.4 Ranking methods in meta-search engines
Meta-search engines are tools that receive user queries and dispatch them
to multiple search engines (also called the component engines of the
meta-search engine). The meta-search engine then collects the returned
results, reorders them and displays the ranked result list to the user. The
ranking methods that meta-search engines utilize are based on a variety of
parameters, such as the ranking a result receives and the number of its
appearances in the component engines' result lists. These parameters are
used to compute a rank (also called a score) for each received result.
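One simple way to combine these two parameters is a Borda-count style score, sketched below. This is a well-known generic technique shown only for illustration, not the method of any particular meta-search engine discussed in this chapter. The function name merge_results, the depth parameter and the example URLs are all assumptions.

```python
from collections import defaultdict


def merge_results(result_lists, depth=10):
    """Merge ranked URL lists returned by several component engines.

    A URL earns (depth - position) points in every list it appears in,
    so both a high rank and repeated appearances raise its score.
    """
    scores = defaultdict(int)
    for urls in result_lists:
        for position, url in enumerate(urls[:depth]):
            scores[url] += depth - position
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical result lists from three component engines.
engine_a = ["a.example", "b.example", "c.example"]
engine_b = ["b.example", "d.example", "a.example"]
engine_c = ["b.example", "c.example"]
print(merge_results([engine_a, engine_b, engine_c], depth=3))
```

In this example b.example is placed first because it appears in all three lists and leads two of them, while d.example, which appears only once, is placed last.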
Better organization of results can be achieved by employing ranking
methods that take into consideration additional information about a web
page. Another core step is to implicitly collect some data about the user
who submits the query. This helps the engine decide which results better
suit his / her informational needs.
However, none of these studies proposes a ranking method that is suitable
for meta-search engines. The existing methods assign scores according to
objective criteria, such as the rank a result receives from the component
engines. None of them can accept input from different users (subjective
data) and produce different results accordingly.
In other words, the current methods lack a ranking method that offers a
competitive advantage to a URL's position on the result page and outputs
similar results for similar queries submitted by different users.
2.2.13.5 Integrating site into search engines using ranking method
The following steps need to be followed to integrate a site into a
search engine [27]:
i. Choosing the right keywords that are going to bring the most hits to the
web site.
ii. Using the right title tags on the web site.
iii. Ensuring appropriate content writing on the web site.
iv. Choosing the right search engines to submit the web site to and
understanding the free and paid listing service options available.
The base case is that spiders crawl the entire Web, starting from known
pages and following all links, and also crawl pages that are hand-submitted,
as Google does. If a site has high PageRank, it is spidered more often
and more deeply.
However, search engines are trying to encourage site owners to pay for the
privilege of having their pages spidered. Teoma's index is very hard to get
into without paying money, and Inktomi's is not easy either. Even if site
owners do get into Inktomi for free, the site takes a long time to be
respidered, whereas sites that pay are respidered constantly.
An advantage of being respidered often is that site owners can tweak their
pages and page contents to come up higher in the relevancy rankings.
Site owners can also pay to appear on a search page. That is, the owner's
link will appear when someone searches on a specific keyword or keyphrase.
Google does a good job of making it pretty clear which results at the top or
on the right of the page are paid.
Paid search results are typically all pay-per-click, based on keyword. The
advertiser pays the search engine vendor a specific amount of money each
time a link is clicked.
Use of meta tag
Meta tags are a key part of the overall search engine optimization program
that needs to be implemented for a web site. Meta tags have never guaranteed
top rankings on crawler-based search engines, but they may offer a degree of
control and the ability to impact how web pages are indexed within search
engines. [27]
Meta tags give search engines more information about a web page. This is
implicit information, which means it is not visible to visitors of the web page
itself. [87]
Meta tags belong in the <head> element of a web page, because some
browsers may not recognize meta tags that are placed in the <body>
part. [87]
Often, meta tags contain a name attribute, which sets the type of metadata.
The value of this metadata is expressed through a content attribute. The
meta description tag is the most useful tag; as the name suggests, it gives
search engines a short description of the web page. [87] An example is
given below:
<meta name="description" content="about search engine optimization"/>
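To illustrate how a crawler might read this tag, the following minimal sketch uses Python's standard html.parser module. The class name MetaDescriptionParser and the sample page are assumptions made for this example; real search engine crawlers are far more elaborate.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Collects the content of the meta description tag, if present."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")


# Hypothetical page containing the example tag in its <head> element.
page = (
    "<html><head>"
    '<meta name="description" content="about search engine optimization"/>'
    "</head><body><p>Visible text</p></body></html>"
)
parser = MetaDescriptionParser()
parser.feed(page)
print(parser.description)  # about search engine optimization
```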
2.2.14 Meta-Search Engine perspective
As is known, a meta-search engine presents results combined from multiple
search engines, and in doing so provides better performance than any
individual search engine. The advantage of meta-search engines is that the
results can be presented using different ranking formulas and their
attributes. This can be more specific than the output of an individual
search engine. Therefore retrieval of the results should be simpler. In most
cases, the search result does not necessarily include all the web pages
matching the user's search query, as the number of results per search engine
retrieved by the meta-search engine is limited. Pages returned by more than
one search engine need to be aggregated by the meta-search engine.
It is observed that the volume of information on the web that search engines
must cover for a user's search text is vast. Using a meta-search engine to
draw on the large database contents of several search engines is therefore
very important on the web. It is known that major search engines each cover
only a relatively small portion of the entire web.
A meta-search engine uses different search engines as its sources, and has
to provide better, more specific search results. To achieve higher quality
through the combination process, the input modules must not just retrieve
different forms of information; they should provide different relevant
information, using rank. Different retrieval algorithms often retrieve much
of the same relevant information. A good combination technique is therefore
required to get reliable search results.
Reliable behaviour is considered to be another important and desirable
quality of a meta-search engine. It has been shown that the same search
engine often returns different results for the same input search text over
time, which may be due to the evolution of the database and changes in the
ranking algorithm. With regard to databases, it is observed that each search
engine has its strengths and weaknesses, performing well on some input
search text and inadequately on others. If a meta-search engine has its own
database, this problem can be minimized.
A meta-search engine is a solution that logically incorporates the
information from all of the search engines in such a way that it takes
advantage of each.
2.3 SCOPE OF RESEARCH
There is great scope for research in designing a new model of Meta-Search
Engine that improves the efficiency and effectiveness of results through
optimization techniques, using the following strategies:
i. Change in ranking formulas
ii. Use of databases for indexing purpose
iii. More normalization of databases
iv. New strategy to improve response time
v. Proper design of page to increase load speed
2.4 SUMMARY
This chapter presents the history of web surfing. It also presents the
initiatives of search engine development, the working of search engines,
optimization techniques used by search engines, limitations of search
engines and the difference between search engines and meta-search engines.
Moreover, it provides a list of existing meta-search engines with their
functionalities, differences and limitations. It also gives an overview of
different ranking methods and presents the scope of research in this area.