
Information Retrieval

Lesson 6
What Is a Search Engine?

As the term is generally used, a search engine has two parts:
1. A "robot" or "crawler" that goes to every
page or representative pages on the
Web and creates a huge index
2. A program that receives your search
request, compares it to the entries in the
index, and returns results to you
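
The two parts can be sketched in a few lines of Python. This is only a toy illustration, not how any real engine is built: the in-memory TOY_WEB dictionary, its example URLs, and the function names crawl and search are all assumptions made for the example, and no real network access takes place.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example/home": ("search engines index the web",
                       ["a.example/about", "b.example/"]),
    "a.example/about": ("a crawler visits pages", ["a.example/home"]),
    "b.example/": ("the index maps words to pages", []),
}


def crawl(start_url, web):
    """Part 1: visit every page reachable from start_url and build an index."""
    index = {}                 # word -> set of URLs that contain it
    seen = {start_url}         # URLs already queued, to avoid revisiting
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, word):
    """Part 2: compare a request with the index entries and return results."""
    return sorted(index.get(word.lower(), set()))


index = crawl("a.example/home", TOY_WEB)
print(search(index, "index"))
```

The crawler follows links breadth-first from a single starting page, so a page that nothing links to would never be indexed in this sketch.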
Basic Tasks of a Search Engine
Internet search engines are special sites on the Web that are
designed to help people find information stored on other sites.
There are differences in the ways various search engines work,
but they all perform three basic tasks:
1. They search the Internet -- or select pieces of the Internet -- based
on important words.
2. They keep an index of the words they find, and where they find them.
3. They allow users to look for words or combinations of words found in
that index.
Early search engines held an index of a few hundred thousand
pages and documents, and received perhaps one or two thousand
inquiries each day. Today, a top search engine will index hundreds of
millions of pages and respond to tens of millions of queries per day.
In this article, we'll tell you how these major tasks are performed, and
how Internet search engines put the pieces together to let
you find the information you need on the Web.
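
The second and third tasks (keeping an index of words with their locations, and looking up combinations of words) can be illustrated with a small inverted index. This is a minimal sketch under stated assumptions: the PAGES data and the names build_index and find_all are invented for the example, and real engines use far more elaborate structures.

```python
from collections import defaultdict

# Hypothetical documents standing in for pages found on the Internet.
PAGES = {
    "page1": "web search engines keep an index",
    "page2": "an index of words",
    "page3": "search the web for words",
}


def build_index(pages):
    """Task 2: record each word and where it was found (page, position)."""
    index = defaultdict(list)
    for page_id, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((page_id, position))
    return index


def find_all(index, query):
    """Task 3: return the pages that contain every word in the query."""
    page_sets = [
        {page for page, _ in index.get(word, [])}
        for word in query.lower().split()
    ]
    if not page_sets:
        return []
    return sorted(set.intersection(*page_sets))


index = build_index(PAGES)
print(index["index"])                 # where the word "index" was found
print(find_all(index, "search web"))  # pages containing both words
```

Storing the position alongside the page is what would later allow phrase or proximity queries, although this sketch only checks that all query words appear somewhere on the page.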
The Major Search Engines and How They Work

Most if not all of the major search engines attempt to do
something close to indexing the entire content of the World
Wide Web.
Once a site's pages have been indexed, the search engine
will return periodically to the site to update the index.
Some search engines give special weighting to words in
the title, to subject descriptions and keywords listed in
HTML META tags, to the first words on a page, and to the
frequent recurrence (up to a limit) of a word on a page.
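
As a rough illustration of this kind of weighting, the sketch below scores one page for one query word. Every number in it (the weights, the count of "first words", and the frequency cap) is an arbitrary assumption chosen for the example, as are the page layout and the function name score; actual engines treat their schemes as proprietary.

```python
def score(page, word, title_weight=3.0, meta_weight=2.0, lead_weight=1.5,
          lead_words=10, frequency_cap=5):
    """Score one page for one query word. All weights are illustrative."""
    word = word.lower()
    body = page["body"].lower().split()
    total = 0.0
    if word in page["title"].lower().split():
        total += title_weight               # word appears in the title
    if word in (k.lower() for k in page["keywords"]):
        total += meta_weight                # word listed in META keywords
    if word in body[:lead_words]:
        total += lead_weight                # word among the first words
    # Repetition helps, but only up to the cap.
    total += min(body.count(word), frequency_cap)
    return total


# Hypothetical page: "search" is repeated 20 times in the body.
page = {
    "title": "Search Engine Basics",
    "keywords": ["search", "index"],
    "body": "search " * 20 + "engines build an index",
}
print(score(page, "search"))
print(score(page, "index"))
```

The cap means that the 20 repetitions of "search" count no more than 5 would, which mirrors the "up to a limit" qualification above.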
Because each of the search engines uses a somewhat
different indexing and retrieval scheme (which is likely to
be treated as proprietary information) and because each
search engine can change its scheme at any time, we
haven't tried to describe these here.
Major Search engines
AltaVista
Sponsored by Digital Equipment Corp.,
processes more than 2.5 million search
requests every day. It has cataloged more
than 15 billion words on some 30 million
Web pages as well as all 13,000 Usenet
newsgroups. It collects Web pages at the
rate of 2.5 million a day.
Find AltaVista at
http://www.altavista.digital.com
Major Search engines cont.
Excite
Has a database of 1.5 million Web pages
that you can search by keyword or by
concept.
In addition, it has a browsable directory of
more than 50,000 reviewed Web sites, a
Usenet database of more than 1 million
articles, and a search of the Usenet
classifieds from the last 2 weeks.
Find Excite at http://www.excite.com
Major Search engines cont.
HotBot
Features a menu-driven search
engine. You can search by file type,
date, geographic location and domain,
and Web site.
Find HotBot at http://www.hotbot.com
Major Search engines cont.
InfoSeek
Is a full-text search system with which you
can look for Web pages, Usenet newsgroups,
and FAQs. A normal, free search is limited to
the first 100 matches.
If you subscribe to InfoSeek Professional,
you can search computer, medical, and
business news, press releases, and
technical-support databases.
Find InfoSeek at http://www2.infoseek.com
Major Search engines cont.
Lycos
Is used by more than 500,000 people
every week and catalogs some 20
million Web pages, FTP sites, and
Gopher sites.
Find Lycos at http://www.lycos.com
Major Search engines
Open Text Index
Is a very powerful, multilingual search
engine with which you can do a
weighted search and receive
information that is ranked by
relevancy.
Find Open Text at
http://www.opentext.com:8080
Major Search engines cont.
WebCrawler
Is a free service from America Online
that gives you fast access to a
200-megabyte database of 2 million
indexed Web documents.
Find WebCrawler at
http://webcrawler.com
Major Search engines cont.
Yahoo
Lists more than 200,000 Web sites in
more than 20,000 categories. A utility
at this site lets you extend your
search to other search engines, such
as AltaVista, Lycos, or WebCrawler.
Find Yahoo at
http://www.yahoo.com
Major Search engines cont.
Yehey
Is the first Filipino search engine.
Major Search engines cont.
Google
"Googol" is the mathematical term for a 1
followed by 100 zeros.
The term was coined by Milton Sirotta,
nephew of American mathematician Edward
Kasner, and was popularized in the book,
"Mathematics and the Imagination" by Kasner
and James Newman.
Google's play on the term reflects the
company's mission to organize the immense
amount of information available on the web.
