
Koneru Lakshmaiah College Of Engineering

(Autonomous)
Freshman Engineering Department
Green Fields, Vaddeswaram, Guntur-522502
ANDHRA PRADESH, INDIA

MINI PROJECT

INTERNET SEARCHING
BY

Mr. PAWAN RAJ PHUYAL (Y8IT284)

Mr. SANTOSH BHANDARI (Y8IT297)

Mr. Md. SHAHBAZ HAIDER (Y8CE240)

I/IV B. Tech.: Information Science & Technology and Civil Engineering

Lecturer In-charge(s)

Ms. Anila & Ms. Tripura


Koneru Lakshmaiah College Of Engineering

CERTIFICATE
This is to certify that the I/IV B. Tech. students Mr. PAWAN
RAJ PHUYAL (Y8IT284), Mr. SANTOSH BHANDARI (Y8IT297)
and Mr. Md. SHAHBAZ HAIDER (Y8CE240) have done a mini project in
the field of Internet Searching in the year 2009-06.

Head of the Department Lecturer In-charge


Acknowledgement
This is to express our heartfelt gratitude to all our supporters,
guides and friends who inspired us to prepare and collect the
data and helped us to complete this project successfully despite
many obstacles.
Our sincere thanks go to the senior lecturers and our guides,
Ms. Anila and Ms. Tripura, who encouraged us by providing
valuable ideas.
Finally, we want to give special thanks to one and all who,
directly or indirectly, co-operated in the successful completion of
this mini project.

Mr. PAWAN RAJ PHUYAL (Y8IT284)


Mr. SANTOSH BHANDARI (Y8IT297)
Mr. Md. SHAHBAZ HAIDER (Y8CE240)
1/4 B. Tech. Information Science & Technology
1/4 B. Tech. Civil Engineering
CONTENTS
1: Introduction
   • Internet
   • What is WWW

2: History of the Internet
3: Uses of the Internet
   • Reasons why people use the Internet
   • Why people put things on the Web

4: How the Web Works

5: Searching the Internet
   • Finding things on the Web
   • Search tips
   • Search engines
   • Search engine directories
   • Some of the most popular search engines
   • Limitations of search engines

6: Steps of Searching
7: Browsing the Internet
   • Web browsers
   • Useful web browsers

8: Demerits of Internet Searching
   • Theft of personal information
   • Spamming
   • Virus threat
   • Pornography

9: Conclusion
Chapter 1
Introduction
The Internet is a global system of interconnected computer networks that
use the standardized Internet Protocol Suite (TCP/IP). It is a network of networks
that consists of millions of private and public, academic, business, and government
networks of local to global scope that are linked by copper wires, fiber-optic
cables, wireless connections, and other technologies.

The Internet carries a vast array of information resources and services, most
notably, the inter-linked hypertext documents of the World Wide Web (WWW)
and the infrastructure to support electronic mail, in addition to popular services
such as online chat, file transfer and file sharing, online gaming, and Voice over
Internet Protocol (VoIP) person-to-person communication via voice and video.

The origins of the Internet reach back to the 1960s when the United States funded
research projects of its military agencies to build robust, fault-tolerant and
distributed computer networks. This research spawned world-wide participation in
the development of new networking technologies and led to the commercialization
of an international network and the popularization of countless applications in
virtually every aspect of modern human life. By 2009, an estimated quarter of
Earth's population used the services of the Internet.

There are several different ways to look at what the Internet actually is:

At the highest level, the Internet is the people who use it - the global community
of users.
At another level, the Internet is a set of protocols that define the rules for
how computers transfer information to one another.

At the lowest level, it is the hardware behind the computer networks - the
computers, modems, phone lines and cables that link together to form a huge
network.

Who Controls the Internet?

The Internet is a kind of anarchy. Everyone looks after their own little
Internet 'patch', but no one is responsible for looking after it as a whole. It would
be nearly impossible to control the Internet now - and trying to would certainly
destroy it. However, the data available on the Internet are updated, refreshed and
released by the sites and organizations responsible for them.

What is the World Wide Web?

The official definition of the WWW is "wide-area hypermedia information
retrieval initiative aiming to give universal access to a large universe of
documents."

wide-area: The World Wide Web spans the whole globe.

hypermedia: It contains various types of media (text, pictures, sound, movies ...)
and hyperlinks that connect pages to one another.
Chapter 2
History of the Internet
The Internet was born about 20 years ago as a U.S. Defense Department
network called ARPANET. On the Internet you can exchange information and
comments with millions of people all over the world and get a fast answer to
almost any question imaginable on a scientific, computing, technical, business,
investment, or any other subject. You could join over 11,000 electronic
conferences, at any time, on any subject, and broadcast your views, questions, and
information to millions of others. There has never been anything like it in the
history of the world. Growing at a rate of about 20% per month, the Internet is
only getting bigger, and if people don't start using its resources they could become
roadkill on this Information Superhighway. I'll bet that in the middle of that last
sentence another computer just came online to the Net.

There are three major features of the Internet: online discussion groups, universal
electronic mail, and files and software. There are about 11,000 online discussion
groups, called newsgroups, on almost any topic you can imagine, and if you are on
the Net you can participate in any of them. The second feature is universal
electronic mail, or e-mail. E-mail is the biggest and cheapest system on the Net and
is also one of its biggest attractions. Since all commercial online services have
gateways for sending and receiving e-mail messages on the Internet, you can send
and receive messages or files to and from anyone else who is online, anywhere in
the world, in seconds. The third feature, files and software, is in my opinion the
most impressive. The thousands of individual computer facilities connected to the
Internet are also vast storage repositories for hundreds of thousands of software
programs, informational text files, video and sound clips, and other computer-based
resources, and they are all accessible in minutes from any personal computer
connected to the Internet. So, given that I can do all this on the Internet, why
should I take notice? Because of its sheer size, its volume of messages, and its
incredible monthly growth. From the latest statistics I was able to get, there are
currently 30 million people who use the Internet worldwide.

Before the widespread internetworking that led to the Internet, most
communication networks were limited by their nature to allowing communication
only between the stations on the local network, and the prevalent computer
networking method was based on the central mainframe computer model. Several
research programs began to explore and articulate principles of networking
between physically separate networks, leading to the development of the packet
switching model of digital networking. These research efforts included those of the
laboratories of Donald Davies (NPL), Paul Baran (RAND Corporation), and
Leonard Kleinrock at MIT and at UCLA. The research led to the development of
several packet-switched networking solutions in the late 1960s and 1970s,[1]
including ARPANET and the X.25 protocols. Additionally, public access and
hobbyist networking systems grew in popularity, including Unix-to-Unix Copy
(UUCP) and FidoNet. However, these were still disjointed, separate networks,
served only by limited gateways between networks. This led to the application of
packet switching to develop a protocol for internetworking, in which multiple
different networks could be joined together into a super-framework of networks.
By defining a simple common network system, the Internet Protocol Suite, the
concept of the network could be separated from its physical implementation. This
spread of internetworking began to form into the idea of a global network that
would be called the Internet, based on standardized protocols officially
implemented in 1982. Adoption and interconnection occurred quickly across the
advanced telecommunication networks of the western world, and then began to
penetrate into the rest of the world as it became the de facto international standard
for the global network. However, the disparity of growth between advanced
nations and third-world countries led to a digital divide that is still a concern
today.

Following the commercialization and introduction of privately run Internet service
providers in the 1980s, and the Internet's expansion for popular use in the 1990s,
the Internet has had a drastic impact on culture and commerce.
Chapter 3
Uses of the Internet
The Internet is a computer-based global information system. It is
composed of many interconnected computer networks, and each network may
link thousands of computers, enabling them to share information. The
Internet has transformed many aspects of life and is one of the
biggest contributors to making the world a global village. Use of the
Internet has grown tremendously since it was introduced, mostly
because of its flexibility. Nowadays one can access the Internet easily.
Many people have computers in their homes, and those who don't
can always go to cyber cafes where this service is provided.

The Internet developed from a network called ARPANET, which the U.S.
military had developed. It was initially restricted to military personnel and the
people who developed it. Only after it was privatized was it opened to
commercial use.

The Internet has developed to give many benefits to mankind, access
to information being one of the most important. Students can now have
access to libraries around the world. Some charge a fee, but most provide
free services. Previously, students had to spend hours and hours in libraries,
but now, at the touch of a button, they have a huge database in front of
them.

3.1 Reasons Why People Use the Internet


It’s the first month of a new year and at this time I’m itching to start new web
ventures both for fun and profit. I usually do up a list of possible startup and site
ideas and narrow them down into those with the highest potential. But success
depends on execution and not just plans so I tend not to be too hung up about
having a complete vision of what I want.

A little vagueness won’t hurt. I can always muddle through and change things up
in response to market conditions or personal interest. No need to be perfect from
the start.
I looked at many websites to study their methods, to learn what made them a success. I started
planning what specific niche I wanted to explore and suddenly realized that I was
thinking about the whole thing in a roundabout way.

There’s really no need to think hard about having the perfect idea. The foundations
of popular and profitable websites/services are deeply related to the basic reasons
why people get online and use the internet. Let’s do some reverse engineering
from that perspective.

So, why do people worldwide use the internet?

To communicate and socialize

This is very much a fundamental human need. People like to meet and talk
to other people through the internet. They use it to maintain new or existing
relationships. They want to communicate ideas and find solidarity with
others who share similar interests. So do something which facilitates
communication. Hyper-local or cross-border communities, social networks,
virtual worlds, apps or services built on existing communication/social
protocols and services. Bring human social activities onto the internet grid.
Socialize existing web functions and put the emphasis on connecting people.

To find information, learn new things and be entertained

The internet is a massive archive of new and old information. It is also a
source of pleasure, giving immediate gratification in the form of images,
sound and interactivity. As an educational tool, the web is essential for
people who are seeking to learn.

People want to find things online. So help them. Create a system which
provides information or filters existing content. Monetize the flow of data.
Blogs, training courses, social news, aggregated news, paid membership
sites, online journals, one-stop entertainment portals, video, image and game
hubs with a specific focus.

To do work, generate income and run a business

People use the internet to make a living. It is essential to many businesses
that want to increase brand exposure or sell a product/service. They also use
the web to help them work better. There is a market of webmasters,
entrepreneurs and small/big businesses out there who are willing to pay to
boost their revenue. Consultancies, design firms, freelancers, enterprise
software, business-specific tools/apps and services. Think of ways to help
people work smarter and more efficiently online.

To find general information about a subject


The Web is like a huge encyclopedia of information - in some ways it's even
better. The volume of information you'll find on the Web is amazing. For
every topic that you've ever wondered about, there's bound to be someone
who's written a Web page about it. The Web offers many different
perspectives on a single topic. For example, here is a selection of pages
about Genetic Engineering:
In fact you can even find online encyclopedias. Many of these are now
offering a subscription service which lets you search through the complete
text of the encyclopedia. There are also many free encyclopedias that may
give you a cut-down version of what you would find in a complete
encyclopedia.
To access information not easily available elsewhere
One of the great things about the Web is that it puts information into your
hands that you might otherwise have to pay for or find out by less
convenient means.

To correspond with faraway friends


Email offers a cheap and easy alternative to traditional methods of
correspondence. It's faster and easier than writing snail mail and cheaper
than using the telephone. Of course, there are disadvantages too. It's not as
personal as a handwritten letter - and not as reliable either. If you spell the
name of the street wrong in a conventional address, it's not too difficult for
the post office to work out what you mean. However if you spell anything
wrong in an email address, your mail won't be delivered (it might be sent
back to you, or you might never realize it went missing).
To meet people
The Web is generally a very friendly place. Many people enjoy getting email from
strangers, and friendships are quick to form from casual correspondence.
The "impersonal" aspect of email tends to encourage people to reveal
surprisingly personal things about themselves. When you know you will
never have to meet someone face-to-face, you may find it easier to tell them
your darkest secrets. Cyber-friendships have often developed into real life
ones too. Many people have even found love on the Net, and have gone on
to marry their cyber-partner.
Did you think you were alone in your obsession with a singer, TV
programme or author?
To have fun
There's no doubt that the Internet is a fun place to be. There's plenty to keep
you occupied on a rainy day.
To learn
Online distance education courses can give you an opportunity to gain a
qualification over the Internet.
To read the news
To find software
The Internet contains a wealth of useful downloadable shareware. Some
pieces of shareware are limited versions of the full piece of software; others
are time-limited trials (you should pay once the time limit is up). Other
shareware is free for educational institutions, or for non-commercial purposes.
To buy things
The security of online shopping is still questionable, but as long as you are
dealing with a reputable company or Web site the risks are minimal.

3.2 Why do people put things on the Web?


To advertise a product
Most company Web sites start up as a big advertisement for their products
and services. It may be hard to see why anyone would willingly visit a 10-page
ad - but these advertisements are very useful to anyone genuinely
interested in finding out about their products. Companies may also give
away some information for free as an incentive for people to visit their
pages.
To sell a product
Internet shopping (e-commerce) is still in its infancy - it takes a very good
marketing strategy to actually make money out of selling items over the
Web, but that doesn't stop lots of people from trying.
To make money
A popular way to make money out of the Web is from advertising revenue.
Popular sites have banners at the top of the page enticing people to click
them and be taken to the advertiser's Web site. These banners are generally
animated and very appealing, with mysterious messages designed to make users click on them.
Chapter 4
How the Web Works
Web documents can be linked together because they are created in a
format known as hypertext. Hypertext systems provide an easy way to manage
large collections of data, which can include text files, pictures, movies, and more. In
a hypertext system, when you view a document on your screen, you can also
access all the data that might be linked to it. So, if the document is a discussion of
honey bees, you might be able to click a hypertext link and see photos of a
beehive, or a movie of bees gathering pollen from flowers.

To support hypertext documents, the Web uses a special protocol called the
Hypertext Transfer Protocol, or HTTP. A hypertext document is a specially encoded
file that uses the Hypertext Markup Language, or HTML. This language allows
the document's author to embed hypertext links (also called hyperlinks, or just
links) in the Web document. HTTP and hypertext links are the foundation of the
World Wide Web.
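To make the idea of an embedded link concrete, here is a minimal sketch, written in Python with only the standard-library `html.parser` module, of how a program can find the hyperlinks an author has placed in an HTML page. In HTML a link is written as an `<a href="...">` tag; the honey-bee page and its URLs below are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href target of every <a> (anchor) tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; hyperlinks live in <a href="...">
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# A tiny hypothetical Web page about honey bees with two hyperlinks.
page = """
<html><body>
  <p>Honey bees live in colonies.</p>
  <a href="beehive-photos.html">See photos of a beehive</a>
  <a href="http://www.example.com/bees/pollen.mpg">Watch bees gathering pollen</a>
</body></html>
"""

parser = LinkExtractor()
parser.feed(page)
print(parser.links)
```

The first link is relative (a page on the same server), while the second points to a different computer, which mirrors the two kinds of jump the text describes.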

As you read a hypertext document (more commonly called a Web page) on
screen, you can click a word or picture encoded as a hypertext link and
immediately jump to another location within the same page or to a different Web
page. The second page may be located on the same computer as the original page or
anywhere else on the Internet. Because you do not have to learn separate commands
and addresses to jump to a new location, the World Wide Web organizes widely
scattered resources into a seamless whole.

A collection of related Web pages is called a Web site. Web sites are housed on
Web servers, Internet host computers that often store thousands of individual pages.
Copying a page to a server is called publishing the page, but the process is also
called posting or uploading.

Web pages are used to distribute news, interactive educational services,
product information, catalogs, highway traffic reports, live audio and video,
and other kinds of information. Web pages permit readers to consult databases,
order products and information, and submit payment with a credit card or an
account number.
The Hypertext Transfer Protocol uses Internet addresses in a specific format, called
a Uniform Resource Locator, or URL, which has the form type://address/path. The
type specifies the type of server on which the file is located, address is the address
of the server, and path is the location within the file structure of the server. The
path includes the list of folders where the desired file is located. One example is the
URL of a page at the Library of Congress Web site that contains information about
the library's collection of permanent exhibits.
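The three parts of a URL described above (the type, the server address and the path) can be pulled apart with Python's standard `urllib.parse.urlsplit`. The URL in this sketch is a hypothetical example, not the actual Library of Congress address; `urlsplit` calls the three parts `scheme`, `netloc` and `path`.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = "http://www.example.com/exhibits/permanent/index.html"

parts = urlsplit(url)
url_type = parts.scheme   # "type": the kind of server/protocol, e.g. http
address = parts.netloc    # "address": the Internet address of the server
path = parts.path         # "path": folders and file name on that server

# The folders that lead to the desired file (everything except the file name).
folders = path.split("/")[1:-1]

print(url_type, address, path)
print(folders)
```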

When we place the cursor in the browser's search box, where the cursor blinks,
and start to type, it starts to look for things related to our input throughout
the Web. It will find what we want only if our input is appropriate and
correctly spelled.
Chapter 5
Searching the Internet
5.1 Finding Things on the Web
The Web is a very big and very disorganized place. Just about any information
you would ever want to know (and a whole lot more that you wouldn't) exists on
the Web somewhere. But finding it is another story.

The reason for this is that it was never designed as a global information retrieval
system, hence there is no central place monitoring where or how information is
stored. The added complication of hypertext makes it very easy to lose your focus
and get lost.

Search Tools

Trailblazer Pages

These are lists of links to other sites related to a particular subject. The most useful
trailblazer pages have links divided into categories and descriptions of why each
site is useful.

Trailblazer pages are often constructed by enthusiastic amateurs. Some librarians
are creating trailblazer pages to help people find information, e.g. the Children's
Literature Page at the University of Calgary.

Trailblazer pages can be very useful in your Web searching. You will often find
links to pages that don't show up in search engines or directories. However, it can
be frustrating to jump from one trailblazer page to another without finding any
pages with actual content!

Portal Sites

These are sites that aim to be an Internet 'one-stop-shop', either to the whole Internet, or
for one particular broad subject (e.g. Education). As well as link directories and
search engines they might offer a range of other services such as discussion
forums, online shopping malls and news reports. They can be quite useful,
especially for new users to get orientated to what kinds of things the Internet can
offer them. No portal can cover the entire Internet though, so eventually you might
find their range of subjects limiting and prefer to go on a wider hunt for the
information you require.

5.2 Search Tips


Understand the search engine you are using. Read the 'search tips' or 'help'
for the search engine - for instance AltaVista's Help.
Use a variety of key words, use synonyms.
Search engines often match the first word first so put the most important
word or the broadest category at the beginning.
Use quotation marks to search for a phrase. Searching for rock and roll will
return documents with any of the words rock, and, or roll. Searching for
"rock and roll" will only return documents with the whole phrase (read the
search engine 'help' to see if it supports phrases).
Try different arrangements of key words e.g. if you are looking for
indigenous women poets try "women writers" AND indigenous. This is a
combination of a phrase and a single word. Remember though, that in North
America they use native rather than indigenous - think of how your key
words will be written on the document.
Some search engines are case sensitive. Most search engines will match both
upper and lower case if lower case letters are entered, but only upper case if
upper case letters are entered. For instance "margaret mahy" will match both
"margaret mahy" and "Margaret Mahy", but "Margaret Mahy" will only
match "Margaret Mahy".
Most search engines will match part or whole of a word - e.g. sing will
retrieve singer, single, singe etc.
Think of common misspellings - take into account American spellings e.g.
theater, center.
Most search engines will search for documents with any of the words you
enter - e.g. a search for Christmas carols will find documents with just the
word Christmas, as well as documents with just the word carols.
Documents with both of the words will appear earlier on in the results. You
can use operators to restrict your search further (check the 'help' to find out
which operators the search engine uses).
Results are returned in order of relevance. If there is nothing useful in the
first few pages, chances are there won't be anything useful in any of the
others. Change your search query or use another search tool.
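The matching rules described in these tips (quoted phrases, "any word" matching with the best matches first, partial-word matches, and the lowercase/uppercase rule) can be illustrated with a small Python sketch. It is a toy model built only on those stated rules, not how AltaVista, Google or any other real search engine actually works; the function names and documents are hypothetical.

```python
import re


def parse_query(query):
    """Split a query into quoted phrases and single words."""
    phrases = re.findall(r'"([^"]+)"', query)   # text inside "double quotes"
    remainder = re.sub(r'"[^"]+"', " ", query)  # query with the phrases removed
    return phrases + remainder.split()


def term_matches(term, text):
    """Case rule: an all-lowercase term matches any capitalisation,
    but a term containing capitals must match exactly.
    Substring matching also lets 'sing' match 'singer' (part of a word)."""
    if term == term.lower():
        return term in text.lower()
    return term in text


def search(query, documents):
    """Return every document containing ANY term, most matching terms first."""
    terms = parse_query(query)
    scored = []
    for doc in documents:
        score = sum(1 for term in terms if term_matches(term, doc))
        if score > 0:
            scored.append((score, doc))
    # Python's sort is stable (also with reverse=True), so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _score, doc in scored]


# Hypothetical documents, made up for illustration.
docs = [
    "Christmas shopping tips",
    "Christmas carols for children",
    "Carols from around the world",
    "history of rock and roll",
    "rock climbing and hiking",
    "margaret mahy stories",
    "Margaret Mahy biography",
]

print(search("christmas carols", docs))
print(search('"rock and roll"', docs))
print(search('"Margaret Mahy"', docs))
```

Running it shows the tips in action: without quotes, rock and roll also returns the climbing page because it contains rock and and, while the quoted phrase matches only the exact wording; and "Margaret Mahy" with capitals skips the all-lowercase page.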
A Web Searching Activity
Pick one of these to do your search on:

A general site about conservation for your students to use as a reference in a
piece of their writing.
A page with information about Keas that you can adapt into a worksheet for
your students.
A site containing facts on endangered species from around the world.
Information on why the Pohutukawa tree is dying out.
A page with lots of conservation links for your students to use as a starting
point in a web search.
A site about New Zealand flora and fauna.

Pick one or more search tools to use for your search. Here are some to get you
started:

Yahoo
AltaVista
Google
Infoseek
Excite
Search NZ

What Search Tools did you use?

How long did it take to find what you were looking for?

What search words did you use to start with?

Did you have to revise your search words? Why?

What were the most successful search words?

What's the address of the page you found?

Describe the steps you took to find this page:

How satisfied were you with the page you found? How well did it fit what you
were looking for?

How well do you think your chosen search tool/tools performed in your search?

5.5 Some of the Most Popular Search Engines


The search engines below are all excellent choices to start with when searching for
information.

Google
http://www.google.com


Google Inc. is an American public corporation, earning revenue from advertising
related to its Internet search, e-mail, online mapping, office productivity, social
networking, and video sharing services as well as selling advertising-free versions
of the same technologies. The Google headquarters, the Googleplex, is located in
Mountain View, California. As of March 31, 2009, the company had 20,164
full-time employees.
Google was founded by Larry Page and Sergey Brin while they were students at
Stanford University and the company was first incorporated as a privately held
company on September 4, 1998. The initial public offering took place on August
19, 2004, raising US$1.67 billion, implying a value for the entire corporation of
US$23 billion. Google has continued its growth through a series of new product
developments, acquisitions, and partnerships. Environmentalism, philanthropy and
positive employee relations have been important tenets during the growth of
Google. The company has been identified multiple times as Fortune Magazine's #1
Best Place to Work,[4] and as the most powerful brand in the world [5] (according to
the Millward Brown Group).

Yahoo
http://www.yahoo.com


Launched in 1994, Yahoo is the web's oldest "directory," a place where human
editors organize web sites into categories. However, in October 2002, Yahoo made
a giant shift to crawler-based listings for its main results. These came from Google
until February 2004. Now, Yahoo uses its own search technology.

In addition to excellent search results, you can use tabs above the search box on the
Yahoo home page to seek images, Yellow Page listings or use Yahoo's excellent
shopping search engine. Or visit the Yahoo Search home page, where even more
specialized search options are offered.
The Yahoo Directory still survives. You'll notice "category" links below some of
the sites listed in response to a keyword search. When offered, these will take you to
a list of web sites that have been reviewed and approved by a human editor.

AltaVista
http://www.altavista.com

AltaVista is a web search engine owned by Yahoo!. AltaVista was once one of the
most popular search engines but its popularity has waned due to the rise of Google.
AltaVista opened in December 1995 and for several years was the "Google" of its
day, in terms of providing relevant results and having a loyal group of users that
loved the service.

Ask
http://www.ask.com

Ask Jeeves initially gained fame in 1998 and 1999 as the "natural language"
search engine that let you search by asking questions and responded with what
seemed to be the right answer to everything. In reality, technology wasn't what
made Ask Jeeves perform so well. Behind the scenes, the company at one point
had about 100 editors who monitored search logs. They then went out onto the web
and located what seemed to be the best sites to match the most popular queries.

In 1999, Ask acquired Direct Hit, which had developed the world's first "click
popularity" search technology. Then, in 2001, Ask acquired Teoma's unique index
and search relevancy technology. Teoma was based upon the clustering concept of
subject-specific popularity.

AOL Search
http://aolsearch.aol.com (internal)
http://search.aol.com/(external)

AOL Search provides users with editorial listings that come from Google's
crawler-based index. Indeed, the same search on Google and AOL Search will come up
with very similar matches. So, why would you use AOL Search? Primarily because
you are an AOL user. The "internal" version of AOL Search provides links to
content only available within the AOL online service. In this way, you can search
AOL and the entire web at the same time. The "external" version lacks these links.
Why wouldn't you use AOL Search? If you like Google, many of Google's features
such as "cached" pages are not offered by AOL Search.
Live Search
http://www.live.com/

Live Search is the name of Microsoft's web search engine, successor to MSN
Search, designed to compete with the industry leaders Google and Yahoo. The
search engine offers some innovative features, such as the ability to view
additional search results on the same web page and the ability to adjust the amount
of information displayed for each search result. It also allows the user to save
searches and see them updated automatically on Live.com.

LookSmart
http://www.looksmart.com

LookSmart is primarily a human-compiled directory of web sites. It gathers its
listings in two ways. Commercial sites pay to be listed in its commercial
categories, making the service very much like an electronic "Yellow Pages."
However, volunteer editors at the LookSmart-owned Zeal directory also catalog
sites into non-commercial categories for free. Though Zeal is a separate web site,
its listings are integrated into LookSmart's results.

Lycos
http://www.lycos.com

Lycos is one of the oldest search engines on the web,
launched in 1994. It ceased crawling the web for its own listings in April 1999 and
instead provides access to human-powered results from LookSmart for popular
queries and crawler-based results from Yahoo for others.

Netscape Search
http://search.netscape.com

Owned by AOL Time Warner, Netscape Search uses Google for its main listings,
just as does AOL's other major search site, AOL Search. So why use Netscape
Search rather than Google? Unlike with AOL Search, there's no compelling reason
to consider it. The main difference between Netscape Search and Google is that
Netscape Search will list some of Netscape's own content at the top of its results.
Netscape also has a completely different look and feel than Google. If you like
either of these reasons, then try Netscape Search. Otherwise, you're probably better
off just searching at Google.
5.6 Limitations of Search Engines
• The ambiguities of language mean that the list of retrieved documents may
contain a high percentage of irrelevant material.
• Some search engines search only document titles, while others search the entire
document.
• Being automated, they cannot discriminate between valuable documents and
ones of dubious quality.
• With millions of people using the Internet, they sometimes become overloaded.
Chapter 6
Steps of Internet Searching
How is it that an Internet search engine can find the answers to a query so quickly?
It is a four-step process:

1. Crawling the Web: following links to find pages.

2. Indexing the pages: to create an index from every word to every place it occurs.

3. Ranking the pages: so the best ones show up first.

4. Displaying the results: in a way that is easy for the user to understand.

Crawling is conceptually quite simple: starting at some well-known sites on the
web, recursively follow every hypertext link, recording the pages encountered
along the way. In computer science this is called the transitive closure of the link
relation. However, the conceptual simplicity hides a large number of practical
complications: sites may be busy or down at one point, and come back to life later;
pages may be duplicated at multiple sites (or with different URLs at the same site)
and must be dealt with accordingly; many pages have text that does not conform to
the standards for HTML, HTTP redirection, robot exclusion, or other protocols;
some information is hard to access because it is hidden behind a form, Flash
animation or JavaScript program. Finally, the necessity of crawling 100 million
pages a day means that building a crawler is an exercise in distributed computing,
requiring many computers that must work together and schedule their actions so as
to get to all the pages without overwhelming any one site with too many requests at
once.
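As an illustration only, the transitive-closure idea can be sketched as a breadth-first traversal. The sketch assumes a hypothetical in-memory link graph (`LINKS`) in place of real HTTP fetches, and the names `crawl` and `normalize` are made up for this example. Politeness delays, robot exclusion and distribution across many machines are all left out; only duplicate detection and unreachable sites are handled, in the simplest possible way.

```python
from collections import deque
from urllib.parse import urldefrag

# Hypothetical link graph standing in for fetched pages: page URL -> links found on it.
# "http://down.example/" is missing on purpose, to play the part of a site that is down.
LINKS = {
    "http://a.example/": ["http://b.example/", "http://a.example/about#team"],
    "http://a.example/about": ["http://a.example/"],
    "http://b.example/": ["http://c.example/", "http://down.example/"],
    "http://c.example/": [],
}

def normalize(url):
    """Drop the #fragment so one page reached through different anchors is crawled once."""
    return urldefrag(url)[0]

def crawl(seeds, fetch_links, max_pages=1000):
    """Breadth-first crawl: visit the transitive closure of the link relation from the seeds."""
    seen = set()                                   # URLs already taken from the frontier
    frontier = deque(normalize(u) for u in seeds)  # URLs waiting to be fetched
    crawled = []                                   # pages fetched successfully, in order
    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()
        if url in seen:
            continue
        seen.add(url)
        links = fetch_links(url)
        if links is None:  # unreachable; a real crawler would schedule a retry
            continue
        crawled.append(url)
        for link in links:
            link = normalize(link)
            if link not in seen:
                frontier.append(link)
    return crawled

print(crawl(["http://a.example/"], LINKS.get))
# ['http://a.example/', 'http://b.example/', 'http://a.example/about', 'http://c.example/']
```

Passing `LINKS.get` as `fetch_links` makes the missing site return `None`, which the loop treats as "down" and skips.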

A search engine’s index is similar to the index in the back of a book: it is used to
find the pages on which a word occurs. There are two main differences: the search
engine’s index lists every occurrence of every word, not just the important
concepts, and the number of pages is in the billions, not hundreds. Various
techniques of compression and clever representation are used to keep the index
"small," but it is still measured in terabytes (millions of megabytes), which again
means that distributed computing is required. Most modern search engines index
link data as well as word data. It is useful to know how many pages link to a given
page, and what the quality of those pages is. This kind of analysis is similar to
citation analysis in bibliographic work, and helps establish which pages are
authoritative. Algorithms such as PageRank and HITS are used to assign a numeric
measure of authority to each page. For example, the PageRank algorithm says that
the rank of a page is a function of the sum of the ranks of the pages that link to the
page. If we let PR(p) be the PageRank of page p, Out(p) be the number of outgoing
links from page p, Links(p) be the set of pages that link to page p and N be the
total number of pages in the index, then we can define PageRank by

$$PR(p) = \frac{r}{N} + (1 - r) \sum_{i \in Links(p)} \frac{PR(i)}{Out(i)}$$

where r is a parameter that indicates the probability that a user will choose not to
follow a link, but will instead restart at some other page. The r/N term means that
each of the N pages is equally likely to be the restart point, although it is also
possible to use a smaller subset of well-known pages as the restart candidates. Note
that the formula for PageRank is recursive: PR appears on both the right- and
left-hand sides of the equation. The equation can be solved by iterating several
times, or by standard linear algebra techniques for computing the principal
eigenvector of a (3-billion-by-3-billion) matrix.
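A minimal sketch of the "iterate several times" approach follows, assuming the link graph fits in a Python dictionary. The function name `pagerank`, the default r = 0.15 (a value often used in the literature) and the fixed number of iterations are assumptions, not values from this report. Pages with no outgoing links are simply skipped, just as they never appear in any Links(p) in the formula, so the ranks only sum to 1 when every page has at least one out-link.

```python
def pagerank(links, r=0.15, iterations=50):
    """Approximate PR(p) = r/N + (1 - r) * sum of PR(i)/Out(i) over pages i linking to p.

    `links` maps each page to the list of pages it links to (its out-links).
    """
    pages = set(links) | {q for outs in links.values() for q in outs}
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}          # start from equal ranks
    for _ in range(iterations):
        new = {p: r / n for p in pages}       # the restart term r/N
        for i, outs in links.items():
            if not outs:                      # no out-links: contributes nothing
                continue
            share = (1 - r) * pr[i] / len(outs)  # (1 - r) * PR(i) / Out(i)
            for p in outs:
                new[p] += share
        pr = new
    return pr

ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print(max(ranks, key=ranks.get))  # C
```

Page C ends up highest because it is the only page linked from both A and B.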

The two steps above are query independent: they do not depend on the user's
query, and thus can be done before a query is issued, with the cost shared among all
users. This is why a search takes a second or less, rather than the days it would take
if a search engine had to crawl the web anew for each query. We now consider
what happens when a user types a query. Consider the query ["National
Academies" computer science], where the square brackets denote the beginning
and end of the query, and the quotation marks indicate that the enclosed words
must be found as an exact phrase match. The first step in responding to this query
is to look in the index for the hit lists corresponding to each of the four words
"National," "Academies," "computer" and "science." These four lists are then
intersected to yield the set of pages that mention all four words. Because "National
Academies" was entered as a phrase, only hits where these two words appear
adjacent and in that order are counted. The result is a list of 19,000 or so pages.
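To make the hit lists concrete, here is a toy positional index: for each word it records every page and position where the word occurs, which is what allows both the intersection and the phrase check described above. The three sample pages and the names `build_index` and `phrase_search` are hypothetical, and a real engine keeps compressed, sorted lists spread over many machines rather than a Python dictionary.

```python
from collections import defaultdict

def build_index(pages):
    """Map every word to {page_id: [positions]}: every place the word occurs."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(page_id, []).append(pos)
    return index

def phrase_search(index, phrase, words):
    """Pages containing all query words, with the `phrase` words adjacent and in order."""
    terms = [w.lower() for w in phrase + words]
    if not terms:
        return set()
    # Step 1: intersect the hit lists to get pages mentioning every word.
    candidates = set.intersection(*(set(index.get(t, {})) for t in terms))
    # Step 2: keep a page only if phrase word k appears at position start + k.
    phrase_terms = terms[:len(phrase)]
    results = set()
    for page in candidates:
        starts = index[terms[0]][page]
        if any(all(s + k in index[t][page] for k, t in enumerate(phrase_terms))
               for s in starts):
            results.add(page)
    return results

pages = {
    1: "National Academies computer science board",
    2: "computer science at the national level academies",
    3: "science news",
}
index = build_index(pages)
print(phrase_search(index, ["National", "Academies"], ["computer", "science"]))  # {1}
```

Page 2 mentions all four words but not as the adjacent phrase, so only page 1 survives.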
The next step is ranking these 19,000 pages to decide which ones are most
relevant. In traditional information retrieval this is done by counting the number of
occurrences of each word, weighing rare words more heavily than frequent words,
and normalizing for the length of the page. A number of refinements on this
scheme have been developed, so it is common to give more credit for pages where
the words occur near each other, where the words are in bold or large font, or in a
title, or where the words occur in the anchor text of a link that points to the page.
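The traditional scoring just described (count occurrences, weight rare words more heavily, normalize for page length) is usually called tf-idf. The sketch below is one simple variant under stated assumptions: the exact idf formula, the square-root length normalization and the name `rank` are choices made for illustration, and none of the refinements (proximity, font size, titles, anchor text) are included.

```python
import math
from collections import Counter

def rank(query, pages):
    """Return page ids sorted best-first by a simple tf-idf score."""
    tokens = {pid: text.lower().split() for pid, text in pages.items()}
    n = len(pages)
    # Document frequency: on how many pages each word appears.
    df = Counter(w for words in tokens.values() for w in set(words))

    def score(words):
        if not words:
            return 0.0
        counts = Counter(words)
        total = 0.0
        for term in query.lower().split():
            idf = math.log((1 + n) / (1 + df[term]))  # rare words weigh more; 0 if on every page
            total += counts[term] * idf               # occurrences on this page times weight
        return total / math.sqrt(len(words))          # assumed length normalization

    return sorted(pages, key=lambda pid: score(tokens[pid]), reverse=True)

print(rank("computer science", {
    "a": "science science computer",
    "b": "computer the the the the the",
    "c": "the the",
}))  # ['a', 'b', 'c']
```

Page "a" wins because "science" is rare across the three pages and appears twice on a short page.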
In addition, the query-independent authority of each page is factored in. The result is
a numeric score for each page that can be used to sort them best-first. For our four-
word query, most search engines agree that the Computer Science and
Telecommunications Board home page at www7.nationalacademies.org/cstb/ is the
best result, although one preferred the National Academies news page at
www.nas.edu/topnews/ and one inexplicably chose a year-old news story that
mentioned the Academies. The final step is displaying the
results. Traditionally this is done by listing a short description of each result in
rank-sorted order. The description will include the title of the page and may
include additional information such as a short abstract or excerpt from the page.
Some search engines generate query-independent abstracts while others customize
each excerpt to show at least some of the words from the query. Displaying this
kind of query-dependent excerpt means that the search engine must keep a copy of
the full text of the pages (in addition to the index) at a cost of several more
terabytes. Some search engines attempt to cluster the result pages into coherent
categories or folders, although this technology is not yet mature.
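A query-dependent excerpt can be sketched as choosing the stretch of words that contains the most query terms. This also shows why the full page text must be kept: the excerpt is cut from it at query time. The function name `snippet` and the fixed window width are illustrative assumptions; real engines use more elaborate selection rules.

```python
def snippet(text, query, width=8):
    """Return the `width`-word stretch of the page that contains the most query words."""
    words = text.split()
    terms = {t.lower() for t in query.split()}
    if len(words) <= width:
        return " ".join(words)
    best_start, best_hits = 0, -1
    for start in range(len(words) - width + 1):
        window = words[start:start + width]
        # Count query words in the window, ignoring case and surrounding punctuation.
        hits = sum(w.lower().strip('.,;:!?"') in terms for w in window)
        if hits > best_hits:  # keep the earliest window with the most hits
            best_start, best_hits = start, hits
    return "... " + " ".join(words[best_start:best_start + width]) + " ..."

page = ("Welcome to our site. The Computer Science and Telecommunications "
        "Board of the National Academies studies computing.")
print(snippet(page, "computer science", width=4))  # ... site. The Computer Science ...
```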

Studies have shown that the most popular uses of computers are email, word
processing and Internet searching. Of the three, Internet searching is by far the
most sophisticated example of computer science technology. Building a high-
quality search engine requires extensive knowledge and experience in information
retrieval, data structure design, user interfaces, and distributed systems
implementation.

Future advances in searching will increasingly depend on statistical natural
language processing [Lee] and machine learning [Mitchell] techniques. With so
much data (billions of pages, tens of billions of links, and hundreds of millions of
queries per day), it makes sense to use data mining approaches to automatically
improve the system. For example, several search engines now do spelling
correction of user queries. It turns out that the vast amount of correctly and
incorrectly spelled text available to a search engine makes it easier to create a good
spelling corrector than traditional techniques based on dictionaries. It is likely that
there will be other examples of text understanding that can be done better with a
data-oriented approach; this is an area that search engines are just beginning to
explore.
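One well-known data-driven technique, popularized by Peter Norvig, generates every string one edit away from the typed word and picks the candidate that occurs most often in a large body of text. The report does not say which method search engines use, so this is only an example of the general idea; the tiny `COUNTS` table is a hypothetical stand-in for the web-scale text a search engine sees, and `edits1` and `correct` are illustrative names.

```python
from collections import Counter

# Hypothetical word counts standing in for the text a search engine has crawled.
COUNTS = Counter("the national academies computer science board the computer "
                 "science and telecommunications board national academies news".split())
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Keep known words; otherwise pick the most frequent known word one edit away."""
    if word in COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in COUNTS]
    return max(candidates, key=COUNTS.__getitem__) if candidates else word

print(correct("sciense"), correct("computr"), correct("borad"))  # science computer board
```

The more text is counted, the better the frequency estimates become, which is the advantage the paragraph above describes over a fixed dictionary.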
Chapter 7
Browsing the Internet
5.1 What do we need to know?
A browser is a program on your computer that enables you to search ("surf")
and retrieve information on the World Wide Web (WWW), which is part of the
Internet. The Web is a vast collection of pages stored on computers linked together
in a global network; each page can be accessed using an address (URL, Uniform
Resource Locator), e.g. http://www.nepalnews.com.np for the news of Nepal.
URLs are often long and therefore easy to type incorrectly. Most web addresses
begin with http://, and many (but not all) begin with http://www. In many cases the
first part (http://, or even http://www.) can be omitted, and you will still be able to
access the page.
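The way an address bar copes with a missing http:// can be sketched as follows. This is an assumption about typical behaviour, not the rule of any particular browser (modern browsers may, for example, try https:// first). `complete_address` and `host_of` are hypothetical names; the URL parsing uses `urlsplit` from Python's standard `urllib.parse` module.

```python
from urllib.parse import urlsplit

def complete_address(typed: str) -> str:
    """Prepend the scheme a user omitted, roughly as a browser's address bar might."""
    typed = typed.strip()
    if "://" not in typed:
        typed = "http://" + typed  # assumption: default to plain HTTP
    return typed

def host_of(typed: str) -> str:
    """The server name part of an address, e.g. www.nepalnews.com.np."""
    return urlsplit(complete_address(typed)).hostname

print(complete_address("www.nepalnews.com.np"))  # http://www.nepalnews.com.np
print(host_of("www.nepalnews.com.np/news"))      # www.nepalnews.com.np
```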

5.2 Searching the Web

If you don't know the telephone number of the person you wish to call, you look
it up in a telephone directory. Similarly, the Web provides two methods of searching
for pages providing information:
• sites presenting web pages sorted by category and subcategories, e.g. Yahoo
(several sites, including http://www.yahoo.com and http://www.yahoo.no)
• sites offering search engines that return lists of web pages containing text that
matches a search word or string, e.g. Google (http://www.google.com), AltaVista
(http://www.altavista.com) and FAST Search (http://www.alltheweb.com).

Before you conduct a search, it is important to consider, among others, the
following points:

1. Is your choice of search term adequate, too restrictive or too general?
2. Is the search you have planned to undertake most suited for a search engine that
categorizes web sites, so that you can browse through appropriate subcategories
when the first results are returned?
3. Are you more interested in using a search engine that merely returns all the
pages matching your search?
5.3 Web browser
A web browser is a software application for retrieving, presenting, and traversing
information resources on the World Wide Web. An information resource is
identified by a Uniform Resource Identifier (URI) and may be a web page, image,
video, or other piece of content. Hyperlinks present in resources enable users to
easily navigate their browsers to related resources. Although browsers are primarily
intended to access the World Wide Web, they can also be used to access
information provided by web servers in private networks or content in file systems.

The major web browsers in order of usage according to Net Applications are
Windows Internet Explorer, Mozilla Firefox, Apple Safari, Google Chrome, and
Opera.

5.4 Some useful Web Browsers

Internet Explorer
Microsoft's Internet Explorer (IE) is one of the most popular browsers today.
IE was introduced in 1995 and passed Netscape in popularity in 1998.

Firefox
Firefox is a browser from Mozilla. It was released in 2004 and is one of the
most popular browsers today.

Netscape
Netscape was the first commercial Internet browser. Netscape was
introduced in 1994, but gradually lost its popularity to Internet Explorer.
The development of Netscape officially ended in February 2008.

Mozilla
The Mozilla Project has grown from the ashes of Netscape. Browsers based
on Mozilla code are the largest browser-family on the Internet today.
Chapter 8
Demerits of Internet Searching
Theft of Personal information
If you use the Internet, your personal information, such as your name, address
and credit card number, may be accessed by criminals and misused against you.

Spamming
Spamming refers to sending unwanted e-mails in bulk, which serve no
purpose and needlessly clog the entire system. Such illegal activities can be
very frustrating, so instead of just ignoring them, you should make an
effort to try and stop these activities so that using the Internet can become that
much safer.
Virus threat
A virus is a program that disrupts the normal functioning of your
computer system. Computers connected to the Internet are more exposed to virus
attacks, which can end up crashing your whole hard disk and causing you
considerable headache.
Pornography
Easy access to pornography is perhaps the biggest threat to children's healthy
mental development, and a very serious issue concerning the Internet.

Time wasting
If we are not sure what we are searching for, or cannot choose the proper search
techniques, a search can take a long time.
Chapter 9
Conclusion
We are now in the 21st century. A large and growing number of people around the
world use the Internet, and it has become an essential part of daily life: people use
it even for small tasks at home. The rapid modernization and development the world
has seen in a short period of time owe much to the evolution of computers and the
Internet. The features and uses of the Internet have already been discussed above.

From our experience during this mini project, we have understood the
importance of Internet searching. We can find almost anything on the Internet if we
combine the proper search tips with the proper keywords. When surfing the Internet,
we also need to use it in the proper way.
