
A Basic Guide to the Internet

The Internet is a computer network made up of thousands of networks worldwide. No
one knows exactly how many computers are connected to the Internet. It is certain,
however, that these number in the millions and are growing.

No one is in charge of the Internet. There are organizations which develop technical
aspects of this network and set standards for creating applications on it, but no governing
body is in control. The Internet backbone, through which Internet traffic flows, is owned
by private companies.

All computers on the Internet communicate with one another using the Transmission
Control Protocol/Internet Protocol suite, abbreviated to TCP/IP. Computers on the
Internet use a client/server architecture. This means that the remote server machine
provides files and services to the user's local client machine. Software can be installed on
a client computer to take advantage of the latest access technology.
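
The client/server exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both roles run on the local machine (127.0.0.1), the function names are invented for this example, and the "service" simply echoes the request back over a TCP connection.

```python
import socket
import threading


def serve_once(server_sock):
    """Server role: accept one client, read its request, send back a reply."""
    conn, _addr = server_sock.accept()
    with conn:
        request = conn.recv(1024).decode("utf-8")
        conn.sendall(f"server received: {request}".encode("utf-8"))


def ask_server(message):
    """Start a throwaway local server, then act as the client that queries it."""
    # Port 0 asks the operating system for any free port on this machine.
    with socket.create_server(("127.0.0.1", 0)) as server_sock:
        port = server_sock.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(server_sock,))
        worker.start()
        # Client role: open a TCP connection, send a request, read the reply.
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message.encode("utf-8"))
            chunks = []
            while True:
                data = client.recv(1024)
                if not data:  # the server closed the connection
                    break
                chunks.append(data)
        worker.join()
    return b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    print(ask_server("hello"))
```

On the real Internet the two roles run on different machines, but the pattern is the same: the client opens a connection and asks, and the server answers.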

An Internet user has access to a wide variety of services: electronic mail, file transfer,
vast information resources, interest group membership, interactive collaboration,
multimedia displays, real-time broadcasting, breaking news, shopping opportunities, and
much more.

The Internet consists primarily of a variety of access protocols. Many of these protocols
feature programs that allow users to search for and retrieve material made available by
the protocol.

Components of the Internet


World Wide Web

The World Wide Web (abbreviated as the Web or WWW) is a system of Internet servers
that supports hypertext to access several Internet protocols on a single interface. Almost
every protocol type available on the Internet is accessible on the Web. This includes e-
mail, FTP, Telnet, and Usenet News. In addition to these, the World Wide Web has its
own protocol: HyperText Transfer Protocol, or HTTP. These protocols will be explained
below.

The World Wide Web provides a single interface for accessing all these protocols. This
creates a convenient and user-friendly environment. It is not necessary to be conversant
in these protocols within separate, command-level environments, as was typical in the
early days of the Internet. The Web gathers together these protocols into a single system.
Because of this feature, and because of the Web's ability to work with multimedia and
advanced programming languages, the Web is the most popular component of the
Internet.

The operation of the Web relies primarily on hypertext as its means of information
retrieval. HyperText is a document containing words that connect to other documents.
These words are called links and are selectable by the user. A single hypertext document
can contain links to many documents. In the context of the Web, words or graphics may
serve as links to other documents, images, video, and sound. Links may or may not
follow a logical path, as each connection is programmed by the creator of the source
document. Overall, the Web contains a complex virtual web of connections among a vast
number of documents, graphics, videos, and sounds.

Producing hypertext for the Web is accomplished by creating documents with a language
called HyperText Markup Language, or HTML. With HTML, tags are placed within the
text to accomplish document formatting, visual features such as font size, italics and bold,
and the creation of hypertext links. Graphics and multimedia may also be incorporated
into an HTML document.
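
To make the idea of tags concrete, here is a minimal sketch that reads a small HTML document and lists the links it contains. It uses Python's standard html.parser module; the sample page, its file names, and the helper names are made up for this illustration.

```python
from html.parser import HTMLParser

# A made-up page: <i> and <b> are formatting tags, <a href=...> creates a link.
SAMPLE_PAGE = """<html><body>
<p>Read the <a href="guide.html">guide</a> in <i>italics</i> or <b>bold</b>.</p>
<p><a href="http://www.example.org/map.gif"><img src="thumb.gif" alt="map"></a></p>
</body></html>"""


class LinkCollector(HTMLParser):
    """Remember the target of every <a href="..."> tag that is encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.links


if __name__ == "__main__":
    print(extract_links(SAMPLE_PAGE))
```

Note that the second link wraps a graphic rather than words, which matches the point that either words or graphics may serve as links.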

HTML is an evolving language, with new tags being added as each upgrade of the
language is developed and released. For example, visual formatting features are now
often separated from the HTML document and placed into Cascading Style Sheets (CSS).
This has several advantages, including the fact that an external style sheet can centrally
control the formatting of multiple documents. The World Wide Web Consortium (W3C),
led by Web founder Tim Berners-Lee, coordinates the efforts of standardizing HTML.
The W3C now calls the language XHTML and considers it to be an application of the
XML language standard.

The World Wide Web consists of files, called pages or home pages, containing links to
documents and resources throughout the Internet.

The Web provides a vast array of experiences including multimedia presentations, real-
time collaboration, interactive pages, radio and television broadcasts, and the automatic
"push" of information to a client computer or to an RSS reader. Languages and technologies
such as Java, JavaScript, Visual Basic, ColdFusion and XML extend the capabilities of
the Web. Much information on the Web is served dynamically from content stored in
databases. The Web is therefore not a fixed entity, but one that is in a constant state of
development and flux.

For more complete information about the World Wide Web, see Understanding The
World Wide Web.

E-mail
Electronic mail, or e-mail, allows computer users locally and worldwide to exchange
messages. Each user of e-mail has a mailbox address to which messages are sent.
Messages sent through e-mail can arrive within a matter of seconds.

A powerful aspect of e-mail is the option to send electronic files to a person's e-mail
address. Non-ASCII files, known as binary files, may be attached to e-mail messages.
These files are referred to as MIME attachments. MIME stands for Multipurpose Internet
Mail Extensions, and was developed to help e-mail software handle a variety of file types.
For example, a document created in Microsoft Word can be attached to an e-mail
message and retrieved by the recipient with the appropriate e-mail program. Many e-mail
programs offer the ability to read files written in HTML, which is itself a MIME type.
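
As a rough sketch of how a binary file travels with a message, the following Python example builds an e-mail with one attachment using the standard email package. The addresses, subject, and file name are placeholders invented for this example; application/msword is the MIME type commonly used for Word documents.

```python
from email.message import EmailMessage


def build_message_with_attachment(payload):
    """Return an e-mail message carrying `payload` (bytes) as a Word attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"      # placeholder address
    msg["To"] = "recipient@example.org"     # placeholder address
    msg["Subject"] = "Report attached"
    msg.set_content("The report is attached.")
    # The attachment is labeled with a MIME type so that the recipient's
    # e-mail program knows what kind of file it has received.
    msg.add_attachment(
        payload,
        maintype="application",
        subtype="msword",
        filename="report.doc",
    )
    return msg


if __name__ == "__main__":
    message = build_message_with_attachment(b"\x00\x01 binary data")
    for part in message.iter_attachments():
        print(part.get_filename(), part.get_content_type())
```

Adding the attachment turns the message into a multi-part message: one part holds the text, the other holds the encoded binary file.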

Telnet

Telnet is a program that allows you to log into computers on the Internet and use online
databases, library catalogs, chat services, and more. There are no graphics in Telnet
sessions, just text. To Telnet to a computer, you must know its address. This can consist
of words (locis.loc.gov) or numbers (140.147.254.3). Some services require you to
connect to a specific port on the remote computer. In this case, type the port number after
the Internet address. Example: telnet nri.reston.va.us 185.
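
The address-plus-port form of the command can be illustrated with a small parser. This is a sketch only: the function name is invented for the example, and it assumes the standard Telnet port, 23, when no port number is typed.

```python
DEFAULT_TELNET_PORT = 23  # standard Telnet port, used when none is given


def parse_telnet_command(command):
    """Split a command such as 'telnet host 185' into (host, port)."""
    parts = command.split()
    if len(parts) not in (2, 3) or parts[0].lower() != "telnet":
        raise ValueError(f"not a telnet command: {command!r}")
    host = parts[1]
    port = int(parts[2]) if len(parts) == 3 else DEFAULT_TELNET_PORT
    return host, port


if __name__ == "__main__":
    print(parse_telnet_command("telnet nri.reston.va.us 185"))
    print(parse_telnet_command("telnet locis.loc.gov"))
```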

Telnet is available on the World Wide Web. Probably the most common Web-based
resources available through Telnet have been library catalogs, though most catalogs have
since migrated to the Web. A link to a Telnet resource may look like any other link, but it
will launch a Telnet session to make the connection. A Telnet program must be installed
on your local computer and configured to your Web browser in order to work.

With the popularity of the Web, Telnet is less frequently used as a means of access to
information on the Internet.

FTP

FTP stands for File Transfer Protocol. This is both a program and the method used to
transfer files between computers. Anonymous FTP is an option that allows users to
transfer files from thousands of host computers on the Internet to their personal computer
account. FTP sites contain books, articles, software, games, images, sounds, multimedia,
course work, data sets, and more.

If your computer is directly connected to the Internet via an Ethernet cable, you can use
one of several PC software programs, such as WS_FTP for Windows, to conduct a file
transfer.

FTP transfers can be performed on the World Wide Web without the need for special
software. In this case, the Web browser will suffice. When you download software from a
Web site to your local machine, the browser may be using FTP, although many downloads
are delivered over HTTP instead. You can also retrieve FTP files
via search engines. Visit FTPsearchengines.com for a list of sites. This option is easiest
because you do not need to know FTP program commands.

E-mail Discussion Groups

One of the benefits of the Internet is the opportunity it offers to people worldwide to
communicate via e-mail. The Internet is home to a large community of individuals who
carry out active discussions organized around topic-oriented forums distributed by e-mail.
These are administered by various types of software programs.

A great variety of topics are covered by discussion groups. When you subscribe to a
group, messages from other subscribers are automatically sent to your electronic mailbox.
You subscribe by sending an e-mail message to the address of the group. You must have an
e-mail account to participate in a listserv discussion group. Visit Tile.net at http://tile.net/
to see an example of a site that offers a searchable collection of e-mail discussion groups.

Listserv, Majordomo and Listproc are among the programs that administer e-mail
discussion groups. The commands for subscribing to and managing your list
memberships in Majordomo and Listproc are similar to those of Listserv.

Usenet News

Usenet News is a global electronic bulletin board system in which millions of computer
users exchange information on a vast range of topics. The major difference between
Usenet News and e-mail discussion groups is the fact that Usenet messages are stored on
central computers, and users must connect to these computers to read or download the
messages posted to these groups. This is distinct from e-mail distribution, in which
messages arrive in the electronic mailboxes of each list member.

Usenet itself is a set of machines that exchanges messages, or articles, from Usenet
discussion forums, called newsgroups. Usenet administrators control their own sites, and
decide which (if any) newsgroups to sponsor and which remote newsgroups to allow into
the system. A good directory of Usenet groups can be found at Tile.Net.

There are thousands of Usenet newsgroups. These range from academic to recreational
topics. Serious computer-related work takes place in Usenet discussions. A small number
of e-mail discussion groups also exist as Usenet newsgroups.

The Usenet newsfeed can be read by a variety of newsreader software programs. For
example, the Netscape suite comes with a newsreader program called Messenger.
Newsreaders are also available as standalone products.

Usenet is not as popular nowadays as it once was. Blogs and RSS feeds are newer modes
of communication that have caught the interest of Internet users. These technologies are
covered in Understanding the World Wide Web.

Chat & Instant Messaging

Chat programs allow users on the Internet to communicate with each other by typing in
real time. They are sometimes included as a feature of a Web site, where users can log
into the "chat room" to exchange comments and information about the topics addressed
on the site. Chat may take other, more wide-ranging forms. For example, America Online
is well known for sponsoring a number of topical chat rooms.

Internet Relay Chat (IRC) is a service through which participants can communicate with
each other on hundreds of channels. These channels are usually based on specific topics.
While many topics are frivolous, substantive conversations are also taking place. To
access IRC, you must use an IRC software program.

A variation of chat is the phenomenon of instant messaging. With instant messaging, a
user on the Web can contact another user currently logged in and type a conversation.
Most famous is America Online's Instant Messenger. ICQ, MSN and Yahoo also offer
chat programs. Open Source chat programs include GAIM and Jabber.

Other types of real-time communication are addressed in the tutorial Understanding the
World Wide Web.


Updated: February 2007

Send comments to Laura Cohen


Understanding the World Wide Web


The World Wide Web is a system of Internet servers that supports hypertext to access
several Internet protocols on a single interface. The World Wide Web is often abbreviated
as the Web or WWW.

The World Wide Web was developed in 1989 by Tim Berners-Lee of the European
Particle Physics Lab (CERN) in Switzerland. The initial purpose of the Web was to use
networked hypertext to facilitate communication among CERN's members, who were located
in several countries. Word soon spread beyond CERN, and a rapid growth in the
number of both developers and users ensued. In addition to hypertext, the Web began to
incorporate graphics, video, and sound. The use of the Web has reached global
proportions and has become a defining aspect of human culture in an amazingly short
period of time.

Almost every protocol type available on the Internet is accessible on the Web. Internet
protocols are sets of rules that allow for intermachine communication on the Internet.
The following is a sample of major protocols accessible on the Web:

E-mail (Simple Mail Transfer Protocol or SMTP)
Distributes electronic messages and files to one or more electronic mailboxes
Telnet (Telnet Protocol)
Facilitates login to a computer host to execute commands
FTP (File Transfer Protocol)
Transfers text or binary files between an FTP server and client
Usenet (Network News Transfer Protocol or NNTP)
Distributes Usenet news articles derived from topical discussions on newsgroups
HTTP (HyperText Transfer Protocol)
Transmits hypertext over networks. This is the protocol of the Web.
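
To show what an HTTP exchange looks like in practice, the sketch below starts a tiny Web server on the local machine and retrieves one page from it with Python's standard http.server and http.client modules. The handler class, the page content, and the path /index.html are all invented for this illustration.

```python
import http.client
import http.server
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Answer every GET request with the same small hypertext page."""

    PAGE = b'<html><body><a href="/next.html">next</a></body></html>'

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(self.PAGE)))
        self.end_headers()
        self.wfile.write(self.PAGE)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch_from_local_server(path="/index.html"):
    """Serve one request on a free local port and return (status, type, body)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)  # the client asks for a document
        response = conn.getresponse()
        result = (response.status, response.getheader("Content-Type"), response.read())
        conn.close()
    finally:
        worker.join()
        server.server_close()
    return result


if __name__ == "__main__":
    print(fetch_from_local_server())
```

The request names the document wanted, and the response carries a status code, a content type, and the hypertext itself.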

Many other protocols are available on the Web. To name just one example, the Voice over
Internet Protocol (VoIP) allows users to place a telephone call over the Web.

The World Wide Web provides a single interface for accessing all these protocols. This
creates a convenient and user-friendly environment. Once upon a time, it was necessary
to be conversant in these protocols within separate, command-level environments. The
Web gathers these protocols together into a single system. Because of this feature, and
because of the Web's ability to work with multimedia and advanced programming
languages, the Web is by far the most popular component of the Internet.

HYPERTEXT AND LINKS: THE MOTION OF THE WEB

The operation of the Web relies primarily on hypertext as its means of information
retrieval. HyperText is a document containing words that connect to other documents.
These words are called links and are selectable by the user. A single hypertext document
can contain links to many documents. In the context of the Web, words or graphics may
serve as links to other documents, images, video, and sound. Links may or may not
follow a logical path, as each connection is created by the author of the source document.
Overall, the Web contains a complex virtual web of connections among a vast number of
documents, graphics, videos, and sounds.

Producing hypertext for the Web is accomplished by creating documents with a language
called HyperText Markup Language, or HTML. With HTML, tags are placed within the
text to accomplish document formatting, visual features such as font size, italics and bold,
and the creation of hypertext links. Graphics may also be incorporated into an HTML
document.

HTML is an evolving language, with new tags being added as each upgrade of the
language is developed and released. For example, visual formatting features are now
often separated from the HTML document and placed into Cascading Style Sheets (CSS).
This has several advantages, including the fact that an external style sheet can centrally
control the formatting of multiple documents. The World Wide Web Consortium (W3C),
led by Web founder Tim Berners-Lee, coordinates the efforts of standardizing HTML.
The W3C now calls the language XHTML and considers it to be an application of the
XML language standard.

PAGES ON THE WEB

The World Wide Web consists of files, called pages or Web pages, containing information
and links to resources throughout the Internet.

Web pages can be created by user activity. For example, if you visit a Web search engine
and enter keywords on the topic of your choice, a page will be created containing the
results of your search. In fact, a growing amount of information found on the Web today
is served from databases, creating temporary Web pages "on the fly" in response to user
queries.
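
A page created "on the fly" can be sketched as follows. This example assumes a small in-memory SQLite database standing in for a search engine's index; the table layout, the page addresses, and the function names are hypothetical.

```python
import html
import sqlite3


def build_catalog():
    """Create a tiny in-memory database of page titles (sample data only)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pages (title TEXT, url TEXT)")
    db.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [
            ("A Basic Guide to the Internet", "/internet.html"),
            ("Understanding the World Wide Web", "/www.html"),
            ("Telnet Basics", "/telnet.html"),
        ],
    )
    return db


def results_page(db, keyword):
    """Build a temporary Web page listing the titles that match `keyword`."""
    rows = db.execute(
        "SELECT title, url FROM pages WHERE title LIKE ? ORDER BY title",
        (f"%{keyword}%",),
    ).fetchall()
    items = "\n".join(
        f'<li><a href="{html.escape(url)}">{html.escape(title)}</a></li>'
        for title, url in rows
    )
    heading = f"<h1>Results for {html.escape(keyword)}</h1>"
    return f"<html><body>{heading}\n<ul>\n{items}\n</ul></body></html>"


if __name__ == "__main__":
    print(results_page(build_catalog(), "web"))
```

The page does not exist as a file anywhere. It is assembled from the database each time a query arrives, and a different keyword produces a different page.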

Access to Web pages may be accomplished by:

1. Entering an Internet address and retrieving a page directly
2. Browsing through pages and selecting links to move from one page to another
3. Searching through subject directories linked to organized collections of Web
pages
4. Entering a search statement at a search engine to retrieve pages on the topic of
your choice

RETRIEVING DOCUMENTS ON THE WEB: THE URL AND DOMAIN NAME SYSTEM

URL stands for Uniform Resource Locator. The URL specifies the Internet address of a
file stored on a host computer connected to the Internet. Every file on the Internet, no
matter what its access protocol, has a unique URL. Web browsers use the URL to retrieve
the file from the host computer and the specific directory in which it resides. This file is
downloaded to the user's client computer and displayed on the monitor connected to the
machine.

The host name in a URL is translated into a numeric address using the Domain Name
System (DNS). The DNS is a worldwide system of servers that stores location pointers to
Web sites. The numeric address, called the IP (Internet Protocol) address, is the address
that computers actually use to reach the host. Since numeric strings are difficult for
humans to use, alphanumeric addresses are employed by end users. Once the translation is
made by the DNS, the browser can contact the Web server and ask for a specific file
located on its site.
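
The translation step can be tried from Python with the standard socket module. This is a minimal sketch: the function name is invented for the example, it handles only IPv4 lookups, and looking up a real host name requires a working Internet connection.

```python
import ipaddress
import socket


def to_ip_address(host):
    """Return the numeric address for `host`, asking the DNS only if needed."""
    try:
        # Already numeric (for example 140.147.254.3): nothing to translate.
        return str(ipaddress.ip_address(host))
    except ValueError:
        pass  # not a numeric address, so it must be a host name
    # Ask the resolver, which consults the DNS, for the IPv4 address.
    return socket.gethostbyname(host)


if __name__ == "__main__":
    print(to_ip_address("140.147.254.3"))
    print(to_ip_address("localhost"))
```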

Anatomy of a URL

This is the format of the URL:

protocol://host/path/filename

For example, this is a URL on the Web site of the U.S. House of Representatives:

http://www.house.gov/house/2004_House_Calendar.html

This URL is typical of addresses hosted in domains in the United States.


Structure of this URL:

1. Protocol: http
2. Host computer name: www
3. Second-level domain name: house
4. Top-level domain name: gov
5. Directory name: house
6. File name: 2004_House_Calendar.html

Note how much information about the content of the file is present in this well-
constructed URL.
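
The same breakdown can be reproduced with Python's standard urllib.parse module. This is a sketch under a simplifying assumption: the host is written as name.domain.tld, as in the example above. The function name and dictionary keys are invented for the illustration.

```python
import posixpath
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL of the form protocol://host/path/filename into its pieces."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")  # e.g. ["www", "house", "gov"]
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "host": labels[0],
        "second_level_domain": labels[-2],
        "top_level_domain": labels[-1],
        "directory": directory.strip("/"),
        "filename": filename,
    }


if __name__ == "__main__":
    address = "http://www.house.gov/house/2004_House_Calendar.html"
    for name, value in url_parts(address).items():
        print(f"{name}: {value}")
```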

Several top-level domains (TLDs) are common in the United States:

com   commercial enterprise
edu   educational institution
gov   U.S. government entity
mil   U.S. military entity
net   network access provider
org   usually nonprofit organizations

New domain names were approved in November 2000 by the Internet Corporation for
Assigned Names and Numbers (ICANN): .biz, .museum, .info, .pro (for professionals),
.name (for individuals), .aero (for the aerospace industry), and .coop (for cooperatives).
ICANN continues to investigate proposals for additional domain names, for
example, .mobi for sites designed for mobile devices, and .jobs for the human resources
community.

In addition, dozens of domain names have been assigned to identify and locate files
stored on host computers in countries around the world. These are referred to as two-
letter Internet country codes, and have been standardized by the International
Organization for Standardization as ISO 3166. For example:

ch Switzerland
de Germany
jp Japan
uk United Kingdom

As the technology of the Web evolves, URLs have become more complex. This is
especially the case when content is retrieved from databases and served onto Web pages.
The resulting URLs can have a variety of elaborate structures, for example,

http://spills.incidentnews.gov/incidentnews/FMPro?-db=images&-Format=maps.htm&SpillLink=8&Subject=Waterway%20Closure%20Map&-SortField=EntryDate&-SortOrder=descend&-SortField=EntryTime&-SortOrder=descend&-Token=8&-Max=20&-Find

The first part of this URL looks familiar. What follows are search elements that query the
database and determine the order of the results. As a growing number of databases serve
content to the Web, these types of URLs will appear more commonly in your browser's
address window.
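
The search elements in such a URL can be pulled apart with Python's standard urllib.parse module. The sketch below uses the example URL from this section; the constant and function names are invented for the illustration, and what each element means to the database is left to the site that created it.

```python
from urllib.parse import parse_qs, urlsplit

# The database-driven URL quoted in this section, joined into one string.
SPILL_URL = (
    "http://spills.incidentnews.gov/incidentnews/FMPro?-db=images&-Format=maps.htm"
    "&SpillLink=8&Subject=Waterway%20Closure%20Map&-SortField=EntryDate"
    "&-SortOrder=descend&-SortField=EntryTime&-SortOrder=descend"
    "&-Token=8&-Max=20&-Find"
)


def query_elements(url):
    """Return the name/value pairs that follow the '?' in a URL."""
    # keep_blank_values=True keeps elements such as -Find that have no value.
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


if __name__ == "__main__":
    print(urlsplit(SPILL_URL).path)
    for name, values in query_elements(SPILL_URL).items():
        print(name, values)
```

Everything before the question mark is the familiar host and path. Everything after it is a list of name=value pairs separated by ampersands, with %20 standing in for a space.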

HOW TO ACCESS THE WORLD WIDE WEB: WEB BROWSERS

To access the World Wide Web, you must use a Web browser. A browser is a software
program that allows users to access and navigate the World Wide Web. There are two
types of browsers:

1. Graphical: Text, images, audio, and video are retrievable through a graphical
software program such as Internet Explorer, Firefox, Netscape, Mozilla and
Opera. These browsers are available for Windows, Apple, Linux and other
operating systems. Navigation is accomplished by pointing and clicking with a
mouse on highlighted words and graphics.

You can install a graphical browser on your computer. For example, Internet
Explorer is a part of the Windows operating system, and is also available on the
Microsoft site: http://www.microsoft.com/. Firefox is available for downloading
from http://www.mozilla.org/products/firefox/ and Netscape is available from
http://home.netscape.com/.

2. Text: Lynx is a browser that provides access to the Web in text-only mode.
Navigation is accomplished by highlighting emphasized words in the screen with
the arrow up and down keys, and then pressing the forward arrow (or Enter) key
to follow the link. In these days of graphical browsers, it may be hard to believe
that Lynx was once very popular.

Extending the Browser: Plug-Ins


Software programs may be configured to a Web browser in order to enhance its
capabilities. When the browser encounters a sound, image or video file, it hands off the
data to other programs, called plug-ins, to run or display the file. Working in conjunction
with plug-ins, browsers can offer a seamless multimedia experience. Many plug-ins are
available for free.

File formats are identified by MIME types, which tell the browser which program or
plug-in should handle a file. MIME stands for Multipurpose Internet Mail Extensions, and
was originally developed to help e-mail software handle a variety of binary (non-ASCII)
file attachments. The use of MIME has expanded to the Web. For example, the basic MIME
type handled by Web browsers is text/html, associated with the file extension .html.

A common plug-in utilized on the Web is the Adobe Acrobat Reader. The Acrobat Reader
allows you to view documents created in Adobe's Portable Document Format (PDF).
These documents are the MIME type "application/pdf" and are associated with the file
extension .pdf. When the Acrobat Reader has been downloaded to your computer, the
program will open and display the file requested when you click on a hyperlinked file
name with the suffix .pdf. The latest versions of the Acrobat Reader allow for the viewing
of documents within the browser window.
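
The link between file extensions, MIME types, and plug-ins can be sketched with Python's standard mimetypes module. The handler table below is hypothetical and only stands in for the browser's real configuration.

```python
import mimetypes

# Hypothetical table of which program displays which MIME type.
HANDLERS = {
    "text/html": "the browser itself",
    "application/pdf": "Acrobat Reader plug-in",
}


def mime_type_for(filename):
    """Guess the MIME type of a file from its extension."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


def handler_for(filename):
    """Name the program that would be handed the file, if one is configured."""
    return HANDLERS.get(mime_type_for(filename), "no handler configured")


if __name__ == "__main__":
    for name in ("index.html", "report.pdf"):
        print(name, mime_type_for(name), handler_for(name))
```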

Web browsers are often standardized with a small suite of plug-ins, especially for playing
multimedia content. Additional plug-ins may be obtained at the browser's Web site, at
special download sites on the Web, or from the Web sites of the companies that created
the programs.

Once a plug-in is configured to your browser, it will automatically launch when you
choose to access a file type that it uses.

Beyond Plug-Ins: ActiveX

ActiveX is a technology developed by Microsoft which makes plug-ins less necessary.
ActiveX offers the opportunity to embed animated objects, data, and computer code on
Web pages. A Web browser supporting ActiveX can render most items encountered on a
Web page. As just one example, ActiveX allows you to view and edit PowerPoint
presentations directly within your Web browser. ActiveX works best with Microsoft's
Internet Explorer.

THE EXPERIENCE OF THE WEB

Today's World Wide Web presents an ever-diversified experience of multimedia,
programming languages, and real-time communication. There is no question that it is a
challenge to keep up with the rapid pace of developments. The following presents a brief
description of some of the more important trends to watch.

Multimedia
The Web has become a broadcast medium. It is possible to listen to audio and watch video
over the Web, both pre-recorded and live. For example, you can visit the sites of news
organizations and view the same videos shown on the nightly news. Several plug-ins are
available for viewing these videos.

At one time, the entire multimedia file had to be downloaded before viewing. Since these
types of files tend to be quite large, download times can be lengthy. This problem has
been answered by a revolutionary development in multimedia capability: streaming
media. In this case, audio or video files are played as they are downloading, or streaming,
into your computer. Only a small wait, called buffering, is necessary before the file
begins to play.
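
The difference between downloading first and streaming can be illustrated with a toy simulation. This is a sketch only: the function name, the chunk size, and the buffer size are arbitrary choices for the example, and real players work on network data rather than an in-memory file.

```python
import io


def stream_playback(source, chunk_size=4, buffer_chunks=2):
    """Yield events showing that playback starts before the download ends."""
    buffered = []
    playing = False
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # nothing left to download
            break
        yield ("downloaded", chunk)
        buffered.append(chunk)
        # Buffering: wait for a few chunks before starting to play.
        if not playing and len(buffered) >= buffer_chunks:
            playing = True
        if playing:
            while buffered:
                yield ("played", buffered.pop(0))
    # A file shorter than the buffer is played once it has fully arrived.
    while buffered:
        yield ("played", buffered.pop(0))


if __name__ == "__main__":
    for event, data in stream_playback(io.BytesIO(b"abcdefghij")):
        print(event, data)
```

After the short initial wait, each piece is played as soon as it arrives, so the first part of the file is playing while the rest is still downloading.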

The Windows Media Player, RealPlayer and QuickTime plug-ins play streaming audio
and video files. Extensive files such as interviews, speeches, hearings, televised video
clips and music work very well with these players. They are also ideal for the broadcast
of real-time events. These may include live radio and television broadcasts, concerts,
Web-only broadcasts, and so on.

Shockwave and Flash are plug-ins that provide another multimedia experience. They
offer the creation and implementation of an entire multimedia display combining
graphics, animation and sound.

Sound files, including music, are also a part of the Web experience. Sound files may be
incorporated into Web sites, and are also available for downloading independent of Web
site visits. For example, try the search engine FindSounds.com. Sound files of many
types are supported by the Web with the appropriate plug-ins. The MP3 file format, and
the choice of supporting plug-ins, is one of the most popular music trends to sweep the
Web.

MP3 files are also the source of podcasts. These are audio files distributed through RSS
feeds. A good example of library podcasting can be found on the site of Dowling College,
which distributes podcasts of interest to its user community. A variation on the podcast is
the vodcast, which is a video file distributed through RSS. This type of multimedia
broadcasting is up-and-coming on the Web. (More on RSS below.)

Live cams are another aspect of the multimedia experience available on the Web. Live
cams are video cameras that send their data in real time to a Web server. These cams may
appear in all kinds of locations, both serious and whimsical: an office, on top of a
building, a scenic locale, a special event, and so on.

Programming Languages and Functions

The use of existing and new programming languages has extended the capabilities of
the Web. What follows is a basic guide to a group of the more common languages and
functions in use on the Web today.

CGI, Active Server Pages: CGI (Common Gateway Interface) refers to a specification
by which programs can communicate with a Web server. A CGI program, or script, is any
program designed to accept and return data that conforms to the CGI specification. The
program can be written in any programming language, including C, Perl, and Visual
Basic Script. A common use for a CGI script is to process a form on a Web page. For
example, you might fill out a form to order a book at Amazon. The script processes your
information and sends it to Amazon to process your order.
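
A form-processing script of this kind might look like the following sketch. It assumes the form was submitted with the GET method, so the Web server passes the form data in the QUERY_STRING environment variable. The field names title and quantity are invented for the example.

```python
import html
import os
from urllib.parse import parse_qs


def handle_form(environ):
    """Build a CGI response from the form data supplied by the Web server."""
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    title = fields.get("title", ["(no title given)"])[0]
    quantity = fields.get("quantity", ["1"])[0]
    body = (
        "<html><body><p>Order received: "
        f"{html.escape(quantity)} x {html.escape(title)}"
        "</p></body></html>"
    )
    # A CGI response is a header block, a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by a Web server, the form data arrives in the environment
    # and the response is written to standard output.
    print(handle_form(os.environ), end="")
```

The script never talks to the browser directly. The Web server hands it the form data, and whatever the script prints is returned to the browser as the resulting page.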

Another type of dynamically generated Web page is called Active Server Pages (ASP).
Developed by Microsoft, ASP is a programming environment that processes scripts on
the Web server. The scripts run on the server, rather than on the Web browser, to generate
the HTML pages sent to browsers. VBScript and JScript (Microsoft's implementation of
JavaScript) are often used for the scripting. ASP files end in the extension .asp; pages
built with its successor, ASP.NET, end in .aspx.

Java/Java Applets: Java is an object-oriented programming language similar to
C++. Developed by Sun Microsystems, the aim of Java is to create programs that will be
platform independent. The Java motto is, "Write once, run anywhere." A perfect Java
program should work equally well on a PC, Macintosh, Unix, and so on, without any
additional programming. This goal has yet to be realized. Java can be used to write
applications for both Web and non-Web use.

Web-based Java applications are usually in the form of Java applets. These are small
Java programs called from an HTML page that can be downloaded from a Web server
and run on a Java-compatible Web browser. A few examples include live newsfeeds,
moving images with sound, calculators, charts and spreadsheets, and interactive visual
displays. Java applets tend to load slowly, but programming improvements should
lead to a shortened loading time.

JavaScript/JScript: JavaScript is a programming language created by Netscape
Communications. Small programs written in this language are embedded within an
HTML page, or called externally from the page, to enhance the page's functionality.
Examples of JavaScript include moving tickers, drop-down menus, real-time calendars
and clocks, and mouse-over interactions. JScript is a similar language developed by
Microsoft and works with the company's Internet Explorer browser.

XML: XML (eXtensible Markup Language) is a mark-up language that enables Web
designers to create their own customized tags to provide functionality not available with
HTML alone. XML is a language of data structure and exchange, and allows developers
to separate form from content. With XML, the same content can be formatted for multiple
applications. In May 1999, the W3 Consortium announced that HTML 4.0 had been
recast as an XML application called XHTML. This move is slowly having an impact on
the future of both XML and HTML.

Ajax: Ajax stands for Asynchronous JavaScript and XML. This technique is used to create
interactive Web applications. Its premise is that data is sent to the browser behind the
scenes, so that when it is time to view the information, it is already "there." Google Maps
is an example of an Ajax-enabled application. Another is SurfWax LookAhead, an RSS
search tool that retrieves feeds as you type your query.

Real-Time Communication

Text, audio and video communication can occur in real time on the Web. This capability
allows people to conference and collaborate in real time. In general, the faster the Internet
connection, the more successful the experience.

At its simplest, chat programs allow multiple users to type to each other in real time.
Internet Relay Chat and America Online's Instant Messenger are prime examples of this
type of program. The development of messaging protocols is underway. Such a
protocol would allow for the expansion of this capability throughout the Internet.

More enhanced real-time communication offers an audio and/or video component.
CU-SeeMe is a software program of this type. Even more elaborate are programs that allow
for true real-time collaboration. Microsoft's NetMeeting and Netscape's Conference
(available with Communicator) are good examples of this.

Featured collaboration tools include:

• audio: conduct a telephone conversation on the Web
• video: view your audience
• file transfer: send files back and forth among participants
• chat: type in real time
• whiteboard: draw, mark up, and save images on a shared window or board
• document/application sharing: view and use a program on another's desktop
machine
• collaborative Web browsing: visit Web pages together

Currently no standard exists that will work among all conferencing programs.

Current Trends: Blogs and RSS

The Web is a welcoming medium for experimentation and user participation. It is
becoming easier to post Web content and share comments with other users. The idea of
the Web site is still very much alive, but Web participation is taking new forms and being
driven by new technologies that foster social interaction. Here are two of the latest trends.

Blogs: A blog is an easy-to-create Web site, managed by a lightweight content
management system, that allows users to share their thoughts with the world. The word
"blog" comes from "Weblog" because a blog consists of a signed and dated log of
individual postings. The topic of the blog can be anything, from the personal to the
professional. A blog is what you make of it.

What is important about blogs is the content management system that manages the
content. This system can offer a variety of features that can make the blog a useful tool.
Examples include a calendar view of postings, organization of postings into categories,
archived postings, options to send e-mail notification of new postings, and so on.

Blogging can be an interactive activity. Readers can add comments to a blogger's
postings, others can respond, and a conversation ensues. Lately, bloggers have become
well-known commentators on the political scene, but blogging can encompass any topic
or no topic at all. If the blogging software allows it, bloggers can use RSS to distribute
their postings.

Visit Technorati, a search engine devoted to locating blogs. You can set up your own blog
by visiting Blogger.

RSS: RSS allows people to place news and other announcement-type items into a simple
XML format that can then be pushed to RSS readers and Web pages. The initials RSS can
stand for different things, including Rich Site Summary or Really Simple Syndication.
Users can subscribe to the RSS newsfeeds of their choice, and then have access to the
updated information as it comes in. RSS is used for all kinds of purposes, including the
news itself and announcing new content on Web sites.

RSS content may be read by using an RSS reader, or aggregator. This is usually free
software that you can install on your computer that posts new items and stores old ones in
a graphical interface. An RSS reader is similar to e-mail software in that it displays
incoming items and can store content for offline reading. Subscribing to a newsfeed is
usually as simple as entering the address of the RSS document.
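
As a rough illustration of what an RSS reader does with a newsfeed, the sketch below parses a small RSS 2.0 document and picks out the items the reader has not stored yet. The feed contents, the example.org addresses, and the function names read_items and new_items are all made up for this example; a real aggregator would also fetch the document from the feed's address and handle malformed feeds.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 document (the feed and its items are hypothetical).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Library News</title>
    <link>http://www.example.org/news</link>
    <description>Announcements from a hypothetical library</description>
    <item>
      <title>New database trial</title>
      <link>http://www.example.org/news/1</link>
      <pubDate>Mon, 08 Jan 2007 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Holiday hours</title>
      <link>http://www.example.org/news/2</link>
      <pubDate>Tue, 09 Jan 2007 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""


def read_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append((item.findtext("title"), item.findtext("link")))
    return items


def new_items(feed_xml, seen_links):
    """Return only the items whose link the reader has not stored yet."""
    return [(title, link) for (title, link) in read_items(feed_xml)
            if link not in seen_links]
```

Calling new_items(SAMPLE_FEED, seen_links) with the set of links already stored returns only the unseen postings, which is roughly how a reader decides what to display as new.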

A useful list of RSS readers is available on the site of RSS Compendium.

It is also possible to subscribe to and read your own collection of RSS feeds on Web sites
devoted to this purpose. Bloglines is one such example. The advantage here is that you
can access your RSS feeds from any computer that is connected to the Web.

Updated: January 2007

Send comments to Laura Cohen

Current Web Architecture


Table of Contents
Introduction

Basic Web Architecture

Web Architecture Extensibility

Other Transfer Protocols

Other Open Standards

Introduction
This section of the Internet Tool Survey describes the current architecture of the World
Wide Web (WWW). The NCSA Glossary is a useful starting point for Web terms.
Another is the ILC glossary of Internet Terms.

The following sections describe

• the basic two-tier architecture of the web in which static web pages (documents)
are transferred from information servers to browser clients world-wide,
• extensions that permit three-tiered architectures where content pages can be
constructed dynamically and where programs as well as data can be transferred,
• other information transfer protocols, and
• related standards.

Basic Web Architecture


The basic web architecture is two-tiered and characterized by a web client that displays
information content and a web server that transfers information to the client. This
architecture depends on three key standards: HTML for encoding document content,
URLs for naming remote information objects in a global namespace, and HTTP for
staging the transfer.

• HyperText Markup Language (HTML) - the common representation language
for hypertext documents on the Web. HTML had a first public release as HTML
0.0 in 1990, was Internet draft HTML 1.0 in 1993, and HTML 2.0 in 1994. The
September 22 1995 draft of the HTML 2.0 specification has been approved as a
standard by the IETF Application Area HTML Working Group. HTML 3.0 and
Netscape HTML are competing next generations of HTML 2.0. Proposed features
in HTML 3.0 include: forms, style sheets, mathematical markup, and text flow
around figures. For more detailed information, see the HTML Reference Manual.
HTML is an application of the Standard Generalized Markup Language (SGML
ISO-8879), an international standard approved in 1986, which specifies a formal
meta-language for defining document markup systems (more here and here). An
SGML Document Type Definition (DTD) specifies valid tag names and element
attributes. HTML consists of embedded content delimited by hierarchical start and
end tags, whose names are case-insensitive and which may carry element attributes
in the start tag. These attributes may be required, optional, or empty. In addition,
documents can be inter- or intra-linked by establishing source and target anchor
points. Many HTML documents are the result of manual authoring or word
processing HTML converters, but now several WYSIWYG editors support
HTML styles -- see listing at W3C and the Internet Tools Survey section on
Authoring HTML.
HTML files are viewed using a WWW client browser (software), the primary user
interface to the Web. HTML allows for embedding of images, sounds, video
streams, form fields and simple text formatting. References, called hyperlinks, to
other objects are embedded using URLs (see below). When an object is selected
by a hyperlink, the browser takes an action based on the URL's type, e.g., retrieve
a file, connect to another Web site and display an HTML file stored there, or launch
an application such as an E-mail or newsgroup reader.
• Universal Resource Identifier (URI) - an IETF addressing protocol for objects
in the WWW ("if it's out there, we can point at it"). There are two types of URIs,
Uniform Resource Names (URNs) and Uniform Resource Locators (URLs).
The current IETF URI spec is here and the URL spec is here.
URLs are location dependent and contain four distinct parts: the protocol type, the
machine name, the directory path and the file name. There are several kinds of
URLs: file URLs, FTP URLs, Gopher URLs, News URLs, and HTTP URLs.
URLs may be relative to a directory or offsets into a document. Arguments to CGI
programs (see below) may be embedded in URLs after the ? character.
• HyperText Transfer Protocol (HTTP) - an application-level network protocol
for the WWW. Tim Berners-Lee, father of the Web, describes it as a "generic
stateless object-oriented protocol." Stateless means neither the client nor the
server stores information about the state of the other side of an ongoing
connection. Statelessness is a scalability property but is not necessarily efficient
since HTTP sets up a new connection for each request, which is not desirable for
situations requiring sessions or transactions.
o In HTTP, commands (request methods) can be associated with particular
types of network objects (files, documents, network services). Commands
are provided for
▪ establishing a TCP/IP connection to a WWW server,
▪ sending a request to the server (containing a method to be applied
to a specific network object identified by the object's identifier, and
the HTTP protocol version, followed by information encoded in a
header style),
▪ returning a response from the server to the client (consisting of
three parts: a status line, a response header, and response data), and
▪ closing the connection.
o HTTP supports dynamic data representation through client-server
negotiation. The requesting client specifies it can accept certain MIME
content types (more on this below) and the server responds with one of
these. All WWW clients can handle text/plain and text/html.
o HTTP/1.0 Internet Draft 05 (the seventh release of HTTP/1.0) is targeted
as an Internet Informational RFC. The next immediate version of HTTP is
HTTP/1.1 Internet Draft 01.
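
The pieces described in the list above can be tied together with a short sketch: split a URL into its parts, build the text of an HTTP/1.0 GET request that names the MIME types the client accepts, and split a response into its status line, header, and data. This is only an illustration under stated assumptions. The function names split_url, build_request, and parse_response and the example.org address are invented here, and nothing is sent over the network; a real client would open a TCP/IP connection to the server, write the request, read the reply, and close the connection.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into protocol, machine name, directory path, file name, and query."""
    parts = urlsplit(url)
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "machine": parts.netloc,
        "directory": directory + "/",
        "file": filename,
        "query": parts.query,  # arguments that follow the ? character
    }


def build_request(url, accept=("text/html", "text/plain")):
    """Build the text of an HTTP/1.0 GET request for the given URL."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        "GET %s HTTP/1.0" % target,       # method, object identifier, protocol version
        "Accept: " + ", ".join(accept),   # MIME types the client can handle
        "",                               # blank line ends the header
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw response into its status line, a header dict, and the data."""
    head, _, data = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, data
```

For the hypothetical address http://www.example.org/docs/guide/index.html?lang=en, split_url reports the protocol http, the machine www.example.org, the directory /docs/guide/, the file index.html, and the query lang=en.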

Web Architecture Extensibility


This basic web architecture is fast evolving to serve a wider variety of needs beyond
static document access and browsing. The Common Gateway Interface (CGI) extends the
architecture to three-tiers by adding a back-end server that provides services to the Web
server on behalf of the Web client, permitting dynamic composition of web pages.
Helpers/plug-ins and Java/JavaScript provide other interesting Web architecture
extensions.

• Common Gateway Interface (CGI) - CGI is a standard for interfacing external
programs with Web servers (see Figure 1). The server hands client requests
encoded in URLs to the appropriate registered CGI program, which executes and
returns results encoded as MIME messages back to the server. CGI's openness
avoids the need to extend HTTP. The most common CGI applications handle
HTML <FORM> and <ISINDEX> commands.
o CGI programs are executable programs that run on the Web server. They
can be written in any scripting language (interpreted) or programming
language (must be compiled first) available to be executed on a Web
server, including C, C++, Fortran, PERL, TCL, Unix shells, Visual Basic,
Applescript, and others. Security precautions typically require that CGI
programs be run from a specified directory (e.g., /cgi-bin) under control of
the webmaster (Web system administrator), that is, they must be registered
with the system.
o Arguments to CGI programs are encoded in URLs by the client, and the server
passes them to the program through environment variables. The CGI program
typically returns HTML pages that it constructs on the fly.
o Some problems with CGI are:
▪ the CGI interface requires the server to execute a program
▪ the CGI interface does not provide a way to share data and
communications resources, so if a program must access an external
resource, it must open and close that resource. It is difficult to
construct transactional interactions using CGI.
o The current version is CGI/1.1. W3C and others are experimenting with
next generation object-oriented APIs based on OMG IDL; Netscape
provides Netscape Server API (NSAPI) and Progress Software and
Microsoft provide Internet Server API (ISAPI).

• Helpers/Plug-ins - When a client browser retrieves a file, it launches an installed
helper application or plug-in to process the file based on the file's MIME-type
(see below). For example, it may launch a Postscript or Acrobat reader, or MPEG
or QuickTime player. A helper application runs external to the browser while a
plug-in runs within the browser. For information on how to create new Netscape
Navigator plug-ins, see The Plug-in Developer's Guide.
• Common Client Interface (CCI) - this interface allows a third-party application
to remotely control the Web browser client. Netscape Client APIs 2.0 (NCAPIs)
depend on platform-specific native methods of interprocess communication
(IPC). Netscape plans to support DDE and OLE2 for Windows clients, X properties
for UNIX clients, and Apple Events for Macintosh clients.
• Extensions to HTTP. W3C and IETF Application Area HTTP Working Group are
working together on current and future versions of HTTP. The HTTP-NG project
is assessing two implementation approaches to HTTP "replacements":
o Spero's approach - allows many requests per connection, the requests can
be asynchronous and the server can respond in any order, allowing several
transfers in parallel. A "session layer" divides the connection into
numerous channels. Control messages (GET requests, meta information)
are returned in a control channel; each object is returned in its own
channel.
o W3C approach - Jim Gettys at W3C is using Xerox ILU (a CORBA
variant) to implement an ILU transport similar to Spero's session protocol.
The advantages of this approach are openness with respect to pluggable
transport protocols, support for multiple language environments, and a
step towards viewing the "web of objects." Related to this approach,
Netscape recently announced future support for OMG Internet Inter-ORB
Protocol (IIOP) standard on both client and server. This will provide a
uniform and language neutral object interchange format making it easier to
construct distributed object applications.
• Java/JavaScript - Java is a cross-platform WWW programming language from
Sun Microsystems, modeled after C++. Java programs embedded in HTML
documents are called applets and are specified using <APPLET> tags. The
HTML for an applet contains a code attribute that specifies the URL of the
compiled applet file. Applets are compiled to a platform-independent bytecode
which can be safely downloaded and executed by the Java interpreter embedded
into the Web browser. Browsers that support Java are said to be Java-enabled. If
performance is critical, a Java applet can be compiled to native machine language
on the fly. Such a compiler is known as a Just-In-Time (JIT) compiler.
JavaScript is a scripting language designed for creating dynamic, interactive Web
applications that link together objects and resources on both clients and servers. A
client JavaScript can recognize and respond to user events such as mouse clicks,
form input, and page navigation, and query the state or alter the performance of an
applet or plug-in. A server JavaScript script can exhibit behavior similar to
common gateway interface (CGI) programs. JavaScript scripts are embedded in
HTML documents using <SCRIPT> tags. Similar to Java applets, JavaScript
scripts are directly interpreted within the client's browser and are therefore
platform-independent. For a comparison of Java and JavaScript, see here.
The Java Language Specification can be found here, a Java tutorial here, the Java
Virtual Machine (interpreter) here, the Java Developer's Kit (JDK) here, and Java
FAQs here. A comprehensive Java page of resources can be found at JPL.
The JavaScript Language Specification can be found here, a JavaScript tutorial
here, and the JavaScript FAQs here.
• The IETF Security Area Web Transaction Security (WTS) Working Group is
working on security services for WWW. As chartered, it has produced Internet-
drafts of a Requirements for Web Transaction Security and a Secure HyperText
Transfer Protocol specification plus Security Extensions For HTML.
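
To make the CGI item in the list above more concrete, here is a minimal sketch of a CGI program. It assumes the Web server follows CGI/1.1 and places the text after the ? of the URL in the QUERY_STRING environment variable. The form field "name" and the function handle_request are hypothetical, chosen only for this illustration. The program returns a MIME header, a blank line, and an HTML page that it constructs on the fly.

```python
import html
import os
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a complete CGI response: MIME header, blank line, HTML body."""
    # The server places the text after the ? in the QUERY_STRING variable.
    args = parse_qs(environ.get("QUERY_STRING", ""))
    name = args.get("name", ["world"])[0]  # "name" is a hypothetical form field
    # Escape the value so user input cannot inject markup into the page.
    body = "<html><body><p>Hello, %s!</p></body></html>" % html.escape(name)
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by a Web server, os.environ carries the request variables.
    print(handle_request(os.environ), end="")
```

Because the server starts this program afresh for every request, anything it needs (files, database connections) must be opened and closed each time, which is the cost noted in the list of CGI problems.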

Other Transfer Protocols


The Web also uses other HTTP-related protocols for transferring and representing
information, including:

• Transmission Control Protocol/Internet Protocol (TCP/IP) - the fundamental
protocol that provides for the reliable delivery of streams of data from one host to
another. An introduction to TCP/IP is here.
• File Transfer Protocol (FTP) - a common method of moving files between two
Internet sites. It is based on TCP/IP.
• Gopher - a distributed document search and retrieval protocol (IETF RFC 1436)
for obtaining files or information from hierarchical menus in the Gopher
information-retrieval system.
• Internet Inter-ORB Protocol (IIOP) - an inter-ORB protocol for communication
between objects and applications. It is based on the Common Object Request
Broker Architecture (CORBA) specification.
• Multipurpose Internet Mail Extensions (MIME) - the protocol for multimedia
email and a building block of HTTP. The first packet of information received by a
client identifies the type of file the server has sent, e.g., binary, audio, video,
movie, formatted word-processor documents, graphics, spreadsheets, etc. The
extensions to the SMTP format allow it to carry multiple types of data. When
multimedia files are sent using the MIME standard they are encoded into non-
readable text. The Web browser maintains a list of pairs of MIME-Types and
helper applications for handling each type.
• Network News Transfer Protocol (NNTP) - the protocol used to connect to Usenet
discussion groups.
• Secure Sockets Layer (SSL) - a security protocol developed by Netscape for
sending and receiving encrypted information. It is based on encryption technology
developed by RSA, Inc.
• Simple Mail Transfer Protocol (SMTP) - a protocol for transferring electronic
mail from one host to another.
• Simple Network Management Protocol (SNMP) - a protocol that allows a
network administrator to monitor network devices over the network.
• Z39.50 - a protocol that governs the formats and procedures by which two
computers interact with one another. It is used to search several databases of the
same type, and is session-oriented and stateful.
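
The MIME item in the list above says that a browser keeps a list pairing MIME types with helper applications. A minimal sketch of that lookup follows. The HELPERS table, the helper names in it, and the function choose_handler are invented for illustration; a real browser reads its table from user preferences. When the server sends no content type, the sketch falls back on guessing from the file extension with Python's standard mimetypes module, whose results can vary slightly between systems.

```python
import mimetypes

# Hypothetical table pairing MIME types with helper applications.
HELPERS = {
    "application/postscript": "postscript-viewer",
    "application/pdf": "acrobat-reader",
    "video/mpeg": "mpeg-player",
    "video/quicktime": "quicktime-player",
}


def choose_handler(filename, content_type=None):
    """Return 'browser', a helper name, or 'save to disk' for a received file."""
    if content_type is None:
        # No content type from the server: guess from the file extension.
        content_type, _encoding = mimetypes.guess_type(filename)
    if content_type in ("text/html", "text/plain"):
        return "browser"  # types every WWW client can display itself
    return HELPERS.get(content_type, "save to disk")
```

Here choose_handler("clip.bin", "video/quicktime") selects the hypothetical quicktime-player helper, while a type with no entry in the table falls through to saving the file.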

Other Open Standards


The Web also builds on additional open standards:

• GIF, JPEG, and XBM image formats.
• Virtual Reality Modeling Language (VRML) - a proposed standard language for
describing multi-participant interactive simulations within the WWW.
• HyperMedia Management Protocol (HMMP) - a protocol to access and
manipulate components of the Hypermedia Management Schema (HMMS), a data
representation formalism (schema) for representing managed objects. HMMS and
HMMP are major components of the Web-Based Enterprise Management
standards effort to integrate existing standards, such as SNMP/UDP,
HTML/HTTP and DMI/RPC into a browser-managed architecture.
• Real Time Streaming Protocol (RTSP) - a recently proposed communication
protocol for control and delivery of video and audio in real-time.
• Proxy and SOCKS firewall protocols.
• S-HTTP security protocol.
A more complete list of standards can be found at Netscape and the World Wide Web
Consortium. A complete list of Internet Engineering Task Force (IETF) standard RFCs
can be found here.

This research is sponsored by the Defense Advanced Research Projects Agency and managed by the U.S.
Army Research Laboratory under contract DAAL01-95-C-0112. The views and conclusions contained in
this document are those of the authors and should not be interpreted as necessarily representing the official
policies, either expressed or implied of the Defense Advanced Research Projects Agency, U.S. Army
Research Laboratory, or the United States Government.

© Copyright 1996 Object Services and Consulting, Inc. Permission is granted to copy this document
provided this copyright statement is retained in all copies. Disclaimer: OBJS does not warrant the
accuracy or completeness of the information on this page.

This page was written by Craig Thompson and Gil Hansen. Send questions and comments about this page
to thompson@objs.com or gil@objs.com.

Last updated: 1/3/97 sjf

Issues in the Development of Distributed Hypermedia Applications

Here are the issues that are important in a system for developing distributed hypermedia
applications. They are useful for evaluating mobile code systems and object technologies.
See also: The Web Object Model and Type System.

Concurrency
User interfaces need to be responsive and lively, and servers need to respond to
multiple clients concurrently. A system with explicit support for threads, mutexes,
and condition variables facilitates these and a wide variety of other techniques.
Thread-Aware I/O
One of the major opportunities for parallelism is during I/O requests. An I/O
library that is thread-aware avoids the need for event-loops, stupid-select()-tricks,
and other fragile techniques that tend to conflict between large libraries.
MS Windows GUI
Currently the dominant user platform. Must be supported. Developing for Win3.1
is a pain, but Windows NT and Windows 95 remove many of the gotchas of the
Win3.1 programming model.
X/Motif GUI
Deployment on this platform is pretty important -- critical, I'd say.
Mac GUI
A complete solution must support this.
Portable GUI API
A programming model that allows one body of code to be delivered on all three
platforms is very handy. Unfortunately, I've never seen one that doesn't involve
major compromises and/or major license fees.
Objects with inheritance
Programmers are more productive with APIs that exploit inheritance.
Protocols/Mixins/Interfaces
In Smalltalk (and hence Objective-C) systems, a "protocol" is a collection of
methods that an object must support, regardless of where it sits in the class
hierarchy. In many cases, it obviates the need for multiple inheritance, which can
be messy.
POSIX API
Not a critical feature, especially since it conflicts with Thread-aware I/O and
exception based programming. But the ability to spawn processes, get at files, etc.
is handy. Higher level interfaces are sometimes better, but sometimes not.
Module System
To allow independent bodies of code to be used together, we need some
technique(s) for managing the namespace. I like the way Modula-3 handles name
spaces, but partial revelations are a key to making it work, and you won't find that
outside of Modula-3. But Ada95, Python, and some others give the right feel. I
like having qualified names and ways to abbreviate them (e.g. IMPORT ... FROM
...). Separation of interfaces from modules is handy, and notably lacking from
Python and Java. But it's evidently not critical.
Exceptions
Counting on folks to check return codes is madness. For really scalable code
bases, Modula-3's static RAISES checking is really handy.
Run-time Safety
The question here is: does the runtime system guarantee that the program will
behave according to its source code (or abort)? Languages like C/C++ do not.
Languages like Lisp, Scheme, Python, Java, etc. do. Modula-3 supports both kinds
of modules. Run-time safety generally mandates automatic storage management
(e.g. garbage collection).
System Safety
Slightly different question: can I control access to resources by a piece of
executing code? Can I prevent it from erasing my files, forging mail from me,
etc.? Much harder question, as it involves security policies. This is critical to
mobile code systems. An Introduction to Safety and Security in Telescript gives
an excellent discussion of run-time safety and system safety.
@@For an application to span administrative domains, there must be access
control mechanisms, usually including authentication, to maintain security and
system integrity.
Foreign Function Interface
There's a lot of code written in C etc. that we don't want to -- or cannot, for
reasons of copyright etc. -- rewrite. A clean foreign function interface is probably
Tcl's greatest strength. Threads, garbage collection, and exceptions can really
complicate FFI.
TCP/IP API
Gotta have access to the network.
Flatten/Serialize/Pickle
Useful for persistent object store, and for RPC marshalling.
Mobile Closures
This allows computations to span address spaces in a fairly natural, if not
transparent way. Obliq pioneered this, in a way.
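
As one illustration of the Concurrency item in the list above, the sketch below uses threads, a mutex, and a condition variable so that a small pool of workers can answer several requests at once. It uses Python's standard threading module; the class RequestQueue, the function serve, and the "reply to ..." strings are invented for this example and do not come from any system named here.

```python
import threading
from collections import deque


class RequestQueue:
    """A queue guarded by a mutex and a condition variable."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()  # carries its own lock (the mutex)

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()  # wake one waiting worker

    def get(self):
        with self._cond:
            while not self._items:  # re-check after every wake-up
                self._cond.wait()
            return self._items.popleft()


def serve(requests, worker_count=3):
    """Handle every request on a pool of worker threads; return sorted replies."""
    queue = RequestQueue()
    replies = []
    replies_lock = threading.Lock()

    def worker():
        while True:
            request = queue.get()
            if request is None:  # sentinel: no more work for this worker
                return
            with replies_lock:
                replies.append("reply to " + request)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for request in requests:
        queue.put(request)
    for _ in threads:
        queue.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return sorted(replies)
```

The workers block inside get() instead of polling, so no event loop or select() call is needed; that is the style of programming the Thread-Aware I/O item argues for.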

Some other handy language features for programming-in-the-large:

Generics
À la Ada and Modula-3, or C++ templates.
Module/Interface Separation
À la the separation of .h from .c in C.
