
Web Information Retrieval vs. Traditional Information Retrieval


Reported by Karen Cecille C. Victoria

In today’s fast-paced world, where many things come in an instant (coffee, noodles,
messages, etc.), demand for instant information is also high. This has been the main
motivation for libraries around the world to give clients access to their collections
even when they are not on the library premises. That setting also gave rise to the
“Ask a Librarian” service. Through this service, clients can obtain information for
their assignments, research and other scholarly endeavors in just a matter of minutes.
But is it the speed or the relevance of the gathered information that really matters?
This paper discusses two kinds of information retrieval (IR). Information retrieval is
the science of searching for documents, for information within documents and for
metadata about documents, as well as that of searching relational databases and the
World Wide Web (www.wikipedia.org). It is also described as the process of searching
within a document collection for the information most relevant to a user’s query
(Langville & Meyer, 2006).

Components of IR (Davies & Lew, 2004)

− A person with an information problem
− An information system (database)
− Interaction between the person and the system to resolve the problem

If even one of these components is missing, information retrieval cannot be completed.
For information to be retrieved, there must be a person in need of information, and
that need varies from one person to another. There must be a system or database where
the information is stored and from which it will be retrieved. The person must then
interact with the system, because it will not search for information without a command
from the person. This interaction is searching.

Two Kinds of Information Retrieval

Traditional Information Retrieval

• searches within small, controlled, non-linked collections (Langville & Meyer, 2006)
• the oldest and simplest of the system-centered models (Davies & Lew, 2004)

Web Information Retrieval

• searches within the world’s largest and linked document collection (Langville &
Meyer, 2006)

Traditional IR vs. Web IR

| Aspect | Traditional Information Retrieval | Web Information Retrieval |
| --- | --- | --- |
| Collection | document collection | the publicly accessible Web |
| Goal | retrieve documents or text with information content relevant to the user’s needs | retrieve high-quality pages that are relevant to the user’s needs |
| Organization | accumulated, edited and categorized by trained specialists | self-organized (edited by robots/machines) |
| Users | projected number of users with relatively similar information needs | unpredictable number of users with a wide range of information needs |
| Extent of content | small, static and homogeneous text corpora | massive amounts of dynamic, heterogeneous and hyperlinked information |
| Queries | relatively descriptive and specific queries | short and unfocused queries |
| Display of results | ranks documents according to their estimated degree of relevance | retrieved documents are of equal value; results are not ranked by degree of relevance |

1. Traditional IR’s collection is specific. It covers only documents and materials that
serve a specific information need. Web IR’s collection, on the other hand, comprises a
range of materials and articles that vary in subject and are open to the public.
2. Both aim to retrieve all the relevant documents while retrieving as few of the
non-relevant ones as possible (Davies & Lew, 2004).
3. Documents in traditional IR are structured, planned and organized by experts trained
to build such databases, which makes retrieval systematic. Because a lot of
information is added to the Web every minute, it cannot be organized in such a way
that all relevant information is retrieved in a single search. The Web has no
standards, no reviewers, and no gatekeepers to police content, structure and format
(Langville & Meyer, 2006). It has only spiders or crawlers, which collect information,
and indexers, all of which are machines or robots.
4. In traditional IR, the number of users is projected before the database is
developed. Even if the number of users increases, it does so only gradually. These
users may belong to a specific community (e.g. school, industry, health, etc.) that
needs much the same information. Web IR users, on the other hand, come from all walks
of life all over the globe, and their information needs vary.
5. Traditional text-based IR research uses homogeneous corpora with coherent
vocabulary, high-quality content and congruous authorship. The Web corpus, however,
introduces the challenges of diverse authorship, vocabulary and quality. Furthermore,
some Web documents are intentionally fragmented to facilitate navigation and
hyperlinking, making it difficult to determine their topics from local content alone
(Yang). In addition, information on the Web comes not only as text but also in other
formats such as graphs, images and videos, which add to the difficulty of organizing
and indexing.
6. Queries in traditional IR can be refined by the users, depending on how the database
is designed. A search can be refined by, for example, range of years and article
format. In searching the Web, on the other hand, most users use just one or two
keywords. A Web search can also be refined, but only by using Boolean operators.
7. Traditional IR matches the query only against the information in the text, so the
degree of relevance is measured by term similarity. The search result lists the hits
according to how relevant they are to the query: the more matching words there are in
the field, the more likely a hit is to be placed at the top of the results. Web IR,
however, does not rank the retrieved information by relevance; results are shown in no
particular order of relevance.
8. In the end, it is the user’s judgment that decides whether the retrieved information
is relevant or not.
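
The term-matching idea in item 7 can be sketched in a few lines of Python. This is a
minimal illustration, not the method of any particular system: the function names, the
tokenizer and the three sample documents are all assumptions made up for this example,
and the score is a plain count of matching query terms.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_term_overlap(query, documents):
    """Return (doc_id, score) pairs sorted from most to fewest matching terms.

    The score counts how many times the query terms occur in the document.
    Documents that match no query term are left out of the result.
    """
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties are broken by doc_id to keep the order stable.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical three-document collection, invented for illustration only.
documents = {
    "d1": "Library catalog search and library reference service",
    "d2": "Web search engines crawl and index pages",
    "d3": "Cooking noodles in an instant",
}

results = rank_by_term_overlap("library search", documents)
print(results)
```

Here d1 is listed ahead of d2 because it contains more occurrences of the query terms,
and d3 is left out because it matches none of them.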

Advantages of Traditional IR
1. Clean formalism: it uses the Boolean operators AND, OR and NOT, which make
queries more specific
2. Well understood
3. Good for numeric, bibliographic and structured data
4. Searches can be analyzed and revised strategically if they fail
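
As a rough sketch of how the Boolean operators AND, OR and NOT narrow or widen a search,
the Python below maps them onto set operations over a small inverted index. The
documents, function names and index layout are hypothetical and chosen only for
illustration.

```python
def build_inverted_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def boolean_and(index, a, b):
    """Documents containing both terms."""
    return index.get(a, set()) & index.get(b, set())


def boolean_or(index, a, b):
    """Documents containing at least one of the terms."""
    return index.get(a, set()) | index.get(b, set())


def boolean_not(index, all_ids, a):
    """Documents that do not contain the term."""
    return all_ids - index.get(a, set())


# Hypothetical collection, invented for illustration only.
documents = {
    "d1": "information retrieval in libraries",
    "d2": "web information retrieval",
    "d3": "web search engines",
}
index = build_inverted_index(documents)
all_ids = set(documents)

both = boolean_and(index, "web", "retrieval")
either = boolean_or(index, "web", "libraries")
not_web = boolean_not(index, all_ids, "web")
print(sorted(both), sorted(either), sorted(not_web))
```

Each operator returns an unranked set of matching documents: a document either
satisfies the formula or it does not.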

Disadvantages of Traditional IR
1. Unfriendliness of Boolean formulas: a Boolean formula may refine the search, but it
may also cause confusion
2. Exact-match often means low output or output overload
3. Keyword terms may be taken out of context
4. Not good for the end-user

Advantages of Web IR
1. Many tools are available: there are many search engines on the Web
2. Hyperlinking: retrieved results provide links to other documents relevant to the
query
3. A huge number of pages have been indexed, so most queries get sufficient
results
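
The crawler and indexer mentioned in the comparison above can be pictured with a toy
example. The Python below is only a sketch under stated assumptions: the `pages`
dictionary is a made-up, in-memory stand-in for the Web, the URLs are fictitious, and a
real crawler would also have to fetch pages over the network and parse them.

```python
from collections import deque


def crawl_and_index(pages, start_url):
    """Breadth-first crawl from start_url, indexing the text of each page reached.

    `pages` maps a URL to a (text, outgoing_links) pair and stands in for the Web.
    Returns the visit order and an inverted index (word -> set of URLs).
    """
    visited = []
    seen = {start_url}
    queue = deque([start_url])
    index = {}
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # dead link: there is nothing to fetch
        text, links = pages[url]
        visited.append(url)
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, index


# Hypothetical in-memory "Web"; every URL and text here is invented.
pages = {
    "a.example/home": ("library home page", ["a.example/ask", "b.example/ir"]),
    "a.example/ask": ("ask a librarian", ["a.example/home"]),
    "b.example/ir": ("web information retrieval", ["c.example/missing"]),
    "d.example/island": ("unlinked page", []),
}

visited, index = crawl_and_index(pages, "a.example/home")
print(visited)
```

The page that no other page links to is never reached, which hints at why hyperlinks
matter so much for Web IR.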

Disadvantages of Web IR
1. Web contents are usually heterogeneous and noisy and need careful treatment
2. Features are extracted from the corresponding query’s content instead of the
whole page

References

Davies, B. (2004). Information Retrieval Models: Traditional: Understanding the
Logic Behind Systems Like Dialog. Retrieved July 8, 2009 from
http://www.slais.ubc.ca/COURSES/libr557/03-04-wt2/IRModels_trad/irmodels_trad.htm

Henzinger, M. Google Tutorial: Web Information Retrieval. Retrieved July 13, 2009
from http://www.tcnj.edu/~mmmartin/CMSC485/Papers/Google/icde.pdf

Langville, A. N. and Meyer, C. D. (2006). Information Retrieval and Web Search.
Retrieved July 8, 2009 from http://www.cofc.edu/~langvillea/HLA.pdf

Shui-Lung Chuang, & Lee-Feng Chien. (2003). Automatic query taxonomy
generation for information retrieval applications. Online Information
Review, 27(4), 243-255. Retrieved July 17, 2009, from Academic Research
Library. (Document ID: 443413551).

Yang, K. Information Retrieval on the Web. Retrieved July 6, 2009 from
http://74.125.93.132/search?q=cache:kiA6U4bgdNIJ:130.203.133.121:8080/viewdoc/download%3Bjsessionid%3DA773A599C6661B9643F01B40322B907B%3Fdoi%3D10.1.1.85.6202%26rep%3Drep1%26type%3Dpdf+information+retrieval+on+the+web+kiduk+yang&cd=4&hl=en&ct=clnk&gl=ph
