Traditional IR vs. Web IR

Web Information Retrieval vs.
Traditional Information Retrieval

Reported by Karen Cecille C. Victoria
In today’s fast-paced world, where many things come in an instant (coffee,
noodles, messages, etc.), the demand for instant information is also high. This has been
the main motivation for libraries around the world to give clients access to their
collections even when they are not on the library premises. That setting also gave
rise to the libraries’ “Ask a Librarian” service. Through this service, clients
can obtain information for their assignments, research and other scholarly endeavors in
just a matter of minutes. But is it really the speed or the relevance of the gathered
information that matters?
This paper discusses the two ways of retrieving information, or simply information
retrieval (IR). Information retrieval is the science of searching for documents, for
information within documents and for metadata about documents, as well as that of
searching relational databases and the World Wide Web (www.wikipedia.org). It is also
described as the process of searching within a document collection for the information
most relevant to a user’s query (Langville & Meyer, 2006).
Components of IR (Davies & Lew, 2004)
− Person with an information problem
− An information system (database)
− Interaction between the person and the system to resolve the problem
Without even one of its components, information retrieval cannot be
completed. For information to be retrieved, there must be a person in need of
information, and this need varies from one person to another. There must be
a system or database where the information is stored and from which it will be
retrieved. The person must then interact with the system, because it will not search
for information without a command from the person. This interaction is
searching.
Two Kinds of Information Retrieval
Traditional Information Retrieval

• searches within small, controlled, non-linked collections (Langville &
Meyer, 2006)
• the oldest and simplest of the system-centered models (Davies & Lew,
2004)
Web Information Retrieval

• searches within the world’s largest linked document collection (Langville
& Meyer, 2006)
Traditional IR vs. Web IR
Aspect              Traditional Information Retrieval     Web Information Retrieval
Collection          − document collection                 − the publicly accessible Web
Goal                − retrieve documents or text with     − retrieve high quality pages that
                      information content relevant to       are relevant to the user’s needs
                      the user’s needs
Organization        − accumulated, edited and             − self-organized (edited by
                      categorized by trained                robots/machines)
                      specialists
Users               − projected number of users with      − unpredictable number of users
                      relatively same information           with wide range of information
                      needs                                 needs
Extent of content   − small, static and homogenous        − massive amounts of dynamic,
                      text corpora                          heterogeneous and hyperlinked
                                                            information
Queries             − relatively descriptive and          − short and unfocused queries
                      specific queries
Display of results  − ranks documents according to        − retrieved documents are of
                      their estimated degree of             equal value; results are not
                      relevance                             ranked by degree of relevance
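The "Display of results" row describes ranking documents by their estimated degree of relevance. The following is a minimal sketch of one very simple way to do that, by counting how many distinct query terms each document shares with the query. The function names and the three-document collection are hypothetical and do not come from the cited sources; real systems typically use more refined term weighting.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of distinct words."""
    return set(text.lower().split())


def rank_by_term_overlap(query, documents):
    """Return (doc_id, score) pairs sorted by number of shared terms, highest first."""
    query_terms = tokenize(query)
    scored = []
    for doc_id, text in documents.items():
        score = len(query_terms & tokenize(text))
        if score > 0:  # keep only documents that match at least one query term
            scored.append((doc_id, score))
    # Sort by score (descending), then by doc_id so that ties have a stable order
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical collection, for illustration only
documents = {
    "d1": "library catalog search for students",
    "d2": "web search engines and library catalog search tools",
    "d3": "instant coffee and noodles",
}

ranking = rank_by_term_overlap("library catalog tools", documents)
print(ranking)
```

In this sketch the document that shares the most terms with the query is listed first, and a document with no matching term is not retrieved at all.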
1. Traditional IR’s collection is specific. It covers only documents and materials that
serve a specific information need. Web IR’s collection, on the other hand,
comprises a range of materials and articles that vary in subject and are open to the
public.
2. Both aim to retrieve all the relevant documents while retrieving as few
of the non-relevant ones as possible (Davies & Lew, 2004).
3. Documents in traditional IR are structured, planned and organized by experts
trained to build such databases, which makes retrieval systematic. Because a great
deal of information is added to the Web every minute, it cannot be
organized in such a way that all relevant information is retrieved in a
single search. The Web has no standards, no reviewers, and no gatekeepers to
police content, structure and format (Langville & Meyer, 2006). It has only spiders
or crawlers, which collect information, and indexers, all of which are machines or robots.
4. In traditional IR, the number of users is projected before the database is
developed. Even if the number of users increases, it does so only gradually.
These users may belong to a specific community (e.g., school, industry, health,
etc.) that needs fairly similar information. Web IR users, on the other hand,
come from all walks of life and every part of the globe, and their
information needs vary.
5. Traditional text-based IR research uses homogeneous corpora with coherent
vocabulary, high-quality content and congruous authorship. The Web corpus,
however, introduces the challenges of diverse authorship, vocabulary and quality.
Furthermore, some Web documents are intentionally fragmented to facilitate
navigation and hyperlinking, making it difficult to determine their topics from
local content alone (Yang). In addition, information on the Web comes not only
as text but also in other formats such as graphs, images and videos,
which add to the difficulty of organizing and indexing it.
6. Queries in traditional IR can be refined by the users, depending on how the
database is designed. A search can be refined by year range and article
format. In searching the Web, on the other hand, most users use just one or two
keywords. Such a search can also be refined, but only by using Boolean operators.
7. Traditional IR matches queries only against the information in the text; thus, the
degree of relevance is measured by term similarity. The search result lists the hits
according to how relevant they are to the query. The more matching words there
are in the field, the more likely the document is to be placed at the top of the results. Web
IR, however, does not rank the retrieved information by relevance; results
are shown in no particular order of relevance.
8. In the end, it is the user’s judgment that decides whether the retrieved
information is relevant or not.
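Point 2 above (retrieving all the relevant documents while retrieving as few non-relevant ones as possible) corresponds to the two measures usually called recall and precision, and point 8 notes that relevance is ultimately the user's judgment. The sketch below shows how the two measures could be computed once a user has marked which documents are relevant. The function name and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall from retrieved and user-judged relevant doc ids."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    # Precision: share of the retrieved documents that are relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of the relevant documents that were retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search result and hypothetical user judgments
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found, so neither goal is fully met.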
Advantages of Traditional IR
1. Clean formalism: it uses the Boolean operators AND, OR and NOT, which make
the queries more specific
2. Well understood
3. Good for numeric, bibliographic and structured data
4. Searches can be analyzed and revised strategically if they fail
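As an illustration of the Boolean formalism named in the list above, the following sketch builds a small inverted index (a mapping from each term to the set of documents containing it) and evaluates AND, OR and NOT as set operations. The index structure, the helper name and the sample documents are assumptions made for this example, not a description of any particular system.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


# Hypothetical collection, for illustration only
documents = {
    "d1": "information retrieval in libraries",
    "d2": "web information retrieval",
    "d3": "library reference service",
}
index = build_index(documents)

# information AND retrieval: documents containing both terms
both = index["information"] & index["retrieval"]
# web OR libraries: documents containing at least one of the terms
either = index["web"] | index["libraries"]
# information NOT web: documents containing the first term but not the second
without_web = index["information"] - index["web"]
# Exact matching: "library" does not match the word "libraries" in d1
exact = index["library"]

print(sorted(both), sorted(either), sorted(without_web), sorted(exact))
```

Because each operator is a plain set operation, a failed search can be inspected term by term and revised, although the last query shows how exact matching can miss a closely related word.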
Disadvantages of Traditional IR
1. Unfriendliness of Boolean formulas: a Boolean formula may refine the search, but it
may also cause confusion
2. Exact matching often means either low output or output overload
3. Keyword terms may be taken out of context
4. Not good for the end-user
Advantages of Web IR
1. Many tools are available: there are many search engines available on the Web
2. Hyperlinking: retrieved results provide links to other documents relevant to the
query
3. A huge number of pages have been indexed, so most queries return sufficient
results
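The text describes spiders or crawlers that collect pages, and hyperlinking is what lets them move from one page to the next. The sketch below imitates that idea with a breadth-first traversal over a small in-memory link graph. It makes no network requests, and the page names, the graph and the `crawl` function are all hypothetical.

```python
from collections import deque


def crawl(link_graph, seed):
    """Follow links breadth-first from a seed page; return pages in discovery order."""
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:  # skip pages that were already discovered
                seen.add(target)
                visited.append(target)
                queue.append(target)
    return visited


# Hypothetical link graph: each page maps to the pages it links to
link_graph = {
    "home": ["about", "catalog"],
    "about": ["home"],
    "catalog": ["item1", "item2"],
    "item1": ["catalog"],
    "item2": [],
    "orphan": ["home"],
}

order = crawl(link_graph, "home")
print(order)
```

A page that no other page links to ("orphan" in this toy graph) is never reached from the seed, which is one reason link structure matters for what ends up in an index.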
Disadvantages of Web IR
1. Web contents are usually heterogeneous and noisy and need careful treatment
2. Features are extracted from the corresponding query’s content instead of the
whole page
References
Chuang, S.-L., & Chien, L.-F. (2003). Automatic query taxonomy
generation for information retrieval applications. Online Information
Review, 27(4), 243-255. Retrieved July 17, 2009, from Academic Research
Library. (Document ID: 443413551).
Davies, B. (2004). Information Retrieval Models: Traditional: Understanding the
Logic Behind Systems Like Dialog. Retrieved July 8, 2009 from
http://www.slais.ubc.ca/COURSES/libr557/03-04-wt2/IRModels_trad/irmodels_trad.htm
Henzinger, M. Google Tutorial: Web Information Retrieval. Retrieved July 13, 2009
from http://www.tcnj.edu/~mmmartin/CMSC485/Papers/Google/icde.pdf
Langville, A. N. and Meyer, C. D. (2006). Information Retrieval and Web Search.
Retrieved July 8, 2009 from http://www.cofc.edu/~langvillea/HLA.pdf
Yang, K. Information Retrieval on the Web. Retrieved July 6, 2009 from
http://74.125.93.132/search?q=cache:kiA6U4bgdNIJ:130.203.133.121:8080/viewdoc/download%3Bjsessionid%3DA773A599C6661B9643F01B40322B907B%3Fdoi%3D10.1.1.85.6202%26rep%3Drep1%26type%3Dpdf+information+retrieval+on+the+web+kiduk+yang&cd=4&hl=en&ct=clnk&gl=ph
