Web Dragons: Inside the Myths of Search Engine Technology
Ebook, 485 pages

Rating: 4.5 out of 5 stars

About this ebook

Web Dragons offers a perspective on the world of Web search and the effects of search engines and information availability on the present and future world.

In the blink of an eye since the turn of the millennium, the lives of people who work with information have been utterly transformed. Everything we need to know is on the web. It's where we learn and play, shop and do business, keep up with old friends and meet new ones. Search engines make it possible for us to find the stuff we need to know. Search engines — web dragons — are the portals through which we access society's treasure trove of information. How do they stack up against librarians, the gatekeepers over centuries past? What role will libraries play in a world whose information is ruled by the web? How is the web organized? Who controls its contents, and how do they do it? How do search engines work? How can web visibility be exploited by those who want to sell us their wares? What's coming tomorrow, and can we influence it?

As we witness the dawn of a new era, this book shows readers what it will look like and how it will change their world. Whoever you are: if you care about information, this book will open your eyes and make you blink.

  • Presents a critical view of the idea of funneling information access through a small handful of gateways and the notion of a centralized index--and the problems that may cause
  • Provides promising approaches for addressing the problems, such as the personalization of web services
  • Presented by authorities in the field of digital libraries, web history, machine learning, and web and data mining
  • Find more information at the author's site: webdragons.net
Language: English
Release date: Jul 27, 2010
ISBN: 9780080469096
Author

Ian H. Witten

Ian H. Witten is a professor of computer science at the University of Waikato in New Zealand. He directs the New Zealand Digital Library research project. His research interests include information retrieval, machine learning, text compression, and programming by demonstration. He received an MA in Mathematics from Cambridge University, England; an MSc in Computer Science from the University of Calgary, Canada; and a PhD in Electrical Engineering from Essex University, England. He is a fellow of the ACM and of the Royal Society of New Zealand. He has published widely on digital libraries, machine learning, text compression, hypertext, speech synthesis and signal processing, and computer typography.

    Book preview

    Web Dragons - Ian H. Witten

    SETTING THE SCENE

    The universe (which others call the Library) is composed of an indefinite and perhaps infinite number of hexagonal galleries, with vast air shafts between, surrounded by very low railings …

    Thus begins Jorge Luis Borges’s fable “The Library of Babel,” which conjures up an image not unlike the World Wide Web. He gives a surreal description of the Library, which includes spiral staircases that “sink abysmally and soar upwards to remote distances” and mirrors that lead the inhabitants to conjecture whether or not the Library is infinite (“… I prefer to dream that their polished surfaces represent and promise the infinite,” declares Borges’s anonymous narrator). Next he tells of the life of its inhabitants, who live and die in this bleak space, traveling from gallery to gallery in their youth and in later years specializing in the contents of a small locality of this unbounded labyrinth. Then he describes the contents: every conceivable book is here, “the archangels’ autobiographies, the faithful catalogue of the Library, thousands and thousands of false catalogues, the demonstration of the fallacy of those catalogues, the demonstration of the fallacy of the true catalogue….”

    Although the celebrated Argentine writer wrote this enigmatic little tale in 1941, it resonates with echoes of today’s World Wide Web. “The impious maintain that nonsense is normal in the Library and the reasonable is an almost miraculous exception.” But there are differences: travelers confirm that no two books in Borges’s Library are identical, in sharp contrast with the web, replete with redundancy.

    The universe (which others call the Web) is exactly what this book is about. And the universe is not always a happy place. Despite the apparent glut of information in Borges’s Library of Babel, its books are completely useless to the reader, leading the librarians to a state of suicidal despair. Today we stand at the epicenter of a revolution in how our society creates, organizes, locates, presents, and preserves information—and misinformation. We are battered by lies, from junk e-mail, to other people’s misconceptions, to advertisements dressed up as hard news, to infotainment in which the borders of fact and fiction are deliberately smeared. It’s hard to make sense of the maelstrom: we feel confused, disoriented, unconfident, wary of the future, unsure even of the present.

    Take heart: there have been revolutions before. To gain a sense of perspective, let’s glance briefly at another upheaval, one that caused far more chaos by overturning not just information but science and society as well. The Enlightenment in the eighteenth century advocated rationality as a means of establishing an authoritative system of knowledge and governance, ethics, and aesthetics. In the context of the times, this was far more radical than today’s little information revolution. Up until then, society’s intellectual traditions, legal structure, and customs were dictated partly by an often tyrannical state and partly by the Church—leavened with a goodly dose of irrationality and superstition. The French Revolution was a violent manifestation of Enlightenment philosophy. The desire for rationality in government led to an attempt to end the Catholic Church and indeed Christianity in France, as well as bringing a new order to the calendar, clock, yardstick, monetary system, and legal structure. Heads rolled.

    Immanuel Kant, a great German philosopher of the time, urged thinkers to have the courage to rely on their own reason and understanding rather than seeking guidance from other, ostensibly more authoritative, intellects as they had been trained to do. As our kids say today, “Grow up!” He went on to ask new philosophical questions about the present: what is happening right now? How can we interpret the present when we are part of it ourselves, when our own thinking influences the very object of study, when new ideas cause heads to roll? In his quest to understand the revolutionary spirit of the times, he concluded that the significance of revolutions is not in the events themselves so much as in how they are perceived and understood by people who are not actually front-line combatants. It is not the perpetrators, the actors on the world stage, who come to understand the true meaning of a revolution, but the rest of society, the audience who are swept along by the plot.

    In the information revolution sparked by the World Wide Web, we are all members of the audience. We did not ask for it. We did not direct its development. We did not participate in its conception and launch, in the design of the protocols and the construction of the search engines. But it has nevertheless become a valued part of our lives: we use it, we learn from it, we put information on it for others to find. To understand it we need to learn a little of how it arose and where it came from, who were the pioneers who created it, and what were they trying to do.

    The best place to begin understanding the web’s fundamental role, which is to provide access to the world’s information, is with the philosophers, for, as you probably recall from early university courses in the liberal arts, early savants like Socrates and Plato knew a thing or two about knowledge and wisdom, and how to acquire and transmit them.

    ACCORDING TO THE PHILOSOPHERS …

    Seeking new information presents a very old philosophical conundrum. Around 400 B.C., the Greek sage Plato spoke of how his teacher Socrates examined moral concepts such as good and justice, important everyday ideas that are used loosely without any real definition. Socrates probed students with leading questions to help them determine their underlying beliefs and map out the extent of their knowledge—and ignorance. The Socratic method does not supply answers but generates better hypotheses by steadily identifying and eliminating those that lead to contradictions. In a discussion about Virtue, Socrates’ student Meno stumbles upon a paradox.

    Meno: And how will you enquire, Socrates, into that which you do not know? What will you put forth as the subject of enquiry? And if you find what you want, how will you ever know that this is the thing which you did not know?

    Socrates: I know, Meno, what you mean; but just see what a tiresome dispute you are introducing. You argue that man cannot enquire either about that which he knows, or about that which he does not know; for if he knows, he has no need to enquire; and if not, he cannot; for he does not know the very subject about which he is to enquire.

    Plato, Meno, XIV 80d–e/81a (Jowett, 1949)

    In other words, what is this thing called search? How can you tell when you have arrived at the truth when you don’t know what the truth is? Web users, this is a question for our times!

    KNOWLEDGE AS RELATIONS

    Socrates, typically, did not answer the question. His method was to use inquiry to compel his students into a sometimes uncomfortable examination of their own beliefs and prejudices, to unveil the extent of their ignorance. His disciple Plato was more accommodating and did at least try to provide an answer. In philosophical terms, Plato was an idealist: he thought that ideas are not created by human reason but reside in a perfect world somewhere out there. He held that knowledge is in some sense innate, buried deep within the soul, but can be dimly perceived and brought out into the light when dealing with new experiences and discoveries—particularly with the guidance of a Socratic interrogator.

    Reinterpreting for the web user, we might say that we do not begin the process of discovery from scratch, but instead have access to some preexisting model that enables us to evaluate and interpret what we read. We gain knowledge by relating new information and experience to our existing model in order to make sense of our perceptions. At a personal level, knowledge creation—that is, learning—is a process without beginning or end.

    The American philosopher Charles S. Peirce (1839–1914) founded a movement called pragmatism that strives to clarify ideas by applying the methods of science to philosophical issues. His work is highly respected by other philosophers. Bertrand Russell thought he was “certainly the greatest American thinker ever,” and Karl Popper called him “one of the greatest philosophers of all time.” When Peirce discussed the question of how we acquire new knowledge, or as he put it, “whether there is any cognition not determined by a previous cognition,” he concluded that knowledge consists of relations.

    All the cognitive faculties we know of are relative, and consequently their products are relations. But the cognition of a relation is determined by previous cognitions. No cognition not determined by a previous cognition, then, can be known. It does not exist, then, first, because it is absolutely incognizable, and second, because a cognition only exists so far as it is known.

    Peirce (1868a, p. 111)

    What thinking, learning, or acquiring knowledge does is create relations between existing cognitions—today we would call them cognitive structures, patterns of mental activity. But where does it all begin? For Peirce, there is no such thing as the first cognition. Everything we learn is intertwined—nothing comes first, there is no beginning.

    Peirce’s pragmatism sits at the very opposite end of the philosophical spectrum to Plato’s idealism. But the two reached strikingly similar conclusions: we acquire knowledge by creating relationships among elements that were formerly unconnected. For Plato, the relationships are established between the perfect world of ideas and the world of actual experience, whereas Peirce’s relations are established among different cognitions, different thoughts. Knowing is relating. When philosophers arrive at the same conclusion from diametrically opposing starting points, it’s worth listening.

    The World Wide Web is a metaphor for the general knowledge creation process that both Peirce and Plato envisaged. We humans learn by connecting and linking information, the very activity that defines the web. As we will argue in the next chapters, virtually all recorded knowledge is out there on the web—or soon will be. If linking information together is the key activity that underlies learning, the links that intertwine the web will have a profound influence on the entire process of knowledge creation within our society. New knowledge will not only be born digital; it will be born fully contextualized and linked to the existing knowledge base at birth—or, more literally, at conception.

    KNOWLEDGE COMMUNITIES

    We often think of the acquisition of new knowledge as a passive and solitary activity, like reading a book. Nothing could be further from the truth. Plato described how Socrates managed to elicit Pythagoras’s theorem, a mathematical result commonly attributed to the eponymous Greek philosopher and mathematician who lived 200 years earlier, from an uneducated slave, which was an extraordinary feat. Socrates led the slave into discovering this result through a long series of simple questions. He first demonstrated that the slave (incorrectly) thought that if you doubled the side of a square, you doubled its area. Then he talked him through a series of simple and obvious questions that made him realize that to double the area, you must build the new square on the diagonal of the original one, which is not the same thing as doubling the side.
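
    The arithmetic behind the slave's mistake can be checked in a few lines. This is only a sketch: the symbols s (side of the original square), A (its area), and d (its diagonal) are introduced here for illustration and do not appear in the dialogue.

    ```latex
    \begin{align*}
    \text{original square:} \quad & A = s^2 \\
    \text{side doubled:} \quad & (2s)^2 = 4s^2 = 4A \\
    \text{square built on the diagonal:} \quad & d = \sqrt{s^2 + s^2} = s\sqrt{2}, \qquad d^2 = 2s^2 = 2A
    \end{align*}
    ```

    Doubling the side quadruples the area, whereas the square whose side is the original diagonal has exactly twice the area.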

    We can draw two lessons from this parable. First, discovery is a dialogue. The slave could never have found the truth alone, but only when guided by a master who gave advice and corrected his mistakes. Learning is not a solitary activity. Second, the slave reaches his understanding through a dynamic and active process, gradually producing closer approximations to the truth by correcting his interpretation of the information available. Learning, even learning a one-off fact, is not a blinding flash of inspiration but a process of discovery that involves examining ideas and beliefs using reason and
