
actually are.

(Notice that this book treats the word data as a plural noun - in common usage
you may often hear it referred to as singular instead.) If you have studied
computer science or mathematics, you may find the discussion in this chapter a
bit redundant, so feel free to skip it. Otherwise, read on for an introduction
to the most basic ingredient to the data scientist’s efforts: data.

A substantial amount of what we know and say about data in the present day
comes from work by a U.S. mathematician named Claude Shannon. Shannon worked
before, during, and after World War II on a variety of mathematical and
engineering problems related to data and information. Not to go crazy with
quotes, or anything, but Shannon is quoted as having said, "The fundamental
problem of communication is that of reproducing at one point either exactly or
approximately a message selected at another point."

This quote helpfully captures key ideas about data that are important in this
book by focusing on the idea of data as a message that moves from a source to a
recipient. Think about the simplest possible message that you could send to
another person over the phone, via a text message, or even in person. Let’s say
that a friend had asked you a question, for example whether you wanted to come to

you will appear at her house for dinner the next day. Assuming that message
gets through without being garbled or lost, you will have successfully
transmitted one bit of information from you to her. Claude Shannon developed
some mathematics, now often referred to as "Information Theory," that carefully
quantified how bits of data transmitted accurately from a source to a recipient
can reduce uncertainty by providing information. A great deal of the computer
networking equipment and software in the world today - and especially the huge
linked worldwide network we call the Internet - is primarily concerned with
this one basic task of getting bits of information from a source to a
destination.

Once we are comfortable with the idea of a "bit" as the most basic unit of
information, either "yes" or "no," we can combine bits together to make more
complicated structures. First, let’s switch labels just slightly. Instead of
"no" we will start using zero, and instead of "yes" we will start using one. So
we now have a single digit, albeit one that has only two possible states: zero
or one (we’re temporarily making a rule against allowing any of the bigger
digits like three or seven). This is in fact the origin of the word "bit,"
which is a squashed down version of the phrase "Binary
digIT." A single binary digit can be 0 or 1, but there is nothing stopping us
from using more than one binary digit in our messages. Have a look at the
example in the table below:

MEANING      2ND DIGIT   1ST DIGIT
No           0           0
Maybe        0           1
Probably     1           0
Definitely   1           1

Here we have started to use two binary digits - two bits - to create

(Try looking up the word "nybble" online!) A byte offers enough different
combinations to encode all of the letters of the alphabet, including capital
and small letters. There is an old rulebook called "ASCII" - the American
Standard Code for Information Interchange - which matches up patterns of eight
bits with the letters of the alphabet, punctuation, and a few other odds and
ends. For example the bit pattern 0100 0001 represents the capital letter A and
the next higher pattern, 0100 0010, represents capital B. Try looking up an
ASCII table online (for example, http://www.asciitable.com/) and you can find
all of the combinations. Note that the codes may not actually be shown in
binary because it is so difficult for people to read long strings of ones and
zeroes. Instead you may see the equivalent codes shown in hexadecimal (base
16), octal (base 8), or the most familiar form that we all use every day, base
10. Although
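
The two ideas on this page, a small code book built from two bits and the ASCII patterns for letters, can be tried out in a few lines of code. What follows is only a minimal sketch in Python, which is our choice for illustration and not something the text itself uses. The function names two_bit_codebook and ascii_forms are hypothetical, and the code pads each ASCII code to eight binary digits to match the patterns quoted above (standard ASCII itself needs only seven).

```python
from itertools import product

# Meanings taken from the table above, in the same row order.
MEANINGS = ["No", "Maybe", "Probably", "Definitely"]


def two_bit_codebook():
    """Pair every two-bit pattern (2nd digit, then 1st digit) with a meaning."""
    # product("01", repeat=2) yields 00, 01, 10, 11 in that order.
    patterns = ["".join(bits) for bits in product("01", repeat=2)]
    return dict(zip(patterns, MEANINGS))


def ascii_forms(ch):
    """Show one character's code in binary, hexadecimal, octal, and base 10."""
    code = ord(ch)  # numeric code for the character
    return {
        "binary": format(code, "08b"),  # padded to eight digits (an assumption)
        "hex": format(code, "x"),
        "octal": format(code, "o"),
        "decimal": code,
    }


if __name__ == "__main__":
    for pattern, meaning in two_bit_codebook().items():
        print(pattern, meaning)
    for letter in "AB":
        print(letter, ascii_forms(letter))
```

Running the script prints the four two-bit patterns next to their meanings, then the codes for capital A and capital B in each of the four number bases, so the binary column can be compared against the patterns 0100 0001 and 0100 0010 mentioned in the text.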
