You are on page 1of 357

Mathematics of the Secondary School

Curriculum, I (Math 151)


H. Wu

August 23, 2010

Hung-Hsi
c Wu, 2010

Structure of the Book (p. 3)


General Introduction (p. 4)
Suggestions on How to Read This Book (p. 6)

Chapter 1: Fractions (p. 8)


Chapter 2: Rational Numbers (p. 95)
Chapter 3: The Euclidean Algorithm (p. 146)
Chapter 4: Basic Isometries and Congruence (p. 167)
Chapter 5: Dilation and Similarity (p. 229)
Chapter 6: The Symbolic Notation and Linear Equations
(p. 273)

1
Mathematics of the Secondary School Curriculum, II and III
(Math 152-153) NOT INCLUDED HERE

Chapter 7: Linear and Quadratic Functions, and


Quadratic Equations
Chapter 8: Polynomial and Rational Functions
Chapter 9: Exponential and Logarithmic Functions
Chapter 10: Polynomial Forms and Complex Numbers
Chapter 11: Basic Theorems of Plane Geometry
Chapter 12: Ruler and Compass Constructions
Chapter 13: Axiomatic Systems
Chapter 14: Comments About 3-Dimensions

Chapter 15: Trigonometry


Chapter 16: The Concept of Limit
Chapter 17: The Decimal Expansion of a Number
Chapter 18: Length, Area, and Volume
Chapter 19: Derivative and Integral
Chapter 20: Exponential and Logarithmic Functions,
Revisited

2
Structure of the Book

Chapter 1




 Chapter 2 `
 ```
 ```
Chapter 4 h ```
```
hh hhhh
Chapter 5 Chapter 3


Chapter 6


Chapter 7



Chapter 8

aa

aa

aa


a

Chapter 11
Chapter 9 Chapter 10

aa
! !


aaa !!
!
!
!


a a !! !!

Chapter 12 Chapter 15 !!
!

!!
!

! !!

!
Chapter 13 Chapter 16 !

Chapter 14 Chapter 17

Chapter 18

Chapter 19

Chapter 20

The last section of Chapter 7 depends tangentially on Chapter 11.

The discussion of volume in Chapter 18 depends on Chapter 14.

3
General Introduction

This book is a school mathematics textbook for grades 612 written for teachers. It
covers the relevant mathematics outside of probability and statistics in a self-contained
manner and with particular emphasis on mathematical integrity. If there was something
you never understood as a student, you would likely find the explanation here. Formally, it
assumes only a knowledge of whole numbers. Informally, it also assumes a certain level of
mathematical maturity to make possible the explanation of all the relevant mathematical
issues. A teacher has to know all these explanations. The purpose of this book is to
provide you with the needed knowledge.

Because you will be a teacher, you have to approach this book with a different mindset
from all other textbooks. Reading the latter, you would likely congratulate yourself if you
achieve mastery over 90% of the material. But would a teacher in a math class who is
correct only 90% of the time be considered a good teacher? To be blunt, such a teacher is
a disaster. So your mission in reading this book should be nothing short of total mastery.
You are expected to know this material 100%. This is the standard you have to set
for yourself.
You have to approach this book differently in yet another respect. The standard as-
sumption in a math course is that if you can do all the homework problems, most of your
work is done. Think back on your calculus courses and you will understand how true this
is. In this course sequence, which is designed specifically for prospective teachers, your
emphasis cannot be just on doing the homework assignments. When you stand in front
of a classroom, what you will be talking about, most of the time, is not the exercises
at the end of each section but the materials in the exposition proper. For example, very
likely you will have to convince a class on geometry why the Pythagorean Theorem is
true. There are two proofs of this theorem in this book, one in Chapter 11 and the other
in Chapter 18. Yet on neither occasion is it possible to assign a problem that asks for a
proof of this theorem. There are problems that assess whether you know enough about
the theorem to apply it when the need arises, but how to assess whether you know how
to prove the theorem when the proofs have already been given in the text? It is therefore
up to you to achieve mastery of everything in the text itself. I may add that the most
taxing part of writing the text was in fact to do it in a way that allows you, as much as

4
possible, to adapt it for use in a school classroom with minimal change.

The order of presentation in this book follows the school curriculum as much as pos-
sible.1 When you read Chapter 1 on fractions, for instance, be aware that you are in
a sixth grade classroom and therefore, no matter how much algebra or geometry you
know, you have to use the language of sixth grade mathematics to explain yourself, as
any teacher would. Similarly, when you come to Chapter 6, you are developing algebra
from the beginning so that even the use of symbols becomes an issue. Therefore, temper
your explanations accordingly.

The main goal of this volume is to provide the necessary background for the teaching
of algebra. Getting all students to take algebra in grade eight is at present a national goal.
Compare the mathematical discussion in the National Mathematics Panels Conceptual
Knowledge and Skills Task Group Report:

http://www.ed.gov/about/bdscomm/list/mathpanel/report/conceptual-knowledge.pdf

Currently, school students are generally deficient in their knowledge of the two pillars that
support algebra: rational numbers and similar triangles; these two topics are the subject
of Chapters 12 and Chapters 45, respectively. Algebra begins in earnest in Chapter 6.

I am greatly indebted to Ted Slaman, Emiliano Gomez and Shari Lind Scott for many
corrections.

1
There are one or two exceptions. For example, the conversion of fractions to infinite decimals is
taught as a rote skill in middle school. However, this topic is taken up very late in this book only after
the concept of limit has been introduced because no explanation can be given any time earlier. There is
also a slight but inevitable deviation from the school curriculum in geometry; see the discussions in 5
of Chapter 4 as well as in Chapter 13.

5
Suggestions on How to Read This Book

The major conclusions in this book, like all mathematics books, are summarized into
theorems; depending on the authors (and other mathematicians) whims, theorems are
sometimes called propositions, lemmas, or corollaries as a way of indicating which
theorems are deemed more important than others (a formula or an algorithm is just
a theorem). This idiosyncratic classification of theorems started with Euclid around 300
B.C. and it is too late to do anything about it now. The main concepts of mathematics are
codified into definitions. Definitions are set in boldface in this book when they appear
for the first time; a few truly basic ones are even individually displayed in a separate
paragraph, but most of the definitions are embedded in the text itself so that you should
watch out for them.
The statements of the theorems as well as their proofs depend on the definitions, and
the proofs are the guts of mathematics.

I would like you to know everything in this book, of course. Please note that
when mathematicians talk about knowing something, they have in mind knowing all the
definitions and theorems by heart, knowing why the definitions are what they are, knowing
the proofs of all the theorems and the main idea of each proof, knowing why the theorems
are needed, knowing what the implications of the theorems are, and finally, knowing how
to apply the theorems in new situations if possible. At the very least, I want you to
know by heart all the theorems and definitions as well as all the main ideas of the proofs
because, if you dont, it is futile to talk about the other aspects of knowing. Therefore,
a preliminary suggestion to help you master the content of this book is for you to
copy out the statements of every definition, theorem, proposition, lemma, and
corollary, along with page references so that they can be examined in detail
if necessary,
and also to
summarize the main idea of each proof.
These are good study habits. When it is your turn to teach your students, be sure to pass
on these suggestions to them.

6
You should also be aware that reading a mathematics book is not the same as reading
a gossip magazine. You can probably flip through such a magazine in an hour, if not
less. But in this book, there will be many passages that require slow reading and re-
reading, perhaps many times. I cannot single out those passages for you because they
will be different for different people. We dont learn the same way. What is true under
all circumstances is that you should accept as a given that mathematics books make for
exceedingly slow reading. I learned this very early in my career. On my very first day as
a graduate student many years ago, a professor, who was eventually to become my thesis
advisor, was lecturing on a particular theorem in a newly published volume. He mentioned
casually that in the proof he was going to present, there were two lines in that book that
took him fourteen hours to understand and he was going to tell us what he found out in
those long hours. That comment greatly emboldened me not to be afraid of spending a
lot of time on any passage in my own reading. If you too get stuck in any passage of this
book, take heart, because you are supposed to.

7
Chapter 1: Fractions

Overview of Chapters 1 and 2 (p. 9)


1 Definition of a fraction (p. 13)
2 Equivalent fractions (p. 28)
3 Adding and subtracting fractions (p. 42)
4 Multiplying fractions (p. 54)
5 Dividing fractions (p. 64)
6 Complex fractions (p. 74)
7 Percent, ratio, and rate problems (p. 79)
Appendix (p. 93)

8
Overview of Chapters 1 and 2

We are going to develop a theory of fractions (positive rational numbers and 0)


and rational numbers that is suitable for use in upper elementary and middle school.
This theory is equally important for prospective high school teachers because most
students come to high school with a very defective understanding of fractions. High
school teachers who do not possess a knowledge of fractions that is accessible to school
students would be tremendously handicapped when they try to help their students.
From a broader perspective, every teacher must have a firm grasp of fractions and
rational numbers, because school mathematics as a whole is about rational
numbers. Real numbers are strictly the purview of college mathematics. To illustrate
this point, consider the following simple operation with irrational numbers:

2 5 (4 2) + ( 3 5)
+ =
3 4 4 3

In school mathematics, one does not explain what 2 and 45 are, much less the
3
meaning of adding the numbers on the left. By the same token, the meaning of the

product 3 5 on the right is even more of a mystery. In the tradition of school
mathematics, this difficulty has never been confronted honestly. Implicitly, however,
the way school mathematics deals with such arithmetic operations is to appeal to
what we call the Fundamental Assumption of School Mathematics (FASM),
which states that

if an identity or an inequality among numbers is valid


for all fractions (respectively, all rational numbers), then it is
also valid for all nonnegative real numbers (respectively, all
real numbers.)2

FASM plays a pivotal role throughout this book. For example, the above equality is
clearly patterned after the formula for the addition of fractions:
a c ad + bc
+ =
b d bd
2
We should call attention to the subtle point that the weak inequality is used in the statement
of FASM; see the end of 1 in Chapter 19 for an explanation. You will not see any mention of FASM
in the school mathematics literature, but we will prove it in Theorem 2 of Chapter 19, 1.

9
where a, b, c, d are whole numbers (bd 6= 0). One way or another, one can eventually
prove that the same equality also holds for all rational numbers a, b, c, d (bd 6= 0),
even if this is rarely done in standard texts. (See 5 of Chapter 2.) Therefore, by
FASM, the same equality is valid for all real numbers a, b, c, d (bd 6= 0), rational

or irrational. This is how we could let a = 2, b = 3, c = 5, and d = 4 to get

the previous result, regardless of the fact that 3 and 5 are no longer rational.
Clearly, FASM makes it mandatory that every school teacher, regardless of grade
level, acquire a firm grasp of fractions and rational numbers.
From a mathematical perspective, the rational numbers are among the simplest
mathematical structures. Let us briefly recall how it is handled in abstract alge-
bra. Given the integers Z, introduce an equivalence relation among ordered
pairs of integers (a, b) with b 6= 0, so that (a, b) (a0 , b0 ) iff ab0 = ba0 . Then the
equivalence class containing the ordered pair (a, b) is what we usually call the quo-
a
tient b . For example, 32 stands for the infinite set of ordered pairs of integers:
{(2, 3), (4, 6), (6, 9), . . .}. The collection of all such quotients is called the
field of rational numbers Q. In Q, we define addition + and multiplication by
a c ad + bc
+ =
b d bd
and
a c ac
=
b d bd
Since we are defining the addition and multiplication between two infinite sets of
numbers, as each of ab and dc is an infinite set of ordered pairs of integers, we need
to check that these definitions are well-defined, in the following sense. Consider,
for example, the second definition: if (a, b) (a0 , b0 ) and (c, d) (c0 , d0 ), then by
0 0
definition ab = ab0 and dc = dc 0 . Since the definition of multiplication would have
a0 c0 a0 c 0
= ,
b0 d0 b0 d 0
we must verify that
ac a0 c 0
= 0 0
bd bd
in order for the preceding definition of multiplication to make sense. It goes without
saying that such verifications are routine and these definitions are well-defined. Then

10
we proceed to further verify that + and are associative, commutative, and distribu-
tive, and that Q is a field under these operations. For the purpose of mathematics,
the fact that Q is a field is all that matters.
Now there are many reasons why this way of dealing with the rational numbers is
not suitable for use in grades 57, which is where fractions and rational numbers are
systematically taught in schools. Children around the age of twelve have little or no
conception of why a field is interesting or important, and they cant care less whether
fractions can be added or multiplied or not. They come to the concept of a fraction
through the common usage of parts of a whole, such as two-thirds of a glass of orange
juice. Furthermore, they come to the addition and multiplication of fractions through
their experience with whole numbers, so that + connotes combining things and
signifies repeated addition. The need of the school classroom therefore dictates that
fractions be introduced in a way that respects students prior experience with whole
numbers rather than the (to them) abstruse concerns about fields or the distributive
law. We cannot allow mathematicians favorite definitions to derail childrens learning
process and contribute to their mental disorientation.
What the foregoing discussion hints at is that school mathematics education is
not the teaching of straightforward mathematics. If engineering is the customization
of abstract scientific principles to meet human needs, then mathematics education
is Mathematical Engineering.3 In this case, we have to customize the abstract
theory of rational numbers to meet the needs of students around the age of twelve,
as well as the needs of their teachers. At the same time, the customization must be

consonant with mathematics in terms of reasoning and precision.

The tension between the need to be faithful to the basic principles of mathematics and
the need to make the mathematics grade-level appropriate will be a dominant theme
in this book. We must not forget that this is the basic charge of good engineering.
This chapter and the next give an exposition of rational numbers that is suitable
for use in a classroom of grades 57. In the process, we will also make occasional
references to the abstract definition of Q above to remind ourselves that what we are
3
For an extended discussion, see H. Wu, How mathematicians can contribute to K-12 mathematics
education, Proceedings of The International Congress of Mathematicians, Madrid 2006, Volume III,
European Mathematical Society, Zurich, 2006, 1676-1688. Also available at:
http://math.berkeley.edu/wu/ICMtalk.pdf

11
doing is nothing but an engineered version of Q. We start in this chapter with the
rational numbers which are 0. In the tradition of school mathematics, these are
called fractions.

In reading these two chapters, please keep in mind that the emphasis here will not
be on the individual facts or skills. For example, it is taken for granted that you are
entirely comfortable with identifying the fractions and rational numbers as certain
points on the x-axis (called the number line in school mathematics), and are fluent in
the four arithmetic operations with rational numbers. Rather, the emphasis is on the
reorganization of familiar facts and skills in a new way to form a logical and coherent
whole that is compatible with the learning processes of upper elementary and middle
school students. The hope is that, with the reorganization, you as a teacher will be
able to explain fractions to your students in a way that makes sense to them and to
you yourself. This is the first step toward establishing mathematical communication
between you and your students. For example, it may come as a surprise to you that
it is possible to develop the concepts of adding and multiplying fractions from the
standpoint that a fraction such as 23 can be identified with the interval [0, 32 ], or that
although the invert-and-multiply rule for the division of fractions is a definition of
division in the field Q, it is a theorem that requires proof in the context of school
mathematics.
There is the danger that, because the facts are so familiar to you, you will sleep-
walk through these two chapters. If you do, that would be a major mistake.
The teaching of fractions is the most problematic part of school mathematics because,
in the usual way it is done, there is hardly any valid definition offered and almost
nothing is ever explained. The resulting non-learning of fractions is not only a na-
tional scandal about the state of mathematics education, but also a major stumbling
block in students learning of algebra. The importance of fractions to the learning of
algebra is beginning to be recognized. See, for example, the report of the National
Mathematics Panel (2008),
http://www.ed.gov/about/bdscomm/list/mathpanel/report/conceptual-knowledge.pdf
See also the article, H. Wu, From arithmetic to algebra,
http://math.berkeley.edu/~wu/C57Eugene 2.pdf
For example, can you recall from your own K-12 experience if you were ever told

12
what it means to multiply 54 73 and, in addition, why is it equal to 53
47
? If not,
then it should give you incentive to do better when it is your turn to teach. What
you will learn in these two chapters, and perhaps all through this book, will be a
reorganization of the bits and pieces that you learned haphazardly in K12 into a
coherent body of knowledge, and your job is to make this knowledge accessible to
your students. You are being asked to become an advocate of a new way to teach
school mathematics: give precise definitions to all the concepts and explain every
algorithm and every skill. The purpose of this book is to optimize your chances of
success in this undertaking.

1 Definition of a fraction

Mathematics rests on precise definitions. We need a definition of a fraction, not


only because this is what mathematics demands, but also because children need a
precise mental image for fractions to replace the mental image of their fingers for
whole numbers.
We begin with the number line, which is the name in school mathematics for the
real line, i.e., the x-axis. So on a line which is (usually chosen to be) horizontal,
we pick a point and designate it as 0. We then choose another point to the right
of 0 and, by reproducing the distance between 0 and this point, we get an infinite
sequence of equi-spaced points to the right of 0. Think of this as an infinite ruler.
Next we denote all these points by the nonzero whole numbers 1, 2, 3, . . . in the usual
manner. Thus all the whole numbers N = {0, 1, 2, 3, . . .} are now displayed on the
line as equi-spaced points increasing to the right of 0, as shown:

0 1 2 3 4

A horizontal line with an infinite sequence of equi-spaced points identified with N


on its right side is called the number line. By definition, a number is just a point
on the number line. Note that except for the original sequence of equi-spaced points
which we have chosen to denote by 1, 2, 3, etc., most numbers have no names as yet.
The next order of business will be to name more numbers, namely, the fractions.

13
Now fractions are introduced to students in the upper elementary grades, and
their basic understanding of a fraction is that it is parts of a whole. The transition
from parts of a whole to a point on the number line has to be handled with care.
This is because, while the idea of placing a fraction such as 23 on the x-axis comes
naturally to you because you are familiar with coordinate systems and calculus, it is
anything but natural to sixth graders or beginning high school students. To them,
parts of a whole is an object, e.g., an area, a part of a pizza, an amount of water in
a glass, or a certain line segment, but not a point on a line. Therefore, the following
informal discussion is intended to smooth out this transition as well as prepare you
for the contingency of having to convince your students to accept a fraction as a
certain point on the number line.

We begin the informal discussion by considering a special case: how the fractions
with denominator equal to 3, i.e., 13 , 23 , 33 , 43 , etc., come to be thought of as a certain
collection of points on the number line. We take as our whole the unit segment
[0, 1]. The fraction 31 is therefore one-third of the whole, i.e., if we divide [0, 1] into
3 equal parts, 13 stands for one of the parts. One obvious example is the thickened
segment below:

0 1 2 3

Of course this particular thickened segment is not the only example of a part
when the whole is divided into 3 equal parts. Let us divide, not just [0, 1], but every
segment between two consecutive whole numbers [0, 1], [1, 2], [2, 3], [3, 4], etc.
into three equal parts. Then these division points, together with the whole numbers,
form an infinite sequence of equi-spaced points, to be called the sequence of thirds:

0 1 2 3

For convenience, we call any segment between consecutive points in the sequence
of thirds a short segment. Then any of the following thickened short segments is

14
one part when the whole is divided into 3 equal parts and is therefore a legitimate
representation of 13 :

0 1 2 3

0 1 2 3

0 1 2 3

The existence of these multiple representations of 31 complicates life and prompts


the introduction of the following standard representation of 31 , namely, the short
segment whose left endpoint is 0 (see the very first example above of a thickened
short segment). With respect to the standard representation of 13 , we observe that this
short segment determines its right endpoint, and vice versa: knowing this segment
means knowing its right endpoint, and knowing the right endpoint means knowing
this segment. In other words, we may as well identify the standard representation of
1
3
with its right endpoint. It is then natural to denote this right endpoint by 31 :

0 1 2 3
1
3

In like manner, by referring to the sequence of thirds and its associated short
segments, the fraction 53 , being 5 of these short segments, can be represented by any
of the following collections of thickened short segments:

0 1 2 3

15
0 1 2 3

0 1 2 3

Again, our standard representation of 53 is the first one, which consists of


5 adjoining short segments abutting 0. This standard representation is completely
determined by its right endpoint, and vice versa. Thus to specify the standard repre-
sentation of 35 is to specify its right endpoint. For this reason, we identify the standard
representation of 53 with its right endpoint, and proceed to denote the latter by 53 , as
shown.

0 1 2 3
5
3

In general then, a fraction m3 for some whole number m has the standard rep-
resentation consisting of m adjoining short segments abutting 0, where short seg-
ment refers to a segment between consecutive points in the sequence of thirds. Since
we may identify this standard representation of m3 with its right endpoint, we denote
the latter simply by m3 . The case of m = 10 is shown below:

0 1 2 3
10
3

We note that in case m = 0, 03 is just 0.

Having identified each standard representation of m3 with its right endpoint, each
point in the sequence of thirds now acquires a name, as shown below. These are
exactly the fractions with denominator equal to 3.

16
0 1 2 3
0 1 2 3 4 5 6 7 8 9 10 11
3 3 3 3 3 3 3 3 3 3 3 3

In terms of the sequence of thirds, each fraction m m


3 is easily located: the point 3 is
the m-th point to the right of 0. Thus if we ignore the denominator, which is 3, then
the naming of the points in the sequence of thirds is no different from the naming of
the whole numbers.

Of course the consideration of fractions with denominator equal to 3 extends to


fractions with other denominators. For example, replacing 3 by 5, then we get the se-
quence of fifths, which is a sequence of equi-spaced points obtained by dividing each
of [0, 1], [1, 2], [2, 3], . . . , into 5 equal parts. The first 11 fractions with denominator
equal to 5 are now displayed as shown:

0 1 2
0 1 2 3 4 5 6 7 8 9 10 11
5 5 5 5 5 5 5 5 5 5 5 5

Finally, if we consider all the fractions with denominator equal to n, then we


would be led to the sequence of n-ths, which is the sequence of equi-spaced points
resulting from dividing each of [0, 1], [1, 2], [2, 3], . . . , into n equal parts. The fraction
m
n is then the m-th point to the right of 0 in this sequence.
This ends the informal discussion.

We now turn to the formal definition of a fraction.


We will begin by making precise the common notion of equal parts. A segment
[a, b] is said to be of length k, for a number k if, when we slide [a, b] along the number
line until a is at 0, the right endpoint b lies over the number k. 4 In particular, the
unit segment has length 1. We say a segment [a, b] is divided into m equal parts
4
It will be observed that, insofar as the only numbers (i.e., points on the number line) which
have been given names thus far are the whole numbers, for now this k must be a whole number and
a segment can only have length equal to a whole number. But as we proceed to name the fractions,
the concept of length will be broadened to include fractions.

17
if [a, b] is expressed as the union of m adjoining, nonoverlapping segments of equal
length. A sequence of points is said to be equi-spaced if the segments between
consecutive points in the sequence are all of equal length.
Divide each of the line segments [0, 1], [1, 2], [2, 3], [3, 4], . . . , into 3 equal parts.
The totality of division points, which include the whole numbers, form a sequence
of equi-spaced points, to be called the sequence of thirds. By definition, the
fraction 31 is the first point in the sequence to the right of 0, 32 is the second point,
3
3
is the third point, and in general, m3 is the m-th point in the sequence to the right
of 0, for any nonzero whole number m. By convention, we also write 0 for 03 . Note
that 33 coincides with 1, 63 coincides with 2, 39 coincides with 3, and in general, 3m 3
coincides with m for any whole number m. Here is the picture:
0 1 2 3 etc.
0 1 2 3 4 5 6 7 8 9 10
3 3 3 3 3 3 3 3 3 3 3

The fraction m3 is called the m-th multiple of 31 . Note that the way we have
just introduced the multiples of 13 on the number line is exactly the same way that
the multiples of 1 (i.e., the whole numbers) were introduced on the number line. In
other words, if we do to 31 exactly what we did to the number 1 in putting the whole
numbers on the number line, then we would also obtain all the m3 s for all m N.
In general, if a nonzero n N is given, we introduce a new collection of points on
the number line in the following way. Divide each of the line segments [0, 1], [1, 2],
[2, 3], [3, 4], . . . into n equal parts, then these division points (which include the whole
numbers) form an infinite sequence of equi-spaced points on the number line, to be
called the sequence of n-ths. The first point in the sequence to the right of 0 is
denoted by n1 , the second point by n2 , the third by n3 , etc., and the m-th point in the
sequence to the right of 0 is denoted by m n
. By convention, n0 is 0.

Definition The collection of all the sequences of n-ths, as n runs through the
nonzero whole numbers 1, 2, 3, . . . , is called the fractions. The m-th point to the
right of 0 in the sequence of n-ths is denoted by m n
. The number m is called the
numerator and n, the denominator of the symbol m n
. By the traditional abuse of

18
language, it is common to say that m and n are the numerator and denomina-
tor, respectively, of the fraction m
n
. 5 By definition, 0 is denoted by n0 for any n.

Remark The thing to keep in mind is that we first identify a sub-collection of


numbers as fractions before giving them names. The symbol m n
, which denotes the
m-th point in the sequence of n-ths, is merely one way to signify the location of this
number. There will be other ways; see below. For example, as is well-known, both
2
4
and 12 refer to the same point on the number line, a phenomenon that will be ex-
plained in detail in the next section.

As before, we shall refer to m


n
as the m-th multiple of n1 . In the future, we will
relieve the tedium of always saying the denominator n of a fraction mn
is nonzero by
simply not mentioning it.

A few remarks about the definition are in order.


(A) In general, if m is a multiple of n, say m = kn, then it is self-evident that
n
n
= 1, 2n
n
= 2, 3n
n
= 3, 4n
n
= 4, and in general,

kn
= k, for all whole numbers k, n, where n > 0.
n
In particular,
m m
=m and =1
1 m
for any whole number m.

(B) For the study of fractions, the unit is of extreme importance. On the number
line, it is impossible to say which point is what fraction until the number 1 is fixed.
This means that we do not know what a fraction is, precisely, until we know what
the unit is. In a classroom situation, it is wise to always remind students that, before
they wave a fraction around, they had better make sure they know what it refers to:
does 31 mean a third of the volume of the liquid in a cup, or a third of the liquid
by height? Or, a fraction 57 could be five-sevenths of a bucket of water by volume,
5
The correct statement is of course that m is the numerator of the symbol which denotes the
fraction that is the m-th point of the sequence of n-ths, and n is the denominator of this symbol.
Needless to say, it takes talent far above the norm to talk like this.

19
five-sevenths of a pie by area when looked at from the top, or five-sevenths in dollars
of your life-savings. An example of a common error is to refer to a pizza as the unit
(the whole), and ask what fraction is represented by putting one of four pieces
below

'$

&%

together with one of the eight pieces as shown:

'$
@
@
@
&%
@

When the answer of 83 is not forthcoming, there is the usual bemoaning of students
not getting the concept of a fraction. Students would get it if, instead of saying
that the pizza represents 1, we tell them that the area of the pizza represents 1.
Students dont know how to put two shapes together to get a fraction, because they
have been misled by confusing instruction.
(C) We have been talking about the number line, but in a literal sense this is
wrong. A different choice of the line or even a different choice of the positions of the
number 0 and 1 would lead to a different number line. What is true, however, is that
anything done on one number line can be done on any other in exactly the same way.
In technical language, all number lines are isomorphic,6 and therefore we identify all
of them. Now it makes sense to speak of the number line.
(D) This way of approaching fractions implicitly assumes a foundational knowledge
of Euclidean geometry. Mathematically, this is not an issue as Euclidean geometry
can be developed independently of the concept of numbers. More important is the fact
that this approach is pedagogically sound because the amount of geometric knowledge
that is implicitly assumed of the students is nothing more than what every student
around the age of ten would naturally take for granted.
6
Corresponding to the fact that all complete ordered fields are isomorphic.

20
(E) Although a fraction is formally a point on the number line, the informal
discussion above makes it clear that on an intuitive level, a fraction m
n
is just the
m
segment [0, n ]. So in the back of our minds, the segment image never goes away
completely, and this fact is reflected in the language we now introduce. First, a
definition: the concatenation of two segments L1 and L2 on the number line is the
line segment obtained by putting L1 and L2 along the number line so that the right
endpoint of L1 coincides with the left endpoint of L2 :

L1 L2

Thus the segment [0, m n


] is the concatenation of exactly m segments each of length n1 ,
to wit, [0, n1 ], [ n1 , n2 ], . . . , [ m1
n
,mn
]. Because we identify [0, m
n
] with the point m
n
, and
1 1
[0, n ] with n , it is natural to adopt the following suggestive terminology to express
the fact that the segment [0, m n
] is the concatenation of exactly m segments each of
1
length n :
m 1
is m copies of concatenated
n n

(F) In school mathematics, the meaning of the equal sign is a subject that is
much discussed, mainly because the meaning of equality is never made clear. You
will find later on the traditional use of the word equivalent for fractions when equal
is meant. This adds to the confusion. For this reason, we make explicit the fact that
two fractions k` and m n are, by definition, equal if they are the same point on the
number line. We have already seen above, for example, that kn n
= k1 = k for any
n, k N.
(G) The definition of a fraction as a point on the number line allows us to make
precise the common concept of one fraction being bigger than another. First consider
the case of whole numbers. The way we put the whole numbers on the number line,
a whole number m is smaller than another whole number n (in symbols: m < n )
if m is to the left of n. We expand on this fact by defining a fraction A to be smaller
than another fraction B, (in symbols: A < B ) if A is to the left of B on the number
line:

21
A B

Note that in the standard education literature, the concept of A < B between frac-
tions is never defined, one reason being that if the concept of a fraction is not defined,
it is difficult to say one unknown object is smaller than another unknown object.
Sometimes the symbol B > A is used in place of A < B. Then we say B is
bigger than A.
(H) A final remark has to do with the symbol used to denote a fraction: 23 , 14 5
, k` ,
etc. Students are known to raise the issue of why use three symbols (k, `, and the
fraction bar ) to denote one concept. Remember that a fraction is a point on
the number line, no more and no less. The symbols employed are merely means to an
end: they serve to indicate where the points are. Thus the symbol 14 5
says precisely
14
that, if we look at the sequence of fifths, then 5 is the 14-th point in the sequence to
the right of 0. We clearly need every part of the symbol 14 5
, namely, the number 5,
the number 14, and the fraction bar in between, to describe the location of this point.
The need of 5 and 14 is obvious, and the role of the fraction bar is to separate 5
from 14 so that one does not confuse 14 5
with 145, for example.
Thus the symbol of a fraction is no more than a particular representation of the
real thing, namely, a point on the number line. This piece of information about frac-
tions should be clearly conveyed to students in grades 57.

There is a special class of fractions that deserves to be singled out at the outset:
those fractions whose denominators are all positive powers of 10, e.g.,
1489 24 58900
, ,
102 105 104
These are called decimal fractions, but they are better known in a more common
notation under a slightly different name, to be described presently. Decimal fractions
were understood and used in China by about 400 A.D., but they were transmitted to
Europe as part of the so-called Hindu-Arabic numeral system only around the twelfth
century. In 1593, the German Jesuit priest (and Vatican astronomer) C. Clavius
introduced the idea7 of writing a decimal fraction without the fraction symbol: just
7
See J. Ginsburg, On the early history of the decimal point, American Mathematical Monthly,
35 (1928), 347349.

22
use the numerator and then keep track of the number of zeros in the denominator (2
in the first decimal fraction, 5 in the second, and 4 in the third of the above examples)
by the use of a dot, the so-called decimal point, thus:

14.89, 0.00024, 5.8900,

respectively. The rationale of the notation is clear: the number of digits to the right of
the decimal point, the so-called decimal digits, keeps track of the power of 10 in the
respective denominators, 2 in 14.89, 5 in 0.00024, and 4 in 5.8900. In this notation,
these numbers are called finite or terminating decimals. In context, we usually
omit any mention of finite or terminating and just say decimals. Notice the
24
convention that, in order to keep track of the power 5 in 10 5 , three zeros are added

to the left of 24 to make sure that there are 5 digits to the right of the decimal point
in 0.00024. Note also that the 0 in front of the decimal point is only for the purpose
of clarity, and is optional.
You may be struck by the odd looking number 5.8900, because you are probably
more used to seeing 5.89 and have assumed all along that it is ok to omit the zeros
to the right of the decimal point. Before going any further with this thought, just be
aware that what you have taken for granted is the fact that the following two fractions
are equal:
58900 589
4
and
10 102
This fact is correct, but your job is not finished until you can prove that it is correct.
We will do that in the next section.

We conclude this section by giving some examples on locating fractions on the


number line. Let us start with 43 , for example. As usual, this is the fourth point to
the right of 0 in the sequence of thirds:

4
0 1 3

6 6 6 6 6

20 4
Question: Can you locate the fraction 15
? How is it related to 3
?

23
Next we consider the problem of locating a fraction such as 8417 , approximately,
84
on the number line, i.e., on the following line, where should 17 be placed, approxi-
mately?

0 1 2 3 4 5 6 7 8

The key idea will turn out to be the use of division-with-remainder for whole
numbers.8 First, look at the multiples of 17: 0, 17, 34, 51, 68, 85, . . . Thus the 68-th
1 1
multiple of 17 is 4 (because 68 = 4 17), and the 85-th multiple of 17 is 5 (because
84 68 85 1
85 = 5 17). Therefore, 17 lies between 17 (= 4) and 17 (= 5), and is just 17 shy of
5, i.e., 84
17
is the point on the number line which is 171
to the left of 5. In terms of
division-with-remainder, since 84 = (4 17) + 16, we have

84 (4 17) + 16
=
17 17

1
So if each step we take is of length 17 , going another 16 steps to the right of 4 will
get us to 17 . If we go 17 steps instead, we will get to 5. Therefore 84
84
17
should be quite
near 5, as shown:

0 1 2 3 4 5 6 7 8
6
84
17

In general, if mn
is a fraction and division-with-remainder gives m = qn + k,
where q and k are whole numbers and 0 k < n, then
m qn + k
= ,
n n
qn
and the position of mn
on the number line would be between q (= n
) and q + 1
(q+1)n qn+n
(= n , which is n ).

8
Division-with-remainder in school mathematics is what is usually called the division algo-
rithm in university algebra texts. There is a good reason why the latter is not used in school
mathematics: it would be too easily confused with the long division algorithm.

24
Caution: Because the above reasoning gives 84 17
16
as 17 beyond 4, most school text-
84 16
books would tell you that 17 = 4 17 . The latter is of course an example of a mixed
number. Similarly, the above m n
is supposed to be written as q nk . As a teacher, how-
ever, you should exercise self-control not to introduce mixed numbers at this point,
16
because (for example) the symbol 4 17 is more than a verbal shorthand for 1617
be-
yond 4. In the context of computations, 4 17 has a precise meaning, namely, 4 + 16
16
17
.
Inasmuch as the concept of fraction addition is rarely mentioned when the symbol
16
4 17 is introduced, confusion is the inevitable result. It may therefore be conjectured
that this confusion is what underlies the fear of mixed numbers.

Exercises 1.1

In doing these and subsequent exercises, please observe the following basic
rules:
(a) Use only what you have learned so far in this course (this is the situ-
ation you face when you teach).
(b) Show your work; the explanation is as important as the answer.
(c) Be clear. Get used to the idea that everything you say has to be un-
derstood.

1. Indicate the approximate position of each of the following on the number line, and
briefly explain why. (a) 186
7
. (b) 457
13
39
. (c) 350 . (d) 5.127.

2. Suppose the unit 1 on the number line is the area of the following shaded re-
gion obtained from a division of a given square into eight congruent rectangles (and
therefore eight parts of equal area).9
9
We will give a precise definition of congruence in Chapter 4, and will formally discuss area
in Chapter 17. In this chapter, we only make use of both concepts in the context of triangles
and rectangles, and then only in the most superficial way. For the purpose of understanding this
chapter, you may therefore take both concepts in the intuitive sense. If anything more than intuitive
knowledge is needed, it will be supplied on the spot, e.g., in 4 of this chapter.

25
Write down the fraction of that unit representing the shaded area of each of the
following divisions of the same square and give a brief explanation of your answer.
(In the picture on the right, two copies of the same square share a common side and
the square on the right is divided into two halves.)

@
@
@
@
@
@

3. With the unit as in problem 2 above, write down the fraction representing the area
of the following shaded region (assume that the top and bottom sides of the square
are each divided into three segments of equal length):

4. A text on professional development claims that students conception of equal


parts is fragile and is prone to errors. As an example, it says that when a circle is
presented this way to students

'$

QQ
&%

26
they have no trouble shading 23 , but when these same students
2
are asked to construct their own picture of 3
, we often see them
create pictures with unequal pieces:

'$

&%

(a) What kind of faulty mathematical instruction might have promoted this kind of
misunderstanding on the part of students? (b) What would you do to correct this
kind of mistake by students?

5. Ellen ate 13 of a large pizza with a 1-foot diameter and Kate ate 12 of a small pizza
with a 6-inch diameter. (Assume that all pizzas have the same thickness and that
the fractions of a pizza are measured in terms of area.) Ellen told Kate that since
she had eaten more pizza than Kate, 31 > 12 . Discuss all the mathematical mistakes
in Ellens assertion.

6. Take a pair of opposite sides of a unit square and divide each side into 478 equal
parts. Join the corresponding points of division to obtain 478 thin rectangles (we will
assume that these are rectangles). For the remaining pair of opposite sides, divide
each into 2043 equal parts and also join the corresponding points of division; these
lines are perpendicular to the other 479 lines. The intersections of these 479 and 2044
lines create 478 2043 small rectangles which are congruent to each other (we will
assume that too). What is the area of each such small rectangle, and why ? (This
problem is important for 4 below.)

7. [Review the above remark (B) on the importance of the unit before doing this
problem. Also make sure that you do it by a careful use of the definition of a fraction
rather than by some transcendental intuition you possess which cannot be explained
to your students.]
(a) After driving 148 miles, we have done only two-thirds of the driving for the
day. How many miles did we plan to drive for the day? Explain.

27
(b) After reading 180 pages of a book, I am exactly four-fifths of the way through.
How many pages are in the book? Explain.
(c) Alexandra was three quarters of the way to school after having walked 0.78
mile from home. How far is her home from school? Explain.

8. Three segments (thickened) are on the number line, as shown:


137
A B 25 C
3 4 5 6 7

It is known that the length of the left segment is 11 8


16 , that of the middle segment is 17 ,
and that of the right segment is 23
25 . What are the fractions A, B, and C? (Caution:
Remember that you have to explain your answers, and that you know nothing about
mixed numbers until we come to this concept in 3.)

9. The following is found in a certain third-grade workbook:

Each of the following figures represents a fraction:

Point to two figures that have the same fractions shaded.

If you are the third grade teacher teaching from this workbook, how would you change
this problem to make it suitable for classroom use?

2 Equivalent fractions

Recall that two fractions are equal if they are the same point on the number line.
We observed above that nkn
= k1 , as both are equal to k. The following generalizes

28
this simple fact.

m k
Theorem 1 Given two fractions and , suppose there is a nonzero whole
n `
number c so that
k = cm and ` = cn.
Then
m k
=
n `

Proof First look at a special case: why is 43 equal to 54


53 ? We have as usual the
following picture:
4
0 1 3

6 6 6 6 6 6

Now suppose we further divide each of the segments between consecutive points in
the sequence of thirds into 5 equal parts. Then each of the segments [0, 1], [1, 2],
[2, 3], . . . is now divided into 5 3 = 15 equal parts and, in an obvious way, we have
obtained the sequence of fifteenths on the number line:
4
0 1 3

6 6 6 6 6 6

The point 43 , being the 4-th point in the sequence of thirds, is now the 20-th point in
the sequence of fifteenths because 20 = 5 4. The latter is by definition the fraction
20 54 4 54
15 , i.e., 53 . Thus 3 = 53 .
The preceding reasoning is enough to prove the general case. Thus let k = cm
and ` = cn for whole numbers c, k, `, m, and n. We will prove that m k
n = ` . In other
words, we will prove:
m cm
=
n cn
m
The fraction n is the m-th point in the sequence of n-ths. Now divide each of the
segments between consecutive points in the sequence of n-ths into c equal parts. Thus

29
each of [0, 1], [1, 2], [2, 3], . . . is now divided into cn equal parts. Thus the sequence of
n-ths together with the new division points become the sequence of cn-ths. A simple
reasoning shows that the m-th point in the sequence of n-ths must be the cm-th
point in the sequence of cn-ths. This is another way of saying m cm
n = cn . The proof
is complete.

The content of Theorem 1 is generically referred to in school mathematics as the


theorem on equivalent fractions. As mentioned earlier, it is a tradition in school
mathematics to say that two fraction (symbols) k` and mn are equivalent if they are
equal, i.e., k` and mn are the same point. Thus Theorem 1 gives a sufficient condition
for two fractions ` and m
k
n
to be equivalent: if we can find a whole number c so that
k = cm and ` = cn. For brevity, Theorem 1 is usually stated as
m cm
=
n cn
for all fractions m
n and all whole numbers c 6= 0. In this form, Theorem 1 is called the
cancellation law for fractions: one cancels the c from numerator and denominator.
This is the justification for the usual method of reducing fractions, i.e., canceling a
common divisor of the numerator and the denominator of a fraction. Thus, 51 34 = 2
3

because
51 17 3 3
= =
34 17 2 2

Theorem 1 is the fundamental fact about fractions, and the reason can be easily
seen, as follows. Given two fractions m k
n and ` , we want to know when they are equal.
Theorem 1 gives a sufficient condition. On the other hand, this is not a necessary
condition, in the sense that the equality m k
n = ` does not imply that k = cm and
` = cn for some whole number c. For example, Theorem 1 shows that 23 = 21 14 (as
21 = 7 3 and 14 = 7 2), so that coupled with the preceding remark about 51
34 , we
have
21 51
=
14 34
However, there is clearly no whole number c so that c times 21 yields 51 and that the
same c times 14 yields 34. It turns out that, with a mild twist, Theorem 1 can be used

30
to give a necessary and sufficient condition for two fractions to be equal. Precisely:

Theorem 2 (Cross-Multiplication Algorithm) A necessary and sufficient


condition for two fractions k` and m
n to be equal is that kn = `m.

For later needs, we pause to note that there are several different but equally valid
ways to state Theorem 2. One way is to say that
k m
= if and only if kn = `m.
` n
Another says
k m
= is equivalent to kn = `m.
` n
A more symbolic way is
k m
= kn = `m
` n
No matter how the theorem is stated, all it says is that both of the following statements
are valid:
k m
First, ` = n implies kn = `m.
Second, kn = `m implies k` = m
n.

As is well-known, each is said to be the converse of the other.

Proof of Theorem 2 (I) We prove k` = m


n implies kn = `m.
By Theorem 1, k` = kn m `m k m
`n and n = `n . Because we are assuming ` = n , we
therefore have
kn `m
=
`n `n
1 1
What this says is that the kn-th multiple of `n is equal to the `m-th multiple of `n
.
This is possible only if kn = `m.
(II) We next prove kn = `m implies k` = m
n.
The hypothesis implies that
kn `m
=
`n `n

31
By Theorem 1, the left side is k` while the right side is m k m
n . Thus we have ` = n .
The proof of Theorem 2 is complete.

The education literature often mistakes Theorem 2 for a rote-learning skill with-
out knowing that, once a fraction has been clearly defined and the equality of two
fractions also clearly defined, Theorem 2 is something capable of being proved pre-
cisely. As a result of this misunderstanding, many students have been taught to
avoid using this theorem or have not been taught this theorem. In this book, we
explicitly ask you to make ample use of this useful result whenever the equality of
two fractions is discussed, and there will be many opportunities for you to do just that.

Two remarks about Theorem 2 are relevant. One is that sometimes Theorem 2
551
provides the only easy way to decide if two fractions are equal, e.g., 247 and 203
91 are
equal because 551 91 = 203 247. A second remark is that from the vantage
point of abstract algebra, the importance of Theorem 2 (and hence of Theorem 1) is
manifest because the cross-multiplication algorithm is exactly the equivalence relation
between ordered pairs of integers when fractions are defined as equivalence classes of
such ordered pairs: (a, b) (a0 , b0 ) if and only if ab0 = a0 b.

As an application of Theorem 1, we bring closure to the discussion in the last


section about the decimal 5.8900. Recall that we had, by definition,
58900
= 5.8900
104
We now show that 5.8900 = 5.89 and, more generally, one can add zeros to or delete
zeros from the right end of the decimal point without changing the decimal. Indeed,
58900 589 102 589
5.8900 = 4
= 2 2
= 2 = 5.89,
10 10 10 10
where the middle equality makes use of equivalent fractions. The reasoning is of
course valid in general, e.g.,
127 127 104 1270000
12.7 = = 4
= = 12.70000
10 10 10 105

From the proof of Theorem 2, we can extract a very useful statement about pairs
of fractions, which we call the Fundamental Fact of Fraction-Pairs (FFFP):

32
Any two fractions may be symbolically represented as two fractions with the
same denominator.

The reason is simple: if the fractions are m


n
and k` , then because of Theorem 1, we
have
m m` k nk
= and =
n n` ` n`
In terms of the new fraction symbols, they now share the denominator n`.

We can paraphrase FFFP this way: any two fractions can be put on equal footing,
in the following sense. Given k` and m n
as above. Assume ` 6= n. Then k` is k copies
of 1` , and m n
is m copies of n1 . Since 1` 6= n1 , it is difficult to get a sense of k` relative to
m
n
. It is like talking about 155 inches and 4 meters, one doesnt have a sense which
is longer until one expresses both measurements in terms of the same unit, e.g., an
inch. Then since 1 inch is 2.54 cm, 4 meters is 400 2.54 = 157.48 . . . inches. This
is how we can tell 4 meters is longer. In the same way, FFFP allows us to think of
k
`
and m n
as nk
n`
and m`
n`
, respectively. Therefore they become, respectively, nk copies
1
and m` copies of the same `n . Now we can make sense of these two fractions by
comparing nk and m`. For example, given 23 and 47 , we may replace them with 21 14

and 12 21
, respectively.
In special cases, such as 32 and 89 , then because 32 = 12 8
, the two fractions are
already put on equal footing without having to multiply their denominators.
There will be numerous applications of FFFP in subsequent discussions.

The power of Theorem 1 on equivalent fractions has not been fully exploited in
the school mathematics literature, but it should be. We give one illustration of this
fact. First, we have to give a precise meaning to a common expression, two-thirds
of something, or more generally, m
n
of something. What is meant by, for example,
I ate two-thirds of a pie? The truth is that we say this without thinking, but if
pressed, we would probably agree that this means if we look at the pie as a circular
disk and ignore its depth, and we cut it into 3 parts of equal area, then I ate 2
parts. Another example: what is meant by he gave three-fifths of a bag of rice to his
roommate? Most likely, he measured his bag of rice by weight and, after dividing
the bag of rice into 5 equal parts by weight, he gave away 3 parts. In each case, the

33
choice of the unit (area in the first and weight in the second) is implicit and depends
on the readers common sense. While the choice in each of these two cases is not
controversial, one can imagine that such good fortune may not hold out in general.
Consider the statement: I put away three quarters of the ham. This leaves a lot
of room for different interpretations. I could have cut the ham, length-wise, into four
parts of equal length. Or I could have cut it into four parts of equal weight. I could
even have managed to cut it into four parts so that each part had (approximately)
the same volume of meat. There is an even more pertinent example from real life.10
A man obtained a construction loan from a bank for his house, and it stipulated that
he should be at such a percentage of completion of the project at a certain point.
When that time came, the bank said he had not met his obligation. Whereupon, he
wrote to the bank: Percentage of completion by what measure? He explained that
if by, say, volume of materials used, then the bank might have been correct, but if by
sheer brute labor, then he was way ahead of schedule. The bank was flummoxed by
his response.
These examples illustrate the fact that statements about a fraction of something
could be ambiguous and, for the purpose of doing mathematics, the choice of the unit
of measurement must be made explicit at the outset. With this in mind, we proceed
to give a formal definition of a fraction of something. If we fix a unit of measure-
ment, then we will use the informal language of quantity, understood to be relative
to the unit, to mean a number on the number line where the number 1 is the given unit.

Definition Suppose a unit of measurement has been chosen. Then k` of a quan-


tity means the totality (relative to this unit) of k parts when that quantity is parti-
tioned into ` equal parts according to this unit.

The simplest quantity in the present context is that of the length of a segment.
In this case, the unit of measurement will always be understood to be the length.
Consider for example the case of
1
3 of 24
7

This is then the length of 1 part when the segment [0, 24


7
] is divided into 3 parts of
10
As related to me by my friend David Collins.

34
equal length. Now, 247
is 24 copies of 17 , and since 24 = 3 8, clearly [0, 24
7
] can be
divided into 3 equal parts so that each part is the concatenation of 8 copies of 17 .
Thus 13 of 24 8 24
7 is 7 . The key point here is that the numerator of 7 is divisible by 3.
Next, suppose we want
2
5 of 87

Now we have to divide [0, 78 ] into 5 equal parts and then measure the length of 2 of
those parts. But first things first: we have to divide 78 into 5 equal parts. Noting that
8 is not divisible by 5, we make use of equivalent fractions to force the numerator of
8
7
to be divisible by 5 by writing 87 = 58
57 . The numerator 5 8 is now divisible by
5, and so by retracing the preceding steps, we conclude that if [0, 78 ] is divided into 5
1 1
equal parts, each part would be the concatenation of 8 copies of 57 = 35 , i.e., each
8
part has length 35 . Two of these parts then have length 28 16 2 8
35 = 35 . Thus, 5 of 7 is
16
35 .
Pictorially, what we did was to sub-divide the segments between consecutive points
of the sequence of sevenths, as shown,
8
0 1 7

6 6 6 6 6 6 6 6 6
6

into 5 equal parts:


8
0 1 7

6 6 6 6 6 6 6 6 6
6

The unit segment is now divided into 5 7 = 35 equal parts, so that the new division
points furnish the sequence of 35-ths. The segment [0, 87 ] is now divided into 40 equal
parts by this sequence of 35-ths. Taking every 8th division point (in this sequence of
35-ths) then gives a division of [0, 78 ] into 5 equal parts. So the length of a part in the
8
latter division is 35 . (Of course, what we have done is merely to re-prove the theorem
on equivalent fractions in this particular case of 78 = 5857
.)

35
This way of exploiting equivalent fractions will be seen to clarify many aspects
of fractions, such as the interpretation of a fraction as division, or the concept of
multiplication; see below. It also allows us to solve word problems of the following
type.

Example Kate walked 25 of the distance from home to school, and there was still
4
9
of a mile to go. How far is her home to school?
We can draw the distance from home to school on the number line, with 0 being
home, the unit 1 being a mile, and S being the distance of the school from home.
Then it is given that, when the segment from 0 to S is partitioned into 5 equal parts,
Kate was at the second division point after 0:

0 Kate S
| {z }
4
9
mi

If we can find the length of one of these five segments, which for convenience will be
called the short segments, then the total distance from home to school would be 5
times that length. We are given that the distance from where Kate stands to S is 49
of a mile, and this distance comprises 3 short segments. If we can find out how long
a third of 49 of a mile is, then we would know the length of a short segment and the
problem would be solved. By the theorem on equivalent fractions,
4 34 34
= = ,
9 39 27
and this exhibits 49 as (3 4) copies of 27
1
. Therefore 4 1
copies of 27 4
(i.e., 27 ) is the
4 1
length of one third of 9 . The total distance from 0 to S is thus (5 4) copies of 27 ,
20
which is 20
27
. The distance from Kates home to school is 27 miles.

Remark This is one of the standard problems on fractions which is usually given
after the multiplication of fractions has been introduced and the solution method is
given out as an algorithm (flip over (1 52 ) to multiply 49 ). We can see that there
is in fact no need to use multiplication of fractions for the solution, and moreover, no
need to memorize any solution template. The present method of solution makes the

36
reasoning very clear.

Using the same reasoning, we now give a completely different interpretation of a


fraction. We will prove:
m
= the length of one part when a segment of
n
length m is partitioned into n equal parts

Recall that the original definition of m n


is m copies of n1 , which means to locate
m
n
, it suffices to consider the unit segment [0, 1], divide it into n equal parts and
concatenate m of these parts. The above statement, on the contrary, says that to
locate m n
, one should divide, not [0, 1], but [0, m] into n equal parts and take the first
division point to the right of 0. So the two are quite different statements.
The proof is simplicity itself. To divide [0, m] into n equal parts, we express
m = m1 as
nm
n
That is, [0, m] is nothing but nm copies of n1 . So one part out of the n equal parts
into which [0, m] has been divided is just m copies of n1 , i.e., m n.

The full significance of this assertion will emerge only after we reexamine the
meaning of division among whole numbers. This we proceed to do. Let m, n be whole
numbers and let m be a multiple of n, let us say m = kn for some whole number k.
Then m n is the number of objects in a group when m objects are partitioned into
n equal groups; clearly, there are k objects in each group, and therefore, m n = k.
In other words, m n is the length of one part when a segment of length m is
partitioned into n equal parts. (This is the so-called partitive meaning of division.)
This definition of division requires that we assume m is a multiple of n at the outset,
because we are doing division among whole numbers and must make sure that m n
comes out to be a whole number (i.e., k). Now if we are allowed to use fractions,
then m would no longer need to be a multiple of n, and the preceding definition of
m n (i.e., the length of one part when a segment of length m is partitioned into n
equal parts) makes sense for arbitrary whole numbers m and n.11 With this in mind,
11
In fact, m could even be a fraction.

37
we now define for two arbitrary whole numbers m and n, the general concept of the
division m n of two whole numbers m and n:

m n is the length of one part when a segment


of length m is partitioned12 into n equal parts.
m
The previous assertion on n
can now be rephrased as a

Theorem 3 For any two whole numbers m and n, n 6= 0,


m
=mn
n
This is called the division interpretation of a fraction. In school texts and
education writing, the meaning of m n when m is not a multiple of n is not made
explicit and, worse, the equality m
n
= m n is taken to be another meaning of a
fraction that students must memorize without benefit of explanation. When you
teach, please remember that there needs to be a clear definition for m n, and that
the statement about the equality of the two numbers m n
= m n is a theorem, i.e.,
something that can be explained logically.

As a result of the division interpretation of a fraction, we will retire the


division symbol from now on and use fractions to stand for the di-
vision among whole numbers.

As a final remark on the concept of k` of m n


, we prove a theorem that will be
useful for the consideration of fraction multiplication in 4. As motivation, recall the
fact proven above that 2114
= 51
34
. If m
n
is a fraction, then it would stand to reason that
21
14 of m 51 m
n = 34 of n
But this fact clearly needs a proof because, according to the definition of k` of m n
,
m
this says the concatenation of 21 parts when [0, n ] is divided into 14 equal parts has
the same length as the concatenation of 51 parts when the same [0, m n
] is divided into
34 equal parts. This is certainly not obvious. Moreover, we would also expect that if
m
n
=M N
, then also
12
To avoid the possibly confusing appearance of the word divide at this juncture, we have
intentionally used partition instead.

38
21
14 of m 51 M
n = 34 of N

This too needs a proof. All this is taken care of by the following theorem.

Theorem 4 If k` = K m M
L and n = N . Then
k
` of m K M
n = L of N

Proof We first prove that


k
` of m km
n = `n

Because m `m m 1
n = `n , we see that [0, n ] is `m copies of `n . Therefore if we divide
1
[0, m
n
] into ` equal parts, each part will be m copies of `n . Therefore if we concatenate
1
k of these parts, we get km copies of `n , i.e., we get km k
`n . By the definition of ` of
m
n, we have proved that k` of m km
n = `n .
In like manner, we have
K
L of M KM
N = LN

Hence, to prove the theorem, we must prove km KM


`n = LN . According to Theorem
2 (cross-multiplication algorithm), this would be the case if we can prove kmLN =
`nKM . In other words, we have to prove:

(kL)(mN ) = (`K)(nM )

By the assumption that k` = K


L and by Theorem 2, we have kL = `K. Similarly,
by the assumption that m M
n = N and by Theorem 2, we also have mN = nM .
Therefore (kL)(mN ) = (`K)(nM ), as claimed. The proof of Theorem 4 is complete.

Exercises 1.2

[Reminder] In doing these and subsequent exercises, please observe the


following basic rules:
(a) Use only what you have learned so far in this course (this is the situ-
ation you face when you teach).

39
(b) Show your work; the explanation is as important as the answer.
(c) Be clear. Get used to the idea that everything you say has to be un-
derstood.

1. Reduce the following fractions to lowest terms, i.e., until the numerator and
denominator have no common divisor > 1. (You may use a four-function calculator
to test the divisibility of the given numbers by various whole numbers.)
42 52 204 414 1197
, , , , .
91 195 85 529 1273
2. Explain each of the following to an eighth-grader, directly and without using The-
orem 1 or Theorem 2, by drawing pictures using the number line :
6 3 28 7 12 4
= , = , and = .
14 7 24 6 27 9
3. School textbooks usually present the cancellation law for fractions as follows.

Given a fraction m
n . Suppose a nonzero whole number k divides both m
and n. Then m mk
n = nk .

Explain to a seventh grader why this is true.

4. The following points on the number line have the property that the thickened
segments [A, 1], [B, 2.7], [3, C], [D, 4], [ 13
3
, E], all have the same length:

A B C D E

0 1 2 6 3 4 6 5
13
2.7 3

If A = 47 , what are the values of B, C, D, E ? Be careful with your explanations: we


dont know how to add or subtract fractions yet. (Rest assured that on the basis of
what has been discussed in this section, you can do this problem.)

8
5. (a) 73 is 11 7
of which number? (b) I was on a hiking trail, and after walking 10 of
a mile, I was 59 of the way to the end. How long is the trail? (c) After driving 18.5

40
miles, I am exactly three-fifths of the way to my destination. How far away is my
destination?

6. Explain to a sixth grade student how to do the following problem: Nine students
chip in to buy a 50-pound sack of rice. They are to share the rice equally by weight.
How many pounds should each person get? (If you just say, divide 50 by 9, that
wont be good enough. You must explain what is meant by 50 divided by 9, and
why the answer is 5 59 .)

7. (a) A wire 314 feet long is only four-fifths of the length between two posts. How
far apart are the posts? (b) Helena was three quarters of the way to school after
having walked 98 miles from home. How far is her home from school?

8. (a) 37 of a fraction is equal to 56 . What is this fraction? (b) m


n
of a fraction is equal
to k` . What is this fraction?

7
9. James gave a riddle to his friends: I was on a hiking trail, and after walking 12 of
5
a mile, I was 9 of the way to the end. How long is the trail? Help his friends solve
the riddle.

10. Prove that the following three statements are equivalent for any four whole
numbers a, b, c, and d, with b 6= 0 and d 6= 0:
a c
(a) = .
b d
a c
(b) = .
a+b c+d
a+b c+d
(c) = .
b d
(One way is to prove that (a) implies (b) and (b) implies (a). Then prove (a) implies
(c) and (c) implies (a).)

11. Place the three fractions 13 11 9


6 , 5 , and 4 on the number line and explain how
they get to where they are.

41
12. (a) For which fraction m m m+1 m
n is it true that n = n+1 ? (b) For which fraction n
is it true that m m+b
n = n+b , where b is a positive whole number?

13. Prove that between any two fractions A and C, there is a fraction B, i.e.,
A < B < C.

3 Adding and subtracting fractions


5 3
What is the meaning of + ?
7 8

This simple question, incredibly, is almost never answered in school mathematics.


What is usually done, after some vague statements about putting two fractions into a
common denominator, is to give a formula for the sum in terms of the lowest common
denominator of the two fractions in question. Now students in the upper elementary
grades have certain conceptions about adding numbers, because they have seen that
adding whole numbers is combining things. Yet the addition of fractions in terms
of the lowest common denominator fails to build on this sound conception, thereby
causing disruption in the learning process. We will provide the necessary corrective
measure by defining the addition of fractions as a direct extension of the addition of
whole numbers.
Consider, for example, the addition of 4 to 7. In terms of the number line, this
is just the total length of the concatenation of two segments, one of length 4 and the
other of length 7, which is of course 11, as shown.

0 4 11
| {z }
7

Similarly, if we have two whole numbers m and n, then m + n is simply the length
of the concatenation of the two segments of length m and n:

42
m n
| {z }
m+n

We are therefore led to the following definition of the sum of two fractions:

k m
Definition Given fractions k` and m n , we define their sum ` + n by
k m
+ = the length of two concatenated segments, one
` n
of length k` , followed by one of length m
n

k m
` n
| {z }
k m
`
+ n

It follows directly from this definition that the addition of fractions satisfies the
associative and commutative laws. (Cf. the Appendix at the end of this chapter for
a summary of these laws.)
It is also an immediate consequence of the definition that
k m k+m
+ =
` ` `
because both sides are equal to the length of k + m copies of 1` . More explicitly, the
left side is the length of k copies of 1` combined with m copies of 1` , and is therefore
the length of k + m copies of 1` , which is exactly the right side. This tells us that,
to compute the sum of two fractions with the same denominator `, one adds them as
if they are whole numbers, with the only difference being that instead of adding so
many copies of 1, we now add so many copies of 1` , as above.
Because of FFFP, the general case of adding two fractions with unequal denomi-
nators is immediately reduced to the case of equal denominators, i.e., to add
k m
+
` n
k kn m `m
where ` 6= n, we use FFFP to rewrite `
as `n
and n
as `n
. Then
k m kn `m kn + `m
+ = + =
` n `n `n `n

43
In a school classroom, it may not be a good idea to use FFFP in this peremptory
fashion. One may proceed instead as follows. The reason it is not obvious how to
compute the exact value of k` + m n
is that we are asked to add k copies of 1` to m
copies of n1 , which is similar to adding 5 inches to 3 meters. We cannot get an exact
answer to the latter until we can express inch and meter in terms of a common unit
such as centimeters, for instance. We know 1 in. = 2.54 cm. and of course 1 m. = 100
cm. Therefore
5 in. + 3 m. = (12.7 + 300) cm. = 312.7 cm.
So in the same way, we will do the addition of k` + m n
by first expressing both 1` and
1 1
n
in terms of a common unit, and the most obvious such unit is `n . Then 1` = `nn

1
(which is n copies of `n ), and n1 = `n
`
(which is ` copies of the same). Consequently,
k
`
is k copies of 1` , and is therefore kn copies of 1
`n
,
m
n
is m copies of n1 , and is therefore `m copies of 1
`n
.

Thus, k` + m 1
n is kn+`m copies of `n , i.e.,
kn+`m
`n , which is the same result as before.

It is clear, by the same reasoning, that if given m


n
and k` , there is a whole number D
that is a common multiple of both n and `, say D = `1 ` = n1 n, then the computation
of the sum k` + m n
can proceed as follows:

k m k`1 mn1 k`1 + mn1


+ = + =
` n D D D
Such a D is called a common denominator of the denominators n and `. We add
the perhaps superfluous comment that the most obvious common denominator is the
product of the denominators (of the two fractions in question), and this is the one we
use automatically.

The first application of fraction addition is the explanation of the addition al-
gorithm for (finite) decimals. For example, consider

4.0451 + 7.28

This algorithm calls for

44
() lining up 4.0451 and 7.28 by their decimal point,
() adding the two numbers as if they are whole numbers and get a whole
number, to be called N , and
() putting the decimal point back in N to get the answer of 4.0451+7.28.

We now supply the reasoning for the algorithm. First of all, we use the theorem on
equivalent fractions13 to rewrite the two decimals as two with the same number of
decimal digits, i.e., 4.0451 + 7.28 = 4.0451 + 7.2800. This corresponds to (). Then,
40451 + 72800
4.0451 + 7.28 =
104
113251
= (corresponds to ())
104
= 11.3251 (corresponds to ())

The reasoning is of course completely general and is applicable to any other pair of
decimals.

A second application is to get the so-called complete expanded form of a (finite)


decimal. For example, given 4.1297, we know it is the fraction
41297
104
But we have the expanded form of the whole number 41297:

41297 = (4 104 ) + (1 103 ) + (2 102 ) + (9 101 ) + (7 100 )


4
1103
We also know that, by equivalent fractions, 410
104 = 4, 1
104 = 10 , etc. Thus

1 2 9 7
4.1297 = 4 + + 2+ 3+ 4
10 10 10 10
This expression of 4.1297 as a sum of descending powers14 of 10, where the coefficients
of these powers are the digits of the number itself (i.e., 4, 1, 2, 9, and 7), is called
13
A little reflection would tell you that we are essentially using FFFP here.
14
Descending if you think of 10 1
as 101 , etc.

45
the complete expanded form of 4.1297. In the same way, a decimal 0.d1 d2 dn ,15
where each dj is a single-digit number, has the following complete expanded form:

d1 d2 dn
0.d1 d2 dn = + 2 + + n
10 10 10

A third application of fraction addition is to introduce the concept of mixed num-


bers. We have seen that, in order to locate fractions on the number line, it is an
effective method to use division-with-remainder on the numerator. With the avail-
ability of the concept of addition between fractions, we are now in a position to go
further than before, e.g.,

187 (13 14) + 5 (13 14) 5 5


= = + = 13 +
14 14 14 14 14
5 5
Thus the sum 13+ 14 , as a concatenation of two segments of lengths 13 and 14 , clearly
187
exhibits the fraction 14 as a point on the number line about one-third beyond the
5 5
number 13. The sum 13 + 14 is usually abbreviated to 13 14 by omitting the + sign
and, as such, it is called a mixed number. More generally, a mixed number is a
sum n + k` , where n is a whole number and k` is a proper fraction, and it is usually
k
abbreviated to just n ` . 16 This concept causes terror among students probably
because it is usually introduced in textbooks before the concept of the addition of
fractions is in place, with the result that deep confusion is built into the concept
itself. It is for the reason of avoiding this pitfall that we have postponed the intro-
duction of this concept until now. So just remember: a mixed number is a sum of a
whole number and a proper fraction. No more, and no less.

Before leaving the topic of adding fractions, it is time to bring closure to the
comment at the beginning of this section about the mathematical inappropriateness
of the usual formula for the addition of fractions in terms of their least common
denominator. Given k` and m n
, one is told how to add k` + m n
by first finding the
lowest common denominator of the fractions, which is by definition the LCM17 of the
15
The notation here is unfortunate: d1 d2 dn is not the product of d1 , d2 , . . . , dn .
16
The discussion of fractions and decimals seems to be rife with notational problems: please note
that n k` is not the product of n and k` .
17
Least common multiple. For a precise definition, see Chapter 3, problem 3 of Exercises 3.2.

46
denominators, say B, then letting B = `0 ` = n0 n for some whole numbers `0 and n0 ,
the sum of these two fractions is given as
k m `0 k n0 m `0 k + n0 m
+ = 0 + 0 =
` n `` nn B
First of all, this formula is sometimes offered as a definition of the sum of fractions,
and this is certainly unacceptable because it bears no resemblance to the intuitive
notion of addition as combining things. But even as a formula for addition, it is
no less objectionable because, as we have seen, it is not necessary to use the LCM of
` and n when the obvious multiple of both denominators, `n, is both adequate and
natural.18

Be sure to rid school mathematics of this way of defining the sum of fractions at ev-
ery opportunity, because this definition has caused great harm in school mathematics.

It remains to point out that if a simpler common denominator other than the
product of the denominators is available, then it would be usually advantageous to
use it. One simple illustration is that, if we are given 34 + 78 , then using 8 rather
than 4 8 as a common denominator for the addition visibly leads to a simpler
computation:
3 7 6 7 13
+ = + =
4 8 8 8 8
A more interesting example to illustrate the advantage of using a simpler common
denominators can be found in problem 8 of Exercises 1.3 at the end of this section.

We next wish to discuss the subtraction of fractions. We are handicapped by not


having negative fractions at our disposal, however, so that to compute k` m
n , we
must first make sure that m n
< k` . Recall from 1 that m k
n < ` means the point
m
n
is
to the left of the point k` on the number line:
m k
n `

18
Mathematical aside: The real mathematical objection is this. If finding the LCM of the two
denominators is necessary for the addition formula of two fractions, then addition cannot be per-
formed in the field of quotients of a domain unless the domain is (essentially) noetherian. However,
we know that such is not the case.

47
In practice, it may not be easy to tell, for example, which of 49 and 37 is bigger.
We need a general method for comparing fractions. Now FFFP comes to the rescue:
we rewrite both to have denominator 9 7, so that
4 28 3 27
= and =
9 63 7 63
1 1
Since the 28th multiple of 63 is to the right of the 27th multiple of the same 63 , we
see that 94 is bigger. This reasoning is perfectly valid in general, and we have the
following theorem, also called the cross-multiplication algorithm.

Theorem (Cross-Multiplication Algorithm) Given two fractions k` and m


n.
k m
Then ` < n is equivalent to kn < `m.

Here is the formal proof of the algorithm. By FFFP, we can rewrite k` and
m kn `m k m kn `m
n as `n and `n , respectively. If ` < n , then we have `n < `n . This means
1
the kn-th multiple of `n is to the left of the `m-th multiple of the same, so that kn
must be smaller than `m, i.e., kn < `m. Conversely, suppose kn < `m. Then the
1 1
kn-th multiple of `n is to the left of the `m-th multiple of `n , so that

kn `m
<
`n `n
k m
By the theorem on equivalent fractions, this becomes ` < n. The proof of the
theorem is complete.

This algorithm has a deceptively simple

1
Corollary ` > n is equivalent to `
< n1 .

` n 1
The reason is as follows. If ` > n, then `n > `n (the `-th multiple of `n is to the
right of the n-th multiple of the same), which by the theorem on equivalent fractions
is equivalent to n1 > 1` . To prove the converse, we essentially run the argument
backwards: if n1 > 1` , then `n
` n
> `n , which then is equivalent to ` > n.

48
Many teachers dismiss the need for any proof of this corollary. The common
thinking behind the dismissal is that, for small values of ` and n, e.g., 3 and 2, then
3 > 2 implies 31 < 12 , as the following picture clearly shows:
1 1 2
0 3 2 3 1

Therefore, we extrapolate this intuitive argument to include the general statement, to


the effect that, if ` > n, dividing [0, 1] into ` segments of equal length would give
a segment shorter than a segment obtained from the division of [0, 1] into a smaller
number of segments of equal length, namely, n. This kind of intuitive argument is
valuable for guiding students to the correct conclusion, but as far as mathematics is
concerned, one cannot be blind to the fact that it fails to bring conviction to assertions
such as
1 1
>
8594276 8594277
This is why we should teach students the correct proof using the cross-multiplication
algorithm in addition to the intuitive argument using small values of ` and n. More-
over, students in sixth or seventh grade should begin to learn how to reason mathe-
matically using the available facts, because they will need to do that in algebra.

The following observations about the comparison of fractions are useful and also
easy to prove. Let A, B, C, D be fractions. Then, using to stand for is
equivalent to as before, we have:

(1) A < B there is a fraction C so that A + C = B.


(2) A < B implies A + C < B + C for every fraction C.
(3) A < B and C < D implies A + C < B + D.

The proofs require nothing more than the drawing of the relevant pictures on the
number line and we will leave it as a problem at the end of this section.

k m
We can now define the subtraction of fractions: Suppose `
> n
, then a segment
of length k` is longer than a segment of length m
n
.

49
k m
Definition If k` > m n , then the subtraction ` n is by definition the
length of the remaining segment when a segment of length m
n is taken from one end
of a segment of length k` .

The same reasoning as in the case of addition, using FFFP, then yields
k m kn `m
=
` n `n
It is to be noted that this formula makes implicit use of the preceding cross-multiplication
algorithm: the subtraction of whole numbers in the numerator of the right side,
kn `m, does not make sense unless we know kn > `m, but this is guaranteed by
the fact that k` > mn
.

We wish to bring out the fact that subtraction is an alternate way of expressing
addition. Indeed, the definition of k` m
n
implies that
k m m k
( )+ =
` n n `
In other words, assuming k` > m
n , then a fraction A satisfies
k m
=A
` n
if and only if it satisfies
m k
A+ =
n `
Thus we may regard k` m n as the fraction that solves the equation for the unknown
m k
A: A + n = ` .
Although this alternate view seems to add nothing to the concept of subtraction,
the more abstract perspective does serve as a bridge to the definition of the division
of fractions (see 5).

The subtraction of mixed numbers reveals a sidelight about subtraction that may
not be entirely devoid of interest. Consider the subtraction of 17 25 7 34 . One can do
this routinely by converting the mixed numbers into fractions:
2 3 85 + 2 28 + 3 87 31 87 4 31 5 193
17 7 = = = = .
5 4 5 4 5 4 54 20

50
However, there is another way to do the computation:
2 3 2 3
17 7 = (17 + ) (7 + ).
5 4 5 4
Anticipating a reasoning that will be made routine when we come to the next chapter
on rational numbers, we rewrite the right side as (17 7) + ( 25 34 ). Now we
are stuck because 52 < 34 so that the subtraction on the right cannot be performed
according to the present definition of subtraction. Using an idea that is reminiscent
of the trading technique in the subtraction algorithm for whole numbers, we get
around this difficulty by computing as follows:
2 3 2 3
17 7 = (16 + 1 ) (7 + )
5 4 5 4
2 3
= (16 7) + (1 )
5 4
7 3
 
= 9+
5 4
13 13
= 9+ = 9
20 20
The whole computation looks longer than it actually is because we interrupted it
with explanations. Normally, we would have done it the following way:
2 3 2 3 7 3 13 13
17 7 = (16 7) + (1 ) = 9 + ( ) = 9 + = 9
5 4 5 4 5 4 20 20
exactly the same as before.

Finally, there is a similar subtraction algorithm for finite decimals that


allows finite decimals to be subtracted as if they were whole numbers provided they
are aligned by the decimal points, and then the decimal point is restored at the
end. The reasoning is exactly the same as the case of addition (of decimals) and will
therefore be left as a problem.

Exercises 1.3

1. (a) We have an algorithm for adding two fractions: k` + m


n =
kn+`m
`n . Now explain
p
to an eighth grader how to obtain an algorithm for adding three fractions k` + m
n + q.

51
Make sure you can justify the algorithm. (b) If a, b, c are nonzero whole numbers,
1
what is ab + bc1 + ac
1
? Simplify your answer as much as possible.

2. Show a sixth grader how to do the following problem by using the number line:
I have two fractions whose sum is 17
12 and whose difference (i.e., the larger one mi-
1
nus the smaller one) is 4 . What are the fractions? (We emphasize that no solution
of simultaneous linear equations need be used. The purpose of this problem is to
demonstrate the power of the number line in the teaching of school mathematics.)

3. Explain to a sixth grader how to get 5.09 + 7.9287 = 13.0187.

3
4. Compute 78 54 67
14 in two different ways, and check that both give the same
result. (Large numbers are used on purpose. You may use a four-function calculator
to do the calculations with whole numbers, and only for that purpose.)

5. (a) Which is closer to 27 , 13 or 21


5
? (b) Which is closer to 23 , 12 9
19 or 13 ?
(c) Which whole number is closest to the following sum?
12987 114
+
13005 51
(Dont forget to prove it!)

a1
6. Suppose a, b are whole numbers so that 1 < a < b. Which is bigger: a or
b1 a+1 b+1
b ? Can you tell by inspection? What about a and b ?

7. State the subtraction algorithm for finite decimals, and explain why it is true. (See
the discussion of the addition algorithm for finite decimals near the beginning of this
section.)

8. Explain to a fifth grader why 1.92 is bigger than 1.91987.

7
9. (a) 52 + 12 =? (b) Laura worked on a math problem for 35 minutes without
success. She came back and re-focused and got it done in 24 minutes. How much time
did she spend on this problem altogether, and what does this have to do with part (a)?

52
10. (a) We want to make some red liquid. One proposal is to mix 18 fluid ounces of
liquid red dye in a pail of 230 fluid ounces of water, and the other proposal is to mix
12 fluid ounces of red dye in a smaller pail of 160 fluid ounces of water. The question:
which would produce a redder liquid? Do this problem in two different ways. (b)
An alcohol solution mixes 5 parts water with 23 parts alcohol. Then 3 parts water
and 14 parts alcohol are added to the solution. Which has a higher concentration of
alcohol, the old solution or the new?

11. If n is a whole number, we define n! (read: n factorial) to be the product of all


the whole numbers from 1 through n. Thus 5! = 1 2 3 4 5. Also define 0!
n
to be 1. Define the so-called binomial coefficients k for any whole number k
satisfying 0 k n as !
n n!
=
k (n k)! k!
Then prove: ! ! !
n n1 n1
= +
k k k1

(For those who remember Pascals triangle, this formula describes the usual rule for
generating Pascals triangle: add two consecutive numbers in the (n 1)-th row to
get the number right below them in the n-th row. We will explain Pascals triangle
in Chapter 10.)

12. Prove each of the following statements for fractions A, B, C, and D:

(1) A < B there is a fraction C so that A + C = B.


(2) A < B implies A + C < B + C for every fraction C.
(3) A < B and C < D implies A + C < B + D.

13. Let ab be a nonzero fraction, with a 6= b. Order the following (infinite number
of) fractions: ab , a+1 a+2 a+3
b+1 , b+2 , b+3 , . . . (Caution: it makes a difference whether
a < b, or a > b.)

53
14. In the notation of problem 11, observe that each fraction n!
j , where n, j are
whole numbers and 1 j n, is actually a whole number. Find the following sum
and simplify your answer as much as possible:
1 1 1 1 1 1
100! + 100! + 100! + + 100! + 100! + 100!
1 2 3 98 99 100

4 Multiplying fractions

In the context of school mathematics, it is of vital importance that we give a


definition of fraction multiplication. The reason is that this concept is one of those
whose precise meaning seems to elude textbook authors and education researchers.
Recall that for whole numbers, multiplication is, by definition, just repeated addition:
3 5 means 5 + 5 + 5 and 4 17 means 17 + 17 + 17 + 17. Such a definition cannot be
literally extended to fractions, e.g., it makes little sense to define 47 12 as adding 21
to itself 47 times. Consequently, the presentation of fraction multiplication in school
mathematics is usually muddled, thereby forcing some educators to advocate extreme
measures to achieve any kind of understanding of this concept.19
We will do mathematics the traditional way by giving a precise definition and
then draw precise consequences.

k m
Definition The product of two fractions `
n
is by definition

the length of the concatenation of k of the parts


when [0, mn
] is partitioned into ` equal parts

Note that, according to the definition in 1 of m


n
of a quantity, we may rephrase
the definition as:
k m k m
= of a segment of length
` n ` n
Or, more simply, when the unit is understood to be the unit of length:
19
For example, some suggest that one must rethink this concept by finding multiplicative rela-
tionships between multiplicative structures. This is not helpful.

54
k m k m
= of
` n ` n
This definition poses a potential problem, and we should deal with it right away.
We first illustrate the problem with a simple example. Consider 12 34 . We know
that 12 = 24 and 34 = 15 20 , and therefore we expect that, no matter how fraction
multiplication is defined, we will have the equality
1 3 2 15
=
2 4 4 20
But what if the preceding definition does not yield this equality? More generally,
suppose
k K m M
= and =
` L n N
Then we need assurance that, according to the preceding definition of a product, we
will have
k m K M
=
` n L N
Theorem 4 of 2 gives precisely this assurance, so we know that the definition of a
product is, in the common mathematical language, well-defined.

It is by no means obvious from this definition of the multiplication of fractions


that k` m m k
n = n ` . In other words, we cannot assume at this point that
multiplication is commutative among fractions. However, this property will follow
immediately from the product formula to be established below.
We now give a motivation for this definition of a product. For whole numbers k
and m, the product k m may be interpreted as k1 m 1 , and the definition of the
latter then states that it is equal to

the length of the concatenation of k of the parts


when [0, m1 ] is partitioned into 1 equal part

In other words, if we interpret k m as a product of fractions according to the pre-


ceding definition, then it is the length of the concatenation of k segments of length
m, i.e., m + m + + m (k times), which coincides with the usual multiplication
k m as whole numbers. This definition of fraction multiplication is therefore con-
sistent with our philosophy that any definition for fractions should be modeled on

55
the same definition for whole numbers. Moreover, this definition is also grounded in
our everyday experience. Indeed, suppose we want to give away two-fifths of a ham
(by weight), and the ham weighs 14 78 lbs. Without thinking, we would calculate the
amount of ham given away as
2 7
14 pounds
5 8
On the other hand, two-fifths of a ham by weight is (by the definition of of in 2)
the amount in 2 parts when we cut the ham into 5 parts of equal weight. In terms
of the number line whose unit 1 represents 1 lb. of ham, the ham is represented by
the number 14 78 . Cutting the ham into five parts of equal weight is then the same as
partitioning the segment [0, 14 87 ] into 5 segments of the same length. Therefore the
weight of two-fifths of 14 78 lbs. of ham is exactly the length of 2 of these concatenated
parts. Thus the above definition of fraction multiplication is a faithful, and precise,
reflection of what is done in our daily life. At the risk of harping on the obvious, the
precision comes from the precise definition of of given in 2.

There are two immediate consequences of the definition of fraction multiplication.


The first one is a new way to look at division by a whole number. Recall that we
defined k ` for whole numbers k and ` (` 6= 0) to mean the length of one part when
[0, k] is partitioned into ` equal parts. (See Theorem 3 2.) Thus from the definition
of fraction multiplication, we see that
1
k`= k
`
More generally, we may define, as a direct extension of the partitive interpretation of
division among whole numbers, the division of a fraction A by ` to be the length
of one part when [0, A] is partitioned into ` equal parts. Then it follows from the
definition of fraction multiplication that 1` A is equal to A divided by ` .

Division by a whole number ` will henceforth be replaced by multiplication by 1` .

A second immediate consequence of fraction multiplication is that, if k is a whole


number, then
m m m
k = + +
n |n {z n}
k

56
In other words, the multiplication k m
n
retains the meaning of repeated addition: it
m
is just k copies of n .

We want to prove the following theorem.

k m km
Theorem 1 (Product Formula) =
` n `n

This formula, which is usually presented, in one fashion or another, as the defini-
tion of the product k` m
n
in school textbooks or professional development materials,
is in fact the central theorem about fraction multiplication in school mathematics.
As an immediate corollary, we have:

Corollary The multiplication of fractions is associative, commutative, and dis-


tributive.

(See the Appendix for a summary of these laws.) We will leave the detailed proof
of the Corollary to a problem at the end of the section.

Proof of Theorem 1 In the proof of Theorem 4 in 2, we already observed that


k
` of m km
n = `n , which then proves Theorem 1. However, this fact is so important that
we will reprove it here. Let us partition [0, m
n
] into ` equal parts. By the definition
of k` of m
n , it suffices to show that the length of the concatenation of k of these
km
parts is `n . Since
m `m
= ,
n `n

we see that m 1 m
n is `m copies of `n , or what is the same thing, ` copies of `n . There-
fore if [0, m
n
] is partitioned into ` equal parts, the length of one part in this partition
m 1 1
is `n , i.e., m copies of `n . The length of k concatenated parts is thus km copies `n ,
which is km `n
, by definition of a fraction. Theorem 1 is proved.

As is well-known, the product formula has numerous applications. One of the


simplest may be the explanation of the usual cancellation rule for fractions. For

57
example, we have
135 49 105
=
28 9 4

because we can cancel the 9s and 7s in the numerators and denominators. The
precise reasoning is the following:
135 49 135 49
= (product formula)
28 9 28 9
15 9 7 7
=
479
15 7
= (equivalent fractions)
4
105
=
4

The same reasoning of course proves that for any fractions ma k


n and `a we have

ma k mk
=
n `a n`

A more substantial application of the product formula is undoubtedly the following


interpretation of fraction multiplication in terms of area;20 this interpretation is as
basic as the definition (of fraction multiplication) itself. We will prove that the area
of a rectangle is equal to the product of its sides. Let us first review some basic
properties of area. We fix a unit square, i.e., a square whose sides all have length
1. The area of the unit square is by definition equal to 1. Congruent figures have the
same area. Therefore, if the unit square is partitioned into n congruent pieces, then
all these pieces have equal areas. This partition is then a division of the unit (area
of the unit square) into n equal parts; by the definition of the fraction n1 , the area
of each of these pieces is n1 . For example, each of the following shaded regions of the
unit square has area equal to 14 :
20
See footnote 10 in Exercises 1.1 after 1 concerning the concepts of area and congruence. You
may take both in the naive sense in the present discussion.

58
@
@
@
@
@
@

For each n = 1, 2, 3, 4, . . ., it is straightforward to get subsets of the unit square


with areas equal to 1, 12 , 31 , 14 , . . . Any of these subsets will be referred to as fractions
of the unit square. The area of a geometric figure which is paved by a finite number
of fractions of the unit square (where pave is used to mean the pieces overlap only
at their boundaries, and their union is the figure itself) is just the sum of the areas
of the latter.
The interpretation in question is now given in the following

Theorem 2 The area of a rectangle with sides of lengths k` and m


n is equal to

k m

` n

In school mathematics, this theorem has to serve as the complete statement of


area = length times width, i.e., we go only as far as fractions for the lengths of the
sides of the rectangle, but nothing more. See the discussion of FASM at the beginning
of the chapter. However, the area of a rectangle whose sides have arbitrary lengths
will be considered in Chapter 17.
Proof We first prove a special case: the area of a rectangle whose sides have
length 1` and n1 is 1` n1 . If ` = 2 and n = 3, we take a unit square and divide one
side into 2 halves and the other into 3 parts of equal length. Joining corresponding
points of the division then partitions the square into 6 congruent rectangles:

59
1
2

1
3

1
Each of the 2 3 (= 6) rectangles therefore has area equal to 23 . In particular,
the shaded rectangle has sides of length 12 and 13 , and its area is 23
1
, which is equal
1 1
to 2 3 , by the product formula.
We can now give the proof of the Theorem. We first prove it for the special
case that k = m = 1. Divide the two sides of a unit square into ` equal parts and n
equal parts, respectively. Joining the corresponding division points creates a partition
of the unit square into `n congruent rectangles, and therefore `n rectangles with the
same area.

` copies

1
? `
1
n
 n copies -

1
The area of the shaded rectangle is therefore `n , which is 1` n1 , by the product
formula. Since 1` and n1 are the lengths of the sides of the shaded rectangle, the proof
of the special case is complete.
The area of a general rectangle with sides of length k` and mn
can now be computed.
One side of such a rectangle consists of k concatenated segments each of length 1` , and
the other consists of m concatenated segments each of length n1 . Joining corresponding
division points on opposite sides leads to a partition of the big rectangle into km small
rectangles each of which has sides of length 1` and n1 .

60
6

k copies

1
? `
1
n
 m copies -

1
We have just seen that each of these small rectangles has area equal to `n . Since the
big rectangle is paved by km of these congruent small rectangles, the area of the big
rectangle is the sum of the areas of these small rectangles, and is therefore equal to
1 1 km
+ + =
|`n {z `n} `n
km

Thus we have proved that the area of a rectangle with sides of length k` and m
n
is km
`n
.
k m km
But by the product formula, the product ` n is also equal to `n . The proof of
the Theorem is now complete.

We round off the discussion of the multiplication of fractions with two remarks.
The first is the explanation of the usual multiplication algorithm for finite dec-
imals. Consider for example
1.25 0.0067
The algorithm calls for
() multiply the two numbers as if they are whole numbers by ignoring
the decimal points,
() count the total number of decimal digits of the two decimal numbers,
say p, and
() put the decimal point back in the whole number obtained in () so
that it has p decimal digits.
We now justify the algorithm using this example, noting at the same time that the
reasoning in the general case is the same:
125 67
1.25 0.0067 = 2
4
10 10

61
125 67
= (product formula)
102 104
8375
= (corresponding to ())
102 104
8375
= (corresponding to ())
102+4
= 0.008375 (corresponding to ())

A second remark is that there are two standard inequalities concerning multipli-
cation that should be mentioned: If A, B, C, and D are fractions, then:

(A) If A > 0, then AB < AC is equivalent to B < C.


(B) A < B and C < D imply AC < BD.

Both are obvious when we interpret fraction multiplication as the area of a rectangle.
See problem 2 immediately following.

Exercises 1.4

1. Do each of the following without calculators. (a) (12 32 12 32 12 23 ) (2 19


1 1
2 19
1 1 7
2 19 ) 26
= ? (b) ( 18 4 32 ) + (2 61 7
18
) 7
+ ( 18 3 16 ) = ? (c) 8 50
2
1250 12 = ?

2. Give detailed proofs of the following for fractions A, B, C, and D:

(A) If A > 0, then AB < AC is equivalent to B < C.


(B) A < B and C < D imply AC < BD.

3. Give a detailed proof of the Corollary to Theorem 1.

4. (a) Find a fraction q so that 28 21 = q 5 41 . Do the same for 218 17 = q 19 12 .


(b) Make up a word problem for each situation, and make sure that the problems are
not the same for both.

62
5. The perimeter of a rectangle is by definition the sum of the lengths of its four
sides. Show that given a fraction A and a fraction L, (a) there is a rectangle with area
equal to A but with a perimeter that is bigger than L, and (b) there is a rectangle
with perimeter equal to L but with an area that is less than A.

6. (a) 16 21 cups of liquid would fill a punch bowl. If the capacity of the cup is 9 13 fluid
ounces, what is the capacity of the punch bowl? Explain carefully. (b) A rod can be
cut into 18 85 of a short piece that is 3 14 inches long. How long is the rod? Explain.

7. How many buckets of water would fill a container if the the capacity of the bucket
is 3 31 gallons and that of the container is 7 21 gallons? (Caution: Getting an answer
for this problem is easy, but explaining it logically is not.)

8. Consider the following two numbers A and B, where


18
A is the length of the concatenation of 4 parts when [0, 29 ] is divided into
17 equal parts.
4
B is the length of the concatenation of 18 parts when [0, 17 ] is divided
into 29 equal parts.

Is A equal to B. Why?

9. Give a proof of the distributive law for the division of whole numbers, namely, let
k, m, n, be whole numbers, and let n > 0. Then

(m n) + (k n) = (m + k) n

10. (This is problem 9 in Exercises 1.2. Now do it again using the concept of fraction
multiplication.) James gave a riddle to his friends: I was on a hiking trail, and after
7
walking 12 of a mile, I was 95 of the way to the end. How long is the trail? Help his
friends solve the riddle.

11. Given two fractions. Their difference is 54 of the smaller one, while their sum is
equal to 28
15 . What are the fractions? (Hint: Use the number line.)

63
5 Dividing fractions

The overriding fact concerning the concept of division is that division is an alter-
nate, but equivalent way of expressing multiplication. This statement is more delicate
than most people realize because something quite similar to it, but no longer correct,
usually makes its way into most school textbooks, namely, division and multiplica-
tion are inverse operations. By the time we are done with the discussion of division,
you will probably understand why this is not correct.
We start with whole numbers.
We teach children that 369
= 4 because 4 is the whole number so that 4 9 = 36.
This then is the statement that 36 divided by 9 is the whole number which, when
multiplied by 9, gives 36. In symbols, we may express the foregoing as follows: 36 9
is by definition the number k which satisfies k 9 = 36. Similarly, 72 divided by
24 is the whole number n which, when multiplied by 24, gives 72, i.e., n 24 = 72.
Likewise, 84
7
is the whole number m which satisfies m 7 = 84, etc. In general, given
any two whole numbers a and b with b 6= 0, we always want the division ab to be the
whole number c so that cb = a, and this cannot happen if a is not a multiple of b
to begin with. This is why the precise formulation of the concept of division among
whole numbers is this:

Given whole numbers a and b, with b 6= 0 and a being a multiple of b, then


the division of a by b, in symbols ab , is the whole number c so that the
equality cb = a holds.21

Once we have this precise definition of division, we can assert the following:

(*) Given whole numbers a, b, and c, with b 6= 0, ab = c is equivalent to


a = bc.

This is not the same as the usual glib statement that multiplication and division
are inverse operations. The difference is that the latter statement is usually made
without first defining precisely what division means, and without explicating what an
inverse operation is, whereas statement (*) refers to a precise definition of division
21
This precise definition of division explains why division by 0 has no meaning, because if it had,
then for a nonzero whole number a, a0 is the whole number k so that k 0 = a. But the last
equation make no sense because the left side is 0 while the right side a is nonzero to start with.

64
so that ab = c already has a precise meaning. One cannot say a statement A is
equivalent to another statement B if one does not know what A or B means in the
first place.
The preceding definition of division among whole numbers is important for the
understanding of division among fractions because the latter is patterned after the
former, with one caveat. The definition of whole number division a b makes sense
only when a is a multiple of b. Our first task in approaching the division of fractions
is to remove this assumption. We have the following theorem.

Theorem 1 Given fractions A and B (B 6= 0), there is a unique (one and only
one) fraction C, so that A = CB.

Proof Let A = k` and B = m n


, then the fraction C defined by C = `m kn
clearly
satisfies A = CB. (Since B is nonzero, m is nonzero. Therefore `m 6= 0 and this
fraction C makes sense.) This proves that such a C exists. If there is another fraction
C 0 that satisfies A = C 0 B, then
k m
= C0
` n
Multiplying both sides by m yields `m = C . So C 0 = C, and the theorem is proved.
n kn 0

The proof of Theorem 1 shows explicitly how to get the fraction C so that CB = A:
If A = k` and B = mn
, then the proof gives C as

kn k n
C= = (\)
`m ` m
This fact will be useful below.
Despite the simplicity of the statement of the theorem, Theorem 1 itself is con-
ceptually subtle and may take some getting used to. It says, for example, that if a
fraction B is nonzero, then every fraction A is a fractional multiple of B, in the
sense that A = CB for some fraction C. (Note that, since we are no longer dealing
exclusively with whole numbers, the meaning of multiple has to be suitably modified.
In the future, if we want to indicate that there is a whole number C so that A = CB,
we will say explicitly that A is a whole number multiple of B.) Taking A = 1,
the theorem implies that there is exactly one fraction, which we will denote by B 1 ,

65
so that B 1 B = 1. We call this B 1 the inverse (or multiplicative inverse, to be
precise) of B. If B = m n
, then the proof of the theorem shows that B 1 = m n
. For
1
this reason, B is also called the reciprocal of B in the context of fractions. Using
this notation, the expression of C in (\) above can be rewritten as C = AB 1 . For
example, if A = 115
and B = 23 8
, then the C that satisfies CB = A is, according to
(\),
11 8 88
C = AB 1 = =
5 23 115
We now give the definition of fraction division. It is, word for word, the same as
the preceding definition of whole number division, with the exception that, thanks to
the theorem, there is no need to require that A be a fractional multiple of B.

6 0), then the division of A by B, or


Definition If A, B, are fractions (B =
A
the quotient of A by B, denoted by B , is the unique fraction C so that CB = A.

By the preceding theorem, the fact that there is always a unique such C is not in
doubt. In fact, we have seen that this C is just AB 1 . Thus we have, as a matter of
definition, that
A
= AB 1
B
If the given fractions are k` and m
n
, then the comment immediately following the proof
of the theorem implies that
k
` k n
m =
n
` m
This is the famous invert and multiply rule for the division of fractions. We see
that there is nothing mysterious to this rule: it is a simple consequence of the correct
definition of division.
As before, we note that, for fractions A, B, C, with B 6= 0,
A
the statement B
= C is equivalent to the statement A = CB.

This is the precise meaning of the statement that division is an alternate, but equiv-
alent, way of writing multiplication.

There is a subtle point about fraction division that must be cleared up. For a
fraction such as 75 , we have explained in what sense it is a division of 7 by 5 as whole

66
numbers at the end of 2. Yet 75 may also be regarded as the division of the fraction
7 by the fraction 5 (remember that each whole number is also a fraction). Are these
two concepts of division the same? We now show that they are. Indeed, let us denote
the division of 7 by 5 as whole numbers by the old notation 7 5 to avoid confusion.
Then 7 5 is the length of one part when [0, 7] is partitioned into 5 equal parts (see
the end of 2). It follows that the concatenation of 5 such parts would be [0, 7] and
therefore 5 (7 5) = 7, or,
7 = (7 5) 5
But this says that the number 7 5 is the number C which satisfies 7 = C 5. There-
fore, according to the uniqueness assertion in the definition of fraction division, this
C is equal to the division of the fraction 7 by the fraction 5, i.e., 7 5 has the same
meaning as fraction division between the two fractions 7 and 5. The same reasoning
serves equally well to show that if m, n are two whole numbers (n 6= 0), then the
meaning of m divided by n as whole numbers as defined at the end of 2 is the same
as the meaning of m divided by n as fractions.

The following is a typical application of the concept of fraction division in school


mathematics. Notice the difference between the usual presentation and the one given
here: we give the explicit reason why division has to be used.

Example A rod 43 83 meters long is cut into pieces which are 53 meters long. How
many such pieces can we get out of the rod?
If we change the numbers in this example to if a rod 48 meters long is cut into
pieces which are 2 meters long, how many such pieces can we get out of the rod?,
then there would be no question that we do the problem by dividing 48 by 2. So we
will begin the discussion by following this analogy and simply divide 43 83 by 35 and
see what we get:
43 38 1041 1
5 = = 26
3
40 40
We have used invert and multiply for the computation, of course. Now what does the
1
answer 26 40 mean? Remembering the definition of division, we see that the preceding
division is equivalent to
3 1 5
43 = 26
8 40 3

67
1 5
 
= 26 +
40 3
5 1 5
   
= 26 + (distributive law)
3 40 3
In other words, we have
3 5 1 5
   
43 = 26 +
8 3 40 3
The first term on the right, 26 53 , is the length of the concatenation of 26 segments
each of length 35 , and the second term on the right, 40 1
53 , is the length of a segment
1
which is 40 of 53 , by the definition of fraction multiplication. Thus the rod can be cut
into 26 pieces each of 35 meters in length, plus a piece that is only 40 1
of 53 meters. This
then provides the complete answer to the problem, and retroactively justifies the use
of division to do the problem.
Notice that the key to getting the correct answer is knowing the precise definition
of division (which allowed us to convert the division into a multiplication) and know-
ing the distributive law (which allowed us to arrive at a correct interpretation of the
1
answer 26 40 from the division).

You may find such an after-the-fact justification of the use of division to do


the problem to be unsatisfactory. There is in fact a logical reasoning that leads
inexorably to the conclusion that division should be used. We now present this
reasoning.
Let there be a maximum of K copies of 53 in 43 38 , where K is a whole number.
Then 43 38 K 53 is less than 53 (as otherwise K would not be the maximum
number of such copies). Denote 43 38 K 53 by r, then we may rewrite the
definition of r as
3 5 5
43 = (K ) + r, where 0 r < 3
8 3
Now, by the theorem at the beginning of this section, we may express r as a
multiple of 53 , i.e., there is a fraction m
n so that

m 5
r=
n 3
We notice that m n must be a proper fraction in the sense that m < n,
because r < 53 and r is m 5
n of 3 . Therefore substituting this value of r into the

68
above equation gives:
3 5 m 5
43 = (K ) + ( )
8 3 n 3
m 5
= (K + )
n 3
m m
Note that K + n is a mixed number (because n is a proper fraction), so we
have
3 m 5
43 = (K )
8 n 3
By the definition of division, we see that

m 43 83
K = 5
n 3

Of course if we know the mixed number K m n , then we would know the answer
to the problem, which is K. Therefore, the import of the preceding equation
is that, in order to find the maximum number of 53 s in 43 83 , we should do the
division:
43 38
5
3
m 1
Recall that, by the above calculation, K = 26 and n = 40 .
We have thus explained how one can give an a priori justification for the use
of division to solve this problem.

We now bring closure to the discussion of the arithmetic of finite decimals by tak-
ing up the division of decimals. The main observation is that the division of decimals
is reduced to the division of whole numbers. The following example is sufficient to
illustrate the general case:
2.18
0.625
becomes, upon using invert and multiply,
2180
2.180 103 2180
= 625 =
0.625 103
625

This reasoning is naturally valid for the division of any two finite decimals. According
to Theorem 3 of 2, the division of two whole numbers is just a fraction. Therefore
the general conclusion is that the division of any two finite decimals is equal to a
fraction.

69
The next step is to convert a fraction to a decimal. It turns out that in almost
all cases, a fraction is equal to an infinite decimal. Referring to Theorem 2 in 2
of Chapter 3 for the precise statement, we will be content here to explain, in the
special case of fractions whose denominators are a product of 2s and 5s, why one
can convert these fractions to finite decimals by long division. It suffices to give two
examples because they already embody the general reasoning.
Consider the fraction 2180
625 above. By the cancellation rule for the product of
fractions (see 4), we know that for any whole number k,

2180 10k
!
2180 1
= (])
625 625 10k

Because 625 = 54 , the exponent 4 suggests that for k = 4, the fraction on the right
side of (]),

2180 104
!
, ([)
625

is a whole number. The reason is that 104 = 24 54 , so that by the cancellation law
(2),
2180 104 2180 24 54
= = 2180 24 = 34880
625 54

Therefore with k = 4 in (]), we get


2180 1 34880
= 34880 4 = = 3.4880
625 10 104
where the last step is by the definition of a finite decimal.
We pause to reflect on the above reasoning. First of all, the case of k > 4 in (])
is immediately reduced to the case of k = 4, because

2180 10k 2180 104 10k4


!
1
=
625 10k 625 10k

Now the numerator of 10k4 /10k is 10 times itself k 4 times (dont forget k > 4)
while the denominator is 10 times itself k times, which is 4 more 10s than what is in
the numerator. Therefore
10k4 1
k
= 4
10 10

70
so that
2180 10k 2180 104
! !
1 1
k =
625 10 625 104
A second comment is that if k > 4 the fraction in ([) with the exponent 4 replaced
by k will continue to be a whole number, because
2180 10k 2180 104 10k4 2180 104
= = 10k4 = 34880 10k4
625 625 625
By (]), we see that with any whole number k 4, we have
2180 K
= ()
625 10k
where K is the whole number
2180 10k
!
K=
625
By Theorem 3 of 2, K can be obtained by the long division of 2180 10k by 625,
so that (on account of ()) when a decimal point is placed k digits from the right of
the quotient, we get a decimal equal to 2180
625 . We have thus retrieved the traditional
algorithm for converting a fraction to a decimal by long division, at least
for the special case where the denominator is a product of 2s and 5s.
We will quickly go through another example to firm up the ideas. Consider 15 32 .
5
Because 32 = 2 , we assert that
15 105
!
15 1
=
32 32 105
and that
15 105
!
is a whole number.
32
In fact, by long division, we find that the latter is 1500000
32 = 46875, so that
15 46875
= = 0.46875
32 105
5
The fact that 1510
32 is a whole number is because
15 105 15 55 25
= = 15 55 = 46875
32 25

71
Moreover, as before, for any whole number k > 5,

15 10k 15 105
! !
15 1 1
= k =
32 32 10 32 105

This then leads to the usual statement that we can convert 15


32 to a finite decimal by
performing the long division (15 10k ) 32 and then placing the decimal point k
digits from the right. The same reasoning proves the following

Theorem 2 Let m n be a fraction so that n is a product of 2s and 5s. Then for


a sufficiently large whole number k, the division of m 10k by n, i.e.,

m 10k
n
q
is a whole number q, and m
n is equal to the finite decimal 10k .

The general case of converting a fraction to an infinite decimal will be taken up


in Chapter 16 after the limit concept has been introduced.

Exercises 1.5

1. You want to cut pieces that are 1 31 inches long from a rod whose length is 85 12
inches. Explain to a sixth grader what is the maximum number of such pieces you
can get, and how many inches of the rod are left behind.

2. It takes 2 tablespoons of a chemical to de-chlorinate 120 gallons of water. Given


that 3 teaspoons make up a tablespoon, how many teaspoons of this chemical are
needed to de-chlorinate x gallons of water? (Assume that the amount of water, di-
vided by the amount of chemical needed to de-chlorinate this amount of water, is
a constant.) Caution: Dont even think about doing this problem by setting up a
proportion, because this procedure cannot be justified.

3. Let a, d be whole numbers, and let q and r be the quotient and remainder of a
divided by d. Let also Q be the fraction so that a = Qd. Determine the relationship

72
among Q, q, and r. (Those who are unsure of the meaning of division with remainder
can look up 1 of Chapter 3 below.)

4. The following is supposedly a different approach to the division of fractions:


k/`
We try to find out what m/n
could mean. Using equivalent fractions,
we get
k k k`n
` `
`n ` kn
m = m = m`n = ,
n n
`n n
`m
and therefore
k
` kn
m = .
n
`m
Is this correct?

5. (a) How many 1 13 s are there in 95 27 ? (b) How many blocks of 18 minutes are
there in 8 12 hours? Do it in terms of minutes, and then do it in terms of hours.
Compare.

12
6. (a) Explain to a sixth grader how to use long division to convert 3125 to a decimal.
3
(b) Do the same with 64 .

7. Do the following problem using only what we have done thus far: Two fractions x
3
and y satisfy xy = 10 and xy = 15
8
. What are x and y?

8. Let A and B be fractions and let AB = 0. If A 6= 0, prove that B = 0.

5
9. (a) 12 of a sack of rice is 8 23 the weight of 5 books. Each book weighs 2 12 lbs. How
much (in lbs.) does a sack of rice weigh? (b) A pizza parlor has a Learning Fractions
Special. Normally, it charges m m
n 8 dollars for n of a small pizza. During this
special sale, it sells 12 of a pizza for the usual price of 31 of a pizza.22 At the sales
price, how much would 8 23 small pizzas cost?

22
I got this idea from my friend David Collins. We believe that if all pizza parlors buy into this
idea, the national fractions achievement will improve.

73
1 1
10. (a) 1 1 =? = ? (b) If x, y are nonzero fractions, what is
(
2 3
+ 14 ) 1 1
(
2 2/3
+ 1
5/4
)
1
1 1 ? (This expression for x and y turns up often enough to merit a name: the
(
2 x
+ y1 )
harmonic mean of x and y.) (c) If x, y, u, v are nonzero fractions so that x < u
and y < v, prove that
xy uv
<
x+y u+v

5
11. (a) Use the number line to solve the following: If 13 of a number N exceeds a
third of N by 8, what is N ?

12. Show that there is a rectangle with area < 1 sq. cm and perimeter equal to 1,000
cm.

6 Complex fractions

Further applications of the concept of division cannot be given without introduc-


ing a certain formalism for computation about complex fractions, which are by
A
definition the fractions obtained by a division B of two fractions A, B (B > 0).23 We
A
continue to call A and B the numerator and denominator of B , respectively. Note
that any complex fraction B is just a fraction, more precisely, the fraction AB 1 , so
A

A C
all that we have said about fractions applies to complex fractions, e.g., if B and D
are complex fractions, then
A C A C
B D is B of D .
Such being the case, why then do we single out complex fractions for a separate
discussion? For an answer, consider a common example of adding fractions:
1.2 3.7
+
31.5 0.008
23
This is a confusing piece of terminology because it suggests that complex numbers are involved,
but they are not. Since this is the terminology in use in school mathematics and the confusion is
tolerable, we will go along. Such compromises are unavoidable.

74
Note that this is an addition of complex fractions because 1.2 = 12
10
, 31.5 = 315
10
, etc.
Now, the addition can be handled by the usual procedures for fractions because we
may invert and multiply to obtain
12 37
1.2 3.7 10 10
+ = 315 + 8
31.5 0.008 10 103

12 3700
= +
315 8
1165596
=
2520
Nevertheless, school students are taught to do the addition by treating the decimals
as if they were whole numbers and directly apply the addition algorithm for fractions
to get the same answer:
1165596
(1.2 0.008) + (3.7 31.5) 116.5596 10000 1165596
= = 2520 =
31.5 0.008 0.252 10000
2520

What this does is to make use of the formula k` + m n =


kn+m`
`n , which is valid
up to this point only for whole numbers k, `, m, n by letting k = 1.2, ` = 31.5,
m = 3.7, and n = 0.008, regardless of the fact that 1.2, 31.5, etc., are not whole
numbers. Because the simplicity of such a computation is so attractive, it gives us a
strong incentive to prove that the formula
k m kn + m`
+ = is also valid when k, `, m, n are fractions.
` n `n
Similarly, we would like to be able to multiply the following complex fractions as if
they were ordinary fractions by writing
0.21 84.3 0.21 84.3
=
0.037 2.6 0.037 2.6

regardless of the fact that the product formula k` m km


n = `n has only been proved for
whole numbers k, `, m, n.
So far we have talked about perhaps nothing more than a subjective preference
for formal simplicity in calculations with complex fractions. However, the need for
extending the usual formulas for ordinary fractions to complex fractions is real. For

75
example, suppose we consider the multiplication of so-called rational expressions in
a number x (see 1 of Chapter 6), e.g.,

x+1 7 (x + 1) 7
2
3 = 2
x 5 x +2 (x 5)(x3 + 2)

This x can take any value; in particular, suppose x = 43 . Then the left side becomes
a product of complex fractions:
3
4
+1 7
3 2
(4) 5 ( 34 )3+2

The fact that this product is equal to the right side, i.e., equal to

( 43 + 1) 7
(( 34 )2 5)(( 43 )3 + 2)

depends on the fact that the product formula k` m n


= km
`n
is valid for complex fractions,
i.e., for fractions k, `, m, n. Similar computations with rational expressions related to
addition or subtraction abound in the study of algebra. In other words, the validity
of the usual algebraic computations requires an extension of the usual formalism in
fractions to complex fractions.
It is considerations of this type that force us to take a serious look at complex
fractions.

Almost all existing textbooks allow computations with complex fractions to be


performed as if they were ordinary fractions without a word of explanation. As a
teacher, you should make every effort to correct this oversight in your classroom.

Here is a brief summary of the basic facts about complex fractions that figure
prominently in school mathematics: Let A, . . . , F be fractions, and we assume fur-
ther that they are nonzero where appropriate in the following. Then:

AC A
(a) Cancellation law: If C 6= 0, then BC = B .
16 7 16
17
Example: 52 7 = 5
2 .
3
17 3

76
A C
(b) B = D if and only if AD = BC .
A C
B
< D
if and only if AD < BC
4 13
5 2 4 16 2 13
Example: 2 < 16 because < .
3 3
5 3 3 2

A C (AD)(BC)
(c) B D = BD

1.2 3.7 (1.2 0.008) + (31.5 3.7)


Example: + = .
31.5 0.008 31.5 0.008

A C AC
(d) B D = BD

0.21 84.3 0.21 84.3


Example: = .
0.037 2.6 0.037 2.6

A C E A C A E
     
(e) Distributive law: B
D
F
= B
D
B
F
2 6 2 6
! ! !
0.5 3 7 0.5 3 0.5 7
Example: 4 + 8 = 4 + 8 .
1.7 5 9
1.7 5
1.7 9

Formulas (a), (b), and (d) are the generalized versions of the cancellation law, the
cross-multiplication algorithm, and the product formula, respectively, for ordinary
fractions. Formula (e) is nothing more than the usual distributive law stated in the
A C
context of complex fractions, as each of B , D , etc., is just a fraction. We call explicit
1.2 3.7
attention to the fact that (c) and (d) justify the above computations with 31.5 + 0.008
0.21
and 0.037 84.3
2.6
. Note also that it follows immediately from (a) that the cancellation
rule for fractions (see 4) continues to hold for complex fractions: CE D
A
BE AC
= BD , if
E 6= 0. For example,
125 125 125
8.7 = 8.7 =
26.1 3 8.7 3
One can give algebraic proofs of (a)(d) that are entirely mechanical: e.g., for (a),
let A = k` , B = m n
, C = pq , substitute these values into both sides of (a), invert and
multiply each side separately and verify that the two sides are equal. Do the same
for every other assertion. This way of proving (a)(d) would be correct, but it would
also not be particularly educational. We now explain a more sophisticated method

77
of proving (a)(e); it is one that you would use in a school classroom perhaps only
sparingly, but it is a piece of mathematics that is worth learning.
AC A AC A
Let us prove (a), i.e., BC = B . Let x = BC and y = B . We have to prove
AC
x = y. Since x = BC , by the definition of division, AC = xBC. Similarly,
A = yB, so that multiplying both sides by C gives AC = yBC. Comparing this
with AC = xBC, we see that we have expressed AC as a multiple of BC in two
AC
ways. Since BC 6= 0 (it is the denominator of BC ), Theorem 1 in 5 says that these
two ways are the same, i.e., x = y.
The proofs of the others can be safely left as exercises.
There will be no end of examples to illustrate the ubiquity of these formulas in
subsequent computations, but we can give an interesting application right away.

82
Example Give the approximate location of on the number line.
26 12
What we want to say, intuitively, is that 26 12 is more or less 26, and therefore the
given complex fraction is roughly 8226
4
, which is 3 26 , which is a little beyond 3 on the
number line. Here is one way to convert such intuitive feelings into solid mathematics.
(The ability to do such conversions is a basic part of mathematics learning.) We wish
to compare this clumsy complex fraction with an ordinary fraction, and there is no
better way to do that than replacing 26 12 with the whole numbers closest to it: 26
and 27. Clearly, 26 < 26 21 < 27, and since (intuitively) the smaller the denominator,
the bigger the fraction if the numerator is fixed, we expect
82 82 82
< 1 <
27 26 2 26

Having made this guess, we must prove it. Let us first prove the left inequality. By
(b) above, the inequality
82 82
< 1
27 26 2
is equivalent to 82 26 12 < 82 27, and this is true because 26 21 < 27 and because of
the fact that A < B implies AC < BC (see the end of 4). One proves in a similar
manner the other inequality:
82 82
1 <
26 2 26

78
Thus the given complex fraction is trapped between 82 27
82
and 26 1
, i.e., between 3 27 and
4 4 1
3 26 . Since both of the latter are less than 3 24 = 3 6 , the given complex fraction is
beyond 3 but to the left of 3 16 on the number line.

Exercises 1.6

1. Prove (b)(d), not by the mechanical procedure, but by employing the reasoning
used in the text to prove (a).

2. Explain, in as simple a manner as possible, approximately where the fraction


2
163 65
1 is on the number line. (This is a mathematical problem, which means that
54 27
you have to be precise even when you make approximations. If you need a model,
look at the Example immediately above and learn how to give both an upper and
lower bound for each approximation.)

3. Let A and B be fractions and B 6= 0. Prove that for any nonzero whole number j,
A A jA
+ + = .
|B {z B} B
j

A
4. Divide 98 into two parts A and B (i.e., A + B = 98) so that B = 67 .

5. Divide 72 into two parts A and B so that B


A
= 45 .

6. (a) Find the fraction K so that K 2.5 = 4 31 K ? (b) Find the fraction A so
A2.5
that 4.1A = 23 . (c) Can you interpret the statements in (a) and (b) geometrically?

7 Percent, ratio, and rate problems

The concepts of percent, ratio, and rate are among the most troublesome for
students. Concerning this phenomenon, what we know with certainty is that these

79
concepts have never been clearly explained in the mathematics education literature. If
something is never adequately taught, then it is difficult to even make a guess about
the root cause of the learning difficulty. Any hope of improvement must therefore
begin with a mathematically adequate presentation of the material in the school
classroom.
In this section, we supply precise definitions of the first two by making essential
use of complex fractions. We also explain why there is no need for any definition
of the third (rate), but that the concept of complex fractions makes possible a lucid
discussion of so-called rate problems. We begin with a precise definition of percent
in terms of complex fractions.

Definition A percent is a complex fraction whose denominator is 100.

N
By tradition, a percent 100 , where N is a fraction, is often written as N %. By
N
regarding 100 as an ordinary fraction, we see that the usual statement N % of a
quantity m n
is exactly N % m n
(see the discussions at the end of 2 and at the
beginning of 6).
Now, the following are examples of the three kinds of standard questions on per-
cents that students traditionally consider to be difficult:
(i) What is 5% of 24?
(ii) 5% of what number is 16?
(iii) What percent of 24 is equal to 9?
The answers are simple consequences of what we have done provided we follow the
5
precise definitions.24 Thus, (i) 5% of 24 is 5% 24 = 100 24 = 65 . For (ii), let
us say that 5% of a certain number y is 16, then again strictly from the definition
5
given above, this translates into (5%)y = 16, i.e., y 100 = 16. By the definition of
division, this says
16 100
y= 5 = 16 = 320
100
5
N
Finally, (iii). Suppose N % of 24 is 9. This translates into N %24 = 9, or 100
24 = 9.
24
Always remind your students that if they dont know definitions, they are not in a position to
do mathematics, in the same way that anyone who has no vocabulary is not in a position to write
novels.

80
Multiplying both sides by 100
24 , we have
900 75 1
N= = = 37
24 2 2
So the answer to (iii) is 37 12 %.
What we can conclude from this short discussion is that, if students have an
adequate background in fractions and have been carefully instructed in the use of
symbols, the concept of percent is straightforward and involves no subtlety. If this
kind of instruction has been implemented in the school classroom, then education
research would be in a position to shed light on what the real learning difficulties are.
Until then, we should concentrate on meeting the minimum requirement of mathe-
matics, which is to provide clear and precise definitions of all the concepts. Note
however that such a definition of percent cannot be given if the concept of a complex
fraction is not available.

Next we take up the concept of ratio, and it is unfortunately one that is encrusted
in excessive verbiage. It would be expedient, therefore, to begin with a short defini-
tion.

Definition Given two fractions A and B. The ratio of A to B, sometimes


A
denoted by A : B, is the complex fraction B .

In connection with ratio, there are some common expressions that need to be
made explicit. To say that the ratio of boys to girls in a classroom is 3 to 2 is
to say that if B (resp., G) is the number of boys (resp., girls) in the classroom, then
the ratio of B to G is 32 . Similarly, in making a fruit punch, the statement that the
ratio of fruit juice to rum is 7 to 2 means that we are comparing the volumes
of the two fluids (because the use of volume for measurement is understood in this
situation), and if the amount of fruit juice is A fluid ounces and the amount of rum
is B fluid ounces, then the ratio of A to B is 27 . And so on.
We will now work out some standard problems on ratios strictly using this defini-
tion. The clarity of the ensuing discussion, as well as the ease with which we dispatch
the problems, may serve as a persuasive argument for the definition.

81
Example 1 In a school auditorium with 696 students, the ratio of boys to girls
is 11 to 13. How many are boys and how many are girls?
Let the number of boys be B and the number of girls be G, then we are given
that BG
= 11
13
. Thus by the cross-multiplication algorithm, 13B = 11G. Let k be this
k k
common number, i.e., 13B = 11G = k, so B = 13 and G = 11 . Now we are
k k 24k
also given B + G = 696, so 13 + 11 = 696. This gives 143 = 696, and therefore
k
24k = 143 696, i.e., k = 29 143. Since B = 13 , we get B = 319. The value
k
of G can be obtained from either B + G = 696, or from G = 11 . In any case, G = 377.

A more sophisticated problem is the following.25

2
Example 2 Divide 88 into two parts so that their ratio is 3
to 45 .
Let the two parts be A and B. Then we are given that
2
A 3
= 4
B 5

Using invert-and-multiply on the right and simplifying, we get


A 5
=
B 6
By the cross-multiplication algorithm, 6A = 5B. Let s be the common value. Thus
6A = s and 5B = s, leading to A = 6s and B = 5s . Because A + B = 88, we have
s
6
+ 5s = 88, so that 11s
30
= 88, and therefore s = 240. It follows from 6A = s that
A = 40, and from 5B = s that B = 48. Thus the two parts are 40 and 48.

In this short discussion, we have intentionally used only one method to do both
examples to emphasize the simplicity of such problems. We leave to an exercise the
exploration of other methods of solution.

In school mathematics, the most substantial application of the concept of division


is to problems related to rate, or more precisely, constant rate. The precise definition
of the general concept of rate requires more advanced mathematics, and in any
25
This was a problem in a 1875 California Exam for Teachers, and it was mentioned in the well-
known address of Lee Shulman, Those who understand: Knowledge growth in teaching, Educational
Researcher 15 (1986), 4-14.

82
case, it is irrelevant whether we know what a rate is or not.26 What is relevant is
to know the precise meaning of constant rate in specific situations, and the most
common of these situations will be enumerated in due course. Among these, the
rate involving motion is what we call speed. Because this concept may be the most
intuitive, we proceed to discuss it in some detail.
Instead of giving, outright, a definition of what constant speed motion is, we
assume an intuitive understanding of the concept and use it to formulate, heuristically,
a correct definition of constant speed. We idealize the motion to be on the number
line and traveling to the right. We will present an argument that lends credence to
the fact that for the motion to be of constant speed, there must be a fixed constant
v, so that, if during a time interval of length t (hours, minutes, etc.) the distance
traveled is d (feet, miles, etc.), then regardless of what t is,

d
=v
t
This number v is what we call the speed of the motion and the unit of v will have
to be feet per second, feet per minute, miles per hour, etc. For definiteness, let us
say it is miles per hour, and we abbreviate it as mph as usual. We first assume that
d and t are only fractions. Then by the definition of division, the equation dt = v is
equivalent to
d = vt
for any t. We are therefore going to present a heuristic argument why, if a motion
has constant speed v, it is reasonable to have d = vt for any t.

The case that t is a whole number: From t = 0 to t = 1, the distance


traveled is v miles. From t = 1 to t = 2, the distance traveled is also v
miles, and in general, from t = n 1 to t = n for any whole number n,
the distance traveled is also v miles during that one hour. Thus during
the time interval of n hours (n still a whole number) from t = 0 to t = n,
the total distance traveled is v + v + + v (n times) miles, which is nv
26
Students of calculus beware! If you want to be able to teach school mathematics, you still need
to learn how to explain constant rate in an elementary manner without once mentioning a function
whose derivative is a constant. This is in fact an excellent example of why, no matter how much
advanced mathematics one knows, one must learn school mathematics in order to be a good teacher.

83
miles. Thus d = vt is certainly true if t is a whole number n.

The case that t is a unit fraction, i.e., t = n1 for some nonzero whole num-
ber n: Now fix a nonzero whole number n and we ask how much distance
is traveled in n1 of an hour. We know the distance traveled in one hour is
v miles. Since there are n time intervals of n1 of an hour in one hour, any
motion of constant speed should move the same distance in each of the
n time intervals. Thus the distance traveled in any time interval of n1 of
an hour should be the length of a part when [0, v] is divided into n equal
parts. By the division interpretation of a fraction, the length of such a
part is exactly nv miles. But nv = v n1 , so again, d = vt is correct when
t = n1 .

The case that t is any fraction: It remains to prove that the equality
d = tv is valid for any fraction value of t. Consider then a time interval
of length m m 1
n hours. Now n is m copies of n , and we already know that in
any n1 of an hour the motion travels nv miles. But
m 1 1
= + + ,
n |n {z n}
m

so the total distance d traveled in m


n hours must be
v v
+ + miles,
|n {z n}
m

which comes to mv
n miles (notice that we have made use of formula (c) of
complex fractions in 6). Since
mv m
=v ,
n n
we see that d = vt is true when t = m
n hours.

Now we have seen, heuristically at least, that for motion of constant speed v, the
distance traveled during a time interval of length t for any fractional value of t is

84
vt. As remarked above, this means dt = v for any fraction t. By FASM, we may
conclude that the equation dt = v must be valid for any motion of constant speed for
any positive number t. This gives us conviction to introduce the following definition.

Definition A motion is of constant speed v if, given any positive number t,


the distance traveled, d, during a time interval of length t is d = vt.

Equivalently, a motion is of constant speed if there is a fixed constant v, so that for


any positive number t, the distance d (feet, miles, etc.) traveled in any time interval
of length t (seconds, minutes, etc.) satisfies
d
=v
t
We now describe a characteristic property of motions of constant speed. For any
motion, we introduce the concept of the average speed over the time interval
from t1 to t2 , t1 < t2 , as
distance traveled from t1 to t2
t2 t1

What needs to be singled out is the fact that the term average speed by
itself carries no information, because we have to know the average speed
from a specific point in time t1 to another point in time t2 . In addition,
because the terminology (average) stimulates the conditioned reflex of
add two numbers and divide by 2, students need to put this conditioned
reflex in check. The added cognitive complexity associated with average
speed is thus something your students will not take to kindly when you
teach this concept, but it is nevertheless something you must impress on
them because mastering subtleties of this kind prepares them for higher
mathematics and science.
Now suppose we have a motion of constant speed along a straight line as above. Then
the distance traveled from time t1 to time t2 is just the distance traveled from time
0 to t2 minus the distance traveled from time 0 to t1 , i.e., vt2 vt1 . Therefore the
average speed of a motion of constant speed v over the time interval from t1 to t2 is
vt2 vt1 v(t2 t1 )
= =v
t2 t1 t2 t1

85
Therefore, a motion of constant speed v has the same average speed over any time
interval.
The reason we said this was a characteristic property of motions of constant
speed is that we are going to prove that if the average speed of a motion over any
time interval is always equal to a constant v, then it has constant speed v in the sense
of the above definition. Indeed, let d(t) be the distance traveled from time 0 to time
t, then the average speed over a time interval from t1 to t2 is

d(t2 ) d(t1 )
t2 t1
which is equal to v by assumption. Since we assume the motion is on a straight
line, the distance d traveled during the time interval of length t = t2 t1 is just
d(t2 ) d(t1 ). Therefore dt = v, and d = vt. Since this t is arbitrary, we have proved
the claim.

Remark One would gain a better understanding of the preceding discus-


sion from the point of view of calculus. In that context, constant speed
means the derivative of the distance function d(t) (which measures the
distance from the starting point at time t) is constant, let us say, equal to
v. Now such a function is equal to a linear polynomial, d(t) = vt + c, for
some constant c. Its derivative at t1 is thus
d(t) d(t1 ) (vt + c) (vt1 + c)
lim = lim = lim v = v
tt1 t t1 tt1 t t1 tt1

The last step makes it clear that there is no need to take limit in this case
because the difference quotient is already a constant. Therefore we rewrite
the preceding computation without using limit:

d(t) d(t1 )
= v
t t1
for this fixed constant v, and for any t. This explains why we can define
constant speed in terms of constant average speed.

86
In the language of school mathematics, speed is the rate at which the work of
moving from one place to another is done. There are other standard rate problems
which deserve to be mentioned. One of them is painting (the exterior of) a house.
The rate there would be the number of square feet painted per day or per hour. A
second one is mowing a lawn. The rate in question would be the number of square
feet mowed per hour or per minute. A third is the work done by water flowing out
of a faucet, and the rate is the number of gallons of water coming out per minute or
per second. In each case, the concept of constant rate can be precisely defined as in
the case of constant speed. For example, a constant rate of lawn-mowing can be
defined in one of two equivalent ways. One is to say that if A is the total area that has
been mowed after T hours, then there is a constant r (with unit square-feet-per-hour)
so that A = rT , and this equality is valid no matter what T is. The other is to define
0
the average rate of lawn-mowing from time T1 to time T2 as T A , where A0
2 T1
is the area mowed from time T1 to time T2 . Then the lawn is said to be mowed at
a constant rate if the average rate of the lawn being mowed over any time interval
[T1 , T2 ] is equal to a fixed constant.
However, textbooks over the years have developed an abstract kind of work
problems, which typically read as follows.

Example It takes Regina 10 hours to do a job, and Eric 12 hours. If they work
together, how long would it take them to get the job done?

The mathematical defects of such a problem are overwhelming. First, this problem
cannot be solved if Regina and Eric do not each work at a constant rate, yet the
assumption of constant rate is typically not mentioned. A second assumption is that,
somehow, Regina and Eric manage to do different parts of the job, and at the end
the two parts fit together perfectly to get the job done faster. If the nature of the
work is not made explicit, however, such an assumption would sorely tax a students
imagination. For example, suppose the job involved is driving from Town A to Town
B, and a student interprets working together to mean Regina and Eric sharing the
driving! A third serious defect is that the concept of constant rate becomes difficult
to formulate precisely when the job in question is not clearly specified. Indeed, the

87
average rate of work from time t1 to time t2 is by definition,
the amount of work done from t1 to t2
t2 t1
But the numerator has to be a number, and a student would have a hard time asso-
ciating the vague description of amount of work with a number. Such vagueness
interferes with the learning of mathematics.

Make sure that you will not damage your students learning with
these kinds of work problems. Convince the publishers that if such
problems are ever given, there should be an explicit understanding that the
work refers to something specific, such as mowing a lawn.

We now revisit the preceding problem by giving four different reformulations that
are mathematically acceptable:
(P1) Regina drives from Town A to Town B in 10 hours, and Eric in 12. Assuming
that each drives at the same constant speed, Regina from Town A to Town B, and
Eric from Town B to Town A, and that they drive on the same highway, after how
many hours will they meet in between?
(P2) Regina mows a lawn in 10 hours, and Eric in 12. Assuming that each mows
at the same constant rate, how long would it take them to mow the same lawn if they
mow together without interfering with each other?
(P3) Regina paints a house in 10 hours and Eric in 12. Assuming that each paints
at the same constant rate, how long would it take them to paint the same house if
they paint together without interfering with each other?
(P4) A faucet can fill a tub in 10 minutes, and a second faucet in 12. Assuming
that the rate of the water flow remains constant in each faucet, how long would it
take to fill the same tub if both faucets are turned on at the same time?

It should be recognized that all four problems are the same problem: if you can
solve one, you can solve them all. Let us give a solution of the first, (P1).
Regina Eric
-
A B
| {z }
d mi

88
We have to determine the speeds of Regina and Eric. We do not know the distance
between Towns A and B, so to facilitate thinking, let us say this distance is d miles.
d
Therefore Reginas speed vR satisfies d = 10vR , and we have vR = 10 mph. Similarly,
d
Erics speed vE is 12 mph. We have to find out how long it takes Regina and Eric
to meet, but again, to facilitate thinking, let us say Regina and Eric meet after T
hours. At the moment we do not know what T is, but the assumption of constant
speed guarantees that the distance Regina has driven in T hours is vR T = dT
10 miles.
Similarly, the distance Eric has driven after T hours is dT
12 miles. Since they meet
in between the towns, the total distance they have driven together after T hours is
exactly d miles. Therefore we have
dT dT
+ =d
10 12
By the distributive law (e) for complex fractions in 6, we have
1 1
 
dT + =d
10 12

Since d is just a number, multiplying both sides by the complex fraction d1 (and using
1 1
rules (a) and (d) of 6) gives ( 10 + 12 )T = 1. By the definition of division. we get
1 5
T = 1 1 =5 (hours)
10
+ 12
11

It may be instructive if we also solve problem (P2) for comparison.


Let the area of the lawn be A sq. ft. Because in 10 hours Regina can mow the
A
whole lawn, i.e., A sq. ft., her (constant) rate of lawn-mowing is, by definition, 10
A
sq. ft. per hour. Similarly, Erics rate of lawn-mowing is 12 sq. ft. per hour. Now
suppose the two together can finish mowing the lawn in T hours. If in T hours,
Regina mows R sq. ft., then by definition of constant rate, R A
T = 10 , and therefore,
R = AT AT
10 . Similarly, in T hours, Eric mows 12 sq. ft. Because they mow with no
interference from each other, the sum total of the areas they mow in T hours adds
up exactly to A, i.e.,
AT AT
+ =A
10 12
1 1 1
By the distributive law, AT ( 10 + 12 ) = A. Multiplying both sides by A , we get

89
1 1
T ( 10 + 12 )= 1, so that

1 5
T = 1 1 =5 (hours),
10
+ 12
11

exactly as before.

We give one more example of constant rate.

Example Tom and May drive on the same highway at constant speed. May
starts 30 minutes before Tom, and her speed is 45 mph. Toms speed is 50 mph. How
many hours after May leaves will Tom catch up with her?
We give two slightly different solutions. Suppose T hours after May leaves, Tom
catches up with May. In those hours, May has driven 45T miles. Since Tom does
not start driving until half an hour after May does, the total distance he travels
in that time duration is 50(T 21 ) miles. The two distances being equal, we get
45T = 50(T 21 ). By the distributive law, 45T = 50T 25. Adding 25 to both sides,
we get 45T + 25 = 50T , and so we get 25 = 5T after subtracting 45T from both sides.
Thus T = 5, i.e., 5 hours after May leaves, Tom catches up with her.
Another solution is to watch Toms car from Mays car. So starting at 21 hour
after she leaves, she sees Toms car coming from a distance of 45 12 = 22.5 miles.27
Let t measure the number of hours after Tom starts driving. In t hours, Reginas car
travels 45t miles, whereas Toms car travels 50t miles. Therefore, after t hours, Mays
observation is that Tom is 50t 45t = 5t miles closer to her car, which is the same as
saying that, Toms car as observed from Mays car travels 5t miles in t hours.
By definition of constant speed, Toms car is driving at a constant speed of 5 mph
when observed from Mays car. Since May is initially 22.5 miles away, it will take
Tom 22.55
= 4.5 hours to catch up. Since Tom starts 0.5 hours after May leaves, it
takes Tom 4.5 + 0.5 = 5 hours after May leaves to catch up with her.

Exercises 1.7

1. A hi-fi store sells a CD player for $225. The owner decides to increase sales by
27
She has omnidirectional vision.

90
not charging customers the 8% sales tax. Then he changes his mind and charges
customers $x so that, after they pay the sales tax, the total amount they pay is still
$225. What is x?

2. Helena drives from Town A to Town B at x mph, and drives back at y mph. What
is her average speed for the round trip? If the round trip takes t hours, how far apart
are the towns?

3. A high-tech stock dropped 45% of its value in June to its present value of $N. A
stock broker tells his clients that if the stock goes up by 60% of its present value,
then it would be back to where it was in June. Is he correct? If so, why? If not,
by what percent must the stock at its present value of $N rise in order to regain its
former value?

4. (a) Define precisely what it means for water to flow out of a faucet at a constant
rate. (b) A fully open faucet with a constant rate of water flow takes 25 seconds to
fill a container of 5 21 cubic feet. At the same rate, how long does it take to fill a tank
of 12 12 cubic feet? (Be careful with your explanation!)

5. A faucet with a constant rate of water flow fills a tub in 9 minutes. If the rate of
water flow increases by 10%, how long would it take to fill the tub?

6. Kate and Laura walk straight toward each other at constant speed. Kate walks
1 32 times as fast as Laura. If they are 2000 feet apart initially, and if they meet after
2 12 minutes, how fast does each walk?

7. Let A and B be two fractions so that 0 < A < B. (a) Find the midpoint C of
the segment [A, B], i.e., find C so that B C = C A. (b) Find the point D so that
the ratio of the length of [A, D] to that of [D, B] is 2 : 5. (c) (b) Find the point E
so that the ratio of the length of [A, E] to that of [E, B] is m : n, where m and n are
positive integers.

8. (Sixth-grade Japanese exam question) A train 132 meter long travels at 87 kilo-

91
meters per hour and another train 118 meter long travels at 93 kilometers per hour.
Both trains are traveling in the same direction on parallel tracks. How many seconds
does it take from the time the front of the locomotive of the faster train reaches the
end of the slower train to the time that the end of the faster train reaches the front
of the locomotive on the slower one?

9. In Examples 1 and 2 of this section, the same algebraic method is used to arrive
at the desired conclusion. Now solve these problems again, and more pictorially, by
making use of the number line.

10. Driving at her usual constant speed of v mph, Stefanie can get from A to B in 5
hours. Today, after driving 1 hour, she decides to speed up to a constant speed of w
mph so that she can finish the whole trip in 4 21 hours instead of 5. By what percent
is w bigger than v (compared with v)?

11. (a) Define precisely what it means for someone to paint a house at a constant
rate. (b) Max and Nancy working together can finish painting a house in 56 hours.
If Max paints the same house alone, it would take him 90 hours to get it done. How
long does it take Nancy to finish painting the house if she works alone? (Assume
each paints at a constant rate, and that when they paint together there is no mutual
interference.)

12. Alfred, Bruce, and Chuck mow lawns at a constant rate. It takes them 2 hours,
1.5 hours, and 2.5 hours, respectively, to finish mowing a certain lawn. If they mow
the same lawn at the same time, and if there is no interference in their work, how
long will it take them to get it done?

13. (a) How much money would be in an account at the end of three years if the
initial deposit was $93 and the bank pays an interest of 6% at the end of each year?
Assume that no money is ever withdrawn from the account so that, for instance, at
the beginning of the second year there are $93 plus the interest of the preceding year
in the account. (You may use a calculator, but write down the steps clearly.) (b)
And at the end of n years?

92
Appendix

In this appendix, we briefly recall the commutative and associative laws of addi-
tion and multiplication, and also the distributive law that connects the two. Some
standard consequences will also be discussed.
In the following, lowercase italic letters will be used to stand for arbitrary numbers
without further comment. Notice that we are intentionally vague about what num-
bers we are talking about. The fact is that Theorems 1 and 2 are valid for whole
numbers, integers, rational numbers, real numbers, and even complex numbers, and
these theorems will be used in such generality without comment for the rest of this
book. With this understood, the associative and commutative laws for addition
state that for any x, y, z, we always have

x + (y + z) = (x + y) + z

and
x + y = y + x,
respectively. A fairly tedious argument, one that is independent of the specific num-
bers x, y, z involved but is dependent formally only on these two laws, then leads
to the following general theorem. For everyday applications, this theorem is all that
matters as far as these simple laws are concerned:

Theorem 1 For any finite collection of numbers, the sums obtained by adding
them up in any order are all equal.

A similar discussion holds for multiplication. Thus the associative and commu-
tative laws for multiplication state that for any x, y, z, we always have

x(yz) = (xy)z

and
xy = yx,

93
respectively. And, in like manner, we have:

Theorem 2 For any finite collection of numbers, the products obtained by mul-
tiplying them in any order are all equal.

Finally, the distributive law is the link between addition and multiplication. It
states that, for any x, y, z,
x(y + z) = xy + xz
Here it is understood that the multiplications xy and xz are performed before the
products are added. A simple argument then extends this law to allow for any number
of additions other than two. For example, the distributive law for five additions states
that for any x, a, b, c, d, e, we have

x(a + b + c + d + e) = xa + xb + xc + xd + xe

94
Chapter 2: Rational Numbers

1 The two-sided number line (p. 96)


2 Adding rational numbers (p. 98)
3 The vectorial representation of addition (p. 108)
4 Multiplying rational numbers (p. 115)
5 Dividing rational numbers (p. 124)
6 Comparing rational numbers (p. 132)

95
1 The two-sided number line

We are going to revisit the number line. Up to now, we have only made use of the
right side of 0. It is time that we make full use of the entire number line, both to the
left and right. Because we already have the fractions to the right of 0, we now look
at the collection of numbers (i.e., points on the number line) to the left of 0 obtained
by reflecting the fractions across 0. The fractions together with their reflected images
will be seen to form a number system, in the sense that we can perform the four
arithmetic operations on them in a way that is consistent with the operations already
defined on the fractions. This number system, called the rational numbers, is the
subject of this chapter.
Recall that a number is a point on the number line. We now look at all the
numbers as a whole. Take any point p on the number line which is not equal to 0;
such a p could be on either side of 0 and, in particular, it does not have to be a
fraction. Denote the mirror reflection of p on the opposite side of 0 by p , i.e., p
and p are equi-distant from 0 (i.e., same distance from 0) and are on opposite sides
of 0. If p = 0, let
0 = 0
Then for any point p, it is clear that

p = p

This is nothing but a succinct way of expressing the fact that reflecting a nonzero
point across 0 twice in succession brings it back to itself (if p = 0, of course 0 = 0).
Here are the mirror reflections of two points p and q on the number line:

q p 0 p q

Because the fractions are to the right of 0, the numbers such as 1 , 2 , or ( 95 )


are to the left of 0. Here are some examples of the mirror reflections of fractions
(remember that fractions include whole numbers):

96
3 (2 34 ) 2 1 ( 23 ) 0 2
3
1 2 2 43 3

The set of all the fractions and their mirror reflections, i.e., the numbers m
n
and
k
( ` ) for all whole numbers k, `, m, n (` 6= 0, n 6= 0), is called the rational numbers,
and is denoted by Q. Recall that the whole numbers, denoted by N, are a sub-set
of the fractions. The set of whole numbers and their mirror reflections,

. . . 3 , 2 , 1 , 0, 1, 2, 3, . . .

is called the integers, and is denoted by Z. We therefore have:

NZQ

We now extend the order among numbers from fractions to all numbers: for any
x, y on the number line, x < y means that x is to the left of y. An equivalent
notation is y > x.
x y

Numbers which are to the right of 0 (thus those x satisfying x > 0) are called
positive, and those which are to the left of 0 (thus those that satisfy x < 0) are
negative. So 2 and ( 13 ) are negative, while all nonzero fractions are positive.
The mirror reflection of a positive number is therefore negative, by definition, but
the mirror reflection of a negative number is positive. The number 0 is, by definition,
neither positive nor negative.
You are undoubtedly accustomed to writing, for example, 2 as 2 and ( 13 )
as 13 . You also know that the sign in front of 2 is called the negative sign.
So you may wonder why we employ this notation and have avoided mentioning
the negative sign up to this point. The reason is that the negative sign, having to
do with the operation of subtraction, simply will not figure in our considerations
until we begin to subtract rational numbers. Moreover, the terminology of negative
sign carries certain psychological baggage that may interfere with learning rational
numbers the proper way. For example, if a = 3, then there is nothing negative

97
about a, which is 3. It is therefore best to hold off introducing the negative sign
until its natural arrival in the context of subtraction in the next section.

Exercises 2.1

1. Show that between any two rational numbers, there is another rational number.


2. Which is bigger? (1.23) or (1.24) ? (1.7) or ( 12 1 2
7 ) ? (587 5 ) or (587 11 ) ?
9
( 16 ) or ( 47 ) ?

3. Which of the following numbers is closest to 0 (on the number line)?



( 15
7) , ( 11
5) ,
13
6,
9
4

2 Adding rational numbers

Before we proceed to a discussion of the arithmetic operations with rational num-


bers, we should ask why we bother with rational numbers at all. To answer this
question, we first take a backward step and look at the transition from whole num-
bers to fractions: with the whole numbers at our disposal, why did we bother with
fractions? One reason is to consider the problem of solving equations. If we ask which
whole number x has the property that when multiplied by 7 it equals 5, the answer is
obviously none. The fraction 57 , on the other hand, has exactly this property. One
may therefore say that if we insist on getting a solution to the equation 7x = 5, then
we would inevitably be led to x = 57 . More generally, the solution to the equation
nx = m where m, n are given whole numbers, with n 6= 0, is m n
. In this sense, we may
regard the fractions as the numbers which are the solutions of the equation nx = m
with n 6= 0, as m and n run through all whole numbers. Once we have these new
numbers, then we introduce the arithmetic operations among them in a way that is
consistent with the original arithmetic operations among whole numbers.
We now come back to our present situation. With fractions at our disposal,
suppose we want a fraction x so that 23 + x = 0. We would get no solution in this

98
case. The number ( 23 ) is there precisely to provide a solution to this equation. In
the same way, the number ( m n
) will be designated to be the solution of the equation
m
n
+ x = 0, for any whole numbers m and n (n 6= 0). Now that we have the negative
fractions, we are faced with the same problem concerning the rational numbers that
we faced concerning the fractions, namely, how to define the arithmetic operations
among the rational numbers in a way that is consistent with the original arithmetic
operations among fractions. In this section we deal with addition and subtraction.
We can approach the addition of rational numbers by imitating what we did in
Chapter 1, which is to explicitly define the sum of two rational numbers and then
show that, so defined, it satisfies the associative and commutative laws. This will
be done in the next section by introducing the new concept of a vector; for the
middle school classroom, this may well be the most prudent approach to the addition
of rational numbers. We will, however, begin with a more abstract approach in this
section, and an explanation is in order. If we look beyond addition to multiplication
(see 4 below), then we would realize that, while abstraction can be downplayed for
addition, it becomes inevitable for multiplication. There seems to be no satisfactory
way to do the multiplication of rational numbers short of letting the abstract con-
sideration of the distributive law dictate the course of action. Recognizing this fact,
we set our sights on winning the war, even if it means losing a battle or two in the
process, by presenting the addition of rational numbers in essentially the same way as
we will about multiplication. This has the advantage of imposing a certain unity on
the subject of rational numbers. Moreover, from the point of view of teaching in high
school, it is a good thing to know that, while we downplay unnecessary abstraction
whenever possible, we can nevertheless cope with abstraction when necessary.

Since we already expect the addition of rational numbers, however it is defined,


to satisfy the associative and commutative laws, we may as well assume all these
properties at the outset. Historically, that was pretty much how people before the
eighteenth century dealt with the new numbers like negative numbers and complex
numbers. The modus operandi was essentially that, No matter what they are, let
us treat them like any other number. Thus restricting ourselves to addition for the
moment, we make three fundamental assumptions on addition. The first two
are entirely noncontroversial:

99
(A1) Given any two rational numbers x and y, one can add these to get
another rational number x + y so that, if x and y are fractions, x + y is
the same as the usual addition of fractions, and such that this addition of
rational numbers satisfies the associative and commutative laws.
(A2) x + 0 = x for any rational number x.

The last assumption explicitly prescribes the role of the mirror reflection of a
number:

(A3) If x is any rational number, x + x = 0.

We would like to explicitly bring out the significance of the statement in (A1), to
the effect that if x and y are fractions, then x + y is just ordinary fraction addition.
Just as the addition of fractions is an extension of the addition of whole numbers
rather than a radical departure from it, the addition of rational numbers is also an
extension of the addition of fractions and not a radical departure.
Incidentally, the fact that we assume the addition of rational numbers to be asso-
ciative and commutative means that Theorem 1 in the Appendix of Chapter 1 applies
to rational numbers. In particular, we will be free to add a group of numbers in any
order we like.
The last assumption (A3) makes it official that, for example, 2 + 2 = 0. As to
(A2), it is not as vacuous as it appears: if x is a negative rational number, then
x + 0 is an unknown quantity at the moment because our experience with 0 has been
conditioned by our encounters with positive quantities. Thus it takes an explicit
assumption to get x + 0 = x for any rational number x, including negative ones.
Because we are assuming that addition among rational numbers is commutative,
(A2) and (A3) then also imply that

(A20 ) 0 + x = x for any rational number x.


(A30 ) If x is any rational number, x + x = 0.

Now that we have two new operations on the rational numbers, the mirror reflec-
tion and addition, the first thing we should ask is how they are related. For example,
is the order of applying them interchangeable, i.e., given two rational numbers x and

100
y, if we add them and then take the mirror reflection, do we end up with the same
number as when we take their mirror reflections first before adding them? In symbols,
this becomes whether (x + y) = x + y . We will prove that such is the case, but
we need some preparations for this purpose in the form of a lemma.
The lemma in question is the converse of (A3). The motivation comes from the
fact that there are times, even critical times, when we want to verify that a number x
is the mirror reflection of a given number y. What the lemma tells us is that we can
get it done by a straightforward computation, namely, just compute that x + y = 0.
This is a very attractive scenario because it is always nice to be able to give a proof
by computation. You will get to see, several times in this chapter, that the need to
verify that one number is the mirror reflection of another arises more often than you
realize.

Lemma For all x, y Q, if x + y = 0, then y = x and x = y .

Proof We exploit (A30 ):

x + y = 0 = x + (x + y) = x + 0
= (x + x) + y = x + 0 (associative law)
= 0 + y = x (by (A30 ) and (A2))
= y = x (by (A20 ))

To finish the proof, we also need to show that x = y . But knowing y = x , we can
take the mirror reflection of both sides to obtain y = x . Since x = x, we get
y = x, which is the same as what we want. The lemma is proved.

We are now in a position to prove what we are after concerning and addition.

Theorem 1 For all x, y Q, (x + y) = x + y .

Proof It is possible to give an elementary proof of the theorem using a case-by-


case argument by letting x or y to be alternately a fraction and a negative fraction,
but the proof we are going to give is more sophisticated and makes use of the Lemma,
something that we already hinted at above. Because the same reasoning will be used

101
a few more times, it is worth learning. What Theorem 1 asserts is that x + y is the
mirror reflection of (x + y). Now according to the Lemma, this would be the case if

(x + y) + (x + y ) = 0

We now prove this by the repeated use of the associative and commutative laws:

(x + y) + (x + y ) = (x + x ) + (y + y ) = 0 + 0 = 0

where we have made use of (A2) and (A3) in the last two equalities. We are done.

Perhaps it is not obvious, but Theorem 1 is actually a statement about how to


remove parentheses, as we shall soon witness.

With these basic facts out of the way, we are in a position to explicitly compute
the sum of any two rational numbers. Since we already know how to add 0 to any
number, by (A2) and (A20 ), it suffices to consider the sum of two nonzero rational
numbers. Now a nonzero rational number is either a fraction or a negative fraction,
so we proceed to look at all the possibilities. Therefore let s and t be any two nonzero
fractions, i.e., s and t are both positive. Then the following four cases exhaust all
the possibilities of adding two rational numbers:

s + t, s + t , s + t , s + t

By (A1), we already know how to compute the sum s + t as these are fractions (see
Chapter 1, 3). Therefore, we only need to examine the three remaining cases. We
emphasize that s and t here stand for any two fractions. Since s + t = t + s, we see
that the fourth case above follows from the third case.28 Therefore we need examine
only the following two cases:

s + t , s + t
28
Such an assertion is sometimes confusing to a beginner who becomes fixated on the symbols s
and t themselves. Here is a more detailed explanation of this assertion. Suppose we can prove a
formula for the third case s + t for all fractions s and t. Then the commutative law of addition
implies that we have a formula for t + s, again for all positive fractions s and t. Since s and t are
arbitrary, we may switch the symbols and write s for t as well as t for s, all the while remembering
that they stand for any fractions. Assuming that the switch has been done, then we have a formula
for s + t for all positive fractions s and t. But then this is exactly the fourth case.

102
The first case is easily disposed of because by Theorem 1, we have

s + t = (s + t)

Thus,

2 + (3 54 ) = (2+ 3 45 )

which is of course equal to (5 54 ) .


As to the second case, i.e., s + t , we would like to preface the computation with
a general comment. What we will prove is that

s + t = (s t) if s t
s + t = (t s) if s < t

As proofs go, these are about as sophisticated as mathematics in grades K8 is ever


going to get. Not long or complicated, just sophisticated. Let us explain what the
latter means. To prove that two numbers are equal, the normal procedure is to do
a straightforward computation. For example, to prove that if x and y are numbers,
(x y)(x + y) = x2 y 2 , one expands the left side by the distributive law and collects
terms to arrive at the right side. However, the above equalities cannot be proved
this way. Rather, the proof is achieved by an indirect method which appeals to the
Lemma; it requires a delicate touch and cannot be boiled down to a mechanical pro-
cedure. Is such a proof strictly necessary? For the addition of rational numbers the
answer is no, because in the next section, we will use a hands-on method (of vector
addition) to achieve the same goal. However, if our goal is the understanding of the
multiplication of rational numbers (4 below), then there seems to be no way to get
around the sophistication. Such a proof for addition, therefore, serves the purpose of
acclimatizing ourselves to the inevitable abstract arguments, and ultimately to the
learning of algebra. As a teacher, you have to thoroughly internalize these kinds of
arguments before you can hope to teach them to your students with conviction.

Let us begin the computation of s + t proper. As noted above, it separates into


two sub-cases: s t or s < t, and we take up the first sub-case first. We claim that

s + t = (s t) if s t

103
For example,
14 + 2.5 = (14 2.5)
which is obviously equal to 11.5. To prove s + t = (s t), observe that since s t,
the subtraction s t makes sense as ordinary subtraction between two fractions
(Chapter 1, 3), and we have (s t) + t = s. This implies (s t) + t + t = s + t .
Since t + t = 0, we get s t = s + t , as desired.
Next, we claim:

s + t = (t s) if s < t

For example, this assertion implies that

9 + 3.6 = (9 3.6)

which is 5.4 . The reason for s + t = (t s) is that we can apply Theorem 1 (and
the fact that s = s) to get
s + t = (s + t)
We can simplify the right side: we have t > s, so the first sub-case shows that
s + t = t + s = (t s). Altogether, we have s + t = (t s) for s < t. This proves
our claim.
In summary, we have shown how to compute the sum of any two rational numbers
strictly on the basis of assumptions (A1)(A3). We can summarize our results into
the following theorem.

Theorem 2 For all fractions s and t,

s+t = the ordinary sum of the fractions s and t


s + t = (s + t)

(s t) if s t
s + t = t + s =
(t s) if s < t

These explicit computations of the addition of rational numbers lead to the fol-
lowing insight: the subtraction of fractions becomes addition in the larger context of

104
rational numbers, in the sense that, if s, t are fractions so that s t, then s t makes
sense, on the one hand, as ordinary fraction subtraction (see 3 of Chapter 1) and, on
the other, as the addition of two rational numbers s and t , and both give the same
result. This fact prompts us to define, in general, the subtraction between any two
rational numbers x and y as:
def
xy = x + y
Thus, a subtraction such as 56 14 has exactly the same meaning whether we look
at it as a subtraction between the two fractions 56 and 14 or between these fractions
considered as rational numbers; this is reassuring. On the other hand, we are now free
to do a subtraction between any two fractions such as 14 56 , whereas before (i.e.,
in 3 of Chapter 1) we could not carry out the subtraction because the first fraction
is smaller than the second. We see for the first time the advantage of having rational
numbers available: we can as freely subtract any two fractions as we add them. But
this goes further, because we can even subtract not just any two fractions, but any
two rational numbers, e.g., 5.5 17 .
This definition reveals that subtraction is just a different way of writing addition
among rational numbers. Any property about subtraction among rational numbers
is ultimately one about addition.

In the rest of the section, we will explore a bit the ramifications of this concept
of subtraction. The overriding fact is that, without this general definition, we do not
have a good grasp of what subtraction is about. Beyond the oddity of not being able
to subtract a larger fraction from a smaller one, there is also the unpleasant observa-
tion that subtraction is not associative, i.e., in general, (x y) z 6= x (y z)
for fractions x, y, z. For example, letting x = 4, y = 2, z = 1, the left side is 1 while
the right side is 3. We need clarity on the situation.

We start from the beginning. As a consequence of the definition of x y, we have


0 y = y
because 0 + y = y (see (A20 )). Common sense dictates that we should abbreviate
0 y to y. So we have
y = y

105
At this point, we abandon the notation of y and replace it by the more common y.
We call y minus y.
Let us re-state some of our previous conclusions in the new notation. From x = x
for any x Q, we get
(x) = x
(A3) now states that

(A3*) If x is any rational number, x + (x) = (x) + x = 0.

The Lemma and Theorems 1 and 2 now read:

Lemma* For all x, y Q, if x + y = 0, then y = x and x = y.

Theorem 1* For all x, y Q, (x + y) = x y.

(For Theorem 1*, observe that x + y = x y, by the definition of subtraction,


so that x + y = x y.)

Theorem 2* For all fractions s and t,

s + t = the ordinary sum of the fractions s and t


s t = (s + t)

(s t) if s t
st = t+s =
(t s) if s < t

Observation: We now see that Theorem 1*, in the form of

(x + y) = x y

for all rational numbers x and y, is a statement about removing parentheses. We can
go a step further: for all rational numbers x and y,

(x y) = x + y and (x + y) = x y

We leave these as exercises.

106
We pursue the theme that subtraction is another way of writing addition among
rational numbers and bring closure to a remark we made at the end of 3 in Chapter
1 about the subtraction of fractions. We now show that for any rational numbers a,
b, x, y,
(a + b) (x + y) = (a x) + (b y)
This is because
(a + b) (x + y) = a + b + (x + y) = a + b + x + y
where the first equality is by the definition of subtraction and the second equality is
on account of Theorem 1. Thus (a + b) (x + y) = (a + x ) + (b + y ), by Theorem
1 of the Appendix in Chapter 1. By the definition of subtraction again, we get
(a + b) (x + y) = (a x) + (b y).
It is clear from this reasoning that there is a similar assertion if a + b is replaced
by a sum of k rational numbers for any positive integer k and the same is done to
x + y. The details are left as an exercise.

There is a certain commutativity of subtraction that follows naturally from the


definition of subtraction in terms of addition. For any x, y Q, we claim that
x + y = y x
For example, 32 + 4 = 4 32 , both being equal to 10
3 as a simple application of
Theorem 2* shows. In general, we have
y x = y + x (definition of subtraction)
= x + y (commutativity)
= x + y
Finally we take up the issue of (x y) z 6= x (y z). We want to find out,
on the basis of the associative law of addition, what either side is equal to. We will
do the left side and leave the right side to a problem. Thus,
(x y) z = (x + y ) + z (definition of subtraction)
= x + (y + z ) (associativity)

= x + (y + z) (Theorem 1)
= x (y + z) (definition of subtraction)

107
Therefore
(x y) z = x (y + z)

Exercises 2.2

1. Prove that or all x, y Q, if x + y = x, then y = 0.

2. Without using Theorem 2 or Theorem 2*, and using only (A1)(A3), the Lemma,
and Theorem 1, explain to an eighth grader why 34 2 15 = 15
13
.

3. Prove that for all rational numbers x and y, we have (x y) = x + y and


(x + y) = x y. Give the reason at each step.

4. Explain carefully why each of the following is true for all rational numbers x, y, z:
(a) (xy)z = (xz)y. (b) x(yz) = (xy)+z. (c) (x+y)z = x(zy).

5. Compute:(a) 4 67 + 2 32 . (b) 7.1 22 13 . (c) 7 (2.5 3 32 ).


7
(d) (703.2 + 689.4) ( 15 3 32 ). (e) ( 65 (1 18 5
) ) + 24 .

6. (a) Let a, b, . . . , z, w be rational numbers. Give a detailed proof of


(a + b + c + d) (x + y + z + w) = (a x) + (b y) + (c z) + (d w)
by justifying every step. (b) Can you extend (a) from a pair of 4 rational numbers
to a pair of n rational numbers for any positive integer n? For notation, try
(a1 + a2 + + an ) (x1 + x2 + + xn ) = (a1 x1 ) + (a2 x2 ) + + (an xn )
(Although officially we do not take up mathematical induction until Chapter 8, you
may use this technique here if you want.)

3 The vectorial representation of addition

We learned in the preceding section how to add rational numbers on the basis of
the assumption that addition satisfies three reasonable properties (A1)(A3), and we

108
also observed that this way of adding rational numbers coincides with the concate-
nation of segments if the rational numbers are positive. For a sixth or seventh grade
classroom, however, it is better to have a more concrete approach to addition, either
as an alternative or, at least, as a supplement. We now outline such an approach by
returning to the number line and introducing new objects called vectors. Then we
define the addition of vectors, and on the basis of that we give a new definition of the
addition of rational numbers. At the end, we will indicate why the two definitions, the
one introduced in the last section and the present one defined using vectors, coincide.
Thus we start from the beginning all over again. By definition, a vector is a
segment on the number line together with a designation of one of its two endpoints
as a starting point and the other as an endpoint. We will continue to refer to the
length of the segment as the length of the vector, and call the vector left-pointing
if the endpoint is to the left of the starting point, right-pointing if the endpoint is
to the right of the starting point. The direction of a vector refers to whether it is
left-pointing or right-pointing.
We denote vectors by placing a bar above the letter, e.g., A, x, etc., and in
pictures we put an arrowhead at the endpoint of a vector to indicate its direction.
For example, the vector K below is left-pointing and has length 1, with a starting
point at 1 and an endpoint at 2 , while the vector L is right-pointing and has length
2, with a starting point at 0 and an endpoint at 2.

3 2 1 0 1 2 3
 -
K L

For the purpose of discussing the addition of rational numbers, we can further
simplify matters by restricting attention to a special class of vectors. Let x be a
rational number, then we define the vector x to be the one with starting point at 0
and endpoint at x. It follows from the definition that, if x is a nonzero fraction, then
the segment of the vector x is exactly [0, x]. Here are two examples of vectors arising
from rational numbers:

109
4 3 2 1 0 1 1.5 2
 -
3 1.5

In the following, we will concentrate only on those vectors x where x is a rational


number, so that all vectors under discussion will be understood to have starting point
at 0. We now describe how to add such vectors. Given x and y, where x and y are two
rational numbers, the sum vector x + y is, by definition, the vector whose starting
point is 0, and whose endpoint is obtained as follows:

slide the vector y along the number line until its starting point (which is
0) is at the endpoint of ~x, then the endpoint of y in this new position is
by definition the endpoint of x + y.

As an example: to obtain the endpoint of the sum 2 + 1 , we slide the vector 1


until its starting point (i.e., 0) is at 2, as shown:

3 2 1 0 1 2
 -

The vector 2 +1 is therefore the vector that starts at 0 and ends at 1, as shown:

3 2 1 0 1 2
-
2 + 1

In general, a vector is completely determined by its length and its direction. There-
fore the following is a complete description of the sum of any two vectors:

If both vectors x and y have the same direction, then the sum vector x + y
has the same direction as x and y, and its length is the length of the
concatenation of the segments of x and y, and is therefore the sum of the
lengths of x and y.

110
If the vectors x and y have different directions, then the direction of the
sum vector x+y is the same as that direction of the vector with the greater
length, and the length of the sum vector x+y is the difference of the lengths
of the vectors x and y.

As an application, observe that the above description of the sum x + y, where x


and y are two rational numbers, does not make any distinction between whether x
comes first or y comes first. In other words, according to this description, x + y is the
same vector as y + x. Therefore we have proved:

Lemma 1 If x and y are rational numbers, then x + y = y + x.

We are now in a position to define the addition of rational numbers. The sum
x + y of any two rational numbers x and y is by definition the endpoint of the
vector x + y. In other words,

x + y = the endpoint of x + y.

Put another way, x + y is defined to be the point on the number line so that its
corresponding vector x + y satisfies:

x + y = x + y.

From Lemma 1, we conclude:

Lemma 2 The addition of rational numbers using the vector method is commu-
tative.

To get some intuitive feelings for the sum of two rational numbers x and y, we
will now systematically go through the various possibilities for x + y. These are:

(i) Both x and y are positive.


(ii) Both x and y are negative.
(iii) x is positive but y is negative, and the length of x is less than the
length of y.

111
(iv) x is positive but y is negative, and the length of y is less than the
length of x.

Because we know from Lemma 1 that x+y = y +x, the preceding four cases exhaust
all the possibilities. For example, there is no need to consider the case of x is negative
but y is positive, and the length of y is less than the length of x, because looking at
y + x (which is equal to x + y), this would be case (iii) above provided we interchange
the symbols x and y.
Now case (i) is straightforward: we are given the following picture:

0 y
- -
x

The vector x + y is, according to the definition of the vector sum, right-pointing with
length equal to the sum of the lengths of x and y. The number x + y is therefore as
indicated:

0
- -?
x

This confirms the fact that for two fractions x and y, their sum as rational numbers
by the use of vectors is exactly the length of the concatenation of the segments [0, x]
and [0, y], i.e., the same as their sum as fractions.
Next we tackle case (iii):

y 0
 -
x

Then, by definition, x + y is the point as indicated,

0
?
 -
x

112
In this case x + y is negative, and the picture shows that the length of the segment
[x+y, 0] is the length of y minus the length of x. This is consistent with the description
of x + y above.
We leave cases (ii) and (iv) to an exercise.

The preceding discussion shows that if x and y are fractions, then x + y, being the
length of the concatenation of [0, x] and [0, y], has exactly the same meaning as in 3
of Chapter 1. Therefore the addition of rational numbers x + y, as defined by vector
addition, coincides with the addition of fractions in the sense of 3 of Chapter 1 when
x, y are fractions. In fact, we are going to show that this definition of the addition
of rational numbers coincides with the definition of addition in the preceding section.
Let s, t be two fractions. Then the sum of two rational numbers has to be one of
the following four types of sums:

s + t, s + t , s + t , s + t

In order to show that the sum of two rational numbers is the same whether they are
added according to the method of 2 or according to the method of vectors, it suffices
to show that each of the above four sums is the same for these two methods. For
this particular discussion, we are going to employ an ad hoc notation: hhx + yii will
denote the sum of rational numbers of x and y as defined by vector addition, while
x + y will continue to denote the addition in the sense of 2. We have just seen that

hhs + tii = s + t = the ordinary sum of the fractions s and t.

Next, according to 2,
(
ts if t s
s +t=
(s t) if t < s

Now we use the above observation about the sum of two vectors: if t s, then the
direction of s + t is the direction of t (i.e., right-pointing) as it is longer, and its
length is t s. This shows hhs + tii = s + t, in case t s. If, however, t < s, then
the same observation about the sum of two vectors says that the direction of s + t
is the direction of s (i.e., left-pointing) as it is longer, and its length is s t. In other
words, we also have hhs + tii = s + t, in case t < s. Thus hhs + tii = s + t for

113
any fractions s and t. Similarly, hhs + t ii = s + t for any fractions s and t. We
have thus disposed of three of the four sums.
Finally, we saw in 2 that s + t = (s + t) . But since both s and t are left-
pointing, s + t is also left-pointing, and the length of s + t is just the length of the
concatenation of the segments of s and t , i.e., equal to s + t. So hhs + t ii = s + t
as well.
We have thus proved that hhx + yii = x + y for all rational number x and y.
Thus the addition of rational numbers can be defined purely algebraically as in 2,
or by the use of vectors.

Since the addition of rational numbers according to the method of 2 is associative


(see (A1)), it follows that the addition of rational numbers by the method of vectors
is also associative. This conclusion would be extremely tedious to prove directly using
vectors.

Exercises 2.3

1. Compute each of the following in two different ways: by the method of 2 but
without making use of Theorem 2 of 2, and then by using vectors: (a) 45 + 3 , and
(b) ( 25 ) + 23 .

2. For each of the following numbers, explain to a sixth grader using vector addition
whether it is positive or negative:
(68 21 ) + 68 25 , (1 78 ) + 2 10
1
, 16
7 + (2 14 ) , 3
(1 10 ) + 79 .

3. Give a direct proof of the associative law of addition using vectors in the following
two special cases: (3 + 6.5 ) + 7 = 3 + (6.5 + 7), and (6 + 3.5 ) + 2 = 6 + (3.5 + 2).
(You would have a better understanding of why it is so tedious to prove associativity
using vectors after doing this problem.)

4. Find the sums of case (ii) and case (iv) for x + y, where x and y are rational
numbers (see discussion after Lemma 2).

114
4 Multiplying rational numbers

As we mentioned in 2, we are going to take the same approach to multiplication


as addition, namely, we make the following two fundamental assumptions on
multiplication:

(M1) Given any two rational numbers x and y, there is a way to multiply
them to get another rational number xy so that, if x and y are fractions,
xy is the usual product of fractions. Furthermore, this multiplication of
rational numbers satisfies the associative, commutative, and distributive
laws.
(M2) If x is any rational number, then 1 x = x.

We note that (M2) must be an assumption because, for instance, we do not know as
yet what 1 5 is until (M2) assures us that it is in fact 5 .
It turns out that the other seemingly obvious fact that

0 x = 0 for any x Q.

need not be assumed because it can be proved on the basis of (M1) and (M2); see
Exercises 2.4 below.

As in the case of addition, now that we have introduced multiplication among


rational numbers, our first task is to find out how multiplication is related to the
existing operations, in particular, addition and the mirror reflection . As always,
the relationship between addition and multiplication is codified by the distributive
law, which we must point out is part of the assumption in (M1). As to the oper-
ation , we can ask as before whether the order of applying multiplication and
is interchangeable. In other words, given two rational numbers x and y, if we get
their mirror reflections first and then multiply (thus x y ), how is it related to the
number obtained by multiplying them first and then get its mirror reflection (thus
(xy) ). Unlike the case of addition, these two numbers turn out to be not equal,
i.e., x y 6= (xy) , or in the notation of the minus sign, what we are saying is that
(x)(y) 6= (xy). As is well-known, the correct answer is

(x)(y) = xy for all rational numbers x and y.

115
This surprising fact, the bane of many middle school students, can be given a very
short proof. We will present this proof at the end of the section, but not here, because
for the middle school classroom, such a sophisticated proof is far from appropriate.
Instead we will take a leisurely detour through the multiplication facts of fractions
and negative fractions, and wind up with the equality (x)(y) = xy as our final
destination.

Our first order of business is to find out explicitly how to multiply rational num-
bers. (Before proceeding further, you may wish to review the discussion about the
nature of these kinds of computations in the proof of Theorem 2 of 2.) Thus let
x, y Q. What is xy? If x = 0 or y = 0, we have just seen that xy = 0. We may
therefore assume that both x and y are nonzero, so that each is either a fraction s,
or the negative of a fraction, s. Therefore, letting s and t be nonzero fractions, we
consider the following four cases separately:

st, (s)t, s(t), and (s)(t).

Since Chapter 1 already dealt with the case st, it suffices to deal with the remaining
three cases. We will prove that, for all positive fractions s and t,

(s)t = (st) (e.g., (5)7 = 35)


s(t) = (st) (e.g., 5(7) = 35)
(s)(t) = st (e.g., (5)(7) = 35)

Let us now prove these assertions. Since multiplication of rational numbers is


assumed to be commutative and s and t are arbitrary, knowing the first implies
knowing the second. Let us prove the first, i.e., (s)t = (st). We are thus trying
to prove that (s)t is minus st. According to Lemma* of 2, all we have to do is to
prove that
(s)t + st = 0
This is so because, by the distributive law, st + (s)t = (s + (s)) t. This is the cru-
cial step because the right side is now equal to 0 t, which is 0 because, as we already
observed, 0 times any number is 0. So the proof for (s)t = (st) is complete.

116
It remains to deal with the third equality above, i.e., for all nonzero fractions s
and t,
(s)(t) = st
We can again invoke Lemma* of 2 if we think of st as minus (st) ; in other words,
st = ((st)). Then this lemma says (s)(t) would be equal to ((st)) if we
can prove (s)(t) + ((st)) = 0. This we can do because we have just finished
proving that (st) = (s)t. This then opens the way to another application of the
distributive law:

(s)(t) + ((st)) = (s)(t) + (s)t


= (s)((t) + t) (distributive law)
= (s) 0 (by (A3*) of 2)
= 0

and the proof of (s)(t) = st is also complete.


We summarize our findings in the following theorem.

Theorem 1 For all fractions s and t,

st = the ordinary product of the fractions s and t


(s)t = (st)
s(t) = (st)
(s)(t) = st

Theorem 1 gives us a basic idea about the multiplication of rational numbers, but
it needs to be complemented by a broader view of the situation. We will attend to
that presently, but we wish to take note of some of its immediate consequences. For
example, note the following simple rules implied by Theorem 1:

positive positive is positive


positive negative is negative
negative negative is positive

117
In particular, we know that

x2 0 for any x Q

regardless of whether it is 0 or positive or negative. By FASM, we have

x2 0 for any number x

The equality (s)(t) = st for all fractions s, t is among the Most Frequently
Asked Questions in school mathematics. Not all students will understand the preced-
ing proof, but very likely they all want some answer. We now address this classroom
issue by presenting a much simpler proof of something less, namely (1)(1) = 1.
There are two reasons why we take the trouble to give an independent proof of a
result already implied by Theorem 1. One is that this proof may be more widely
accessible to students; in mathematics education, every bit of reasoning helps. The
other reason is the surprising fact that this simple result actually leads to the proof
of a generalization of Theorem 1.

We begin with a basic, but believable observation, and will prove it by using the
hands-on vectorial approach to the addition of rational numbers.

Theorem 2 For any rational number x, the number (1)x is the mirror reflec-
tion of x. In symbols: (1)x = x.

Proof The number x is the point on the opposite side of 0 from x so that x and
x are equi-distant from 0. Therefore this is the picture we want to be true when x
is positive:

(1)x 0 x

and this is the picture we want to be true when x is negative:

x 0 (1)x

118
Now think of the sum x + (1)x in terms of vectors (see 3). If we can show that

x + (1)x = 0

then the vectors ~x and (1)x must have opposite direction and equal length (see
the description of the sum of two vectors in 3). Consequently, (1)x will have to
be equal to x. Let us therefore prove that x + (1)x is equal to 0. We use the
distributive law:
(M2)
x + (1)x = 1 x + (1)x = {1 + (1)} x

But 1 + (1) = 0 because ~1 and (1) have the same length and have opposite
directions. Therefore

x + (1)x = {1 + (1)} x = 0 x = 0

We are done.

If we let x = (1), then Theorem 2 yields the conclusion we sought.

Corollary (1)(1) = 1.

We now show how to deduce a result more general than Theorem 1 from Theorem
2 and its Corollary: instead of multiplying fractions with other fractions or negative
fractions, we can now draw direct conclusions about the multiplication of rational
numbers.

Theorem 3 For all rational numbers x and y,

(x)y = x(y) = (xy)


(x)(y) = xy

Proof We first prove (x)y = x(y) = (xy). If we read the equality


of Theorem 2 backward, we get x = (1)x. Therefore (x)y = ((1)x) y =

119
(1) (xy). Now we apply Theorem 2 again in the same way, but this time to the
rational number (xy). Then we get (1)(xy) = (xy). Hence
(x)y = (1)(xy) = (xy)
The proof of x(y) = (xy) is similar (or we can apply the commutative law twice
to what we have just proved: x(y) = (y)x = (yx) = (xy)).
Next, we prove (x)(y) = xy. Theorem 2 gives (x)(y) = ((1)x) ((1)y).
Now Theorem 2 in the Appendix of Chapter 1 implies that
((1)x) ((1)y) = ((1)(1)) (xy)
So the Corollary to Theorem 2 says ((1)(1)) (xy) = 1 (xy) = xy. Theorem 3 is
proved.

If we reflect on Theorems 13 a little, wed realize that the key ingredient in


their proofs is the distributive law. This law was explicitly mentioned in the proofs
of Theorems 1 and 2, and of course Theorem 2 lies behind Theorem 3. There is a
fundamental reason why this has to be the case, namely, what connects addition to
multiplication is the distributive law, and since we need to import information about
addition (such as (A3*) and Lemma* 2) to multiplication, the distributive law is our
only recourse.

It remains to bring closure to this discussion of multiplication by delivering on a


promise made at the beginning of the section, to the effect that there is a short and
self-contained proof of Theorem 3.
We first prove (x)y = (xy), where x, y Q. By Lemma* of 2, it suffices to
prove that (x)y + xy = 0. This is so because by the distributive law,
(x)y + xy = ((x) + x) y = 0 y = 0
Next we prove (x)(y) = xy. Again by Lemma* of 2, it suffices to prove that
(x)(y) + ((xy)) = 0 because this equality implies that (x)(y) is equal to
((xy)), which is xy. Now
(x)(y) + ((xy)) = (x)(y) + ((x) y) (because (x)y = (xy))
= (x)((y) + y) (distributive law)
= (x) 0 = 0

120
The proof of Theorem 3 is complete.

We conclude this section with three remarks. First, there is a simple consequence
of Theorem 3 which is an explicit algorithm for the multiplication of rational numbers:
if m k
n and ` are fractions, then:

m k mk
=
n ` n`
m k mk
=
n ` n`
In the next section, we will see that these formulas remain valid even when m, n, k,
` are rational numbers (rather than just whole numbers).
Second, Theorem 2 gives us another way to think of how to remove parentheses,
to the effect that (x + y) = x y for all x, y Q (see Theorem 1* of 2). This
is because

(x + y) = (1)(x + y) (Theorem 2)
= (1)x + (1)y (distributive law)
= x y (Theorem 2 again)

Third, we use Theorem 3 to tie up a loose end by proving the following form of
the distributive law, which is commonly taken for granted.

x(y z) = xy xz for all x, y, z Q

Indeed, by using the ordinary distributive law, we have: x(y z) = x(y + z ) =


xy + xz = xy + x(z). But xy + x(z) = xy + (xz), by Theorem 3, so x(y z) =
xy + (xz). By the definition of subtraction, xy + (xz) = xy xz, and we have the
desired conclusion.

Exercises 2.4

1. Use (M1) and (M2) to give a direct, simple explanation of 0 x = 0 for any x Q.

121
2. Compute, and justify each step: (a) (4)(1 12 + 14 ). (b) 165 560( 43 87 ).
(c) ( 23 )(0.64 43 ). (d) (20 92 ( 17
5
)) + (3 29 17
5
).

3. (a) Find a simple proof of (1)n = n for a whole number n without making use
of Theorem 2. (Hint: n = 1 + 1 + + 1 (n times).) (b) Write down a direct proof
of (1)(1) = 1 without quoting Theorem 2. (c) Explain directly to a sixth grader
why (3)(2) = 6 by using only (1)(1) = 1, but not Theorem 2. (d) Do you
see how to prove (m)(n) = mn for all whole numbers m and n using only the
fact that (1)(1) = 1 ? (This is something you would want to keep in mind when
you teach: the special case of Theorem 1 when s and t are whole numbers is easier
to explain than the general case.)

4. (a) Use mathematical induction but none of Theorems 13 to prove that, for all
whole numbers m and n, (m)n = (mn). (b) Use mathematical induction and the
fact that (1)(1) = 1 to prove that for all whole numbers m and n, (m)(n) =
mn.

(As mentioned in Exercises 2.2, we will officially take up mathematical


induction in Chapter 8. This problem is not to suggest that you use math-
ematical induction to convince seventh graders that for all whole numbers
m and n, (m)(n) = mn. Rather, if you understand mathematical
induction sufficiently well, then this problem points to a way (more clearly
than problem 3) for you to prove (m)(n) = mn (m, n N) for seventh
graders without using the technicality of mathematical induction. If you
can convince all seventh graders of this much, it would already be a major
triumph of mathematics teaching.)

5. Use Theorem 3 to prove the other two rules of removing parentheses:

(x y) = x + y and (x + y) = x y

for all rational number x and y. (Many students learn to remove parentheses by rote,
using this method. So the purpose of this exercise is to bring awareness to the fact
that there are two ways to remove parentheses, the other one being problem 1 in

122
Exercises 2.2, and that both are based on genuine mathematics.)

6. Consider each of the following two statements about any rational number x:

(a) 3x < x.
1
(b) 10 x > x.

If it is always true or always false, prove. If it is sometimes true and sometimes false,
give examples to explain why.

7. The following is a standard argument in textbooks to show, for example, that


(2)(3) = 6:

Consider the sequence of products

4 (3) = 12, 3 (3) = 9, 2 (3) = 6,


1 (3) = 3, 0 (3) = 0, (1)(3) = a, (2)(3) = b,
(3)(3) = c, (4)(3) = d,

Observe the pattern that, for m(3) as m decreases to 0, each


product increases by 3. To continue this pattern beyond 0, one
should assign 3 to a, 6 to b, 9 to c, 12 to d, and so on, because
(1)(3) = 0 + 3 = 3, (2)(3) = 3 + 3 = 6, (3)(3) = 6 + 3 = 9,
(4)(3) = 9 + 3 = 12.

Is this a valid argument? What are the implicit assumptions used? Write a critique.
(Hint: If you write down precisely what this so-called pattern says, it would be the
statement that (n 1)(3) = n(3) + 3 for any positive integer n.)

8. (a) I have a rational number x so that 5 (2x 1) = (1 38 x). What is this x?


(b) Same question for (2 3x) (x + 1) = 35 x + 12 .

9. [For this problem, we extend the definition in 4 of Chapter 1 by defining, for any
rational number x and any fraction m n
, the meaning of m n
of x to be m n
x.] (a) A
3
number x has the property that 4 of x exceeds x itself by 49. What is this x? (b)

123
A number t has the property that twice t exceeds t2 by 74 of t. Find t.

5 Dividing rational numbers

The concept of the division of rational numbers is the same as that of dividing
whole numbers or dividing fractions. See the overview in 5 of Chapter 1. As before,
we begin such a discussion with the proof of a theorem that is the counterpart of
Theorem 1 in 5 of Chapter 1.

Theorem 1 Given x, y Q, with y 6= 0, there is a unique (i.e., one and only


one) z Q such that x = zy.

For example, making use of Theorem 1 of 4, we have that if x = 13 and y = 25 ,


then z = ( 13 25 ). Similarly, if x = 57 and y = 23 , then z = ( 75 32 ), or if x = 75
and y = 23 , then z = 75 23 . Note that, except for the negative sign, the z in all
cases is obtained by invert-and-multiply.

Proof First, we prove the existence of z. If x = 0, just take z = 0. We may


therefore assume that x > 0 or x < 0.
First assume y > 0. If also x > 0, then both x and y are fractions and the
existence of z is already known (see Theorem 1 of 4 in Chapter 1). If however x < 0,
then (x) is a fraction and again there exists a fraction z 0 such that (x) = z 0 y.
Thus x = (x) = (z 0 y) = (z 0 )y, by Theorem 3 of the preceding section. So
letting z = z 0 , we have proved that such a z Q exists in case y > 0.
If y < 0, then (y) > 0, and we know from the last paragraph that, for any
x Q, there is a fraction z 0 such that x = z 0 (y). By Theorem 3 of the preceding
section, this is the same as x = (z 0 )y. Setting z = z 0 , we have again proved that
there is such a z. We have now proved that such a z exists in all cases.
Now uniqueness, i.e., one and only one z. The proof is a standard piece of
mathematical reasoning, but it may take some getting used to, but you cannot avoid
encountering it as you advance in higher mathematics. We first prove the uniqueness,
for any y 6= 0, of a number to be denoted by Y , so that yY = 1. Indeed, suppose
there are two numbers Y1 and Y2 so that yY1 = 1 = yY2 . Then, using the associative

124
and commutative laws of multiplication,

Y1 = Y1 1 = Y1 (yY2 ) = (Y1 y)Y2 = (yY1 )Y2 = 1 Y2 = Y2

Therefore Y1 and Y2 coincide and there is only one such number which, as we said,
will be denoted by Y . Next, with arbitrary x, y given, suppose x = zy for some
number z. Then
z = z 1 = z(yY ) = (zy)Y = xY
Thus any such z must be equal to xY , and the proof is complete.

The standard notation for the Y so that yY = 1 is y 1 . Thus, yy 1 = y 1 y = 1


by definition, and y 1 is called the multiplicative inverse of the nonzero y. Theo-
rem 1 has two useful corollaries.

Corollary 1 If x, y Q and xy = 0, then x = 0 or y = 0.

Proof Indeed, suppose y 6= 0, and we must prove x = 0. Now we always have


0 = 0 y. Compare with 0 = x y. The uniqueness part of Theorem 1 implies that 0
must be equal to x, as desired. A more down-to-earth reasoning would proceed as
follows: since xy = 0, we have xyy 1 = 0 y 1 . The left side is so that x 1 while
the right side is 0. Therefore x = 0. The Corollary 1 is proved.

The fact that xy = 0 implies x = 0 or y = 0 is important in the solution of


equations in algebra, so this fact should be carefully pointed out to students before
they take up algebra.

Corollary 2 For any nonzero y Q, (y)1 = (y 1 ).

Proof This can be verified separately for positive and negative ys (see Exer-
cises 2.5 below), but it is also valuable to learn an abstract proof. Indeed, from
1 = y 1 y, we get 1 = ((y 1 ))(y) (by Theorem 3 of 4). Comparing the latter with
1 = ((y)1 )(y) and using the uniqueness of the multiplicative inverse of y, we
get (y)1 = (y 1 ), as claimed.

125
We normally omit the parentheses around y 1 in (y 1 ) and simply write y 1 ,
and we can do this because Corollary 2 guarantees that there is no possibility of
confusion.
Thus ( 72 )1 = 72 , and (2 71 )1 = 15
7
.

What does the preceding theorem really say? It says that if we have a nonzero
rational number y, then any rational number x can be expressed as a rational mul-
tiple of y, in the sense that x = zy for a unique rational number z; in fact, z = xy 1 .
Thus with y fixed, every x Q determines a unique rational number z = xy 1 so
that x = zy. This number z = (xy 1 ) is, by definition, the division of x by y, or,
more formally:

x
Definition Given x, y Q, with y 6= 0, the division of x by y, in symbols, y
,
is the unique rational number so that
!
x
x= y
y

In other words,
x
= xy 1
y
x
The number y is also called the quotient of x by y. Notice that the cancellation
of the ys is built into the very definition of division, i.e., x = ( xy ) y.

We can now clear up a standard confusion in the study of rational numbers. One
finds, for instance, the equalities
3 3 3
= =
7 7 7
but explanations are usually not given. We now supply the explanation. Because 3,
7, etc. are rational numbers, we see from the definition that
3 1 3
= 3 (7)1 = 3 ( ) =
7 7 7
where we have made use of Corollary 2 to get (7)1 = 71 , and Theorem 3 of 4
in the last step. In a similar fashion, we have 3
7
= 37 . More generally, the same

126
reasoning supports the assertion that if k and ` are whole numbers and ` 6= 0, then
k k k
= =
` ` `
and
k k
= .
` `
We may also summarize these two formulas in the following statement:

Lemma For any two integers a and b, with b 6= 0,


a a a
= = .
b b b

This lemma will be seen to be a special case of basic facts about so-called rational
quotients, to be introduced presently, but in terms of everyday computations with
rational numbers, it is well-nigh indispensable. In particular, it implies that every
rational number can be written as a quotient of two integers. In view of the Lemma,
we have proved the following.

Theorem 2 Every rational number can be written as a quotient of two integers.


In addition, the quotient can be chosen so that the denominator is a whole number.

Theorem 2 is conceptually important because it gives an alternative view of a


rational number. In advanced mathematics, rational numbers are usually defined as
quotients of integers.
Theorem 2 implies that, for instance, the rational number 97 is equal to 9 7
9
or 7 , and the former is the preferred choice. Before explaining why this is so, we
first rewrite the algorithm for multiplying rational numbers in its final form: if a, b,
c, d are integers, then:
a c ac
=
b d bd
The proof is nothing more than a routine case-by-case verification. For example,
3 14 3 14
 
= (by Lemma)
7 5 7 5
3 14
 
= (Theorem 3 of 4)
7 5

127
3 14
= (by Lemma again)
(7 5)
(3) (14)
= (Theorem 3 of 4)
7 (5)
The reasoning for the general case is the same.
As a special case, we have that for integers a, b,
a 1
= a
b b
We can now give an indication of why 9
7 is to be preferred, most of the time,
9
over 7 as a representation of 79 . This is because
9 1
= (9)
7 7
whereas
9 1
= 9,
7 7
1
and it is much easier to think of one-seventh of 9 than 7 of 9.

Just as the division of fractions led to the concept of complex fractions, the di-
vision of rational numbers leads to a similar concept which, for lack of a name, will
be simply referred to as rational quotients. We now list the analogs of the basic
properties (a)(e) of complex fractions. Let x, y, z, w, . . . be rational numbers so
that they are nonzero where appropriate in the following. Then xy is an example of
a rational quotient; x will be called its numerator, and y its denominator.

x zx
(a) Generalized cancellation law: y
= zy for any nonzero z Q.
x z
(b) y = w if and only if xw = yz.
x z xwyz
(c) y w = yw
.
x z xz
(d) y w = yw .
   
x z u x z x u
 
(e) Distributive law y w v = y
w
y
v
.

128
Compared with the corresponding assertions for complex frac-
tions in 6 of Chapter 1, it will be noticed that in (b), the analog
of the inequality version of the cross-multiplication algorithm is
missing. Indeed, the presence of negative numbers adds com-
plexity to the comparison of rational numbers. This issue needs
extra care and will be left as an exercise.

An immediate consequence of (a) and (d) is the cancellation rule among ratio-
nal numbers:

x z z
(f ) y x = y .

For example, (f) justifies the cancellation 3


17
5
3 5
= 17 .
As in 6 of Chapter 1, we will avoid proving (a)(e) by the mechanical procedure
of writing out each rational number as a quotient of two integers for the routine
computation, but will instead make repeated use of the uniqueness part of Theorem
1.
To prove (a), for example, let A = xy , B = zx
zy
, and we will prove that A = B. By
the definition of division of rational numbers, we have x = Ay and zx = B(zy).
But the first equality implies zx = z(Ay) which is of course equal to zx = A(zy).
Now compare the latter with zx = B(zy). Theorem 1 says there is only one way to
express zx as a rational multiple of zy, so that we must have A = B.

We explicitly caution against incorrect reasoning in the passages


from
x
A= to x = Ay,
y
and from
zx
B= to zx = Bzy.
zy
It is tempting to think that each is the result of an appropriate
cancellation. For example, it would appear that by multiplying
both sides of A = xy by y, one would get Ay = x. However,

129
unless we already know that (a) and (c) are true, we do not get
Ay = x by cancellation (cf. (e) above). But we are, at this
stage, still trying to prove (a) itself, so we are in no position to
do any kind of cancellation as yet. Rather, the equality x = Ay
is the result of the definition of the division of x by y. Similarly,
one obtains zx = Bzy from B = zx zy
by virtue of the definition
of dividing zx by zy.
We repeat, there is no cancellation of any kind in the preced-
ing proof of (a).

To prove (d), let A = xy , B = wz , and C = yw


xz
. We want to show AB = C. Again,
by the definition of division, we get, respectively,

Ay = x
Bw = z
C(yw) = xz

Multiplying the first and second equalities together, we get AB(yw) = xz. Comparing
the latter with the third equality, we get AB = C by appealing to the uniqueness
part of Theorem 1 on how to express xz as a rational multiple of yw.
The proofs of (b) and (c) are similar and will be left as an exercise, and (e) re-
quires no proof as it is just the ordinary distributive law expressed in terms of rational
quotients.

These formulas may seem unnecessarily abstract, but they have interesting, prac-
tical consequences. For example, let x, y, . . . be rational numbers as before. Then
!1
x y
=
y x
y x
This is because, by (e), x
y
= 1. Also, we have the general form of invert and
multiply:
x
y x w
z =
w
y z
 1
This is because, by the definition of division, the left side is xy wz , and because
z 1
 
w = wz .

130
In school textbooks, computation of the following type is routinely given:
3
5 (3)(7)
2.4 =
7
5 2.4
At the same time, students are only told to invert and multiply ordinary fractions (see
5 of Chapter 1). The cumulative effect of this and similar practices on students is that
every skill they acquire will be perceived to be unconditionally valid. Consequently,
the basic fact that, in mathematics, a conclusion is valid only when certain hypotheses
are satisfied will be ignored in the long run.
Be sure to point out to your students that there is serious mathematics (such as
the Lemma, Theorem 2, and the rules (a) and (d) for rational quotients) behind the
seemingly simple general invert-and-multiply rule.

Exercise 2.5

1. Give a direct proof of (x)1 = (x1 ) by considering the two cases separately:
(i) x is a fraction and (ii) x is a negative fraction.

4
2. Write down an explanation you would give to an seventh grader that 54 = 5 .
Be prepared that this seventh grader is probably hazy about all these symbols to
begin with.

4
3. Explain to an eighth grader why 3/ 5 = 15
4 . Assume only a knowledge of the
multiplication of rational numbers, and explain what division means.

4. (a) Prove that, for rational numbers x, y, z, w (yw 6= 0), xy = wz if and only if
xz+wy
xw = yz. (b) Give a proof of xy + wz = yz for rational numbers x, y, z, w
(yw 6= 0), by making use of the uniqueness part of Theorem 1. (See the above proofs
of (a) and (c).)

5. Let x, y, z be rational numbers so that z = xy . Explain to a seventh grader why


(a) if x and y are both positive or both negative, z is positive, and (b) if one of x

131
and y is positive and the other negative, then z is negative.

6. Compute and simplify: (a) ( 39 9 39 5 7 5 1


8 11 ) + ( 8 33 ). (b) 1.2 + 1.8 . (c) 6 4
27 2 8
8 ( 3 9 ). (d) (4.79) 0.25 (0.5)(1.87).

7. Show that if A, B, C, and D are rational numbers, and A C 6= 0, then there is


a rational number x so that Ax + B = Cx + D.

8. (a) Let x be a nonzero rational number. Explain why x0 cannot be defined. (Hint:
Look carefully at the definition of a division xy and see where the reasoning begins to
break down if y = 0.) (b) Explain why 00 cannot be defined.

6 Comparing rational numbers

Recall the definition of x < y between two rational numbers x and y (see 1):
it means x is to the left of y on the number line.
x y

We also write y > x for x < y. A related symbol is x y (or, y x), which
means x < y or x = y.
In this section, we will take a serious look at the comparison of rational numbers
and prove several basic inequalities that are useful in school mathematics. In general,
we use the symbol < exclusively, but you should be aware that every one of these
inequalities has an analogous statement about .

We take note of three simple properties of the inequality between numbers. The
first two are the following:

If x y and y x, then x = y.
If x y and y z, then x z.

132
The third property deserves to be singled out because it plays a critical role in
many proofs. Given any two numbers x and y, then either they are the same point,
or if they are distinct, one is to the left of the other, i.e., x is to the left of y, or y is to
the left of x. These three possibilities are obviously mutually exclusive. In symbols,
this becomes:

Trichotomy law Given two numbers x and y, then one and only one of
the three possibilities holds: x = y, or x < y, or x > y.

The way this law comes up in proofs is typically the following. Suppose we try to
prove that two numbers x and y are equal. Sometimes it is impossible or difficult
to directly prove x = y. But by the trichotomy law, if we can eliminate x < y or
x > y, then the fact that x = y would follow.

The basic inequalities we are after are as follows. (Recall that stands for
is equivalent to.)

(A) For any x, y Q, x < y x > y.

For example, 2 < 3 3 < 2.


If x < 0 < y, then x > 0 while y < 0 and there is nothing to prove. Therefore
we need only to attend to the cases where x and y have the same sign, i.e., are
both positive or both negative. If 0 < x < y, then we have

y x 0 x y

On the other hand, if x < y < 0, then we have

x y 0 y x

In both cases, the truth of x > y is obvious.

(B) For any x, y, z Q, x < y x + z < y + z.

133
For example, given 2 < 3, we can verify directly that 2 15 < 3 15 and
2 + 73 < 3 + 37 .
We first prove that x < y implies x + z < y + z for any z. So suppose x < y.
Because of the commutativity of addition, it suffices to prove z + x < z + y, or
equivalently, the endpoint of the vector ~z + ~x is to the left of the endpoint of the
vector ~z + ~y . By the definition of vector addition, both vectors ~z + ~x and ~z + ~y are
obtained by placing the starting points of ~x and ~y , respectively, at the endpoint of ~z,
and the endpoints of the displaced ~x and ~y , respectively, will be z + x and z + y. Since
by hypothesis, the endpoint of ~x is to the left of the endpoint of ~y , the conclusion is
immediate.
The following picture shows the case where x > 0 and y > 0:

z z+x 0
- - -x -
z+y y

Next we prove x + z < y + z for some z implies that x < y. To do this, we


make use of what we have just proved: adding z to both sides of x + z < y + z
immediately yields x < y. The proof of (B) is complete.

Corollary For any x, y, w, z Q, if x < y and w < z, then x + w < y + z.

The detailed proof will be left as an exercise.

(C) For any x, y, Q, x < y y x > 0.

For example, (5) < (3) = (3) (5) > 0 (because (3) (5) = 2), and
conversely, (3) (5) > 0 = (5) < (3).
First, we prove that x < y = y x > 0. By (B), x < y implies x + (x) <
y+(x), which is equivalent to 0 < yx. Conversely, we prove yx > 0 = x < y.
Again we use (B): y x > 0 implies that (y x) + x > 0 + x, which is equivalent to
y > x, as desired.
It should be remarked that sometimes (C) is taken as the definition of x < y.

134
(D) For any x, y, z Q, if z > 0, then x < y xz < yz.

Thus, 4 < 5 = ( 23
6
)4 < ( 23
6
)5 (because the left side is 92
6
and the right side is
115
6
),and (11) < (9) = 7(11) < 7(9) (because the left side is 77 while the
right side is 66).
We first prove that, with x, y, z as given, x < y = xz < yz. We give two
proofs.
First, we make use of (C). By (C), xz < yz is equivalent to (yz xz) > 0. Now
(yz xz) = (y x)z. But we know z > 0 by hypothesis, and y x > 0 because
of the hypothesis that x < y and because of (C). Since the product of two positive
numbers is positive, we have (y x)z > 0, which means (yz xz) > 0,29 and therefore
xz < yz (by (C) again), as claimed. A second proof uses the theorem on fraction
multiplication, which equates a product with the area of a rectangle. Given z > 0 and
x < y. If x < 0 < y, then xz < 0 and yz > 0 and there would be nothing to prove.
Therefore we need only consider the cases where x and y have the same sign (which,
we recall, means they are both positive or both negative). If x, y > 0, then this
inequality is exactly inequality (A) at the end of 4 in Chapter 1. Briefly, the proof
goes as follows: x, y, and z are fractions and xz and yz are then areas of rectangles
with sides of length x, z and y, z, respectively (Theorem 2 in 4 of Chapter 1). Since
x < y, clearly the rectangle corresponding to yz contains the rectangle corresponding
to xz and therefore has a greater area. Hence yz > xz. Next, suppose x, y < 0, then
we get (x), (y) > 0. Moreover x < y implies (y) < (x), by (A). Thus we know
from the preceding argument that (y)z < (x)z, which is equivalent to yz < xz
(Theorem 3 in 4 of this chapter), and therefore yz > xz, by (A) again.
Finally, we prove the converse: if for some z > 0, xz < yz, then x < y. We claim
that z1 > 0. Indeed, since z( z1 ) = 1, and 1 > 0, we see that the product of the positive
number z with z1 is positive. Therefore z1 has to be positive. Such being the case,
then by what we have just proved, z1 > 0 and xz < yz imply that z1 (xz) < z1 (yz),
which is the same as x < y. (D) is proved.

As a corollary, we have: If x, y, z, w are fractions and x y and z w, then


29
Note that from now on, we are secure enough about the multiplication of rational numbers that
we will write xz in place of (xz) without fear of confusion.

135
xz yw. The proof will be left as an exercise.

(E) For any x, y, z Q, if z < 0, then x < y xz > yz.30

To students, the fact that when z < 0, the inequality x < y would turn into xz >
yz is the most fascinating aspect about inequalities. This goes against everything they
have learned up to this point, which suggests that whatever arithmetic operation they
apply to an inequality, the inequality would stay unchanged. Here is a situation where
an inequality gets reversed. We first illustrate with some examples. In each of the
following cases, the initial inequality is multiplied by 4:
1<2 but 4 > 8,
3 15
2
< 4
but 6 > 15,
1
2 < 2
but 8 > 2,
1 < 23 but 4 > 2 23 .
Again, we give two proofs of x < y = xz > yz when z < 0. First we make use
of (C). Given z < 0 and x < y, we have to prove xz > yz. By (C), this is equivalent
to proving that xz yz > 0, i.e., (x y)z > 0. From the hypothesis x < y and
using (C), we have y x > 0, which implies (y x) < 0, by (A), and therefore
y + x < 0. In other words, x y < 0. Since z is also negative, the product of the
two negative numbers z and x y is positive, i.e., (x y)z > 0, as desired.
For the second proof, let z = w, where w is now positive. Since x < y, (D) implies
that wx < wy. By (A), wx > wy. But Theorem 3 of 4 says wx = (w)x = zx,
and wy = (w)y = zy. So zx > zy.

The second proof suggests a more intuitive way to understand why, if z < 0, then
multiplying an inequality by z would reverse that inequality. Consider the special
case where 0 < x < y and z = 2. So we want to understand why (2)y < (2)x.
By Theorem 3 of 4, (2)y = (2y) and (2)x = (2x). Thus we want to see,
intuitively, why 2y < 2x. From 0 < x < y, we get the following picture:
30
In the preceding section, we warned against the tendency to assume that every skill is universally
applicable. There is no better illustration of the danger of this tendency than the contrast between
(D) and (E). One must begin to be sensitive to the fact that some facts are true only under restrictive
hypotheses.

136
0 x y

Then the relative positions of 2x and 2y do not change, but each is pushed further
to the right of 0. (Of course, if z were 21 , then x and y would be both pushed closer
to 0, but their relative position would still be the same.)

0 2x 2y

If we reflect this picture across 0, we get the following:

2y 2x 0 2x 2y

We see that 2y is now to the left of 2x, so that 2y < 2x, as claimed.

It remains to prove that if z < 0, then xz > yz implies x < y. We claim that
1
z
< 0. This is because z( z1 ) = 1 and 1 is positive. Since z is negative, z1 has to be
negative too. Thus by the first part of the proof, multiplying both sides of xz > yz
by z1 would reverse the inequality, i.e., z1 (xz) < z1 (yz). This is the same as x < y. The
proof of (E) is complete.

1
(F) For any x Q, x > 0 x
> 0.

This has essentially been proved in the course of proving (E). Observe that
= 1 > 0. Therefore x and x1 are either both positive, or both negative.
x( x1 )

The next item is an immediate consequence of (D)(F).

y y
(G) For any x, y, z Q, let x < y. If z > 0, then xz < z ; if z < 0, then xz > z .

Intrinsically tied to any discussion of inequalities is the notion of the absolute


value |x| of a number x, which is by definition the distance from x to 0 (i.e., the

137
length of the segment [x, 0] or [0, x], depending on whether x is negative or positive,
respectively). In particular, |x| 0 no matter what x may be. The most pleasant
property of the absolute value is that, for all numbers x, y,

|x| |y| = |xy|

This can be proved by a case-by-case examination of the four cases where x and y
take turns being positive and negative. The reasoning is routine. On the other hand,
inequalities involving absolute value present difficulties to students, so it is absolutely
essential that we come to grips with such inequalities. If b is a positive number, then
the set of all numbers x so that |x| < b consists of all the points x of distance less
than b from 0, indicated by the thickened segment below (excluding the endpoints):

b 0 x b

It follows that the inequality |x| < b for a point x is equivalent to the fact
that x satisfies both b < x and x < b. It is standard practice in mathematics to
combine these two inequalities into a composite statement in the form of a double
inequality:

|x| < b is equivalent to b < x < b.

In the usual notation for intervals on the number line, this becomes:

|x| < b is equivalent to x (b, b).

(Recall: the set of all the points x satisfying c < x < d, where c and d are two fixed
numbers so that c < d, is denoted by (c, d). The segments in our discussion thus
far are sets of the form (c, d) together with the endpoints c and d; these are denoted
by [c, d] , i.e., [c, d] is the set of all points x so that c x d.) The fact that the
single inequality |x| < b involving absolute value is equivalent to a double inequality
b < x < b is a very useful fact in the elementary considerations involving absolute
value. In the following, we sometimes refer to b < x < b as the associated dou-
ble inequality of |x| < b. The following example illustrates the way the conversion
of an absolute value inequality into its associated double-inequality can be put to use.

138
Example Determine all the numbers x so that |6x + 1| + 2 41 < 5, and show
them on the number line.
The inequality |6x + 1| + 2 41 < 5 is equivalent to |6x + 1| < 5 2 14 , (by
(B) above), which is just |6x + 1| < 2 34 , which in turn is equivalent to the double
inequality 2 34 < 6x + 1 < 2 34 . The left inequality is equivalent to 2 34 1 < 6x (by
(B) again), i.e., 15
4
< 6x. Now we multiply both sides of this inequality by 16 and
use (D) to conclude that it is equivalent to 1524
< x. By exactly the same reasoning,
3 7
the right inequality 6x + 1 < 2 4 is equivalent to x < 24 . Putting all this together,
1
we have that the inequality |6x + 1| + 2 4 < 5 is equivalent to the double inequality
15
24
< x < 247
. The set of all x satisfying this double inequality is indicated by the
thickened segment in the picture.

15 7
24 0 24
| {z }

Having introduced the concept of absolute value, we now face the question, asked
by most teachers (not to mention innumerable students), of why we bother with this
concept. As one educator noted, in schools, absolute value is often taught as a topic
disconnected from anything else in the curriculum; it is barely touched on, and is not
understood except as a kind of rote procedure (take off the minus sign if there is
one). Teachers feel handicapped by being made to teach something for which they
dont see any relevance.
It is not possible in an elementary text, especially one of this nature, to give a
wholly satisfactory answer to the question of why absolute value should be taught.
The importance of absolute value emerges mostly in the more advanced portions of
mathematics or the sciences, such as Chapter 16 below where we come face-to-face
with the concept of limit and the related inequalities. Here we have to be content
with giving only the main idea of why it is needed. There are situations where we
want only the absolute value (magnitude) of a number, but not so much whether
the number is positive or negative. For example, suppose you try to estimate the
sum of two 3-digit whole numbers, 369 + 177, by rounding to the nearest hundred.
The sum is of course 546, but the estimated sum would be 400 + 200 = 600. The
measurement of the accuracy of such an estimate is the so-called absolute error of

139
the estimation, which is by definition the absolute value of the difference between the
true value and the estimated value, i.e.,

absolute error = |true value estimated value|

In this case, it is |546 600| = 54. Now, if we do the same to the sum 234 + 420,
then the absolute error of the estimated value of 600 (= 200 + 400) is still 54, because
|654 600| = 54. These two estimates differ in that the former under-estimates by
an amount of 54, whereas the latter over-estimates by the same amount. However, as
a preliminary indication of the accuracy of these estimates, it can be said that they
both miss the mark by 54 and it doesnt matter whether they are over or under by this
amount. Thus it is the absolute value of this difference, rather than the difference
itself, that is of primary interest. The absolute value in this instance provides exactly
the right tool to express the error of such estimations.
Another illustration of the same idea, but one that has more substance, is the
following inequality:

Theorem 1 (i) For any numbers x and y,

2|xy| x2 + y 2

(ii) Equality holds in the preceding weak inequality if and only if x = y.

By FASM, it suffices to prove this for rational values of x and y, and we will
tacitly assume x, y Q in the discussion following. Before giving the proof, let us
understand the role played by the absolute value of xy, i.e., |xy|. Clearly, this theorem
is of no interest if one of x and y is 0, as it merely says in that case that 0 x2 + y 2 .
So we may as well assume both x and y to be nonzero. Such being the case, we make
use of (G) above to rewrite the theorem as

2|xy|
1
x2 + y 2
1
for all x and y. Since x2 + y 2 > 0, we have x2 +y 2 > 0 on account of (F). It follows

1 1
that | x2 +y 2 | = x2 +y 2 . Therefore, using |AB| = |A| |B| for all numbers A and B,

140
we get
2xy 1 1 2|xy|
= |2xy| = 2|xy| 2 = 2

2
x + y2
2 2 2 x + y2

x +y x +y
Thus the theorem is equivalent to

2xy

2

1,
x + y2

and that equality holds if and only if x = y. We know from a previous remark that
this inequality is equivalent to the double inequality
2xy
1 1
x2+ y2

In this form, part (i) of the theorem asserts that the number x22xy
+y 2
is trapped inside
the segment [1, 1] between 1 and 1 for all x and y.
Without the absolute value sign, part (i) of the theorem merely says that
2xy
1
x2+ y2
2xy
This inequality does not preclude the possibility that x2 +y2 = 100. With the
2xy
absolute value sign in place, however, we know that x2 +y2 cannot be to the left of 1
on the number line and, in particular, cannot be equal to 100. We see clearly that,
in this case, the presence of absolute value in the inequality makes a big difference.
This line of reasoning gives us another insight into why absolute value is impor-
tant. Very often in advanced mathematics, we want to control the absolute value of
a given number, in much the same way that we want to control the absolute value of
xy in Theorem 1. Typically, this control is obtained only after stringing together a
sequence of inequalities involving absolute values. If we do not explicitly make use
of absolute value at each stage, then we would be forced to deal instead with two
inequalities each time (i.e., those given by the associated double inequalities). As the
number of such inequalities involving absolute value increases, the number of ordinary
inequalities we need to look at becomes unmanageable. The use of absolute value is
thus a necessity.

141
It remains to give the simple proof of Theorem 1. We first prove part (i) in its
original formulation:
2|xy| x2 + y 2
Let u = |x| and v = |y|, then as we have seen, 2|xy| = 2|x| |y| = 2uv. Now we make
the simple observation that for all numbers t, t2 = |t|2 ; this is clear when we first
consider the case t 0 and then the case t < 0. Therefore, we have x2 = |x|2 =
|x| |x| = uu = u2 . Similarly, y 2 = v 2 . Thus part (i) becomes the statement that

2uv u2 + v 2

which is equivalent to 0 u2 2uv + v 2 , by (B) above. In other words, part (i) is


equivalent to
u2 2uv + v 2 0
for any numbers u and v. This is however obvious because u2 2uv + v 2 = (u v)2
and (u v)2 0. This proves (i). To prove part (ii), the fact that if x = y then
2|xy| = x2 + y 2 is trivial. So suppose 2|xy| = x2 + y 2 , and we will prove x = y. By
(B) above, 2|xy| = x2 + y 2 implies x2 2|xy| + y 2 = 0, so that |x|2 2|x| |y| + |y|2 = 0.
Therefore (|x| |y|)2 = 0, and therefore |x| |y| = 0, i.e., |x| = |y|. The proof is
complete.

It may be thought that part (ii) of Theorem 1 about when an inequality becomes
an equality is not interesting. The opposite is true; see problem 16 of Exercises 2.6
below.
We conclude with what is probably the most basic inequality involving absolute
value in elementary mathematics.

Theorem 2 (Triangle Inequality) (i) For any numbers x and y,

|x + y| |x| + |y|

(ii) This inequality is an equality if and only if x and y are of the same sign, i.e.,
both are 0 or both are 0.

Proof We first prove part (i). If one of x and y is 0, then there is nothing to
prove. We assume therefore that both x and y are nonzero. The most elementary

142
proof is one using case-by-case examination of the inequality. There are four cases to
consider: (i) both x, y > 0, (ii) both x, y < 0, (iii) x < 0 but y > 0, and (iv)
x > 0 but y < 0. Such a proof is boring, but would be instructive if you want to
get some down-to-earth feelings about absolute values.
We give a different proof, one that makes use of the fact that the inequality
|x| b is equivalent to the double-inequality b x b. This is a standard
proof, but is also one from which one can learn something about absolute values.
Therefore instead of proving |x + y| |x| + |y|, we prove the double inequality
|x| |y| x + y |x| + |y|. There is no question that |x| x |x| and
|y| y |y|. From |x| x and |y| y, we use the corollary of (B) above
to conclude that |x| |y| x + y. Similarly, we use x |x| and y |y| and
the corollary of (B) to conclude that x + y |x| + |y|. Thus we have proved both
inequalities in the double inequality.
Next we prove part (ii). There is no question that if x and y are of the same
sign, then equality holds in the inequality, i.e., |x + y| = |x| + |y|. We now prove the
converse, namely, |x+y| = |x|+|y| implies that x and y are of the same sign. If one of
x and y is 0, then x and y are already of the same sign and there is nothing to prove.
Assume therefore that both x and y are nonzero, and we will use a contradiction
argument. Suppose |x + y| = |x| + |y| and x, y are not of the same sign. This means
one of x and y is negative and the other is positive. For definiteness, say x < 0 and
y > 0. Then |x|+|y| = x+y, but |x+y| = x+y or |x+y| = (x+y). In case of the
former, we have x + y = x + y, so that 2x = 0 and therefore x = 0. this contradicts
x being negative. If the latter, then x + y = (x + y), so that x + y = x y,
and therefore 2y = 0. Hence y = 0 and this contradicts the positivity of y. Thus it is
impossible that x and y are not of the same sign when |x + y| = |x| + |y|. The proof
of Theorem 2 is complete.

Exercises 2.6
1. (a) If x, y, z, w are rational numbers and x y and w z, then (a) show that
x + w y + z. (This is the Corollary of (B).) (b) If, in addition, all four numbers
0, then show that xw yz.

2. (a) Let x, y, z, w Q, and let y, w > 0. Then prove that xy < wz xw < yz.

143
(b) Give examples to show that both implications xy < wz = xw < yz and
xw < yz = xy < wz are false without the assumption that y, w > 0. (c) Are
the following numbers
32.5 30 23
and
3 2 45
equal? If so, prove. If not, which is bigger?

2 2 4
3. Which is greater? (a) (1.7)(9) or 22 + 6 32 . (b) 5
1.1
or (5) 12.5 . (c) /
3 7
or
( 14
3
)( 2
8.5
).

4. (a) Determine all the numbers x which satisfy |x 1| 5 < 23 . (b) Do the same
for 11 |3 + 2x| > 2.5. (c) Do the same for |2x 35 | 51 . (d) Do the same for
3 |2x 5| 4.2.

5. For any two rational numbers p and q, prove that the length of the segment be-
tween p and q is exactly |p q|.

6. Let x and y be rational numbers. (a) How does |x| |y| compare with |x y|?
Why? (b) How does | |x| |y| | compare with |x y| ? Why?

7. If x and y are rational numbers, and y 6= 0, then prove that



x |x|
=

|y|

y

8. If x and y are positive rational numbers, then prove that (a) x2 = y 2 if and only
if x = y, and (b) x2 < y 2 if and only if x < y.

9. If x is a rational number, is it true that x < 1 implies x1 > 1? If so, prove. If not,
formulate a true statement, and prove it.

10. If x, y are numbers so that 0 < x < y, and n is a positive integer, how does xn
compare with y n ? Why?

144
11. If x > 1, then prove that xn > 1 for any positive integer n. Also if 1 < x < 1,
then prove that 1 < xn < 1 for any positive integer n.

12. Let x be a rational number. (a) If x > 1, then show that xm > xn for whole
numbers m > n. (b) If 0 < x < 1, then prove that xm < xn for whole numbers
m > n.

13. Show that for all numbers x, y, and c 6= 0,


1
|x + y|2 (1 + )|x|2 + (1 + c2 )|y|2
c2

14. Can you see why if x and y are any two rational numbers (in particular, they
could be negative), then 19 x2 12
1 1 2
xy + 64 y 0?


15. For any two positive numbers s and t, show that s + t 2 st. Furthermore,
show that equality holds in this inequality if and only if s = t.
 2
16. (a) For any positive numbers a and b, prove that ab a+b 2
, and that equality
holds if and only if a = b. (b) Prove that among all rectangles with a fixed perimeter,
the square has the biggest area.

145
Chapter 3: The Euclidean Algorithm

1 The reduced form of a fraction (p. 147)


2 The fundamental theorem of arithmetic (p. 160)

146
This chapter gives two principal applications of the Euclidean Algorithm, which is
a finite procedure for finding the greatest common divisor of two given whole numbers.

The reduced form of a fraction

A fraction mn
is said to be a reduced form of a given fraction k` if m n
= k` and if
no whole number other than 1 divides both the numerator m and the denominator n.
In general, a fraction with the property that no whole number other than 1 divides
both the numerator and the denominator is said to be in lowest terms, or reduced.
A fact taken for granted in elementary school is that any fraction has a reduced form,
and that there is only one. When classroom instruction focuses entirely on fractions
with single-digit numerator and denominator, the reduced form of a fraction can be
obtained by visual inspection. For fractions with larger numerators and denominators,
deciding whether a fraction is in reduced form is often not so obvious. For example,
is the fraction
1147
899
reduced? (It is not. See Exercises 3.1.)
The purpose of this section is to clarify this situation once and for all by proving
the following theorem. The statement requires that we introduce the term algo-
rithm, which is an explicit finite procedure that leads to a desired outcome.

Theorem 1 Every fraction has a unique reduced form. Furthermore, this reduced
form can be obtained by an algorithm.

The proof of the theorem involves some number-theoretic considerations about


whole numbers. We start at the beginning.

We say a nonzero integer d is a divisor or a factor of an integer a, or d divides


a, if a = cd for some integer c. We also call a = cd a factorization of a. Another
way to say d divides a is to say that the rational number ad is an integer. We write
d|a when this happens, and we also say a is an (integral) multiple of d. If d does
not divide a, we write d 6 | a.
Observe that (i) if k|` and `|m, then k|m, and (ii) every nonzero integer

147
divides 0. The simple proofs are left as exercises.

In the following discussion, most of the time all the integers involved are whole
numbers, i.e., integers which are positive or 0. However, there are one or two places
which would become very awkward if we restrict ourselves only to whole numbers
(cf. the proof of the Key Lemma below). For this reason, we bring in integers from
the beginning. When we need to focus on whole numbers, we will be explicit about
it, e.g., the concept of a prime immediately following.
A whole number a which is greater than 1 has at least two whole number divisors,
1 and a itself. A proper divisor d of a whole number a is a whole number divisor
of a so that 1 < d < a. Note that if a = cd for whole numbers c and d so that
both c and d are > 1, then each of c and d is a proper divisor of a. A whole number
> 1 without proper divisors is called a prime, or prime number. A whole number
which is > 1 and is not a prime is called a composite. Note that by definition, 1
is neither prime nor composite. Checking whether a whole number is a prime, while
difficult in general, is easier than appears at first sight, because we have the following
lemma.

Lemma Given a whole number n > 1. If no prime number p satisfying 2 p

n is a divisor of n, then n is a prime.



For a positive number x, its positive square root x is the

positive number so that its square is equal to x, i.e., ( x)2 = x.
In Chapter 16, we will prove that any positive real number has a
unique positive square root. Here we anticipate this fact, but rest
assured that there is no danger of circular reasoning.
Observe that if a, b are positive numbers, then a < b is equivalent

to a < b (problem 8(b) in Exercises 2.6).

For example, to check whether 233 is a prime, it suffices to check whether any of
the primes 16 divides 233 (because 162 = 256 > 233). The primes in question are
2, 3, 5, 7, 11, 13. Since none of them divides 233, we conclude that 233 is a prime.
The Lemma can be rephrased equivalently as one about composite numbers: If

n N is composite, then it has a prime divisor p in the range 2 p n. We will

148
prove this version of the Lemma.

Proof Suppose n is composite. We first show that n has a proper divisor ` in



the range 2 ` n. If not, then every proper divisor of n exceeds n. So let `
be a proper divisor of n, and we write n = m` for some whole number m. Then m is

itself a divisor of n and therefore m > n as well. We then have (cf. problem 1(b) of
Exercises 2.6):

n = m` > n n = n
and therefore we have the absurd situation of n > n. Thus n must have a proper

divisor ` so that 2 ` n.
If ` happens to be a prime, we are done. If not, then among all the distinct proper
divisors of `, let p be the smallest. Then p must be a prime because otherwise p has a
proper divisor q, and since q|p and p|`, we have q|`. Since q < p, p is not the smallest

proper divisor of `, which is a contradiction. So p is a prime, and p < ` n, as
claimed. The Lemma is proved.

Given two whole numbers a and d, there is at least one whole number that divides
both a and d, namely, the number 1. A whole number c is said to be the GCD
(greatest common divisor) of whole numbers a and d if, among all the whole
numbers which divide both a and d, c is the greatest. Notation: GCD(a, d). Two
whole numbers a and d are said to be relatively prime if GCD(a, d) = 1.
We can approach GCD from another angle. Given a whole number n, let D(n) be
the collection of all the divisors of n. Then

the collection of all common divisors of a and d = D(a) D(d)

The following is therefore an equivalent formulation of the concept of GCD:

GCD(a, d) = max{D(a) D(d)}

where max indicates the largest number in the set. In this formulation, it is clear
that the GCD of any two whole numbers a and d always exists: after all, the set
D(a) D(d)} is a finite set and all we have to do is pick the largest number in it. In
this notation, we also see that a and d being relatively prime is equivalent to

D(a) D(d) = 1

149
At this point, we can explain why we are interested in the GCD of two whole
numbers. Let a fraction m
n be given. Let k be the GCD of m and n, and let m = km
0
0
and n = kn0 for some whole numbers m0 and n0 , We claim that m
n0 is the reduced
m m0 m
form of n . The equality n0 = n is a consequence of the Theorem on Equivalent
0
Fractions. As to why m 0 0
n0 is reduced, suppose it is not. Then m and n have a
common divisor ` > 1, let us say m0 = `a and n0 = `b for some whole numbers a
and b. Then
m = km0 = k`a, and n = kn0 = k`b
It follows that k` is a common divisor of m and n. Since ` > 1, k` > k, and this
contradicts the fact that k is the greatest of the common divisors. Our claim of the
existence of a reduced form for m n is now proved.
From a practical point of view, what we want is more than a theoretical assurance
that there is a GCD; we want an explicit procedure that unfailingly yields the GCD
of two given whole numbers. This is what the Euclidean algorithm accomplishes.
The proof of the uniqueness of the reduced form turns out to be more subtle. The
proof given below relies on the following lemma which is also the key to many other
basic facts. What is interesting is that this lemma is a natural consequence of the
Euclidean algorithm.

Key Lemma Suppose `, m, n are nonzero whole numbers, and `|mn. If ` and
m are relatively prime, then `|n.

One can appreciate this Key Lemma better if one notices that a whole number `
can divide a product without dividing either factor. Thus, 63|(18 245), but 63 6 |18
and 63 6 |245. What the Key Lemma says is that if ` is relatively prime to one of the
factors, then ` dividing the product would imply that ` divides the other factor. (It
goes without saying that 63 is relatively prime to neither 18 nor 245, so there is no
contradiction to the Key Lemma.)
The proof of the Key Lemma requires some preparation concerning the greatest
common divisor of two numbers, which in turn requires that we review the well-
known procedure of division with remainder: given whole numbers a and d, then

150
the division with remainder of a by d is given by

a = qd + r where q, r N, and 0 r < d

The whole number r is the remainder. (In abstract algebra, this is of course the
division algorithm for integers, but in school mathematics, one cannot afford to use
this terminology because it causes confusion with the long division algorithm.) The
main observation here is that, given a = qd + r,

GCD(a, d) = GCD(d, r)

The way we prove this is to prove something slightly more general, namely, the
equality of the following two sets:

D(a) D(d) = D(d) D(r)

By the characterization of the GCD of two numbers in terms of the divisors of each
number, this implies the above observation about GCD.
Before giving the proof of this equality of the sets of common divisors, we should
make two remarks. First, this equality of sets means precisely that the following two
inclusions hold:
D(a) D(d) D(d) D(r)
and
D(d) D(r) D(a) D(d)
A second remark is that suppose A, B, C are whole numbers and A = B + C. If a
whole number n divides any two of A, B, C, then n divides all three. The proof is
straightforward (see Exercise 3.1). This fact will be used several times in the proof
of the inclusions.
Let us prove the first inclusion relationship:

D(a) D(d) D(d) D(r)

Suppose a whole number n belongs to the left side, then we must prove that it belongs
to the right side. In other words, if n divides both a and d, then it divides both d
and r. Therefore we must prove that if n divides both a and d, then it divides the r
in
a = qd + r

151
However, this means n divides both a and qd in this equation, so it divides the third
number r, by the second remark above.
The proof of the reverse inclusion is entirely similar. It follows that if

a = qd + r,

then
GCD(a, d) = GCD(d, r)
There are many reasons why this fact is interesting. The most obvious reason can
be seen from a simple example. Suppose we want to get the GCD of 469 and 154.
We have
469 = (3 154) + 7
Therefore GCD(469, 154) = GCD(154, 7), and since obviously GCD(154, 7) = 7, it
follows immediately that GCD(469, 154) = 7. Thus one application of division with
remainder suffices to yield the GCD of 469 and 154.
Is this an accident? Not entirely, because the determination of the GCD of two
numbers a and d is quite easy if at least one of a and d is sufficiently small. Witness
the fact that GCD(154, 7) = 7. In general, if d is sufficiently small, then the set
D(d) can be determined by visual inspection one way or another, and therewith also
the set D(a) D(d). Thus GCD(a, d), being max{D(a) D(d)}, can be determined.
The virtue of having the equality GCD(469, 154) = GCD(154, 7) is, therefore, that
instead of having to deal with two fairly large numbers 469 and 154, we are reduced
to dealing with 154 and 7, and 7 is sufficiently small (no matter how one defines
sufficiently small). As we have seen, this leads immediately to the determination
of GCD(469, 154).
In one sense, though, this example is an accident: in this case, one application of
division with remainder suffices to determine GCD(469, 154). In general, one appli-
cation is not enough. For example, suppose we try to find GCD(3008, 1344). From

3008 = (2 1344) + 320,

we get GCD(3008, 1344) = GCD(1344, 320). This time, it is not obvious what
GCD(1344, 320) is, because for one thing, 320 has too many divisors so that D(320)

152
becomes unwieldy. But we can again apply division with remainder to 1344 and 320
to get a yet smaller number:

1344 = (4 320) + 64

and we get GCD(1344, 320) = GCD(320, 64). But now D(64) is seen to consist of
powers of 2 up to the 6th power, and it turns out that 64 itself divides 320, and
therefore GCD(320, 64) = 64. Hence

GCD(3008, 1344) = GCD(1344, 320) = GCD(320, 64) = 64

We can further illustrate this process with a slightly more complicated example:
let us find the GCD of 10049 and 1190. From

10049 = (8 1190) + 529

we get GCD(10049, 1190) = GCD(1190, 529). Since it is not obvious what GCD(1190, 529)
is, we apply division with remainder to 1190 and 529:

1190 = (2 529) + 132

Now GCD(10049, 1190) = GCD(529, 132), because GCD(1190, 529) = GCD(529, 132).
However, it is still not immediately obvious what GCD(529, 132) is, so we go one step
further:
529 = (4 132) + 1
This time, GCD(132, 1) = 1, so GCD(10049, 1190) = GCD(132, 1) = 1. Incidentally,
we have exhibited a nontrivial example of a pair of relatively prime integers: 10049
and 1190.
It is not difficult to explain this method of determining the GCD of two whole
numbers in general. The idea is that the application of division with remainder to
two whole numbers reduces the problem of finding the GCD of these numbers to the
determination of the GCD of a second pair of numbers which are correspondingly
smaller than the original pair. Thus, given the whole number pair a and d, with
a > d, division with remainder yields

a = qd + r, where 0 r < d

153
The equality
GCD(a, d) = GCD(d, r)
replaces the determination of the GCD of a and d by the determination of the GCD
of d and r, and the advantage is that a > d and d > r. This process can be repeated
by doing division with remainder on the pair d and r, getting

d = q1 r + r1 , where 0 r1 < r

Now we get GCD(a, d) = GCD(d, r) = GCD(r, r1 ), and a > d > r > r1 . In this
way, we introduce smaller and smaller numbers at each step into the problem, until
we inevitably reach either 1 or 0. At that point, if not earlier, we would be done
because, in either case,

GCD(n, 1) = 1 and GCD(n, 0) = n

for any whole number n.

It is now clear that iterations of division with remainder lead to the determination
of the GCD of any two whole numbers in a finite number of steps. What is even more
remarkable is the fact that there is more information to be extracted from this process.
First consider the simplest case of the GCD of 469 and 154. We had

469 = (3 154) + 7

This equation not only shows that GCD(469, 154) = 7, but also exhibits the GCD,
which is 7, as an integral linear combination of the two original numbers 469 and
154, in the sense that 7 is an integer multiple of 469 plus an integer multiple of 154,
namely,
7 = {1 469} + {(3) 154} .
If we consider the fact that the GCD of 469 and 154 is defined in terms of 469 and
154 using the concept of multiplication, the expression of this GCD as the sum of
multiples of these two numbers must come as a surprise, to say the least.
Let us represent GCD(3008, 1344) = 64 as an integral linear combination of 3008
and 1344. We first list the steps of the division with remainder:

3008 = (2 1344) + 320


1344 = (4 320) + 64

154
Now rewrite, in reverse order, each of these divisions-with-remainder as an equation
expressing the remainder as an integral linear combination of the divisor and dividend,
thus:

64 = 1344 + ((4) 320)


320 = 3008 + ((2) 1344)

Substitute the value of 320 in the second equation into the first, and we get:

64 = 1344 + (4) (3008 + (2) 1344)


= 1344 + ((4) 3008) + (8 1344)
= {9 1344} + {(4) 3008}

In other words the GCD of 1344 and 3008 is 64, and

64 = {9 1344} + {(4) 3008}

Let us also express GCD(10049, 1190) as an integral linear combination of 10049


and 1190. Again, we first list the steps of division with remainder:

10049 = (8 1190) + 529


1190 = (2 529) + 132
529 = (4 132) + 1

As before, we rewrite each equation as an expression of the remainder in terms of the


dividend and divisor, but in reverse order:

1 = 529 + ((4) 132)


132 = 1190 + ((2) 529)
529 = 10049 + ((8) 1190)

So we have, by repeated substitution:

1 = 529 + ((4) 132)


= 529 + (4)(1190 + (2) 529)
= 529 + ((4) 1190) + (8 529)

155
= (9 529) + ((4) 1190)
= 9 (10049 + (8) 1190) + ((4) 1190)
= (9 10049) + ((72) 1190) + ((4) 1190)
= {9 10049} + {(76) 1190}

Thus the GCD of 10049 and 1190 is 1, and

1 = {9 10049} + {(76) 1190}

Clearly, no one would consider this expression of 1 as an integral linear combination


of 10049 and 1190 to be obvious.

The general case is quite clear at this point. Let whole numbers a and d be given,
with a > d. If we iterate the process of performing division with remainder on a and
d, and then on d and the remainder, etc., obtaining:

a = qd + r
d = q 1 r + r1
r = q2 r1 + r2
r1 = q3 r2 + r3
r2 = q4 r3 + 0

Note that the division with remainder can, in principle, continue for d 1 steps
before it terminates with remainder 0, but for simplicity of writing, we have allowed
the remainder 0 to appear after 4 steps. Clearly there is no loss of generality in the
reasoning. That said, we conclude that

GCD(a, d) = GCD(d, r) = GCD(r, r1 ) = GCD(r1 , r2 ) =


GCD(r2 , r3 ) = GCD(r3 , 0) = r3 .

Furthermore, the GCD, which is r3 , can be expressed as an integral linear combination


of a and d as follows. First, rewrite the preceding sequence of equations in reverse
order, each time expressing the remainder as a linear combination of the dividend
and the divisor:

r3 = r1 + (q3 ) r2

156
r2 = r + (q2 ) r1
r1 = d + (q1 ) r
r = a + (q) d

Therefore, repeated substitution of ri into the equation above it yields:

r3 = r1 + (q3 ) r2
= r1 + (q3 )(r + (q2 ) r1 )
= (q3 ) r + (1 + q2 q3 ) r1
= (q3 ) r + (1 + q2 q3 ) (d + (q1 ) r)
= (1 + q2 q3 ) d + (q1 q3 q1 q2 q3 ) r
= (1 + q2 q3 ) d + (q1 q3 q1 q2 q3 ) (a + (q) d)
= (1 + q2 q3 + q (q1 + q3 + q1 q2 q3 )) d + (q1 q3 q1 q2 q3 ) a

We have therefore proved the following theorem:

Theorem 2 (Euclidean Algorithm) Given a, d N. Then GCD(a, d) can be


obtained by a finite number of applications of division with remainder. Furthermore,
GCD(a, d) is an integral linear combination of a and d.

The reason we are interested in the Euclidean Algorithm in this context is that it
leads directly to proof of the Key Lemma. We recall:

Key Lemma Suppose `, m, n are nonzero whole numbers, and `|mn. If ` and
m are relatively prime, then `|n.

Proof The following brilliant proof is (so far as we can determine) due to Euclid,
a fact that, of course, also accounts for the name.31 We are given whole numbers
31
Euclid, whose name has become part of the English language (and probably every language), was
a Greek mathematician who lived in Alexandria (in Egypt) around 300 B.C. Essentially nothing is
known about him other than the fact that he authored the Elements, a comprehensive account and re-
organization of the mathematical knowledge known at his time together with his own contributions.
Contrary to common perception, only the first six of the thirteen books of this work are devoted
to plane geometry. The Euclidean Algorithm appears in Book VII. The Fundamental Theorem of
Arithmetic is in Book IX, as is his famous proof that there are an infinite number of primes.

157
`, m, and n, so that `|mn and ` and m are relatively prime. We must prove `|n.
Since ` and m are relatively prime, GCD(`, m) = 1. By the Euclidean algorithm,
1 = ` + m for some integers and . Multiply this equation through by n, and
we get n = `n + mn. Since ` divides mn by hypothesis, `|(mn); obviously,
`|(`n). Therefore ` divides `n + mn, which is n. In other words, ` divides n. The
proof is complete.

We are now in a position to give a proof of the main theorem of this section an-
nounced earlier: Any fraction k` , is equal to a unique fraction m
n in reduced form.
m
Moreover, there is an algorithm to produce this n .

Proof Let GCD(k, `) = a. Thus k = am and ` = an for some whole numbers


m and n. Note that m and n are relatively prime because if GCD(m, n) = b > 1, then
ab|(am) and ab|(an), so that ab is a common divisor of k and ` which is bigger than a,
contradicting the fact that a is the GCD of k and `. Therefore m
n is a reduced fraction.
By equivalent fractions, m am k m
n = an = ` . Thus this n is the desired fraction. Since
m = ka and n = a` , and since a is obtained from k and ` by the Euclidean algorithm,
the theorem is proved with the exception of the uniqueness statement.
0 0
To prove uniqueness, suppose k`0 = k` and k`0 is reduced. We must prove that
0
k 0 = m and `0 = n. We have k`0 = m k
n , both being equal to ` . By the cross-
multiplication algorithm, k 0 n = `0 m. Since n|(k 0 n), we see that n|(`0 m). Since m
n
is
reduced, m and n are relatively prime. Therefore the Key Lemma implies that n|`0 ,
so that n `0 . We now look at k 0 n = `0 m from a different angle. Since `0 |(`0 m), we
0
have `0 |(k 0 n). Since k`0 is reduced, `0 and k 0 are relatively prime. By the Key Lemma,
we must have `0 |n and thus `0 n. Together with n `0 , we get n = `0 . Using
k 0 n = `0 m, we conclude that also k 0 = m, as desired.

Exercises 3.1

1. (i) If k, `, m are integers, and if k|` and `|m, then prove that k|m. (ii)
Show that every nonzero integer divides 0. (Caution: Use the precise definition of

158
divisibility.)

2. Suppose A, B, C are whole numbers and A = B + C. Show that if a whole number


n divides any two of A, B, C, then n divides all three.

3. Find the GCD of each of the following pair of numbers by listing all the divisors
of each number and compare: 35 and 84, 54 and 117, 104 and 195.

4. Find the GCD of each of the following pairs of numbers, and express it as an
integral linear combination of the numbers in question: 322 and 159, 357 and 272,
671 and 2196.

5. Let the GCD of two positive integers a and d be k, and let k = ma nd for
some whole numbers m and n. Show that m and n are relatively prime.

160 273
6. In each of the following, find the reduced form of the fraction. (a) 256 . (b) 156 .
144 1147
(c) 336 . (d) 899 .

7. Let x, y be whole numbers. Suppose 6 divides both 35x and 11y . Does 6 divide
385 + xy ?

8. The effectiveness of the Euclidean algorithm depends on how fast the remainders
in the sequence of iterated divisions-with-remainder get to 0. Here is an indication:
Suppose we have three iterated divisions-with-remainder as follows:

d = q1 r + r1
r = q2 r1 + r2
r1 = q3 r2 + r3

Then prove that r3 < 21 r1 . (Of course, r3 being the remainder when r1 is divided
by r2 , it is already guaranteed that r3 < r2 . Similarly, the second equation shows
r2 < r1 . Together, we get r3 < r1 . However, what this problem says is that r3 is half
as small as expected.)

159
9. Prove that a whole number is divisible by 4 exactly when the number formed by
its last two digits (i.e., its tens digit and ones digit) is divisible by 4. (Thus 93748 is
divisible by 4 because 48 is divisible by 4.)

10. Prove that a whole number is divisible by 5 if and only if its last digit is 0 or 5.

11. Prove that the number 3 is a divisor of a whole number n if and only if 3 is a
divisor of the number obtained by adding up all the digits of n. (Hint: 3 dividing a
power of 10 always has remainder 1.)

12. Repeat problem 10 with the number 3 replaced by the number 9.

13. (a) For any whole number n, prove that GCD(n, n + 1) = 1. (b) What is
GCD(n, n + 2) for a whole number n? (c) Let n be a whole number. What could
GCD(n, n + k) be for a whole number k?

2 The fundamental theorem of arithmetic

The purpose of this section is to prove the following basic theorem and to use it
to draw two consequences about numbers: the first about fractions which are equal
to finite decimals, and the second about the existence of numbers (i.e., points on the
number line) which lie outside Q.

Theorem 1 (Fundamental theorem of arithmetic) Every whole number n


is the product of a finite number of primes: n = p1 p2 pk . Moreover, this collection
of primes p1 , . . . , pk , counting the repetitions, is unique.

This theorem will usually be referred to as FTA. The uniqueness statement,


which is important for many reasons, can be made more explicit, as follows: Suppose
n = p1 p2 pk = q1 q2 q` , where each of the pi s and qj s is a prime. Then k = `
and, after renumbering the subscripts of the qs if necessary, we have pi = qi for all
i = 1, 2, . . . , k.
The expression of n as a product of primes, n = p1 p2 pk , is called its prime

160
decomposition. Let it be noted explicitly that in the above expression, some or all
of the pi s could be the same, e.g., 24 = 2 2 2 3. FTA says that, except for the
order of the primes, the prime decomposition of each n is unique.
It should not be assumed that getting the explicit prime decomposition of a whole
number is easy. Try 9167, for instance. Even with the help of the Lemma of the last
section, we still have to check all the primes 96 to see if any of them divides 9167.
It turns out that 9167 has the prime decomposition: 9167 = 89 103. The whole
field of cryptography, which makes possible the secure transmission of confidential
information on the internet, depends on the fact that if a number is very large, say
300 digits, then all the computers in the world put together cannot get its explicit
prime decomposition in 100 years.
On the other hand, it is easy to establish that, on a theoretical level, every whole
number has a prime decomposition.32 Given n N, if it is a prime, we are done. If
not, n has a proper divisor. Among all its proper divisors, take the smallest, to be
called p. Arguing as in the proof of the Lemma above, this p is a prime. Therefore
let us write n = pn1 for some whole number n1 . Apply the same argument to the
whole number n1 , and we get n1 = qn2 , where q, n2 are whole numbers and q is a
prime. Then we have n = pqn2 . Repeat the same argument on n2 , and after a finite
number of steps, we get a prime decomposition of n.
It is the uniqueness that is more interesting and more difficult. The proof of
uniqueness is mathematically sophisticated, and it is due to Euclid. Let us first
convince ourselves that there is something to prove. Consider the following two
expressions of 4410 as a product:

4410 = 2 9 245 = 42 105

These two products, 2 9 245 and 42 105, have different numbers of factors and
the factors are all distinct. The nonuniqueness of the expression of 4410 as a product
is striking. Of course with the exception of 2, none of the factors is a prime. Once we
require that each factor in the product is a prime, then we get only one possibility
32
The difference between the explicit determination of a number and the theoretical statement
that this number exists can be seen from an example.
R 7 It is easy to write down a definite integral
whose exact value is impossible to determine, e.g., 0 sin(x3.6 )dx, but the fact that this integral is
equal to some number is easy to prove.

161
(other than those obtained by permuting the factors):

4410 = 2 3 3 5 7 7

The question is: why must uniqueness emerge as soon as we require each factor to be
a prime? The answer resides, in large part, in the Key Lemma of the last section.

Proof of uniqueness of prime decomposition Let n be a whole number so


that n = p1 p2 pk = q1 q2 q` , where the ps and the qs are primes. We want to
prove that k = `, and that after a renumbering of the subscripts of the qs if necessary,
we have pi = qi for all i = 1, 2, . . . , k.
We first prove that p1 is equal to a qj , for some j. Since p1 |p1 pk , we have
p1 |q1 q` . Write Q1 = q2 q` , then p1 |q1 Q1 . If p1 = q1 , we are done. If not,
then p1 and q1 are distinct primes and are therefore relatively prime. By the Key
Lemma of the last section, p1 |Q1 . Writing Q2 for q3 ql , we have p1 |q2 Q2 . Again,
if p1 = q2 , we are finished. If not, then p1 and q2 , being distinct primes, are relatively
prime. The Key Lemma implies p1 |Q2 , etc. Either p1 is equal to one of q3 , . . . q`1 ,
or after ` 1 steps, we have p1 |q`1 q` . If p1 = q`1 , we are done. Otherwise, p1 and
q`1 are relatively prime and therefore p1 |q` . Since both p1 and q` are primes, this is
possible only if p1 = q` . Thus p1 is equal to a qj , for some j.
By relabeling the subscripts of the qs if necessary, we may assume that p1 =
q1 . The hypothesis that n = p1 p2 pk = q1 q2 q` now reads: p1 (p2 pk ) =
p1 (q2 q` ). Multiply both sides by (p1 )1 , and we get:

p2 pk = q2 q`

We may repeat the same argument, and conclude that p2 is equal to one of q2 , . . . q` .
By re-arranging the subscripts of the qs if necessary, we may assume that p2 = q2 ,
so that
p3 pk = q3 q ` ,
etc. If k 6= `, let us say k < `, then after k 1 such steps, we get

1 = qk+1 q`

This is impossible because each of the primes qk+1 , . . . q` is greater than 1. Thus k = `
after all, and pi = qi for all i = 1, . . . k. The proof of FTA is complete.

162
FTA has an interesting application to fractions. The following characterizes all
the fractions which are equal to finite decimals.

Theorem 2 If the denominator of a fraction is of the form 2a 5b , where a and b


are whole numbers, then it is equal to a finite decimal. Conversely, if a reduced frac-
tion m
n
is equal to a finite decimal, then the prime decomposition of the denominator
contains no primes other than 2 and 5.

Note that the second part of the theorem is clearly false if m n


is not reduced. For
example, 63 = 0.5, but the prime decomposition of 6 contains a 3. It is also good to
recall that a finite decimal is just a fraction whose denominator is a power of 10. It
will be apparent from the proof how important it is to have such a clear-cut definition
of a decimal.
We first prove that if the prime decomposition of the denominator n contains no
primes other than 2 and 5, then m n
is equal to a finite decimal. The idea of the proof
27
is so simple that an example would give it away: since 160 = 25 5, the fraction 160
is equal to the decimal 0.16875 because, by equivalent fractions,
27 27 27 54 16875
= 5 = 5 4
= ,
160 2 5 2 55 105
which by definition is 0.16875. In general, if n = 2a 5b , where a, b are whole numbers,
we may assume without loss of generality that a < b. Then

m m 2ba m 2ba m
= a b = ba a b =
n 2 5 2 2 5 10b
and the last is a finite decimal.
Conversely, suppose m n
is a reduced fraction which is equal to a finite decimal:

m k
= b
n 10
where k, b are whole numbers. We have to show that no prime other than 2 and 5
divides n. By the cross-multiplication algorithm, nk = m10b . Since m n
is reduced, n
b
is relatively prime to m. Since n divides nk, it divides m10 as well. The Key Lemma
shows that n must divide 10b , which is 2b 5b . Therefore, 2b 5b = n` for some whole
number `. By the uniqueness of the prime decomposition, the primes on the right

163
consists of only 2s and 5s. Thus n = 2a 5c , where a and c are whole numbers b.
The theorem is proved.

At this point, we can take a look at the question of whether the rational numbers
are sufficient for doing arithmetic. The following theorem implies that they are not,
because the square roots of many whole numbers cannot be rational numbers. For
the statement of the theorem, a perfect square is a whole number which is equal to
the square of another whole number. Thus the first few perfect squares are 1, 4, 9,
16, 25, 36, . . . .

Theorem 3 If a whole number n is not a perfect square, then there is no rational


number r so that r2 = n.

Proof Let the prime decomposition of n be expressed as a product of powers of


distinct primes. (For example, 72 = 23 32 , 3375 = 33 53 , etc.) Consider the case
where n is the product of powers of three distinct primes: n = pa1 pb2 pc3 , where a, b,
c are nonzero whole numbers, and p1 , p2 , p3 are primes not equal to each other. The
reasoning for this special case will be perfectly general, and by limiting ourselves to
three primes, we save ourselves from some horrendous notation. If a, b, and c are
all even, let a = 2, b = 2 and c = 2 for some whole numbers , and . Then
n = (p1 p2 p3 )2 , contradicting the hypothesis that n is not a perfect square. Therefore
at least one of a, b, and c is odd, let us say, a = 2k + 1 for some whole number k.
Thus n = p2k+1 1 pb2 pc3 .
A
Suppose there is some rational number r so that r2 = n. Let r = B , where A and
B are whole numbers. Then
A2
= n = p2k+1
1 pb2 pc3 ,
B2
which implies that
A2 = B 2 p2k+1
1 pb2 pc3 .
By FTA, there are exactly the same number of p1 s on the left as on the right. We
claim that the number of p1 s on the right is odd. Indeed, if the prime decomposition
of B contains a p1 , then B 2 contains an even number of p1 s. There are of course
2k + 1 p1 s in p2k+1
1 , and there are no p1 s in either pb2 or pc3 because the three ps are

164
distinct. Therefore there are an odd number of p1 s on the right, as claimed. But on
the left, it is A2 . If there are j p1 s in the prime decomposition of A, where j is any
whole number, then there are 2j p1 s in A2 . In any case, the number of p1 s on the
left has to be even, which is a contradiction. Thus there is no such r Q.

A number, i.e., a point on the number line, is formally called a real number.
A real number is said to be irrational if it does not lie in Q. At this point, we do
not know if there is any irrational number or not because there is the possibility that
every real number is rational, or equivalently, the number line consists entirely of the
numbers in Q. While the preceding theorem says that the square root of many whole
numbers is not a rational number, it does not by itself say that these square roots can
be found among the real numbers. The fact that given any whole number n, there
is a real number t so that t2 makes sense and t2 = n will require a different kind of
discussion, one that involves limits. This will be done in Chapter 16.

Exercises 3.2

1. Without using the Fundamental Theorem of Arithmetic, give a direct, self-


contained proof of why the prime decomposition of 455 (= 5 7 13) is unique.

2. Given two positive integers a and b. Prove that if their GCD is k, then the two
positive integers ka and kb are relatively prime.

3. Let a, b, c be positive integers. Prove that if a is relatively prime to b and both a


and b divide c, then ab also divides c.

4. Define the least common multiple (LCM) of two whole numbers a and b to
be the smallest whole number m so that m is a multiple of both a and b. (a) If
a = p2 q 7 r3 and b = p6 qs4 , where p, q, r, s denote distinct primes, what are the GCD
and LCM of a and b in terms of p, q, r, and s? (b) If k is the GCD of a and b, and
m is their LCM, prove that mk = ab.

5. There are consecutive odd numbers which are primes, e.g., 5 and 7, 51 and 53, 101

165
and 103, etc. An example of three consecutive odd numbers which are primes is 3, 5,
and 7. Are there other examples of three consecutive odd numbers which are primes?

6. A whole number which is the n-th power of another whole number is called a
perfect n-th power. Prove that if a whole number k is not a perfect n-th power,
there is no rational number whose n-th power is equal to k.

166
Chapter 4: Basic Isometries and Congruence

1 The basic vocabulary (p. 168)


2 Transformations of the plane (p. 192)
3 The basic isometries (p. 204)
4 Congruence (p. 218)
5 A brief pedagogical discussion (p. 226)

167
In this chapter, we begin a preliminary study of the geometry of the plane. The
goal is to achieve at least a working knowledge of the meaning of congruence. In
middle school materials, congruence is nothing but same size and same shape. To
the extent that this is not a mathematically acceptable definition, we are forced to
conclude that most middle school students do not have a precise conception of con-
gruence. In high school geometry courses, congruence tends to become synonymous
with triangle congruence, and such a turn of events again leaves unexplained the
general concept of congruence between curved figures.
In this chapter, we shall define congruence in terms of rotations, translations, and
reflections in the plane. We will therefore begin with an explanation of these three
concepts. We need to explain congruence only because we need to explain similarity,
and the reason for the latter is that we need some knowledge of similar triangles to
make sense of the discussion of the slope of a line in beginning algebra. For this reason,
we adopt a strictly utilitarian attitude toward the proofs of geometric theorems. We
prove only those theorems that are indispensable for the understanding of straight
lines. A more systematic study of geometry must await Chapter 11.
School geometry is the analytic and symbolic study of our visual perception of the
space around us. In order to faithfully capture spatial information, we need to use very
precise language. This is why we will be spending a lot of time on definitions in this
and the next chapters. We will give new definitions of many concepts already familiar
to you, such as half-planes, angles, convex sets, rectangles, polygons, etc.
There is an inherent danger here that, because these terms are so familiar to you,
you would take the new definitions for granted and ignore them. Youd better
not, because in an overwhelming majority of the cases, the new definitions are more
precise than the ones you already know. Please pay special attention to the higher
level of precision. To give an example, you may know a rectangle as a quadrilateral
with four right angles and two pairs of equal opposite sides, but here there will be
two surprises. One, a rectangle is merely a quadrilateral with four right angles but
there is no mention about equal sides, and two, you are going to get a dire warning
that a priori we have no idea whether there are any rectangles or not. Eventually we
will prove that rectangles do exist and that, indeed, they have equal opposite sides.
Please be aware of these new elements when we begin this tour of geometry.
Having pointed out the importance of precise geometric definitions, we are obli-

168
gated to also emphasize that the goal of geometry is not to study the precise definitions
per se, but to understand the visual information encoded in the definitions. The sub-
sequent discussions will amply bear out this point.

Unlike the preceding three chapters, which present school mathematics at a level
quite close to what can be taught in the school classroom, the presentation of this
and the next chapter is at a somewhat higher level than what is found at present in
the school curriculum. The purpose of this chapter is to advocate a reform of the
school geometry curriculum, so that, very soon, it will be aligned with the materials
of this chapter. Part of the reason for the present discrepancy between the two lies
in the failure of the school geometry curriculum to meet the intrinsic requirement
of the mathematics at hand in terms of clear definitions, reasoning, and coherence.
The pedagogical comments in 5 below put this situation in some perspective. You
might wish to glance over this section first before you begin this chapter. It is also
suggested that you skim through Chapter 13 right now to get an idea of the overall
picture concerning geometry education in the schools.

1 The basic vocabulary

By a line, we mean a straight line. We assume you know what a straight line is,
but will nevertheless explicitly point out that it is understood to be infinite in both
directions. We will be looking at lines lying in a fixed plane, so we will begin with a
precise enunciation of what we assume to be known about lines in a plane.
There are six such assumptions, (L1)(L6). You will undoubtedly agree that every
single one of them is perfectly obvious, and that the only reason we enunciate them
is so that we all will have a common starting point.

(L1) Through two distinct points passes a unique line.

If L is the line passing through the points A and B, we say L joins A to B, and
if there is any fear of confusion, we will write LAB for L. It follows from (L1) that:

(L2) Two distinct lines intersect either at one point or none at all.

169
This last statement is not completely satisfactory because it needs to be supple-
mented by a more precise statement about when two lines will intersect and when
they will not. With this in mind, this is the right place to introduce one of the key
definitions of plane geometry.

Definition Two lines in the plane are said to be parallel if they do not intersect.

In symbols, if two lines L and L0 are parallel, we write L k L0 . The following


statement now completely clarifies the situation. It turns out to have profound im-
plications in geometry as well as in the development of mathematics as a whole.

(L3) (Parallel Postulate) Given a line L and a point P not on L but lying in the
same plane, there is exactly one line in the plane passing through P which is parallel
to L.

In other words, we assume as obvious that in the plane that we normally work
with, for a point P not on a line L, every line containing P intersects L except for
one line. You will see that the Parallel Postulate dominates the discussion of plane
geometry.

If A and B are points on a line L, denote by AB the collection of all the points
on L between A and B, together with the points A and B themselves. We call AB
the line segment, or more simply the segment joining A and B, and the points A
and B are called the endpoints of the segment AB. Note that it makes sense to talk
about points on L between A and B, because L may be regarded as a number line
and A and B then become numbers, e.g., A = 0 and B > 0. In that case, the points
between A and B would be all the numbers x so that 0 < x < B.

AB
A x B

The next property of a line is most likely not one you have encountered in math-
ematical discussions thus far, possibly because it is deemed too obvious to be men-

170
tioned.

(L4) (Line separation) A point P on a line L separates L into two non-empty


subsets L+ and L , called half-lines, so that

(i) The line L is the disjoint union of L+ , {P }, and L , in the sense that
every point of L is in one and only one of these three sets (in particular,
the three sets L+ , {P }, and L are disjoint).
(ii) If two points A, B belong to the same half-line, then the line segment
AB does not contain P ,

P
P A B

(iii) If two points A and B belong to different half-lines, then the line
segment AB contains P .

A P
P B

This property of the line is best understood from the perspective that every line
is a number line. Thus if we have a point P on a line L , as shown, then letting P be
0, we may let L+ be all the positive numbers (those > 0), and L be all the negative
numbers (those < 0). In this case, if A L and B L+ , then one sees that the
segment AB must contain P (= 0).

P
L
A B

Of course, if A and B are both positive or both negative numbers (so that A, B L+
or A, B L ), AB would not contain P .
The preceding identification of a line with a number line is of course the reason
for the notation L+ and L . One should not get the idea that, as a consequence of
this fact, one should always think of L as the half-line on the left and L+ as the
half-line on the right. This only works if the line is horizontal (and we have yet to

171
make formal sense of this terminology). For a line that is vertical or almost so, it
would be impossible to carry out this analogy.
The union of either half-line, L+ or L , with P is called a ray. We also say these
are rays issuing from P . If we want to specifically refer to the ray containing A, we
use the symbol RP A . Similarly, the ray containing B issuing from P is denoted by
RP B . The point P is the vertex of either ray. The two rays have only the point P
in common, and each ray is infinite in only one direction.
If a point P lies on a line L and A, B are any two points on L not equal to
P , then (L4) tells us how to determine whether two distinct points A and B are on
opposite half-lines of L relative to P : they are on opposite half-lines if and only if
AB contains P , and are in the same half-line if and only if AB does not contain P .
This observation will be used frequently below.

You may wonder why, if a point P on a line L is given, we dont just identify
L with a number line and P with 0 so that, once that is done, we wont even have
to bother mentioning something as obvious as (L4). One reason is that if each time
we see a line, we have to make clear what the identification is (i.e., which direction
is positive), it gets tedious. And if we consider several lines all at once? Then you
begin to worry about how to make the identifications in a nice consistent way,
which then becomes a nonessential distraction. So what (L4) does is to provide a
direct description of the separation of a line by a point lying in it independent of any
identification with a number line. The same idea lies behind the next property of a
line, which describes the relationship between a line and the plane containing it.

(L5) (Plane Separation) A line L divides the plane into two non-empty subsets,
L and R, called half-planes. These half-planes have the following properties:

(i) The plane is the disjoint union of L, L, and R.


(ii) If two points A and B in the plane belong to the same half-plane, then
the line segment AB does not intersect L.

172


 B


A 


L

(iii) If two points A and B in the plane belong to different half-planes,


then the line segment AB must intersect the line L.




r
A  B


L

A comment about (L5) is in order. Clearly, one would prefer a more explicit de-
scription of the half-planes of a line. After all, if a line is drawn on a piece of paper
or on a black board, one can point to the two halves of the plane separated by the
line. In a middle school classroom, this is what you should do without a doubt: just
point to the half-planes and not burden these students with abstract statements like
(L5). But as a teacher, you should learn to appreciate the difficulty of transcribing
the visual information into precise and (in this case) abstract language. Without
waving our hands about what is on the left or what is on the right, we learn
to use properties (i)(iii) above to pin down precisely what these half-planes are.33
Although they are non-intuitive, (i)(iii) nevertheless leave no doubt that each half-
plane is exactly what our intuition says it is: all the points on one side of L. More
precisely, suppose we are given two points A and B in the plane, neither lying on L
and they lie on opposite sides of L in the sense that the segment AB intersects L.
Then according to (ii), A and B lie in different half-planes. Moreover, (iii) tells us
that if two points C and D are in the same half-plane, then they do not lie on oppo-
site sides of L in the above intuitive sense. Without more information about a line,
this is all we can do about its half-planes. However, once we have coordinates and we
can describe a line by an equation, then we will be able to describe the half-planes of
33
We will use the same abstract idea once more at the end of this section for the statement of the
theorem about polygons.

173
a line explicitly (see Theorem 3 in 3 of Chapter 7).

The union of either L or R with L is called a closed half-plane.

In the normal way of drawing a plane, if the line L goes left-right, then we will
continue to refer to it informally as horizontal, and the half-planes are referred to as
the upper half-plane and the lower half-plane. If L goes up-down, then it will continue
to be referred to as a vertical line and the half-planes are the left half-plane and the
right half-plane. All this will be made more precise in 3 of Chapter 7.

The last property of lines that we assume to be known cannot be stated until we
have the (always troublesome) concept of an angle. To this end, we first introduce the
concept of convexity. A subset R in a plane is called convex if given any two points
A, B in R, the segment AB lies completely in R.34 The definition has the obvious
advantage of being simple to use, so the concern with this definition is whether or not
it captures the intuitive feeling of convexity. Through applications, you will see
that it does. Every line and the plane itself are of course convex. The half-lines and
half-planes are convex, by virtue of the properties (L4) and (L5) above, respectively
(exercise). It is also an elementary exercise to show that each closed half-plane is
convex. Many common figures, such as the interior of a triangle or a rectangle or
a circle, once they have been properly defined, will also be seen to be convex. The
following shaded subsets of the plane are, however, not convex, because, visibly, the
segment AB of the points A and B in each region does not lie entirely in the respective
shaded subset itself.
11111111111111111111111111
00000000000000000000000000
00000000000000000000000000
11111111111111111111111111
A.
00000000000000000000000000
11111111111111111111111111
00000000000000000000000000
11111111111111111111111111
00000000000000000000000000
11111111111111111111111111
00000000000000000000000000
11111111111111111111111111
00000000000000000000000000
11111111111111111111111111
00000000000000000000000000
11111111111111111111111111
00000000000000000000000000
11111111111111111111111111
00000000000000000000000000
11111111111111111111111111
B.
00000000000000000000000000
11111111111111111111111111
00000000000000000000000000
11111111111111111111111111
00000000000000000000000000
11111111111111111111111111

34
This definition is in fact valid in any dimension.

174
11111111111111111111111111111111111111
00000000000000000000000000000000000000
00000000000000000000000000000000000000
11111111111111111111111111111111111111
00000000000000000000000000000000000000
11111111111111111111111111111111111111
00000000000000000000000000000000000000
11111111111111111111111111111111111111
00000000000000000000000000000000000000
11111111111111111111111111111111111111
00000000000000000000000000000000000000
11111111111111111111111111111111111111
B. A.
00000000000000000000000000000000000000
11111111111111111111111111111111111111
00000000000000000000000000000000000000
11111111111111111111111111111111111111
00000000000000000000000000000000000000
11111111111111111111111111111111111111
00000000000000000000000000000000000000
11111111111111111111111111111111111111
The intersection of two half-planes or two closed half-planes is easily seen to be convex
(see problem 2 in Exercises 4.1 for a more comprehensive statement). This fact,
together with the concepts of rays and half-planes enter into the following definition
of the concept of an angle.
Given three points O, A, B in the plane which are not collinear, i.e., do not lie
on a line, let ROA , ROB be two rays issuing from O. These rays determine two subsets
of the plane. One of them is the intersection of the following two closed half-planes:

the closed half-plane of the line LOA containing B, and


the closed half-plane of the line LOB containing A.

By the observation above, this is a convex set, and is suggested by the shaded set in
the following figure.

A
11111111111111111111111111111111111111111
00000000000000000000000000000000000000000
00000000000000000000000000000000000000000
11111111111111111111111111111111111111111
00000000000000000000000000000000000000000
11111111111111111111111111111111111111111
00000000000000000000000000000000000000000
11111111111111111111111111111111111111111
00000000000000000000000000000000000000000
11111111111111111111111111111111111111111
O 11111111111111111111111111111111111111111
00000000000000000000000000000000000000000
00000000000000000000000000000000000000000
11111111111111111111111111111111111111111
00000000000000000000000000000000000000000
11111111111111111111111111111111111111111
00000000000000000000000000000000000000000
11111111111111111111111111111111111111111
00000000000000000000000000000000000000000
11111111111111111111111111111111111111111
00000000000000000000000000000000000000000B
11111111111111111111111111111111111111111
The other subset determined by ROA and ROB is the union of the complement35 of
the shaded set together with the two rays ROA and ROB . This is suggested by the
unshaded set in the above figure, which is visibly not convex . Then either the convex
or the nonconvex subset determined by these two rays ROA and ROB is called the
angle determined by these rays. These rays are called the sides of the angles,
and the point O is the vertex of the angle. We emphasize that, in this book, an angle
is always one of the subsets of the plane between the two rays rather than just the
35
The complement of a set S in the plane is by definition the set of all the points in the plane
not lying in S.

175
union of the two rays themselves. Unless stated otherwise, we follow the standard
practice of taking the convex subset (the shaded subset) to be the angle and denote it
by 6 AOB . If we want to consider the angle determined by the non-convex subset,
we would have to say so explicitly or use an arc to so indicate, e.g.,

O B

A better notation, one that will be used often, is to use an arc and a letter in the
region to indicate the angle. Thus 6 b denotes the nonconvex region below, while
6 c denotes the convex region.

A A

O c
O
b B B

Assuming you know what a triangle is (a precise definition will be given presently),
this way of denoting angles is especially relevant in the case of a triangle ABC. In
this case, since the angles 6 ABC, 6 ACB, and 6 BAC are understood to be convex,
each must contain triangle ABC itself, and we usually let 6 A to stand for 6 BAC,
6 B for 6 ABC, and 6 C for 6 ACB.

A
c
 cc
 c
 c
 c
 c
c
 c
B  cC

176
If O, A, B are collinear, then either A and B are in the same half-line with respect
to O, or in opposite half-lines. In the former case, we call it a zero angle. In the
latter case, either half-plane determined by the line containing these points will be
called a straight angle.

A O B
01
00000000000000000000000000000000000000000000
11111111111111111111111111111111111111111111

The last assumption on the properties of lines will now be stated:

(L6) (Crossbar Axiom) Given angle AOB, then for any point C in 6 AOB, the
ray ROC intersects the segment AB (indicated as point D in the following figure).

A 



@
O `
``` @D C
``` @
``` @
``@
```
B

You may regard the Crossbar Axiom as frivolous, because what else can the ray
ROC do? First of all, so long as you consider this statement to be obvious, then our
objective of agreeing on a common starting point has been reached. As to whether
the Crossbar Axiom is frivolous, we should point out that up to this point, none of
(L1)(L5) guarantees that the ray ROC must intersect AB. The purpose of the Cross-
bar Axiom is therefore to firm up the intuitive idea that a ray is indeed straight
and therefore must meet a segment that is in front of it. For example, if we assume
for a moment what the angle bisector of an angle is (a concept that will be defined
presently), then (L6) guarantees that the angle bisector of an angle in a triangle must
intersect the opposite side. This is reassuring.

We note that in a strictly logical development of plane (Euclidean) geometry, the


Crossbar Axiom can be deduced from the Plane Separation property (L5). See page 116
of M. J. Greenberg, Euclidean and Non-Euclidean Geometry, 4th Edition, W. H. Free-
man, 2008. However the proof is too technical to be of any educational value to most

177
prospective teachers.

The next basic concept we need is that of a polygon. Intuitively, we do not want
the following figure on the left to be called a polygon because it crosses itself,
and we do not want the following figure on the right to be a polygon either because
it doesnt close up.

S
C P
CS  PPP
C S 
C S  C
C  C
C  C
 C  C

 C

It is clear that the definition of a polygon requires some care. We first define a
special case of a polygon, the hexagon, which is by definition a geometric figure
(i.e., a subset of the plane) consisting of six points A, B, C, D, E, F in the plane
together with the six segments

AB, BC, CD, DE, EF , and F A,

so that none of them intersects each other except at the endpoints as indicated, i.e.,
CD intersects DE at D, DE intersects EF at E, etc.

C

B  AA D
PP

P
E

A
F

The six points A, B, . . . , F , are called the vertices of the hexagon and the six
segments AB, BC, . . . , F A are its sides or edges. Notice that by its very defini-
tion, a hexagon labels its vertices cyclically in the sense that its sides connect all of
them in alphabetical order until the very end, when the last vertex F is connected to
the first vertex A.

178
Now that we have defined a hexagon, we want to define a polygon of any number
of sides (or vertices, for that matter). Then we come up against a problem with
notation: for six vertices, we can employ A, . . . , F , but if we have a polygon with
234 vertices, what symbols should we employ to denote these vertices? We can use
numbers instead of letters to denote the vertices, in which case, we can go from 1, 2,
. . . all the way to 234. But because integers come up in so many contexts, sooner or
later this would lead to hopeless notational confusion. We are therefore forced into
using subscripts: we can efficiently denote the 234 vertices by the 234 symbols A1 ,
A2 , A3 , . . . , A233 , A234 . Of course we could have used any letter, say V , instead of A
for this purpose, e.g., V1 , V2 , V3 , . . . , V233 , V234 .

We can now give the general definition of a polygon and related concepts.
Let n be any positive integer 3. An n-sided polygon (or more simply an
n-gon) is by definition a geometric figure consisting of n distinct points A1 , A2 , . . . ,
An in the plane, together with the n segments A1 A2 , A2 A3 , . . . , An1 An , An A1 so
that none of these segments intersects each other except at the endpoints as indicated,
i.e., A1 A2 intersects A2 A3 at A2 , A2 A3 intersects A3 A4 at A3 , etc. In symbols: the
polygon will be denoted by A1 A2 An . If n = 3, the polygon is called a triangle;
n = 4, a quadrilateral; n = 5, a pentagon; and if n = 6, a hexagon, as we have
seen. These names came to us from Euclids Elements, and in principle there is a
name for every n-gon. For example, if n = 7 the polygon is called a heptagon and
if n = 10, a decagon. But such extra erudition is not necessary since 7-gon and
10-gon, respectively, would do just fine. So unless stated otherwise, we will use the
Greek names only for the first six polygons.
Given polygon A1 A2 An , as in the earlier case of the hexagon, the Ai s are
called the vertices and the segments A1 A2 , A2 A3 , etc. the edges or sometimes the
sides. For each Ai , both Ai1 and Ai+1 are called its adjacent vertices (except
that in the case of A1 , its adjacent vertices are An and A2 , and in the case of An ,
its adjacent vertices are A1 and An1 ). Thus the sides of a polygon are exactly the
segments joining adjacent vertices. Any line segment joining two nonadjacent vertices
is called a diagonal.
The best way to remember the notation associated with a polygon is to think of

179
the points A1 , A2 , . . . , An as being placed consecutively on a circle,36 for example, in
clockwise (or counterclockwise) direction:

A1
An A2

An1 A3

A4
.
. .
.

Then it is quite clear from this arrangement whether or not two vertices are
adjacent.
The following are examples of polygons (with the labeling of the Ai s omitted):

  
 E  E  A
E A E AP
E A E P P
E E
E E
E E

We next address the issue of measurement: how to measure the length of segments
and degrees of angles.
We begin with length. When we dealt with the number line, we could choose any
segment to be the unit segment, i.e., we could declare any length to be 1. That is
because if we only deal with one line, such a decision affects only what is done on
that line. Now that we have to deal with the plane which has many lines, the choice
of a segment of unit length on one line will have to be consistent with the choices on
other lines in order to make possible the discussion of length in the plane.
36
Note that, here, we are using the concept of a circle in an informal way. The formal definition
will be given later.

180
Assume that we can decree one choice of a unit segment, once
and for all, on all the lines, and consider this done.

Then every line can be considered to be the number line and every segment AB on
a line L now has an unambiguous length, to be denoted by |AB|. This has many
implications. The first one is that all that we have learned about the number line can
now be transferred to each line in the plane. For example, let A and B be points on
a line L, and suppose |AB| = r for some positive number r. If we consider L as the
number line, and take A to be 0 with B in the positive direction, then the segment
AB coincides exactly with the segment [0, r]. Note that this also implies there is
another point on L, to be called B 0 , so that B and B 0 are on different rays issuing
from A and |AB| = |AB 0 | . Indeed, simply take B 0 to be the point corresponding to
r when L is regarded as the number line as above.
(To anticipate a future development, the ability to look at each line as the number
line is one reason that we can set up coordinate axes in the plane.)
Another consequence of the universal adoption of a unit length for all lines in the
plane is the possibility of defining the concept of distance between any two points
A and B in the plane, denoted by dist(A, B). It is by definition the length of the
segment between A and B on the line LAB passing through A and B. Clearly:

(D1) dist(A, B) = dist(B, A).


(D2) dist(A, B) 0, and dist(A, B) = 0 A and B coincide.
(D3) If A, B, C are collinear points, and C is between A and B, then

dist(A, B) = dist(A, C) + dist(C, B)

Note that (D2) implies that if we want to prove A = B, all we have to do is to prove
that dist(A, B) = 0. This may sound trivial, but it will turn out to be useful.
There is one more fact about distance that is equally basic but perhaps less obvi-
ous. We point it out now but will not use it until we get to Chapter 11.

(D4) (Triangle inequality) For any three points A, B, C, it is always


true that dist(A, C) dist(A, B)+dist(B, C). The equality dist(A, C) =
dist(A, B) + dist(B, C) holds exactly when A, B, C are collinear and B
is between A and C.

181
We will prove (D4) in Chapter 11.

With the availability of measurements for line segments, we can now formally
introduce the concept of a circle. Fix a point O. Then the set of all points A in the
plane so that dist(O, A) is a fixed positive constant r is called the circle of radius
r (in ) about O. The point O is the center.
This is the official meaning of the concept of a circle, but in school mathematics,
the word circle is usually used in an undisciplined way. Given a circle of radius r
and center O, then school mathematics also refers to either of the following sets as
the circle of radius r and center O:

all the points A satisfying dist(A, O) r,

and,

all the points A satisfying dist(A, O) < r.

In advanced mathematics, the former is called the closed disk of radius r about
O, and the latter is called the open disk of radius r about O.37 If there is no
danger of confusion, or if it doesnt matter whether an open or a closed disk is used,
we would simply say disk. When absolute clarity is mandatory, we will make the
distinction between circle and disk in this book.

A circle whose radius is of length 1 is called a unit circle. Using a unit circle, we
now describe how to measure the size of an angle by assigning it a degree. Divide the
unit circle into 360 parts of equal length,38 360 equal parts for short. The length of
one part is called one degree. Then we can subdivide a degree into n equal parts
(where n is any whole number), thereby obtaining n1 of a degree, etc. It is exactly the
same as the division of the chosen unit on a number line into fractions, except that
in this case, we have a circular number line and, once a point has been chosen to
be 0, the number 360 coincides with 0 again. A single (connected) piece of a circle is
called an arc.
37
This is the standard way to use the words open and closed in advanced mathematics.
38
There is a subtle point in this definition which will be addressed in Chapter 18 when we discuss
length and area. It has to do with the fact that equal parts here refer to arcs of equal length,
but we have yet to define the concepts of arc and length of an arc. There is no fear of circular
reasoning, however, because an arc and the length of an arc can be defined independently.

182
Now suppose 6 AOB is given, then it intercepts an arc on the unit circle around
O (equivalently, this arc is the intersection of 6 AOB with the unit circle around O).
For the sake of notational simplicity, let us assume that both A and B are points on
the unit circle around O. In the picture below, B is in the clockwise direction from A,
but if B happens to be in the counterclockwise direction from A, then the following
discussion will have to be adjusted accordingly.

xo

O A

As in the case of the number line, we are free to choose A or B (or in fact any point
on this unit circle) as the 0 of this circular number line; let us say A for definiteness.
Because the arc intercepted by 6 AOB on this unit circle is in the clockwise direction
from A, we chose the point on the unit circle, which is 1 degree from A and in the
clockwise direction from A, to be the unit 1 of this circular number line. Then the
degrees 1, 2, 3, . . . , 359, 360 go around the unit circle in a clockwise direction until
360 comes back to 0 (which is A). Now, on this circular number line, whose whole
numbers up to 359 increase in the clockwise direction, B has a numerical value x.
Then we say 6 AOB is x degrees or that its measure or magnitude is x degrees,
and we write |6 AOB| = x , where the small circle in the superscript position
indicates that we are using degree as the unit. Thus by definition, degree is a number
0 and 360. For now, if we measure degrees in the clockwise direction (as
we just did), we would explicitly say so, but if we measure in the counterclockwise
direction we can do that too. In 3, we will adopt a more symbolic method of saying
whether we want to measure angles in the clockwise or counterclockwise direction.
(There is another unit of measurement for measuring angles, called radian, that
is more commonly used in advanced mathematics for a special reason. This will be
explained in the appendix of Chapter 15, but for now, degree is perfectly fine as a
unit.)

183
Notice that the method of angle measurement we have just described is exactly
the principle used in the construction of the protractor.
Notice also the following:

If B and B 0 are two points lying in the same half-plane of LOA and
|6 AOB| = |6 AOB 0 |, then the rays ROB and ROB 0 coincide. (This is
because these rays must intersect the unit circle around the point O at
exactly the same point.)
If C is a point in 6 AOB, then

|6 AOC| + |6 COB| = |6 AOB|

(This is because, if the ray ROC intersects the unit circle around O at a
point C 0 , as shown in the figure below, then the degrees of the arcs (on
the unit circle) from A to C 0 and from C 0 to B add up to the degree of
the arc from A to B.)

A
C C

O B

An angle is a zero angle exactly when it is 0 , and is a straight angle exactly when
it is 180 . An angle of 90 is called a right angle. An angle is acute if it is less than
90 , and is obtuse if it is greater than 90 . There are analogs of these names for
triangles, namely, a triangle is called a right triangle if one of its angles is a right
angle, an acute triangle if all of its angles are acute, and an obtuse triangle if (at
least) one of its angles is obtuse. (We will see in Chapter 11 that a triangle cannot
have more than one obtuse angle or more than one right angle.)
We observe that our convention of taking every angle to be convex unless other-
wise specified amounts to saying that, without statement to the contrary, an angle is

184
at most 180 .

Let two lines meet at O, and suppose one of the four angles, say 6 AOB as shown,
is a right angle. A

q
B0 B
O

A0

Then we claim that all the remaining angles are also right angles, i.e., |6 BOA0 | =
|6 A0 OB 0 | = |6 B 0 OA| = 90 . This is because |6 BOA0 | = |6 AOA0 | |6 AOB| =
180 90 = 90 . Similarly, the remaining two angles are also 90 . It follows that
when two lines meet and if any one of the four angles so produced is a right angle,
then it is unambiguous to say that the two lines are perpendicular. In symbols:
LAO LOB in the notation of the preceding figure, although it is equally common
to write instead, AO OB. A ray ROC in an angle39 AOB is called an angle
bisector of 6 AOB if |6 AOC| = |6 COB|. Sometimes we also say less precisely that
the line LOC (rather than the ray ROC ) bisects the angle AOB.
A




O
PP
P C
PP
P PP
P
B

It is clear that an angle has one and only one angle bisector. Therefore if CO AB
as shown below, then CO is the unique angle bisector of the straight angle 6 AOB.

A O B

39
Recall, an angle is a region.

185
We thus have:

Let L be a line and O a point on L. Then there is one and only one line
passing through O and perpendicular to L.

With the availability of measurements for both angles and line segments, we can
complete the list of standard definitions. If AB is a segment, then the point C in
AB so that |AC| = |CB| is called the midpoint of AB. Analogous to the angle
bisector, the perpendicular bisector of a segment AB is the line perpendicular to
LAB and passing through the midpoint of AB. It follows from the uniqueness of the
line perpendicular to a line passing through a given point that there is one and only
one perpendicular bisector of a segment.
We now introduce some common names for certain triangles and quadrilaterals.
An equilateral triangle is a triangle with three sides of the same length, and
an isosceles triangle is one with at least two sides of the same length. (Thus by
our definition, an equilateral triangle is isosceles.) A quadrilateral all of whose angles
are right angles is called a rectangle. A rectangle all of whose sides are of the same
length is called a square. Be aware that at this point, we do not know whether there
is a square or not, or worse, whether there is a rectangle or not. (If it is the case
that the sum of (the degrees) of the four angles of quadrilateral is 361 , then clearly
no rectangle can exist, much less a square.) A quadrilateral with at least one pair of
opposite sides that are parallel is called a trapezoid. A trapezoid with two pairs of
parallel opposite sides is called a parallelogram. A quadrilateral with four sides of
equal length is called a rhombus. We shall prove in Chapter 11, 4, that rhombi are
parallelograms.
There is a debate in school mathematics on whether one should define an isosceles
triangle to have exactly two sides of equal length, a rectangle to be a quadrilateral
with four right angles but with at least two unequal adjacent sides, or a trapezoid
to be a quadrilateral with exactly one pair of parallel sides. Because mathematicians
prefer equilateral triangles to be special cases of isosceles triangles, squares to be
special cases of rectangles, and parallelograms to be special cases of trapezoids, we
believe school mathematics should simply adopt this convention.

In the above catalog of names for polygons, we all know that equilateral triangles

186
and squares are special; they are examples of regular polygons. It turns out that
the precise definition of the latter concept is somewhat subtle and requires some
preparations. We take them up next.
We already mentioned the fact that school mathematics conflates a closed disk of
radius r and center O with the circle of the same radius and center. Thus when school
mathematics talks about the area of the circle of radius r around O, what it means
is actually the area of the closed disk of radius r around O. There is little hope of
forcing a change of terminology in school mathematics at this late date, so just grin
and bear it. Nevertheless, we want to create a mathematical framework in which the
difference between a circle and a disk of the same radius and center can be carefully
analyzed when the need arises. In everyday language, we normally refer to a circle
as the boundary of the closed disk with the same radius and center. This way of
talking about a circle and its associated disk is both unambiguous and convenient,
so there is no reason not to adopt it in mathematics provided we can make precise
sense of boundary. This we do now. Let S be a subset of the plane . A point B
is a boundary point of S if every disk40 centered at B, of positive radius, contains
a point in S and a point not in S. By definition, the boundary of S is the set of all
the boundary points of S. We leave as an exercise the verification that the circle of
radius r (r > 0) about O is the boundary of the closed disk of radius r about O as
well as the boundary of the open disk of radius r about O. It is also easy to check
that if S is a segment AB, then the boundary of AB is in fact AB itself.
Although both the open disk and the closed disk (with the same radius and cen-
ter) have a common boundary, namely, the circle with the same radius and the same
center, it is much more convenient for our purpose to use a set which contains its
boundary. A set that contains its boundary is called a closed set. Thus a closed
disk is a closed set, whereas an open disk is never a closed set because it does not
contain any of its boundary points. We will refer to the closed disk D with radius r
and center O as the closed set inside the circle C with the same center and radius;
sometimes we say more simply that D is the inside of C. Now D is not the only
closed set with C as its boundary. Indeed, let E be the outside of C, where E is
the set of all the points P so that |P O| r. Then the boundary of E is also C
(exercise). Naturally we are more interested in D rather than in E. To distinguish
40
It doesnt matter whether it is closed or open.

187
between the two sets, we introduce the following concept: a set S in the plane is said
to be bounded if it is contained in some closed disk of radius R. We can understand
boundedness from a slightly different point of view, as follows.

Lemma A set S in the plane is bounded if and only if there is a point O0 in the
plane and a positive number R so that the distance of every point in S from O0 is R.

Proof First let S be bounded. So it is contained in a closed disk A of radius R.


If the center of A is O0 , then by definition of A, the distance of each point of S from
O0 is R. Conversely, if a set S has the property that for some fixed positive number
R, every point of S is of distance R from a fixed point O0 in the plane, then S is
bounded because S is then contained in the closed disk of radius R centered at O0 .
The proof is complete.

To return to the unit circle D, we see that, in terms of this terminology, the inside
of D is bounded but the outside of D is not.
We have discussed the situation of the circle in such detail because it sets up a
model for the discussion of polygons. The ambiguity in the use of the word circle
just described in fact spills over to triangle, quadrilateral, and in general, poly-
gon. For example, a triangle is just the union of the three segments consisting of the
three sides, but when we speak of, e.g., the area of the triangle, we certainly do not
mean the area of the three segments but rather the area of the set inside those
three segments, where the meaning of inside is understood in an intuitive sense. We
now try to give clarity to this situation by drawing a parallel to the situation of a
circle. First, we have:

Theorem A polygon P divides the plane into two non-empty subsets, B and E,
so that:

(i) The plane is the disjoint union of P, B, and E.


(ii) The boundary of each of B and E is exactly P.
(iii) One of B and E is bounded and the other is not.

188
This theorem is reminiscent of the Plane Separation property (L5). From now on,
we will designate the bounded set in the Theorem by B. Then we call the union of
B and P the inside of P or the region enclosed by P and the union of E and
P the outside. By the definition of a closed set, it follows from the Theorem that
both the inside and the outside of a polygon are closed sets whose boundaries are just
P. If we think of the polygon P as a circle, then these definitions of inside and
outside coincide with those given earlier. We will not give a proof of this theorem
to avoid getting side-tracked; an essentially complete proof is given on pp. 267269 of
R. Courant and H. Robbins, What is Mathematics? Oxford University Press, 1941.41
One can easily believe this theorem from looking at a few pictures; the shaded set in
each of the following is the inside of the polygon in question.

From the Theorem, a polygon is the boundary of the inside of the polygon, which
is a closed bounded set. This motivates the following definition.

Definition A polygonal region is a closed bounded set whose boundary is a


polygon.

Thus every polygon is the boundary of a polygonal region. When school math-
ematics talks about the area of a polygon, it actually means the area of the
polygonal region inside the polygon.

41
This theorem is a very simple case of the famous Jordan Curve Theorem when the curve in
question is a polygon. One can find a relatively elementary proof of the latter in Michael Henle, A
Combinatorial Introduction to Topology, Freeman, San Francisco, 1979. Incidentally, the Courant-
Robbins volume is highly recommended as a general introduction to advanced mathematics.

189
Finally, we are in a position to define regular polygons. We say a polygon is
convex if the polygonal region enclosed by the polygon is a convex set. A regular
polygon is a convex polygon which has the property that its sides are of the same
length and its angles (at the vertices) have the same degrees. We call attention to the
subtlety of this definition: because of our standing convention that an angle is always
automatically taken to be the convex set determined by the two rays, the cross on
the right in the preceding picture of three polygons would be a regular 12-gon if we
do not require convexity in the definition of the latter. On the other hand, if one
is willing to spend the effort, one can define the concept of an interior angle of a
polygon. When that is done, then a regular polygon could be equally well defined as
any polygon whose sides all have equal lengths, and all of its interior angles have the
same degrees.
It will be seen in Chapter 11 that for a triangle (a 3-gon) to be regular, it suffices
to require either the equality of the (lengths of the) sides or the equality of the (de-
grees of the) angles. For this reason, a regular 3-gon is just an equilateral triangle
(which literally means a triangle with equal sides). Moreover, as soon as we can show
that the sum of all the angles of a convex quadrilateral is 360 , it would follow that a
regular 4-gon must be a square. A problem in Exercises 11.8 will show that regular
n-gons exist for any whole number n 3, but this is certainly not obvious.

In conclusion, a pedagogical comment has to be made. We have given the precise


definitions of the boundary of a set and a closed set because they shed light on a
murky corner of school mathematics, and because they are absolutely essential to
the considerations of area in Chapter 18. However, it would not do to make heavy
weather of these definitions in the school classroom because school students have far
more pressing things to learn. The everyday meaning of boundary is good enough
in the school setting. Therefore, this definition is for your own conceptual clarification
as a teacher: to the extent possible, this book tries to convince you at every step that
there is no room for ambiguity in mathematics.

Exercises 4.1

1. Explain clearly why the following figure with five vertices, as indicated, cannot be

190
a polygon:
r
CS
CS
C S
C Sr
Cr 

C
 C
r Cr



2. Prove that the intersection of convex sets is convex. (Caution: The number of sets
here could be infinite.)

3. (a) Suppose we have a finite or an infinite number of convex sets Ci , where i is a


whole number, and suppose that each Ci is contained in the next one, Ci+1 . Prove
that the union of these Ci s is also convex. (Caution: be very clear in your proof.)
(b) Is the union of convex sets convex in general?

4. (a) Prove that half-lines and half-planes are convex. (b) Prove that closed half-
planes are also convex.

5. Explain why, given any three non-collinear points A, B, C, the three segments
AB, BC, CA can never intersect each other except at the endpoints as indicated.
(In other words, take any three noncollinear points, then the union of the segments
joining them is always a polygon.)

6. (a) Suppose we have three distinct lines, L1 , L2 , and L3 , such that L1 k L2 and
L1 k L3 . Prove that the line L2 is parallel to the line L3 , (This problem justifies the
terminology that three lines are parallel.) (b) Let L1 k L2 and let a third line `
be distinct from L1 . Prove that if ` intersects L1 , then it must intersect L2 .

7. Imagine the hands of a clock to be idealized rays emanating from the center of the
clock. What is the angle between the hands at 8:20?42

42
Problem due to Tony Gardiner.

191
8. For a triangle, its triangular region can be precisely defined as the intersection
of its three angles. Show that the triangular region of a triangle is always convex.

9. Given a circle C and a point P on C. A line LP is said to be a tangent to C at


P if LP intersects C exactly at P , i.e., LP C = {P }. Assume that every point
of a circle has a tangent (a fact we will prove in Chapter 11), and furthermore that
the circle always lies entirely in one of the closed half-planes of each tangent. Then
prove that the circular region of a circle is always convex. (Caution: This is not an
easy problem; try to do everything according to the definitions.)

10. Prove that (a) the boundary of a closed disk of radius r centered at a point O
is the circle of radius r centered at O, (b) the boundary of an open disk of radius r
centered at a point O is the circle of radius r centered at O, (c) the boundary of the
outside of the circle of radius r centered at O is the circle itself, (d) the boundary of
a segment is the segment itself.

11. Assume that for any given subset S of the plane, if a segment joins a point in S
and a point not in S, then the segment contains a boundary point of S. (This simple
fact requires the Least Upper Bound Axiom for its proof; see Chapter 16.) Then
prove that any closed bounded set in the plane with a circle C as its boundary is the
closed disk with the same radius and center.

2 Transformations of the plane

Given two segments AB and CD, how can we find out if they have the same length
without measuring them individually? For example, suppose we have a rectangle
ABCD. Are the opposite sides of equal length?

A B

D C

192
Similarly, given two angles, how can we tell whether they are equal without actu-
ally measuring the angles individually? For example, if two lines L and L0 are parallel
and they are intersected by another line, how can we tell if the angles 6 a and 6 b as
shown have the same degrees?

L
a


b L0

These questions, while seemingly silly on a piece of paper, take on a new meaning
if the sides of the rectangle ABCD are several miles apart, or if the lines L and L0 are
also very far apart. We are therefore confronted with a real-world situation of having
to find out whether two geometric figures (two segments, two angles, or two triangles)
in different parts of the plane are the same in some sense (e.g., same length, same
degrees, etc.).

The traditional way of dealing with this problem in Euclidean geometry is to write
down a set of axioms which abstractly guarantee that the two figures in question are
the same (i.e., congruent). This is how it is usually done in the school classroom,
and the drawback of such an approach is that, in a mathematical environment where
proofs and reasoning are scarce or nonexistent, to introduce students to proofs by the
opaque formalism of axioms is to invite disaster. At the moment (as of 2009), the
teaching of geometry in high schools vacillates between teaching proofs by rote via
axioms from the beginning, or not teaching any proofs at all.43
We propose a third alternative by adopting a third approach, one that is more
direct, more tangible and makes use of three standard moves to bring one figure
on top of another in order to check if two geometric figures are congruent. Even
more importantly, we base proofs of theorems directly on these moves. In this way,
the concept of congruence ceases to be abstract and intangible; it can be realized
concretely. So the key idea is how to move things around in a plane, with the
43
A discussion of the situation can be glimpsed in the book review of H. Wu, Geometry:
Our Cultural Heritage, Notices of the American Mathematical Society, 51 (2004), 529-537. Also
http://math.berkeley.edu/ wu/)

193
understanding that the lengths of segments and degrees of angles remain unchanged
in the process. But moving things around in a plane is exactly where the concept
of a transformation comes in, so we first define transformations.

For convenience, we denote the plane by . A transformation F of is a rule


that assigns to each point P of a unique point F (P ) (read: F of P ) in .
There are two extreme examples of a transformation. The first is the constant
transformation: if X is a point in , then the transformation FX which assigns every
P of the point X is an example of a constant transformation. Thus FX (P ) = X
for every point P in . The other extreme is the identity transformation I which
assigns every P of the point P . Thus I(P ) = P for every P in .
To acquire some intuitive feelings for transformations, we need some nontrivial ex-
amples beyond the constant and identity transformations. To this end, we introduce
a new class of transformations called rotations. Rotations are among the cornerstones
of geometry in school mathematics as well as in advanced mathematics.

ROTATIONS Let O be a point in the plane and let a number be given


so that 180 180. Notice that we allow to be both 180 and 180. Then
the rotation of degrees around O (or sometimes we say with center O) is the
transformation % defined as follows: % (O) = O, and if P and P 6= O, let C be
the circle of radius |OP | centered at O; then

% (P ) is the point Q on C so that if 0, Q is obtained from P by turn-


ing degrees in the counterclockwise direction along C (in other words,
|6 QOP | = ), and if < 0, Q is obtained from P by turning || degrees
in the clockwise direction along C (in other words, |6 P OQ| = || ).

The case of > 0 is illustrated by the following figure,

O
P

194
and the case of < 0 is illustrated by the following figure.

O
P

Notice that, by choice, we have allowed the two transformations %180 and %180 to
be the same transformation. This will not cause confusion.
For an intuitive understanding of rotations, the following activity, done preferably
in class, would be helpful. It gives a tactile realization of a rotation of 32 degrees.

Activity On a piece paper, which is our model for the plane , fix a point
O, and then draw a geometric figure S. On a piece of transparency, copy all this
information exactly, in red color (say), and keep the transparency in exactly this
position. In particular, the red point O on the transparency is on top of the point
O on the paper. Now use a pointed object (e.g., the needle of a compass) to pin
the transparency to the paper at the point O. Holding the paper fixed, rotate the
transparency around O, counterclockwise by (let us say) 32 degrees and stop; this
the position of the red figure is exactly where moves S. If the figure S consists of
a single point Q, then the position of the red Q is where the rotation moves Q.
Notice that does not move O, the center of rotation. See picture; we will explain
the notations used in the picture below.
(S)

S
Q o
32
(Q) 32
o

195
Of course there is nothing special about the number 32. One can use any angle,
clockwise or counter-clockwise.

To facilitate subsequent discussions, we introduce a standard concept, that of the


image of a set by F : If S is a subset of the plane, then the image of S by F , denoted
by F (S), is the collection of all the points in which can be written as F (U ) for
some point U in S. We also say F maps S onto F (S). Thus the (S) and (Q) in
the picture above are the images of S and the point Q, respectively. Likewise, for the
constant transformation FX and identity transformation I, we have FX () = {X}
and I() = .

We will be looking at transformations that are very well-behaved, in the follow-


ing sense. A transformation T is said to be a bijection (one-one correspondence)
if it is

(T1) injective (one-to-one), i.e., if for any two distinct points P1 and
P2 of , T assigns to them distinct points T (P1 ) and T (P2 ) of , and
(T2) surjective (onto), i.e., if for every point Q of , there is a point P
of which is assigned to Q by T , i.e., Q = T (P ) for some point P .

We would like to make a pointed remark that the common terminology for these
concepts, i.e., one-one correspondence, one-to-one, and onto, are linguistically
awkward. The suggested replacement of bijective, injective, and surjective (by
Bourbaki), respectively, are more clear and more civilized.

The constant transformations are clearly neither injective nor surjective, but the
identity transformation is both and is therefore an example of a bijection. A rotation
is always a bijection. While this is intuitively obvious, let us go through the argument
carefully. Let us fix a rotation around some O of degrees, let us say, > 0:
is injective. Let P1 and P2 be two distinct points, and we must show (P1 ) 6=
(P2 ). If one of them is equal to O, say P1 = O, then (P1 ) = P1 by definition of
and, because P2 6= O, (P2 ) 6= O, also by the definition of . Therefore (P1 ) 6= P2
in this case. We may therefore assume that both P1 and P2 are not equal to O. If

196
6 |OP2 |, then there is nothing to prove because by the definition of , |OP10 | =
|OP1 | =
|OP1 | =6 |OP2 | = |OP20 |, where P10 and P20 denote (P1 ) and (P2 ), respectively.
Therefore P10 6= P20 , i.e., (P1 ) 6= (P2 ). Let us therefore assume |OP1 | = |OP2 |.
Then both P1 and P2 lie on some circle C around O and maps them to P10 and P20 ,
respectively, as shown.

P2
P
1
P1
P
2

Therefore |6 P10 OP20 | = |6 P1 OP2 | 6= 0, and P10 6= P20 . The proof of the injectivity of
is complete.
is surjective. Let a point Q be given, We must find a point Q0 so that (Q0 ) = Q.

C Q

If Q = O, just let Q0 = O. So let Q 6= O, and let C be the circle with center O and
radius OQ. Now rotate Q by degrees along C in the clockwise direction to get to a
point Q0 . By definition of , we have (Q0 ) = Q. Hence is surjective. This proves
that is bijective.

Pictorially, what a bijection T does is to re-arrange the points of the plane in


such a way that distinct points are not lumped together by T into the same point

197
(injectivity), and such that the image of the plane by T , T (), covers all of and
not just a part of it (surjectivity).

The following three examples of transformations make use of concepts from


pre-calculus, which wont be discussed in this book until Chapters 6, 8, and
15. Nevertheless, since these examples are likely to help you better understand
transformation, we put them here.

Example 1 Let coordinates be introduced in the plane, so that points in the


plane are now just a pair of numbers (x, y). Define the folding transformation
by: (x, y) = (|x|, y). Pictorially, folds the plane along the y axis onto the
right half-plane of the y-axis, because for every (x0 , y) to the right of the y=axis, i.e.,
x0 > 0, we have (x0 , y) = (x0 , y) = (x0 , y). Therefore the definition of implies
that is not injective. Furthermore, is not surjective either, because, for example,
the point (1, 0) cannot be written as (x0 , y 0 ) since (x0 , y 0 ) = (|x0 |, y 0 ), and |x0 | can
never be equal to 1 for any x0 . You can see easily that the image () is in fact
the right half-plane.

Example 2 Recall the inverse tangent function from trigonometry, arctan, which
is defined for every number and is strictly increasing, i.e., arctan(x) makes sense
for every number x, and if x < x0 , then arctan(x) < arctan(x0 ). Furthermore,
2 < arctan(x) < 2 for every number x. Here is the graph:

0.5

-8 -6 -4 -2 0 2 4 6 8

-0.5

198
Now again assume that coordinates have been introduced in . Then the follow-
ing transformation G of , defined by G(x, y) = (arctan(x), y) satisfies (T1) but
not (T2), for the following reasons. To show G is injective, we must show that, if
(x1 , y1 ) 6= (x2 , y2 ), then G(x1 , y1 ) 6= G(x2 , y2 ). Now G(x1 , y1 ) = (arctan(x1 ), y1 )
and G(x2 , y2 ) = (arctan(x2 ), y2 ). Also (x1 , y1 ) 6= (x2 , y2 ) means either x1 6= x2 or
y1 6= y2 . If x1 6= x2 , then either x1 < x2 or x1 > x2 . Correspondingly, arctan(x) be-
ing strictly increasing means arctan(x1 ) < arctan(x2 ) and arctan(x1 ) > arctan(x2 ),
respectively. In either case, we see that G(x1 , y1 ) 6= G(x2 , y2 ). If on the other
hand y1 6= y2 , then again G(x1 , y1 ) 6= G(x2 , y2 ) because they would have distinct
y-coordinates. So G is injective. On the other hand, G is not surjective, and this is
because since 2 < arctan(x) < 2 , the expression of G as G(x, y) = (arctan(x), y)
means the x-coordinates of all the points G(x, y), no matter what x and y may be,
lie in the open interval ( 2 , 2 ). This implies that the image G() of G lies in the
infinite vertical strip in the plane bounded between the vertical lines x = 2
and x = 2 . In particular, the image G() is not all of , and therefore G is not
surjective. One can also see the non-surjectivity of G directly by noting that the
point (5, 0) cannot be written as G(x0 , y 0 ) for any (x0 , y 0 ), because if it were, then
(5, 0) = G(x0 , y 0 ) = (arctan(x0 ), y 0 ), so that arctan(x0 ) = 5 and y 0 = 0. In particular,
arctan(x0 ) > 2 , which is impossible.

Example 3 Assume as before that we have coordinates in the plane. Define a


transformation H so that H(x, y) = (x3 9x + 4, y). We claim that H is surjective
but not injective. The failure of injectivity is easy: H(0, y) = H(3, y) = (4, y) no
matter what y is. To show surjectivity, given any point, (2, 3), for instance, we will
show how to find an (x0 , y0 ) so that H(x0 , y0 ) = (2, 3), i.e., (x30 9x0 +4, y0 ) = (2, 3).
Consider the cubic equation x3 9x + 4 = 2, which is the same as x3 9x + 2 = 0.
But we know that any polynomial (whose coefficients are real numbers) of odd degree
must have a real root (6 of Chapter 7). So let x0 be a real root of x3 9x + 2 = 0.
Then with this x0 and with y0 = 3, we get H(x0 , y0 ) = (2, 3). This shows H is
surjective. (It can be seen from the preceding argument that, for the purpose of sur-
jectivity, the cubic polynomial x3 9x+4 could be replaced by any cubic polynomial.)

Bijections can be looked at from a completely different angle. To this end, we

199
now introduce a few more concepts. If F and G are transformations, we say the
transformations F and G are equal, in symbols F = G, if F (Q) = G(Q) for
every point Q . The composite transformation F G (sometimes also called
the composition of F and G) is by definition the transformation which assigns a
point P in the plane to the point F (G(P )), i.e., if P 0 denotes the point G(P ), then
F G sends P 0 to F (P 0 ).

Observe that we have now introduced a new meaning to the equal sign, the
equality of two transformations. This is a break from the past because we
have only used the equal sign between two numbers or two sets up to this
point. Observe also the fact that this definition is completely unambigu-
ous, so that understanding the equality of two transformations is just a
routine part of learning mathematics that does not require a psychological
discussion of our a priori perception of the concept of equality.

For example, no matter what F is, F I = I F = F . Moreover, if FX is the


constant transformation into the point X, then no matter what the transformation G
is, FX G = FX , and G FX = FX 0 , where X 0 = G(X) so that FX 0 is the constant
transformation that assigns every point to the point G(X). As another example, the
folding transformation of Example 1 above satisfies = . (Can you explain
this?) Note also that the composite of two bijections is again a bijection. This is
simple to verify directly (see Exercises 4.2). But a more revealing example is the
composition of rotations with the same center. Thus let and 0 be two rotations,
both with the same center O, of degrees and 0 , respectively. One can easily see
that if = 30 and 0 = 75, then 0 = 0 = a rotation around O of 75 . Or
if = 30 and 0 = 45, then 0 = 0 = a rotation around O of 15 , i.e., a
clockwise rotation of 15 degrees.
The last two examples lend credence to the possibility that the composition of
transformations is commutative, in the sense that if F and G are transformations,
then it is always the case that F G = G F . It is instructive, as well as essential, to
look at a simple example to see that this is false. Let A and B be two distinct points
on a line and let A , B be rotations of 90 degrees around A, and B, respectively.
Now consider
(A B )(A), (B A )(A)

200
and
(A B )(B), (B A )(B)
We have the following picture, where the lines LCN and LBD are perpendicular to
LAE , and |AB| = |AN | = |BD| = |BE| = |AC| = |CM |.

Mq

Cq qP

q q q
A B E

Nq qD

By unraveling the definitions, we see that

(A B )(A) = A (D) = P
(B A )(A) = B (A) = D
(A B )(B) = A (B) = C
(B A )(B) = B (C) = N

where we have made implicit use of the fact that |6 CBN | = |6 P AD| = 90 . This is
easy to prove using standard facts from Euclidean geometry (see Chapter 11, 3-4),
but for the present need it suffices to verify it experimentally. In any case, we see
that
(A B )(A) 6= (B A )(A) and (A B )(B) 6= (B A )(B)

Now we come to the main point. Given a transformation F . Suppose there is a


transformation G so that both F G and GF are equal to the identity transformation
I on the plane. Then we say G is an inverse transformation of F (and of course, also
that F is an inverse transformation of G). Often, we simply say F is an inverse of
G. Again, referring to rotations, let be the rotation of degree around O, and let 0

201
be the rotation of degree around the same point O, then it can be immediately
verified by using the definition of a rotation that

0 = 0 = I

so that is an inverse transformation of 0 .


The following theorem characterizes transformations which have an inverse trans-
formation.

Theorem (i) If a transformation of the plane has an inverse transformation,


then it is a bijection. (ii) If a transformation is a bijection, then it has an inverse
transformation.

Proof In an exercise, you will, prove (ii). We can prove (i) very simply by use of
a standard argument, one that deserves to be learned. Let G be an inverse of a given
transformation F . Then F is injective because if F (P1 ) = F (P2 ) for two points P1
and P2 , then also G(F (P1 )) = G(F (P2 )) and therefore P1 = P2 because G F = I.
Thus if P1 6= P2 , then F (P1 ) 6= F (P2 ). Also, F is surjective because given Q , if
we let P = G(Q), then F (P ) = F (G(Q)) = Q, because F G = I. We are done.

In short, a transformation F being a bijection is equivalent to its having an inverse


transformation. You will also show in anexercise that the inverse of a transformation
(if it has one) is unique, i.e., if there are transformations G and G0 relative to a given
F , so that F G = I, G F = I and F G0 = I, G0 F = I, then G = G0 . From
now on we can speak of the inverse of a transformation.

The inverse of a bijection F is traditionally denoted by F 1 (F inverse). If


F and G are bijections, then the inverse of F G is G1 F 1 (note the order!)
(homework).

There is a special class of transformations that is of fundamental importance to


geometry. A transformation F of the plane is said to be an isometry if F preserves
distance in the sense that dist(F (P ), F (Q)) = dist(P, Q) for all the points P and
Q in . Equivalently, an isometry F is a transformation so that the length of the

202
segment P Q is equal to the length of the segment P 0 Q0 for any points P and Q,
where P 0 = F (P ) and Q0 = F (Q0 ). A trivial example of an isometry is the identity
transformation. A less trivial example of an isometry is a rotation. Now the above
Activity on rotations using transparencies makes this fact perfectly believable. For
this reason, no one would object to the fact that, in this book, we assume that all
rotations are isometries. This is one of our fundamental assumptions in geometry.
An isometry is injective. Indeed, suppose P 6= Q, then dist(P, Q) > 0, on
account of property (D2) of distance. Thus dist(F (P ), F (Q)) = dist(P, Q) > 0, and
therefore F (P ) 6= F (Q). Thus F assigns distinct points to distinct points. Also
observe that the composition of isometries is an isometry.
It will turn out that every isometry maps segments to segments. Moreover, every
isometry is surjective, so that in fact, an isometry is a bijection. These facts are far
from obvious, and we will prove them in Chapter 11.
Our interest in transformations lies mainly in isometries. Although the only non-
trivial examples of isometries that have been given thus far are rotations, the next
section will provide many more of such examples.

Exercises 4.2

1. (a) Prove that a bijection F of the plane must have an inverse G. (b) Prove that
if the inverse of a transformation F exists, then it must be unique.

2. (a) Prove that the composition of two isometries is an isometry. (b) Prove that the
composition of two surjections is a surjection, and the composition of two injections
is an injection. (Hence the composition of bijections is a bijection.) (c) If F , G are
bijections, then prove that the inverse of F G is G1 F 1 .

3. For each of the following assertions about transformations F and G of the plane, if
it is true, prove. If always false, prove. If sometimes true and some times false, give
examples of each kind. (a) If F G is injective, then G is injective. (b) If F G is
injective, then F is injective. (c) If F G is surjective, then G is surjective. (d) If
F G is surjective, then F is surjective.

203
4. Exhibit two rotations F and G in the plane so that F G 6= G F . Of course
these should not be the same as the A and B above the Theorem!

5. (This problem makes use of coordinates.) (a) Let F and G be transformations of


the plane defined by F (x, y) = (x, y + 1), and G(x, y) = (xy, y). Are the transfor-
mations F G and G F equal? (b) Is F G injective? Surjective? (c) Is G G
injective? Surjective?

6. (This problem makes use of coordinates.) (a) Consider the transformation G of


the plane defined by G(x, y) = (x2 , y). Is it injective? Is it surjective? (b) Consider
the transformation F of the plane defined by F (x, y) = (x, y 3 ). Is it injective? Is it
surjective? (c) With F and G as in (a) and (b), what is (F G)(x, y) for any point
(x, y)? Is the composite injective? Surjective?

7. (This problem makes use of coordinates.) Consider the transformation F of the


plane defined as follows. Let P = (x, y). If x 1, then we define F (P ) = P .
If 1 x 2, then we define F (P ) = (2 x, y). If 2 x, then we define
F (P ) = (x 2, y). Is F injective? Is it surjective? Can you roughly describe what F
does to the plane?

3 The basic isometries

In this section, we study three transformations of the plane: rotations, reflections,


and translations. We have already come across rotations in the last section.These will
be referred to as the basic isometries of the plane. We will, in the process, make
a few tentative steps toward proving theorems in geometry. Our attitude toward ge-
ometric proofs at this point is strictly utilitarian: we prove the minimum number of
theorems that are needed for the discussion of linear equations in beginning algebra.
A more systematic presentation of the proofs of the basic theorems in plane geometry
will be given in Chapter 11. Moreover, Chapter 13 will discuss the nature of proofs
in geometry from a broader perspective.

ROTATIONS We have already defined this class of transformations. It remains

204
to make explicit our assumptions about rotations:
(%1) Given any point O and any satisfying 180 180, there
is a rotation of degrees around O.
(%2) Any rotation maps a line to a line, a ray to a ray, and therefore a
segment to a segment.
(%3) Any rotation preserves length of segments (and is therefore an isom-
etry) and degrees of angles.
Note that the rotation of zero degrees around a point is just the identity trans-
formation I of the plane. If and are numbers so that 180 , 180 and also
180 + 180, then relative to the same center of rotation, the composition of
the rotations % and % can be seen to satisfy
% % = %+
It has been noted in the last section that for any such ,
% % = I and % % = I
Therefore % is the inverse transformation of % for 180 180. By the Theo-
rem in 2, we have that each rotation % is a bijection.

Having rotations at our disposal, we can now prove quite a few theorems. But
we will not indulge in doing that at this point because we will treat geometric proofs
seriously and systematically in Chapters 11 and 12. In this section, we just prove enough
to define reflections and translations. Later on, we just do enough to understand similarity.

We point out a special convention concerning the numbering of geometric the-


orems. We will henceforth number all the theorems on plane geometry consecutively
by G1, G2, G3, etc. This is because, in Chapter 11, we will bring all these theorems
together to give a coherent account of plane geometry.

Theorem G1 Let O be a point not contained in a line L, and let % be the rotation
of 180 around O. Then % maps L into a line parallel to itself, i.e., %(L) k L.

Remark: Notice that the rotation of 180 is the same as the rotation of 180 .

205
%(L)




 qO


 q L
Q P

Proof Suppose %(L) is not parallel to L. Then they intersect at a point Q.


Since Q %(L), there is a point P L so that %(P ) = Q. Since % is a rotation of
180 around O, the three points P , O, and %(P ) are collinear, i.e., P , O, and Q are
collinear. As usual, call this line LP Q . Now, not only is P on L, but Q is also on L
because Q = L %(L). Thus L and LP Q have two points P and Q in common and
therefore they coincide: L = LP Q . But O also lies on LP Q , so O lies on L, and this
directly contradicts the hypothesis that O is not contained in L. Therefore %(L) has
to be parallel to L.

Theorem G2 Two lines perpendicular to the same line are either identical or
parallel to each other.

Proof Let L1 and L2 be two lines perpendicular to a line ` at A1 and A2 ,


respectively. If A1 = A2 , then as we noted in 1 about the uniqueness of the line
passing through a given point of a line and perpendicular to that line, L1 and L2 are
identical. So suppose A1 6= A2 . We need to prove that L1 k L2 . Let % be the rotation
of 180 degrees around the midpoint M of A1 A2 . If we can show that the image of L1
by % is L2 , then we know L2 k L1 by virtue of Theorem G1.

L1 L2  %(L1 )




M
q
` 
A1  A2


To this end, note that %(L1 ) contains A2 because %(A1 ) = A2 . We are given that
L1 `. Since %(A1 ) = A2 and %(A2 ) = A1 , we see that %(`) = `. By property

206
(%3), which implies that rotations map perpendicular lines to perpendicular lines, we
have %(L1 ) `. Therefore each of %(L1 ) and L2 is a line that passes through A2
and perpendicular to `. By the same observation about the uniqueness of the line
perpendicular to a line ` at a given point of `, we see that, indeed, %(L1 ) = L2 .

Recall that we have introduced the concept of a rectangle as a quadrilateral whose


adjacent sides are all perpendicular to each other. As a result of Theorem G2, we
now have:

Corollary A rectangle is a parallelogram.

Given two lines L1 and L2 . A transversal of L1 and L2 is a line ` that inter-


sects both in distinct points. The following fact rounds off the picture of Theorem G2.

Theorem G3 A transversal of two parallel lines that is perpendicular to one of


them is also perpendicular to the other.

Proof Let L1 k L2 and let the transversal ` meet L1 and L2 at A1 and A2 ,


respectively. Assuming L1 `, we will prove that L2 `. Again we consider the
rotation % of 180 around the midpoint M of the segment A1 A2 .

L1 L2

M
q
`
A1 A2

As before, %(A1 ) = A2 so that %(L1 ) is a line passing through A2 . By Theorem G1,


we also know that %(L1 ) k L1 . Since L2 is likewise a line passing through A2 and
parallel to L1 , the Parallel Postulate implies that %(L1 ) = L2 . Now L1 `, and %
preserves degrees of angles. Therefore %(L1 ) %(`), i.e., L2 %(`). Since M `,
%(`) = `. Thus L2 `, as desired.

207
For our immediate need, the following consequence of Theorem G3 should be sin-
gled out:

Corollary Through a point P not lying on a line ` passes one and only one line
L perpendicular to `.
Proof Take any point A ` and let be the line passing through A and per-
pendicular to `. By the Parallel Postulate, there is a line L passing through P and
parallel to .

L
q
P

`
A

By Theorem G3, we have L `. To prove the uniqueness of L, suppose another line


L0 passes through P and is also perpendicular to `. By Theorem G2, since these lines
are not parallel (because they have P in common), they have to be identical. Thus
L = L0 .

REFLECTIONS We can now define reflection. Given a line L, the reflection


across L (or with respect to L) is by definition the transformation RL of , so
that:

(1) If P L, then RL (P ) = P .
(2) If P
6 L, then RL (P ) is the point Q so that L is the perpendicular
bisector of the segment P Q.

q
Q (= RL (P ))
@S
@
q @ @
@
P @
@
L

208
We hasten to show that the definition is well-defined, in the sense that the
conditions that we stipulate in the definition, i.e., (1) and (2) above, make sense.
Condition (1) is clear, so it is a matter of checking condition (2): can we get such a
unique Q? By the Corollary to Theorem G3, there is a unique line passing through P
and perpendicular to L; let us say this line intersects L at S. On the ray RP S , we take
a point Q so that Q and P lie in opposite half-planes of L and so that |P S| = |SQ|.
Then this Q satisfies (2). Suppose there is another Q0 that satisfies (2), i.e., L is the
perpendicular bisector of P Q0 . Then P Q ` and P Q0 `; therefore the lines LP Q
and LP Q0 are the same because of the uniqueness part of the Corollary to Theorem
G3. It follows that Q0 lies on LP Q , and Q = Q0 because Q and Q0 are in the same
half-plane of L, and |QS| = |Q0 S|. Thus the definition of a reflection is well-defined.
As in the case of rotations, it would be helpful to do the following activity in order
to gain an intuitive understanding of reflections.

Activity On a piece of paper, draw a line, to be called ` for the sake of discussion.
Draw some figures on the paper. Then use a piece of overhead-projector transparency
to carefully copy everything that is on the paper, using a different color, say red. In
particular, the line ` is also on the transparency. Flip over the transparency and
super-impose it on the paper, making sure that the red line ` on the transparency
matches point-for-point the line ` on the paper. Now a comparison between the fig-
ures on the paper and the corresponding red figures on the transparency gives a clear
idea of how the reflection across ` moves the points around: the red version of a figure
on the paper is the reflection of the corresponding figure across `.

A subset S of the plane is said to be symmetric with respect to a line L if


the reflection R across L maps S onto itself, i.e., R has the property that R(S) = S.
It is also common to say that the set S has a line symmetry or has bilateral
symmetry if it is symmetric with respect to some line. The following letters, for
example, have obvious line symmetry:

A, C, D, E, H, I, M, O, T, U, V, W, X, Y
Reflections enjoy a remarkable property. Fix a line L, and let R be the reflection
with respect to L. Then it is straightforward to check that R R = I, where I as

209
usual denotes the identity transformation of the plane. But this means R is its own
inverse. It follows from the Theorem of 2 that every reflection is a bijection.

As in the case of rotations, we make the following entirely plausible assumptions


about reflections:
(R1) Given any line L, there is a reflection with respect to L.
(R2) Any reflection maps a line to a line, a ray to a ray, and therefore a
segment to a segment.
(R3) Any reflection preserves lengths of segments (and is therefore an
isometry) and degrees of angles.

We give a simple application of reflections by proving: every point on the per-


pendicular bisector of a segment is equi-distant from the endpoints of the segment.
Indeed, let ` be the perpendicular bisector of BC and let A `. We have to prove
|AB| = |AC|. Let R be the reflection with respect to `. By the definition of reflection,
we see that R(B) = C and R(A) = A. Therefore (R2) implies that R(AB) = AC.
By (R3), we have |AB| = |AC|.

TRANSLATIONS The last basic isometry to be introduced is translation.


Intuitively, the translation T , along the direction from point A to point B and of
distance |AB|, does the following to an arbitrary point P in the plane: draw the line
L passing through P and parallel to LAB , then on the line L, we mark off the point
Q so that |P Q| = |AB| and so that the direction from P to Q is the same as the
direction from A to B. By definition, Q = T (P ).

L HHq Q0
H H
HH
q
AH HqH
P
H
H H
HH HH
jB HqH
Q
H H
H
H

Intuitively, all this is good and well. But in terms of precision, the difficulty with this
description lies in the fact that on L, there is a point Q0 which is also of distance |AB|

210
from P but the direction from P to Q0 is opposite to that from P to Q (see picture
above). The problem becomes one of how to say, precisely, that it is Q and not Q0
that should be defined as T (P ). The following discussion is designed to circumvent
this difficulty. The trick is to observe that ABQP is a parallelogram but ABQ0 P
is not. So we begin with a discussion of a key fact about parallelograms that will
eventually make the definition of a translation more meaningful.
To this end, it would be helpful to adopt a common abuse of language: we say two
segments are equal if their lengths are equal, and say two angles are equal if their
degrees are equal. The following will be helpful in our discussion of translations.

Theorem G4 Opposite sides of a parallelogram are equal.

Theorem G4 together with the Corollary to Theorem G2 imply that the opposite
sides of a rectangle are equal. This reconciles the usual definition in school mathe-
matics of a rectangle (a quadrilateral with four right angles and equal opposite sides)
with our definition of a rectangle (a quadrilateral with four right angles). The proof
of Theorem G4 requires the following lemma.

Lemma If F is a bijection of the plane that maps lines to lines, then for any two
distinct lines L1 and L2 , if P is the intersection of the lines L1 and L2 and Q is the
intersection of the image lines F (L1 ) and F (L2 ), then F (P ) = Q.

Proof of Lemma Since P lies on the line L1 , F (P ) lies on the line F (L1 ).
Similarly, F (P ) lies on the line F (L2 ). Therefore F (P ) lies in the intersection of the
lines F (L1 ) and F (L2 ), which by hypothesis is Q. By (L2), two lines intersect at at
most one point. Hence F (P ) = Q.

Proof of Theorem G4 Given parallelogram ABCD, we must show |AD| = |BC|


and |AB| = |CD|. It suffices to prove the former.
We have so few tools at our disposal that our first thoughts have to be: how can we
make use of Theorem G1? If we look at the picture of a parallelogram, sooner or later
the idea would surface that we should do a 180 degree rotation around the midpoint
of a diagonal, e.g., around the midpoint M of the diagonal AC. (The diagonal AC

211
was not in the original picture ABCD, but putting it there clearly helps us see the
situation better.)

A
@

D

@


@q M

B
@
C

Let % be the rotation of 180 degrees around M . Then %(C) = A so that %(LBC ) is a
line passing through A and (by Theorem G1) parallel to LBC . Since the line LAD has
exactly the same two properties by assumption, the Parallel Postulate implies that
%(LBC ) = LAD . Similarly, %(LAB ) = LCD . Thus,

the intersection of %(LBC ) and %(LAB )


= the intersection of LAD and LCD = D
On the other hand, the intersection of LBC and LAB is B. By the Lemma, we have

%(B) = D

Recall we also have %(C) = A. Therefore %(BC) = AD. Since % is an isometry (by
(%3)), we have |BC| = |AD|, as desired.

Remark It may be of interest to add a comment to the preceding assertion that,


because %(B) = D and %(C) = A, we have %(BC) = AD. This is so intuitively
obvious that, at least in the school classroom, it would be wise to leave it as is. As
a teacher, however, you should be able to give a precise proof, which runs as follows.
By assumption (%2), % maps the segment BC to a segment which joins %(B) (= D)
and %(C) (= A), which is to say, %(BC) is a segment joining D to A. But AD is
a segment joining D to A, and there is only one segment joining D to A, by (L1).
Therefore, %(BC) = AD.

Corollary to Theorem G4 The angles of a parallelogram at opposite vertices


are equal.

212
The proof is already implicit in the proof of Theorem G4, and will therefore be
left as an exercise.

We are now ready to define translation. We first extend the concept of a vector
first introduced in 3 of Chapter 2 from the number line to the plane. Given two

points A and B in , the vector AB is the segment AB together with a starting

point A, which is the first letter in AB, and an endpoint B, which is the second

letter in AB.44 In other words, a vector is just a segment together with a direction
from a designated endpoint to the other endpoint. For example, while the segments

AB and BA are the same, the vectors AB and BA are different because they have

different starting points and endpoints. With this understood, given a vector AB,

the translation along AB is the transformation TAB of defined as follows:

(1) If P LAB , then TAB (P ) is the point Q LAB so that P Q has the

same length and same direction as AB. More precisely, if we regard LAB

as the number line so that the starting point A of AB is 0 and so that the
endpoint B is a positive number (to the right of 0), then Q is the point
on LAB also to the right of P and |P Q| = |AB|.

Pq Q
q Aq Bq

(2) If P 6 LAB , then TAB (P ) is the point Q obtained as follows. Let L1


be the line passing through P and parallel to LAB . Let L2 be the line

passing through the endpoint B of AB and parallel to the line LAP , which

joins the starting point A of AB and P . The point Q is the intersection
of L1 and L2 .
44
This is the same concept as the one used in calculus in 3-space, though the notation may be
slightly different.

213
L2

P Q q



L1

A

B

A few supplementary comments would make the definition more intuitive. First,
why must L1 and L2 in (2) intersect? Suppose not, then L2 k L1 . Now B does not lie
on L1 (because L1 k LAB ). Thus through B pass two lines parallel to L1 , namely L2
and LAB . By the Parallel Postulate, L2 = LAB . In particular, A L2 and therefore
L2 intersects LAP at A. This contradicts the fact that L2 k LAP . Therefore L1 must
intersect L2 . Next, by construction, ABQP is a parallelogram. By Theorem G4,
|P Q| = |AB|. What this means is that

if TAB maps a point P not on LAB to a point Q (i.e., TAB (P ) = Q), then
the distance from P to Q is always equal the length |AB|

and furthermore, LP Q k LAB . In other words, a translation moves every point in the
plane the same distance and in the same direction.
If a point P lies in LAB , then TAB (P ) lies in LAB , because of (1) in the definition
of a translation. In particular, TAB (A) = B.


Keeping the same notation, we note that if we consider the vector BA, then
for a point P 6 LAB , the translation TBA maps the point Q to exactly P because,
according to (2), we obtain TBA (Q) as follows: it is the point of intersection of the
line passing through Q and parallel to LBA (that would be L1 again), and the line

passing through the endpoint A of BA and parallel to LBQ (that would be LAP ). This
point of intersection is of course just P . Therefore for any point P not lying on LAB ,
we have
TBA (TAB (P )) = P
By retracing the steps in (1), it is simple to see that the equality TBA (TAB (P )) = P

214
persists even when P LAB . Therefore we have

TBA TAB = I

By switching the letters A and B, we obtain

TAB TBA = I

This means that for any vector AB, the translation TAB has an inverse transformation
TBA . By the Theorem in 2, every translation is a bijection of the plane.
As before, the following hands-on activity is highly recommended as a way to
enhance ones intuitive understanding of translations.

Activity We use a piece of paper as a model for the plane. On the paper, draw

a vector AB, and also extend the segment AB to a line, denoted as usual by LAB .
Draw some figures on the paper. Then use a piece of overhead-projector transparency
to copy everything on the paper, using (let us say) a red pen. In particular, both the

vector AB and the line LAB are on the transparency. Holding the paper in place, slide
the transparency along the line LAB until the red point A on the transparency is on
top of the point B on the paper. The new positions of all the red figures on the trans-
parency then display how the translation from A to B moves the figures on the paper.

We proceed to make the same assumptions about translations as those on


rotations and reflections:

(T 1) Given any vector AB, there is a translation along AB.
(T 2) Any translation maps a line to a line, a ray to a ray, and therefore a
segment to a segment.
(T 3) Any translation preserves lengths of segments (and is therefore an
isometry) and degrees of angles.

Translations have a noteworthy property: the translation TAB maps a line L which
is not parallel to LAB to a line parallel to L itself. Suppose not. Then L intersects
TAB (L) at a point Q. Since Q TAB (L), there is a point P L so that TAB (P ) = Q.

215
But Q is also in L, so P and Q are both in L, and therefore LP Q = L. We know L is
not parallel to LAB , therefore LP Q is not parallel to LAB . This contradicts the fact
that TAB always maps a point P to another point lying on the line parallel to LAB
and passing through P . This completes the proof.

We have now finished the definitions of the basic isometries (rotations, reflections,
and translations). A lot more will be said about these basic isometries in Chapter 11.
The reason we do not pursue this discussion here is that we are merely trying to do
enough geometry to make the study of linear equations possible. This is a reflection
(no pun intended) of the reality of the middle school mathematics curriculum: to the
extent that the study of linear equations is taken up in the middle school, we must
try to do enough about similar triangles to support this study.

We take this opportunity to make a useful observation. Given two parallel lines,
we can now define the distance between them. First, let P be a point not lying on
a line `. The distance of P from the line ` is by definition the length |P Q|,
where Q is the point intersection of the line ` and the line passing through P and
perpendicular to `.

`
Q

Now suppose we have parallel lines ` and `0 , and P `0 . If P 0 is another point on


`0 , then we claim that the distance of P from ` is the same as the distance of P 0 from
`.45 Indeed, let the line passing through P 0 and perpendicular to ` intersect ` at Q0 .
By Theorem G2, LP Q k LP 0 Q0 . Therefore P QQ0 P 0 is a parallelogram. Consequently,
|P Q| = |P 0 Q0 |, by Theorem G4. This proves the claim.
45
This explains why the sleepers (cross ties) across rail tracks can afford to be all of the same
length.

216
P P0
`0

`
Q Q0

The common distance from points on one of two parallel lines to the other is called
the distance between the parallel lines.

Exercises 4.3

1. In the picture below, C denotes the lower left corner of the black figure, |6 CAB| =
45 , |AB| = |BC|, and line L makes 45 degrees with line LAB .
Let F be the counterclockwise rotation of 45 with center at the point B, let G
be the clockwise rotation of 90 with center at the point A, let H be the reflection

across the line L, and let J be the translation along AB. Furthermore, let S denote
the black figure.
Using a separate sketch for each of the following items, indicate the positions of
(a) G(S), (b) F (G(S)), (c) G(H(S)) and H(G(S)), (d) J(S), (e) J(F (S)) and
F (J(S)), (f) H(J(S)) and J(H(S)), (g) G(H(J(S))), and (h) J(H(F (S))).

C
L

A B

2. In our definition of translation, we drew a picture to show, if TAB (P ) = Q, where


the point Q is. Using exactly the same notation and same picture, show where Q0 is
if TBA (P ) = Q0 .

217
3. Prove the Corollary to Theorem G4.

4, Let L be a line in the plane, which may be taken to be the usual number line.
Denote 0 on L by A, and the number 1 on L by B. Let %1 be the counterclockwise
rotation of 45 around A, and let %2 be the clockwise rotation of 90 around B. De-
scribe as precisely as you can the line %1 %2 (L) and the line %2 %1 (L). In particular,
does %1 %2 (L) equal %2 %1 (L) ?

5. Prove that every point on the angle bisector of an angle is equidistant from both
sides of the angle.

6. If ABCD is a parallelogram, prove that 6 ADB and 6 CBD are equal. (Caution:
We have not yet proved anything about alternate interior angle, so you cannot use
that fact for this proof. Look instead at the proof of Theorem G4. You should be
quite impressed by how much information is carried by that simple proof.)

7. Prove that every translation is equal to the composition of two reflections. (b)
Prove that every rotation is also equal to the composition of two reflections. (Com-
ments: The net effect of this problem seems to be that we can forget rotations and
translations because we only need refections. This is an algebraic afterthought on
the geometry of basic isometries, and it must be said that, in advanced mathematics,
this algebraic point of view has paid immense dividend. On the other hand, this al-
gebraic fact is something to keep in mind from the point of view of geometry, but no
more than that. Geometers continue to think in terms of translations and rotations
directly.)

4 Congruence, SAS, and ASA

We begin with a key definition.

Definition A transformation of the plane is called a congruence if it is a


composition of a finite number of basic isometries.

218
Congruence is one of the main concepts in the school geometry curriculum. Here
are its most basic properties:

Theorem G5 (a) Every congruence is an isometry; it preserves lines and the


degrees of angles, and it is also a bijection. (b) The inverse of a congruence is a con-
gruence. (c) Congruences are closed under composition in the following sense:
if F and G are congruences, so is F G.

Proof It has been pointed out that every one of the basic isometries has the
following three properties: it is a bijection, it is an isometry, and it maps lines to
lines as well as preserves the degrees of angles. Because these properties persist un-
der composition, the proof of part (a) of the theorem is straightforward. To prove
part (b), i.e., the inverse of a congruence is a congruence, let a congruence be the
composition of three basic isometries F G H, then it is simple to directly verify
that if = H 1 G1 F 1 , then = I = . So is the inverse of . But the
inverse of a basic transformation is a basic transformation (the inverse of a rotation
is a rotation, the inverse of a reflection is itself, and the inverse of a translation is a
translation), so is also a congruence. A similar statement holds if is the compo-
sition of any number of basic isometries for exactly the same reason. Part (c) follows
immediately from the definition of a congruence as a composition of basic isometries.

A subset of the plane S is said to be congruent to another subset S 0 of the plane


if there is a congruence so that (S) = S 0 . In symbols: S = S 0 . Since the in-
verse of a congruence is a congruence (Theorem G5(b)), and since (S) = S 0 implies
1 (S 0 ) = S, we see that S
= S 0 implies S 0
= S. Thus we can speak unambiguously
about two sets S and S being congruent since if S
0
= S 0 , then also S 0
= S. We
leave as an exercise to show that if S1 is congruent to S2 and S2 is congruent to S3 ,
then S1 is congruent to S3 . This fact is usually expressed by saying that congruence
is transitive.

Example We will have to use coordinates for the description of the following
example of congruence. Let R be the reflection across the diagonal line defined by
x = y and let T be the translation along the vector OA, where O is the origin and

219
A = (0, 5). Consider the congruence = T R. If S is the isosceles triangle shown
below, what is (S)?

7
C
 C A
 C
 C

O
5 3

By definition, (S) = T (R(S)). So if we denote the set R(S) by K, then we first


figure out what K is, and then we consider T (K). Now R is the reflection across the
diagonal {x = y}. Since R maps the x-axis to the y-axis and the positive x-axis to the
positive y-axis, we see that K is the triangle in the fourth quadrant in the following
figure.

7
C
 C A
 C
 C

O
5 3 7
3 XXX
K
X
5 

Now T moves everything up by 5 units (O = (0, 0) and A = (0, 5)), so the bottom
vertex of K must move up 5 units and therefore lands exactly on the x-axis. The
triangle in the first quadrant of the following figure is then T (K), which is (S).

220
7
C
 C A
 C
 C
X (S)
XXX
O 

5 3 7
3 XXX
K
X
5 

It is worth pointing out that this definition of congruence applies not only to
polygons, but to any geometric figures. This, and not same size and same
shape, is the meaning of congruence. For example, the following two strange looking
figures are congruent because one can map either one onto the other, by a translation
(such as from the end of the thin protruding tip in one figure to the same point in
the other) followed by a rotation of 90 .

Congruent triangles occupy a special position in elementary geometry and has


its own special conventions. Denote a triangle ABC by 4ABC. The congruence
notation 4ABC = 4A0 B 0 C 0 will be understood to mean in addition to the
established meaning that the two sets (4ABC) and 4A0 B 0 C 0 are equal for some
congruence that also satisfies (A) = A0 , (B) = B 0 , and (C) = C 0 .
It follows from Theorem G5(a) that, if 4ABC = 4A0 B 0 C 0 , (AB) = A0 B 0 ,
(AC) = A0 C 0 , and (BC) = B 0 C 0 , and also that (6 A) = 6 A0 , (6 B) = 6 B 0 , and
(6 C) = 6 C 0 . Therefore, again by Theorem G5(a), 4ABC = 4A0 B 0 C 0 implies that

|6 A| = |6 A0 |, |6 B| = |6 B 0 |, |6 C| = |6 C 0 |,

221
and
|AB| = |A0 B 0 |, |AC| = |A0 C 0 |, |BC| = |B 0 C 0 |.
We now prove the converse:

Theorem G6 If for two triangles 4ABC and 4A0 B 0 C 0 ,

|6 A| = |6 A0 |, |6 B| = |6 B 0 |, |6 C| = |6 C 0 |,

and
|AB| = |A0 B 0 |, |AC| = |A0 C 0 |, |BC| = |B 0 C 0 |,
then 4ABC
= 4A0 B 0 C 0 .

Conceptually, Theorem G6 is the correct statement. However, in a practical sense,


this theorem is an overkill, in that it is hardly necessary to require the equalities of
all the angles and all the sides of two triangles before we can prove the triangles are
congruent. Typically, it suffices to impose three suitably chosen conditions to get
it done, and the best known of which are SAS, ASA, and SSS. The proof of SSS
requires too much of an interruption at this juncture and will therefore be postponed
to Chapter 11, but we can prove the other two here:

Theorem G7 (SAS) Given two triangles ABC and A0 B 0 C 0 so that |6 A| = |6 A0 |,


|AB| = |A0 B 0 |, and |AC| = |A0 C 0 |. Then the triangles are congruent.

Theorem G8 (ASA) Given two triangles ABC and A0 B 0 C 0 so that |AB| =


|A0 B 0 |, |6 A| = |6 A0 |, and |6 B| = |6 B 0 |. Then the triangles are congruent.

We will prove Theorem G8 and leave the proof of Theorem G7 as as exercise


because it is very similar.

Proof of Theorem G8 We will prove that if the triangles ABC and A0 B 0 C 0


satisfy |AB| = |A0 B 0 |, |6 A| = |6 A0 |, and |6 B| = |6 B 0 |, then there is a congruence
so that (4ABC) = 4A0 B 0 C 0 . To this end, we break up the proof into three steps,
going from a special case to the most general:

222
Case I. The triangles satisfy in addition, A = A0 , B = B 0 .
Case II. The triangles satisfy in addition, A = A0 .
Case III. The general case.

Case I. In this case, either C, C 0 are already in the same half-plane of LAB , or
they are in opposite half-planes of LAB . If the former, then we claim that C = C 0 , so
that in this situation, we need only let be I, the identity transformation. To prove
the claim, observe that because |6 CAB| = |6 C 0 AB| by hypothesis, the fact that C
and C 0 are in the same half-plane of LAB implies that we have the equality of rays,
RAC = RAC 0 (see the discussion of angle measurement in 1).
C 0
" C
"
"
"
"

"
"

"
""

"
A"

B

In like manner, because |6 CBA| = |6 C 0 BA| we have RBC = RBC 0 . Therefore

RAC RBC = RAC 0 RBC 0

which means of course that C = C 0 . So in this situation, 4ABC = 4A0 B 0 C 0 , or,


more formally, I(4ABC) = 4A0 B 0 C 0 .
Next, suppose A = A0 , B = B 0 but C, C 0 are in opposite half-planes of LAB . Then
let R be the reflection with respect to LAB , and let R(C 0 ) = C . Because R maps
A0 and B 0 to themselves (they are on LAB ), we have R(4A0 B 0 C 0 ) = 4A0 B 0 C . Now
compare the two triangles ABC and A0 B 0 C : we have A = A0 , B = B 0 as before,
but C and C are now in the same half-plane of LAB . Because R is a basic isometry,
the equalities of sides and angles between 4ABC and 4A0 B 0 C 0 in the assumption
continue to hold for the corresponding sides and angles in 4ABC and 4A0 B 0 C .
The first part of Case I therefore implies that 4ABC = 4A0 B 0 C , and together
with R(4A0 B 0 C 0 ) = 4A0 B 0 C , we have 4ABC = R(4A0 B 0 C 0 ). Since R R = I,
by applying R to both sides, we obtain R(4ABC) = 4A0 B 0 C 0 . Thus letting 1 be

223
either I or R, depending on whether C, C 0 lie in the same half-plane or different
half-planes of LAB , we have

1 (4ABC) = 4A0 B 0 C 0

and Case I is proved.


Case II. We now let the triangles satisfy the less restrictive condition that
A = A0 , but nothing about B and B 0 . Then there is some , 180 < 180, so
that the rotation % around the point A satisfies % (RAB 0 ) = RAB . In the following
picture, > 0, but if the points B and B 0 are interchanged, then we would have to
rotate clockwise from B 0 to B and would be negative.

"B
"
"
"
"
"
"
"
"
"
"
A" B0

Now B and % (B 0 ) (to be called B ) are two points on the ray RAB , and (because
|AB| = |A0 B 0 | and % is a basic isometry) |A0 B 0 | = |% (A0 B 0 )| = |AB |. Therefore
B = B , i.e., B = % (B 0 ). Letting C = % (C 0 ), we get % (4A0 B 0 C 0 ) = 4ABC .
Therefore, the two triangles ABC and ABC satisfy the condition of Case I. Conse-
quently, 1 (4ABC) = ABC for a basic isometry 1 . In other words, 1 (4ABC) =
% (4A0 B 0 C 0 ). Letting 2 be the inverse transformation % of % , and applying 2 to
both sides of this equation, we see that

2 (1 (4ABC)) = 4A0 B 0 C 0

Thus the theorem is also proved for Case II because 2 1 is a congruence.


Case III. Finally, we deal with the general case where no restriction is placed on
the triangles ABC and A0 B 0 C 0 . We may therefore assume that the vertices A and A0
are distinct. Let T be the translation along the vector A0 A; note that T (A0 ) = A, so
that if we define B = T (B 0 ) and C = T (C 0 ), then T (4A0 B 0 C 0 ) = 4AB C . Now
since the triangles ABC and AB C have the vertex A in common, Case II applies.
Thus for suitable basic isometries 1 and 2 , we have 2 (1 (4ABC)) = 4AB C ,

224
which is of course equivalent to 2 (1 (4ABC)) = T (4A0 B 0 C 0 ). Let 3 be the
inverse translation TAA0 of T . Then we obtain for the general case of two triangles
ABC and A0 B 0 C 0 with three pairs of equal sides and equal angles,

3 (2 (1 (4ABC))) = 4A0 B 0 C 0

Letting be the congruence 3 2 1 , we see that (4ABC) = 4A0 B 0 C 0 . This


completes the proof of Theorem G8.

We will also prove in Chapter 11 that every isometry of the plane is a congruence.
In other words, every isometry is nothing but the composition of a finite number
of basic isometries. This underscores the importance of the basic isometries: they
are the basic building blocks of all the isometries of the plane. Once we know this,
then we know that if a transformation preserves distance, it must be a congruence
and therefore it is automatically surjective and it automatically preserves lines and
degrees of angles. However, until we can prove this fact about isometries, we cannot
assume that an isometry preserves degrees of angles or that it has an inverse. So be
careful.

Exercises 4.4

1. Prove that congruence is transitive.

2. Using the notation and picture of the Example in this section, exhibit R(T (S)).
Is it equal to T (R(S))?

3. Prove that any two circles of the same radius are congruent. (Caution: This is a
slippery proof. Be very precise.)

4. Prove Theorem G7.

5. Prove that the angle bisector from a vertex of a triangle is perpendicular to the
opposite side if and only if the two sides of the triangle issuing from this vertex are
equal. (Note that by the Crossbar Axiom, there is no question that the angle bisector

225
must intersect the opposite side.)

6. Let ABCD be a parallelogram. If a diagonal is an angle bisector (e.g., BD bisects


6 ABC), then prove that all four sides of ABCD are equal.

7. Explain why two triangles with two pairs of congruent sides and one pair of con-
gruent angles need not be congruent.

5 A brief pedagogical discussion

Unlike other parts of this book, the discussion of this chapter cannot be used
in the school classroom of 2010 until there is substantial alteration in the standard
curriculum. At the moment (2010), it is impossible to write an exposition of middle
school geometry that is consistent with the typical school geometry curriculum and
also makes mathematical sense. This curriculum is in an unsettled state. Given a
choice, we are duty bound to choose the path that is consistent with the minimum
requirements of mathematics, and hope that the school curriculum will eventually
catch up.

The most obvious fact about professional development bears repeating: what one
teaches teachers is not what one can use, unchanged, to teach school students. Trans-
formations of the plane, and concepts of surjectivity and injectivity, are taxing topics
even for college students, and it would not do to subject the average middle or high
school student to a treatment with the same degree of precision as here. A teacher,
especially a middle school teacher, will have to judiciously simplify the content of this
chapter and the next in order to convey to students the main message of these two
chapters, namely, that congruence and similarity are precise mathematical concepts.
One suggestion is that, while all these concepts must be defined and discussed so that
school students have something precise to go on, one can minimize the formalism of
the discussion to a large extent.
For example, it is possible to get across the main idea of Example 1 of 2 without
referring to coordinates by the simple device of paper-folding. More importantly, the
whole discussion of transformations can be made much more accessible if one makes

226
liberal use of overhead transparencies as described in the Activities of 3. One
can even assign homework problems on such activities and ask students to report
on their findings of the effects of various transformations. With enough of such
hands-on experiences, students will build up their geometric intuition about the basic
isometries, and therefore about isometries themselves. Using transparencies in a
similar manner to illustrate the composition of transformations is also recommended.
Such hands-on activities are meant, of course, to supplement the definitions and
the accompanying mathematical discussions, but not to replace them entirely. At
the same time, the presentation of the definitions in this chapter and the next can
be streamlined. For example, we proved Theorems 1-3 in order to show that from a
point outside a given line, one can drop one and only one perpendicular to the given
line. In our exposition, this fact is needed to make the definition of a reflection well-
defined. In a middle school classroom, however, it would be pedagogically legitimate
to simply state this fact without proof, if for no other reason than the fact that
students at that age are not likely to harbor doubts about such things. One can
therefore move smoothly from the definition of a rotation directly to the definition of
a reflection without the interruption of proofs of theorems.
Pretty much the same comment applies to the definition of a translation. Again,
the exposition in 3 makes use of Theorem G4 to get an intuitive description of a
translation in terms of parallelograms, but middle school students can be convinced
of the equality of opposite sides of a parallelogram by experimental evidence alone,
such as the 180 degree rotation around the midpoint of a diagonal that interchanges
the two triangles separated by the diagonal. One should therefore be able to define a
translation directly in terms of a parallelogram without any discussion of proofs. For
these students, proofs of most geometric theorems can wait till high school, whereas
their working knowledge with congruence and similarity cannot. Algebra and graphs
of linear equations are looming in their horizon.

Experience in the actual classroom will suggest the proper balance between hands-
on activities and precise definitions in a middle school classroom or a high school
classroom. What such pedagogical considerations cannot do, however, is to lighten
your mathematical load as a prospective teacher. If you are going to teach these
concepts effectively, you must know the whole mathematical story first before you can

227
simplify or pick and choose. One cannot write a plot outline of War and Peace with-
out first reading through the thousand and more pages of the original. Likewise,
without a complete knowledge of the relevant mathematics, you will not know what
to keep and what to leave out in your lessons because you wont be able to distinguish
between what is truly essential and what is expendable. Besides, if by chance you get
a precocious youngster who wants the whole truth and nothing but the truth, then
you will have to supply the whole truth and nothing but the truth. This too is part
of your basic duty as a teacher and these chapters are designed to get you ready for
such contingencies.

228
Chapter 5: Dilation and Similarity

1 The fundamental theorem of similarity (p. 230)


2 Dilation (p. 245)
3 Similarity (p. 260)

229
This chapter introduces the other basic concept in school geometry: similarity.
Like congruence, similarity has not fared well in school mathematics. Middle school
students are taught that two sets are similar if they are the same shape but not the
same size. Then when they get to high school, they are told that similarity means
equal angles and proportional sides. In other words, suddenly similarity becomes
synonymous with the similarity of triangles. What such instructions leave behind
is a vacuum about what it means for two geometric shapes, which are not polygons
(e.g., two ellipses), to be similar. Consequently, there is a high probability that, upon
graduation from high school, students understanding of similarity consists of two
disconnected sound bites: a definition of similar triangles in terms of proportional
sides and equal angles, and a conception of same shape but not the same size for
anything other than triangles. What seems to have escaped the notice of most text-
book authors is that a correct description of similarity, one that is discussed below,
can be easily introduced in middle school through ample hands-on experiments plus
a judicious amount of reasoning. In any case, we have to get rid of same shape but
not same size in the school classroom.

A quick perusal of 5 of the last chapter may lend some perspective on the mate-
rial of this chapter.

1 The Fundamental Theorem of Similarity

The purpose of this section is to give a partial explanation of the following Fun-
damental Theorem of Similarity (FTS). At this point, we have not said what
similarity means (the definition will be given in 3), much less why this theorem
is fundamental. However, it will become all too clear that this theorem dominates
the whole discussion of similarity.

Theorem G9 (FTS) Let 4ABC be given, and let D, E be points on AB and


|AB| |AC|
AC respectively. If |AD| = |AE| , and their common value is denoted by r, then
DE k BC and
|BC|
=r
|DE|

230
A
@
 @
 @
 @
 @
D @E
 @
B  @C

First of all, a remark about terminology. The number r is generally referred to


as the scale factor. The statement above on DE k BC is a standard abuse of
notation for LDE k LBC , i.e., the line containing the segment DE is parallel to the
line containing the segment BC. We will continue to use this abuse of notation for
the rest of this book.
|AB|
In applications, it is sometimes more convenient to assume, instead of |AD| =
|AC|
|AE| , the equivalent condition
|AD| |AE|
=
|DB| |EC|
See problem 1 of Exercises 5.1.

In this section, will we give a proof of this theorem for the special case of r = 2,
i.e., for the case that D and E are midpoints of AB and AC, respectively. The
complete proof of FTS for all fractional values of r can be given now, and the only
reason for not giving it is that the rather intricate reasoning would be too much of a
distraction at this point. This proof will be left to Chapter 11. Let it be mentioned
that we will prove FTS only when r is a fraction, and then appeal to FASM for the
validity of FTS for all positive numbers r.
The proof aside, it is worthwhile to point out that the lined papers in your note-
books provide fertile ground for experimentations related to FTS, at least when the
scale factor r in FTS is a positive fraction. We take for granted that the lines of the
lined papers are equi-distant parallel lines, in the sense that the distances between
adjacent parallel lines (see the end of 3 in Chapter 4 for a definition of this concept)
are equal. Therefore the segments intercepted on a transversal by adjacent parallel
lines are all of the same length (see problem 6 at the end of this section). It follows
that if a segment has its endpoints on two of the lines on your notebook paper (such

231
as AB below), then any one of the parallel lines in between will divide the segment
into two sub-segments whose lengths can be instantly read off by counting the number
of parallel lines. To be explicit, consider the following situation:
A
\\
\
\
Pq \ qQ
\
\
Dq \qE
\
\
\
\
\
B C

If the length of the segment on AB trapped between two adjacent parallel lines is s,
then |AD| = 3s and |AB| = 5s. Likewise, if the length of the segment on AC trapped
between two adjacent parallel lines is t, then |AE| = 3t and |AC| = 5t. Therefore,
|AB| |AC| 5
= =
|AD| |AE| 3
and the FTS predicts that
|BC| 5
=
|DE| 3

Similar considerations apply to AP and AQ, so that

|AB| |AC| 5
= =
|AP | |AQ| 2
and the FTS predicts that
|BC| 5
=
|P Q| 2
It would be a satisfying experience for a student to verify these two predictions of the
FTS by directly measuring |BC|, |DE|, and |P Q|. In a school classroom, such an
experiment, when repeated for many different variations of this configuration, should
provide an excellent opportunity for students to build up their intuition about FTS.

232
In applications, there is a different but equivalent formulation of FTS that is some-
times more convenient to use:

Theorem G10 (FTS*) Let 4ABC be given, and let D AB. Suppose a line
parallel to BC and passing through D intersects AC at E. Then

|AB| |AC| |BC|


= =
|AD| |AE| |DE|

A
@
 @
 @
 @
 @
D @E
 @
B  @C

The simple proof that, because of the presence of the Parallel Postulate, Theo-
rems G10 follows from Theorem G9 and that, conversely, Theorem G9 follows from
Theorem G10 will be left as an exercise.
The proof of FTS for the special case of r = 2 requires some preparation in the
form of the following sequence of theorems.

Theorem G11 Let O be a point on a line L, and let % be the rotation of 180
around O. Then % maps each half-plane of L to its opposite half-plane.

Proof Let the half-planes of L be L+ and L . The theorem says

%(L+ ) = L and %(L ) = L+

It suffices to prove the first assertion, i.e., %(L+ ) = L . Let us first prove that
%(L+ ) L . So let P be a point in L+ , and we will prove %(P ) L . On the line
LP O joining P to O, let Q be the point on the other half-line of LP O relative to O so
that the distance from O to P is equal to the distance from O to Q. It follows that
Q = %(P ).

233
q
Q
L 





O L


 L+
P q


Now the segment P Q contains a point of L, namely, O, so P and Q are in opposite


half-planes of L (see (L4) in 1). Since P L+ , we have Q L , i.e., %(P ) L ,
as claimed. Next we need to prove that L %(L+ ). Thus given Q L , we must
show that there is a point P L+ so that %(P ) = Q. We reverse the preceding
argument: Join Q to O to obtain the line LQO , and on this line, take the point P
on the other half-line relative to O so that P , Q are equi-distant from O. Then by
definition, %(P ) = Q. Since the segment P Q contains a point of L (namely, O), P
and Q have to be in opposite half-planes. Thus P L+ , and the proof is complete.

Let L and L0 be two lines meeting at a point O. On L (resp., L0 ), let P , Q


(resp., P 0 , Q0 ) be points lying on opposite half-lines determined by O, as shown in
the following figure.
0
L XX Qq 
XXPXq 
XXb
XOXb
XX
XXXq
q 

 XX
L0  P 0 Q X


Then the angles 6 P OP 0 and 6 QOQ0 are called opposite angles.46 We have:

Theorem G12 Opposite angles are equal.

Proof We make use of the preceding figure. The proof is rather trivial: each of
the two numbers, |6 P OP 0 | and |6 QOQ0 |, when added to |6 P 0 OQ| is 180 because
6 P OQ and 6 P 0 OQ0 are both straight angles. So |6 P OP 0 | = |6 QOQ0 |. We want to

give a different proof, however, because our purpose is to demonstrate how to make
46
Most school textbooks in the U.S. call these vertical angles.

234
use of basic isometries to prove theorems. In this case, we argue as follows. Consider
the rotation % of 180 around O. Clearly %(ROP ) = ROQ and %(ROP 0 ) = ROQ0 . There-
fore %(6 P OP 0 ) = 6 QOQ0 . Since % preserves angles (by assumption (%3) of rotations),
we have |6 P OP 0 | = |6 QOQ0 |.

The next two theorems give characterizations of a parallelogram that will prove
to be useful. The first one says that parallelograms are the quadrilaterals whose di-
agonals bisect each other.

Theorem G13 Let L and L0 be two lines meeting at a point O. P , Q (resp.,


P 0 , Q0 ) are points lying on opposite half-lines of L (resp., L0 ) determined by O. Then
|P O| = |OQ| and |P 0 O| = |OQ0 | P P 0 QQ0 is a parallelogram.

L
L0
@ ""
@ "
P @ " 0
" Q
" 
@O
 @ "
 " 
 "
" @ 
 "" @ 
P0 "
 @Q
" @
"
@

Proof We will prove that if |P O| = |OQ| and |P 0 O| = |OQ0 |, then P P 0 QQ0


is a parallelogram; the converse will be left as an exercise. As usual, let % be
the rotation of 180 around O. Then % interchanges the rays of LP Q with ver-
tex at O. Thus %(ROP ) = ROQ , so that %(P ) ROQ . But % is an isometry, so
|%(OP )| = |OP |, or written differently, |O%(P )| = |OP |. By hypothesis, |OP | = |OQ|,
so |O%(P )| = |OQ|. Since both %(P ) and Q are in ROQ , we conclude that %(P ) = Q.
Similarly, %(P 0 ) = Q0 , so that %(P P 0 ) = QQ0 . By Theorem G1, P P 0 k QQ0 . In the
same way, we can prove P Q0 k P 0 Q. This proves that P P 0 QQ0 is a parallelogram, as
desired.

Theorem G14 A quadrilateral is a parallelogram it has one pair of sides


which are equal and parallel.

235
Proof The fact that a parallelogram has a pair of sides which are equal and
parallel is implied by Theorem G4. We prove the converse. Let ABCD be a quadri-
lateral so that |AD| = |BC| and AD k BC. We have to prove that ABCD is a
parallelogram. It suffices to prove that AB k CD. Let % be the rotation of 180
degrees around the midpoint M of the diagonal AC.

A
B
D

B


Bq


M B

C
B

As usual, %(A) = C by the definition of %, and we also have %(LAD ) k LAD , by


Theorem G1. Therefore %(LAD ) is a line passing through C and parallel to LAD itself.
Since LBC is also a line passing through C and parallel to LAD , the Parallel Postulate
implies that %(LAD ) = LBC . At this juncture, we only know that %(D) lies in LBC ,
but we are going to show that, in fact, %(D) = B. To this end, observe that on the
line LBC , there are four points: B, C, %(A), and %(D), but because C coincides with
%(A), there are at most three distinct point B, %(D), and C on LBC .

B C
%(A) %(D)?

We want to show that %(D) also coincides with B.


Since % is an isometry, |%(AD)| = |AD|. But |AD| = |BC| by hypothesis, so
|%(AD)| = |BC|. Since % maps a segment to a segment, this equality says % maps
AD to a segment in LBC of length |BC| joining %(A) (which is just C) to %(D).
Now BC is also a segment of length |BC| that has C as an endpoint, so to prove
%(D) = B, all we need to do is to prove that on the line LBC , both B and %(D) lie
in the same ray of LBC with vertex C. By Theorem G11, D and %(D) must lie in
opposite half-planes of LAC . Since D and B also lie in opposite half-planes of LAC ,
B and %(D) must lie in the same half-plane of LAC , and therewith, also on the same
ray RCB .47 Hence B and %(D), being two points of the same distance from C on the
47
For the reason that the intersection of the closed half-plane of LAC containing B with the line
LBC is the ray RCB .

236
same ray RCB , must coincide, i.e., %(D) = B. Coupled with the fact that %(C) = A,
we see that %(CD) = AB, and therefore, %(LCD ) = LAB . But according to Theorem
G1, %(LCD ) k LCD . Hence LAB k LCD , as desired.

Remark In the preceding proof, the fact that B and D lie in opposite half-
planes of the diagonal line LAC was taken for granted. This assumption allowed us to
conclude that B = %(D). This assumption was made without comment for a reason,
and it is time to amplify on that. First of all, note that the vertices B and D of an
arbitrary quadrilateral ABCD need not be in opposite half-planes of LAC . Here is
an example:

D
E
 E
B

 E
@ E

 @ E


 @E
A C

q %(D)

In this case, %(D) and B would be in opposite half-planes and the preceding proof
of Theorem G14 would fall apart. On the other hand, it is obvious from the picture
that a quadrilateral like that can never satisfy the condition of AD k BC, and that if
we draw a quadrilateral ABCD so that both (i) AD k BC and (ii) B, D are in the
same half-plane of LAC , then, visibly, the side AB would have to intersect the side
CD and we would have a contradiction to the fact that ABCD is a quadrilateral.
See figure:

A BZ
D
Z
B Z

Bq Z r

MB
ZZ
B

Zq
Z
%(D) B
B
C

237
In the school classroom, it is most likely that such an obvious fact as B and D
being in opposite half-planes of LAC would be taken for granted, and if so, one should
just proceed as in the preceding proof by concentrating on the main idea of using
the rotation % and ignoring this subtle point. In a beginning class on geometry, one
should sidestep, if at all possible, the reasoning for such too obvious facts as why B
and D reside in opposite half-planes of LAC . But in case this subject is broached, one
should encourage students to experiment with drawings and be convinced that, so
long as AD k BC and so long as B and D are in the same half-plane of LAC , it would
be impossible to draw a quadrilateral ABCD because AB will always intersect CD.
This may be enough to convince them to just assume the truth of such an assumption
(about B and D being in opposite half-planes of LAC ) and move on. But why not
give students a proof of this fact? One cannot answer this question without first going
through such a proof. Thus, without further ado, we now show why B and D lie in
opposite half-planes of LAC under the hypothesis of Theorem G14. It will be seen
that the Crossbar Axiom (L6) of 1 plays a crucial role.
Recall that we are assuming AD k BC. We want to prove that if B and D lie in
the same half-plane of LAC , there would be a contradiction. There are two cases to
consider: Case 1 : A and B lie in opposite half-planes of LCD . Case 2 : A and B lie in
the same half-plane of LCD . (There is no need to consider the remaining alternative
of one or more of A and B lying on LCD as it is excluded by the hypothesis that
AD k BC.)
Consider first Case 1: A and B lie in opposite half-planes of LCD . Then AB
intersects the line LCD at a point E, as shown:

B

 E
@E E
@ D E

 
 @ E

 @ E

 @E
A C

Now E does not lie in the segment CD because otherwise CD AB contains at least
E, contradicting the fact that AB and CD, not being adjacent sides of a quadrilateral,
cannot intersect. Therefore the segment CD does not intersect the line LAB . (More
formally, suppose it does, say CD LAB = {E 0 }. Now E 6= E 0 because E 0 CD by

238
definition, whereas E 6 CD. Therefore LCD and LAB meet at two distinct points E
and E 0 , contradicting (L2) of 1.) This implies that C, D lie in the same half-plane
of LAB . Thus,

D lies in the half-plane of LAB that contains C.

On the other hand, by the assumption that B and D lie in the same half-plane of
LAC , we have

D lies in the same half-plane of LAC that contains B.

Together, we have that D lies in 6 BAC. By the Crossbar Axiom, RAD intersects BC,
which implies that LAD intersects LBC , contradicting the hypothesis that LAD k LBC .
Thus we have shown that, in this case, B and D cannot lie in the same half-plane of
LAC .
Next, Case 2: A and B lie in the same half-plane of LCD .

D

 E
E
 B

E

 @ E


 @ E

 @E
A C

Again, the idea is to prove that B lies in 6 ACD and then apply the Crossbar
Axiom to deduce a contradiction. To this end, we have to show that (a) B is in the
half-plane of LAC containing D, and (b) B is in the half-plane of LCD containing A.
That (a) is true follows from the assumption that B and D are in the same half-plane
of LAC . The fact that (b) is true is because we are in Case 2. Thus B is in 6 ACD.
By the Crossbar Axiom, RCB intersects AD, which implies that LBC intersects LAD ,
contradicting the hypothesis that LBC k LAD . Therefore the assumption that B and
D lie in the same half-plane of LAC is also impossible in this case.
We have now proved that under the assumption of Theorem G14, B and D must
lie in opposite half-planes of the diagonal line LAC .

The preceding proof reveals one of the less attractive features of plane geometry,
namely, the fact that there are many technical details that are quite subtle, very tedious,

239
and also very boring. You have to see it to believe it, and now that you have seen it, you
can answer the preceding question yourself as to why not give students a proof of the fact
that B and D lie in opposite half-planes of the diagonal LAC . The simple answer is that,
for the sake of a meaningful school mathematics education in geometry, your students
(at least the beginners in geometry) should be shielded from these unattractive features
if at all possible. In this case, for example, it would be entirely legitimate to just say: To
avoid unnecessarily tedious arguments, we will assume the pictorially plausible fact that
B and D lie in opposite half-planes of the diagonal AC. At the risk of belaboring the
obvious, let it also be said that you will not be able to say this with conviction if you have
not gone through a few proofs such as the preceding one.

We are finally in a position to prove the special case of FTS when r = 2:

Theorem G15 Let 4ABC be given, and let D and E be midpoints of AB and
AC, respectively. Then DE k BC and |BC| = 2|DE|.

A
@
 @
 @
D @E F
 @ 
 @ 
 @ 
B  @C

Analysis In the above picture, the segments EF and F C have been added to the
original picture that comes with the theorem, and it is time that we say a few words
about these seemingly random add-ons. Think of a theorem as an edifice, then the
analog of proving a theorem is finding ways to build a given edifice. Of course when
one shows off an edifice, one first takes down all the scaffolding and removes all traces
of the construction process. If we are serious about building the edifice, however, we
must mentally put back the scaffolding and imagine the messy process itself so as not
to be seduced by the nice-looking finished product. In the same way, if we want to
prove the theorem, we cannot be limited by the pristine picture that comes with the
theorem. We have to put back some of the lines or circles that are integral to the

240
proof itself. So the add-ons are not random, but are things we put back into the
picture that allow us to better understand the construction process.
For the case at hand, we have to prove (among other things) |BC| = 2|DE|. This
is awkward, because we know how to prove two segments have the same length find
a congruence that carries one segment to the other but not one segment equal to
twice another. Let us therefore find a segment that has twice the length of DE and
then prove that the latter has the same length as BC. It is then natural to extend the
segment DE to F so that DF has twice the length of DE. In the above picture, we
have chosen to extend DE along the ray RDE , but the proof is essentially the same if
we extend instead along the opposite ray RED . (Exercise!) Referring specifically to
the above picture now, we wish to prove |DF | = |BC|. What do we know that could
bring this about? We remember Theorem G4, so it is at least worth a try to prove
DBCF is a parallelogram. But how to prove a quadrilateral is a parallelogram? At
this point, we remember Theorem G14, which then prompts us to try to show that
BD k CF and |BD| = |CF |. By hypothesis, |AD| = |DB|, so our focus shifts to
proving AD k CF and |AD| = |CF |. Seeing that E is the midpoint of both AC and
DF , a 180 degree rotation around E should come immediately to mind (Theorem G1
again!). At this point, the whole proof becomes natural.
One observation of the above analysis is worth mentioning. We see that the
reasoning process is built on a solid knowledge base: anyone who does not have The-
orems G1, G4, and G14 at his or her finger tips may as well forget about proving
this theorem. What we have is therefore a simple illustration of the fact that doing
mathematics requires a solid memory bank of basic facts. Dont listen to
anyone telling you that conceptual understanding, but no memorization, is all its
takes to do mathematics.

Proof On the ray RDE , we take a point F so that |DE| = |EF |. Join CF .
Consider the rotation % of 180 degrees around E. Since A and C are equi-distant
from E (hypothesis), as are D and F , we have %(CF ) = AD. Since % is an isometry,
|CF | = |AD|, but since |AD| = |DB| by hypothesis, we have |CF | = |DB|. On the
other hand, by Theorem G1 in 3 of Chapter 4, CF k AD, which is of course the
same as CF k BD. The quadrilateral DBCF therefore has a pair of sides which are
equal and parallel. By Theorem G14, DBCF is a parallelogram. Thus DF k BC,

241
which is the same as DE k BC. Furthermore, |DF | = |BC| (Theorem G4), and since
|DE| = |EF |, we have |BC| = 2|DE|. The proof is complete.

Theorem G15 has a surprising consequence: if ABCD is any quadrilateral, then


the quadrilateral obtained by joining midpoints of the adjacent sides of ABCD is
always a parallelogram. (See problem 4 in Exercises 5.1 immediately following).

Because of the importance of Theorem G15 in our work, we will give a second
proof using translations. The strategy is to first prove a special case of Theorem G10
(FTS*), and then use it to get at Theorem G15. So our first task is to prove the
following theorem.

Theorem G15* Let 4ABC be given and let D be the midpoint of AB. Suppose
a line parallel to BC passing through D intersects AC at E. Then E is the midpoint
of AC and 2|DE| = |BC|.
A
@
 @
 @
D @E
@ @
 @ @
 @ @
B  @ @C
F

Proof Let T denote the translation along the vector AD. Because |AD| = |DB|
by hypothesis, the definition of T implies that T (D) = B. From 3 of Chapter
4, we also know that T maps any line L not parallel to AD to another line parallel
to L itself. Therefore T (LDE ) is a line passing through B and parallel to DE. By
hypothesis again, we already know LBC k DE. By the Parallel Postulate, we see that
T (LDE ) = LBC . In particular, T (E) is a point F on BC, i.e.,

T (E) = F

Therefore we have T (DE) = BF . Because T is an isometry, we have also

|DE| = |BF |

242
Now consider T (LAC ). Because T (A) = D and T (E) = F , it follows that T (AE) =
DF and T (LAC ) = LDF . Using once more the fact that a translation is an isometry
that maps a line to another line parallel to itself, we have

DF k AC, and |AE| = |DF |

Since DE k BC by hypothesis, DF CE is a parallelogram and therefore,

|DE| = |F C| and |DF | = |EC|

by Theorem G4. Since |DE| = |BF |, the first equality implies 2|DE| = |BC|. Since
also |AE| = |DF |, the second equality implies E is the midpoint of AC. The proof
of Theorem G15* is complete.

Now we prove Theorem G15 again. Using the notation and picture of that the-
orem, we draw a line L through D parallel to BC. By what we have just proved,
L passes through the midpoint E of AC and therefore DE k BC. Also we already
know that 2|DE| = |BC|. We are done.

Exercises 5.1

1. Let D and E be points on sides AB and AC, respectively, of 4ABC. Prove:

|AB| |AC| |AD| |AE| |AB| |AC|


= = =
|AD| |AE| |DB| |EC| |DB| |EC|

2. (a) Finish the proof of Theorem G13 by proving that the diagonals of a parallelo-
gram bisect each other, in the sense that their point of intersection is the midpoint
of each diagonal. (Caution: Be careful.) (b) Give a proof of Theorem G15 by follow-
ing the proof in the text, but this time, extend DE along the ray RED (rather than
the opposite ray RDE ) to a point F , so that |F D| = |DE|.

3. Let D, E, F be the midpoints of sides AD, AC, and BC, respectively, of 4ABC.
Prove that the four triangles ADE, DBF , DEF , EF C are all congruent.

243
A
@
 @
 @
D @E
@ @
 @  @
 @  @
B  @ @C
F

4. If ABCD is any quadrilateral. Prove that the quadrilateral obtained by joining


midpoints of the adjacent sides of ABCD is always a parallelogram.

5. Given a line L, prove that all the points of a fixed distance k from L form two
lines each parallel to L.

6. Let L1 , L2 , and L3 be three mutually parallel lines, and let ` and `0 be two dis-
tinct transversals. If the two segments intercepted on ` by L1 , L2 , and by L2 , L3 are
of the same length, then prove that the same is true of the corresponding segments
intercepted on `0 . (There are many ways to prove this, but the most natural is to
make use of Theorem G15*.)

7. Use the idea in the proof of Theorem G15, but do not assume FTS, to prove
that if in triangle ABC, D and E are points on AB and AC respectively, so that
|AB| = 3|AD| and |AC| = 3|AE|, then DE k BC and |BC| = 3|DE|.

8. (a) Prove that FTS* (Theorem G10) implies FTS (Theorem G9). More precisely,
this means: assume everything we have proved up to and including Theorem G8 plus
Theorem G10, and prove Theorem G9. (b) Prove that FTS (Theorem G9) implies
FTS* (Theorem G10). More precisely, this means: assume everything we have proved
up to and including Theorem G9, and prove Theorem G10. (The standard mathe-
matical statement that succinctly summarizes (a) and (b) is that Theorem G9 and
Theorem G10 are equivalent.)

9. Given positive numbers a and b, prove that there exists a rectangle whose sides
have lengths a and b. (Dont skip any steps!)

244
10. Let F be the midpoint of the side BC of 4ABC. Prove that AF is the angle
bisector of 6 A if and only if AB and AC are equal.

11. Let ABCD be a parallelogram. Prove that B, D lie in opposite half-planes of


the diagonal AC, or more correctly, lie on opposite half-planes of the line LAC .

12. Let a segment AC lie in a half-plane of a given line `, and let B be the midpoint
of AC. Let LAD , LBE , and LCF be three parallel lines which meet ` at D, E, F ,
respectively. Prove that 2|BE| = |AD| + |CF |.

13. Let be a congruence and D be a dilation with center O and scale factor r.
Prove that 1 D is a dilation; be sure to state what its center of dilation is
and what its scale factor is.

2 Dilation

We have been considering isometries almost exclusively thus far. Now we have to
look seriously into an important class of transformations that are not isometries.

Definition A transformation D of the plane is a dilation with center O and


scale factor r (r > 0) if

(1) D(O) = O.
(2) If P 6= O, the point D(P ), to be denoted simply by Q, is the point on
the ray ROP so that |OQ| = r|OP |.

Thus the definition of a dilation is starkly simple: a dilation with center at O


maps each point by pushing out or pulling in the point along the ray from O to
that point, depending on whether the scale factor r is bigger than 1 or smaller than
1. It corresponds to the intuitive concept of a projection from the point O. In
particular, each ray issuing from O is mapped into itself (caution: all this says is that

245
the ray is mapped onto itself, but each point on the ray will in general be mapped
into another point on the same ray). Here is an example of how a dilation with r = 2
maps four different points (for any point P , we let the corresponding letter with a
prime, P 0 , denote the image D(P ) of P ):

c U0
cr
r
Q0
c

c
c

cU
r r

c
Q
c

c
c

s
c
r r
O P P0
r
V
r 0
V

A fundamental property of dilations, one that makes possible the simple drawings
of the dilation of rectilinear figures (i.e., figures composed of line segments), is the
following. It will be clear from this and subsequent proofs related to dilation that the
FTS and the Parallel Postulate lie at the heart of the matter.

Theorem G16 Dilations map segments to segments. More precisely, a dilation


D maps a segment P Q to the segment joining D(P ) to D(Q). Moreover, if the line
LP Q does not pass through the center of the dilation D, then the line LP Q is parallel
to the line containing D(P Q).

Proof Let D have center O and scale factor r. If LP Q passes through O, then
either P and Q lie on the same side of O or they lie on opposite sides of O. In either
case, the fact that D maps P Q to the segment in LP Q from D(P ) to D(Q) follows
immediately from the definition of a dilation. We therefore assume that LP Q does
not pass through O. Let P 0 = D(P ), Q0 = D(Q).

246
P0 U0
0
C
 Q
C


C

C

P C U


C
 Q
C

C

C

Let U be any point of P Q, and we will show that D(U ) is on P 0 Q0 . Let U 0 = D(U ).
Consider 4OP 0 U 0 . Because D maps P and U to P 0 and U 0 , respectively,
|OP 0 | |OU 0 |
= =r
|OP | |OU |
By FTS, LP U k LP 0 U 0 . Denoting LP Q by L, this says LP 0 U 0 k L. If we apply the same
reasoning to 4OP 0 Q0 , we get
|OP 0 | |OQ0 |
= =r
|OP | |OQ|
and, therewith, also LP 0 Q0 k L. Denoting LP 0 Q0 by L0 , we therefore have L0 k L. Thus
both L0 and LP 0 U 0 are lines passing through P 0 and parallel to L. By the Parallel
Postulate, LP 0 U 0 = L0 , and in particular, U 0 lies on L0 . The fact that U 0 lies in the
segment P 0 Q0 follows from the Crossbar Axiom: the latter implies that, because U
lies in 6 P OQ, the ray ROU must meet the segment P 0 Q0 at some point, which then
must be U 0 because two distinct lines meet only at one point. We pause to observe
that we have also proved in the process that L0 k L, assuming L does not contain O.
Next, we show that P 0 Q0 D(P Q). Let U 0 P 0 Q0 . Let U be the intersection of
ROU 0 and L. For exactly the same reason as before, U P Q. We now prove that
D(U ) = U 0 . If we can prove that
|OU 0 |
=r
|OU |
then by the definition of D, we would have D(U ) = U 0 . To prove this, let V be the
point on ROU so that |OV | = r|OU |. In 4OP 0 V , we have |OP 0 |/|OP | = r, so that
|OP 0 | |OV |
=
|OP | |OU |

247
By FTS, LP 0 V k LP U , or what is the same thing, LP 0 V k L. Since L0 is also a
line passing through P 0 and parallel to L, the Parallel Postulate again dictates that
L0 = LP 0 V . So V in fact lies on L0 , and as it also lies on the ray ROU , we see that
V = U 0 . Thus from |OV | = r|OU |, we conclude that |OU 0 |/|OU | = r, as desired.
The proof of Theorem G16 is complete.

There are two useful corollaries of this theorem.

Corollary 1 A dilation maps lines to lines, and rays to rays.

Proof of Corollary 1 Given a line L, we must prove that D(L) is also a line.
Let P , Q be points on L. The theorem says D(P Q) is the segment P 0 Q0 , where
P 0 = D(P ) and Q0 = D(Q). Let L0 be the line containing P 0 Q0 . Let U L, we will
prove that D(U ) L0 . We may assume that U lies outside P Q, in which case, we
may assume without loss of generality that Q P U .

P0 Q0 U0
L0
B

B


B

B

P B Q
U
L
B

B

B

B

Let U 0 = D(U ). Theorem G16 implies that D(P U ) is the segment P 0 U 0 . Since
Q P U , Q0 P 0 U 0 . Thus if L denotes the line containing P 0 U 0 , then P 0 and Q0
both belong to L0 and L . Therefore L0 = L , and therefore also U 0 L0 . We have
proved that D(L) L0 . It remains to prove that L0 D(L). Let U 0 L0 , and we
let U be the point of intersection of ROU 0 and L. If we let V = D(U ), then we prove
exactly as before that V = U 0 , so that D(L) = L0 .
The fact that D maps rays to rays is proved in the same manner.

Corollary 2 Let triangles ABC and A0 B 0 C 0 be given. If a dilation D maps the


vertices to vertices, then it also maps triangle to triangle. Precisely: if D(A) = A0 ,

248
D(B) = B 0 , and D(C) = C 0 , then D(4ABC) = 4A0 B 0 C 0 .

Proof of Corollary 2 Recall that 4ABC is the union of the segments AB, BC,
and AC. Thus the conclusion means we must prove D(AB) = A0 B 0 , D(BC) = B 0 C 0 ,
and D(AC) = A0 C 0 . But this follows immediately from Theorem G16 and the hy-
pothesis that D(A) = A0 , D(B) = B 0 , and D(C) = C 0 . Corollary 2 is proved.

Armed with Theorem G16, we see that it is very simple to draw the image of a
segment by a dilation. Indeed, to draw the image of a segment AB by a dilation D,
simply find the image points D(A) and D(B) of the endpoints and then draw the
segment joining D(A) to D(B). Here are two examples. The first has a scale factor
of 2.5.
A0 

 @
@ 

 @0
  C 
 

  
  
A 
  
@  
 C
  
   
  

O  
 

B B0

Here A0 = D(A), B 0 = D(B), and C 0 = D(C). In the next example, the scale factor
is 2.1 , the original figure S is the quadrilateral ABCD, and D(S) is the enlarged
quadrilateral to the right:

 Q

  Q D(S)

 Q

J Q

J Q
Q
Q

J 

J  


J 


D J 
A


 QQ J


J Q J
 C

J  J

J J

 J J

 J J
O 

J
B
J

249
You are encouraged to make many such drawings of dilated rectilinear figures.
It should not have escaped your attention that the dilation of a rectilinear figure
has the same shape as the original figure. But what about the dilation of curved
figures? There is no simple replacement of Theorem G16 in that case, but in practical
terms (in a sense to be made precise below), the procedure is not more complicated.
Consider for example the following curve:

Let us enlarge it by a scale factor of 1.8. Here is what we do: We pick an arbitrary
point O as the center of dilation, and then pick some judiciously chosen points on the
curve, as shown. For convenience, we shall refer to the chosen points on the original
geometric figure as data points.

r
r
r
rr
r
r r
r r r

q
O

Now we draw the rays from O to each of the data points on the curve and dilate
the latter by a scale factor of 1.8 along these rays and ignore the curve itself. We
thereby obtain a collection of points, and these will be points on the dilated curve.
It should not be difficult to discern, just from these few dilated points, the general
shape of the dilated curve.

250
,#
,#
,#
r
,#
,# 
"
 "
,#  ""
,#  " 
r # 
, " 
,# "" 
,#r  "

 !
#    !!
# r "
, "  !
 !
r
,
# " " !! 
,  ! 
r  "

r 
#  "  ! 
, ! 
# !! 
,
# "
r "" 
 r !  

r 
, !  
#r"  !!! 
# !  
r 
, 
r  r
r (r(
,
# "  !   

"r !!
, "  !   (
 ((((
r r
#
  (
,  (
#" !!  (((
r r
" ! 
,
#"
 !   r ( (((((
, 
#"!! 
  (( (
,

"
#!
! 
(( ( ((((
,"
#!

! 
((((
,(
#!"
(( (
q(


"

!

 
(

"
!
,


#

(
O

In the preceding picture, we used 11 data points to demonstrate how to carry out
the approximate dilation of the given curved figure, namely, we simply dilate these
points one by one. The dilated points then give a suggestion of what the dilated curve
would look like. If we delete the rays, we may see better the data points and their
dilated images.
r

r
r
r
r
r r
r r
r
r r
rr r r
r
r r
r r r

q
O

It is obvious that the more data points we choose on the original curve, the better

251
we would be able to approximate the dilated curve. To drive home this idea, let us
use 88 data points instead of 11 on the original curve and dilate them from O to
get the following picture. (We omit the rays issuing from O in the interest of visual
clarity.)

q
O

Next we triple the number of data points and use 264 instead of 88. The ap-
proximation of this finite collection of points to the curve itself is already remarkably
good.

q
O

252
If we use 600 points, then the images can almost pass for the real thing except
that, if we look very carefully, we can still see discrete dots rather than a smooth
piece of curve near the tail end of the longer curve.

q
O

Finally, if we use 1200 data points, then to the naked eye, these are two smooth
curves, one being the dilation of the other. For all practical purposes, this approxi-
mation to the true dilated curve is the real thing.

q
O

What we have described is a basic principle of constructing the dilated image of


any object: To dilate a given object by a scale factor of r, replace the object by

253
a finite collection of judiciously chosen points on the object, still to be called data
points, and then simply dilate these data points one by one by a scale factor of
r. By increasing the number of data points, their dilated images yield a closer and
closer approximation to the true dilated object. A few experiments with this kind of
drawing (see problems 5 and 6 of Exercises 5.2) would suffice to convey the idea that
the dilated object so obtained has the same shape as the original, but is magnified
or shrunk by a scale factor of r (depending on whether r > 1 or r < 1). This is how
we can enlarge or shrink arbitrary figures regardless of how curved they may be.
It would be very instructive in the school classroom for students to magnify or
shrink many simple curved figures by such hands-on activities. We will elaborate on
these ideas in the next section.

Incidentally, what we have described is also the basic operating principle of digital
photography: approximate any real object by a large number of data points on the
object, and then magnify or shrink these data points by dilation.

The following theorem summarizes the remaining basic properties of dilations.

Theorem G17 Let D be a dilation with center O and scale factor r. Then:
(a) D is a bijection. In fact its inverse is the dilation with the same center O but
with a scale factor 1/r.
(b) For any segment AB, |D(AB)| = r|AB|.
(c) D maps angles to angles and preserves degrees of angles.

remark Observe the delicate point that the statements of part (b) and part (c)
depend on the validity of Theorem G16 and its Corollary 1. Indeed, without knowing
that D(AB) is a segment, the notation |D(AB)| would not even make sense (because
the notation || only makes sense when is a segment or an angle), and without
knowing that D maps rays to rays, we would not know that D maps angles to angles.

Proofs of parts (a) and (b) (a) Let D0 be the dilation with center O and scale
factor 1r . From the definition of a dilation, it is easy to check that D D0 = I = D0 D.
Thus D is a bijection.

254
Part (b) has been implicitly proved in the proof of Theorem G16. Indeed, in the no-
tation above, if P 0 = D(P ) and U 0 = D(U ), then we have shown that D(P U ) = P 0 U 0 .
If we look at 4OP 0 U 0 , then FTS implies that |P 0 U 0 |/|P U | = r, i.e., |D(P U )| = r|P U |.
Since P and U are arbitrary points, (b) is proved.

For the proof of part (c), we need to first get to know something about parallel
lines and angles. First some definitions.
Let two distinct lines L1 , L2 be given. Recall that a transversal of L1 and L2 is
any line ` that meets both lines in distinct points. Suppose ` meets L1 and L2 at P1
and P2 , respectively. Let Q1 , Q2 be points on L1 and L2 , respectively, so that they
lie in opposite half-planes of `. Then 6 Q1 P1 P2 and 6 P1 P2 Q2 are said to be alternate
interior angles of the transversal ` with respect to L1 and L2 .
`E
ES
E
E ((((2((
R ( L2
Q2 ( s E
( (
(((( E P2
E
E
E
E s

P1E
E
Q1 L1
E

An angle which is the opposite angle of one of a pair of alternate interior angles
is said to be the corresponding angle of the other angle. For example, because
6 Q1 P1 P2 and 6 P1 P2 Q2 are alternate interior angles and because 6 SP2 R2 in the above

figure is the opposite angle of 6 P1 P2 Q2 , 6 SP2 R2 is then the corresponding angle of


6 Q 1 P1 P2 .

In the school classroom, we suggest that alternate interior angles be defined simply
by drawing a picture as above and pointing to 6 Q1 P1 P2 and 6 P1 P2 Q2 . The correct
definition (the one just given), using the precise concept of the half-planes of a line,
may be pointed out as a side remark to open students minds to the potential of com-
plete logical precision. It will be seen presently that we need such precision because
we want to present logically complete proofs of several theorems, including the one on

255
the angle sum of a triangle (see 5, Chapter 11). A overwhelming majority of school
students would not take kindly to the need of such precision in the proofs of theorems
(as we will do in the proof of Theorem G18 and later in Chapter 11), because they
would consider the investment of so much effort into something so visibly obvious to
be ridiculous. So some compromise in the school classroom would be advisable. The
purpose of this book is, however, to expand your mathematical horizon for teaching in
schools by supplying you with a solid foundation on all things directly related to the
K-12 classroom. Acquiring the ability to reason through such bread-and-butter issues
as alternate interior angles with precision would certainly be essential for this purpose.

The basic theorem about parallel lines and angles is the following:

Theorem G18 Alternate interior angles of a transversal with respect to a pair


of parallel lines are equal. The same is true of corresponding angles.
`
ES
E
E
Q2 EP
2 R2 L2
sE
E
Eq
EM
E s
E L1
P1E Q1
E

Proof We continue to use the above notation. Let M be the midpoint of P1 P2 ,


and let % be the rotation of 180 degrees around M . Because %(P1 ) = P2 , %(L1 ) is
a line passing through P2 . By Theorem G1, %(L1 ) k L1 , and the hypothesis says
L2 is also a line passing through P2 and parallel to L1 , the Parallel Postulate says
%(L1 ) = L2 . By hypothesis, the rays RP1 Q1 and RP2 Q2 lie in opposite half-planes of
`, and by Theorem G11, % maps each half-plane of ` to its opposite half-plane. Thus
%(RP1 Q1 ) = RP2 Q2 . Of course, %(RP1 P2 ) = RP2 P1 . Hence, %(6 Q1 P1 P2 ) = 6 P1 P2 Q2 ,
and since % preserves degrees of angles, |6 Q1 P1 P2 | = |6 P1 P2 Q2 |. The proof is the
same for the other pair of alternate interior angles. The last assertion of the theorem
about corresponding angles follows from Theorem G12. Theorem G18 is proved.

256
Remark The readers who are familiar with some high school geometry would be
very tempted at this point to immediately use Theorem G18 to prove the well-known
fact that the sum of (the degrees of) angles in a triangle is 180 . The argument goes as
follows. Given 4ABC, draw a line DE through A that is parallel to BC, as shown:

A
D s E
J?
J
J
s
J
?J
B C

By Theorem G18, |6 DAB| = |6 B| and |6 CAE| = |6 C|, so that

|6 B| + |6 BAC| + |6 C| = |6 DAB| + |6 BAC| + |6 CAE| = 180

This would seem to finish the proof. Let us affirm that this intuitive argument is
indeed how a high school student should remember why the angle sum of a triangle is
180. For a teacher to really come to grips with the delicate points about Euclidean
geometry, however, it is necessary to point out that for Theorem G18 to be applicable,
we must first prove that 6 B and 6 DAB are alternate interior angles, as are 6 C and
6 CAE. See the italicized remarks above Theorem G18. In 5 of Chapter 11, we will

present a proof of the angle sum theorem with such details filled in.

We can finally finish the proof of part (c) of Theorem G17. Since D maps
rays to rays, it maps angles to angles. Given 6 P QR, let D(RQP ) = RQ0 P 0 and
D(RQR ) = RQ0 R0 , so that D(6 P QR) = 6 P 0 Q0 R0 . We have to prove that

|6 P QR| = |6 P 0 Q0 R0 |

A

P 
 
 
 t  t
 
Q R
 B

0 
P


 t

Q0 R0

257
Without loss of generality, we may assume that both have positive degree. We
claim that LQ0 P 0 must intersect LQR . If not, then LQ0 P 0 k LQR . But we already know
from part (b) that LQ0 R0 k LQR . Thus we have two distinct lines LQ0 P 0 and LQ0 R0
passing through Q0 and parallel to LQR , and this contradicts the Parallel Postulate.
Thus LQ0 P 0 intersects LQR . In the interest of notational economy, let the point of
intersection continue to be denoted by R, as shown. By Theorem G16, LQR k LQ0 R0
and LQP k LQ0 P 0 . Therefore, according to Theorem G18 about corresponding angles,
(notation as in the preceding figure) |6 P QR| = |6 ARB| = |6 P 0 Q0 R0 |, as desired.

The following converse of Theorem G18 will also be useful; the proof is sufficiently
straightforward to be left as an exercise.

Theorem G19 If the alternate interior angles of a transversal with respect to


a pair of distinct lines are equal, then the lines are parallel. The same is true of
corresponding angles.

To conclude this discussion of dilation, it would be pleasant to report that a


composite of two dilations (with respective centers) is also a dilation (with perhaps
some other center), but unfortunately such is not the case. An example will be given
in an exercise below.

Exercises 5.2

1. Prove Theorem G19.

2. (a) Prove: The dilation of a convex set is a convex set. (b) Prove: The dilation
of a polygon is a polygon. (c) Prove: The dilation of a regular polygon is a regular
polygon.

3. Let ABCD and A0 B 0 C 0 D0 be two quadrilaterals. Suppose there is a point K so


that the rays RKA , RKB , RKC , RKD contain A0 , B 0 , C 0 , D0 , respectively. Assume

258
also
|KA| |KB| |KC| |KD|
0
= 0
= 0
=
|KA |KB |KC |KD0 |
Prove that if ABCD is a square, then so is A0 B 0 C 0 D0 . (Caution: Be careful what
you say and how you say it.)

4. Let D and E be the midpoints of AB and AC, respectively, of 4ABC, and let
K be the midpoint of DE (see picture below). Let D be the dilation with center A
and scale factor 21 . (a) If % is the rotation of 180 around K, describe precisely the
figure D(%(4ABC)). (b) If T is the translation along AD, describe precisely the
figure T (D(4ABC)). (c) How are the figures in (a) and (b) related?

A
@
 @
 @
D r @E
 K @
 @
 @
B  @C

5. Let O be a point not on a given circle C with center K. Let D be the dilation
with center O and scale factor r. Prove that the image D(C) is a circle, and that the
center of D(C) is the image under D of the center of C. (Caution: This is a slippery
proof. Follow the precise definitions of a circle and a dilation.)

6. Given a point O and the following curve in the plane:

Or

Trace both onto a piece of paper, and choose enough points on the curve so that,
by dilating these points with center O and scale factor 2, the dilated points give a

259
reasonable picture of the dilated curve with scale factor 2.

7. Let P and Q be two distinct points in the plane and let DP , DQ be two dilations
with center at P , Q respectively, and with scale factor 21 and 2, respectively. Prove
that DP DQ is not a dilation. (Hint: Suppose DP DQ is equal to a dilation DX
with center X, then DP DQ maps X to X. On the other hand, there is no point Y
so that (DP DQ )(Y ) = Y .)

8. Let P and Q be two distinct points in the plane and let DP , DQ be two dilations
with center at P , Q respectively, and with scale factor r and s, respectively. If rs 6= 1,
prove that there is a point X so that (DP DQ )(X) = X. (In fact, in this case, the
composition DP DQ is a dilation with center X, but the proof requires some tools
that we have not yet developed.)

9. [This exercise refers to a coordinate system.] (a) Let D be the dilation with center
O and scale factor 2, and let be the congruence which is the reflection across the
horizontal line corresponding to {y = 1}. Are D and D equal? (b) Repeat
part (a) if is now the congruence which is the rotation of 90 degrees around the
point (1, 0). (c) Repeat part (a) if is now the congruence which is the translation
that sends a point (x, y) to (x + 2, y).

3 Similarity

The goals of this section are to introduce a correct definition of similarity, prove
two basic criteria for similarity, and as an application, prove the most famous theorem
in elementary mathematics: the Pythagorean Theorem.

Let S and S 0 be two sets in the plane. How to say correctly that they are simi-
lar? First and foremost, the phrase having the same shape has no precise meaning
and therefore cannot be used as a definition of similarity, contrary to what school
textbooks tell you. Moreover, the only precise definition of similar figures offered
in the school curriculum seems to be that of similar polygons, and the problem
with that is that it leaves out geometric figures that are not rectilinear. We want a

260
definition of similar figures that applies to all situations. Now in 2, we saw that if
one figure is a dilation of the other, then they do appear to have the same shape.
Why not just say a figure is similar to another if one is the dilation of the other? To
answer this question, consider the following figures:

S0
S

One can convince oneself that S 0 is obtained from S by a dilation of scale factor 21 .
Now rotate S 0 clockwise by 90 degrees around the center of the circle in S 0 to obtain
S , as shown:

S
S

Now S is of course congruent to S 0 and therefore must have the same shape as
S, but can S be a dilation of S? Not according to Theorem G16 because if it were,
then the horizontal segment of S would have to be parallel to the vertical segment
of S.
What this simple example shows is that it is too restrictive to define similarity
in terms of dilations alone. One must allow for a composition with a congruence as
well, i.e., a dilation of S by a scale factor of 12 , followed by a clockwise rotation of 90
degree yields the figure S which still has the same shape as S. With this in mind,
we now give the formal definition of similarity.

261
Definition We say S is similar to S 0 , in symbols, S S 0 , if there is a dilation
D so that
D(S) = S0

More precisely, S S 0 means there is a congruence and a dilation D so that


(D(S)) = S 0 , i.e., D maps S to S 0 . A composition D of a congruence and
a dilation D is called a similarity. The scale factor of the similarity D is by
definition the scale factor of the dilation D.
The fact that we define a similarity as a composite D, where is a congruence
and D is a similarity, is a matter of convention: we could have equally well
defined a similarity by composing D and in the reverse order, i.e., D . But
of course, once so defined, one must be consistent throughout. One can prove
that the two definitions are equivalent, in the sense that for any two sets S and
S 0 , (D(S)) = S 0 for some congruence and dilation D if and only if there
is a congruence 0 so that D(0 (S)) = S 0 ; see the Lemma at the end of this
section.
This situation is somewhat reminiscent of the definition of the multiplication
of whole numbers, e.g., 3 5 can be defined either as 5 + 5 + 5, or 3 +
3 + 3 + 3 + 3 + 3, but whatever definition is used, once it is fixed, we must
be consistent in following through with the definition and should not change it
without explicitly invoking the commutativity of multiplication.

Remarks
(i) We call attention to the fact that in the definition of similarity, a congruence
(and not just an isometry) is used. Although every congruence is an isometry, at
this stage, we still do not know whether an isometry is a congruence or not. So the
advantage of a congruence over an isometry, at this stage, is that a congruence has
an inverse and it also preserves the degrees of angles.
(ii) Observe that the definition of similarity given above is not obviously symmetric
with respect to the two sets S and S 0 , in the sense that if S S 0 (so that (D(S)) = S 0
for a dilation D and a congruence ), it is not clear that also S 0 S (so that
0 (D0 (S 0 )) = S for a dilation D0 and a congruence 0 ). We will prove that such is
indeed the case. Thus one can speak unambiguously of two subsets being similar.
Since this proof is somewhat off the main line of reasoning, we will postpone this
discussion until the end of this section.
(iii) By the very definition of similarity, the concept of congruence must precede
the concept of similarity.

262
(iv) We note explicitly that, although most of our attention will be lavished on
triangles, this definition of similarity gives us a precise conception of what it means
when we say one object (regardless of its shape) is similar to another. For example,
it follows directly from the definition that all circles are similar to each other (see
problem 3 in Exercises 5.3).
(v) This concept of similarity applies not only to any geometric figure in the plane,
but also to figures in higher dimensions provided we extend the definitions of rota-
tions, reflections, and translations to higher dimensions.

As in the case of congruence, the notation with the similarity of triangles, by


tradition, is made to carry more information. We say 4ABC 4A0 B 0 C 0 if there
is a similarity F so that

F (A) = A0 , F (B) = B 0 , F (C) = C 0

In other words, 4ABC 4A0 B 0 C 0 means not only that there is a similarity F so
that the sets F (4ABC) and 4A0 B 0 C 0 are equal, but that F specifically maps A to
A0 , B to B 0 , and C to C 0 .

Theorem G20 Given two triangles ABC and A0 B 0 C 0 , their similarity, i.e.,
4ABC 4A0 B 0 C 0 , is equivalent to the following equalities:

|6 A| = |6 A0 |, |6 B| = |6 B 0 |, |6 C| = |6 C 0 |

and
|AB| |AC| |BC|
0 0
= 0 0 = 0 0
|A B | |A C | |B C |

Remark It is in the proof of this theorem that we get to see why a similarity is
defined as the composition D of a congruence (and not just an isometry) and
a dilation D. It follows from Theorems G5 and G17 that a similarity preserves the
degrees of angles, and this fact accounts for the validity of Theorem G20.

Proof If we have 4ABC 4A0 B 0 C 0 , then the assertions about angles and sides
follow from Theorems G5 and G17. For the converse, we prove something stronger:

263
Theorem G21 (SAS for similarity) Given two triangles ABC and A0 B 0 C 0 .
If |6 A| = |6 A0 |, and
|AB| |AC|
0 0
= 0 0
|A B | |A C |
then 4ABC 4A0 B 0 C 0 .

Proof of Theorem G21 If |AB| = |A0 B 0 |, then the hypothesis would imply
|AC| = |A0 C 0 | and we are reduced to the SAS criterion for congruence. Thus we
may assume that |AB| and |A0 B 0 | are not equal. Suppose |AB| < |A0 B 0 |. Then the
hypothesis that |AB|/|A0 B 0 | = |AC|/|A0 C 0 | implies |AC| < |A0 C 0 |. On A0 B 0 , let B
be the point so that |A0 B | = |AB|. Similarly, on A0 C 0 , let C be the point satisfying
|A0 C | = |AC|.
C
A0
@
 @
 @
 @
 @
A PPP
B  @ C PP

 @
@
P
B
B0  @ C0

By SAS for congruence (Theorem G7), 4A0 B C = 4ABC. Let be the con-
0
gruence that maps 4A B C to 4ABC. Moreover, if r denotes the common value
of |AB|/|A0 B 0 | and |AC|/|A0 C 0 |, then the dilation D with center A0 and scale factor
r maps A0 to A0 of course, but also B 0 to B because by the definition of dilation,
D(B 0 ) is the point on the ray RA0 B 0 so that the distance of D(B 0 ) from the center A0
is
|AB|
r |A0 B 0 | = 0 0 |A0 B 0 | = |AB| = |A0 B |
|A B |
Thus D(B 0 ) = B . Similarly, D(C 0 ) = C . Thus D maps 4A0 B 0 C 0 to 4A0 B C ,
thanks to Corollary 2 of Theorem G16. Hence, we have:

( D)(4A0 B 0 C 0 ) = (D(4A0 B 0 C 0 )) = (4A0 B C ) = 4ABC

This shows that 4A0 B 0 C 0 4ABC.

264
We next give the proof of the most easily applied criterion of similarity: AA for
similarity.

Theorem G22 (AA for similarity) Two triangles with two pairs of equal an-
gles must be similar.

Remark Of course as soon as we prove that the sum of angles in a triangle is


180 , then knowing the equality of two pairs of angles is seen to be equivalent to
knowing that all three pairs of angles are equal. This is why this criterion is some-
times cited as the AAA criterion.

Proof of Theorem G22 Let two triangles ABC and A0 B 0 C 0 be given. We may
assume |6 A| = |6 A0 | and |6 B| = |6 B 0 |. Then we must prove that 4ABC 4A0 B 0 C 0 .
If |AB| = |A0 B 0 |, then the hypothesis would imply 4ABC = 4A0 B 0 C 0 because of the
ASA criterion for congruence (Theorem G8). Thus we may assume that |AB| and
|A0 B 0 | are not equal. Suppose |AB| < |A0 B 0 |. On A0 B 0 , choose a point B so that
|A0 B | = |AB|, and let the line parallel to B 0 C 0 and passing through B intersect A0 C 0
at C . By Theorem G18, |6 A0 B C | = |6 B 0 | which, by hypothesis, is equal to |6 B|.
Since also |6 A0 | = |6 A| by hypothesis, 4A0 B C = 4ABC by ASA for congruence.
0
In particular, |A C | = |AC|, by Theorem G5.
C
A0
@
 @
 @
 @
r @ A PPP
B  @ C PP r
 @ PB
 @
0
B @
C0

We now claim that


|A0 B 0 | |A0 C 0 |
=
|A0 B | |A0 C |
Once we prove this, we would see that the triangles A0 B 0 C 0 and ABC satisfy the
conditions of SAS for similarity (Theorem G21) and are therefore similar. In order

265
to prove the preceding equality, we take a point C0 on A0 C 0 so that
|A0 B 0 | |A0 C 0 |
=
|A0 B | |A0 C0 |

By FTS, LB C0 k LB 0 C 0 . Then the two lines LB C0 and LB C both have the property
that they are parallel to LB 0 C 0 and they pass through B . By the Parallel Postu-
late, LB C0 = LB C , which implies C0 = C . The equality that |A0 B 0 |/|A0 B | =
|A0 C 0 |/|A0 C0 | then becomes the sought-for equality |A0 B 0 |/|A0 B | = |A0 C 0 |/|A0 C |.
The proof of Theorem G22 is complete.

We emphasize once again that inasmuch as the validity of Theorems G21 and G22
depends on Theorem G16, it ultimately depends on the Parallel Postulate. The fact
that the conclusions about similar figures depend on the Parallel Postulate will be
pointed out once more in the last section of Chapter 13.
To round off the picture, let us also mention the fact that, in analogy with the
case of congruence, there is also an SSS criterion for similarity. This will be proved
in Chapter 11.

For the purpose of learning about linear equations, students have to learn how
to apply Theorems G21 and G22 in specific situations. There is probably no better
illustration of how this is done than the following proof of the Pythagorean Theorem.
Note that there will be a second proof of this theorem in Chapter 18 using the concept
of area.
Let us fix the terminology. Given a right triangle ABC with C being the vertex
of the right angle. Then the sides AC and BC are called the legs of 4ABC, and
AB is called the hypotenuse of 4ABC.
AQ
Q
Q
Q
Qc
Q
b Q
Q
Q
Q
Q
C a
QB

Theorem G23 (Pythagorean Theorem) If the lengths of the legs of a right

266
triangle are a and b, and the length of the hypotenuse is c, then a2 + b2 = c2 .

The basic idea of the proof is very simple. Referring to the same picture, we draw
a perpendicular CD from C to side AB, as shown:
AQ
Q
QD
Q

Q c
b


Q
Q
Q

Q

Q
Q
C

a
Q
B

Then a simple use of Theorem G22 reveals that both right triangles CBD, and ACD
are similar to 4ABC and therefore their corresponding sides are proportional. This
immediately leads, via the cross-multiplication algorithm, to several equalities be-
tween the products of (lengths of) the sides of these triangles. If you already know
what to prove, then by trial and error, you cannot help but arrive at a combination
of these equalities that will give you what you want. On the other hand, if you dont
know what to prove, then these identities are not likely to do you any good. It is
in this sense that we believe that guessing the correct statement of the Pythagorean
theorem is more important than being able to find a proof of the theorem. The first
person to discover this theorem had to be an extraordinary mathematician.48

Proof We will prove that 4ABC 4CBD, and also 4ABC 4ACD. It
suffices to prove 4ABC 4CBD as the other similarity can be proved the same
way. The two triangles ABC and CBD have two pairs of equal angles: |6 CDB| =
|6 ACB| = 90 , and |6 CBD| = |6 ABC|. By the AA criterion for similarity
|BA| |BC|
(Theorem G22), the triangles are similar. Hence |BC| = |BD| , so that by the
cross-multiplication algorithm,

|BC|2 = |AB| |BD|


48
There were obviously lots of smart people in the ancient world because this theorem was inde-
pendently discovered in, at least, Babylonia, India, and China; the Babylonian discovery preceded
Pythagoras (sixth century B.C.) by at least twelve centuries.

267
|AC| |AD|
By considering the similar right triangles ABC and ACD, we conclude |AB| = |AC|
and
|AC|2 = |AB| |AD|
Adding, we obtain

|BC|2 + |AC|2 = |AB| |BD| + |AB| |AD| = |AB| ( |BD| + |DA| ) = |AB|2

This is the same as a2 + b2 = c2 . The proof is complete.

The converse of the Pythagorean Theorem is also true; the proof makes use of the
Pythagorean Theorem itself and is sufficiently simple to be left as an exercise.

Theorem G24 (Converse of Pythagorean Theorem) If triangle ABC sat-


isfies |CB|2 + |CB|2 = |AB|2 , then |6 C| = 90 .

An immediate consequence of the Pythagorean Theorem is the following extension


of the SAS criterion for triangle congruence in the case of right triangles:

Theorem G25 (HL) Two right triangles with equal hypotenuses and a pair of
equal legs are congruent.

Here HL stands for hypotenuse-leg. By the Pythagorean Theorem, the other


pair of legs of these two right triangles must be equal. The SAS criterion then yields
the desired congruence.

Finally, we return to the discussion of S S 0 for two sets S and S 0 . By definition,


this means that there is a dilation D and a congruence so that (D(S)) = S 0 . In
this definition, the order between S and S 0 seems to matter, i.e., S first and S 0 second,
because to say instead S 0 S would mean that there is a dilation D0 and a congruence
so that (D0 (S 0 )) = S. It is not obvious how to conclude from (D(S)) = S 0
that (D0 (S 0 )) = S for some D0 and . Why is this a concern? Because if it is
indeed the case that there are two sets S and S 0 so that S S 0 but S 0 is not similar
to S, then we will not be able to say in general that two sets are similar. Rather,

268
given two sets S and S 0 , we must be careful to distinguish between S is similar to
S 0 or S 0 is similar to S. Life would then be unbearably complicated.
We salvage the situation by proving that S S 0 must imply S 0 S. Henceforth,
we can simply say that S and S 0 are similar without any fear of confusion. In
standard terminology, we express the fact that S S 0 if and only if S 0 S by the
statement that similarity is a symmetric relation. The crux of the matter is the
following lemma:

Lemma Let D be a dilation and a congruence. Then there is a dilation D0 so


that D = D0 .

This is almost the statement that D and commute (compare problem 9 in Ex-
ercises 5.2), but it doesnt quite say that because D0 is not going to be D in general.
In fact, it is not difficult to see what D0 must be if the lemma is true. Indeed,
D = D0 implies that D0 = 1 D . This then tells us how to define
D0 in order to prove the Lemma. All it remains is therefore to prove that this D0 so
defined must be a dilation. Here is a hint: If D has center O and scale factor r, let
(O0 ) = O. Then show that D0 has center O0 and scale factor r. We leave the details
of the proof of the Lemma and also the proof that S S 0 must imply S 0 S to an
exercise.

In connection with the symmetry of similarity, we should also mention the related
concept of transitivity: we say similarity is a transitive relation if for any
three sets S1 , S2 , and S3 , S1 S2 , and S2 S3 always imply S1 S3 . In a naive
sense, what this says is that if S1 looks like S2 and S2 looks like S3 , then it stands
to reason that S1 also looks like S3 . Clearly if something this reasonable is not true
for our definition of similarity, then we must have given the wrong definition. It is
therefore reassuring that, in fact, our definition is not wrong because similarity is
transitive. The proof of transitivity is, however, a bit more involved technically than
what we should be doing here, which is to learn enough about similarity to deal with
the geometry of linear equations. This proof will therefore be postponed to 7 of
Chapter 7 when the transitivity of similarity becomes a critical issue.

269
Exercises 5.3

1. Let D, E, F be the midpoints of the sides BC, AC, AB, respectively, of a triangle
ABC. Prove that 4DEF 4ABC with a scale factor of 2.

2. [This problem generalizes problem 6 of Exercises 5.1.] Assume FTS. Let L1 , L2 ,


and L3 be three mutually parallel lines, and let ` and `0 be two distinct transversals
which intersect the three parallel lines at A1 , A2 , A3 , and B1 , B2 , B3 , respectively.
Prove that
|A1 A2 | |B1 B2 |
=
|A2 A3 | |B2 B3 |

3. Prove that all circles are similar to each other. (Caution: Dont skip steps.)

4. Prove that two rectangles are similar to each other if and only if either the ratios
of their sides are equal or the product of these ratios is 1. Precisely, let the lengths
of the sides of one rectangle be a and b, and those of the other be a0 and b0 ; then the
0 0
rectangles are similar if and only if either ab = ab0 or ab ab0 = 1.

5. (a) Write a detailed proof of Theorem G25. (b) Prove Theorem G24 (Converse
of Pythagorean Theorem).

6. (a) Write out a detailed proof of the Lemma at the end of this section. (b) Write
out a complete proof of the fact that, for any two sets S and S 0 , if S S 0 , then S 0 S.

7. Let L and L0 be two lines intersecting at a point O. Take any point P on L, and
let the line passing through P and perpendicular to L0 meet L0 at a point P 0 . Prove
|P P 0 |
that the ratio |OP 0 | is independent of the position of P on L, i.e., if Q is another
point on L, and if the line passing through Q and perpendicular to L0 meets L0 at a
point Q0 , then
|P P 0 | |QQ0 |
=
|OP 0 | |OQ0 |

8. (a) Let |6 B| = |6 C| in 4ABC. Prove that |AB| = |AC|. (b) Prove that every
point on the angle bisector of an angle is equi-distant from both sides of the angle.

270
(Note: In some sense, these two assertions should be proved in the setting
of congruence, not similarity; any theorem related to similar triangles re-
quires the FTS, which a more sophisticated theorem than anything about
congruent triangles. Indeed, we will revisit these problems again in Chap-
ter 11 and you will prove them using only theorems about congruence.
That said, the virtue this problem is that you get to see at least another
approach to these standard facts.)

9. Suppose we have two parallel lines L and L0 , and a point O not lying on either
line. Let three lines passing through O intersect L and L0 at points A, B, C, and A0 ,
B 0 , C 0 , respectively, as shown.
A B C
L
@  
@  
@ 
@O
@
 @
  @
  @
  @
  @ L0
C0 B0 A0

(This picture puts O between L and L0 , but O could be anywhere.) Prove that

|AB| |BC| |AC|


= =
|A0 B 0 | |B 0 C 0 | |A0 C 0 |

10. Suppose in 4ABC, AB is longer than AC. Let a point D on the segment BC
be such that AD BC.
A

 JJ

 J
 J
 J

 J
B D C

(a) Prove that BD is longer than DC. (b) Prove that |BD| |DC| > |AB| |AC|.

271
11. Given 4ABC, let a point D on the segment BC be such that AD BC.. Let
also
|AB| = c, |AC| = b, |AD| = h, |DC| = `, |BC| = a
A

 JJ
c

 h JJb
 J

 J
B D ` C

a2 + b 2 c 2
(a) Prove that ` = . (b) Prove that
2a
1 q
h = (a + b + c)(a + b c)(a + b + c)(a b + c)
2a

12. Suppose you are a teacher in middle school and you are handed a textbook series
that takes up similarity in grade 7 and congruence in grade 8. (Such a series exists.)
(a) Do you believe such a curricular decision is defensible? Explain. (b) If you are a
seventh grade teacher, what would you do? (Obviously there will be no unique answer
to part (b), but the idea is that you had better start thinking about such real-world
situations because the world out there is full of curricula that make no sense, and
your ability to adjust is, alas, part of your responsibility.).

272
Chapter 6: The Symbolic Notation and Linear Equa-
tions,

1 Symbolic expressions (p. 274)


2 Solving linear equations in one variable (p. 301)
3 The graphs of linear equations in two variables (p. 308)
4 Parallelism and perpendicularity (p. 330)
5 Simultaneous linear equations (p. 338)

273
In this chapter, we begin the study of algebra. In the context of school mathe-
matics, the most urgent task in helping students to achieve algebra may very well be
getting them to be fluent in the use of symbols. There is at present an unhealthy pre-
occupation with the concept of a variable in the teaching of algebra, to the point
of elevating it to a formal mathematical concept. There is in fact no such formal
concept in mathematics. A main goal of this chapter is to explain why, if students
know the protocol in the use of symbols, they need never bother with a variable
except to learn how to use it as a piece of informal terminology.
This chapter is concerned with linearity: linear equations in one or two variables
and systems of two linear equations in two variables. We make use of the concept of
similar triangles to prove that the graph of a linear equation is a line and each line is
the graph of a linear equation. In the process, we make some comments on the rela-
tionship between slope of a line on the one hand, and parallelism and perpendicularity
on the other. Such a clarification is needed for a systematic discussion of linear sytems.

1 Symbolic expressions

In mathematics, we use symbols to expedite the expression of ideas. The begin-


ning of algebra, as we understand this term, is the introduction of generality and
abstraction by using symbols (usually letters of the alphabet) to represent numbers.
In order to convince students with only a background in arithmetic that the use of
symbols is something well worth learning, we can begin by giving examples of the
immediate benefits of using symbols. This is not difficult at all, as we shall presently
demonstrate. Now, using a letter to stand for a number is the same as using an
abbreviation to stand for a short phrase (e.g., U.S.A. for United States of America)
or a pronoun to stand for a person or object. It is a simple task that is well within
the learning capability of the average student. But if we tell students that

a symbol is a variable,
a variable is a quantity that changes or varies,
variables represent quantities whose values vary,
a sentence with a variable is called an open sentence, and it is called
open because its truth cannot be determined until the variable is re-

274
placed by values, and
developing students understanding of variables is an important part of
the middle school curriculum,
then we are mystifying a simple task and setting up students for failure by asking them
to learn something mystical. While in certain contexts, such as the description of the
trajectory of a moving particle as a function of time t, one can discuss the intuitive
content of t in this particular context as a quantity that changes from moment to
moment, it is nevertheless not a good idea to accord such an intuitive idea a precise
formal status in mathematics. Moreover, what is a student supposed to understand
about a variable? That they should learn how to use a letter to represent a member
in a collection of numbers? That is simple enough without any verbose instruction.
As to open sentences, we will have some comments on this topic when we come to
the basic etiquette in the use of symbols.
The goal of the present discussion is to impress on you as a future teacher how not
to mislead your students when you teach algebra. To put this discussion in context,
consider this question: would third graders ever learn to use abbreviations if they were
confronted with similar portentous warnings on the use of abbreviations at the outset?

Due to the pressure of tradition and, it must be said, also because of the
convenience we will make free use of the word variable to refer to any letter
that represents a number in a collection of more than one element. We will be more
precise about this usage presently, but one thing must be clear: a variable is not
a mathematical concept in the sense of a formal definition in mathematics proper.
The misplaced emphasis on the meaning of a variable is but one example of the
mathematical blunders committed in many school algebra curricula, which convert
students introduction to algebra into a mangled lesson in vocabulary. One can learn
algebra better if one is not subjected to any elaborate discussions of what a vari-
able means, what an expression is, what an equation is, what an equality is,
what a formula is, and how the last three differ from each other. Such meaningless
discussions make mathematics more complicated than it is.

In this section, we are going to give a simple demonstration of how to introduce the
use of symbols into mathematical discussions, and how a proper use of symbols can

275
immediately lead to meaningful statements about prime numbers, geometric series,
and rational expressions. You will not encounter many new mathematical skills or
concepts; the absence of novelty is part of the design. Indeed, if this section achieves
its goal, then you will walk away from it with the conviction that all it takes to learn
algebra is proficiency in the arithmetic of rational numbers. We hope this conviction
will translate into good teaching when it is your turn to teach algebra.

Let us begin with an explanation of the basic etiquette in the use of sym-
bols. Consider the problem, by no means uncommon in mathematics education, of
interpreting a string of symbols such as y 2 = 3x7 and how to probe students under-
standing of what y 2 = 3x 7 means. Please know that, in mathematics, such a string
of symbols has no meaning , because they are the exact analog of the question,Is
he someone with 225 pounds on a six-foot-five frame? Without knowing who he is,
this statement may be true, or it may be false. By the same token, without knowing
what y and x are in y 2 = 3x 7, what is there to talk about? The afore-mentioned
etiquette dictates that each time you use a symbol, you must say what that
symbol stands for , in the same way that each time you use a pronoun, you must
make sure that your listener or reader knows whom you are talking about. For ex-
ample, we can make sense of y 2 = 3x 7 by specifying what x and y are and by
providing a context. Here are four variations on this theme:

For all values of x we can find a number y so that y 2 = 3x 7.


There are no numbers x and y so that y 2 = 3x 7.
Are there an infinite number of fractions x and y so that y 2 = 3x 7?
Are there an infinite number of positive integers x and y so that y 2 =
3x 7?

It is only when x and y are thus quantified (which is the mathematical terminology
for prescribed) that each statement or question about y 2 = 3x7 can be confirmed,
or negated, or answered, as the case may be. The importance of quantification can be
seen by noting that, despite the similarity between the last two questions, the answer
to the first is yes whereas the answer to the second is no (exercise).
A pertinent remark in this connection is that many students commit the elemen-
tary error of writing down symbolic expressions without quantifying the symbols,

276
such as y 2 = 3x 7 above. If we make our teachers better informed, then they can
help eradicate such mistakes among students. This is one reason why this section is
here.

To make sure you see the importance of always quantifying your symbols, consider
another example. Here are three statements related to the commutative law:

(C1) mn = nm.
(C2) mn = nm for all whole numbers m and n so that 0 m, n 10.
(C3) mn = nm for all real numbers m and n.

The statement (C1) has no meaning , because we dont know what the symbols m
and n stand for. If m and n in (C1) are real numbers, then (C1) is true, but there
are other mathematical objects m and n for which (C1) is false49 or meaningless.50
On the other hand, (C2) is true, but it is a trivial statement because its truth can
be checked by successively letting both m and n be the numbers 0, 1, 2, . . . , 9, 10,
and then computing the finite number of products mn and nm for comparison. The
statement (C3), which is the commutative law of multiplication among real numbers,
is however both true and more profound. It is either something you take on faith (see
3 of Chapter 16), or, if one wants to construct the real numbers from the rationals,
a not-so-trivial theorem to prove because a goodly number of careful definitions will
have to be made. Thus, despite the fact that all three statements (C1)(C3) contain
the equality mn = nm, they are in fact radically different statements because the
quantifications for the symbols m and n are different.

Once the need for quantification of symbols is understood, we now clarify the use
of the word variable. First we give an example. Consider the problem of finding all
the numbers x which satisfy 3x + 7 = 5. In the usual jargon, this is known as solving
the linear equation 3x + 7 = 5. The usual procedure for solving such equations yields
3x = 5 7, and therefore
57
x=
3
49
For example, certain 2 2 matrices.
50
For example, if m is a 2 3 matrix and n is a 3 5 matrix.

277
There is a reason why we do not carry out the computation in the numerator to write
the solution as 2
3
, and it is because if we consider 3x + 12 = 13, then we get
1
13 2
x=
3
Or consider 3x 25 = 4.6 and by rewriting it as 3x + (25) = 4.6, we get

4.6 (25)
x=
3
Or consider 5x 25 = 4.6 and get

4.6 (25)
x=
5
And so on. There is an unmistakable pattern here: no matter what the numbers a,
b, and c may be, the solution of the linear equation ax + b = c, with a, b, c (a 6= 0)
understood to be three fixed numbers throughout this discussion, is
cb
x=
a

We have now witnessed the fact that in some symbolic expressions, the symbols
stand for elements in an infinite set of numbers,51 e.g., the statement that mn = nm
for all real numbers m and n, while in others, the symbols stand for the element
in a set consisting of exactly one element (in other words, they stand for a fixed
value throughout the discussion), e.g, the numbers a, b, and c in the linear equation
ax + b = c. In the former case, the symbols m and n are called variables, and in
the latter case, a, b, and c are called constants. Notice that such terminology is an
afterthought because if we carefully quantify the symbols in each situation, there is no
need to use the word variable or constant; such information is already contained
in the quantification. Moreover, we see that a variable so defined does not vary or
change. It is simply an element in an infinite set of numbers.
In a situation when we have to determine which number x satisfies an equation
such as 2x2 + x 6 = 0 or 2x = x, the value of the number x would be unknown
51
Strictly speaking, all it matters is that the symbols stand for elements in a set consisting of
more than one element. But for school algebra, infinite suffices for the purpose at hand.

278
for the moment. Then the symbol x will also be called an unknown. To the extent
that we will never make logical deductions based on the properties of an unknown,
it is not necessary to make this terminology more precise.52

At the risk of pointing out the obvious, note that we have been making use of
symbols from the very beginning of this book out of necessity. One example is the
addition formula for fractions (3 of Chapter 1): for any two fractions k` and m
n
, where
k, `, m, n are whole numbers (`n 6= 0),
k m kn + `m
+ =
` n `n
If we dont use symbols, we would be forced to express the formula as follows:

The sum of two fractions is the fraction whose numerator is the sum of
the product of the numerator of the first fraction with the denominator
of the second, and the product of the numerator of the second with the
denominator of the first, and whose denominator is the product of the
denominators of the given fractions.

Even if you are inordinately fond of the English language, you would have to admit
that the symbolic statement is far more clear, and this is not even taking into account
the difficulty of trying to provide an explanation of this formula without the benefit
of symbols.
This example may serve to explain to students, should they hesitate to embrace
the symbolic language, why the use of symbols is a necessity. Of course, there are
many other examples that would serve the same purpose equally well.

We now begin the mathematical discussion.


By a number expression, or more simply an expression, in a given collection
of numbers x, y, . . . , w, we mean a number obtained from these x, y, . . . , w using
a specific combination of arithmetic operations (i.e., +, , , ). In due course,
we will enlarge the meaning of number expression to include the use of the operation
of taking the n-th root and more generally the use of the values of functions at
52
For example, we will not try to enter into a long discussion of the relationship between an
unknown and a variable.

279
these x, y, . . . , w. Because all the symbols we use are numbers, we can apply all
we know about numbers to number expressions without having to learn anything new,
including the fact that the associative, commutative, and distributive laws are valid
for computations with number expressions.
In a number expression such as
x
x4 5x3 y 2 + x2 y 2 xy 3 + 2y 4 + ,
1 + y2
which involves the numbers x and y, we may regard it as nothing more than a sum
of products, namely,

(x4 ) + (5x3 y 2 ) + (x2 y 2 ) + (xy 3 ) + (2y 4 ) + (x (1 + y 2 )1 )

(You may wish to review 2 of Chapter 2 at this point concerning the definition of
subtraction in terms of addition, and 5 of Chapter 2 concerning division as multipli-
cation by a multiplicative inverse). Any of the expressions x4 , 5x3 y 2 , x2 y 2 , xy 3 ,
x
2y 4 , and 1+y 2 , which are separated by two consecutive +s (except for the first one
4 x
x and the last one 1+y 2 ), is called a term of the expression. As is the custom, the
x
writing of the expression x4 5x3 y 2 + x2 y 2 xy 3 + 2y 4 + 1+y 2 has made implicit use

of three notational conventions:

Retiring : The multiplication sign is omitted in symbolic expres-


sions except that, if emphasis on a particular multiplication is needed, a
dot is used, as in (x (1 + y 2 )1 ). As is well-known, the reason for
retiring is that it is too easily confused with the letter x in ordinary
writing.
The writing of specific numerical values in a symbolic expression: The
number 1 is always suppressed, e.g., x4 in place of 1x4 , and in general,
an specific numerical value such as 5 in 5x3 y 2 is always put in front of
the symbols, e.g., it is never x3 (5)y 2 or even x3 y 2 (5).
The order of arithmetic operations among symbols in an expression: It
is understood that (i) we do the multiplications of each symbol x and y
x
indicated by the exponent first (e.g., x3 and y 2 in 5x3 y 2 , or y 2 in 1+y 2 ),
2 1 x
then (ii) the multiplications within each term (e.g., x (1 + y ) in 1+y2 ),
and finally (iii) the addition of the various terms.

280
We will have more to say about the third convention presently.

It can happen that the equality of two number expressions in a collection of


numbers x, y, z, . . . is valid for many values of x, y, z, . . . . Here, many could
mean all numbers x, y, z, . . . , as in
(x + y)2 = x2 + 2xy + y 2 for all numbers x and y
Or many could mean all numbers with a small number of exceptions, as in53
1 + tan2 x = sec2 x for all numbers x 6= an integer multiple of
2

This equality is not true for all numbers x because tan x and sec x are not defined
when x is equal to an integer multiple of 2 . Or many could mean all nonzero
whole numbers only, as in
n(n + 1)
1 + 2 + 3 + + (n 1) + n = for all whole numbers n 1
2
By tradition, each of these three equalities, thus carefully quantified, is called an
identity. You recognize that we have not offered a definition of what an identity is,
other than that an identity is a figure of speech that alerts you to the fact (already
made explicit above) that it is an equality between two number expressions which is
valid for many values of the numbers in question. It is left to you to be careful
about what many means in each case. This is but one example among many that
we sometimes use mathematical terms out of respect for tradition, not as a precise
mathematical concept, but as a convenient way to suggest a mental image. Variable
is another example of such usage, as we already discussed.
Fortunately, a majority of the well-known identities are valid for all numbers, with
no exceptions. We will focus on these in this section.

Now with the terminology of an identity understood, let us list the three most
common identities:
(x + y)2 = x2 + 2xy + y 2 for all numbers x and y
2 2 2
(x y) = x 2xy + y for all numbers x and y
2 2
(x + y)(x y) = x y for all numbers x and y
53
We continue to make use of some mathematics that we have not discussed, but which you
undoubtedly know, to illustrate a point.

281
At least three comments should be made. The first is obvious: insofar as x and y
are numbers,54 these three identities follow from routine number computations using
the associative, commutative, and distributive laws. For example, here is a proof of
the third identity:

(x + y)(x y) = (x + y)x (x + y)y (dist. law)


= x2 + yx xy y 2 (dist. law; assoc. law of addition)
= x2 + xy xy y 2 (comm. law)
= x2 y 2

A second comment is that once we have the first identity, the second one becomes
a trivial consequence of the first, to wit, because the first identity is valid for all
numbers x and y, it is also valid for x and y. Therefore, for any number x and y,

(x + (y))2 = x2 + 2x(y) + (y)2

But this is the same as


(x y)2 = x2 2xy + y 2
by Theorem 3 in 4 of Chapter 2. Since the last equality is valid for all numbers x
and y, we have proved the second identity above.
You may be wondering why we bother with the preceding proof of the second
identity since a simple direct computation already proves (x y)2 = (x y)(x y).
The point is rather that, at the beginning stage of algebra, your students are put in
touch with the idea of generality for the first time, namely, the first identity is not
just the equality of the expressions (x + y)2 and x2 + 2xy + y 2 for certain numbers x
and y, but that it is valid for all numbers x and y. If you can teach them to take
the latter statement seriously, and make them realize that the validity of the equality
when y is replaced by y immediately implies the validity of the second identity,
then you have taught them something valuable. One may paraphrase by saying that,
because of the generality of the first identity, the first identity already contains the
second identity as a special case. An important part of learning algebra is to become
alert to the potential implications of a general statement, and the derivation of the
54
By FASM, a more precise statement is that we are dealing with rational numbers x and y here.

282
second identity from the first is a trivial, but nonetheless noteworthy demonstration
of the power of generality.
A third comment is that, in practice, the usefulness of these identities is, more
often than not, derived from ones ability to read these identities backwards,
i.e., the ability to recognize in a given situation that, for any two numbers x and y,
x2 + 2xy + y 2 is (x + y)2 ,
x2 2xy + y 2 is (x y)2 ,
x2 y 2 is (x + y)(x y).

For example, 25x2 + 49y 2 70xy is the square (5x 7y)2 , because

25x2 + 49y 2 70xy = (5x)2 2(5x)(7y) + (7y)2

We pause to make another remark about identity. We may rewrite (x y)(x +


y) = x2 y 2 as
x2 y 2
=x+y
xy
Remembering that we cannot divide by 0, we see that this equality holds for all
numbers x and y so that x 6= y. This equality is of course also considered to be an
identity, but again it is not one that is valid for all x and all y since an exception has
to be made: x 6= y.

We have more to say about the identity (x y)(x + y) = x2 y 2 ! We ask if


there is an analogous identity that has x3 y 3 on the right side. There is, because
a straightforward computation making repeated use of the distributive law (and of
course also Theorems 1 and 2 of the Appendix in Chapter 1) shows that

(x y)(x2 + xy + y 2 ) = x(x2 + xy + y 2 ) y(x2 + xy + y 2 )


= x3 + x2 y + xy 2 yx2 yxy y 3
= x3 + x2 y + xy 2 x2 y xy 2 y 3
= x3 y 3

In other words,
(x y)(x2 + xy + y 2 ) = x3 y 3

283
Similarly,
(x y)(x3 + x2 y + xy 2 + y 3 ) = x4 y 4
and
(x y)(x4 + x3 y + x2 y 2 + xy 3 + y 4 ) = x5 y 5
The pattern is now clear: for any positive integer n, and for all numbers x and y,

(x y)(xn + xn1 y + xn2 y 2 + xn3 y 3 + + xy n1 + y n ) = xn+1 y n+1

Let us rewrite this identity in the following way: for all numbers x and y,

xn+1 y n+1 = (x y)(xn + xn1 y + xn2 y 2 + xn3 y 3 + + xy n1 + y n )

Thus, the difference of two numbers x and y raised to the same power can always be
expressed as a product of x y and xn + xn1 y + + y n . By any measure, this is a
nice-looking identity. We will examine its implications in some detail in two different
settings.
Since this identity is valid for all numbers x and y, it is certainly valid when
x and y are positive integers. In that case, observe that both sides of the identity
become integers. Assume x > y > 0. Then also xn+1 > y n+1 > 0 (see problem
4 in Exercises 2.6) and the identity says that the positive integer xn+1 y n+1 has
a factorization (1 of Chapter 3) as the product of two positive integers: x y and
xn + xn1 y + + xy n1 + y n . For the sake of clarity, we restate it as follows: for all
positive integers a, b, and n, so that a > b > 0, we have the following factorization of
the positive integer an+1 bn+1 as

(an+1 bn+1 ) = (a b)(an + an1 b + an2 b2 + + abn1 + bn )

If ab > 1, then in particular a > 1 so that an +an1 b+an2 b2 + +abn1 +bn > 1.
Therefore the positive integer an+1 bn+1 , being the product of two integers each
bigger than 1, is not a prime (1 of Chapter 3). We have thus proved the following.

Proposition 1 If a, b are positive integers so that ab > 1, then for all positive
integers n, an+1 bn+1 is never a prime.

284
For example, 1273 663 = 1760887, and this proposition guarantees that 1,760,887
is not a prime. By no means is this fact obvious because its smallest divisor is 61. In
fact, the prime decomposition (see 2 of Chapter 3) of 1,760,887 is 61 28867. (By
coincidence, the identity 1273 663 = (127 66)(1272 + 127 66 + 662 ) also provides
the prime decomposition of 1,760,887.)
In a similar vein, you can show off to your friends by challenging them to check
whether 13,997,513 is a prime. You of course know that it is not a prime because

13997513 = 2413 23

What makes the testing of the primality of this number difficult is that the prime
decomposition of 13,997,513 is 239 58567. In other words, its smallest divisor is
239, so that guess-and-check wont be very effective in this case. (Again, the identity
2413 23 = (241 2)(2412 + 142 2 + 22 ) also happens to exhibit the prime decom-
position of 13,997,513.)

Suppose b = 1 in the identity. Then we get, for all positive integers a and n,

an+1 1 = (a 1)(an + an1 + an2 + + a2 + a + 1)

As before, if a > 2, then a 1 > 1 and an + an1 + + a2 + a + 1 > 1 so that


an+1 1 is never a prime. This is a special case of Proposition 1. It turns out that
the case of a = 2 provides a different kind of intrigue. In this case, a 1 = 1, and
the preceding identity no longer provides a nontrivial factorization of 2n+1 1. The
identity therefore no longer exhibits the whole number 2n+1 1 as a composite.
More to the point, 2n+1 1 is actually a prime for certain values of n, as we can see
from the first five cases of n = 1, 2, 3, 4, 5:

n = 1, then 2n+1 1 = 3.
n = 2, then 2n+1 1 = 7.
n = 3, then 2n+1 1 = 15.
n = 4, then 2n+1 1 = 31.
n = 5, then 2n+1 1 = 63.

285
So among the first five values of 2n+1 1, three of them are primes but two are not.
The question naturally arises: among all possible values of 2n+1 1 as n runs through
all the positive integers, which of them are primes?
There is an easy reduction of this question, as the following proposition tells us
that there is no need to look for primes among the numbers 2n+1 1 where n + 1
is a composite:

Proposition 2 If m is a composite positive integer, then 2m 1 is also a com-


posite.

Proof Indeed, if m = pq for positive integers p and q so that p, q > 1, let k = 2q .


Then we will show k 1 is a proper divisor of 2m 1. This is because:

2m 1 = (2q )p 1p = k p 1p = (k 1)(k p1 + k p2 + + k + 1)

(Notice that, once more, we rely on our identity in the last step.) But q > 1
implies (k 1) > 1 because k 1 = 2q 1 3. Moreover, p > 1 implies
k 1 = 2q 1 < (2q )p 1 = 2m 1. Therefore 1 < k 1 < 2m 1, and
k 1 is a proper divisor of 2m 1. The proposition is proved.

Proposition 2 explains why 24 1 (= 15) and 26 1 (= 63) in the above list are
not primes.
In view of Proposition 2, our original question about which of 2n+1 1 are primes
can now be simplified to:

Which of 2p 1 are primes, as p runs through all the primes?

In 1644, Father Mersenne55 claimed that, among all the primes p < 258, 2p 1 is
a prime exactly when

p = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127, 257.


55
Marin Mersenne (1588-1648) was a French theologian and amateur mathematician. He was a
friend of all the French mathematical luminaries of his time, including Descartes, Fermat, Pascal, De-
sargues and Roberval, and performed the critical service of disseminating mathematical information
among them at a time when mathematical publications were basically nonexistent.

286
Keep in mind that the computer is a creation of the latter part of the twentieth
century, so it was a nontrivial matter in the days of Mersenne to check the primality
of a (whole) number with, say, ten digits such as 231 1 = 2, 147, 483, 647. Mersenne
probably did not test all the numbers he wrote down and his statement was likely a
mixture of guesses and wishful thinking. Even the fact that his list is correct up to
p 31 is nontrivial, as this requires the verification of two separate sets of assertions:

(A) 2p 1 is composite for p equal to 11, 23, and 29.


(B) 2p 1 is prime for p equal to 13, 17, 19, 31 (the cases where p is 2, 3,
5, and 7 are obvious).

For (A), the fact that 211 1 = 2389 was known back in 1536, and the fact that 223 1
is composite was found by Fermat56 in 1640 as a refutation of the opposite claim made
by Cataldi in 1603. The case of 229 1 was not known at the time Mersenne made his
conjecture, and it would stay that way until Euler,57 almost a century after Mersenne
made his conjecture, exhibited the prime decomposition 229 1 = 233 1103 2089
in 1738. As for (B), the primality of 213 1 = 8191 only takes a little patience and
was known in any case as far back as the fifteenth century. The primality of 217 1
and 219 1 was verified by the same Cataldi in 1588. But the primality of 231 1
was finally proved only in 1772 by Euler.
With the help of an electronic computer, we can easily see that Mersennes list
contains five errors: 61, 89, 107 should have been on his list but they were not, and
the numbers 67 and 257 which were on his list shouldnt have been there (i.e., 267 1
and 2257 1 are composites).
As a result of Mersennes list, a number of the form 2p 1 which is a prime is
called a Mersenne prime. You can get more information about Mersenne primes
from
56
Pierre de Fermat (1601(?)1665) was an amateur mathematician but one of the greatest math-
ematicians of all time nonetheless. The terminology of Cartesian coordinates masks the fact that
Fermat was a co-discoverer of analytic geometry with Descartes; in fact Fermat made the discovery
a few years earlier and seemed to have a better understanding of the potential of his discovery.
Fermat was a co-founder of the theory of probability (with Blaise Pascal) and the modern theory
of numbers also owes its existence to Fermat. His definition of the tangent to a curve at a point
inspired Newtons definition of the derivative.
57
Leonhard Euler (1707-1783) was the dominant mathematician of the 18th century and rightfully
ranks among the greatest. He made important contributions in every part of mathematics, physics,
and astronomy as they were known in his time.

287
http://mersenne.org/

So what are the Mersenne primes? Unfortunately, we know very little about this
question. We dont even know if there are an infinite number of Mersenne primes. A
search for bigger and bigger Mersenne primes is an ongoing enterprise (see the website
above), and the largest Mersenne prime as of June 2009 is a number with about 12.9
million digits corresponding to the prime p = 43,112,609 (discovered on August 23,
2008). From a mathematical perspective, this search would be more significant if we
knew that the number of Mersenne primes is finite.

Returning to the identity

xn+1 y n+1 = (x y)(xn + xn1 y + xn2 y 2 + xn3 y 3 + + xy n1 + y n )

for all numbers x and y and all positive integer n. Letting y = 1, we get

xn+1 1 = (x 1)(xn + xn1 + xn2 + + x2 + x + 1)

for all numbers x and for all positive integers n. We now explore the implications of
1
the latter identity from another angle. If x 6= 1, multiplying both sides by x1 and
switching the left and the right sides give:

xn+1 1
1 + x + x 2 + x3 + + xn =
x1
for any number x 6= 1, and for any positive integer n. The sum 1+x+x2 +x3 + +xn
is called a finite geometric series, and this identity is usually referred to as the
summation formula for the finite geometric series. We may assume that
x 6= 0 because the case of x = 0 is not interesting. Recalling that58 1 = x0 , so this
identity expresses the sum of all the powers of x up to and including xn as a quotient
(xn+1 1)/(x 1). For example, if x = 3 and n = 12, then

(3)13 1 1594324
1 3 + 32 33 + 34 311 + 312 = = = 398581
3 1 4
3
But if x = 4
and n = 15, then we have
58
For a fuller discussion of the 0-th power of x, see Chapter 7.

288
1 + 43 + ( 34 )2 + ( 43 )3 + + ( 34 )15 = {( 34 )16 1}/( 34 1)

which is equal to approximately


0.9899774
= 3.9599096
0.25
It is roughly 3.96.

The geometric series is usually tucked away at the end of the second year of al-
gebra, with the result that it is often taught hastily for lack of time. It is one of the
most basic pieces of mathematics students should know for advanced mathematical
or scientific work. As we have just seen, it is also one of the most elementary. It
should be taught, therefore, at the beginning of algebra, not at the end of it. Please
be aware of this fact when you teach algebra.

Before leaving the identity xn+1 y n+1 = (xy)(xn +xn1 y + +xy n1 +y n ), we


make a passing remark that this identity also gives a short proof of the calculus fact
that the derivative of xn+1 is equal to (n+1)xn . Briefly, the proof goes as follows. The
derivative of xn+1 at a is the limit of the difference quotient (xn+1 an+1 )/(x a)
as x goes to a. Because of the preceding identity, the numerator of the difference
quotient is equal to the product of (x a) and xn + xn1 a + + xan1 + an . Thus
the difference quotient itself becomes xn + xn1 a + + xan1 + an after (x a)
has been cancelled from numerator and denominator. So as x goes toward a, we get
an + an + + an (n + 1 times), which is (n + 1)an .

We next introduce polynomials. Underlying the whole discussion of polynomials


is a basic technique known as collecting like terms. It is nothing more than a
simple observation based on the distributive law, and we deal with this first. Suppose
we have a sum
(18 53 ) + (53 23) + (69 53 )
One can compute this sum by first multiplying out each term 18 53 , 53 23, and
69 53 , and then adding the resulting numbers to get

(18 53 ) + (53 23) + (69 53 ) = 2250 + 2875 + 8625 = 13750

289
Now if we reflect for a moment, we would realize that we wasted precious time do-
ing three multiplications before adding. If we apply the distributive law, then the
computation simplifies:

(18 53 ) + (53 23) + (69 53 ) = (18 + 23 + 69) 53


= 110 125 = 13750

Notice that we have made use of the commutative law of multiplication to change
53 23 to 23 53 in the process.
You may think that, with the advent of high speed computers, it doesnt matter
if we get the answer by multiplying three times and then add once, or (as in the
second case) add three times and multiply once. This is true, but the difference in
conceptual and visual clarity between

(18 53 ) + (53 23) + (69 53 )

and
(18 + 23 + 69) 53
is substantial. This is because multiplication is a far more complicated concept than
addition: 234 + 677 is what it is, an addition, but 234 677 means adding
677 to itself 234 times. It is therefore conceptually simpler to add three times and
multiply once than to multiply three times and add once. Because both conceptual
and visual clarity are important in the learning and doing of mathematics, we will
collect together terms involving the same numbers raised to a fixed power (such as
53 in (18 53 ) + (53 23) + (69 53 )) by using the distributive law. For example,
we will always rewrite

(181 25 ) + (67 25 ) + (25 96) (257 25 )

as
87 25 (= (181 + 67 + 96 257) 25 ),
Similarly, we will write

[24 5914 ] [( 53 )8 89] + [5914 73] + [5914 66] + [25 ( 35 )8 ] + [( 35 )8 11]

as

290
[163 5914 ] [53 ( 35 )8 ],

where 163 = 24 + 73 + 66 and 53 = 89 + 25 + 11. Recall once more that we refer


to both of the above expressions as a sum because subtraction is just addition in
disguise (2 of Chapter 2).
In an entirely similar manner, suppose we are given a sum of multiples of non-
negative integer powers of a fixed number x, where multiple here means simply
multiplication by any number and not necessarily by a whole number, and nonnega-
tive integers refers to the whole numbers 0, 1, 2, . . . . Then we would automatically
collect together the terms involving the same power of x as before. For example, we
will write
1 3 1
x + 16 8x2 + x3 x5 6x2 + 75x + 2x3
2 3
as
17
x5 + x3 14x2 + 75x + 16.
6
Observe that we have followed three notational conventions in writing the latter
sum involving the powers of a fixed number x:

(A) The earlier convention that parentheses are suppressed, with the un-
derstanding that exponents are computed first, multiplications second,
and additions third.
(B) The earlier convention that the power of x is placed last in each term,
so that instead of x2 14, we write 14x2 .
(C) The terms are written in decreasing powers of the number x in ques-
tion. (The term 16 is the term 16x0 , where by definition, x0 = 1; in-
cidentally, this is where we need the concept of the zeroth power of a
number59 .)

A sum of multiples of nonnegative integer powers of x is called a polynomial 60


in x. The number in front of a power of x is called the coefficient of that particular
59
Note that in this case, we have to make an ad hoc definition by agreeing to write x0 = 1
regardless of whether x = 0 or not.
60
Mathematical aside: This definition of a polynomial with real coefficients is the appropriate one
for school mathematics, and is based on the fact that the polynomial ring over R is ring-isomorphic
to the ring of R-valued polynomial functions.

291
power of x. For the polynomial at hand, the coefficient of x5 is 1, the coefficient of
x4 is 0 and the coefficient of x2 is 14, because, strictly as a sum of the powers of x,
this polynomial is in reality
17 3
(1)x5 + 0x4 + x + (14)x2 + 75x + 16x0 .
6
Similarly, 16 is the coefficient of x0 . A multiple of a single nonnegative power of x,
such as 58x12 is called a monomial. Thus, a monomial is a polynomial with only
one term. The highest power of x with a nonzero coefficient in a polynomial is called
the degree of the polynomial. The terminology about nonzero coefficient refers
to the fact that the preceding polynomial x5 + 17 6
x3 14x2 + 75x + 16 could be
written as 0 x37 x5 + 17 6
x3 14x2 + 75x + 16, but the 37-th power of x clearly
doesnt count. This polynomial has degree 5, and not 37 (and not any whole number
different from 5, for that matter).
This is the place to make a comment on the notational convention (A) above.
As we said earlier, visual clarity in the notation we use is important. We find the
polynomial
17
x5 + x3 14x2 + 75x + 16
6
easy to work with because it is visually simple. As written, though, this symbolic
expression is a priori ambiguous because it could mean, among other things, the
following:
17
{((x)5 + )x}3 {(14x)2 + 75}x + 16
6
But of course what we have in mind is
17 3
{(x5 )} + { (x )} + {14(x2 )} + {75x} + 16
6
(We need not specify the order of doing the additions at this point because of the gen-
eral associative law; cf. Theorem 1 in the Appendix of Chapter 1.) So the net effect
of notational convention (A) is to eliminate the need to write the last cumbersome-
looking expression by declaring that the expression x5 + 17 6
x3 14x2 + 75x + 16
already means the same thing. This is all there is to the notational convention (A).
It is a convention, and one should not invest more mathematical significance in a
convention than what it truly is.

292
As is well-known, school mathematics is sometimes led astray by misplaced em-
phases. Convention (A) seems to have taken a place of honor in the middle school
curriculum under the name of order of operations, students have invented mnemonic
devices to memorize it (Please Excuse My Dear Aunt Sally), and standard as-
sessments likewise contribute to the promotion of this convention. A mathematics
classroom has to deal with conventions, of course, but a convention should be put in
its proper place and not be magnified out of proportion. One suggestion is to explain
the genesis of the convention (A) above, quiz students at the beginning to make sure
they get it, and go on to more important things. For a fuller discussion, see the
article, Order of operations and other oddities in school mathematics, in

http://math.berkeley.edu/~wu/order3.pdf

As is well-known, a polynomial of degree 1 is called a linear polynomial, and


one of degree 2 is called a quadratic polynomial. Because a general quadratic
polynomial has only three terms, ax2 + bx + c, it is also called a trinomial in
school mathematics. (However, one should take note of the fact that the terminology
of trinomial is strictly a product of school mathematics, and should therefore be
avoided in general discussions.) We will discuss quadratic polynomials in some detail
in the last two sections. A polynomial of degree 3 is called a cubic polynomial.
The most familiar polynomials are the so-called expanded forms of whole numbers;
these are polynomials in the number 10. For example, the expanded form of 75,018
is
(7 104 ) + (5 103 ) + (0 102 ) + (1 101 ) + (8 100 )
which is a fourth-degree polynomial in 10. Of course the expanded form of any k-digit
whole number is a polynomial of degree (k 1) in 10. On the other hand, the so-called
complete expanded form of a decimal such as 32.58,

(3 101 ) + (2 100 ) + (5 101 ) + (8 102 ),

is not a polynomial in 10, for the reason that it contains negative powers of 10. It
should also be pointed out, however, that as a polynomial in 10, the expanded form
of a whole number is very special: each coefficient is a single digit whole number,
whereas a general polynomial in 10 would satisfy no such restriction.

293
Because polynomials are just numbers, we can add, subtract, multiply, and divide
them as numbers. Therefore, with the exception of division, the other three arithmetic
operations produce another polynomial in a routine manner. (Division of polynomials
does not generally produce a polynomial and will be looked at separately.) Consider,
for example, the product of two linear polynomials ax + b and cx + d:

(ax + b)(cx + d) = (ax + b)(cx) + (ax + b)d (dist. law)


= acx2 + bcx + adx + bd (dist. law)
= acx2 + (ad + bc)x + bd (dist. law)

Because of the definition of a polynomial, we had to collect terms of the same degree
using the distributive law and rearrange the terms so that they are in descending
powers of x. Other than that, this shows that the multiplication of polynomials is no
different from the usual operations with numbers. If the arithmetic of numbers (whole
numbers and rational numbers) is taught correctly, such operations with polynomials
are just routine and would not pose a problem. In particular, there is absolutely no
need for the uncivilized mnemonic device called FOIL.
We have mentioned the need to sometimes look at an equality backwards, and
now we have to repeat this message. What we obtained above,

(ax + b)(cx + d) = acx2 + (ad + bc)x + bd

is nothing but routine applications of the distributive law. However, when this equal-
ity is read backwards, it becomes

acx2 + (ad + bc)x + bd = (ax + b)(cx + d)

In general, if the polynomials p(x), q(x), r(x) in x satisfy p(x) = q(x)r(x), then
we say q(x)r(x) is a factorization of p(x) if the degrees of both q(x) and r(x)
are positive. (Thus 35 x3 2x2 + 23 = ( 13 )(5x3 6x2 + 2) is not a factorization of
5 3
3
x 2x2 + 23 .) In this terminology, what we have obtained is a factorization of
acx2 + (ad + bc)x + bd as a product (ax + b)(cx + d), provided a 6= 0 and c 6= 0.
For example, we get
1 2 5 1
x + x 3 = (2x 3)( x + 1)
2 4 4
1
by letting a = 2, b = 3, c = 4 , and d = 1. With some practice, the factorization of
1 2
2
x + 54 x 3 can be done directly. One way is the following. Since it is much easier

294
to deal with integers rather than rational numbers, we rewrite the trinomial by using
the distributive law to take out the denominators of all the coefficients, as follows.
1 2 5 1
x + x 3 = (2x2 + 5x 12)
2 4 4
Then we recognize that

(2x2 + 5x 12) = (2x 3)(x + 4)

because the zero-degree term (i.e., 12) of 2x2 + 5x 12 has to be the product of the
zero-degree terms 3 and 4 of 2x 3 and x + 4, and the coefficient 2 of 2x2 + 5x 12
has to be the product of the coefficients 2 and 1 of 2x 3 and x + 4, respectively. So
a few trials and errors would get it done. Hence, we obtain as before,
1 2 5 1 1
x + x 3 = (2x2 + 5x 12) = (2x 3)(x + 4)
2 4 4 4

At present, the teaching of factoring trinomials with integer coefficients figures


prominently, not to say obsessively, in an algebra course. For this reason, some per-
spective on this subject is called for. One should keep in mind that all it does is
factoring two integers A and C into products of integers so that a given trinomial
Ax2 +Bx+C can be written as acx2 +(ad+bc)x+bd (which equals (ax+b)(cx+d)).
There is no denying that beginning students ought to acquire some facility with de-
composing integers into products. It is also important that they can effortlessly factor
a simple trinomial such as x2 + 2x 35 into the product (x + 7)(x 5). But as
sometimes happens, if a little bit of something is good, a lot of it may actually be bad
for you. This seems to be the case here, when the teaching of a minor skill gets blown
up to a major topic, with the consequence that other topics that are more central
and more substantial (such as learning about the graphs of linear equations or solv-
ing rate problems correctly) get slighted. The teaching of algebra should avoid this
pitfall. Please also keep in mind the fact that once the quadratic formula becomes
available (see 4 of Chapter 7), there will be a two-step algorithm to accomplish this
factorization no matter what the coefficients of the trinomial may be.

We give one more illustration of the factorization of polynomials. We begin with


a multiplication of polynomials where each step except the last makes use of the

295
distributive law.
1 1 1 1
(5x3 )(x2 + 2x + 8) = (5x3 )x2 + (5x3 )2x + (5x3 ) 8
2 2 2 2
1
= (5x5 x2 ) + (10x4 x) + (40x3 4)
2
1
= 5x5 + 10x4 + 40x3 x2 x 4
2
Now read this equality backwards and it gives a factorization that is not so trivial:
1 2 1
5x5 + 10x4 + 40x3 x x 4 = (5x3 )(x2 + 2x + 8)
2 2
Note the fact that if p(x) and q(x) are polynomials of degree m and n, respectively,
then the degree of the product p(x)q(x) is (m + n). In other words, the degree of a
product is the sum of the degrees of the individual polynomials. For example, the pre-
ceding calculation which multiplies a degree 3 polynomial with a degree 2 polynomial
yields a polynomial of degree 5 (= 3 + 2).

Finally, a quotient (i.e., division) of two polynomials in a number x is called a


rational expression in x. Here is an example:

3x5 + 16x4 25 x2 7
x2 1
We note that in the case of rational expressions, we need to exercise some care in not
allowing division by 0 to take place. For example, in the preceding rational expres-
sion, x can be any number except 1 because if x = 1, then x2 1 = 0 and the
denominator would be 0.

In writing rational expressions, it is understood (unless stated to the contrary) that


only those values for which the denominator is nonzero are considered.

In middle school, we are mainly interested in rational numbers and, as a conse-


quence, all computations with numbers tacitly assume that the numbers involved are
rational numbers. See also the discussion of FASM in Chapter 1. With this in mind,
since x is a (rational) number, a rational expression is just a rational quotient (in the
sense defined in Chapter 2, 5) and can therefore be added, subtracted, multiplied,

296
and divided like any fraction (see items (a)(e) of 5, Chapter 2). For example, in
case x = 12 in the foregoing rational expression, we would be looking at the rational
quotient
1 1
3( 32 ) + 16( 16 ) 25( 41 ) 7
,
( 14 ) 1
5
which is equal to 16 24 . In general, no matter what x is, we can compute with rational
expressions in x in the usual way:

0.5 x3 + 1 2x7 (0.5 x3 + 1)(x3 + 37 ) + (2x7 )(x8 + x 2)


+ =
x8 + x 2 x3 + 37 (x8 + x 2)(x3 + 73 )

and
3 2
2
x+1 6 ( 32 x2 + 1)(6)
=
x2 + 4x 7 3x4 5 (x2 + 4x 7)(3x4 5)
and
2x+1
x2 0.3 (2x + 1)(2x)
4x3 x+11
=
2x
(x2 0.3)(4x3 x + 11)
These are exactly the same as any computation with rational numbers. This re-
alization is important in the teaching of algebra because, when you teach rational
expressions, you should remind students that if they know how to handle rational
numbers, then they already know all there is to know about this topic. There is so
much in algebra that is just a revisit of arithmetic.

Because the cancellation law is valid for rational quotients (item (a) in 5 of
Chapter 2, which says AB
AC
=BC
for all rational numbers A, B, C, with A 6= 0, C 6= 0),
some rational expressions can be simplified. Sometimes the cancellation presents
itself, as in
(5x4 x3 + 2)(2x 15)
(14x2 + 3x 28)(5x4 x3 + 2)
Here, the number (5x4 x3 + 2) in both the numerator and denominator can be
cancelled, resulting in

(5x4 x3 + 2)(2x 15) 2x 15


=
(14x2 + 3x 28)(5x4 x3 + 2) 14x2 + 3x 28

297
Sometimes, the cancellation can be less obvious. For example, the rational expression
1 3
8
x1
x2 + 2x + 4
can be simplified to 81 (x 2) because, by an earlier identity,

1 3 1 1 1
x 1 = (x3 8) = (x3 23 ) = (x 2)(x2 + 2x + 4)
8 8 8 8
and we can cancel the number (x2 + 2x + 4) from the numerator and denominator.
(As you will learn when we come to quadratic equations, it turns out that in the
case of this particular rational expression in x, x can be any (real) number because
3 8
x2 +2x+4 is never equal to 0. Therefore, we actually have an identity x2x+2x+4 = x2
for all (real) numbers x.)
In beginning algebra, often there is too much emphasis on simplifying rational
expressions. This is a left over from the questionable practice of teaching fractions
by insisting on the reduction of all fractions to lowest terms at all costs.

It remains to round off this discussion by mentioning that one can easily define
polynomials in several numbers x, y, z, etc., and therefore one can likewise
define rational expressions in x, y, z, etc.

SUMMARY All computations involving expressions in a number x are the


same as ordinary computations in rational numbers. A thorough knowledge of rational
numbers is the foundation for learning algebra.

Exercises 6.1

1. Suppose you know (x y)2 = x2 2xy + y 2 for all numbers x and y. Using this
fact alone, prove that (x + y)2 = x2 + 2xy + y 2 for all numbers x and y.

2. Let a, b be positive integers, not both equal to 1, and let n be an odd positive
integer > 1. Prove that an + bn is never a prime. Would this hold if n is even?

298
3. If x is a nonzero number and n is a positive integer, what is
1 1 1 1 1
1 + + + ?
x3 x6 x9 x12 x6n

4. Find the sum of


57 58 59 510 532
+ + 33
68 69 610 611 6

5. (a) Are there an infinite number of fractions x and y so that y 2 = 3x 7? (b)


Are there an infinite number of positive integers x and y so that y 2 = 3x 7?

6. If x is a number that makes all the denominators nonzero in the following, simplify:

(2x3 9x2 5x)/(x 2)2 )


(x2 3x 10)/(x4 16)

1 y
7. If x and y are numbers and x 6= y, + 3 = ? Simplify your answer
x2 y 2 x + y3
as much as possible.

8. (a) Factor 4x2 12x + 9, and 30x2 + 16x + 2 for any number x. (b) Factor
s4 + s2 t2 + t4 into polynomials in s and t with integer coefficients for any numbers s
and t. (c) Likewise factor s4k + s2k t2k + t4k for any positive integer k.

9. Simplify (assuming the denominators are never zero for the numbers x and y):
2x2 +7x+3
4x4 9y 4 15x3 y 4 x4 y 3 x3 7x+6
(i) ; (ii) ; (iii) x+2
4x4 + 12x2 y 2 + 9y 4 60x5 y 2 4x4 y 3 x1

10. How much money is in an account at the beginning of the sixteenth year if the
initial deposit is $500, the annual interest rate is 5%, and at the beginning of each
year starting with the second, $10 is added to the account? Write down the formula,
and then use a scientific calculator to get a numerical answer.

299
In each of the following problems, you are asked only to write the equa-
tions that fully capture the verbal information. No solution is required.

11. Two women started at sunrise and each walked at constant speed. One went
straight from City A to City B while the other went straight from B to A. They met
at noon and, continuing with no stop, arrived respectively at B at 4 pm and at A at
9 pm. If the sunrise was x hours before noon, and if L is the speed of the woman
going from A to B and R is the speed of the woman going from B to A, transcribe
the information above into equations using the symbols L, R and x.

12. The sum of the squares of three consecutive integers exceeds three times the
square of the middle integer by 2. If the middle integer is x, express this fact in terms
of x. If the smallest of the three integers is y, express the same fact in terms of y.

13 I have $4.60 worth of nickels, dimes and quarters. There are 40 coins in all, and
the number of nickels and dimes is three times the number of quarters. If N , D, and
Q denote the number of nickels, dimes, and quarters, respectively, write equations in
terms of these symbols to capture the given information.

14. There are two whole numbers. When the large number is divided by the smaller
number, the quotient is 9 and the remainder is 15. Also, the larger number is 97.5% of
ten times the smaller number. If x is the larger number and y is the smaller number,
express the given information in equations in terms of x and y.

15. A video game manufacturer sells out every game he brings to a game show. He
has two games, an A Game and a B Game. He can bring 50 of A Games and B
Games in total to the show. Each A Game costs $75 to manufacture and will bring in
a profit of $ 125. Each B Game costs $165 to manufacture and will bring in a profit
of $185. However, he only has $6, 000 to spend on manufacturing. If he brings x A
Games and y B Games, describe in terms of x and y how he can maximize his profit.

300
2 Solving linear equations in one variable

A statement about the equality of two expressions in a number x of the form


ax + b = cx + d, where a, b, c, d are given numbers, is called a linear equation
in one variable. The given numbers a, b, c, d are the constants of the equation.
More generally, if an equality of two number expressions has the property that, each
expression is of the form ax + b after applications of the commutative, associative,
and distributive laws, then such an equality is also called a linear equation in
one variable. For convenience, we will deal only with linear equations of the form
ax + b = cx + d. Because we do not know, or at least, we are not given the exact
value x so that ax + b = cx + d, the symbol x is also called an unknown, as noted in
1. Because we have to assume, a priori, that there may be many such numbers x so
that ax + b = cx + d, the symbol x will also be called the variable of the equation.
Notice that we have given no meaning to the term one variable other than to use
it to indicate that we are dealing with one symbol x and not several symbols all at
once. A number x0 that satisfies the equation, i.e., ax0 + b = cx0 + d, is called
a solution of the equation. For example, 55 is a solution of 2x 109 = 56 x
because 2(55) 109 = 56 (55). To determine all solutions of an equation is called
solving the equation. We have no prior guarantee about how many solutions a
linear equation has. There may be one, there may be an infinite number, or there
may be none. We will return to this question at the end of the section.
You are undoubtedly already proficient in solving linear equations of one variable,
but this section is about acquiring the correct perspective for teaching this subject.
School textbooks traditionally teach the solving of such equations by breaking it up
into the solving of one-step equations, two-step equations, and multi-step equations.
When something so simple is broken up in this unthinking manner, and especially if
at the end of the discussion no overview about what has happened is given, distortion
of the main goal is bound to take place in students minds. Mathematically, it would
be preferable to break up the method of solution into the following two steps instead;
these steps correspond to the two main ideas of the solving of such an equation:
(i) Solve linear equations of the form ax = b, where a 6= 0. There is a
unique solution x = ab .

(ii) Bring every linear equation to the form ax = b by transposing all the

301
terms involving x to one side of the equation, and all those which dont
to the other (see the discussion immediately following). This process is
called isolating the variable.

The simple proof of (i) will be left as an exercise. We will describe (ii) in some
detail presently, and will explain why the two steps, (i) and (ii) together, solve all
linear equations of one variable.
First of all, what makes (ii) possible are the following simple facts which follow
from the definition of the multiplication and addition of numbers in Chapters 1 and
2:

Two numbers that are equal remain equal when multiplied by equal num-
bers.
Two numbers which are equal remain equal when added to equal numbers.

For example, given the equation 3x 4 = 27 5


x + 1 in x, we describe how to solve
it by first bringing it to the form ax = b using the method of (ii). We do not know if
there is any number x that would satisfy this equation, but we assume that there is one
to begin our consideration. Therefore, let x0 be a solution. Then 3x0 4 = 27 x +1
5 0
is just an equality involving numbers; we may therefore apply the ordinary
arithmetic operations to both sides of this equation. By adding 27 x to
5 0
27
both sides of this equality, we get 3x0 5 x0 4 = 1. In standard terminology, we
have just transposed the term 27 x to the other side. Now transpose 4 to the
5 0
right side by adding 4 to both sides, thereby getting
27
3x0 x0 = 1 + 4
5
Since the distributive law implies
27 27 12
3x0 x0 = (3 ) x0 = x0
5 5 5
and since 1 + 4 = 5, we get
12
x0 = 5
5
We have now isolated the variable in this equation. We have completed step (ii).
By step (i), the number x0 is equal to 2512
.

302
We summarize what we have accomplished: if there is a solution x0 to the
27
equation 3x 4 = 5 x + 1, then we have proved that it must be 25
12 . This
says nothing about having found a solution to the original equation 3x4 = 27
5 x+1.
What we must do now is to verify that indeed 25
12 is a solution. This is the easy
part: simply check that
25 27 25
3( )4 = ( ) + 1
12 5 12
This is routine, provided one is fluent in the computations with rational numbers.

The reasoning above is perfectly general. Given an equation ax + b = cx + d in x


with constants a, b, c, d. Suppose a c 6= 0. We first assume there is a solution x0 .
Then a numbers computation as above leads to (a c)x0 = d b. By (i),
db
x0 =
ac
(Compare problem 8 in Exercises 2.5 of Chapter 2.) Therefore if there is a solution
of the equation, then it must have this unique value. Is this actually a solution?
Another computation with numbers verifies that
! !
db db
a +b=c +d
ac ac

(The details are left as as exercise.) Thus this x0 is a solution. We have therefore
proved in general the following

Theorem The equation ax + b = cx + d so that a c 6= 0 can be solved according


db
to steps (i) and (ii) above. The unique solution is ac .

Let us now reflect on the solution method.


First of all, observe that, if in applying the method of step (ii) to the equation
3x4 = 27 5
x+1, we replace x0 everywhere by the symbol x itself, then this computation
would be exactly the same as the usual method of solution of linear equations found
in standard school textbooks. So what is different between what is here and what is
in those books? Let us enumerate the key differences:

303
(a) School textbooks do not make explicit what the symbol x is, other
than the fact that it is a variable, so that such computations seem to take
place in a no mans land: you are made to understand that it is a new kind
of computation and you are supposed to do it without asking too many
questions. If it seems to you that it doesnt make any sense, but you are
told to withhold your skepticism because, at the end, you get the right
answer. This is not what mathematics is about. By comparison, we have
taken pains to underscore the fact that every single computation here is
a computation with numbers, the kind we did in Chapters 1 and 2.
There is no mystery about what we are doing here.

(b) In school textbooks, the computation in question is supposed to find


the solution. So this becomes a matter of manipulating symbols: do it,
dont ask why, and the end result is the solution. Here, we want to be
perfectly clear: this computation does not yield the solution. Rather, this
is a computation that depends on assuming at the outset that we already
have a solution x0 ; all that the computation does is to tell us what this x0
is. To actually claim that this value of x0 satisfies the equation, we have
to perform the extra step of direct verification.

(c) The proof we have given of the Theorem makes it obvious why fluency
with rational number computations is a pre-condition for learning alge-
bra. If one has the needed conceptual understanding of what is meant
by solving a linear equation, then one realizes that it is nothing more
than a skillful application of computations with rational number. School
textbooks generally fail to make this point.

Needless to say, for the purpose of solving equations, we do not want students to
repeat this ritual, every time, of first getting a candidate for the solution by assuming
that there is one, and then verifying that it is a solution by substitution. This is
why we prove the general theorem above, which then allow students to solve equa-
tion in the usual mechanical way. The point is therefore that you should explain to
your students at least the essence of the proof of the theorem, so that they know that
solving an equation is an exercise in numbers computations rather than some magical
incantation in the land of the great beyond. If you as a teacher can make sense to

304
them about what they are learning, they will repay your effort by making sense to you.

It remains to provide a new framework for the Theorem that would bring clarity
to equation-solving in general. To this end, we introduce following two operations on
a given a linear equation in the number x, such as 15x 21 = 28x + 3.

(E1) Add the same number to both sides of the equation. (For example,
adding 21 to both sides of the equation results in 15x 21 + 21 =
28x + 3 + 21.61 )
(E2) Multiply both sides by the same nonzero number. (For example,
1 1 1
multiplying both sides by 21 leads to 15 (15x 21) = 15 (28x + 3).)

We can see why these operations are relevant by noting that in (i) above, one can go
from the equation ax = b to the equation x = ab by using an operation of type (E2),
namely, multiply both sides of ax = b by a1 and, furthermore, that in (ii), transposing
a number A from one side to the other is nothing but applying an operation of type
(E1) to the equation, namely, add A to both sides.
We should also point out a key feature of both (E1) and (E2): each operation is
reversible in the sense that if, in (E1), we add A to both sides, then if we follow it
by adding A to both sides, we get back the original equation. Similarly, in (E2),
if we multiply both sides of an equation by B, and B 6= 0, then if we follow it by
multiplying both sides by B1 , again we get back to the original equation. Incidentally,
the second example shows why in (E2) we stipulate that the number B be nonzero,
as otherwise B1 would make no sense.
We say that two linear equations in one variable are equivalent if they have the
same solutions, i.e., every solution of one is also a solution of the other. Then we
have a simple observation:

Lemma Applying either of the operations (E1) and (E2) to a given linear equa-
tion of one variable results in an equivalent equation.

61
Notice that we are making use of Theorem 1 of the Appendix in Chapter 1 so that there is no
need to use any parentheses on either side.

305
We can give a formal proof, but for once, that would be less illuminating than
looking at a particular equation, such as 15x 21 = 28x + 3, which we will call
Equation (1). Let us add 21 to both sides as before to get 15x = 28x + 24, which we
call Equation (2). To show that Equations (1) and (2) are equivalent, suppose x0 is a
solution of Equation (1), then 15x0 21 = 28x0 + 3. If we add 21 to the numbers on
the left and on the right results in 15x0 = 28x0 + 24, which is exactly the statement
that x0 is a solution of Equation (2). Conversely, suppose x0 is a solution of Equation
(2), and we must show that it is a solution of Equation (1). Thus by hypothesis,
15x0 = 28x0 + 24. Now we add (21) to the numbers on the left and on the right,
and we end up with 15x0 21 = 28x0 + 3. But this says x0 is a solution of Equation
(1), and the proof is complete. We can see that the reason the Lemma is true is that
(E1) is reversible. The reasoning with (E2) is similar.

We can now give a new proof of the Theorem using the Lemma. Given ax +
b = cx + d, successive applications of (E1) to the equation results in the equation
(a c)x = d b. An application of (E2) to (a c)x = d b yields the equation x =
db db
ac .By the Lemma, the equations ax + b = cx + d and x = ac are equivalent.
db
But the latter of course has a unique solution ac , so by the definition of equivalence,
so does the former. The Theorem is proved again.

The preceding discussion gives the impression that every linear equation of one
variable has a unique solution, but this would be so only if one ignores the hypothesis
that a c 6= 0 in the Theorem. We now give some examples to counteract this
impression. The equation 2x 1 = 2x + 1 clearly has no solution, and the equation
2x 3 = 2x 3 has an infinite number of solutions as every number is a solution.
We leave the general case to an exercise below.

Exercises 6.2

1. Prove (i) above, i.e., prove that x = ab is a solution of ax = b, where a 6= 0, and


that any solution of the equation is equal to ab .

306
2. Give the details of the computation that if a c 6= 0, then for all b and d,
! !
db db
a +b=c +d
ac ac

3. Imagine that you are teaching eight grade algebra and you have to explain for the
first time how to solve 2x + 11 = 2 x. How would you explain (a) what it means to
solve this equation, and (b) how to solve it, step-by-step?

4. Formulate a precise statement about a linear equation in x, ax + b = cx + d,


where a, b, c, d are the constants, so that the equation has a unique solution, has no
solution, and has an infinite number of solutions. (Of course you have to prove it.)

5
5. If 13 of a number N exceeds a third of N by 8, what is N ? (This is problem 10
of Exercises 1.5 in Chapter 1.)

6. [In this problem, you are assumed to know the formula for the circumference of a
circle.] Two boys start walking around a circle from the same spot at the same time.
They walk in the same direction, one at a constant speed of F feet per minute, and
the other at 2F feet per minute. After T minutes, they are at the greatest distance
apart for the first time. What is the radius of the circle in terms of F and T ?

7. In a cage of chickens and rabbits, the ratio of chickens to rabbits is 1 : 3. If there


are 294 feet altogether in the cage, how many chickens and how many rabbits are
there?

8. Given three consecutive whole numbers. If the sum of the smallest plus twice the
next number plus three times the largest number is 110, what are these numbers?

9. Explain to an eighth grader why (a) two numbers (i.e., rational numbers; see
FASM) that are equal remain equal when multiplied by equal numbers, and (b) two
numbers which are equal remain equal when added to equal numbers.

307
3 The graphs of linear equations in two variables

Before discussing the graphs of linear equations, we observe that at this point we
can set up coordinates in the plane, in the sense that we can uniquely associate
to each point of the plane an ordered pair of numbers, and vice versa. Because you
are already familiar with these concepts, we will merely outline the main points of
how you can do this in the school classroom. In the process, we get to see why we
took the trouble to prove Theorem G4 in 3 of Chapter 4, which asserts that opposite
sides of a parallelogram are equal.
Choose two perpendicular lines in the plane which intersect at a point to be called
O. The horizontal line is traditionally designated as the x-axis, and the vertical one
the y-axis. Recall that these may be regarded as number lines (cf. the discussion
of distance in 1 of Chapter 4). We may henceforth identify every point on these
coordinate axes (as the x and y axes have come to be called) with a number. As
expected, we choose the positive numbers on the x-axis to be on the right of O so
that O is the zero of the x-axis, and we choose the positive on the y-axis to be above
O on the y-axis so that O is also the zero of the y-axis. Then to each point P in the
plane we associate an ordered pair of numbers in the following way. We now formally
agree to call any line parallel to the x-axis to be a horizontal line, and also any
line parallel to the y-axis to be a vertical line. Then through P draw two lines,
one vertical and one horizontal, so that they intersect the x-axis at a number a and
the y-axis at a number b, respectively.Then the ordered pair of numbers (a, b) are
said to be the coordinates of P (relative to the chosen coordinate axes); a is called
the x-coordinate and b the y-coordinate of P (relative to the chosen coordinate
axes).
Y

Pr rb

r X
a O

308
Now, by construction, P aOb is a parallelogram. By Theorem G4 in 3 of Chapter
4, we have that the length of the segment from P to b, |P b|, is just |a|. Likewise, the
length of the segment from P to a, |P a|, is just |b|. Since the line LP a is parallel to the
y-axis and the y-axis is perpendicular to the x-axis, we see that LP a is perpendicular to
the x-axis (Theorem G3 of 3 in Chapter 4). For the same reason, LP b is perpendicular
to the y-axis. Thus, |a| is in fact the distance from P to the y-axis, and |b| is the
distance from P to the x-axis. We have therefore obtained a different interpretation
of the coordinates of P :
The x-coordinate of P is the distance from P to the y-axis if P is in the
right half-plane of the y-axis, and is minus this distance from P to the
y-axis if P is in the left half-plane of the y-axis. The y-coordinate of P is
likewise the distance from P to the x-axis if P is in the upper half-plane
of the x-axis, and is minus this distance from P to the x-axis if P is in
the lower half-plane of the x-axis.
Conversely, with a chosen pair of coordinate axes understood, then given an or-
dered pair of numbers (a, b), there is one and only one point in the plane with coor-
dinates (a, b). Precisely, this is the point of intersection of the vertical line passing
through (a, 0) and the horizontal line passing through (0, b). These two lines being
unique, by virtue of the Parallel Postulate, the point of intersection is also unique.
There is thus a bijection between all the points in the plane and all the ordered pairs
of numbers.
With this bijection understood, we proceed to adopt the usual abuse of notation
by identifying a point with its corresponding ordered pair of numbers. This
bijection is further standardized by the introduction of a symbol. Let R denote the
collection of all the numbers (i.e., points) on the number line; in mathematics, R is
called the set of real numbers. Then the plane will henceforth be denoted by R2 ,
where the superscript 2 reminds us that each point (of the plane) corresponds to
two real numbers.
We emphasize that in denoting the plane by R2 , it is implicitly assumed that the
choice of a pair of coordinate axes has been made.

In R2 , we define (a, b) = (c, d) to mean that the points represented by (a, b)


and (c, d) are the same point. Since we have just shown that every point corresponds

309
to one and only one ordered pair of numbers, we see that

(a, b) = (c, d) a = c, b = d

Again, note that there is no ambiguity as to what the equality between two ordered
pairs of numbers means.

A linear equation in two variables x and y is a statement about the equality


between symbolic expressions in the two numbers x and y which either is of the form
ax + by = c, where a, b, c are the constants and at least one of a and b is nonzero,
or is in this form after applications of the two operations (E1) and (E2) in 2. A
priori, this equality may be valid for some x and some y, or for no x and no y. Thus
2x = 52 y + 7 and 6 + 38 y = 179 5x are examples of linear equations in two
variables. A solution of this equation is an ordered pair of numbers (x0 , y0 ), so
that they satisfy the equation ax + by = c, in the sense that ax0 + by0 = c. To
solve the equation is to find all the solutions. The graph of ax + by = c is the
collection of all the points (x0 , y0 ) in R2 so that the ordered pair of numbers x0 and
y0 are a solution of ax + by = c.
An equation such as x 2y = 2 is an example of a linear equation in two
variables. We observe that in this situation, it is easy to find all the solutions of this
equation with a prescribed first number x0 or a prescribed second number y0 . For
example, with the first number prescribed as 3, then we solve the linear equation in
y, 3 2y = 2, to get y = 25 . Therefore (3, 25 ) is the sought-for solution. Or,
if the second number is prescribed to be y0 , then we solve the linear equation in x,
x 2(y0 ) = 2, to get x = 2y0 2. The solution is now (2y0 2, y0 ). Thus we
see, and we shall continue to bear witness, that the study of linear equations in two
variables is grounded in the study of linear equations in one variable.
Using the above method of getting all the solutions of the equation x 2y = 2,
we can plot as many points of the graph of x 2y = 2 as we please to get a good
idea of the graph. For example, the following points are on the graph of x 2y = 2:
(5, 1.5), (4, 1), (2, 0), (0, 1), (2, 2),
(2.5, 2.25), (4, 3), (6, 4), (7, 4.5).
Students need the experience of plotting points on the graph of an equation by hand,
and they should form this good habit right at the outset for the case of a linear equa-

310
tion in two variables. In the age of the graphing calculator, this need tends to be
overlooked.

Consider next the graph of the linear equation in two variables y = 3 which,
as an equation in two variables, is in reality the abbreviated form of the equation
0 x + 1 y = 3. We want to prove that the graph of y = 3 is the line L0 , which
is the horizontal line passing through the point (0, 3).62 Here is the simple proof. As
always, in order to show that two sets are equal, we must show that the two sets have
the same collection of elements. For the case at hand, we have to show that the graph
of y = 3 and the line L0 have the same collection of points in the plane. There is a
standard way to do this, which is to show that each set is contained in the other set,
or in standard set-theoretic language:

we show that each is a subset of the other.

Let then G be the graph of y = 3, and we must prove G = L0 , which means we


have to prove:
L0 G, and G L0
Observe that G consists of all the points (s, t) which are solutions of y = 3, and
therefore consists of all the points (s, t) so that t = 3, i.e., all the points (s, 3) where
s is any number. Thus G consists all the points whose y-coordinate is 3. Let us
now prove L0 G. Let P be a point of L0 ; we must prove that P G, i.e., the
y-coordinate of P is 3. But the horizontal line passing through P (which is L0 )
intersects the y-axis at the number 3 (by assumption on L0 ), so the y-coordinate of
P is 3, by definition. Hence P G. Conversely, let P G, and we must prove
that P L0 . Since P G, the y-coordinate of P is 3. By the definition of the
y-coordinate of a point, the horizontal line passing through P intersects the y-axis at
the number 3, i.e., intersects the y-axis at the point (0, 3). But this horizontal line is
precisely L0 , so P L0 .
62
This obvious fact is usually taken for granted, but we want to emphasize that this fact can be
explained in terms of the precise definition of the graph of an equation that we have just given. As
a teacher, you want your students to know the reasoning for such basic things, and the only way
you can do that is to learn how to explain them correctly and clearly yourself.

311
Y

L0
3

X
O

The same reasoning shows that the graph of the equation y = b in the plane for
any number b is exactly the horizontal line L0 passing through the point (0, b) on the
y-axis. We can do the same to vertical lines. In summary, we have:

the graph of x = c for any number c is the vertical line passing through
the point (c, 0) on the x-axis, and the graph of y = b for any number b is
the horizontal line passing through the point (0, b) on the y-axis.

Since there is only one horizontal (respectively, vertical) line passing through a given
point of the plane (why?), it follows that

every vertical line is the graph of the equation x = c, where (c, 0) is the
point of intersection of the line and the x-axis, and every horizontal line
is the graph of an equation y = b, where (0, b) is the point of intersection
of the line with the y-axis.

In general, it is well-known that the graph of a linear equation in two variables is


a line, and vice versa. What is not well-known is the reason why this is true. The
main purpose of this section is to prove this fact. Precisely:

Theorem The graph of a linear equation in two variables is a line. Conversely,


every line is the graph of a linear equation in two variables.

If the graph of ax + by = c is L, it is common to say that L is the line defined


by ax + by = c. By the second part of the theorem, every line L is the graph of
some linear equation in two variables ax + by = c for some constants a, b, c. But
more can be said. Suppose L is also the graph of a0 x + b0 y = c0 for some constants
a0 , b0 , c0 . Are the two equations related? The answer is given by the following lemma.

312
Lemma 1 Suppose a line is the graph of both ax + by = c and a0 x + b0 y = c0 .
Then there is a nonzero number k so that (a0 , b0 , c0 ) = (a, b, c), i.e., a0 = ka,
b0 = kb, and c0 = kc. Conversely, the graphs of the two equations ax + by = c and
kax + kby = kc, where k 6= 0, are the same line.

The proof of the first part of Lemma 1 is quite straightforward once you consider
the three cases separately: the line is vertical, horizontal, or neither. Of course the
proof of the second part is trivial. We will leave the details to an exercise.
Assuming Lemma 1, we see that any linear equation having a fixed line as its
graph is necessarily of the form kax + kby = kc (with k 6= 0) for some constants a, b,
c. We naturally regard all such equations (for any value of k) as the same equation.
With this understood, we are in the habit of saying in this case that the equation
of L is ax + by = c.

We will prove the Theorem in an unorthodox way. We first give a direct proof
using straightforward computations. Then we introduce a new concept and use it to
give a shorter and more conceptual proof. In so doing, we hope to accomplish two
goals. The first is that we want to demonstrate that, in mathematics, there is some-
times a right way and a wrong way of doing things. Not literally but figuratively.
If you go through a proof and come out of it not knowing what has happened, then
it is not the right proof. A second goal is to demonstrate that the whole idea of
conceptual understanding, so highly prized in mathematics education discussions
of the past two decades, is not some elusive quantity divorced from skills. The sec-
ond proof is more conceptual and more illuminating because it manages to wrest the
central idea from the myriad skills of the first proof.

Proof of Theorem Let us begin with the proof of the first part of the theorem.

Given ax + by = c, we will prove that its graph G is a line. We consider separately


the case of b = 0 and the case of b 6= 0. If b = 0, then by the definition of a linear
equation in two variables, a 6= 0. The equation may therefore be rewritten as x = c0 ,
where c0 is the constant c0 = ac . In this case, we saw above that the graph is a vertical
line. The first part of the theorem is therefore true in this case. Next, suppose b 6= 0.
Such being the case, we may rewrite the equation as by = ax + c, and therefore

313
as y = mx + k, where m = ab and k = cb .
Choose two distinct points P = (p1 , p2 ) and Q = (q1 , q2 ) on the graph G of
y = mx + k, and let L be the line passing through P and Q. We are going to prove
that G = L, which would show, in particular, that G is a line. As usual, we must
prove that (i) L G, and (ii) G L. To prove (i), let U = (u1 , u2 ) be a point of
L, and we will prove that U G. By the definition of the graph of an equation, this
means we must prove:

u2 = mu1 + k ()

To this end, because P and Q are both in G, we have


)
p2 = mp1 + k
(#)
q2 = mq1 + k

If p1 = q1 , then p2 = q2 and P and Q would not be distinct. Therefore p1 6= q1 . Let us


say q1 < p1 . We distinguish two cases: when L slants this way /, and when L slants
this way \.63 Suppose the former. We are going to assume for definiteness that U is
positioned so that u1 < q1 , as shown. It will be clear from the subsequent argument
that the position of U is irrelevant.
Y
L
r
P




Qr  R



U
r

S

 X

 O

Let a vertical line be drawn from P , and let a horizontal line be drawn from each
of Q and U , so that they meet the vertical line at R and S, respectively. Then we
get right triangles P RQ and P SU (4P RQ is a right triangle, for example, because
LQR is parallel to the x-axis and LP R is by choice perpendicular to the x-axis, so by
63
We speak informally of the slant of a line here, but in the proof of Theorem 2 in 4, we will
make this concept completely precise.

314
Theorem G3 of Chapter 4, LQR is perpendicular to LP R ). We are going to prove that
4P RQ 4P SU by using the AA criterion of similarity. The two triangles already
have a common angle, 6 QP R = 6 U P S. Since both have a right angle, they have a
second pair of equal angles. Thus 4P RQ 4P SU as claimed. By Theorem G20 of
Chapter 5,
|P S| |U S|
=
|P R| |QR|
By appealing to the cross-multiplication algorithm (item (b) in 5 of Chapter 2), we
see that this is equivalent to
|P R| |P S|
=
|QR| |U S|
In terms of the coordinates of P , Q, U , this is the same as
p 2 q2 p2 u2
= ()
p 1 q1 p1 u1

Now look at the two equations in (#) above. If we subtract the second equation in
(#) from the first, we see that p2 q2 = m(p1 q1 ), and therefore
p2 q2
m=
p1 q1

(Here we use p1 6= q1 .) It follows from () that


p2 u 2
m=
p1 u 1
Therefore p2 u2 = mp1 mu1 , or,

p2 mp1 = u2 mu1

We can simplify the left side by use of the first equation in (#), which says p2 mp1 =
k. Thus k = u2 mu1 , or what is the same thing, u2 = mu1 + k, which is exactly
(), as desired.
What would happen if L slants this way \ ? Then the situation becomes:

315
Le Y
e
e
eP
er
e
e
R e rQ
e rU
S e
O e X
e

Of course, it is still true that


|P R| |P S|
=
|QR| |U S|
But now, while the coordinate expressions of |P R| and |P S| remain the same, we
have:
|QR| = q1 p1 and |U S| = u1 p1 ,
so that we have
p 2 q2 p2 u2
=
q 1 p1 u1 p1
Since the purely algebraic statement deduced from (#), to the effect that
p2 q2
m=
p1 q1
remains valid, we get
p2 u 2 p2 u2 p2 u 2
m = = = .
u1 p1 (p1 u1 ) p1 u 1
After multiplying both sides by 1, we obtain as before,
p2 u 2
m=
p1 u 1
The remainder of the proof of () is the same. This proves (i), i.e., L G.

Next we tackle part (ii), i.e., we will prove G L. Let U 0 G, and we have to
show U 0 L. We argue by contradiction: Suppose U 0 6 L. We will prove that this
is impossible. Let the vertical line through U 0 meet L at the point U and meet the
horizontal line through P at the point R. It will be seen from the following argument

316
that it doesnt matter whether L is slanted this way / or the other way \, so we will
give a proof only for the case where L is slanted this way /. Note that the reasoning
is independent of the positions of P , Q, and U 0 in the following picture.

0
"
U "L
r "
r"
"
"
" U
"
"
"
"
"
"
P"
r
"
"
" R
"
"
"
Q"
r
"
"
X
O
"
"
"

If U = P , then U 0 and P would share the same x-coordinate so that U 0 = (p1 , v) and
v 6= p2 . However, the fact that U 0 G means that v = mp1 + k. But by (#) above,
we also have p2 = mp1 + k. Therefore v = p2 , and U 0 = (p1 , p2 ) = P , which means
U 0 G, a contradiction. Thus we may assume U 6= P . Now let U = (u1 , u2 ). Then
since U L and P L, the same argument as before using similar triangles yields
the equality
(p2 u2 )
m=
(p1 u1 )
We proceed to derive a second formula for m. Because U 0 lies on the same vertical
line as U , the coordinates of U 0 must be (u1 , v) for some number v. However, since
U 0 G and P G, we have

v = mu1 + k
p2 = mp1 + k

Subtracting the first equation from the second, we obtain p2 v = m(p1 u1 ), so


that
p2 v
m=
p1 u 1

317
Comparing these two expressions of m, we conclude that
(p2 u2 ) p2 v
=
(p1 u1 ) p1 u 1
Multiplying through by p1 u1 , we get p2 u2 = p2 v, from which one deduces that
v = u2 . Therefore we get U 0 = (u1 , v) = (u1 , u2 ) = U L, contradicting U 0 6= L.
This then concludes the proof of the first part of the theorem (that the graph G is a
line).

To complete the proof of the theorem, we will show that every line L is the graph
of a linear equation in two variables. If L is vertical, we have already dealt with this
case and we know L is the graph of x = c for some number c. Thus we will assume
L is not vertical at the outset. We will look for an equation of the form y = mx + k,
where m and k are suitably chosen numbers. The exact values of m and k can be
found via the following considerations.
Pick two distinct points P = (p1 , p2 ) and Q = (q1 , q2 ) on L. Note that p1 6= q1 ,
as otherwise L, being a line passing through two points with the same x-coordinate,
would be vertical. If (x, y) is an arbitrary point on L not equal to P , what linear
equation would x and y satisfy? If we let
q2 p2
m=
q1 p1
(the division is permissible because q1 p1 6= 0) then the by-now familiar consideration
of similar triangles, as indicated, leads to
y p2
=m
x p1

Y
L
r
P




Qr 



(x, y)
r




O X


318
We rewrite the preceding equation as y = mx + (p2 mp1 ). If we let

k = p2 mp1

then we are led to consider the following linear equation in two variables:

y = mx + k

We know from the first part of the theorem that the graph G of y = mx + k is a
line. Therefore if we can prove L = G, then the theorem would be proved. To show
that two lines (in this case G and L) are equal, it suffices to show that they have
two distinct points in common (see (L1) of Chapter 4). Since P and Q lie in L, by
definition, we only need to show that P and Q also lie in G. This is equivalent to
showing that p2 = mp1 + k and q2 = mq1 + k. The first equality is true because,
by definition of k,
mp1 + k = mp1 + (p2 mp1 ) = p2
For the second equality, we again use the definition of k:

mq1 + k = mq1 + (p2 mp1 ) = m(q1 p1 ) + p2


q p
But we recall the definition of m = q2 p2 at this point and get
1 1

q2 p2
mq1 + k = (q1 p1 ) + p2 = (q2 p2 ) + p2 = q2
q1 p1
Finally, we have completed the proof of the theorem.

Now let us take a critical look at the preceding proof.

We have proved the theorem by resorting to a straightforward computation at


each step, but there are far too many details but no unifying idea to make sense of
them. For example, how would you give a brief summary of this proof for your stu-
dents? It is in the nature of mathematics that being correct is not enough, because
ideally we want to know not only that something is true in a formal sense, but also
why it really works. So what is the key to this proof? To answer this question, we
introduce a concept that would allow us to reshape the preceding proof to give it

319
more transparency. This concept also happens to be one that has been thoroughly
abused in school mathematics: the slope of a line.

Definition Given a line L which is non-vertical. Let P = (p1 , p2 ) and Q = (q1 , q2 )


be any two distinct points on L. The slope of L is the quotient64
q2 p 2
q1 p 1

We begin with some preliminary remarks about this definition. With P and Q
fixed, the preceding quotient has a geometric interpretation. First, notice that the
order of P and Q is irrelevant, because if we interchange P and Q, then the preceding
quotient becomes
p2 q 2
p1 q 1
But the two quotients are equal because, by the generalized cancellation law for
rational quotients (see item (a) in 5 of Chapter 2):
q2 p2 (1)(p2 q2 ) p2 q2
= =
q1 p1 (1)(p1 q1 ) p1 q1
The following geometric discussion of the meaning of slope therefore ignores the rel-
ative positions of P and Q.
Now, suppose the line slants this way /. We may assume that the point P is
above Q. Let the vertical line passing through P intersect the horizontal line passing
through Q at a point R to form a right triangle P QR:

Y L

r
P 




Qr  R








O X

64
This is not a fraction, but a rational quotient! This is the last reminder of why a detailed
discussion of the arithmetic of complex fractions and rational quotients is essential. Notice that
FASM is involved, because the the numbers q2 p2 and q1 p1 may not be rational numbers.

320
Then, by the definition of R, its x-coordinate equals the x-coordinate of P and its
y-coordinate equals the y-coordinate of Q. Thus if P = (p1 , p2 ) and Q = (q1 , q2 ),
then R = (p1 , q2 ). It follows that p2 q2 is the difference between the y-coordinates
of P and R, so that p2 q2 = |P R|. Similarly, p1 q1 is the difference between the
x-coordinates of R and Q and hence equals |QR|. Therefore
p2 q2 |P R|
= .
p1 q1 |QR|
We pause to note that, if we fix Q and R, then the bigger the slope the bigger |P R|
becomes and therefore the higher R becomes. Thus for a line L slanting this way /,
the bigger its slope, the closer to being vertical it becomes.
On the other hand, suppose the line slants this way \, then assuming as before
that P is above Q, we obtain a right triangle P QR in exactly the same way:

L Y
e
e
e
eP
er
e
e
R e rQ
e
e
O e X
e

But now, since P is to the left of Q (so that its x-coordinate is smaller than that
of Q), p1 q1 is the negative of |QR|. As P is still above Q, we continue to have
p2 q2 = |P R|. Therefore,
p2 q 2 |P R|
= .
p1 q 1 |QR|
Altogether, we have the following geometric interpretation of slope: Referring to the
right triangle 4P RQ formed by the vertical line through P and the horizontal line
through Q, where P , Q are points on the line L, we have:

the absolute value of the slope of the line L joining P and Q is the quotient
of the (length of the) vertical leg by the (length of the) horizontal leg of
4P RQ.

321
Precisely,

|P R|

if L slants this way /
|QR|




slope of line L =
|P R|
if L slants this way \




|QR|

We see as before that, if the slope of a line is positive, then bigger slope means
the line is more vertical; if the slope is negative, then the more negative the slope
is, the more vertical the line is, as shown.

Y Y
bigger slope
J
J
J
J
 J
   biggerPslope J

   PP J
PP J
  PP
J PP
X X
O OJ
PP

There is one special case of the concept of slope that can be easily understood.
Suppose a line L joining two distinct points P and Q is horizontal. Then P and
Q have equal y-coordinates so that q2 = p2 and the slope of L, according to the
preceding definition, must be 0. Conversely, if a non-vertical line L joining P and
q p
Q has 0 slope, then in the notation of the preceding definition, q2 p2 = 0, which
1 1
implies q2 = p2 , i.e., p2 = q2 . Thus P and Q have the same y-coordinate and, by
the definition of the y-coordinate of a point, the horizontal line L1 passing through P
and the horizontal line L2 passing through Q would both intersect the y-axis at the
point (0, p2 ). Since there is only one horizontal line passing through (0, p2 ), we see
that L1 = L2 , and P and Q lie on the same horizontal line. In short,

a non-vertical line has 0 slope if and only if it is horizontal.

Now we come to the critical question: is this definition of slope well-defined?


By this we mean: does it make any sense? In greater detail, there are two issues

322
concerning this definition that should give us pause. The first is whether the quotient
q2 p2
q1 p1 might be dividing by 0. The second is that the definition of slope is for the line
p2
L and not for the specific points P and Q chosen on L, and yet the quotient qq12 p 1
is
asserted to be the slope of L and not just the slope of P and Q. If A = (a1 , a2 )
and B = (b1 , b2 ) are two other points on L, and if we use A and B instead of P and
Q to define the slope of L, we would get that the slope of L is
b 2 a2
b 1 a1
But do we know that
q 2 p2 b 2 a2
= ?
q1 p1 b 1 a1

The first question is easily answered: since L is not vertical, distinct points on it
cannot have the same x-coordinate. In particular, q1 6= p1 and therefore q1 p1 6= 0.
The answer to the second question requires a bit more work, but it is nothing we
havent seen before. Consider the following picture where L is assumed to slant this
way /:

Y
L

Pr 





Qr  R



Ar


 X
 O
Br 
 C


The vertical lines AC and P R are parallel (cf. problem 5 of Exercises 4.1), as are
the horizontal lines BC and QR. By Theorem G18 of Chapter 5, the angles 6 QP R
and 6 BAC have the same degrees, as do 6 ABC and 6 P QR. Moreover, 6 P RQ and
6 ACB are both right angles. Thus the triangles ABC and P QR are similar for reason

of AA (Theorem G22 of Chapter 5). By Theorem G20 of Chapter 5,


|P R| |QR|
=
|AC| |BC|

323
which, on account of the cross-multiplication algorithm for complex fractions (item
(b) in 6 of Chapter 1), is equivalent to
|P R| |AC|
=
|QR| |BC|
The by-now familiar reasoning shows that |P R| = p2 q2 and |QR| = p1 q1 , so that
|P R| p2 q2 (q2 p2 )
= =
|QR| p1 q1 (q1 p1 )
Similarly,
|AC| b 2 a2
=
|BC| b 1 a1
We therefore get
q2 p 2 b2 a2
=
q1 p1 b1 a1
If the points P , Q, A, B are positioned differently or if the line L slants this way
\, the argument is the same (cf. the geometric interpretation of the slope above)
except that we need to make sure in each case that the minus signs that arise (e.g.,
q1 p1 = |QR|) get canceled. We have thus shown that the definition of slope is
well-defined.

In school textbooks and standard professional development materials, the defini-


tion of the slope of a line is usually defective in that the independence of the definition
from the two points chosen on the line (P and Q above) is not mentioned explicitly,
much less proven. As this independence is critical to understanding the graph of a
linear equation, a logical consequence of such a defective definition is that students do
not understand the relationship between a linear equation and its graph.

The legitimacy of the definition of slope tells us that given a line, we can calculate
its slope by choosing the two points on it that best suit our purpose. This is a useful
thing to keep in mind.

With the availability of the concept of slope, we can now give a new proof of the
Theorem. We need the following key fact about slope.

324
Lemma 2 (Key Lemma) If two lines pass through the same point and have the
same slope, then they are the same line.

Proof Let the lines be L and L0 , let their slopes be equal, and let them both
pass through the point P . A priori, they are distinct lines and will therefore be
schematically represented as such. Our task is to show that they are in fact the same.
In the following, we only take up the case of positive slope. The case of negative slope
can be handled the same way.

Y
L0
"
0  "

Q " L
r"

"r

" Q
 "
 "
"

"
"
"
"
P "
r
" 
"

" R0
"
"
"
"
X
O

Take an arbitrary point Q0 on L0 , and form right triangle 4P Q0 R0 so that the


horizontal line through P and the vertical line through Q0 intersect at the point R0 .
Let the vertical line through Q0 intersect L at Q. Recall that we are trying to show
the lines L and L0 coincide, and this would be true as soon as we can show that Q
and Q0 coincide because there is only one line passing through two given points. We
shall make use of the fact that the slope of a line can be computed using any two
points on the line. The equality of the slopes of L0 and L is now expressed as:

|Q0 R0 | |QR0 |
=
|P R0 | |P R0 |

Since the denominators of these two quotients are equal, the numerators must be
equal as well, i.e., |Q0 R0 | = |QR0 | (for example, multiply both sides by the number

325
|P R0 |). Therefore Q = Q0 and L and L0 coincide, as claimed.

We are now in a position to give a short, alternate proof of the Theorem.


The proof is not essentially different from the original proof, but with the concept
of slope available, we can now rephrase the latter so that the consideration of slope
assumes its natural place as the focal point of the argument. As before, let distinct
points P = (p1 , p2 ) and Q = (q1 , q2 ) be chosen on the graph G of y = mx + k. Let
the line LP Q joining P and Q be denoted more simply by L. We are going to prove
G = L, thereby proving that the graph of y = mx + k is a line. First: G L. Given
U = (u1 , u2 ) G, we must prove U L. We do so by proving that the line L0 = LP U
which joins P and U coincides with L. According to the Key Lemma, all we have to
do is to prove that L and L0 have the same slope and pass through the same point.
They clearly pass through the same point P . As to their slopes, we have that the
slope of L is
q 2 p2 (mq1 + k) (mp1 + k)
= =m
q 1 p1 q1 p1
and the slope of L0 is
u 2 p2 (mu1 + k) (mp1 + k)
= =m
u 1 p1 u1 p1
So L0 = L and therefore U L. This proves G L.
Next: L G. Let U = (u1 , u2 ) L and we have to prove that U G, i.e.,
u2 = mu1 + k. The slope of L computed from P and Q is
q2 p 2
q1 p 1
But from q2 = mq1 + k and p2 = mp1 + k, we get
q2 p2 m(q1 p1 ) + (k k)
= =m
q1 p 1 q1 p1
The slope of L is therefore equal to m. Now if we compute the slope of L using P
and U , then we get
u 2 p2
=m
u 1 p1
This means u2 = mu1 +(p2 mp1 ) = mu1 +k, the last equality because of p2 = mp1 +k.
So u2 = mu1 + k and L G. The proof that the graph of y = mx + k is a line is

326
complete.

It remains to prove the second part of the theorem. It suffices to prove that every
non-vertical line L is the graph of some equation y = mx + k. Let P = (p1 , p2 ) and
Q = (q1 , q2 ) be distinct points in L. Let m be the slope of L, then
q2 p2
m=
q1 p1
Take any point (x, y) on L not equal to P , and we can compute the slope of L using
(x, y) and P :
y p2
=m
x p1
This implies y = mx + (p2 mp1 ). Defining

k = p2 mp1

we see that all the points (x, y) on L not equal to P satisfy y = mx + k. Let G
be the graph of y = mx + k. We have just seen that G is a line. We now prove
that L = G. We already have Q G (because Q 6= P , and so Q lies in G). We now
check that also P G, i.e., p2 = mp1 + k: this is because, by the definition of k, we
have mp1 + k = mp1 + (p2 mp1 ) = p2 . Therefore G passes through both P and Q,
which are in L. The two lines L and G are therefore the same line. The proof of the
theorem is complete.

A student who masters this proof will never have to memorize the different forms
of the equation of a line. They will be able to derive the needed equation in a few
minutes time, or at worst, they can memorize those equations with much greater ease
because they now possess a mental framework to put those equations in the proper
place in their memory bank. We illustrate with some simple examples. Recall that
in the course of the preceding proofs, we have made the following observations:

() The equation of a non-vertical line L is of the form y = mx + k,


where m is the slope of L and k is a number. (See proof of second part of
theorem.)
() Let the points P = (p1 , p2 ) and Q = (q1 , q2 ) satisfy the equation
y = mx + k, where m and k are constants. Then the graph of y = mx + k

327
is the line joining P to Q. (See proof of first part of theorem.)

Example 1 The x-coordinate of the point at which a line L intersects the x-axis
is called the x-intercept of L; sometimes the point of intersection is itself called the
x-intercept of L. Similarly, the y-coordinate of the point at which a line L intersects
the y-axis is called the y-intercept of the line; sometimes the point of intersection
k
is itself called the y-intercept of L. Since the points ( m , 0) and (0, k) satisfy the
k
equation y = mx + k, and since the graph of y = mx + k is the line joining ( m , 0)
and (0, k) (by ()), we see that the x-intercept and the y-intercept of the graph of
k
y = mx + k are m and k, respectively.

Example 2 What is the equation of the line which passes through (1, 3) and
( 12 , 4)
?

Call this line `. The slope of ` computed using the points (1, 3) and ( 21 , 4) is
43 2
1 = ,
2
(1) 3

so the equation of ` has the form y = 23 x + k (see ()). Since the point (1, 3) lies
on `, we have 3 = 23 (1) + k. Thus k = 11 3
and the equation of ` is y = 23 x + 11
3
.
(Note that if we use the other point ( 12 , 4) to evaluate k in y = 23 x + k, then we
get 4 = 23 12 + k, so that
1 11
k =4 =
3 3
as before.)
A slightly different method to obtain the equation of ` is the following. We begin
by computing the slope of the line as before, obtaining 23 . Now we exploit the fact
that the slope of ` can be computed equally well using an arbitrary point (x0 , y0 ) (not
equal to (1, 3)) and (1, 3), so that
y0 3 2
=
x0 (1) 3

which is equivalent to y0 3 = 23 (x0 + 1). Notice that in the latter form, the equality
continues to hold even when (x0 , y0 ) = (1, 3). Therefore rewriting y0 3 = 32 (x0 +1)

328
as
2 11
y0 = x 0 +
3 3
then what we have proved is that any point (x0 , y0 ) on ` satisfies the linear equation
in two variables y = 23 x + 11 3
. The latter is therefore the equation of `. Again, if we
1
use ( 2 , 4) in place of (1, 3), then we see that every point (x, y) of ` must satisfy

2 1
 
y4= x ,
3 2
and this (of course) simplifies to y = 23 x + 11
3
as before.

Example 3 In general, the equation of the line joining two given points (p1 , p2 )
and (q1 , q2 ) is
y p2 = m(x p1 ),
p2 q2
where m is the slope of the line, m = p1 q1
. Equivalently, the equation is

y q2 = m(x q1 ).

The reasoning is the same.

Exercises 6.3

1. Prove Lemma 1.

2. Explain directly to an eighth grader why the slope of the line defined by 2x5y = 7
is 25 by making use of only the definition of slope.

3. Solve: (a) 4bx + 13 = 2x + 26b, where b is a number not equal to 12 . Simplify


your answer. (b) 52 ax 17 = 31 ax 15
2
, where a is a nonzero number.

4. (a) Let ` be the line with slope m passing through ( 12 , 43 ). For which value of m
would ` pass through ( 53 , 31 )? (b) Let ` be the line joining ( 32 , 4) and ( 54 , q), where
q is some number. For what value of q would ` pass through (2, 3)?

329
5. Does the line joining (3, 2) and (6, 2) contain the point (9, 6)? Explain it two
different ways.

6. Let a, b be positive numbers. Can the three points (a, b), (2a, b + 2), (a3 , b 1)
be collinear? Explain.

7. Let A be the point with coordinates (a1 , a2 ) in the (coordinate) plane, and let the
translation along the vector OA (O being the origin) be denoted by T . (a) Prove
that T (x, y) = (x + a1 , y + a2 ) for all (x, y). (b) If L is the vertical line defined by
x = 72 , what is the equation of T (L) ? (c) If L is the horizontal line defined by
y = 51, what is the equation of T (L) ? (d) If L is the line defined by 2x 3y = 1,
what is the equation of T (L) ?

8. (a) Let L be the vertical line x = c. Prove that the reflection with respect to L is
given by the transformation R so that R(x, y) = (2c x, y) for all (x, y). (b) Formu-
late the corresponding statement for reflection with respect to a horizontal line y = d.

9. Let be the rotation of 180 with respect to the origin O. Prove that (x, y) =
(x, y) for all (x, y).

10. (a) What is the equation of the line joining the two points (X, Y ) and (Z, W )?
(b) What is the equation of the line whose slope is A and it passes through the point
(0, B) ?

11. Find the equation of the line passing through (c, c3 ) and (d, d3 ), where c and d
are numbers so that c 6= 1, d 6= 1, and c 6= d. Simplify your answer.

12. For large x, e.g., x 106 , which of the graphs of the following two equations is
above the other? y = 10x 5, 000, 000 and y = x + 1, 500, 000.

4 Parallelism and perpendicularity

In this section, we show how the parallelism and perpendicularity of lines can be

330
characterized in terms of the slope. We call special attention to the careful proof of
the characterization of perpendicularity (Theorem 2).

Theorem 1 Two distinct non-vertical lines are parallel they have the same
slope.

Proof If one of the lines `1 and `2 is horizontal, this assertion is easily disposed
of on account of the characterization of horizontal lines as those with zero slope (see
the discussion of horizontal lines after the definition of slope in the preceding section).
Assume then that neither is horizontal.
First, suppose `1 k `2 , and we will prove they have the same slope. For clarity,
we will assume that `1 and `2 have positive slope, but the case of negative slope can
be handled in exactly the same way. Take a point P on `1 and let a vertical line
through P intersect `2 at Q. (This vertical line must intersect `2 because the latter
is not vertical.) Since the lines are distinct, P 6= Q. Go along the ray RP Q from P
to Q and stop at some point U so that |P Q| = |QU |. From Q, draw a horizontal line
which meets `1 at S, and from U also draw a horizontal line which meets `2 at T .
The following gives one representation of this situation.
Y  `1

P
  `2


 
 
 
 
Q
 S


 
 
 


 T U



O X

Because 4P SQ and 4QT U are right triangles with legs parallel to the coordinate
axes, the slopes of `1 and `2 are

|P Q| |QU |
and
|SQ| |T U |

331
respectively. We have to show that these two numbers are equal. Since we already
know |P Q| = |QU |, it suffices to prove |SQ| = |T U |. We do this by showing that
4P SQ and 4QT U are congruent, which then immediately yields the desired equality
|SQ| = |T U |, because the corresponding sides of congruent triangles have the same
length (Theorem G5 in 4 of Chapter 4).
To prove the congruence, we will make use of the ASA criterion of congruence
(Theorem G8 in 4 of Chapter 4). We have |P Q| = |QU | by construction. Also
|6 P QS| = |6 QU T | = 90 , and 6 SP Q and 6 T QU have the same degrees because
they are corresponding angles of the parallel lines `1 and `2 (Theorem G18 in 2 of
Chapter 5). Hence triangles SP Q and T QU are congruent, from which we conclude
that |SQ| = |T U |. This then completes the proof that `1 and `2 have the same slope.

Before proving the converse, we should ask what strategy we might use.
As always, what we can do depends on what tools are available. Up to this
point, what tools (theorems) are at our disposal that would guarantee that
two lines are parallel? Basically only one: Theorem G18 in 2 of Chapter
5, which says that if the corresponding angles (or alternate interior angles)
of a transversal with respect to a pair of lines are equal, then the lines are
parallel. The above picture clearly shows that proving the equality of 6 SP Q
and 6 T QU would be our best bet. It then follows that we would try to prove
the congruence of triangles P SQ and QT U to achieve our goal.

Conversely, suppose two distinct, non-vertical lines `1 and `2 have the same slope,
and we have to show that they are parallel. We are assuming they are not horizontal,
so we may perform the same construction as before to get |P Q| = |QU | and right
triangles 4P SQ and 4QT U . We are going to prove that the triangles are congruent
by using the SAS criterion for congruence (Theorem G7 in 4 of Chapter 4). We
already have right angles 6 P QS and 6 QU T . We also have the equality of one pair
of sides, |P Q| = |QU |, by construction. To get the equality of the other pair of sides,
we use the hypothesis on the equality of the slopes of `1 and `2 :
|P Q| |QU |
=
|SQ| |T U |
Since |P Q| = |QU |, therefore |SQ| = |T U |. Hence the triangles 4P SQ and 4QT U
are congruent, and consequently, the corresponding angles 6 SP Q and 6 T QU are equal

332
(Theorem G5 in 4 of Chapter 4). This implies `1 k `2 because their corresponding
angles relative to the transversal P U are equal (Theorem G18 in 2 of Chapter 5).
The proof is complete.

Remark We could have rephrased the proof so that the concept of congruence
is replaced by the concept of similarity. For then, we could have let U be any point
on the ray RP Q and not finesse it so that |P Q| = |QU |, and the whole argument
would still be valid provided we replace the SAS criterion for congruence by the SAS
criterion for similarity (Theorem G21 in 3 of Chapter 5). Such a proof would be a
trifle bit more natural. But in terms of teaching in the school classroom, especially
at the beginning of teaching geometry, congruence is the more intuitive and the more
elementary concept and students would find it more accessible than similarity. To the
extent possible, we would therefore exploit congruence. This accounts for the above
proof.

The next theorem characterizes perpendicularity.

Theorem 2 Two distinct, non-vertical lines are perpendicular the product


of their slopes is 1.

We begin with a general observation about why the slopes of perpendicular lines
that pass through the origin of a coordinate system must have opposite signs (i.e,
one is negative and the other is positive). The four right angles formed by the positive
and negative coordinate axes, with vertex at the origin O, are usually called the four
quadrants of the coordinate system and are labeled I, II, III and IV, as shown.

II I
q
O
III IV

Restricting the discussion to non-vertical and non-horizontal lines passing through

333
the origin, we see that (outside of the origin) they must lie completely inside either
quadrants I and III, or quadrants II and IV, as shown.

 T
 T
 T
 T
O OT
 T
 T
 T

We are going to give a simple but decisive explanation that the lines in the left
picture, those that lie completely inside quadrants I and III, are the ones with positive
slope, and those in the right picture, those that lie completely inside quadrants II and
IV, are the ones with negative slope. Take a point (a, b) on such a line, then using
the particular choice of (a, b) and (0, 0) to compute its slope, we see that the slope
is ab . Since b and a have the same sign in quadrants I and III, and opposite signs in
quadrants II and IV, it follows that the slope is positive for lines lying in quadrants I
and III, and negative for lines lying in quadrants II and IV. As a consequence, if we
have two rays issuing from O with positive slopes (i.e., the lines containing them have
positive slopes), then they lie in either quadrant I or III and therefore the degree of
the angle between these rays is either greater than 90 , or less than 90 . Similarly for
two lines with negative slopes. It follows that two lines whose slopes have the same
sign can never be perpendicular to each other. Hence:

If two lines passing through O are perpendicular, their slopes have opposite
signs, i.e., one is positive and the other is negative.

Side remark We have been talking about lines that slant this way / or that way
\ informally without being precise. Now precision is finally possible. We say a line
slants this way / if the line passing through the origin and parallel to it lies in
quadrants I and III , and similarly, we say a line slants this way \ if the line
passing through the origin and parallel to it lies in quadrants II and IV. It follows
from Theorem 1 that a non-vertical and non-horizontal line slanting this way / has
positive slope, and one slanting this way \ has negative slope.

334
We can now give the Proof of Theorem 2. First suppose `1 and `2 are per-
pendicular lines. Let lines L1 and L2 be lines passing through the origin so that
`1 k L1 and `2 k L2 . By Theorem 1, `1 and L1 have the same slope. Same for `2
and L2 . Therefore it suffices to prove that the slopes of L1 and L2 have a product
equal to 1. We break this up into two smaller steps. By hypothesis, both L1 and
L2 are non-vertical and non-horizontal. By the observation immediately preceding
this proof, we already know that the product is a negative number, so we dont need
to pay any attention to the sign of the product. Hence, it suffices to prove that the
product of the absolute values of the slopes of L1 and L2 is equal to 1.
Observe that, because `1 `2 , L1 L2 (exercise). It follows from the preceding
discussion that we may assume that L2 lies in quadrants I and III and L1 lies in
quadrants II and IV. Because the rotation % of 90 degrees around the origin O would
carry L2 to L1 , it is natural to think of using the congruence transformation % to
carry out the proof. Let P2 be some point on the line L2 and in quadrant I, and let
P1 = %(P2 ). Then P1 L1 . Furthermore, let the vertical line from P2 meet the x-axis
at Q2 , then %(Q2 ), to be denoted by Q1 , lies on the y-axis. As % is a congruence, we
have:
|P1 Q1 | = |P2 Q2 | and |OQ1 | = |OQ2 |

L1
J
J
P1J Q1 L
J
P  2
J 2 

J 
J 

J 
J 

J
O Q2
By the way the coordinate system is set up (see the discussion at the beginning of
3), we know that |P1 Q1 | and |OQ1 | are the absolute values of the x and y-coordinates
of the point P1 . Thus computing the slope of L1 using the points P1 and O, we see
that the absolute value of this slope is |OQ1 |/|P1 Q1 |. The absolute value of the slope
of L2 is of course |P2 Q2 |/|OQ2 |. Thus, taking into account the previous equalities,
the product of the absolute values of the slopes of L1 and L2 is
|OQ1 | |P2 Q2 | |OQ2 | |P2 Q2 |
= = 1
|P1 Q1 | |OQ2 | |P2 Q2 | |OQ2 |

335
This completes the first part of the proof of Theorem 2.

How shall we approach the proof of the converse? We want to prove, if


the product of the slopes of L1 and L2 is 1, that L1 L2 . In other
words, if P2 and P1 are two random points on L2 and L1 , respectively, we
want to prove |6 P1 OP2 | = 90 .

L1
J L2
J
P1J Q1 P2 
J 

J 
J 

J s 
J 
J s
O Q2

Since we are already given that 6 Q1 OQ2 is a right angle and |6 Q1 OQ2 | =
|6 P2 OQ2 | + |6 Q1 OP2 |, it means that if we can show

|6 P1 OQ1 | = |6 P2 OQ2 |

then we would have

|6 P1 OP2 | = |6 P1 OQ1 | + |6 Q1 OP2 | = |6 P2 OQ2 | + |6 Q1 OP2 | = 90

So how can we show |6 P1 OQ1 | = |6 P2 OQ2 |? We resort to the standard


reasoning of identifying these angles as corresponding parts of similar or
congruent triangles.

Next we prove the converse. Suppose we have two lines `1 and `2 so that the
product of their slopes is 1. We must prove that `1 `2 . Again we let L1 and L2 be
lines passing through the origin and parallel to `1 and `2 respectively, and it suffices
to prove that L1 L2 (see Exercises 6.4). By Theorem 1, the product of the slopes
of L1 and L2 is also 1. In particular, since the slopes of L1 and L2 have opposite

336
signs, we may assume that L2 lies in quadrants I and III, and L1 lies in quadrants II
and IV. Let P2 be some point on L2 lying in quadrant I. Drop a vertical line from P2
so that it meets the x-axis at Q2 (see preceding picture). Let P1 be some point on L1
lying in quadrant II, and let a horizontal line from P1 meet the y-axis at Q1 . If we
can prove that 4P1 OQ1 4P2 OQ2 , then we would have |6 P1 OQ1 | = |6 P2 OQ2 |
(Theorem G20 in 3 of Chapter 5) so that

|6 P1 OP2 | = |6 P1 OQ1 | + |6 Q1 OP2 |


= |6 P2 OQ2 | + |6 Q1 OP2 |
= |6 Q1 OQ2 | = 90

In other words, L1 L2 , and therefore `1 `2 .


It remains to prove that 4P1 OQ1 4P2 OQ2 . Since the product of the slopes
of L1 and L2 is 1, the product of the absolute values of slopes of L1 and L2 is equal
to 1. By a reasoning that is familiar to us by now, this means
|P2 Q2 | |OQ1 |
=1
|OQ2 | |P1 Q1 |
Multiplying both sides of the equality by |OQ2 |/|OQ1 |, we get
|P2 Q2 | |OQ2 |
=
|P1 Q1 | |OQ1 |
Since 6 P2 Q2 O and 6 P1 Q1 O are right angles, the SAS criterion for similarity (Theo-
rem G21 in 3 of Chapter 5) implies that 4P1 OQ1 4P2 OQ2 , as desired. The proof
of Theorem 2 is complete.

Remark It is possible to prove this theorem using only congruence without


invoking similarity. We will leave this as an exercise. It will be seen that the present
proof using similarity is more natural.

Exercises 6.4

1. Prove the following assertion which was used to prove Theorem 2: Let L and
be two perpendicular lines. If L0 and 0 are lines so that L k L0 and k 0 , then also

337
L 0 0 .

2. Let O be the origin of a coordinate system in R2 , and let r be a positive number.


Prove that the transformation T of R2 which assigns to each point (a, b) the point
(ra, rb) is exactly the dilation with center O and scale factor r. (Caution: This is a
slippery proof.)

3. What is the equation of the line which is perpendicular to the graph of ax+by = c,
where a, b, c are constants, a 6= 0, and b 6= 0, and which passes through the point
( 21 , 3)?

4. Given a triangle with vertices at (1, 1), (5, 1) and (7 12 , 4). Does the point (2, 16 )
lie on the line passing through the vertex (1, 1) and perpendicular to the opposite
side of the triangle?

5. (a) Let L be the line joining O (the origin (0, 0)) to the point (a, b), and let L be
the line joining O to the point (a0 , b0 ). Prove: L L0 if and only if aa0 +bb0 = 0. (You
recognize the latter as the dot product of the vectors (a, b) and (a0 , b0 ) that you came
across in calculus.) (b) Let L and L0 be the lines defined by the equations ax+by = c
and a0 x + b0 y = c0 , respectively. Prove that L L0 if and only if aa0 + bb0 = 0.

6. Write out a self-contained proof, using only congruent triangles but without using
similar triangles, of the second half of Theorem 2, i.e., if the product of the slopes of
two lines is 1, then the lines are perpendicular.

5 Simultaneous linear equations

Suppose we are given constants a, b, . . . f , and we want to know if there are


numbers x and y so that they are solutions of both of the following linear equations:
(
ax + by = e
cx + dy = f
Such a pair of linear equations is variously called a linear system, or a system of
linear equations, or sometimes, simultaneous (linear) equations in the numbers

338
x and y. To be precise, one would have to refer to such a pair of equations as a
linear system of two equations in two unknowns or two variables, where the
unknowns or variables refer to the symbols x and y. An ordered pair (x0 , y0 ) is
called a solution of the system if it is a solution of both equations. To solve the
system is to find all the solutions of the system. Sometimes we also call the collection
of all these (x0 , y0 )s the solution set of the system.
For example, consider (
x + y = 2
3x + 3y = 6
This linear system has as solution all ordered pairs of the form (t, 2 t), where t is
an arbitrary number. In this case, the solution set is an infinite collection of points.
On the other hand, the following system
(
x + y = 2
3x + 3y = 5

clearly has no solution, because if (x0 , y0 ) is a solution, the fact that it is a solution of
the first equation means x0 + y0 = 2. By multiplying both sides of this equality by
3, we get 3x0 + 3y0 = 6. But since (x0 , y0 ) is also a solution of the second equation,
we also have 3x0 + 3y0 = 5, and therefore 5 = 6, a contradiction. Therefore, the
solution set in this case is the empty set, the set with no element.
In between the two preceding extreme cases, most linear systems have exactly
one pair (x0 , y0 ) as a solution. We will explain what most means and why this is
true by way of geometry.

Let `1 , `2 be the lines which are the graphs of the equations ax + by = e and
cx + dy = f , respectively. Suppose (x0 , y0 ) is a solution of the linear system
(
ax + by = e
cx + dy = f

In particular, this means we are assuming that there is a solution of the system. We
wish to interpret this solution geometrically. Since (x0 , y0 ) is a solution of ax+by = e,
the point (x0 , y0 ) lies on `1 , by the definition of the graph of an equation. For the same
reason, (x0 , y0 ) lies on `2 as well. Therefore (x0 , y0 ) lies on both `1 and `2 , and therefore
lies in the intersection of `1 and `2 . (We have to be careful not to assume that the

339
intersection of `1 and `2 is a point, because we cannot a priori exclude the possibility
that `1 = `2 , in which case the intersection of `1 and `2 is the line itself.) Conversely,
suppose (x0 , y0 ) lies in the intersection of `1 and `2 , then it must be a solution of the
system (
ax + by = e
cx + dy = f
because, (x0 , y0 ) being on `1 means ax0 + by0 = e, and (x0 , y0 ) being on `2 means
cx0 +dy0 = f . We have therefore proved the following basic fact relating the solutions
of a linear system to the graphs of the equations in the system.

Theorem 1 A simultaneous system of linear equations has a solution (x0 , y0 )


the point (x0 , y0 ) lies in the intersection of the (linear) graphs of the equations
in the linear system.

Theorem 1 gives the precise reasoning of why the solution(s) of a linear system of
two linear equations in two unknowns correspond to the intersection of the two lines
defined by the linear equations of the system. This is the reason why one can get
the solution of a system of simultaneous linear equations by graphing the equations.
Unfortunately, such a correspondence is usually decreed by fiat in standard school
textbooks with no explanation whatsoever, probably because the precise definition of
the graph of an equation is rarely given or, if given, is not put to use. This is therefore
a reminder that you make definitions a crucial part of your teaching. In particular,
please do not forget to explain why the solution of a linear system can be obtained
from the intersection of the lines (i.e., the graphs of the equations in the linear system).

It is worth noting that Theorem 1 shares something in common with a coordinate


system: they both provide a dictionary that mediates two disparate sets of informa-
tion: the algebraic information about solutions of a linear system, and the geometric
information about intersections of lines. In this particular case, we know all about
the intersections of lines (see (L2) and (L3) in 1 of Chapter 4), and will therefore
use this knowledge to shed light on the solutions of linear systems. We know that
there are exactly three mutually exclusive possibilities for two lines in the plane: the
lines are either

340
identical, or
parallel, or
distinct but not parallel.

Correspondingly, the lines

intersect at an infinite number of points, or


do not intersect, or
intersect exactly at one point.

From Theorem 1, we therefore conclude the following.

Corollary Given a linear system of two equations in two unknowns. Let the
graphs of the linear equations be `1 and `2 . Then the linear system either

has an infinite number of solutions (corresponding to `1 = `2 ), or


has no solution (corresponding to `1 k `2 ), or
has a unique solution (corresponding to `1 6= `2 but `1 6k `2 ).

We can now explain what is meant by most linear systems have a unique solu-
tion. Given two lines, what are the chances that they are either identical or parallel?
This is in fact a precise mathematical question that can be answered completely: zero.
To explain this answer, one must do some advanced mathematics. Nevertheless, one
can provide an intuitive understanding of the situation by fixing one of the lines, say
`1 , and ask what the chances are that the other line `2 either coincides with `1 or is
parallel to `1 . Clearly we can ignore the possibility of `2 actually equal to `1 because
this almost never happens. What about `2 k `1 ? Look at it this way: restrict `2 to
be a line passing through a fixed point P not lying in `1 , then according to the Par-
allel Postulate, there is only one chance that `2 k `1 , whereas there are an infinite
number of possibilities for `2 not to be parallel to `1 . Since this is true for any point
P not lying on `1 , it is intuitively clear that, almost always, `2 will be a line distinct
from `1 and not parallel to `1 . So by the Corollary, a linear system will almost always

341
have a unique solution.

We now take this Corollary of Theorem 1 to the next level: what are the algebraic
properties of the linear system that would lead to an infinite number of solutions,
no solution, and a unique solution? We can literally follow the prescription of the
Corollary and just doggedly investigate the algebraic properties of the linear equa-
tions that correspond to `1 = `2 , `1 k `2 , and `1 6= `2 but `1 6k `2 . This would lead to
a depressing case-by-case argument with a thicket of details that are ultimately not
enlightening.

Here is one way this argument can be carried out.


Case 1 The graphs `1 and `2 of ax + by = e and cx + dy = f , respectively, coincide.
If they are vertical, then b = d = 0 and the equations become x = e/a and x = f /c.
Their graphs are identical ae = fc . Therefore this case is equivalent to

e f
b=d=0 and =
a c
If they are not vertical, then both b 6= 0 and d 6= 0 and we may rewrite the system as

y = mx + k
y = m0 x + k 0

where m = ab , k = eb , m0 = dc , and k 0 = fd . Then `1 and `2 being identical


means they have the same slope and therefore m = m0 . The equations of `1 and `2
become y = mx + k and y = mx + k 0 , respectively. If k 6= k 0 , then (0, k) would be a
point lying on `1 but not `2 . Thus k = k 0 . Therefore in this case,
a c e f
= and =
b d b d

Case 2 The graphs `1 and `2 of ax + by = e and cx + dy = f , respectively, are parallel.


Then they are distinct and either are both vertical, or are both non-vertical and have
the same slope (Theorem 1 of 4). In the former case, being vertical means b = d = 0,
and since they are distinct, the equations ax = e and cx = f have distinct graphs.
Hence, ae 6= fc . To summarize: `1 and `2 being vertical and parallel is equivalent to

e f
b = d = 0, and 6=
a c
In the latter case, since `1 and `2 are non-vertical, b 6= 0 and d 6= 0 and we can write
the system as 
y = mx + k
y = m0 x + k 0

342
where m = ab , k = eb , m0 = dc , and finally, k 0 = fd . Since `1 and `2 have the same
slope, m = m0 , so that ab = dc and therefore by the cross-multiplication algorithm,
ad = bc. Moreover, the graphs of y = mx + k and y = mx + k 0 (recall m = m0 ) are
distinct k 6= k 0 , i.e., eb 6= fd . To summarize, `1 and `2 being non-vertical, parallel,
and distinct is equivalent to:
e f
ad bc = 0 and 6=
b d

Case 3 The graphs `1 and `2 of ax + by = e and cx + dy = f , respectively, are not


parallel and not the same line. Assume that both lines are not vertical. Then as in
Case 1, b 6= 0 and d 6= 0, so that we may rewrite the system as

y = mx + k
y = m0 x + k 0

where m = ab , k = eb , m0 = dc , and k 0 = fd . By Theorem 1 of 4, `1 and `2


not being parallel is equivalent to m 6= m0 , and therefore ab 6= dc and therefore by the
cross-multiplication algorithm, ad 6= bc. Hence this case is equivalent to:
ad 6= bc

The situation is reminiscent of the proof of the Theorem in 3, where we first gave
a straightforward proof, and then pointed out only later that, had we realized the
importance of the concept of slope, we would have given a much shorter and more
conceptual proof using slope as the pivot. We are going to do the same here: we will
take advantage of the cumulative knowledge of the past and give a sophisticated al-
gebraic analysis of the preceding corollary (once again) from the perspective of slope.
The lesson that you can take from this experience is that, while there is value in go-
ing through clumsy but straightforward arguments on your own (there is no doubt of
that), it is equally true that one should learn from the wisdom of the past. Absorbing
what others have to offer is how we can grow intellectually. It is pointless to try to
discover everything yourself; it is impossible anyway.

We now begin the algebraic analysis of the Corollary. Let `1 , `2 be the lines which
are the graphs of the equations ax + by = e and cx + dy = f , respectively, in the
linear system: (
ax + by = e
cx + dy = f
To motivate what is to come, suppose `1 and `2 are not vertical, so that their slopes
are defined. We notice that there is one thing that distinguishes the first two cases

343
(`1 = `2 and `1 k `2 ) from the third (`1 6= `2 but `1 6k `2 ), namely, in the first two cases
`1 and `2 have the same slope (Theorem 1 of 4) whereas in the third case, `1 and `2
have different slopes. We now express this information about slope algebraically, as
follows. Since `1 and `2 are both non-vertical, we have b 6= 0 and d 6= 0 so that we
may rewrite the linear system as
a e
y = ( b ) x + b


y = ( ) x +
c f
d d
Thus the slope of `1 is b , and that of `2 is dc . The slopes are equal
a

a
b
= dc , which is equivalent to ad = bc (the generalized cross-multiplication algorithm
for rational quotients, item (b) in 5 of Chapter 2), which in turn is equivalent to
ad bc = 0. For the same reason, the slopes of `1 and `2 being different is equivalent
to ad bc 6= 0. Thus whether or not the two lines of the linear system have the
same slopes or different slopes is captured by the vanishing (i.e., equal to zero) or
non-vanishing of the number ad bc, respectively. Such considerations suggest that
the number ad bc is an important characteristic of the linear system. Indeed it is,
and it is called the determinant of the linear system, usually denoted by .65 The
theorem we want to prove is then the following.

Theorem 2 Given a linear system


(
ax + by = e
cx + dy = f
Let denote the determinant of the system, = ad bc. Then:
(i) If 6= 0, the linear system has a unique solution.
(ii) If = 0, the linear system has either an infinite number of solutions, or no
solution.

Before giving the proof, we observe that since the definition of the determinant
does not require that any of a, b, c and d be nonzero, the lines `1 and `2 that corre-
spond to the two equations in the system could therefore be vertical.

65
You may consider this discussion to be a review of the simplest case of linear algebra, namely,
the case of dimension 2.

344
Proof We handle the two cases (i) and (ii) separately.
Case (i): The determinant ad bc is nonzero. In this case, clearly not both
b and d are zero. If b = 0, then d 6= 0. Notation as above, we see that `1 is vertical
but `2 is not vertical. Therefore `1 intersects `2 at exactly one point, and the linear
system (according to the Corollary of Theorem 1) has a unique solution. Similarly, if
d = 0, then b 6= 0 and again the linear system has a unique solution. If both b and d
are nonzero, then the linear system can be rewritten as
a e
y = ( ) x +
b b


c
y = ( ) x +
f
d d
The slope of `1 is therefore ab and the slope of `2 is dc . Since ad bc 6= 0, ad 6= bc
and ab 6= dc by the cross-multiplication algorithm. Thus `1 and `2 have different slopes
and therefore (by Theorem 1 of 4) are not parallel to each other. It follows that `1
intersects `2 at exactly one point and the linear system again has a unique solution.
We have thus proved case (i).
Case (ii): The determinant ad bc is zero. We claim that in this case, either
both b and d are 0, or both b and d are not 0.
To prove the claim, it suffices to show that if b = 0 then d must be 0, and if
d = 0 then b = 0. Suppose b = 0. Then ad bc = 0 implies that ad = 0. But by
the definition66 of a linear equation of two variables, not both a and b can be 0 in
ax + by = e. So a 6= 0. It follows that ad = 0 implies d = 0. Similarly, if d = 0, then
also b = 0. The claim is proved.
We now examine the first possibility: b = d = 0. Then both a and c are nonzero,
by the definition of a linear equation of two variables. The linear system may therefore
be rewritten as e
x = a


x =
f
c
Clearly, these vertical lines are identical (and the system has an infinite number
of solutions) if e/a = f /c, and are parallel (and the system has no solution) if
66
This is another reminder that we must take definitions seriously. Since we have defined a linear
equation x + y = to be such that not both and are zero, it stands to reason that this part
of the definition will play a critical role sooner or later.

345
e/a 6= f /c.
Next we examine the second possibility: b 6= 0 and d 6= 0. The linear system may
therefore be re-written as

a e

y = ( ) x +
b b


c f
y = ( ) x +


d d
Thus `1 and `2 have slopes equal to a/b and c/d, respectively. Now ad bc = 0 by
hypothesis, so ad = bc and the cross-multiplication algorithm implies that a/b = c/d.
This implies `1 and `2 have the same slope. We have to decide if they are identical
or parallel. If e/b = f /d, then the equations are identical and therefore so are
their graphs (the linear system therefore has an infinite number of solutions), and
if e/b 6= f /d, then the lines are distinct (because, for example, (0, e/b) is a point
in `1 but not in `2 ) and are therefore parallel (the linear system therefore has no so-
lution). This completes the proof of Case (ii), and therewith, the proof of Theorem 2.

Remark From the proof itself, we see that the conclusion of Case (ii) can be
made very precise, namely,

If the determinant is 0, then either b = d = 0 or both b 6= 0 and d 6= 0.


In case b = d = 0, then the linear system has an infinite number of
solutions if e/a = f /c, and has no solution if e/a 6= f /c.
In case b 6= 0 and d 6= 0, then the linear system has an infinite number of
solutions if e/b = f /d and has no solution if e/b 6= f /d.

However, it is imperative that you dont try to memorize these conclusions. If you
understand the reasoning, then you can use it in each situation to draw the right
conclusion. We illustrate with some simple examples.
If we are given a linear system with b = d = 0, e.g.,

4
3x = 5
x = 15

346
then common sense dictates that you multiply the second equation by 3 to change
the system to
3x = 54

3
3x =

5

Direct inspection now shows that the linear system has no solution.
Suppose we are given

10.2x 13.6y = 11.5
94 x + 3y = 21
4

We note that 10.2 3 (13.6)( 94 ) = 0 as both products are equal to 30.6,


and so we are in the situation of ad bc = 0 but bd 6= 0. We know from the
preceding analysis that the linear system has either no solution or an infinite number
of solutions, depending on whether the lines defined by the equations are distinct
or identical, respectively. The simplest way to find out whether the lines defined by
these equations are identical or distinct is to rewrite both equations in the form of
1
y = mx + b and compare. Thus multiplying the first equation by 13.6 and the second
1
equation by 3 , we get
y = 10.2 x 11.5
13.6 13.6
3 7
y = 4
x 4

Notice that in so doing, we dont need to bother with checking whether 10.2 13.6
and 34
are equal or not as we know they must be (because the lines have the same slope!).
11.5
The only thing to compare is whether 13.6 and 74 are equal. Since the former is less
than 1 and the latter is greater than 1, they are obviously not equal. Hence the two
lines are distinct and this system has no solutions. Therefore the same is true of the
original system as well.

Up to this point, we have not talked about the explicit algebraic method of solv-
ing a linear system taught in school classrooms. You know all too well that it is the
method of elimination, or sometimes called the method of substitution. This
method is usually taught by rote in schools. What we do next is to subject this
method to a critical examination.

347
In the remainder of this section, we will explain why this method produces the
correct solution to any linear system with nonzero determinant and what it means
geometrically.

Let us first look at a simple case and then use the standard symbolic manipulations
taught in schools to get a solution. Consider
(
2x 3y = 1
3x + 2y = 1

Let us eliminate y. So from the second equation, we get


3 1
y = x
2 2
Substituting this expression of y into the first equation gives
3 1
 
2x 3 x =1
2 2
13
Simplifying (cf. 2), we get 2
x = 21 so that
1
x=
13
Substituting this value of x into y = 32 x 1
2
then leads to
5
y=
13

We pause to reflect on this method of solution. We have performed the whole


computation without knowing what x and y are beyond the fact that they are some
symbols. But in all the mathematics we learn up to this point, we have never talked
about computing with symbols. All we know are numbers and geometric figures.
So suppose x and y are numbers. But what numbers are they? Could x be 7 and y
be 2? Not really, because it is not true that
(
2(7) 3(2) = 1
3(7) + 2(2) = 1

Therefore the above computation could not possibly be valid for any two numbers x
and y, and would make sense only if this pair (x, y) is a solution of the linear system.

348
At this point, we must recall the basic protocol on the use of symbols: before we use
any symbol, we must know what it stands for. Therefore, let us start all over again
and do things in a way that makes sense.

Suppose (x0 , y0 ) is a solution of the system, i.e., we assume that for an ordered
pair of numbers (x0 , y0 ), we have
(
2x0 3y0 = 1
3x0 + 2y0 = 1

Then these are two equalities of numbers, and we can proceed to compute with them
in the usual way that we do arithmetic. From the second equation, we get
3 1
y 0 = x0
2 2
Substituting this value of y0 into the first equation gives
3 1
 
2x0 3 x0 =1
2 2
13
Solving this linear equation in x0 (as in 2), we get 2
x0 = 12 so that

1
x0 =
13
Substituting this value of x0 into y0 = 32 x0 1
2
then leads to

5
y0 =
13
Note that if we replace x0 by x and y0 by y, then this computation is formally
identical to the previous method of solution taught in the school classroom. The only
difference is that the second computation is the one that is mathematically valid, be-
cause it is nothing but a normal computation carried out with numbers.

To summarize, what we have proved is this:

() If (x0 , y0 ) is a solution of the given linear system, then


1 5
x0 = and y0 = .
13 13

349
1 5
Have we shown that ( 13 , 13 ) is a solution of the given linear system? No. For
that purpose, we need to prove the following assertion, which is in fact the converse
statement of ():
1 5
() If x0 = 13 and y0 = 13 , then (x0 , y0 ) is a solution of the given
linear system.

In other words, what these computations with the ordered pair (x0 , y0 ) lead to is the
statement (), which is not the desired conclusion (), but rather its converse. So in
what sense does the familiar method of elimination solve the linear system?

We can answer this question in two ways. The first answer is entirely pragmatic:
just check to see if it is true that
   
1 5
2 13 3 13 = 1
   
1 5
3 13 + 2 13 = 1

1 5
A routine computation then verifies that, indeed, the ordered pair ( 13 , 13 ) pro-
duced by the method of elimination is a solution.
Obviously, this pragmatic answer would be of little value if it is a singular occur-
rence that happens to furnish a solution for this linear system but for no other. Such
is not the case. We now give a self-contained and coherent account that shows that
the usual method of elimination is the procedural aspect of a mathematically valid
method of solution. In other words, the rote procedure may seem to make no sense,
but in fact it can be shown to make sense after all. Here is the general explanation.
Given a linear system:
(
ax + by = e (1)
cx + dy = f (2)

We assume that its determinant = ad bc 6= 0. If b = d = 0, then = 0. Thus


6= 0 implies that not both b and d are 0. Without loss of generality, we may assume
d 6= 0. Now suppose (x0 , y0 ) is a solution of the system (1)(2), so that
(
ax0 + by0 = e (3)
cx0 + dy0 = f (4)

350
Equation (4) implies y0 = dc x0 + fd . Substitute this value of y0 into (3) to get
!
c f
ax0 + b x0 + =e
d d

Multiplying both sides by d and simplifying (see 2), we obtain (adbc) x0 = debf ,
so that
de bf
x0 =

Substituting this value of x0 into the equation y0 = dc x0 + fd , we obtain
! !
c de bf f 1 cde + bcf + adf bcf
y0 = + =
d d d

and therefore
af ce
y0 =

We have now proved that if (x0 , y0 ) is a solution of the linear system (1)(2),
then
de bf af ce
(x0 , y0 ) = ( , ) (5)

Next, we show that the ordered pair in (5) is a solution of the linear system (1)-
(2) by a direct verification. This means that it satisfies both equations (1) and (2),
i.e.,
! !
de bf af ce
a +b = e (6)

! !
de bf af ce
c +d = f (7)

The computation is straightforward, and is in any case the kind that should be mas-
tered, so it is left as an exercise. This then completes our proof that the usual method
of elimination taught in the school classroom is basically sound, because we have just
supplied the reasoning that it is so.

We mentioned that there was a second answer to the question of why the method
of elimination solves the linear system. This answer is much shorter, but it is also

351
mathematically more sophisticated. It goes as follows. Again, we are given a linear
system (1)-(2) with the determinant not equal to 0. We make use of Theorem 2.
The latter tells us in advance that the system must have a unique solution (x0 , y0 ).
Thus reassured, all we need to do is to find out the exact values of x0 and y0 in terms
of a, b, . . . , f . We have, as before,
(
ax0 + by0 = e (3)
cx0 + dy0 = f (4)

Then exactly the same computation as above (i.e., the same procedure as the method
of elimination taught in schools) yields
de bf af ce
(x0 , y0 ) = ( , ) (5)

Since we know that (x0 , y0 ) is the unique solution of (1)-(2), (5) is the answer.

We wish to add three comments to round off this discussion of the solution of a
linear system. The first is perhaps the most obvious: if the second (i.e., the preceding)
explanation of the method of elimination, making use of Theorem 2, is so efficient and
so clear-cut, why did we bother with the long-winded first one? The answer is that in
other situations of this kind (e.g., in the solution of quadratic equations in Chapter
7), there may not be an analog of Theorem 2 to give an a priori guarantee that there
is a unique solution. Then one has to begin developing the mathematical sensitivity
which recognizes the difference between solving an equation and getting the specific
value of a solution when a solution is assumed to exist. The case of a linear system
is therefore a good testing ground for the development of such sensitivity.
A second comment is whether it is realistic to use the second explanation to teach
the solution of linear systems in an eighth grade classroom, which would require the
proof of something like Theorem 2. There is usually not one simplistic answer to a
pedagogical question, but in general terms, an abstract argument such as the proof
of Theorem 2 may not be appropriate for the average eighth grader. Nevertheless, it
is possible to convey the same basic idea while minimizing the needed abstraction.
We illustrate with the same linear system as in the example above:
(
2x 3y = 1
3x + 2y = 1

352
The explanation can then proceed as follows. Rewrite the system as
2

y = 3
x + 1

y = 23 x 1

We have to assume that students know the reasoning behind the basic Theorem 1
on the correspondence between the solutions of a linear system and the points of
intersections of the lines defined by the equations. With this understood, the two
lines defined by the two equations in the preceding system have different slopes (i.e.,
2
3
and 32 ) and are therefore not parallel to each other. Thus there is a unique point
of intersection (x0 , y0 ) of these lines, and this (x0 , y0 ) must be the unique solution
of the system. It remains therefore to find out the specific values of x0 and y0 . At
this point, the usual method of elimination can take over, and such an explanation
of how to solve this linear system would be entirely valid. Once this line of reasoning
is thoroughly understood, it would be safe, when dealing with other linear systems, to
dispense with the reasoning concerning the slopes and the intersection of non-parallel
lines and proceed directly to the usual method of elimination.

A final comment is that the method of elimination has a nice geometric interpre-
tation. For simplicity, we only deal with the case where both b and d are nonzero.
Let us first reformulate this method. Given a linear system as before:
(
ax + by = e (1)
cx + dy = f (2)
f
The method then calls for rewriting equation (2) as y = dc x + d and substituting
this value of y into equation (1) to obtain
!
c f
ax + b x + =e
d d

Transposing ax to the right and multiplying both sides by 1b , we get:

c f a e
x+ = x+ (8)
d d b b
Of course the method goes on to solve equation (8) for x, but getting to (8) is the
key step of the method of elimination.

353
We now give a different approach to (8) by rewriting the linear system (1)-(2) as:

y = ab x + e
b
(9)
f
y = dc x + d
(10)

Now if we equate the two right-hand sides, we would get exactly equation (8) again.
This means that the usual method of elimination can be re-interpreted as equating
the two expressions of x when the linear system is put in the form (9)-(10).
It is this process of equating the two expressions of x in (9)-(10) that we wish to
interpret geometrically. Recall the notation that `1 is the graph of the first equation
and `2 is the graph of the second. For a fixed number x, x arbitrary, we look at the
points on `1 and `2 that lie above x, in the sense that they lie on the vertical line
passing through the point (x, 0). According to equation (8), the y-coordinate of the
point on `1 that lies above x is ab x + eb . For the sake of clarity, let us denote it by
y1 , i.e.,
a e
y1 = x +
b b
Similarly, let the y-coordinate of the point on `2 lying above the same x be denoted
by y2 , i.e.,
c f
y2 = x +
d d
Furthermore, let (x0 , y0 ) be the solution of the linear system (1)-(2), as shown:

y1 q q
`1

`2
y2 q q
y0 r
(x0 , y0 )

q
O x0 x

From the picture, one can easily see that y1 and y2 coincide precisely when x = x0 .
Therefore, to look for the x-coordinate of the solution (x0 , y0 ), we look for the number

354
x so that y1 = y2 for this x, i.e., we look for the number x so that
a e c f
( ) x + = ( ) x +
b b d d
which is of course just equation (8). The geometric content of the method of elimina-
tion is therefore the search for this number x so that the corresponding y-coordinates
of the points on `1 and `2 lying above x coincide.

Exercises 6.5

1. Verify equations (6) and (7).

2. Discuss the solutions of each of the following systems without actually solving
them. Give reasons.
(
4x 3y = 1
(a)
9x 7y = 35
(
2.4x 7.7y = 43
(b)
0.264x + 0.847y = 4.63
(
15x + 12y = 4
(c)
30.5x 24.4y = 122
15

3. If 3 is added to the numerator of a fraction and 7 subtracted from the denomina-


tor, its value is 67 . But if 1 is subtracted from the numerator and 7 added to the
denominator, its value is 25 . Find the fraction.

5(x+1)
4. Let x be a number not equal to 4 or 3. Express 3(x2 +x12) as a sum of
1 1
(constant) multiples of x+4 and x3 .

5. If the digits of a two-digit number are reversed, the new number is 9 less than the
original. The sum of the digits is also 9. (a) If x is the tens digit and y the ones
digit, write down equations satisfied by x and y. (b) What is this number?

355
6. If the digits of a two-digit number are reversed, the new number is 1 less than
twice the original number. Furthermore, if 10 times the tens digit is divided by the
ones digit, the quotient is 4 and the remainder is 2. (a) If x is the tens digit and
y the ones digit, write down equations satisfied by x and y. (b) What is this number?

7. In a linear system (
ax + by = e
cx + dy = f
suppose ad bc = 0 but b 6= 0 and d 6= 0. Prove that there is a nonzero number k so
that the system can be rewritten as
(
ax + by = e
kax + kby = f

8. Suppose you are teaching a ninth grade algebra class and you want to show your
students how to solve the following linear system by the method of substitution (or
elimination) for the first time:
(
2x y = 1
x + 2y = 7

Carefully explain how you would teach it.

9. Given positive integers s and t with s < t. (a) Solve for u and v in the linear
system
u + v = t

s

u v = st
(b) Show that u > 0 and v > 0. (c) If the solutions u and v in terms of s and t
are written in the form of u = cb and v = ab , where a, b, c are positive integers
expressed in terms of s and t, show that {a, b, c} form a Pythagorean triple,
i.e., they are positive integers and a2 + b2 = c2 .

10. In each of the following, you are asked to solve the linear system in problem 9
above with the given values of s and t to obtain Pythagorean triples. You may use a
scientific calculator.

356
(a) s = 1, t = 2. (b) s = 2, t = 3. (c) s = 2, t = 69.
(d) s = 54, t = 125. (e) s = 8, t = 9907.

[The following four problems together show that, in a precise sense, the
method of problem 9 generates all the Pythagorean triples.]

11. A Pythagorean triple is said to be primitive if there is no (positive) common


divisor among the triple of positive integers other than 1. Prove that the following
are equivalent for a Pythagorean triple {a, b, c}:
(i) {a, b, c} is primitive.
(ii) {a, b} are relatively prime.
(iii) {a, c} are relatively prime.
(iv) {b, c} are relatively prime.

12. If {a, b, c} is a primitive Pythagorean triple (so that a2 + b2 = c2 ), then prove


that one of a and b is even, and the other is odd.

13. If s and t are relative prime positive integers so that one is even and the other is
odd, then prove that the Pythagorean triples produced in problem 9 are primitive.

14. Let {a, b, c} be a Pythagorean triple. Prove that the following are equivalent:
(i) {a, b, c} is a primitive Pythagorean triple with a odd and b even.
(ii) There is a pair of relatively prime positive integers s and t , with s < t and one
of them is even and the other odd, so that a = t2 s2 , b = 2st, and c = t2 + s2 .
(This is not an easy problem.)

357

You might also like