
MATHEMATICS 23a/E-23a, FALL 2015

Linear Algebra and Real Analysis I


Syllabus for undergraduates and local Extension students
(Distance Extension students will also need to consult a special syllabus)
Last revised: July 22, 2015
Course Website: https://canvas.harvard.edu/courses/4524
Instructor: Paul Bamberg (to be addressed as Paul, please)
Paul graduated from Harvard in 1963 with a degree in physics and received his
doctorate in theoretical physics at Oxford in 1967. He taught in the Harvard
physics department from 1967 to 1995 and joined the math department in 2001.
From 1982 to 2000 he was one of the principals of the speech recognition company
Dragon Systems. If you count Extension School and Summer School, he has probably taught more courses, in mathematics, physics, and computer science, than
anyone else in the history of Harvard. He was the first recipient of the White Prize
for excellence in teaching introductory physics.
This term, Paul is also teaching Math 152, Discrete Mathematics, and Math
116, Real Analysis, Convexity, and Optimization.
Email: bamberg@tiac.net
Office: Science Center 322, (617) 495-9560
Office Hours:
Tuesday and Thursday, 1:30-2:15 in Science Center 322.
Mondays 2-2:30 (longer if students are still there)
Head Teaching Assistant: Kate Penner (to be addressed as Kate, please)
Kate is the course head for Math E-23a, responsible for making it possible for
students from around the nation and the world to participate as fully as possible
in course activities.
Kate's Harvard undergraduate degree is in government, but her interests have
moved to political economy and mathematics. After taking Math E-23 in the
Extension School, she became the head teaching assistant and is starting her sixth
year in that position. She has been course head for linear algebra and real analysis
courses in the Summer School. She may have set a Harvard record in Spring
2013 by teaching in four courses (Math M, Math 21b, Math 23b, and Math 117).
To date, she has received over a dozen teaching awards from the Bok Center for
Teaching and Learning for her work teaching undergraduate math.
This term, Kate is also teaching Math 1a.
Email: penner@math.harvard.edu
Office: Science Center 424
Office Hours: TBD; regular office hours will be announced during Week 1.

Course Assistants: (all former students in Math 23a or Math E-23a)


Nicolas Campos, ncampos@college.harvard.edu
Jennifer Hu, jenniferhu@college.harvard.edu
Ju Hyun Lee, juhyunlee@college.harvard.edu
Elaine Reichert, reichertelaine@gmail.com
Ben Sorscher, bsorscher@college.harvard.edu
Sebastian Wagner-Carena, swagnercarena@college.harvard.edu
Kenneth Wang, kwang02@college.harvard.edu
Goals: Math 23a is the first half of a moderately rigorous course in linear algebra
and multivariable calculus, designed for students who are serious about mathematics
and interested in being able to prove the theorems that they use, but who are
as much concerned about the application of mathematics in fields like physics and
economics as about pure mathematics for its own sake. Trying to cover both
theory and practice makes for a challenging course with a lot of material, but it is
appropriate for the audience!
Prerequisites: This course is designed for the student who received a grade of 5
on the Math BC Advanced Placement examination or an A or A minus in Math
1b. Probably the most important prerequisite is the attitude that mathematics is
fun and exciting. Extension students should ordinarily have an A in Math E-16,
and an additional math course would be a very good idea.
Our assumption is that the typical Math 23a student knows only high-school
algebra and single-variable calculus, is currently better at formula-crunching than
at doing proofs, and likes to see examples to accompany abstractions. If, before
coming to Harvard, you took courses in both linear algebra and multivariable
calculus, Math 25 might be more appropriate. We do not assume that Math 23
students have any prior experience in either of these areas beyond solving systems
of linear equations in high school algebra.
This year, for the second time, we will devote four weeks to single-variable real
analysis. Real analysis is the study of real-valued functions and their properties,
such as continuity and differentiability, as well as sequences, series, limits, and
convergence. This means that if you are an international student whose curriculum
included calculus but not infinite series OR if you had a calculus course that
touched only lightly on topics like series, limits, and continuity, you will be OK.
Mathematics beyond AP calculus is NOT a prerequisite! Anyone who tries
to tell you otherwise is misguided. In fact, since we will be teaching sequences
and series from scratch (but rigorously), you can perhaps get away with a weaker
background in this area than is required for Math 21.

Strange as it may seem, Part I of the math placement test that freshmen have
taken is the most important. Students who do well in Math 23 have almost all
scored 26 or more out of 30 on this part.
Extension students who register for graduate credit are required to learn and
use the scripting language R. This option is also available to everyone else in the
course. You need to be only an experienced software user, not a programmer.

Who takes Math 23?


When students in Math 23b were asked to list the two concentrations they were
most seriously considering, the most popular choices were mathematics, applied
math, physics, computer science, chemistry, mathematical economics, life sciences,
and humanities.
Extension students who take this course are often establishing their credentials
for a graduate program in a field like mathematical economics, mathematics, or
engineering. Programs in fields like economics like to see a course in real analysis
on your transcript. Successful Math E-23 students have usually taken more than
one course beyond single-variable calculus.
Upperclassmen who have made a belated decision to go into a quantitative PhD
program will also find this course useful.
Course Meetings:
The course ordinarily meets in Science Center A. To avoid overcrowding, the
first two lectures have been moved to Science Center C.
Lectures on Tuesdays and Thursdays run from 2:37 to 4:00. They provide
complete coverage of the week's material, occasionally illustrated by examples done
in the R scripting language.
Problem Sessions (Section)
There are two types of weekly problem sessions led by the course staff. The
first is required; the second, though highly recommended, is optional.
The early sections on Thursday and Friday will be devoted to problem
solving in small groups. These are a required course activity and will
count toward your grade. Lecture on Thursday is crucial background for
section!
The late sections that meet on Monday will focus on the weekly problem
sets due on Wednesday mornings, and will also review the proofs that were
done in lecture. Attendance at these sections is optional, but most students
find them to be time well spent.
Videos will be made of all the lectures. Usually the video will be posted on
the Web site before the next lecture, and often it will appear on the same day.
The Thursday video will not be posted in time to provide preparation for the early
sections that meet on Thursdays, and we cannot guarantee that it will appear
before the Friday sections.
Even though all lectures are captured on video, Harvard rules forbid undergraduates to register for another course that meets at the same time as Math 23,
even one with just a 30-minute overlap! Here is the official statement of this year's
policy:
In recent years, the Ad Board has approved petitions in which the direct
and personal compensatory instruction has been provided via video capture of
classroom presentations. In keeping with the views of the Standing Committee
on Undergraduate Educational Policy (formerly EPC), discussed with the Faculty
Council and the full faculty last April, the Ad Board will no longer approve such
petitions.
With regard to athletic practices that occur at the same time as classes, policy
is less well defined. Here is the view of the assistant director of athletics:
The basic answer is that our coaches should be accommodating to any academic conflict that comes up with class scheduling. Kids should be able to take
the classes they want and still be a part of the team. Especially for classes that
would only cause a student to miss a small part of a practice.
What complicates things are the classes that would cause a student to miss an
entire practice for 2-3 days a week. Those instances make it hard for a student to
engage fully in the sport and prepare adequately for competition.
It's hard for freshmen to ask a coach - the adult they have the closest relationship
with on campus - for practice accommodations, but in my experience many of
them will work with students on their total experience.
The Math 23 policy, based on this opinion: It is OK to take Math 23a and
practice for your sport every Tuesday, but you must not miss Thursday lecture for
a practice.
Extension students may choose between attending lecture or watching videos.
However, students in Math E-23a who will not regularly attend lecture on Thursday
should sign up for a section that meets as late as possible. Then, with occasional
exceptions, they can watch the video of the Thursday lecture to prepare for section.
Sections will begin on September 10-11. Students should indicate their preferences for section time using the student information system. More details will be
revealed once the software is complete!
In order to include your name on a section list, we must obtain your permission
(on the sectioning form) to reveal on the Web site that you are a student taking
Math 23a or E-23a. If you wish to keep this information secret, we will include
your name in alphabetical order, but in the form Xxxx Xxxxxx.

Exams: There will be two quizzes and one final exam.


Quiz 1: Wednesday, October 7 (module 1, weeks 1-4)
Quiz 2: Wednesday, November 4 (module 2, weeks 5-8)
Final Exam: date and time TBA (module 3, weeks 9-12)

Quizzes are held in the Yenching Auditorium, 2 Divinity Avenue. They run
from 6 to 9 PM, but you can arrive any time before 7 PM, since 120 minutes should
be enough time for the quiz.
Keep these time slots open. Do not, for example, schedule a physics lab
or an LS 1a section on Wednesday evenings. If you know that you tend to work
slowly, it would also be unwise to schedule another obligation that leaves only part
of that time available to you!
Students who have exam accommodations, properly documented by a letter
from the Accessible Education Office, may need to take their quizzes in a separate
location. Please provide the AEO letters as early in the term as you can, since we
may need to reserve one or more extra rooms.
The last day to drop and add courses (like Math 23a and Math 21a) is Monday,
October 5. This is before the first quiz. It is important that you be aware of how
you are managing the material and performing in the course. It is not a good
idea to leave switching out of any course (not just Math 23) until the fifth Monday. Decisions of this nature are best dealt with in as timely a manner as possible!!
Quizzes will include questions that resemble the ones done in the early sections, and each quiz will include two randomly-chosen proofs from among the
numbered proofs in the relevant module. There may be other short proofs similar to ones that were done in lecture and problems that are similar to homework
problems. However if you want quizzes on which you are asked to prove difficult
theorems that you have never seen before, you will need to take Math 25a or 55a,
not Math 23a.
If you have an unexpected time conflict for one of the quizzes, contact Kate
as soon as you know about it, and special arrangements can be made. Distance
students will take their quizzes near their home but on the same dates.
The final examination will focus on material from the last five weeks of the
course. Local Extension students will take it at the same time and place as undergraduates. The time (9AM or 2PM) will be revealed when the exam schedule is
posted late in September. If you have two or even three exams scheduled for that
day, don't worry: that is a problem for the Exams Office, not you, to solve.
Except for the final examination, local Extension students can meet all their
course obligations after 5:30pm.
Distance extension students who do not live near Cambridge and cannot
come to Harvard in the evening to hand in homework, attend section and office
hours, take quizzes, and present proofs can still participate online in all course
activities. Details will be available in a separate document. Since this fully-online

option is an experiment, we plan to restrict it to two sections of 12 students each,
with absolute priority given to students who live far from Cambridge.

Textbooks:
Vector Calculus, Linear Algebra, and Differential Forms, Hubbard and Hubbard,
fourth edition, Matrix Editions, 2009. Try to get the second printing, which includes a few significant changes to chapters 4 and 6.
This book is in stock at the Coop, or you can order it for $84 plus $10 for
priority shipping from the publisher's Web site at
http://matrixeditions.com/UnifiedApproach4th.html. The Student Solution
Manual for the fourth edition, not in stock at the Coop, is also available from that
Web site.
We will cover Chapters 1-3 this term, Chapters 4-6 in Math 23b; so this one
textbook will last for the entire year.
Ross, Elementary Analysis: The Theory of Calculus, 2nd Edition, 2013.
This will be the primary text for the module on single-variable real analysis.
It is available electronically through the Harvard library system (use HOLLIS and
search for the author and title). If you like to own bound volumes, used copies can
be found on amazon.com for as little as $25, but be sure to get the correct edition!
Lawvere and Schanuel, Conceptual Mathematics: A First Introduction to Categories, 2nd Edition, 2009.
We will only be using the first chapter, and the book is available for free
download through the Harvard library system.

Proofs:
Learning proofs can be fun, and we have put a lot of work into designing an
enjoyable way to learn high-level and challenging mathematics! Each week's course
materials include two proofs. Often these proofs appear in the textbook and will
also be covered in lecture. They also may appear as quiz questions.
You, as students, will earn points towards your grade by presenting these proofs
to teaching staff and to each other without the aid of your course notes. Here is
how the system works:
When we first learn a proof in class, only members of the teaching staff are qualified listeners. Anyone who presents a satisfactory proof to a qualified listener
also becomes qualified and may listen to proofs by other students. This process of
presenting proofs to qualified listeners occurs separately for every proof.
You are expected to present each proof before the date of the quiz on which it
might appear; so each proof has a deadline date. Distance students may reference
the additional document which details how to go about remotely presenting proofs
to classmates and teaching staff.
Each proof is worth 1 point. Here is the grading system:
Presenting a proof to Paul, Kate, one of the course assistants, or a fellow
student who has become a qualified listener: 0.95 points before the deadline,
0.8 points after the deadline. You may only present each proof once.
Listening to a fellow student's proof: 0.1 point. Only one student can receive
credit for listening to a proof.
After points have been tallied at the end of the term, members of the course
staff may assign the points that they have earned by listening to proofs
outside of section to any students that they feel deserve a bit of extra credit.
Students who do the proofs early and listen to lots of other students' proofs can
get more than 100%, but there is a cap of 30 points total. You can almost reach
this cap by doing each proof before the deadline and listening twice to each proof.
Either you do a proof right and get full credit, or you give up and try again
later. There is no partial credit. It is OK for the listener to give a couple of small
hints.
You may consult the official list of proofs that has the statement of each theorem
to be proved, but you may not use notes. That will also be the case when proofs
appear on quizzes and on the final exam.
It is your responsibility to use the proof logging software on the course
Web site to keep a record of proofs that you present or listen to. You can also
use the proof logging software to announce proof parties and to find listeners for
your proofs.
Each quiz will include two questions which are proofs chosen at random from
the four weeks of relevant material. The final exam will have three proofs, all from
material after the second quiz. Students generally do well on the proof questions.

Useful software:
R and RStudio
This is required only for Extension students who register for graduate credit,
but it is an option for everyone. Consider learning R if you
are interested in computer science and want practice in using software
to do things that are more mathematical than can be dealt with in CS
50 or 51.
are thinking of taking a statistics course, which is likely to use R.
are hoping to get an interesting summer job or summer internship that
uses mathematics or deals with lots of data.
want to be able to work with large data files in research projects in any
field (life sciences, economics and finance, government, etc.)
R is free, open-source software. Instructions for download and installation
are on the Web site. You will have the chance to use R at the first section
on Thursday, September 10 or Friday, September 11; so install it right away,
preferably on a laptop computer that you can bring to section.
On the course Web site is a set of R scripts, with accompanying YouTube
videos, that explain how to do almost every topic in the course by using
R. These scripts are optional for undergraduates, but they will enhance your
understanding both of mathematics and of R.


LaTeX
This is the technology that is used to create all the course handouts. Once
you learn how to use it, you can create professional-looking mathematics on
your own computer.
The editor that is built into the Canvas course Web site is based on LaTeX.
One of the course requirements is to upload four proofs to the course Web site
in a medium of your choice. One option is to use LaTeX. Alternatively, you
can use the Canvas file editor (LaTeX based), or you can make a YouTube
video.
I learned LaTeX without a book or manual by just taking someone else's files,
ripping out all the content, and inserting my own, and so can you. You will
need to download freeware MiKTeX version 2.9 (see http://www.miktex.org),
which includes an integrated editor named TeXworks.
From http://tug.org/mactex/ you can download a similar package for the
Mac OS X.
When in TeXworks, use the Typeset/pdfLaTeX menu item button to create
a .pdf file. To learn how to create fractions, sums, vectors, etc., just find an
example in the lecture outlines and copy what I did. All the LaTeX source
for lecture outlines, assignments, and practice quizzes is on the Web site, so
you can find working models for anything that you need to do.
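If you want an even smaller starting point than a gutted lecture file, here is a minimal sketch of a complete LaTeX document (the content is made up; amsmath, loaded below, is one standard way to get column vectors):

    % minimal.tex - fractions, sums, and vectors, as in the lecture outlines
    \documentclass{article}
    \usepackage{amsmath}  % provides the pmatrix environment used below
    \begin{document}
    A fraction $\frac{a}{b}$, a sum $\sum_{i=1}^{n} x_i$, and a vector
    $\vec{v} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$.
    \end{document}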
If you create a .pdf file for your homework, please print out the files and
hand in the paper at class. An exception can be made if you are a distance
Extension student or for some other good reason you are not in Cambridge
on the due date.
The course documents contain examples of diagrams created using TikZ,
the graphics language built into LaTeX. It is also easy to include .jpg or .png files
in LaTeX. If you want to create diagrams, use Paint or try Inkscape at
http://www.inkscape.org, an excellent freeware graphics program. Students have
found numerous other solutions to the problem of creating graphics, so just experiment.
If you create a .pdf file for your homework, please print out the files and hand
in the paper. By default, undergraduates and local Extension students may
submit the assignment electronically only if you are out of town on the due
date. Individual section instructors may adopt a more liberal policy about
allowing electronic submission. Do not submit .tex files.


Use of R:
You can earn R bonus points in three ways:
By being a member of a group that uploads solutions to section problems
that require creation of R scripts. These will be available most, but not all,
weeks. (about 10 points)
By submitting R scripts that solve the optional R homework problems (again
available most, but not all, weeks). (about 20 points)
By doing a term project in R. (about 20 points)
To do the graduate credit grade calculation, we will add your R bonus
points to the numerator of your score. To the denominator, we will add 95%
of your bonus points or 50% of the possible bonus points, whichever is greater.
Earning a lot of R points is essential if you are registered for graduate credit.
Otherwise, earning more than half the bonus points is certain to raise your percentage
score a bit, and it can make a big difference if you have a bad day on a quiz or on
the final exam.
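As an illustration, here is a sketch in R of that calculation with made-up point totals (the real ones come from the grading scheme below):

    # Hypothetical graduate-credit percentage, per the rule stated above.
    raw <- 180; possible <- 216           # made-up base points earned / available
    bonus <- 40; bonus_possible <- 50     # made-up R bonus points earned / available
    raw / possible                        # score ignoring the bonus
    (raw + bonus) /
      (possible + max(0.95 * bonus,       # add 95% of earned bonus, or
                      0.5 * bonus_possible))  # 50% of possible, whichever is greater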


Grades: Your course grade will be determined as follows:


problem sets, 50 points. Your worst score will be converted to a perfect score.
presenting and listening to proofs, 26 points.
uploading proofs to the Web site, 4 points.
participation in the early sections, based on attendance, preparation, contributions to problem solving, and posting solutions to the Web site, 10
points.
two quizzes, 40 points each.
final exam, slightly more than 60 points.
R bonus points, about 50 points in numerator, 25-45 points in denominator.
For graduate students, only a graduate percentage score, using the R bonus
points, will be calculated. For everyone else, we will also calculate an undergraduate percentage score, ignoring the R bonus points, and we will use the higher of
the two percentage scores.
The grading scheme is as follows:
Points  Grade
94.0%   A
88.0%   A-
80.0%   B+
75.0%   B
69.0%   B-
63.0%   C+
57.0%   C
51.0%   C-
If you are conscientious about the homework, proofs, and quizzes, you will end up
with a grade between B plus and A, depending on your expertise in taking a fairly
long and challenging 3-hour final exam, and you will know that you are thoroughly
prepared for more advanced courses. For better or worse, you need to be fast as
well as knowledgeable to get an A, but an A- is a reasonable goal even if you make
occasional careless errors and are not a speed demon. Extension students who
earned a B plus have been successful at getting into PhD programs.
There is no curve in this course! You cannot do worse because your classmates
do better.


Switching Courses (Harvard College students only):


While transfers among Math 21a, 23a, 25a, and 55a are routine, it is important
to note that Math 21a focuses on multivariable calculus, while Math 23a and 25a
focus on linear algebra. Math 21b focuses on linear algebra, while Math 23b and
25b focus on multivariable calculus. Math 21a and b are given every semester, while
Math 23a and 25a are fall only with 23b and 25b given spring only. Ordinarily
there is a small fee if you drop a course after the third Monday of the term, but
this is waived in the case of math courses. However, the fifth Monday, October 5,
is a firm deadline after which you cannot change courses!
Math 23a to Math 21a or b
If you decide to transfer out of Math 23a within 3 weeks of the start of the
semester, then either Math 21a or 21b is a reasonable choice. If more than 3
weeks have elapsed, Math 21b will be a better place for you to go. You will
want to take Math 21a in the spring. You should avoid waiting until the last
minute to switch.
Switching to Math 21 at midyear (either to 21b or to 21a) does not make
sense except in desperate situations. You will have seen some of the topics in
Math 25b, since Math 25a does almost no real analysis. In addition, you will
have done about 60% of Math 112, which you should skip after taking
Math 23.
Math 25a to Math 23a
Math 23a and Math 25a cover similar material during the first three weeks.
If you have taken a course in which you learned to multiply matrices and use
dot and cross products, you can probably attend only Math 25 lectures for
three weeks and still have only a little catching up to do if you add Math
23a during the week of the first quiz. However, if you are trying to decide
between 25a and 23a and have not taken a college-level linear algebra course,
it might be prudent to attend the lectures in both courses until you make up
your mind. Math 23a Weeks 2 and 4 will be new material!
In the case of transfers, graded Math 25a problem sets will be accepted in
lieu of missed Math 23a problem sets. It is imperative that you review the
problem sets and material that you have missed upon joining the course as
soon as possible.
For those who make the decision to change courses at the last minute, there
will be special office hours in Science Center 322 on Monday, October 5 from
3 to 4 PM at which study card changes can be approved and arrangements
for missed homework and quizzes can be discussed.
Switching from Math 23a to Math 25b at midyear has worked well for a few
students over the past several years, although you end up seeing a lot of real
analysis twice.


Switching from Math 25a to Math 23b at midyear requires you to teach
yourself about multivariable differential calculus and manifolds, but a handful
of students do it every year, and it generally works out OK.
Special material for Physics 15b and Physics 153
Math 23b does an excellent treatment of vector calculus (div, grad, and curl)
and its relation to differential form fields and the exterior derivative. Alas, this
material is needed in Physics 15b and Physics 153 before we can reach it in Math
23.
Week 13 covers these topics in a manner that relies only on Math 23a, never
mentioning multiple integrals. This will be covered in a special lecture during
reading period, and there will be an optional ungraded problem set. If you choose
to do this topic, which physics students last year said was extremely useful, there
will be one question about it on the final exam, which you can use to replace your
lowest score on one of the other questions.
If you are not taking Physics 15b or Physics 153, just wait to see this material
in Math 23b.
YouTube videos
These were made as part of a rather unsuccessful pedagogical experiment last
year. They are quite good, but you will need some extra time to watch them.
The Lecture Preview Videos were made by Kate. They cover the so-called
Executive Summaries in the weekly course materials, which go over all of the
week's material, but without proofs or detailed examples.
If you watch these videos (it takes about an hour per week) you will be very
well prepared for lecture, and even the most difficult material will make sense
on a first hearing.
Last year's experiment was unsuccessful because we assumed in lecture that
everyone had watched these videos, when in fact only half the class did
so. Those who did not watch them complained, correctly, that the lectures
skipped over basic material in getting to proofs and examples. This year's
lectures will be self-contained, so the preview videos are not required viewing.
The R script videos were made by Paul. They provide a line-by-line explanation of the R scripts that accompany each week's materials.
Last year's experiment was unsuccessful because going over these scripts in
class was not a good use of lecture time. If you are doing the graduate
option, these scripts are pretty much required viewing, although the scripts
are so thoroughly commented that just working through them on your own
is perhaps a viable alternative.
If you are doing just the undergraduate option, you can ignore the R scripts
completely.


Homework: Homework (typically 8 problems) will be assigned weekly. The
assignment will be included in the same online document as the lecture notes and
section problems.
Assignments are due on Wednesdays by 10:00 AM. There will be a locked box
on the second floor, near Room 209, with your late section instructor's name.
At 10 AM Kate will place a sheet of colored paper in each box, and anything above
that paper will be late! Please include your name, the assignment number, and
your CA's name on your assignment.
Each weeks assignment will include a couple of optional problems whose solutions require R scripts. These scripts should be uploaded electronically to the
dropbox on the Web site for that week. Please include your name as a comment
in the script and also in the file name.
The course assistant who leads your late section should return your corrected
homework to you at the section after the due date. If you are not receiving graded
homework on schedule, send email to penner@math.harvard.edu and the problem
will be dealt with.
Homework that is handed in after 10AM on the Wednesday when it is due
will not be graded. If it arrives before the end of Reading Period and looks fairly
complete, you will get a grade of 50% for it.
It is a violation of Federal privacy law for us to return graded homework by
placing it in a publicly accessible location like an instructor's mailbox. You will
have to collect your graded homework from your section instructor in person.
Collaboration and Academic Integrity policy:
You are encouraged to discuss the course with other students and with the
course staff, but you must always write your homework solutions out yourself in
your own words. You must write the names of those you've collaborated
with at the top of your assignment.
If you collaborate with classmates to solve problems that call for R scripts, create
your own file after your study group has figured out how to do it.
Proofs that you submit to the course Web site must be done without consulting
files that other students have posted!
If you have the opportunity to see a complete solution to an assigned problem,
please refrain from doing so. If you cannot resist the temptation, you must cite
the source, even if all that you do is check that your own answer is correct.
You are forbidden to upload solutions to homework problems, whether your
own or ones that are posted on the course Web site, to any publicly available
location on the Internet.
Anything that you learn from lecture, from the textbook, or from working
homework problems can be regarded as general knowledge for purposes of this
course, and the source need not be cited. Anything learned in prerequisite courses
falls into the same category. Do not assume that other courses use some an expansive definition of general knowledge!


Tutoring: Several excellent students from previous years, qualified to be course
assistants but too busy, are registered with the Bureau of Study Counsel as tutors.
If you find yourself getting into difficulties, immediately contact the BSC and get
teamed up with one of them.
You will have to contact the BSC directly to arrange for a tutor, since privacy
law forbids anyone on the Math 23 staff to know who is receiving tutoring. A
website with more information can be found at www.bsc.harvard.edu.
Week-by-week Schedule:
Week          Date               Topic
Fortnight 1   September 3-11     Fields, vectors and matrices
Week 2        September 15-18    Dot and cross products; Euclidean geometry of R^n
Week 3        September 22-25    Row reduction, independence, basis
Week 4        Sept. 29 - Oct. 2  Eigenvectors and eigenvalues
Week 5        October 6-9        Number systems and sequences
              October 7          QUIZ 1 on weeks 1-4
Week 6        October 13-16      Series, convergence tests, power series
Week 7        October 20-23      Limits and continuity of functions
Week 8        October 27-30      Derivatives, inverse functions, Taylor series
Week 9        November 3-6       Topology, sequences in R^n, linear differential equations
              November 4         QUIZ 2 on weeks 5-8
Week 10       November 10-13     Limits and continuity in R^n; partial and directional derivatives
Week 11       November 17-20     Differentiability, Newton's method, inverse functions
Fortnight 12  Nov. 24 - Dec. 3   Manifolds, critical points, Lagrange multipliers
              November 26        Thanksgiving
Half-week 13  December 8         Calculus on parametrized curves; div, grad, and curl
              December ?         FINAL EXAM on weeks 9-12
This schedule covers all the math that is needed for Physics 15a, 16, and 15b
with the sole exception of surface integrals, which will be done in the spring.
The real analysis in Math 23a alone will be sufficient for most PhD programs in
economics, though the most prestigious programs will want to see Math 23b also.
All the mathematics that is used in Economics 1011a will be covered by the end
of the term. The coverage of proofs is complete enough to permit prospective
Computer Science concentrators to skip CS 20.
Abstract vector spaces and multiple integration, topics of great importance to
prospective math concentrators, have all been moved to Math 23b.


MATHEMATICS 23a/E-23a, Fall 2015


Linear Algebra and Real Analysis I
Module #1, Week 1 (Fields, Vectors, and Matrices)
Authors: Paul Bamberg and Kate Penner
R scripts by Paul Bamberg
Last modified: June 13, 2015 by Paul Bamberg
Reading
Hubbard, Sections 0.1 through 0.4
Hubbard, Sections 1.1, 1.2, and 1.3
Lawvere and Schanuel, Conceptual Mathematics
Search the Internet for Harvard HOLLIS and type Conceptual Mathematics into the Search box.
Choose View Online. You will have to log in with your Harvard PIN.
At a minimum, read the following:
Article I (Sets, maps, composition - definition of a category)
Session 2
This is very easy reading.
Proofs to present in section or to a classmate who has done them.
1.1 Suppose that a and b are two elements of a field F . Using only the
axioms for a field, prove the following:
If ab = 0, then either a or b must be 0.
The additive inverse of a is unique.
1.2 (Generalization of Hubbard, Proposition 1.2.9) $A$ is an $n \times m$ matrix;
the entry in row $i$, column $j$ is $a_{i,j}$.
$B$ is an $m \times p$ matrix.
$C$ is a $p \times q$ matrix.
The entries in these matrices are all from the same field $F$. Using summation notation, prove that matrix multiplication is associative:
that $(AB)C = A(BC)$. Include a diagram showing how you would lay out
the calculation in each case so the intermediate results do not have to be
recopied.
1.3 (Hubbard, Proposition 1.3.14) Suppose that linear transformation
$T : F^n \to F^m$ is represented by the $m \times n$ matrix $[T]$.

a. Suppose that the matrix $[T]$ is invertible. Prove that the linear
transformation $T$ is one-to-one and onto (injective and surjective),
hence invertible.
b. Suppose that linear transformation $T$ is invertible. Prove that its
inverse $S$ is linear and that the matrix of $S$ is $[S] = [T]^{-1}$.
Note: Use $*$ to denote matrix multiplication and $\circ$ to denote composition
of linear transformations. You may take it as already proved that matrix
multiplication represents composition of linear transformations. Do not
assume that $m = n$. That is true, but we are far from being able to prove
it, and you do not need it for the proof.
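Before presenting Proof 1.2, you can convince yourself numerically with a short R sketch like the following (the sizes and random entries are arbitrary choices of mine):

    # Numerical sanity check (not a proof) that (AB)C = A(BC).
    set.seed(1)                                   # reproducible random entries
    A <- matrix(rnorm(2 * 3), nrow = 2)           # 2 x 3
    B <- matrix(rnorm(3 * 4), nrow = 3)           # 3 x 4
    C <- matrix(rnorm(4 * 2), nrow = 4)           # 4 x 2
    all.equal((A %*% B) %*% C, A %*% (B %*% C))   # TRUE, up to rounding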

R Scripts
Script 1.1A-Finite Fields.R
Topic 1 - Why the real numbers form a field
Topic 2 - Making a finite field, with only five elements
Topic 3 - A useful rule for finding multiplicative inverses
Script 1.1B-PointsVectors.R
Topic 1 - Addition of vectors in $\mathbb{R}^2$
Topic 2 - A diagram to illustrate the point-vector relationship
Topic 3 - Subtraction and scalar multiplication
Script 1.1C-Matrices.R
Topic 1 - Matrices and Matrix Operations in R
Topic 2 - Solving equations using matrices
Topic 3 - Linear functions and matrices
Topic 4 - Matrices that are not square
Topic 5 - Properties of the determinant
Script 1.1D-MarkovMatrix
Topic 1 - A game of volleyball
Topic 2 - traveling around on ferryboats
Script 1.1L-LinearMystery
Topic 1 - Define a mystery linear function fMyst : $\mathbb{R}^2 \to \mathbb{R}^2$

Executive Summary
Quantifiers and Negation Rules
The universal quantifier $\forall$ is read "for all."
The existential quantifier $\exists$ is read "there exists." It is usually
followed by "s.t.," a standard abbreviation for "such that."
The negation of "$\forall x$, $P(x)$ is true" is "$\exists x$ such that $P(x)$ is not true."
The negation of "$\exists x$ such that $P(x)$ is true" is "$\forall x$, $P(x)$ is not true."
The negation of "$P$ and $Q$ are true" is "either $P$ or $Q$ is not true."
The negation of "either $P$ or $Q$ is true" is "both $P$ and $Q$ are not true."
Functions
A function $f$ needs two sets: its domain $X$ and its codomain $Y$.
$f$ is a rule that, to any element $x \in X$, assigns a specific element $y \in Y$.
We write $y = f(x)$.
$f$ must assign a value to every $x \in X$, but not every $y \in Y$ must be of the
form $f(x)$. The subset of the codomain consisting of elements that are of
the form $y = f(x)$ is called the image of $f$. If the image of $f$ is all of the
codomain $Y$, $f$ is called surjective or onto.
$f$ need not assign different elements of $Y$ to different elements of $X$. If
$x_1 \neq x_2 \implies f(x_1) \neq f(x_2)$, $f$ is called injective or one-to-one.
If $f$ is both surjective and injective, it is bijective and has an inverse $f^{-1}$.
Categories
A category $\mathcal{C}$ has objects (which might be sets) and arrows (which might
be functions).
An arrow $f$ must have a specific domain object $X$ and a specific codomain
object $Y$; we write $f : X \to Y$ or $X \xrightarrow{f} Y$.
If arrows $f : X \to Y$ and $g : Y \to Z$ are in the category, then the composition arrow $g \circ f : X \to Z$ is in the category.
For any object $X$ there is an identity arrow $I_X : X \to X$.
Given $f : X \to Y$, $f \circ I_X = f$ and $I_Y \circ f = f$.
Associative law: given $X \xrightarrow{f} Y \xrightarrow{g} Z \xrightarrow{h} W$, $h \circ (g \circ f) = (h \circ g) \circ f$.
Given an arrow $f : X \to Y$, an arrow $g : Y \to X$ such that $g \circ f = I_X$ is
called a retraction.
Given an arrow $f : X \to Y$, an arrow $g : Y \to X$ such that $f \circ g = I_Y$ is
called a section.
If, for arrow $f$, arrow $g$ is both a retraction and a section, then $g$ is the
inverse of $f$, $g = f^{-1}$, and $g$ must be unique.
Almost everything in mathematics is a special case of a category.

1.1 Fields and Field Axioms

A field $F$ is a set of elements for which the familiar operations of addition and
multiplication are defined and behave in the usual way. Here is a set of axioms
for a field. You can use them to prove theorems that are true for any field.
1. Addition is commutative: $a + b = b + a$.
2. Addition is associative: $(a + b) + c = a + (b + c)$.
3. Additive identity: $\exists 0$ such that $\forall a \in F$, $0 + a = a + 0 = a$.
4. Additive inverse: $\forall a \in F$, $\exists (-a)$ such that $a + (-a) = (-a) + a = 0$.
5. Multiplication is associative: $(ab)c = a(bc)$.
6. Multiplication is commutative: $ab = ba$.
7. Multiplicative identity: $\exists 1$ such that $\forall a \in F$, $1a = a$.
8. Multiplicative inverse: $\forall a \in F \setminus \{0\}$, $\exists a^{-1}$ such that $a^{-1}a = 1$.
9. Distributive law: $a(b + c) = ab + ac$.
Examples of fields include:
The rational numbers $\mathbb{Q}$.
The real numbers $\mathbb{R}$.
The complex numbers $\mathbb{C}$.
The finite field $\mathbb{Z}_p$, constructed for any prime number $p$ as follows:
Break up the set of integers into $p$ subsets. Each subset is named after the
remainder when any of its elements is divided by $p$.
$$[a]_p = \{m \mid m = np + a,\ n \in \mathbb{Z}\}$$
Notice that $[a + kp]_p = [a]_p$ for any $k$. There are only $p$ sets, but each has
many alternate names. These $p$ infinite sets are the elements of the field $\mathbb{Z}_p$.
Define addition by $[a]_p + [b]_p = [a + b]_p$. Here $a$ and $b$ can be any names for
the subsets, because the answer is independent of the choice of name. The
rule is "Add $a$ and $b$, then divide by $p$ and keep the remainder."
Define multiplication by $[a]_p [b]_p = [ab]_p$. Again $a$ and $b$ can be any names
for the subsets, because the answer is independent of the choice of name.
The rule is "Multiply $a$ and $b$, then divide by $p$ and keep the remainder."
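These rules are easy to experiment with in R using the mod operator %%. Here is a minimal sketch in the spirit of Script 1.1A (the script on the course Web site is the official version):

    # Arithmetic in Z_5: compute with ordinary integers, then reduce mod p.
    p <- 5
    (3 + 4) %% p      # [3] + [4] = [2] in Z_5
    (3 * 4) %% p      # [3][4] = [2] in Z_5
    # Multiplicative inverses: for each nonzero x, find y with xy = 1 (mod p).
    sapply(1:(p - 1), function(x) which((x * (1:(p - 1))) %% p == 1))
    # Returns 1 3 2 4: the inverses of 1, 2, 3, 4 respectively.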

1.2 Points and Vectors

$F^n$ denotes the set of ordered lists of $n$ elements from a field $F$. Usually the field
is $\mathbb{R}$, but it could be the field of complex numbers $\mathbb{C}$ or a finite field like $\mathbb{Z}_5$.
A given element of $F^n$ can be regarded either as a point, which represents
position data, or as a vector, which represents incremental data.
If an element of $F^n$ is a point, we represent it by a bold letter like $\mathbf{p}$ and write
it as a column of elements enclosed in parentheses.
$$\mathbf{p} = \begin{pmatrix} 1.1 \\ 3.8 \\ 2.3 \end{pmatrix}$$
If an element of $F^n$ is a vector, we represent it by a bold letter with an arrow
like $\vec{v}$ and write it as a column of elements enclosed in square brackets.
$$\vec{v} = \begin{bmatrix} 0.2 \\ 1.3 \\ 2.2 \end{bmatrix}$$
To add a vector to a point, we add the components in identical positions together.
The result is a point: $\mathbf{q} = \mathbf{p} + \vec{v}$. Geometrically we represent this by anchoring
the vector at the initial point $\mathbf{p}$. The location of the arrowhead of the vector is
the point $\mathbf{q}$ that represents our sum.
[diagram: vector $\vec{v}$ anchored at point $\mathbf{p}$, with its arrowhead at point $\mathbf{q}$]
To add a vector to a vector, we again add component by component. The
result is a vector. Geometrically, the vector created by beginning at the initial
point of the first vector and ending at the arrowhead of the second vector
represents our sum.
[diagram: $\vec{v}$ and $\vec{w}$ placed head to tail, with $\vec{v} + \vec{w}$ closing the triangle]
To form a scalar multiple of a vector, we multiply each component by the
scalar. In $\mathbb{R}^n$, the geometrical effect is to multiply the length of the vector by the
scalar. If the scalar is a negative number, we switch the position of the arrow to
the other end of the vector.
[diagram: $\vec{v}$, $2\vec{v}$, and $-2\vec{v}$]
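In R (compare Script 1.1B-PointsVectors.R, the official version of this material), both points and vectors are plain numeric vectors, so the point/vector distinction is up to you to maintain; a minimal sketch:

    p <- c(1.1, 3.8, 2.3)   # a point in R^3
    v <- c(0.2, 1.3, 2.2)   # a vector in R^3
    q <- p + v              # point plus vector gives a point
    q - p                   # point minus point recovers the vector v
    2 * v                   # scalar multiplication, component by component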

1.3 Standard basis vectors

The standard basis vector $\vec{e}_k$ has a 1 as its $k$th component, and all its other
components are 0. Since the additive identity 0 and the multiplicative identity
1 must be present in any field, there will always be $n$ standard basis vectors in
$F^n$. Geometrically, the standard basis vectors in $\mathbb{R}^2$ are usually associated with
one unit east and one unit north respectively.
[diagram: $\vec{e}_1$ pointing one unit east, $\vec{e}_2$ pointing one unit north]

1.4 Matrices and linear transformations

An $m \times n$ matrix over a field $F$ has $m$ rows and $n$ columns.
Matrices represent linear functions, also known as linear transformations:
a function $g : F^n \to F^m$ is called linear if
$$g(a\vec{v} + b\vec{w}) = a\,g(\vec{v}) + b\,g(\vec{w}).$$
For a linear function $g$, if we know the value of $g(\vec{e}_i)$ for each standard basis
vector $\vec{e}_i$, the value of $g(\vec{v})$ for any vector $\vec{v}$ follows by linearity:
$$g(v_1\vec{e}_1 + v_2\vec{e}_2 + \cdots + v_n\vec{e}_n) = v_1 g(\vec{e}_1) + v_2 g(\vec{e}_2) + \cdots + v_n g(\vec{e}_n)$$
The matrix $G$ that represents the linear function $g$ is formed by using $g(\vec{e}_k)$
as the $k$th column. Then, if $g_{i,j}$ denotes the entry in the $i$th row and $j$th column
of matrix $G$, the function value $\vec{w} = g(\vec{v})$ can be computed by the rule
$$w_i = \sum_{j=1}^{n} g_{i,j} v_j$$
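Here is a small R sketch of this rule. The linear map and its values on the basis vectors are hypothetical, chosen only to show that the matrix is assembled column by column:

    # Suppose g(e1) = (1, 2) and g(e2) = (3, -1) (made-up values).
    G <- cbind(c(1, 2), c(3, -1))  # g(e_k) becomes column k of G
    v <- c(2, 5)
    G %*% v                        # w_i = sum_j g_{i,j} v_j; equals 2 g(e1) + 5 g(e2)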

1.5 Matrix multiplication

If the $m \times n$ matrix $G$ represents linear function $g : F^n \to F^m$ and the $n \times p$ matrix $H$
represents linear function $h : F^p \to F^n$, then the matrix product $GH$ is defined
so that it represents their composition: the linear function $g \circ h : F^p \to F^m$.
Start with standard basis vector $\vec{e}_j$. Function $h$ converts this to the $j$th
column $\vec{h}_j$ of matrix $H$. Then function $g$ converts this column to $g(\vec{h}_j)$, which
must therefore be the $j$th column of matrix $GH$.
The rule for forming the product $GH$ can be stated in terms of the rule for a
matrix acting on a vector: to form $GH$, just multiply $G$ by each column of $H$ in
turn, and put the results side by side to create the matrix $GH$. If $C = GH$,
$$c_{i,j} = \sum_{k=1}^{n} g_{i,k} h_{k,j}.$$

While matrix multiplication is associative, it is not commutative. Order matters!



1.6 Examples of matrix multiplication

[worked example lost in transcription: a $2 \times 3$ matrix $A$ and a $3 \times 2$ matrix $B$, with the $3 \times 3$ product $BA$ and the $2 \times 2$ product $AB$ computed side by side]
The number of columns in the first factor must equal the number of rows in
the second factor.
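Since the printed example did not survive transcription, here is a hypothetical replacement in R with the same shapes (the entries are my own; Script 1.1C-Matrices.R has the official examples):

    A <- matrix(c(2, 1, -1, 1, 0, -2), nrow = 2)  # 2 x 3, filled column by column
    B <- matrix(c(0, -2, 2, 1, 1, 0), nrow = 3)   # 3 x 2
    A %*% B   # a 2 x 2 matrix
    B %*% A   # a 3 x 3 matrix: AB and BA need not even have the same size
    # A %*% A is an error: ncol(A) is 3 but nrow(A) is 2.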

1.7 Function inverses

A function $f : X \to Y$ is invertible if it has the following two properties:

It is injective (one-to-one): if $f(x_1) = f(x_2)$, then $x_1 = x_2$.
It is surjective (onto): $\forall y \in Y$, $\exists x \in X$ such that $f(x) = y$.
The inverse function $g = f^{-1}$ has the property that if $f(x) = y$ then $g(y) = x$.
So $g(f(x)) = x$ and $f(g(y)) = y$. Both $f \circ g$ and $g \circ f$ are the identity function.

1.8 The determinant of a $2 \times 2$ matrix





For matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, $\det A = ad - bc$. If you fix one column, it is a linear
function of the other column, and it changes sign if you swap the two columns.

1.9 Matrix inverses

A non-square $m \times n$ matrix $A$ can have a one-sided inverse.

If $m > n$, then $A$ takes a vector in $\mathbb{R}^n$ and produces a longer vector in $\mathbb{R}^m$.
In general, there will be many matrices $B$ that can recover the original vector in
$\mathbb{R}^n$, so that $BA = I_n$. In this case there is no right inverse.
If $m < n$, then $A$ takes a vector in $\mathbb{R}^n$ and produces a shorter vector in $\mathbb{R}^m$.
In general, there will be no left inverse matrix $B$ that can recover the original
vector in $\mathbb{R}^n$, but there may be many different right inverses for which $AB = I_m$.
For a square matrix, it is possible for both a right inverse $B$ and a left inverse
$C$ to exist. In this case, we can prove that $B$ and $C$ are equal and that they are
unique. We can say that an inverse $A^{-1}$ exists, and it represents the inverse of
the linear function represented by matrix $A$.
You can find the inverse of a $2 \times 2$ matrix $A$ whose determinant is not zero
by using the formula
$$A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
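A quick R sketch checks the formula against the built-in function solve(); the entries of A are arbitrary:

    A <- matrix(c(1, 3, 2, 4), nrow = 2)      # columns (1,3) and (2,4); det = -2
    d <- A[1, 1] * A[2, 2] - A[1, 2] * A[2, 1]
    Ainv <- (1 / d) * matrix(c(A[2, 2], -A[2, 1], -A[1, 2], A[1, 1]), nrow = 2)
    all.equal(Ainv, solve(A))                 # TRUE
    A %*% Ainv                                # the 2 x 2 identity matrix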

1.10 Matrix transposes

The transpose of a given matrix $A$ is written $A^T$. The two are closely related:
the rows of $A$ are the columns of $A^T$, and the columns of $A$ are the rows of $A^T$.
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad A^T = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$
The transpose of a matrix product is the product of the transposes, but in
the opposite order:
$$(AB)^T = B^T A^T$$
A similar rule holds for matrix inverses:
$$(AB)^{-1} = B^{-1} A^{-1}$$
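Both rules are easy to spot-check numerically in R with arbitrary invertible matrices, for example:

    A <- matrix(c(1, 3, 2, 4), nrow = 2)
    B <- matrix(c(0, 1, 1, 1), nrow = 2)
    all.equal(t(A %*% B), t(B) %*% t(A))              # TRUE
    all.equal(solve(A %*% B), solve(B) %*% solve(A))  # TRUE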

1.11 Applications of matrix multiplication

In these examples, the sum-of-products rule for matrix multiplication arises
naturally, and so it is efficient to use matrix techniques.
Counting paths: Suppose we have four islands connected by ferry routes.
The entry in row $i$, column $j$ of the matrix
$$A = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix}$$
shows how many ways there are to reach island $i$ by a single ferry ride, starting from
island $j$. The entry in row $i$, column $j$ of the matrix $A^n$ shows how many
ways there are to reach island $i$ by a sequence of $n$ ferry rides, starting from
island $j$.
Markov processes: A game of beach volleyball has two states: in state
1, team 1 is serving; in state 2, team 2 is serving. With each point that
is played there is a state transition governed by probabilities: for example, from state 1, there is a probability of 0.8 of remaining in state 1 and a
probability of 0.2 of moving to state 2. The transition probabilities can be
collected into a matrix like
$$A = \begin{pmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{pmatrix}.$$
Then the matrix $A^n$ specifies
the transition probabilities that result from playing $n$ consecutive points.
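Both applications come down to computing matrix powers. Base R has no matrix-power operator, so this sketch uses a small helper loop (matpow is my name, not a built-in); the matrices are the ones above:

    matpow <- function(M, n) {        # M multiplied by itself n times
      P <- diag(nrow(M))              # start from the identity matrix
      for (i in seq_len(n)) P <- P %*% M
      P
    }
    ferry <- matrix(c(0, 1, 1, 0,  0, 0, 0, 1,  1, 0, 0, 1,  1, 0, 0, 0), nrow = 4)
    matpow(ferry, 3)                  # counts of 3-ride paths between islands
    serve <- matrix(c(0.8, 0.2, 0.3, 0.7), nrow = 2)
    matpow(serve, 10)                 # columns approach the same steady state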

Lecture Outline
1. Quantifiers and negation
Especially when you are explaining a proof to someone, it saves some writing
to use the symbols $\exists$ (there exists) and $\forall$ (for all).
Be careful when negating these.
The negation of "$\forall x$, $P(x)$ is true" is "$\exists x$ such that $P(x)$ is not true."
The negation of "$\exists x$ such that $P(x)$ is true" is "$\forall x$, $P(x)$ is not true."
When negating a statement, also bear in mind that
The negation of "$P$ and $Q$ are true" is "either $P$ or $Q$ is not true."
The negation of "either $P$ or $Q$ is true" is "both $P$ and $Q$ are not true."
For practice, let's negate the following statements (which may or may not
be true!)
There exists an even prime number.
Negation:
All 11-legged alligators are orange with blue spots. (Hubbard, page 5)
Negation:

The function $f(x)$ is continuous on the open interval (0,1), which
means that $\forall x \in (0,1)$, $\forall \epsilon > 0$, $\exists \delta > 0$ such that $\forall y \in (0,1)$,
$|y - x| < \delta$ implies $|f(y) - f(x)| < \epsilon$.
Negation: $f(x)$ is discontinuous on the open interval (0,1) means that

2. Set notation
Here are the standard set-theoretic symbols:
$\in$ (is an element of)
$\{a \mid p(a)\}$ (set of $a$ for which $p(a)$ is true)
$\subset$ (is a subset of)
$\cap$ (intersection)
$\cup$ (union)
$\times$ (Cartesian product)
$-$ or $\setminus$ (set difference)
Using the integers $\mathbb{Z}$ and the real numbers $\mathbb{R}$, let's construct some sets. In
each case there is one way to describe the set using a restriction and another
more constructive way to describe the set.
The set of real numbers whose cube is greater than 8 in magnitude.
Restrictive:

Constructive:

The set of coordinate pairs for points on the circle of radius 2 centered
at the origin (an example of a smooth manifold).
Restrictive:

Constructive:


3. Function terminology:
Here are some terms that should be familiar from your study of precalculus
and calculus:

                          Example a    Example b    Example c
domain
codomain
image
one-to-one = injective
onto = surjective
invertible = bijective
Using the sets X = {1, 2} and Y = {A, B, C}, draw diagrams to illustrate
the following functions, and fill in the table to show how the terms apply
to them:
$f : X \to Y$, $f(1) = A$, $f(2) = B$.

$g : Y \to X$, $g(A) = 1$, $g(B) = 2$, $g(C) = 1$.

$h : Y \to Y$, $h(A) = B$, $h(B) = C$, $h(C) = A$. (a permutation)


Here are those function words again, with two additions:


domain
natural domain (often deduced from a formula)
codomain
image
one-to-one = injective
onto = surjective
invertible = bijective
inverse image = $\{x \mid f(x) \in A\}$
Here are functions from $\mathbb{R}$ to $\mathbb{R}$, defined by formulas.
$f_1(x) = x^2$
$f_2(x) = x^3$
$f_3(x) = \log x$ (natural logarithm)
$f_4(x) = e^x$
Find one that is not injective (not one-to-one).

For $f_1$, what is the inverse image of $(1, 4)$?

Which function is invertible as a function from $\mathbb{R}$ to $\mathbb{R}$?

What is the natural domain of $f_3$?

What is the image of $f_4$?

Specify domain and codomain so that $f_3$ and $f_4$ are inverses of one
another.

Did your calculus course use range as a synonym for image or for
codomain?


4. Composition of functions
Sometimes people find that a statement is hard to prove because it is so
obvious. An example is the associativity of function composition, which
will turn out to be crucial for linear algebra.
Prove that $(f \circ g) \circ h = f \circ (g \circ h)$. Hint: Two functions $f_1$ and $f_2$ are equal
if they have the same domain $X$ and, $\forall x \in X$, $f_1(x) = f_2(x)$.

Consider the set of men who have exactly one brother and at least one son.
$h(x)$ = father of $x$, $g(x)$ = brother of $x$, $f(x)$ = oldest son of $x$

$f \circ g$ is called
$(f \circ g) \circ h$ is
$g \circ h$ is called
$f \circ (g \circ h)$ is
Simpler name for both $(f \circ g) \circ h$ and $f \circ (g \circ h)$

Consider the real-valued functions
$g(x) = e^x$, $h(x) = 3 \log x$, $f(x) = x^2$

$f \circ g$ has the formula
$(f \circ g) \circ h$ has the formula
$g \circ h$ has the formula
$f \circ (g \circ h)$ has the formula
Simpler formula for both $(f \circ g) \circ h$ and $f \circ (g \circ h)$ (see the R spot-check below)
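As a numerical spot-check of the worksheet above, here is an R sketch (the test point x = 1.7 is arbitrary; x must be positive for the logarithm); it confirms that the two compositions agree without revealing the simplified formula:

    g <- function(x) exp(x)
    h <- function(x) 3 * log(x)
    f <- function(x) x^2
    fg <- function(x) f(g(x))   # f o g
    gh <- function(x) g(h(x))   # g o h
    x <- 1.7
    fg(h(x))                    # ((f o g) o h)(x)
    f(gh(x))                    # (f o (g o h))(x): the same number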


5. Finite sets and functions form the simplest example of a category

The objects of the category are finite sets.
The arrows of the category are functions from one finite set to another.
The definition of a function involves quantifiers.
Requirements for a function $f : X \to Y$:
$\forall x \in X$, $\exists! y \in Y$ such that $f(x) = y$.
What is wrong with the following? [diagram]

What is wrong with the following? [diagram]

If arrows $f : X \to Y$ and $g : Y \to Z$ are in the category, then the
composition arrow $g \circ f : X \to Z$ is in the category.
For any object $X$ there is an identity arrow $I_X : X \to X$.
Given $f : X \to Y$, $f \circ I_X = f$ and $I_Y \circ f = f$.
Composition of arrows is associative:
Given $X \xrightarrow{f} Y \xrightarrow{g} Z \xrightarrow{h} W$, $h \circ (g \circ f) = (h \circ g) \circ f$.
The objects do not have to be sets and the arrows do not have to be
functions. For example, the objects could be courses, and an arrow from
course X to course Y could mean "if you have taken course X, you will
probably do better in course Y as a result." Check that the identity and
composition rules are satisfied.


6. Invertible functions - an example of invertible arrows

First consider the category of finite sets and functions between them.
The term "inverse" is used only for a two-sided inverse. Given $f : X \to Y$,
an inverse $f^{-1} : Y \to X$ must have the properties
$f^{-1} \circ f = I_X$ and $f \circ f^{-1} = I_Y$.
Prove that the inverse is unique. This proof uses only things that are true
in any category, so it is valid in any category!

[diagram] This function is not invertible because it is not injective, but it is surjective.

However, it has a "preinverse" (my terminology; the official word is "section"). Starting at an element of $Y$, choose any element of $X$ from which
there is an arrow to that element. Call that function $g$. Then $f \circ g = I_Y$
but $g \circ f \neq I_X$. Furthermore, $g$ is not unique.
Prove the cancellation law that if $f$ has a section and $h \circ f = k \circ f$, then
$h = k$ (another proof that is valid in any category!)

[diagram] This function $f$ is not invertible because it is not surjective, but it is injective.

It has a "postinverse" (the official word is "retraction"). Just reverse all the
arrows to undo its effect, and define $g$ however you like on the element of
$Y$ that is not in the image of $f$. Then $g \circ f = I_X$ but $f \circ g \neq I_Y$.

7. Fields
Loosely speaking, a field $F$ is a set of elements for which the familiar operations of arithmetic are defined and behave in the usual way. Here is a set
of axioms for a field. You can use them to prove theorems that are true for
any field.
(a) Addition is commutative: $a + b = b + a$.
(b) Addition is associative: $(a + b) + c = a + (b + c)$.
(c) Additive identity: $\exists 0$ such that $\forall a \in F$, $0 + a = a + 0 = a$.
(d) Additive inverse: $\forall a \in F$, $\exists (-a)$ such that $a + (-a) = (-a) + a = 0$.
(e) Multiplication is associative: $(ab)c = a(bc)$.
(f) Multiplication is commutative: $ab = ba$.
(g) Multiplicative identity: $\exists 1$ such that $\forall a \in F$, $1a = a$.
(h) Multiplicative inverse: $\forall a \in F \setminus \{0\}$, $\exists a^{-1}$ such that $a^{-1}a = 1$.
(i) Distributive law: $a(b + c) = ab + ac$.
This set of axioms for a field includes properties (such as the commutativity
of addition) that can be proved as theorems by using the other axioms. It
therefore does not qualify as an independent set, but there is no general
requirement that axioms be independent.
Some well-known laws of arithmetic are omitted from the list of axioms
because they are easily proved as theorems. The most obvious omission is
$\forall a \in F$, $0a = 0$.
Here is the proof. What axiom justifies each step?

$0 + 0 = 0$, so $(0 + 0)a = 0a$.
$0a + 0a = 0a$.
$(0a + 0a) + (-(0a)) = 0a + (-(0a))$.
$0a + (0a + (-(0a))) = 0a + (-(0a))$.
$0a + 0 = 0$.
$0a = 0$.


8. Finite fields
Computing with real numbers by hand can be a pain, and most of linear
algebra works for an arbitrary field, not just for the real and complex numbers. Alas, the integers do not form a field, because in general there is no
multiplicative inverse. Here is a simple way to make from the integers a
finite field in which messy fractions cannot arise.
Choose a prime number $p$.
Break up the set of integers into $p$ subsets. Each subset is named after
the remainder when any of its elements is divided by $p$.
$[0]_p = \{m \mid m = np,\ n \in \mathbb{Z}\}$
$[1]_p = \{m \mid m = np + 1,\ n \in \mathbb{Z}\}$
$[a]_p = \{m \mid m = np + a,\ n \in \mathbb{Z}\}$
Notice that $[a + kp]_p = [a]_p$ for any $k$. There are only $p$ sets, but each
has many alternate names.
These $p$ infinite sets are the elements of the field $\mathbb{Z}_p$.
Define addition by $[a]_p + [b]_p = [a + b]_p$. Here $a$ and $b$ can be any
names for the subsets, because the answer is independent of the choice
of name. The rule is "Add $a$ and $b$, then divide by $p$ and keep the
remainder."
What is the simplest name for $[5]_7 + [4]_7$?

What is the simplest name for the additive inverse of $[3]_7$?

Define multiplication by $[a]_p [b]_p = [ab]_p$. Again $a$ and $b$ can be any
names for the subsets, because the answer is independent of the choice
of name. The rule is "Multiply $a$ and $b$, then divide by $p$ and keep the
remainder."
What is the simplest name for $[5]_7 [4]_7$?

Find the multiplicative inverse for each nonzero element of $\mathbb{Z}_7$.


9. Rational numbers
The rational numbers $\mathbb{Q}$ form a field. You learned how to add and multiply
them years ago! The multiplicative inverse of $\frac{a}{b}$ is $\frac{b}{a}$, as long as $a \neq 0$.
The rational numbers are not a big enough field for doing Euclidean
geometry or calculus. Here are some irrational quantities:
$\sqrt{2}$ and $\pi$.
most values of trig functions, exponentials, or logarithms.
coordinates of most intersections of two circles.
10. Real numbers
The real numbers $\mathbb{R}$ constitute a field that is large enough so that any
characterization of a number in terms of an infinite sequence of real numbers
still leads to a real number.
A positive real number is an expression like 3.141592... where there is no
limit to the number of decimal places that can be provided if requested.
To get a negative number, put a minus sign in front. This is Hubbard's
definition.
An equivalent viewpoint is that a positive real number is the sum of an
integer and an infinite series of the form
$$\sum_{i=1}^{\infty} a_i \left(\frac{1}{10}\right)^i$$
where each $a_i$ is one of the decimal digits 0...9.
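Truncating the series after a few terms gives a rational approximation to the real number; for instance, in R (with arbitrary digits of my choosing):

    a <- c(7, 0, 2, 9)               # hypothetical digits a_1, ..., a_4
    sum(a * (1 / 10)^(seq_along(a))) # 0.7029, the sum of the first four terms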


Write the first three terms of an infinite series that converges to $\pi$.

The rational numbers and the real numbers are both ordered fields. This
means that there is a subset of positive elements that is closed under both
addition and multiplication. No finite field is ordered.
In $\mathbb{Z}_5$, you can name the elements $[0], [1], [2], [-2], [-1]$ and try to call the
elements $[1]$ and $[2]$ positive. Why does this attempt to make an ordered
field fail?


11. Proof 1.1 - two theorems that are valid in any field
(a) Using nothing but the field axioms, prove that if ab = 0, then either a
or b must be 0.

(b) Using nothing but the field axioms, prove that the additive inverse
of an element $a$ is unique. (Standard strategy for uniqueness proofs:
assume that there are two different inverses $b$ and $c$, and prove that
$b = c$.)


12. Lists of field elements as points and vectors:


F^n denotes the set of ordered lists of n elements from a field F. Usually the field is R, but it could be the field of complex numbers C or a finite field like Z_5.
An element of F^n can be regarded either as a point, which represents position data, or as a vector, which represents incremental data. Beware: many textbooks ignore this distinction!
If an element of F^n is a point, we represent it by a bold letter like p and write it as a column of elements enclosed in parentheses:

p = (1.1, 3.8, 2.3)

If an element of F^n is a vector, we represent it by a bold letter with an arrow like ~v and write it as a column of elements enclosed in square brackets:

~v = [0.2, 1.3, 2.2]
13. Relation between points and vectors, inspired by geometry:
Add vector ~v component by component to point A to get point B.
Subtract point A component by component from point B to get vector
~v.
Vector addition: if adding ~v to point A gives point B and adding ~w to point B gives point C, then adding ~v + ~w to point A gives point C.
A vector in F n can be multiplied by any element of F to get another
vector.
Draw a diagram to illustrate these operations without use of coordinates,
as is typically done in a physics course.


14. Examples from coordinate geometry


Here are two points in the plane.

p = (1.4, 2.4),  q = (3.8, 4.8)

Here are two vectors.

~v = [0.2, 1.3],  ~w = [0.6, 0.2]

What is q - p?

What is p + ~v?

What is ~v - 1.5~w?

What, if anything, is p + q?

What is 0.5p + 0.5q? Why is this apparently illegal operation OK?


15. Subsets of F^n
A subset of F^n can be finite, countably infinite, or uncountably infinite. The concept is especially useful when the elements of F^n are points, but it is valid also for vectors.
Examples:

(a) In Z_3^2, consider the set { (0, 1), (1, 2), (2, 0) }.
This will turn out (outline 7) to be a line in the small affine plane. Write it in the form {p + t~v | t ∈ Z_3}.

(b) In R^2, consider the set of points whose coordinates are both positive integers. Is it finite, countably infinite, or uncountably infinite?

(c) In R^2, consider the set of points on the unit circle, a one-dimensional manifold. Is it finite, countably infinite, or uncountably infinite?

(d) In R^2, draw a diagram that might represent the set of points (x, y), where x is family income and y is family net worth, for which a family qualifies for free tuition.


16. Subspaces of F^n
A subspace is defined only when the elements of F^n are vectors. It must be closed under vector addition and scalar multiplication. The second requirement means that the zero vector must be in the subspace. The empty set is not a subspace!
Geometrically, a subspace corresponds to a flat subset (line, plane, etc.)
that includes the origin.
For R^3 there are four types of subspace. What is the geometric interpretation of each?

0-dimensional: the set {~0}

1-dimensional: {t~u | t ∈ R}
Exception: 0-dimensional if

2-dimensional: {s~u + t~v | s, t ∈ R}
Exception: 1-dimensional if

3-dimensional: {r~u + s~v + t~w | r, s, t ∈ R}
Exceptions: 2-dimensional if
1-dimensional if

A special type of subset is obtained by adding all the vectors in a subspace to a fixed point. It is in general not a subspace, but it has special properties. Lines and planes that do not contain the origin fall into this category.
We call such a subset an affine subset. This terminology is not standard: the Math 116 textbook uses "linear variety."


17. Standard basis vectors:


These are useful when we want to think of F^n more abstractly.
The standard basis vector ~e_i has a 1 in position i, a 0 everywhere else. Since 0 and 1 are in every field, these vectors are defined for any F.
The nice thing about standard basis vectors is that in F^n, any vector can be represented uniquely in the form

∑_{i=1}^n x_i ~e_i

This will turn out to be true also in an abstract n-dimensional vector space,
but in that case there will be no standard basis.
18. Another meaning for "field"
Physicists long ago started using the term "field" to mean a function that assigns a vector to every point. Examples are the gravitational field, electric field, and magnetic field.
Another example: in a smoothly flowing stream or in a blood vessel, there
is a function that assigns to each point the velocity vector of the fluid at
that point: a velocity field.
 
If (x1, x2) is the point whose coordinates are the interest rate x1 and the unemployment rate x2, then the Fed chairman probably has in mind the function that assigns to this point a vector: the expected change in these quantities over the next month.

A function ~F(x1, x2) that assigns to this point a vector of rates of change:

(dx1/dt, dx2/dt) = ~F(x1, x2)

specifies a linear differential equation involving two variables. In November you will learn to solve such equations by matrix methods.
Here is a formula for a vector field from Hubbard, exercise 1.1.6 (b). Plot it.

~F(x, y) = (x, 0)


Here are formulas for vector fields from Hubbard, exercise 1.1.6, (c) and (e). Plot them. If you did Physics C Advanced Placement E&M, they may look familiar.

~F(x, y) = (x, y),    ~F(x, y) = (-y, x)

19. Matrices
An m × n matrix over a field F is a rectangular array of elements of F with m rows and n columns. Watch the convention: the height is specified first!
As a mathematical object, any matrix can be multiplied by any element of F. This could be meaningless in the context of an application. Suppose you run a small hospital that has two rooms with three patients in each. Then



[ 98.6  102.4  99.7]
[103.2   98.3  99.6]

is a perfectly reasonable way to keep track of the body temperatures of the patients, but multiplying it by 2.7 seems unreasonable. This matrix, viewed as an element of R^6, is a point, not a vector, but we always use brackets for matrices.
Matrices with the same size and shape can be added component by component. What would you get if you add

[0.2  1.4  0.0 ]
[0.6  0.9  2.35]

to the matrix above to update the temperature data by one day?


20. Matrix multiplication


Matrix multiplication is nicely explained on pp. 43-46 of Hubbard. To illustrate the rule, we will take

A =
[2 1 0]
[1 1 2]

B =
[0 1]
[2 1]
[2 0]

Compute AB.

Compute BA.

In the set of n × n square matrices, addition and multiplication of matrices are always defined. Multiplication is distributive with respect to addition, too. But because matrix multiplication is noncommutative, the n × n matrices do not form a field if n > 1. (They are said to form a ring.) Let

A =
[1 1]
[1 0]

B =
[0 1]
[2 1]

Find AB.

Find BA.
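(A quick check in R — a minimal sketch, with the entries as printed above:)

A <- matrix(c(1, 1,
              1, 0), nrow = 2, byrow = TRUE)
B <- matrix(c(0, 1,
              2, 1), nrow = 2, byrow = TRUE)
A %*% B   # the product AB
B %*% A   # BA is different, so these matrices cannot form a field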


21. Matrices as functions:


Since a column vector is also an n × 1 matrix, we can multiply an m × n matrix by a vector in F^n to get a vector in F^m. The product A~e_i is the ith column of A. This is usually the best way to think of a matrix A as representing a linear function f: the ith column of A is f(~e_i).

Example: Suppose that f((1, 0)) = (1, 4) and f((0, 1)) = (2, 3).

What matrix represents f?
What matrix represents f ?

Since A(x_i ~e_i + x_j ~e_j) is the sum of x_i times column i and x_j times column j, we see that

f(x_i ~e_i + x_j ~e_j) = x_i f(~e_i) + x_j f(~e_j)

This is a requirement if f is to be a linear function.

Use matrix multiplication to calculate f((2, 1)).
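(In R, the matrix of f is just the images of the basis vectors bound together as columns — a minimal sketch for the example above:)

Fmat <- cbind(c(1, 4), c(2, 3))   # columns are f(e1) and f(e2)
Fmat %*% c(2, 1)                  # f((2,1)) = 2 f(e1) + 1 f(e2)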

The rule for forming the product AB can be stated in terms of the rule for a matrix acting on a vector: to form AB, just let A act on each column of B in turn, and put the results side by side to create the matrix AB.

What function does the matrix product AB represent? Consider (AB)~e_i. This is the ith column of the matrix AB, and it is also the result of letting B act on ~e_i, then letting A act on the result. So for any standard basis vector, the matrix AB represents the composition A ∘ B of the functions represented by B and by A.

What about the matrices (AB)C and A(BC)? These represent the composition of three functions: say (f ∘ g) ∘ h and f ∘ (g ∘ h). But we already know that composition of functions is associative. So we have proved, without any messy algebra, that multiplication of matrices is associative also.


22. Proving associativity by brute force (proof 1.2)


A is an n × m matrix.
B is an m × p matrix.
C is a p × q matrix.
What is the shape of the matrix ABC?
Show how you would lay out the calculation of (AB)C.

If a_{i,j} represents the entry in the ith row, jth column of A, then

(AB)_{i,k} = ∑_{j=1}^m a_{i,j} b_{j,k}

((AB)C)_{i,q} = ∑_{k=1}^p (AB)_{i,k} c_{k,q} = ∑_{j=1}^m ∑_{k=1}^p (a_{i,j} b_{j,k}) c_{k,q}

Show how you would lay out the calculation of A(BC).

(BC)_{j,q} =

(A(BC))_{i,q} =

On what basis can you now conclude that matrix multiplication is associative for matrices over any field F?

Group problem 1.1.1c offers a more elegant version of the same proof by
exploiting the fact that matrix multiplication represents composition of
linear functions.

23. Identity matrix:


It must be square, and the ith column is the ith standard basis vector. For example,

I_3 =
[1 0 0]
[0 1 0]
[0 0 1]
24. Matrices as the arrows for a category C
Choose a field F , perhaps the real numbers R.
An object of C is a vector space F^n.

An arrow of C is an n × m matrix A, with domain F^m and codomain F^n.

Given F^p → F^m → F^n, with B giving the first arrow and A the second, the composition of arrows A and B is the matrix product AB. Show that the shape of the matrices is right for multiplication.

The identity arrow for object F^n is the n × n identity matrix.


Now we just have to check the two rules that must hold in any category:

The associative law for composition of arrows holds because, as we just proved, matrix multiplication is associative.

Verify the two identity rules for the case where A =
[2 3 4]
[1 2 3]


25. Matrix inverses:


Consider first the case of a non-square m × n matrix A.
If m > n, then A takes a vector in R^n and produces a longer vector in R^m. In general, there will be many matrices B that can recover the original vector in R^n. In the lingo of categories, such a matrix B is a retraction.
Here is a matrix that converts a 2-component vector (price of silver and price of gold) into a three-component vector that specifies the price of alloys containing 25%, 50%, and 75% gold respectively. Calculate ~v = A(4, 8).

A =
[.75 .25]
[.5  .5 ]
[.25 .75]

~v = A(4, 8) =

By elementary algebra you can reconstruct the price of silver and of gold from the price of any two of the alloys, so it is no surprise to find two different left inverses. Apply each of the following to ~v.

B1 =
[ 2  -1  0]
[-2   3  0]

B1~v =

B2 =
[0   3  -2]
[0  -1   2]

B2~v =
However, in this case there is no right inverse.

If m < n, then A takes a vector in R^n and produces a shorter vector in R^m. In general, there will be no left inverse matrix B that can recover the original vector in R^n, but there may be many different right inverses. Let A = [1 1] and find two different right inverses. In the lingo of categories, such a right inverse is a section.
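(A minimal R sketch of the alloy example, with the signs of B1 and B2 restored so that each is a genuine left inverse:)

A  <- matrix(c(.75, .25,
               .50, .50,
               .25, .75), nrow = 3, byrow = TRUE)
v  <- A %*% c(4, 8)    # alloy prices when silver costs 4 and gold costs 8
B1 <- matrix(c( 2, -1, 0,
               -2,  3, 0), nrow = 2, byrow = TRUE)
B2 <- matrix(c(0,  3, -2,
               0, -1,  2), nrow = 2, byrow = TRUE)
B1 %*% A; B2 %*% A     # both are the 2x2 identity: two different left inverses
B1 %*% v; B2 %*% v     # both recover the metal prices (4, 8)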


26. Inverting square matrices


For a square matrix, the interesting case is where both a right inverse B and a left inverse C exist. In this case, B and C are equal and they are unique. We can say that an inverse A^(-1) exists.
Proof of both uniqueness and equality:
To prove uniqueness of the left inverse matrix, assume that matrix A has two different left inverses C and C′ and a right inverse B:

C′A = CA = I
C′(AB) = C(AB) = IB
C′I = CI = B
C′ = C = B
In general, inversion of matrices is best done by row reduction, discussed in Chapter 2 of Hubbard. For 2 × 2 matrices there is a simple formula that is worth memorizing:

If
A =
[a b]
[c d]

then

A^(-1) = (1/(ad - bc)) ·
[ d  -b]
[-c   a]

If ad - bc = 0 then no inverse exists.

Write down the inverse of
[3 1]
[4 2]
where the elements are in R.


The matrix inversion recipe works in any field: try inverting

A =
[3 1]
[4 2]

where the elements are in Z_5.
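(A quick check in R — solve() is R's built-in matrix inverse; the last line redoes the computation in Z_5, where 1/2 = 3:)

A <- matrix(c(3, 1,
              4, 2), nrow = 2, byrow = TRUE)
solve(A)                                                # built-in inverse, over R
adj <- matrix(c(2, -1, -4, 3), nrow = 2, byrow = TRUE)  # [d -b; -c a]
(1 / det(A)) * adj                                      # the memorized formula agrees
(3 * adj) %% 5                                          # the inverse over Z_5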

27. Other matrix terminology:


All these terms are nicely explained on pp. 49-50 of Hubbard.
transpose
symmetric matrix
antisymmetric matrix
diagonal matrix
upper or lower triangular matrix
Try applying them to some 3 × 3 matrices:

A =
[3 1 2]
[1 2 3]
[2 3 4]

B =
[3 0 0]
[1 2 0]
[2 3 4]

C =
[3 1 2]
[0 2 3]
[0 0 4]

D =
[3 0 0]
[0 2 0]
[0 0 4]

E =
[ 0  1  2]
[-1  0  3]
[-2 -3  0]


28. Linear transformations:


A function T : F^n → F^m is called linear if, for any vectors ~v, ~w ∈ F^n and any scalars a, b ∈ F,

T(a~v + b~w) = aT(~v) + bT(~w)
Example:
The components of ~v are the quantities of sugar, flour, and chocolate required to produce a batch of brownies. The components of ~w are the quantities of these ingredients required to produce a batch of fudge. T is the function that converts such a vector into the total cost of ingredients. T is represented by a matrix [T] (a row vector) of prices for the various ingredients.
Write these vectors for the following data:
A batch of brownies takes 3 pounds of sugar, 6 of flour, 1 of chocolate,
while a batch of fudge takes 4 pounds of sugar, 0 of flour, 2 of chocolate.

Sugar costs $2 per pound, flour costs $1 per pound, chocolate costs $6
per pound.

Then a~v + b~w is the vector of ingredients required to produce a batches of brownies and b batches of fudge, while T(~v) is the cost of ingredients for a single batch of brownies. The statement T(a~v + b~w) = aT(~v) + bT(~w) is sound economics.

Two ways to find the cost of 3 batches of brownies plus 2 batches of fudge:

T(3~v + 2~w) =

3T(~v) + 2T(~w) =
Suppose that T produces a 2-component vector of costs from two competing grocers. In that case [T] is a 2 × 3 matrix.
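(A minimal R sketch of the brownie-and-fudge computation, using the prices given above:)

v     <- c(3, 6, 1)                    # sugar, flour, chocolate for one batch of brownies
w     <- c(4, 0, 2)                    # the same ingredients for one batch of fudge
price <- matrix(c(2, 1, 6), nrow = 1)  # [T]: dollars per pound, as a row vector
price %*% (3 * v + 2 * w)              # T(3v + 2w)
3 * (price %*% v) + 2 * (price %*% w)  # 3T(v) + 2T(w): the same answer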


29. A linear transformation interpreted geometrically.


A parallelogram has one vertex at the origin. Two other vertices are located at points in the plane specified by ~v and ~w. Transformation T expands the parallelogram by a factor of 2 and rotates it counterclockwise through a right angle.

You can either locate the fourth vertex by vector addition and then apply T to it, or you can apply T separately to the second and third vertices, then add the results. So

T(~v + ~w) = T(~v) + T(~w)
Draw diagrams to illustrate both approaches.

The matrix that represents T is

[T] =
[0 -2]
[2  0]

By letting [T] multiply an arbitrary vector (a, b), you can determine the effect of T on any point in the plane. Do this for the vector (2, 1).
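(A one-line check in R, using the matrix as reconstructed above:)

Tmat <- matrix(c(0, -2,
                 2,  0), nrow = 2, byrow = TRUE)  # scale by 2, rotate 90 degrees CCW
Tmat %*% c(2, 1)                                  # image of the point (2, 1)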


30. Matrices and linear transformations


Use ∗ to denote the mechanical operation of matrix multiplication.

Any vector can be written as ~v = x1~e_1 + ... + xn~e_n.

The rule for multiplying a matrix [T] by a vector ~v is equivalent to

[T] ∗ ~v = x1 [T] ∗ ~e_1 + ... + xn [T] ∗ ~e_n = [T] ∗ (x1~e_1 + ... + xn~e_n).

So multiplication by [T] specifies a linear transformation of F^n.

The matrix [T] has columns [T] ∗ ~e_1, ..., [T] ∗ ~e_n.

The distinction is subtle. T is a function, a rule. [T] is just a collection of numbers, but the general rule for matrix multiplication turns it into a function.
31. Composition and multiplication:
Suppose S : F^n → F^m and T : F^m → F^p are both linear transformations. Then the codomain of S equals the domain of T and we can define the composition U = T ∘ S.

Prove that U is linear.

To find the matrix of U, we need only determine its action on a standard basis vector.

U(~e_i) = T(S(~e_i)) = T([S] ∗ ~e_i) = [T] ∗ ([S] ∗ ~e_i) = ([T] ∗ [S]) ∗ ~e_i

So the matrix of T ∘ S is [T] ∗ [S].


32. Inversion
A function f is invertible if it is 1-to-1 (injective) and onto (surjective). If g is the inverse of f, then both g ∘ f and f ∘ g are the identity function. How do we reconcile this observation with the existence of matrices that have one-sided inverses?
Here are two simple examples that identify the problem.
(a) Define f by the formula f(x) = 2x. Then
f : R → R is invertible.
f : Z_3 → Z_3 is invertible.
f : Z → Z is not invertible.
f : Z → 2Z is invertible. (2Z is the set of even integers)
In the last case, we have made f invertible by redefining its codomain
to equal its image.

(b) If we want to say that the inverse of f(x) = x² is g(x) = √x, we have to redefine f(x) so that its codomain is the nonnegative reals (makes it onto) and its domain is the nonnegative reals (makes it one-to-one).
The codomain of the function that an m × n matrix represents is all of R^m. Hubbard p. 64 talks about the invertibility of a linear transformation T : F^n → F^m and ends up commenting that m and n must be equal. Here is the problem, whose proof will have to wait:

If m > n, T cannot be onto, because its image is just a subspace of F^m.

Show how the case where [T] is the single column (1, 2) illustrates the problem.

If m < n, T cannot be one-to-one, because there is always a subspace of F^n that gets mapped to the zero vector.

Show how the case where [T] = [1 1] illustrates the problem.


33. Example - constructing the matrix of a linear transformation


Here is what we know about function f:

Its domain and codomain are both R^2.

It is linear.

f((1, 2)) = (7, 5).

f((1, 4)) = (11, 9).

Find the matrix T that represents f by using linearity to determine what f does to the standard basis vectors.

Then automate the calculation by writing down a matrix equation and solving it for T.


34. Invertibility of linear functions and of matrices (proof 1.3, Hubbard, proposition 1.3.14)

Since the key issue in this proof is the subtle distinction between a linear function T and the matrix [T] that represents it, it is a good idea to use ∗ to denote matrix multiplication and ∘ to denote composition of linear transformations.

It is also a good idea to use ~x for a vector in the domain of T and ~y for a vector in the codomain of T.

Suppose that linear transformation T : F^n → F^m is represented by the m × n matrix [T].
(a) Suppose that the matrix [T] is invertible. Prove that the linear transformation T is one-to-one and onto (injective and surjective), hence invertible.

(b) Suppose that linear transformation T is invertible. Prove that its inverse S is linear and that the matrix of S is [S] = [T]^(-1).

The shortest version of this proof starts by exploiting the linearity of T when it is applied to a cleverly-chosen sum of vectors.

T(aS(~y_1) + bS(~y_2)) = aT(S(~y_1)) + bT(S(~y_2)).


35. Application: graph theory


This is inspired by example 1.2.22 in Hubbard (page 51), but I have extended it by allowing one-way edges and multiple edges.

A graph has n vertices: think of them as islands. Given two vertices V_i and V_j, there may be A_{i,j} edges (bridges or ferryboats) that lead from V_j to V_i and A_{j,i} edges that lead from V_i to V_j. If a bridge is two-way, it counts twice, but we allow one-way bridges.
The matrix

A =
[0 0 1 1]
[1 0 0 0]
[1 0 0 0]
[0 1 1 0]

corresponds to the following directed graph:

[Figure: the four-island directed graph]


Clearly A is a matrix, and it describes the graph completely. The challenge is to associate it with a linear transformation and to interpret its columns as vectors.
Suppose you are a travel agent and you keep a notebook with a complete list
of all the ways that you have found to reach each island. So one component,
xj , would count the number of ways that you have found to reach island j.
A standard basis vector like e~j describes a notebook that has one way of
reaching island j (land at the airport?) and no way of reaching any other
islands.
It is always worth asking what (if anything) the operations of addition and
scalar multiplication mean. Addition is tricky: in general, it would have
to correspond to two different agents combining their notebooks, with no
attempt to weed out duplicates. Multiplication by a non-integer makes no
sense.
What about A~e_j? This is the jth column of A and its ith component is A_{i,j}, the number of edges leading from V_j to V_i. (Hubbard has chosen the opposite convention in Exercises 1.2.20 and 1.2.22, but for his example the matrix is symmetric and it makes no difference. It is an annoying feature of matrix notation that the row index comes first, since we choose a column first and then consider its entries.)

Now consider a vector ~v = (x1, x2, ..., xn) whose entries are arbitrary non-negative integers. After traversing one more edge, the number of walks that lead to vertex V_i is

∑_{j=1}^n A_{i,j} x_j.

This is a linear function, and we see that the vector A~v represents the
number of distinct ways of reaching each island after extending the existing
list of walks by following one extra edge wherever possible.
If you start on island V_j and make a walk of n steps, then the number of distinct walks leading to each island is specified by the components of the vector A^n ~e_j.
Hubbard does the example of a cube, where all edges are two-way.

For the four-island graph, with

A =
[0 0 1 1]
[1 0 0 0]
[1 0 0 0]
[0 1 1 0]

use matrix multiplication to find

(a) the number of two-step paths from island 1 to island 4.
(b) the number of three-step paths from island 1 to island 2.
(c) the number of four-step paths from island 3 to island 1.

36. Application: Markov processes


This is inspired by example 1.2.21 in Hubbard, but in my opinion he breaks
his own excellent rule by using a line matrix to represent probabilities.
The formulation below uses a column vector.
Think of a graph where the vertices represent states of a random process.
A state could, for example, be
(a) A travel agent is on a specific island.
(b) Player 1 is serving in a game of badminton.
(c) Hubbard's reference books are on the shelf in the order (2,1,3).
(d) A roulette player has two chips.
(e) During an inning of baseball, there is one man out and runners on first
base and third base.
All edges are one way, and attached to each edge is a number in [0,1], the
transition probability of following that edge in one step of the process.
The sum of the probabilities on all the edges leading out of a state cannot
exceed 1, and if it is less than 1 there is some probability of remaining in
that state.
Examples: write at least one column of the matrix for each case.
(a) If you are on Oahu, the probability of flying to Maui is 0.2, and the
probability of flying to Lanai is 0.1. Otherwise you stay put.

(b) Badminton: if player 1 serves, the probability of losing the point and
the serve is 0.2. If player 2 serves, the probability of losing the point
and the serve is 0.3.

(c) If John Hubbard's reference books are on the shelf in the order (2,1,3), the probability that he consults book 3 and places it at the left to make the order (3,2,1) is P3.


(d) Roulette: after starting with 2 chips and betting a chip on red, the probability of having 3 chips is 9/19 and the probability of having 1 chip is 10/19. (In a fair casino, each probability would be 1/2.)

For the badminton example, the transition matrix is

A =
[0.8 0.3]
[0.2 0.7]

What matrix represents the transition resulting from two successive points?

[0.8 0.3] [0.8 0.3]
[0.2 0.7] [0.2 0.7] =

What matrix represents the transition resulting from four successive points?

[0.7 0.45] [0.7 0.45]
[0.3 0.55] [0.3 0.55] =
If you raise the transition matrix A to a high power, you might conjecture that after a long time the probability that player 1 is serving is 0.6, no matter who served first.

In support of this conjecture, show that the matrix

A∞ =
[0.6 0.6]
[0.4 0.4]

has the property that A A∞ = A∞.
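(A quick numerical check in R:)

A <- matrix(c(0.8, 0.3,
              0.2, 0.7), nrow = 2, byrow = TRUE)
A %*% A                          # transition for two successive points
(A %*% A) %*% (A %*% A)          # transition for four successive points
Ainf <- matrix(c(0.6, 0.6,
                 0.4, 0.4), nrow = 2, byrow = TRUE)
A %*% Ainf                       # equals Ainf, as the conjecture requires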


Group Problems
1. Some short proofs
Once your group has solved its problem, use a cell phone to take a picture
of your solution, and upload it to the topic box for your section on the
Week 1 page of the Web site.
(a) When we say that a matrix A is invertible, we mean that it has both
a right inverse and a left inverse. Prove that the right inverse and the
left inverse are equal, and that the inverse is unique.
If you need a hint, see page 48 of Hubbard.
Illustrate your answer by writing down the inverse B of the matrix

A =
[3 2]
[2 4]

where all the entries are in the finite field Z_5, and showing that both AB and BA are equal to the identity matrix.
Since you are working in a finite field, there are no fractions. In Z5 ,
dividing by 3 is the same as multiplying by 2.
(b) Here are two well-known laws of arithmetic that are not on the list of
field axioms. They do not need to be listed as axioms because they are
provable theorems! In each case, the trick is to start with an identity
that is valid in any field, then apply the distributive law. You should
be able to justify each step of your proof by reference to one or more
of the field axioms.
Starting with 0 + 0 = 0, prove that 0a = 0 for any a ∈ F.
Starting with 1 + (-1) = 0, prove that (-1)a = -a for any a ∈ F.
(c) Prove that composition of functions, whether linear or not, is associative. Illustrate your proof by using the functions

f(x) = x², g(x) = e^x, h(x) = 3 log x (natural logarithms)

and computing both f ∘ (g ∘ h) and (f ∘ g) ∘ h.
Then use your result to give a one-line proof that matrix multiplication must be associative. See Hubbard, page 63.


2. Matrices and linear functions


(a) Here is what we know about the function f:

The space it maps from and the space it maps to (the domain and codomain, respectively) are both R^2.

It is linear.

f((1, 1)) = (4, 2)

f((1, 3)) = (6, 4)
i. Find the matrix T that represents f by using linearity to determine what f does to the standard basis vectors.
ii. Automate the calculation of T by writing down a matrix equation
and solving it for T .
(b) Suppose that T : (Z_5)^2 → (Z_5)^2 is a linear transformation for which

T((1, 1)) = (1, 2),   T((1, -1)) = (3, 0).

Construct the matrix [T] that represents T and the matrix [S] that represents T^(-1).
Construct the matrix [T ] that represents T and the matrix [S] that
represents T 1 .
Since you are working in a finite field, there are no fractions. Dividing
by 2 is the same as multiplying by 3.
(c) You are a precious metals dealer. Every day you check the Internet
and download a vector whose first component is the price per ounce
of gold and whose second component is the price per ounce of silver.
You then calculate a vector in R3 whose components are respectively
the price per ounce of 18-carat gold (75% gold, 25% silver)
the price per ounce of 12-carat gold (50% gold, 50% silver)
the price per ounce of 6-carat gold (25% gold, 75% silver)
Write down the matrix F that represents the linear function
f : R2 R3 which converts the prices of pure metals to the prices of
alloys.
Invent two different left inverses G1 and G2 for F .
Show that no right inverse for F exists. Explain in economic terms
what is going on here. (The alloys may be inconsistently priced.)


3. Problems to be solved by writing or editing R scripts


Upload your answer immediately to the Week 1 page of the course Web
site. Then your classmates can try out your script.
(a) Use the outer() function of R to make a table of the multiplication
facts for Z17 and use it to find the multiplicative inverse of each nonzero
element. Then use these inverses to find the result of dividing 11 by 5
and the result of dividing 5 by 11 in this field.
(b) You are playing roulette in an American casino, and for any play you
may have 0, 1, 2, or 3 chips. When you bet a chip on odd you
have only an 18/38 chance of winning, because the wheel has 18 odd
numbers, 18 even numbers, plus 0 and 00 which count as neither even
nor odd.
If you have 0 chips you cannot bet and continue to have 0 chips.
If you have 1 chip you have probability 9/19 of moving up to 2
chips, probability 10/19 of moving down to 0 chips.
If you have 2 chips you have probability 9/19 of moving up to 3
chips, probability 10/19 of moving down to 1 chip.
If you have 3 chips you declare victory, do not bet, and continue
to have 3 chips.
Create the 4 4 matrix that represents the effect of one play. Assume
that before the first play you are certain to have 2 chips. Use matrix
multiplication to determine the probability of your having 0, 1, 2, or 3
chips after 1, 2, 4 and 8 plays. Make a conjecture about the situation
after a very large number of plays.
(c) If you include in your R script the line

source("1.1L-LinearMystery.R")

it will define a function fMyst : R^2 → R^2 that is linear and invertible. Every time you execute the source() line the function changes!

Write an R script that shows how to construct the matrix F for this function

by evaluating fMyst on the standard basis vectors.

by evaluating fMyst only on the vectors (1, 1) and (1, -1).

by evaluating fMyst only on the vectors (2, 1) and (6, 2).

This script can solve problem (a) in set 2 on the preceding page!


Homework

(PROBLEM SET 1 - due on Tuesday, September 9 by 11:59 PM)


Problems 1-7 should be done on paper and placed in the locked box near
Science Center 209 that has the name of your Monday section instructor on it.
Problems 8 and 9 should be done in a single R script and uploaded to the
dropbox on the Week 1 page of the course Web site.
1. Prove the following, using only the field axioms and the results of group problem 1(b).
(a) The multiplicative inverse a^(-1) of a nonzero element a of a field is unique.
(b) (-a)(-b) = ab.
2. Function composition
(Hubbard, exercise 0.4.10.)
Prove the following:
(a) Let the functions f : B → C and g : A → B be onto. Then the composition (f ∘ g) is onto.
(b) Let the functions f : B → C and g : A → B be one-to-one. Then the composition (f ∘ g) is one-to-one.
This problem asks you to prove two results that we will use again and again.
All you need to do is to use the definitions of one-to-one and onto.
Here are some strategies that may be helpful:
Exploit the definition:
If you are told that f (x) is onto, then, for any y in the codomain Y ,
you can assert the existence of an x such that f (x) = y.
If you are told that f (x) is one-to-one, then, for any a and b such that
f (a) = f (b), you can assert that a = b.
Construct what the definition requires by a procedure that cannot fail:
To prove that h(x) is onto, describe a procedure for constructing an x
such that h(x) = y. The proof consists in showing that this procedure
works for all y in the codomain Y .
Prove uniqueness by introducing two names for the same thing:
To prove that h(x) is one-to-one, give two different names to the same
thing: assume that h(a) = h(b), and prove that a = b.


3. Hubbard, exercise 1.2.2, parts (a) and (e) only. Do part (a) in the field
R, and do part (e) in the field Z7 , where -1 is the same as 6. Check your
answer in (e) by doing the calculation in two different orders: according to
the associative law these should give the same answer. See Hubbard, figure
1.2.5, for a nice way to organize the calculation.
4. (a) Prove theorem 1.2.17 in Hubbard: that the transpose of a matrix product is the product of the transposes in the opposite order: (AB)^T = B^T A^T.




(b) Let A =
[1 2]
[2 3]
and B =
[2 1]
[1 3]
Calculate AB. Then, using the theorem you just proved, write down the matrix BA without doing any matrix multiplication. (Notice that A and B are symmetric matrices.)
(c) Prove that if A is any matrix, then A^T A is symmetric.
5. (a) Here is a matrix whose entries are in the finite field Z_5.

A =
[[1]_5 [2]_5]
[[3]_5 [3]_5]

Write down the inverse of A, using the names [0]_5 ... [4]_5 for the entries in the matrix. Check your answer by matrix multiplication.
(b) Count the number of different 2 × 2 matrices with entries in the finite field Z_5. Of these, how many are invertible? Hint: for invertibility, the left column cannot be zero, and the right column cannot be a multiple of the left column.
6. (a) Hubbard, Exercise 1.3.19, which reads: If A and B are n × n matrices, their Jordan product is (AB + BA)/2. Show that this product is commutative but not associative.
Since this problem has an odd number, it is solved in the solutions manual for the textbook. If you want to consult this manual, OK, but remember to cite your source!
(b) Denote the Jordan product of A and B by A • B. Prove that it satisfies the distributive law A • (B + C) = A • B + A • C.
(c) Prove that the Jordan product satisfies the special associative law A • (B • A²) = (A • B) • A².


       
7. (a) Suppose that T is linear and that T((3, 2)) = (6, 8), T((2, 1)) = (5, 5).

Use the linearity of T to determine T((1, 0)) and T((0, 1)), and thereby determine the matrix [T] that represents T. (This brute-force approach works fine in the 2 × 2 case but not in the n × n case.)
(b) Express the given information about T from part (a) in the form [T][A] = [B], and determine the matrix [T] that represents T by using the matrix [A]^(-1). (This approach will work in the general case once you know how to invert an n × n matrix.)
The last two problems require R scripts. It is fine to copy and edit similar scripts from the course Web site, but it is unacceptable to copy and edit your classmates' scripts!
8. (similar to script 1.1C, topic 5)
Let ~v1 and ~v2 denote the columns of a 2 × 2 matrix M. Write an R script that draws a diagram to illustrate the rule for the sign of det M, namely

If you have to rotate ~v1 counterclockwise (through less than 180°) to make it line up with ~v2, then det M > 0.

If you have to rotate ~v1 clockwise (through less than 180°) to make it line up with ~v2, then det M < 0.

If ~v1 and ~v2 lie on the same line through the origin, then det M = 0.
9. (similar to script 1.1D, topic 2)
Busch Gardens proposes to open a theme park in Beijing, with four regions connected by monorail. From region 1 (the Middle Kingdom), a guest can ride on a two-way monorail to region 2 (Tibet), region 3 (Shanghai), or region 4 (Hunan) and back. Regions 2, 3, and 4 are connected by a one-way monorail that goes from 2 to 3 to 4 and back to 2.
(a) Draw a diagram to show the four regions and their monorail connections.
(b) Construct the 4 × 4 transition matrix A for this graph of four vertices.
(c) Using matrix multiplication in R, determine how many different sequences of four monorail rides start in Tibet and end in the Middle Kingdom.


MATHEMATICS 23a/E-23a, Fall 2015


Linear Algebra and Real Analysis I
Module #1, Week 2 (Dot and Cross Products, Euclidean Geometry of R^n)
Authors: Paul Bamberg and Kate Penner
R scripts by Paul Bamberg
Last modified: June 16, 2015 by Paul Bamberg
Reading
Hubbard, section 1.4
Proofs to present in section or to a classmate who has done them.

2.1 Given vectors ~v and ~w in Euclidean R^n, prove that |~v · ~w| ≤ |~v||~w| (Cauchy-Schwarz) and that |~v + ~w| ≤ |~v| + |~w| (triangle inequality). Use the distributive law for the scalar product and the fact that no vector has negative length.
(The standard version of this proof is in the textbook. An alternative is in sections 1.3 and 1.4 of the Executive Summary.)
2.2 For a 3 × 3 matrix A, define det(A) in terms of the cross and dot products of the columns of the matrix. Then, using the definition of matrix multiplication and the linearity of the dot and cross products, prove that det(AB) = det(A) det(B).

R Scripts
Scripts labeled A, B, ... are closely tied to the Executive Summary. Scripts labeled X, Y, ... are interesting examples. There is a narrated version on the Web site. Scripts labeled L are library scripts that you may wish to include in your own scripts.
Script 1.2A-LengthDotAngle.R
Topic 1 - Length, Dot Product, Angles
Topic 2 - Components of a vector
Topic 3 - Angles in Pythagorean triangles
Topic 4 - Vector calculation using components
Script 1.2B-RotateReflect.R
Topic 1 - Rotation matrices
Topic 2 - Reflection matrices
Script 1.2C-ComplexConformal.R
Topic 1 - Complex numbers in R
Topic 2 - Representing complex numbers by 2x2 matrices
Script 1.2D-CrossProduct.R
Topic 1 - Algebraic properties of the cross product
Topic 2 - Geometric properties of the cross product
Topic 3 - Using cross products to invert a 3x3 matrix
Script 1.2E-DeterminantProduct.R
Topic 1 - Product of 2x2 matrices
Topic 2 - Product of 3x3 matrices
Script 1.2L-VectorLibrary.R
Topic 1 - Some useful angles and basis vectors
Topic 2 - Functions for working with angles in degrees
Script 1.2X-Triangle.R
Topic 1 - Generating and displaying a randomly generated triangle
Topic 2 - Checking some formulas of trigonometry
Script 1.2Y-Angles3D.R
Topic 1 - Angles between vectors in R3
Topic 2 - Angles and distances in a cube
Topic 3 - Calculating the airline mileage between cities

Executive Summary

1.1 The dot product

The dot product of two vectors in R^n is ~x · ~y = x1y1 + x2y2 + ... + xnyn = ∑_{i=1}^n x_i y_i

It requires two vectors and returns a scalar.


It is commutative and it is distributive with respect to addition.

In R^2 or R^3, the dot product of a vector with itself (a concept of algebra) is equal to the square of its length (a concept of geometry):

~x · ~x = |~x|²

Taking the dot product with any standard basis vector ~e_i extracts the corresponding component:

~x · ~e_i = x_i

Taking the dot product with any unit vector ~a (not necessarily a basis vector) extracts the component of ~x along ~a:

~x · ~a = x_a

This means that the difference ~x - x_a ~a is orthogonal to ~a.

1.2 Dot products and angles

We have the law of cosines, usually written c² = a² + b² - 2ab cos θ.

[Figure: triangle with sides along ~x (length a), ~y (length b), and ~x - ~y]

Consider the triangle whose sides lie along the vectors ~x (length a), ~y (length b), and ~x - ~y (length c). Let θ denote the angle between the vectors ~x and ~y.

By the distributive law,

(~x - ~y) · (~x - ~y) = ~x · ~x + ~y · ~y - 2~x · ~y = c² = a² + b² - 2~x · ~y

Comparing with the law of cosines, we find that angles and dot products are related by:

~x · ~y = ab cos θ = |~x||~y| cos θ

1.3 Cauchy-Schwarz inequality

The dot product provides a way to extend the definition of length and angle for vectors to R^n, but now we can no longer invoke Euclidean plane geometry to guarantee that |cos θ| ≤ 1.

We need to show that for any vectors ~v and ~w in R^n

|~v · ~w| ≤ |~v||~w|

This is generally known as the Cauchy-Schwarz inequality.
~ into unit
For a short proof of the Cauchy-Schwarz inequality, make ~v and w
vectors and form their sum and difference.
(

~
~v
~
~v
w
w

)(

)0
|~v| |~
w|
|~v| |~
w|

~v w
~
~v w
~
0, and by algebra |
|1
|~v||~
w|
|~v||~
w|
We now have a useful definition of angle for vectors in Rn in general:
1+12

= arccos

1.4

~v w
~
|~v||~
w|

The triangle inequality

If ~x and ~y, placed head-to-tail, determine two sides of a triangle, the third side coincides with the vector ~x + ~y.

[Figure: triangle with sides ~x, ~y, and ~x + ~y]

We need to show that its length cannot exceed the sum of the lengths of the other two sides:

|~x + ~y| ≤ |~x| + |~y|

The proof uses the distributive law for the dot product.

|~x + ~y|² = (~x + ~y) · (~x + ~y) = (~x + ~y) · ~x + (~x + ~y) · ~y

Applying Cauchy-Schwarz to each term on the right-hand side, we have:

|~x + ~y|² ≤ |~x + ~y||~x| + |~x + ~y||~y|

In the special case where |~x + ~y| = 0 the inequality is clearly true. Otherwise we can divide by the common factor of |~x + ~y| to complete the proof.

1.5 Isometries of R^2

A linear transformation T : R^2 → R^2 is completely specified by its effect on the basis vectors ~e1 and ~e2. These vectors are the two columns of the matrix that represents T. If you know what a transformation is supposed to do to each basis vector, you can simply use this information to fill out the necessary columns of its matrix representation.

Of special interest are isometries: transformations that preserve the distance between any pair of points, and hence the length of any vector.

Since dot products can be expressed in terms of lengths, it follows that any isometry also preserves dot products.

So the transformation T is an isometry if and only if, for any pair of vectors,

T~a · T~b = ~a · ~b

For the matrix associated with an isometry, both columns must be unit vectors and their dot product is zero.
Two isometries:

A rotation,
R(θ) =
[cos θ  -sin θ]
[sin θ   cos θ]
with det R = +1.

A reflection,
F(θ) =
[cos 2θ   sin 2θ]
[sin 2θ  -cos 2θ]
with det F = -1.

Matrix R represents a counterclockwise rotation through angle θ about the origin. Matrix F represents reflection in a line through the origin that makes an angle θ with the first standard basis vector.

There are many other isometries of Euclidean geometry: translations, or rotations about points other than the origin. However, these do not hold the origin fixed, and so they are not linear transformations and cannot be represented by 2 × 2 matrices.

Since the composition of isometries is an isometry, the product of any number of matrices of this type is another rotation or reflection. Remember that composition is a series of transformations acting on a vector in a specific order that must be preserved during multiplication.

1.6 Matrices and algebra: complex numbers

The same field axioms we reviewed on the first day apply here to the complex
numbers, notated C.
The real and imaginary parts of a complex number can be used as the two
components of a vector in R2 . The rule for addition of complex numbers is the
same as the rule for addition of vectors in R2 (in that they are to be kept separate
from each other), and the modulus of a complex number is the same as the length
of the vector that represents it. So the triangle inequality applies for complex
numbers: |z1 + z2 | |z1 | + |z2 |.
This property extends to vector spaces over complex numbers.

1.7 What about complex multiplication?

The geometrical interpretation of multiplication by a complex number z = a + ib = re^(iθ) is multiplication of the modulus by r combined with addition of θ to the angle with the x-axis.

This is precisely the geometrical effect of the linear transformation represented by the matrix

[a  -b]   [r cos θ  -r sin θ]
[b   a] = [r sin θ   r cos θ]

Such a matrix is the product of the constant matrix
[r 0]
[0 r]
and the rotation matrix
[cos θ  -sin θ]
[sin θ   cos θ]

It is called a conformal matrix and it preserves angles even though it does not preserve lengths.

1.8 Complex numbers as a field of matrices

In general, matrices do not form a field because multiplication is not commutative. There are two notable exceptions: n × n matrices that are multiples of the identity matrix and 2 × 2 conformal matrices. Since multiples of the identity matrix and rotations all commute, the product of two conformal matrices
[a  -b]
[b   a]
and
[c  -d]
[d   c]
is the same in either order.

1.9 The cross product

~a × ~b = (a1, a2, a3) × (b1, b2, b3) = (a2b3 - a3b2, a3b1 - a1b3, a1b2 - a2b1)

Properties
1. ~a × ~b = -~b × ~a.
2. ~a × ~a = ~0.
3. For fixed ~a, ~a × ~b is a linear function of ~b, and vice versa.
4. For the standard basis vectors, ~e_i × ~e_j = ~e_k if i, j and k are in cyclic increasing order (123, 231, or 312). Otherwise ~e_i × ~e_j = -~e_k.
5. ~a × ~b · ~c = ~a · ~b × ~c. This quantity is also the determinant of the matrix whose columns are ~a, ~b, and ~c.
6. (~a × ~b) × ~c = (~a · ~c)~b - (~b · ~c)~a
7. ~a × ~b is orthogonal to the plane spanned by ~a and ~b.
8. |~a × ~b|² = |~a|²|~b|² - (~a · ~b)²
9. The length of ~a × ~b is |~a||~b| sin θ.
10. The length of ~a × ~b is equal to the area of the parallelogram spanned by ~a and ~b.

1.10 Cross product and determinants

If a 3 × 3 matrix A has columns ~a1, ~a2, and ~a3, then its determinant is det(A) = ~a1 × ~a2 · ~a3.

1. det(A) changes sign if you interchange any two columns. (easiest to prove for columns 1 and 2, but true for any pair)
2. det(A) is a linear function of each column (easiest to prove for column 3, but true for any column)
3. For the identity matrix I, det(I) = 1.

The magnitude of ~a × ~b · ~c is equal to the volume of the parallelepiped spanned by ~a, ~b and ~c.

If C = AB, then det(C) = det(A) det(B).

Lecture Outline
1. Introducing coordinates:
For three-dimensional geometry, we choose a specific point, the origin, to correspond to the element O = (0, 0, 0) of R^3. We also choose three orthogonal, oriented coordinate axes and a unit of length, which determine the standard basis vectors. These are a right-handed basis: if you hold your right hand so that the thumb points along ~e3, then the fingers of your right hand carry ~e1 into ~e2 the short way around, through 90 rather than 270 degrees. Now any point p_a of Euclidean geometry can be represented by a vector in R^3,

~a = a1~e1 + a2~e2 + a3~e3 = (a1, a2, a3).

The length of ~a is √(a1² + a2² + a3²). All the basis vectors have unit length.

For two-dimensional geometry, there are two alternatives. The simpler is to make the origin correspond to the element O = (0, 0) of R^2 and to choose two coordinate axes. Then a point p_a of Euclidean plane geometry can be represented by a vector in R^2,

~a = a1~e1 + a2~e2 = (a1, a2). The length of ~a is √(a1² + a2²).

Another way to do plane geometry is to use the plane x3 = 1. This is not a subspace of R^3, since it does not include the zero vector. The origin of the plane corresponds to a non-zero element of R^3, p0 = (0, 0, 1), and an arbitrary point of the plane is the element p_a = (a1, a2, 1). Two points determine a vector, whose third component is always 0. The length of the vector determined by p_a and p0 is √(a1² + a2²). Now any transformation of Euclidean plane geometry that preserves distance, even one like a translation that moves the origin, can be represented by a linear transformation of R^3. However, only a transformation A that carries the plane x3 = 1 into itself has geometrical significance. What does this imply about the bottom row of the matrix A?

2. The dot product:


This is defined for vectors in R^n as

~x · ~y = x1y1 + x2y2 + ... + xnyn

It has the following properties. The proof of the first four (omitted) is brute-force computation.

Commutative law: ~x · ~y = ~y · ~x

Distributive law: ~x · (~y1 + ~y2) = ~x · ~y1 + ~x · ~y2

For Euclidean geometry, in R^2 or R^3, the dot product of a vector with itself (defined by algebra) is equal to the square of its length (a physically meaningful quantity).

Taking the dot product with any standard basis vector ~e_i extracts the corresponding component: ~x · ~e_i = x_i

Taking the dot product with any unit vector ~a (not necessarily a basis vector) extracts the component of ~x along ~a: ~x · ~a = x_a

This means that the difference ~x - x_a ~a is orthogonal to ~a.

Proof: Orthogonality of two vectors means that their dot product is zero. So to show orthogonality, evaluate (~x - (~x · ~a)~a) · ~a.

3. Dot products and angles


From elementary trigonometry we have the law of cosines, usually written c² = a² + b² - 2ab cos θ.

In this formula, c denotes the length of the side opposite angle θ. Just in case you forgot the proof, let's review it.

Angles and dot products are related by the formula

~x · ~y = |~x||~y| cos θ

Proof (Hubbard, page 69):
Consider the triangle whose sides lie along the vectors ~x, ~y, and ~x - ~y, and let θ denote the angle between the vectors ~x and ~y.

c² = (~x - ~y) · (~x - ~y).

Expand the dot product using the distributive law, and you can identify one of the terms as -2ab cos θ.


4. Cauchy-Schwarz inequality
The dot product provides a way to extend the definition of length and angle for vectors to R^n, but now we can no longer invoke Euclidean plane geometry to guarantee that |cos θ| ≤ 1.

We need to show that for any vectors ~v and ~w in R^n,

|~v · ~w| ≤ |~v||~w|

This is generally known as the Cauchy-Schwarz inequality. Hubbard points out that it was first published by Bunyakovsky. This fact illustrates Stigler's Law of Eponymy:

"No law, theorem, or discovery is named after its originator."

The law applies to itself, since long before Stigler formulated it, A. N. Whitehead noted that, "Everything of importance has been said before, by someone who did not discover it."

The best-known proof of the Cauchy-Schwarz inequality incorporates two useful strategies.

No vector has negative length.

Discriminant of a quadratic equation.

Define a quadratic function of the real variable t by

f(t) = |t~v - ~w|² = (t~v - ~w) · (t~v - ~w)

Since f(t) is the square of the length of a vector, it cannot be negative, so the quadratic equation f(t) = 0 does not have two real roots.

But by the quadratic formula, if the equation at² + bt + c = 0 does not have two real roots, its discriminant b² - 4ac is not positive.

Complete the proof by writing out b² - 4ac ≤ 0 for the quadratic function f(t).


So we have a useful definition of angle for vectors in R^n in general:

θ = arccos( (~v · ~w)/(|~v||~w|) )

The function arccos(x) can be computed on your electronic calculator by summing an infinite series. It is guaranteed to return a value between 0 and π.

Example: In R^4, what is the angle between vectors (1, 2, 1, 0) and (0, 1, 1, 2)?

5. The triangle inequality (second part of proof 2.1)


If ~x and ~y, placed head-to-tail, determine two sides of a triangle, the third side coincides with the vector ~x + ~y. We need to show that its length cannot exceed the sum of the lengths of the other two sides:

|~x + ~y| ≤ |~x| + |~y|

The proof uses the distributive law for the dot product and the Cauchy-Schwarz inequality.

Express |~x + ~y|² as a dot product:

Apply the distributive law:

Use Cauchy-Schwarz to get an inequality for lengths:

Take the square root of both sides:


6. Proof 2.1 start to finish, done in a slightly different way

Given vectors ~v and ~w in Euclidean R^n, prove that |~v · ~w| ≤ |~v||~w| (Cauchy-Schwarz) and that |~v + ~w| ≤ |~v| + |~w| (triangle inequality). Use the distributive law for the scalar product and the fact that no vector has negative length.


7. Some short proofs that use the dot product:


(a) A triangle is formed by using vectors ~x and ~y, both anchored at one vertex. The vectors are labeled so that the longer one is called ~x: i.e. |~x| > |~y|. The vector ~x - ~y then lies along the third side of the triangle. Prove that

|~x - ~y| ≥ |~x| - |~y|.

[Figure: triangle with sides ~x, ~y, and ~x - ~y]

(b) Prove that the dot product of vectors ~x and ~y can be expressed solely in terms of lengths of vectors. It follows that an isometry, which by definition preserves lengths of all vectors, also preserves dot products and angles.

(c) A parallelogram has sides with lengths a and b. Its diagonals have lengths c and d. Prove the parallelogram law, which states that

c² + d² = 2(a² + b²).


8. Calculating angles and areas

Let ~v1 = (2, 2, 1), ~v2 = (4, 1, -1).

Both these vectors happen to be perpendicular to the vector ~v3 = (1, -2, 2).

(a) Determine the angle between ~v1 and ~v2.

(b) Determine the volume of the parallelepiped spanned by ~v1, ~v2, and ~v3, and thereby determine the area of the parallelogram spanned by ~v1 and ~v2.
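(A minimal R sketch to check your answers; cross() is a hand-rolled helper, and the signs of the entries follow the reconstruction above:)

cross <- function(a, b) c(a[2]*b[3] - a[3]*b[2],
                          a[3]*b[1] - a[1]*b[3],
                          a[1]*b[2] - a[2]*b[1])
v1 <- c(2, 2, 1); v2 <- c(4, 1, -1); v3 <- c(1, -2, 2)
acos(sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))) * 180 / pi  # the angle
abs(sum(cross(v1, v2) * v3))   # volume of the parallelepiped
sqrt(sum(cross(v1, v2)^2))     # area of the parallelogram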


9. Isometries of R^2.

A linear transformation T : R^2 → R^2 is completely specified by its effect on the basis vectors ~e1 and ~e2. These vectors are the two columns of the matrix that represents T.

Of special interest are isometries: transformations that preserve the distance between any pair of points, and hence the length of any vector.

Since

4~a · ~b = |~a + ~b|² - |~a - ~b|²,

dot products can be expressed in terms of lengths, and any isometry also preserves dot products.

Prove this useful identity.

So T is an isometry if and only if

T~a · T~b = ~a · ~b for any pair of vectors.

This means that the first column of T must be a unit vector, which can be written without any loss of generality as (cos θ, sin θ).

The second column must also be a unit vector, and its dot product with the first column must be zero. So there are only two possibilities:

A rotation,
R(θ) =
[cos θ  -sin θ]
[sin θ   cos θ]
which has det R = 1.

A reflection,
F(θ) =
[cos 2θ   sin 2θ]
[sin 2θ  -cos 2θ]
which has det F = -1.
This represents reflection in a line through the origin that makes an angle θ with the first basis vector.

Since the composition of isometries is an isometry, the product of any number of matrices of this type is another rotation or reflection.


10. Using matrices to represent rotations and reflections

(a) Use matrix multiplication to show that if a counterclockwise rotation through angle α is followed by a counterclockwise rotation through angle β, the net effect is a counterclockwise rotation through angle α + β. (The proof requires some trig identities that you can rederive, if you ever forget them, by doing this calculation.)

(b) Confirm, both by geometry and by matrix multiplication, that if you reflect a point P first in the line y = 0, then in the line y = x, the net effect is to rotate the point counterclockwise through 90°.


11. Complex numbers as vectors and as matrices

The field axioms that you learned on the first day apply also to the complex numbers, notated C.

The real and imaginary parts of a complex number can be used as the two components of a vector in R^2. The rule for addition of complex numbers is the same as the rule for addition of vectors in R^2, and the modulus of a complex number is the same as the length of the vector that represents it. So the triangle inequality applies for complex numbers: |z1 + z2| ≤ |z1| + |z2|.

This property extends to vector spaces over complex numbers.

The geometrical interpretation of multiplication by a complex number z = a + ib = re^(iθ) is multiplication of the modulus by r combined with addition of θ to the angle with the x-axis.

This is precisely the geometrical effect of the linear transformation represented by the matrix

[a  -b]   [r cos θ  -r sin θ]
[b   a] = [r sin θ   r cos θ]

Such a matrix is the product of the constant matrix
[r 0]
[0 r]
and the rotation matrix
[cos θ  -sin θ]
[sin θ   cos θ]

It is called a conformal matrix and it preserves angles even though it does not preserve lengths.

Example: Compute the product of the complex numbers 2 + i and 3 + i by using matrix multiplication.
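(A minimal R sketch; R has complex numbers built in, which makes the check easy. The helper asMatrix() is my own name, not from the course scripts:)

asMatrix <- function(z) matrix(c(Re(z), -Im(z),
                                 Im(z),  Re(z)), nrow = 2, byrow = TRUE)
asMatrix(2 + 1i) %*% asMatrix(3 + 1i)   # product of the two conformal matrices
asMatrix((2 + 1i) * (3 + 1i))           # the matrix of the complex product: the same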


12. Complex numbers as a field of matrices

In general, matrices do not form a field because multiplication is not commutative. There are two notable exceptions: n × n matrices that are multiples of the identity matrix and 2 × 2 conformal matrices. Since multiples of the identity matrix and rotations all commute, the product of two conformal matrices
[a  -b]
[b   a]
and
[c  -d]
[d   c]
is the same in either order.


13. Cross products:

At this point it is inappropriate to try to define the determinant of an n × n matrix. For n = 3, however, anything that can be done with determinants can also be done with cross products, which are peculiar to R^3. So we will start with cross products:

Definition:

(a1, a2, a3) × (b1, b2, b3) = (a2b3 - a3b2, a3b1 - a1b3, a1b2 - a2b1)

Since this is a computational definition, the way to prove the following properties is by brute-force computation.

(a) ~a × ~b = -~b × ~a.
(b) ~a × ~a = ~0.
(c) For fixed ~a, ~a × ~b is a linear function of ~b, and vice versa.
(d) For the standard basis vectors, ~e_i × ~e_j = ~e_k if i, j and k are in cyclic increasing order (123, 231, or 312). Otherwise ~e_i × ~e_j = -~e_k.

You may find it easiest to calculate cross products in general as

(a1~e1 + a2~e2 + a3~e3) × (b1~e1 + b2~e2 + b3~e3),

using the formula for the cross products of basis vectors. Try this approach for

~a = (2, 1, 0), ~b = (0, 1, 3).

(e) ~a × ~b · ~c = ~a · ~b × ~c. No parentheses are necessary, because the operations only make sense if the cross product is done first. This quantity is also the determinant of the matrix whose columns are ~a, ~b, and ~c.

(f) (~a × ~b) × ~c = (~a · ~c)~b - (~b · ~c)~a

Physicists, memorize this formula! The vector in the middle gets the plus sign.
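(A numerical spot-check of formula (f) in R, on arbitrary vectors; cross() is a hand-rolled helper:)

cross <- function(u, v) c(u[2]*v[3] - u[3]*v[2],
                          u[3]*v[1] - u[1]*v[3],
                          u[1]*v[2] - u[2]*v[1])
set.seed(1)
a <- rnorm(3); b <- rnorm(3); cc <- rnorm(3)
cross(cross(a, b), cc)              # (a x b) x c
sum(a * cc) * b - sum(b * cc) * a   # (a.c)b - (b.c)a: the same vector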
14. Geometric properties of the cross product:


We can now prove these without messy calculations involving components. Justify each step, using properties of the dot product and properties (a) through (f) from the preceding page.

~a × ~b is orthogonal to the plane spanned by ~a and ~b.

Proof: Let ~v = s~a + t~b be a vector in this plane. Then
~v · ~a × ~b = s~a · ~a × ~b + t~b · ~a × ~b
~v · ~a × ~b = s~a · ~a × ~b - t~b · ~b × ~a
~v · ~a × ~b = s(~a × ~a) · ~b - t(~b × ~b) · ~a
~v · ~a × ~b = 0 - 0 = 0.

|~a × ~b|² = |~a|²|~b|² - (~a · ~b)²

Proof:
|~a × ~b|² = (~a × ~b) · (~a × ~b)
|~a × ~b|² = ((~a × ~b) × ~a) · ~b
|~a × ~b|² = ((~a · ~a)~b - (~a · ~b)~a) · ~b
|~a × ~b|² = (~a · ~a)(~b · ~b) - (~a · ~b)(~a · ~b)
|~a × ~b|² = |~a|²|~b|² - (~a · ~b)²

The length of ~a × ~b is |~a||~b| sin θ.

Proof:
|~a × ~b|² = |~a|²|~b|²(1 - cos²θ) = |~a|²|~b|² sin²θ

The length of ~a × ~b is equal to the area of the parallelogram spanned by ~a and ~b.

Proof: |~a| is the base of the parallelogram and |~b| sin θ is its height. Draw a diagram to illustrate this property.


15. Cross products and determinants.

You should be familiar with 2 × 2 and 3 × 3 determinants from high-school algebra. The general definition of the determinant, to be introduced in the spring term, underlies the general technique for calculating volumes in R^n and will be used to define differential forms.

If a 2 × 2 matrix A has columns (a1, a2) and (b1, b2), then its determinant is det(A) = a1b2 - a2b1.

Equivalently,

(a1, a2, 0) × (b1, b2, 0) = ( 0, 0, det [a1 b1; a2 b2] )

You can think of the determinant as a function of the entire matrix A or as a function of its two columns.

Matrix A maps the unit square, spanned by the two standard basis vectors, into a parallelogram whose area is |det(A)|.

Let's prove this for the case where all the entries of A are positive and det(A) > 0. The area of the parallelogram formed by the columns of A is twice the area of the triangle that has these columns as two of its sides. The area of this triangle can be calculated in terms of elementary formulas for areas of rectangles and right triangles.


16. Determinants in R^3

Here is our definition:
If a 3 × 3 matrix A has columns ~a1, ~a2, and ~a3, then its determinant is det(A) = ~a1 × ~a2 · ~a3.

Apply this definition to the matrix

A =
[1 0 1]
[2 1 2]
[0 1 0]

Check the following properties of the definition.


(a) det(A) changes sign if you interchange any two columns. (easiest to
prove for columns 1 and 2, but true for any pair)

(b) det(A) is a linear function of each column (easiest to prove for column
3, but true for any column)

(c) For the identity matrix I, det(I) = 1.
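(A minimal R sketch of this definition, checked against R's built-in det(); cross() is a hand-rolled helper, and the matrix entries follow the reconstruction above:)

cross <- function(u, v) c(u[2]*v[3] - u[3]*v[2],
                          u[3]*v[1] - u[1]*v[3],
                          u[1]*v[2] - u[2]*v[1])
det3 <- function(M) sum(cross(M[, 1], M[, 2]) * M[, 3])   # a1 x a2 . a3
A <- matrix(c(1, 0, 1,
              2, 1, 2,
              0, 1, 0), nrow = 3, byrow = TRUE)
det3(A)   # our definition
det(A)    # base R agrees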


17. Determinants, triple products, and geometry

The magnitude of ~a × ~b · ~c is equal to the volume of the parallelepiped spanned by ~a, ~b and ~c.

Proof: |~a × ~b| is the area of the base of the parallelepiped, and |~c| cos θ, where θ is the angle between ~c and the direction orthogonal to the base, is its height.

Matrix A maps the unit cube, spanned by the three basis vectors, into a parallelepiped whose volume is |det(A)|. You can think of |det(A)| as a "volume stretching factor." This interpretation will underlie much of the theory for change of variables in multiple integrals, a major topic in the spring term.

If three vectors in R^3 all lie in the same plane, the cross product of any two of them, which is orthogonal to that plane, is orthogonal to the third vector, so ~v1 × ~v2 · ~v3 = 0.

Apply this test to ~v1 = (1, 0, 1), ~v2 = (1, 2, 0), ~v3 = (3, 2, 2).

If four points in R3 all lie in the same plane, the vectors that join any one
of the points to
each
allliein that plane. Apply
of theother
three
points

1
2
2
4
this test to p = 1 , q = 1 , r = 3 , s = 3 .
1
2
1
3

24

18. Determinants and matrix multiplication


If C = AB, then det(C) = det(A) det(B)
This useful result is easily proved by brute force for 2 2 matrices, and a
brute-force proof in Mathematica would be valid for 3 3 matrices. Here
is a proof that relies on properties of the cross product.
Recall that each column of a matrix is the image of a standard basis vector.
Consider the first column of the matrix C = AB, and exploit the fact that
A is linear.
3
3
3
X
X
X
~
~c1 = Ab1 = A(
bi,1~ei ) =
bi,1 A(~ei )) =
bi,1~ai .
i=1

i=1

i=1

The same is true of the second and third columns.


Now consider det C = c~1 c~2 c~3 .
3
3
3
X
X
X
det C = (
bi,1 a~i ) (
bj,2 a~j ) (
bk,3 a~k )
i=1

j=1

k=1

Now use the distributive law for dot and cross products.

det C =

3
X

bi,1

i=1

3
X

bj,2

j=1

3
X

bk,3 (~
ai a~j a~k )

k=1

There are 27 terms in this sum, but all but six of them involve two subscripts
that are equal, and these are zero because a triple product with two equal
vectors is zero.
The six that are not zero all involve ~a1 ~a2 ~a3 , three with a plus sign and
three with a minus sign. So
det C = f (B)(~a1 ~a2 ~a3 ) = f (B) det(A), where f (B) is some messy
function of products of all the entries of B.
This formula is valid for any A. In particular, it is valid when A is the
identity matrix, C = B, and det(A) = 1.
So det B = f (B) det(I) = f (B)
and the messy function is the determinant!

25

19. Proof 2.2 start to finish


For a 3 3 matrix A, define det(A) in terms of the cross and dot products of the columns of the matrix. Then, using the definition of matrix
multiplication and the linearity of the dot and cross products, prove that
det(AB) = det(A) det(B).

26

20. Isometries of R2 .
A linear transformation T : R2 R2 is completely specified by its effect
on the basis vectors ~e1 and ~e2 . These vectors are the two columns of the
matrix that represents T .
Of special interest are isometries: transformations that preserve the distance between any pair of points, and hence the length of any vector.
Since
4~a ~b = |~a + ~b|2 |~a ~b|2 ,
dot products can be expressed in terms of lengths, and any isometry also
preserves dot products.
Prove this useful identity.

So T is an isometry if and only if


T~a T ~b = ~a ~b for any pair of vectors.
This means that the first column of T must be a unit vector, which can be
written without any loss of generality as


cos
.
sin
The second column must also be a unit vector, and its dot product with
the first column must be zero. So there are only two possibilities:
A rotation,


cos sin
R() =
,
sin cos
which has det R = 1.
A reflection,



cos 2 sin 2
F () =
,
sin 2 cos 2
which has det F = 1.
This represents reflection in a line through the origin that makes an
angle with the first basis vector.
Since the composition of isometries is an isometry, the product of any number of matrices of this type is another rotation or reflection.

27

21. Calculations with cross products


(a) Prove the identity
|~a ~b|2 = |~a|2 |~b|2 (~a ~b)2

(b) Prove that


|~a ~b| = |~a||~b| sin ,
where is the angle between vectors ~a and ~b.

28

22. Transposes and dot products


Start by proving in general that (AB)T = B T AT . This is a statement about
matrices, and you have to prove it by brute force.

~ can also be written in terms of matrix


The dot product of vectors ~v and w
multiplication as
~v w
~ = ~vT w
~
~ as an m 1 matrix.
where we think of ~vT as a 1 m matrix and think of w
The product is a 1 1 matrix, so it equals its own transpose.
~ . This theorem lets you move a matrix from
Prove that ~v A~
w = AT ~v w
one factor in a dot product to the other, as long as you replace it by its
transpose.

29

23. Orthogonal matrices


If a matrix R represents an isometry, then each column is a unit vector and
the columns are orthogonal. Since the columns of R are the rows of RT we
can express this property as
RT R = I
Perhaps a nicer way to express this condition for a matrix to represent an
isometry is RT = R1 . Check that this is true for the 2 2 matrices that
represent rotations and reflections.
For a rotation matrix



cos sin
R() =
.
sin cos

For a reflection matrix





cos 2 sin 2
F () =
.
sin 2 cos 2

30

24. Isometries and cross products


Many vectors of physical importance (torque, angular momentum, magnetic
field) are defined as cross products, so it is useful to know what happens to
a cross product when an isometry is applied to each vector in the product.
~.
Consider the matrix whose columns are R~u, R~v, and w
Multiply this matrix by RT to get a matrix whose columns are
~ . In the process you multiply the determinant by
RT R~u, RT R~v, and RT w
T
det(R ) = det(R).
~ = det(R)R~u R~v w
~
Now, since RT R = I for an isometry, ~u ~v RT w
~ = det(R)R~u R~v w
~.
Equivalently, R(~u ~v) w
~ , in particular for any basis vector, it follows
Since this is true for any w
that
R(~u ~v) = det(R)R~u R~v
If R is a rotation, then det(R) = 1 and R(~u ~v) = R~u R~v
If R is a reflection, then det(R) = 1 and R(~u ~v) = R~u R~v
This is reasonable. Suppose you are watching a physicist in a mirror as she
calculates the cross product of two vectors. You see her apparently using
a left-hand rule and think that she has got the sign of the cross-product
wrong.

31

25. Using cross products to invert a 3 3 matrix


Thinking about transposes also leads to a formula for the inverse of a 3 3
matrix in terms of cross products. Suppose that matrix A has columns
~a1 , ~a2 , and ~a3 . Form the vector ~s1 = ~a2 ~a3 .
This is orthogonal to ~a2 and ~a3 , and its dot product with ~a1 is det(A).
Similarly, the vector ~s2 = ~a3 ~a1
is orthogonal to ~a3 and ~a1 , and its dot product with ~a2 is det(A).
Finally, the vector ~s3 = ~a1 ~a2
is orthogonal to ~a1 and ~a2 , and its dot product with ~a3 is det(A).
So if you form these vectors into a matrix S and take its transpose,
S T A = det(A)I.
If det A = 0, A has no inverse. Otherwise
A1 =

ST
.
det(A)

You may have learned this rule in high-school algebra in terms of 2 2


determinants.
Summarize the proof that this recipe is correct.

32

33

Group Problems
1. Dot products, angles, and isometries
(a) Making the reasonable assumption that a rotation though angle 2 can
be accomplished by making two successive rotations through angle ,
use matrix multiplication to derive the double-angle formulas for the
sine and cosine functions.
~ . Using the dot
(b) Consider a parallelogram spanned by vectors ~v and w
product, prove that it is a rhombus if and only if the diagonals are
perpendicular and that it is a rectangle if and only if the diagonals are
equal in length.
(c) A parallelogram is spanned by two vectors that meet at a 60 degree
angle, one of which is twice as long as the other. Find the ratio of the
lengths of the diagonals and the cosine of the acute angle between the
diagonals. Confirm that the parallelogram law holds in this case.
2. Proofs that involve cross products
(a) Consider a parallelepiped whose base is a parallelogram spanned by
two unit vectors, anchored at the origin, with a 60 degree angle between them. The third side leaving the origin, also a unit vector,
makes a 60 degree angle with each of the other two sides, so that each
face is made of of a pair of equilateral triangles. Using dot and cross
products, show that the angle between the third side and a line that

bisects the angle between the other two sides satisfies cos = 1/ 3
and that the volume of this parallelepiped is 12 .
(b) Using the definition det(A) = ~a1 ~a2 ~a3 and properties of the dot and
cross products, prove that the determinant of a 3 3 matrix changes
sign if you swap the first column with the third column.
(c) Prove that the cross product, although not associative, satisfies the
Jacobi identity
(~a ~b) ~c + (~b ~c) ~a + (~c ~a) ~b = 0.

34

3. Problems that involve writing or editing R scripts


(a) Construct a triangle where vector AB has length 5 and is directed east,
while vector AC has length 10 and is drected 53 degrees north of east.
On side BC, construct point D that is 1/3 of the way from B to C.
Using dot products, confirm that the vector AD bisects the angle at
A.
This is a special case of Euclids Elements, Book VI, Proposition 3.
(b) You are playing golf, and the hole is located 350 yards from the tee in
a direction 18 degrees south of east. You hit a tee shot that travels 220
yards 14 degrees south of east, followed by an iron shot that travels
150 yards 23 degrees south of east. How far from the hole is your golf
ball now located?
(c) Generate a triangle using the function in the vector library 1.2LVectorLibrary.R, then apply to each vertex of this triangle the conformal matrix C that corresponds to the complex number 1.2 + 1.6i.
Plot the triangle before and after C is applied, and confirm that these
triangles are similar but not congruent.

35

Homework

In working on these problems, you may collaborate with classmates and consult
books and general online references. If, however, you encounter a posted solution
to one of the problems, do not look at it, and email Paul, who will try to get it
removed.
1. One way to construct a regular pentagon
O

Take five ball-point pens or other objects of equal length(call it 1) and


arrange them symmetrically, as shown in the diagram above, so that O, A, C
and O, B, D are collinear and |OC| = |OD|. Let AO = ~v, |BO| = |~v|,
~ , CA = x~v, |DB| = x|~v|.
CD = w
~ . By using the
(a) Express vectors AD and OB in terms of x, ~v, and w
~ , get two
fact that these vectors have the same length 1 as ~v and w
~ . (Use the distributive law for the dot
equations relating x and ~v w
product).
~ . Show that
(b) Eliminate x to find a quadratic equation satisfied by ~v w
~
~
the angle between v and w satisfies the equation
. (In case you have forgotsin 3 = sin 2 and that therefore = 2
5
2
ten, sin 3 = sin (4 cos 1)).
(c) Explain how, given five identical ball-point pens, you can construct a
regular pentagon. (Amazingly, the obvious generalization with seven
pens lets you construct a regular heptagon. Crockett Johnson claims
to have discovered this fact while dining with friends in a restaurant
in Italy in 1975, using a menu, a wine list, and seven toothpicks)

36

2. One vertex of a quadrilateral in R3 is located at point p. The other three


vertices, going around in order, are located at q = p + ~a, r = p + ~b, and
s = p + ~c.
(a) Invent an expression involving cross products that is equal to zero if
and only if the four vertices of the quadrilateral lie in a plane. (See
section problem 2 for a special case).
(b) Prove that the midpoints of the four sides pq, qr, rs, and sp are the
vertices of a parallelogram.
3. Isometries and dot products
The transpose of a (column) vector ~v is a row vector ~vT , which is also a
1 n matrix.
~ are vectors in Rn and A is an n n matrix.
Suppose that ~v and w
(a) Prove that ~v A~
w = ~vT A~
w. (You can think of the right-hand side as
the product of three matrices.)
~ . You can do this by brute force using
(b) Prove that ~v A~
w =AT ~v w
summation notation, or you can do it by using part (a) and the rule
for the transpose of a matrix product (Therem 1.2.17 in Hubbard).
~ are vectors in R3 and R is an 33 isometry
(c) Now suppose that ~v and w
~ = ~v w
~ . If you believe that physical laws
matrix. Prove that R~v Rw
should remain valid when you rotate your epcerimental apparatus, this
result shows that dot products are appropriate to use in expressing
physical laws.
4. Using vectors to prove theorems of trigonometry.
(a) For vectors ~a and ~b,
~a ~b = |~a||~b| sin , where is the angle between the vectors.
~ , and ~v w
~,
By applying this formula to a triangle whose sides are ~v, w
prove the Law of Sines.
~.
(b) Consider a parallelogram spanned by vectors ~v and w
~.
Its diagonal is ~v + w
Let denote the angle between ~v and the diagonal ; let denote the
~ and the diagonal. By expressing sines and cosines in
angle between w
terms of cross products, dot products, and lengths of vectors, prove
the addition formula
sin( + ) = sin cos + cos sin .

37

5. Let R() denote the 22 matrix that represents a counterclockwise rotation


about the origin through angle . Let F () denote the 2 2 matrix that
represents a reflection in the line through the origin that makes angle with
the x axis. Using matrix multiplication and the trigonometric identities
sin ( + ) = sin cos + cos sin
cos ( + ) = cos cos sin sin , prove the following:
(a) F ()F () = R(2( )).
(b) F ()F ()F () = F ( + ). (You might want to work problem 7
first.)
(c) The product of any even number of reflections in lines through the
origin is a rotation about the origin and the product of any odd number
of reflections in lines through the origin is a reflection in a line through
the origin. (Hint: use induction. First establish the base cases n = 1
and n = 2. Then do the inductive step: show that if the result is
true for the product of n reflections, it is true for n + 2 reflections.)
6. Matrices that represent complex numbers
(a) Confirm that i2 = 1 using conformal matrices.
(b) Represent 4 + 2i as a matrix. Square it and interpret its result as
a complex number. Confirm your answer by checking what you get
when expanding algebraically.
(c) Show that using matrices to represent complex numbers still preserves
addition as we would expect.
That is, write two complex numbers as matrices. Then add the matrices, and interpret the sum as a complex number. Confirm your answer
is correct algebraically.

38

The last two problems require R scripts. Feel free to copy and edit existing
scripts, including student solutions to group problem 3b, and to use the
library script 2l, which has functions for dealing with angles in degrees.
7. Vectors in two dimensions
(a) You are playing golf and have made a good tee shot. Now the hole is
located only 30 yards from your ball, in a direction 32 degrees north
of east. You hit a chip shot that travels 25 yards 22 degrees north of
east, followed by a putt that travels 8 yards 60 degrees north of east.
How far from the hole is your golf ball now located? For full credit,
include a diagram showing the relevant vectors.
(b) The three-reflections theorem, whose proof was problem 5b, states that
if you reflect successively in lines that make angle , , and with
the xaxis, the effect is simply to reflect in a line that makes angle
+ with the x-axis. Confirm this, using R, for the case where
= 40 , = 30 , and = 80 . Make a plot in R to show where the
point P = (1, 0) ends up after each of the three successive rotations.
8. Vectors in three dimensions (see script 2Y, topic 3)
The least expensive way to fly from Boston (latitude 42.36 N, longitude
71.06 W) to Naples (latitude 40.84 N, longitude 14.26 E) is to buy a ticket
on Aer Lingus and change planes in Dublin (latitude 53.35 S, longitude
6.26 W). Since Dublin is more than 10 degrees further north than either
Boston or Naples, it is possible that the stop in Dublin might lengthen the
journey substantially.
(a) Construct unit vectors in R3 that represent the positions of the three
cities.
(b) By computing angles between these vectors, compare the length in
kilometers of a nonstop flight with the length of a trip that stops
in Dublin. Remember that, by the original definition of the meter,
the distance from the North Pole to the Equator along the meridian
through Paris is 10,000 kilometers. (You may treat the Earth as a
sphere of unit radius.)
(c) Any city that is on the great-circle route from Boston to Naples has a
vector that lies in the same plane as the vectors for Boston and Naples.
Invent a test for such a vector (you may use either cross products or
determinants), and apply it to Dublin.

39

MATHEMATICS 23a/E-23a, Fall 2015


Linear Algebra and Real Analysis I
Module #1, Week 3 (Row Reduction, Independence, Basis)
Authors: Paul Bamberg and Kate Penner
R scripts by Paul Bamberg
Last modified:June 18, 2015 by Paul Bamberg
Reading
Hubbard, Sections 2.1 through 2.5
Proofs to present in section or to a classmate who has done them.
3.1. Prove that in Rn , n + 1 vectors are never linearly independent and
n 1 vectors never span. Explain how these results show that a matrix
that is not square cannot be invertible.
You may use illustrations with row reduction for a specific value of n, but
your argument must be independent of the value of n.
You may use that fact that any matrix can be row reduced by multiplying
it on the left by a product of invertible elementary matrices.
3.2. Equivalent descriptions of a basis:
Prove that a maximal set of linearly independent vectors for a subspace of
Rn is also a minimal spanning set for that subspace.

R Scripts
Script 1.3A-RowReduction.R
Topic 1 - Row reduction to solve two equations, two unknowns
Topic 2 - Row reduction to solve three equations, three unknowns
Topic 3 - Row reduction by elementary matrices
Topic 4 - Automating row reduction in R
Topic 5 - Row reduction to solve equations in a finite field
Script 1.3B-RowReductionApplications.R
Topic 1 - Testing for linear independence or dependence
Topic 2 - Inverting a matrix by row reduction
Topic 3 - Showing that a given set of vectors fails to span Rn
Topic 4 - Constructing a basis for the image and kernel
Script 1.3C-OrthonormalBasis.R
Topic 1 - Using Gram-Schmidt to construct an orthonormal basis
Topic 2 - Making a new orthonormal basis for R3
Topic 3 - Testing the cross-product rule for isometries
Script 1.3P-RowReductionProofs.R
Topic 1 - In Rn , n + 1 vectors cannot be independent
Topic 2 - In Rn , n 1 vectors cannot span
Topic 3 - An invertible matrix must be square

Executive Summary

1.1

Row reduction for solving systems of equations

When you solve the equation A~v = ~b you combine the matrix A and the vector
~b into a single matrix. Here is a simple example.
x + 2y = 7, 2x + 5y = 16

 
 
1 2
x ~
7
Then A =
, ~v =
,b=
, so that A~v = ~b exactly corresponds
2 5
y
16


1 2 7
to our system of equations. Our matrix of interest is therefore
2 5 16
First,
subtract
twice
row
1
from
row
2,
then
subtract
twice
row
2 from row 1


1 0 3
to get
0 1 2
Interpret the result as a pair of equations (remember what each column corresponded to when we first appended A and ~b together: x = 3, y = 2.
The final form we are striving for is row-reduced echelon form, in which


The leftmost nonzero entry in every row is a pivotal 1.


Pivotal 1s move to the right as you move down the matrix.
A column with a pivotal 1 has 0 for all its other entries.
Any rows with all 0s are at the bottom.
The row-reduction algorithm converts a matrix to echelon form. Briefly,
1. SWAP rows, if necessary, so that the leftmost column that is not all zeroes
has a nonzero entry in the first row.
2. DIVIDE by this entry to get a pivotal 1.
3. SUBTRACT multiples of the first row from the others to clear out the rest
of the column under the pivotal 1.
4. Repeat these steps to get a pivotal 1 in the next row, with nothing but
zeroes elsewhere in the column (including in the first row). Continue until
the matrix is in echelon form.
A pivotal 1 in the final column indicates no solutions. A bottom row full of
zeroes means that there are infinitely many solutions.
Row reduction can be used to find the inverse of a matrix. By appending
the appropriately sized identity matrix, row reducing will give the inverse of the
matrix.
3

1.2

Row reduction by elementary matrices

Each basic operation in the row-reduction algorithm for a matrix A can be


achieved by multiplication on the left by an appropriate invertible elementary
matrix.
Type 1: Multiplying the kth row by a scalar m is accomplished by an elementary matrix formed by starting with the identity matrix and replacing
the kth element of the diagonal by the scalar m.

1 0 0
Example: E1 = 0 3 0 multiplies the second row of matrix A by 3.
0 0 1
Type 2: Adding b times the jth row to the kth row is accomplished by an
elementary matrix formed by starting with the identity matrix and changing
the jth element in the kth row for 0 to the scalar b.

1 3 0
Example: E2 = 0 1 0 adds three times the second row of matrix A to
0 0 1
the first row.
You want to multiply the second row of A by 3, so the 3 must be in the
second column of E2 . Since the 3 is in the first row of E2 , it will affect the
first row of E2 A.
Type 3: Swapping row j with row k is accomplished by an elementary
matrix formed by starting with the identity matrix, changing the jth and
kth elements on the diagonal to 0, and changing the entries in row j, column
k and in row k, column j from 0 to 1.

0 0 1
Example: E3 = 0 1 0 swaps the first and third rows of matrix A.
1 0 0
Suppose that A|I row-reduces to A0 |B. Then EA = A0 and EI = B, where
E = Ek E2 E1 is a product of elementary matrices. Since each elementary
matrix is invertible, so is E. Clearly E = B, which means that we can construct
E during the row-reduction process by appending the identity matrix I to the
matrix A that we are row reducing.
If matrix A is invertible, then A0 = I and E = A1 . However, the matrix
E is invertible even when the matrix A is not invertible. Remarkably, E is also
unique: it comes out the same even if you carry out the steps of the row-reduction
algorithm in a non-standard order.

1.3

Row reduction for determining linear independence

Given a set of elements such as {a1 , a2 , a3 , a4 }, a linear combination is the name


given to any arbitrary sum of scalar multiples of those elements. For instance:
a1 2a2 + 4a3 5a4 is a linear combination of the above set.
Given some set of vectors, we describe the set as linearly independent if
none of the vectors can be written as a linear combination of the others. Similarly,
we describe the set as linearly dependent if one or more of the vectors can be
written as a linear combination of the others.
A subspace is a set of vectors (usually an infinite number of them) that is
closed under addition and scalar multiplication. Closed means that the sum
of any two vectors in the set is also in the set and any scalar multiple of a vector
in the set is also in the set. A subspace of F n is the set of all possible linear
combinations of some set of vectors. This set is said to span or to generate the
subspace
A subspace W F n has the following properties:
1. The element ~0 is in W .
~ is also in W .
2. For any two elements ~u, ~v in W , the sum ~u + w
3. For any element ~v in W and any scalar c in F , the element c~v is also in W .
A basis of a vector space or subspace is a linearly independent set that spans
that space.
The definition of a basis can be stated in three equivalent ways, each of which
implies the other two:
a) It is a maximal set of linearly independent vectors in V : if you add any
other vector in V to this set, it will no longer be linearly independent.
b) It is a minimal spanning set: it spans V , but if you remove any vector from
this set, it will no longer span V .
c) It is a set of linearly independent vectors that spans V .
The number of elements in a basis for a given vector space is called the dimension
of the vector space. A subspace has at most the same dimension as the space of
which it is a subspace.
By creating a matrix whose columns are the vectors in a set and row reducing,
we can find a maximal linearly independent subset, namely the columns that
become columns with pivotal 1s. Any column that becomes a column without a
pivotal 1 is a linear combination of the columns to its left.

1.4

Finding a vector outside the span

To show that a set of vectors {~v1 , ~v2 , , ~vk } does not span F n , we must exhibit
~ that is not a linear combination of the vectors in the given set.
a vector w
Create an n k matrix A whose columns are the given vectors.
Row-reduce this matrix, forming the product E of the elementary matrices
that accomplish the row reduction.
If the original set of vectors spans F n , the row-reduced matrix EA will
have n pivotal columns. Otherwise it will have fewer than n pivotal 1s, and
there will be a row of zeroes at the bottom. If that is the case, construct
~ = E 1~en .
the vector w
Now consider what happens when you row reduce the matrix A|~
w. The
~ is independent
last column will contain a pivotal 1. Therefore the vector w
of the columns to its left: it is not in the span of the set {~v1 , ~v2 , , ~vk } .
If k < n, then matrix A has fewer than n columns, so the matrix EA has
fewer than n pivotal columns and must have a row of zeroes at the bottom. It
~ = E 1~en can be constructed and that a set of fewer
follows that the vector w
than n vectors cannot span F n .

1.5

Image, kernel, and the dimension formula

Consider linear transformation T : Rn Rm , represented by matrix [T ].


The image of T , Img T , is the set of vectors that lie in the subspace spanned
by the columns of [T ].
Img T is a subspace of Rm . Its dimension is r, the rank of matrix [T ].
A solution to the system of equations T (~x) = ~b is guaranteed to exist
(though it may not be unique) if and only if Img T is m-dimensional.
To find a basis for Img T , use the columns of the matrix [T ] that become
pivotal columns as a result of row reduction.
The kernel of T , Ker T , is the set of vectors ~x for which T (~x) = 0.
Ker T is a subspace of Rn .
A system of equations T (~x) = ~b has a unique solution (though perhaps no
solution exists) if and only if Ker T is zero-dimensional.
There is an algorithm (Hubbard pp 196-197) for constructing an independent vector in Ker T from each of the n r nonpivotal columns of [T ].
Since dim Img T = r and dim Ker T = n r,
dim Img T + dim Ker T = n (the rank-nullity theorem.)
6

1.6

Linearly independent rows

Hubbard (page 200) gives two arguments that the number of linearly independent
rows of a matrix equals its rank. Here is yet another.
Swap rows to put a nonzero row as the top row. Then swap a row that is
linearly independent of the top row into the second position. Swap a row that is
linearly independent of the top two rows into the third position. Continue until
the top r rows are a linearly independent set, while each of the bottom m r
rows is a linear combination of the top r rows.
Continuing with elementary row operations, subtract appropriate multiples
of the top r rows from each of the bottom rows in succession, reducing it to
zero. (Easy in principle but hard in practice!). The top rows, still untouched,
are linearly independent, so there is no way for row reduction to convert any of
them to a zero row. In echelon form, the matrix will have r pivotal 1s: rank r.
It follows that r is both the number of linearly independent columns and the
number of linearly independent rows: the rank of A is equal to the rank of its
transpose AT .

1.7

Orthonormal basis

A basis is called orthogonal if any two distinct vectors in the basis have a dot
product of zero. If, in addition, each basis vector is a unit vector, then the basis
is called orthonormal.
Given any basis {~v1 , ~v2 , , ~vk } of a subspace W and any vector ~x W , we
can express ~x as a linear combination of the basis vectors:
~x = c1~v1 + c2~v2 + + ck ~vk ,
but determining the coefficients requires row reducing a matrix.
If the basis {~v1 , ~v2 , , ~vk } is orthonormal, just take the dot product with ~vi
to determine that ~x ~vi = ci .
We can convert any spanning set of vectors into a basis. Here is the algorithm,
sometimes called the Gram-Schmidt process.
~ 1 : divide it by its length to make the first basis vector ~v1 .
Choose any vector w
~ 2 that is linearly independent of ~v1 and subtract off a multiple
Choose any vector w
of ~v1 to make a vector ~x that is orthogonal to ~v1 .
~x = w
~ 2 (~
w2 ~v1 )~v1
Divide this vector by its length to make the second basis vector ~v2 .
~ 3 that is linearly independent of ~v1 and ~v2 , and subtract off
Choose any vector w
multiples of ~v1 and ~v2 to make a vector ~x that is orthogonal to both ~v1 and ~v2 .
~x = w
~ 3 (~
w3 ~v1 )~v1 (~
w3 ~v2 )~v2
Divide this vector by its length to make the third basis vector ~v3 .
Continue until you can no longer find any vector that is linearly independent of
your basis vectors.

Lecture Outline
1. Row reduction
This is just an organized version of the techniques for solving simultaneous
equations that you learned in high school.
When you solve the equation A~x = ~b you combine the matrix A and the
vector ~b into a single matrix. Here is a simple example.
The equations are
x + 2y = 7
2x + 5y = 16.



 
1 2 ~
7
Then A =
,b=
,
2 5
16



1 2 7
and we must row-reduce the 2 3 matrix
.
2 5 16
First, subtract twice row 1 from row 2 to get

Then subtract twice row 2 from row 1 to get

Interpret the result as a pair of equations:

Solve these equations (by inspection) for x and y

You see the general strategy. First eliminate x from all but the first equation, then eliminate y from all but the second, and keep going until, with
luck, you have converted each row into an equation that involves only a
single variable with a coefficient of 1.

2. Echelon Form
The result of row reduction is a matrix in echelon form, whose properties
are carefully described on p. 165 of Hubbard (definition 2.1.5). Here is
Hubbards messiest example:

0 1 3 0 0 3 0 4
0 0 0 1 2 1 0 1 .
0 0 0 0 0 0 1 2
Key properties:
The leftmost nonzero entry in every row is a pivotal 1.
Pivotal 1s move to the right as you move down the matrix.
A column with a pivotal 1 has 0 for all its other entries.
Any rows with all 0s are at the bottom.
If a matrix is not is echelon form, you can convert it to echelon form by
applying one or more of the following row operations.
(a) Multiply a row by a nonzero number.
(b) Add (or subtract) a multiple of one row from another row.
(c) Swap two rows.
Here are the whats wrong? examples from Hubbard. Find row operations that fix them.

1 0 0 2
0 0 1 1 .
0 1 0 1

1 1 0 1
0 0 2 0 .
0 0 0 1

0 0 0
1 0 0 .
0 1 0

0 1 0 3 0 3
0 0 1 1 1 1 .
0 0 0 0 1 2

3. Row reduction algorithm


The row-reduction algorithm (Hubbard, p. 166) converts a matrix to echelon form. Briefly,
(a) SWAP rows so that the leftmost column that is not all zeroes has a
nonzero entry in the first row.
(b) DIVIDE by this entry to get a pivotal 1.
(c) SUBTRACT multiples of the first row from the others to clear out the
rest of the column under the pivotal 1.
(d) Repeat these steps to get a pivotal 1 in the second row, with nothing
but zeroes elsewhere in the column (including in the first row).
(e) Repeat until the matrix is in echelon form.

0 3 3 6
Carry out this procedure to row-reduce the matrix 2 4 2 4 .
3 8 4 7

10

4. Solving equations
Once you have row-reduced the matrix, you can interpret it as representing

x = ~b,
the equation A~
which has the same solutions as the equation with which you started, except
that now they can be solved by inspection.

A pivotal 1 in the last column ~b is the kiss of death, since it is an equation like 0x + 0y = 1. There is no solution. This happens in the second
Mathematica example,

1 0 1 0
where row reduction leads to 0 1 1 0 .
0 0 0 1
Otherwise, choose freely the values of the active unknowns in the nonpivotal columns(excluding the last one). Then each row gives the value of
the passive unknown in the column that has the pivotal 1 for that row.
This happens in the third Mathematica example,

1 0 1 23
2 1 3 1
where row reduction converts 1 1 0 1 to 0 1 1 31 .
0 0 0 0
1 1 2 31
The only nonpivotal column(except the last one) is the third. So we can
choose the value of the active unknown z freely.
Then the first row gives x in terms of z: x =

2
3

z.

The second row gives y in terms of z: y = 31 z.


If there are as many equations as unknowns, this situation is exceptional.
If there are fewer equations than unknowns, it is the usual state of affairs.
Expressing the passive variables in terms of the active ones will be the
subject of the important implicit function theorem in outline 9.
A column that is all zeroes is nonpivotal. Such a column must have been
there from the start; it cannot come about as a result of row reduction.
It corresponds to an unknown that was never mentioned. This sounds
unlikely, but it can happen when you represent a system of equations by
an arbitrary matrix.
Example: In R3 , solve the equations x = 0, y = 0 (z not mentioned)

11

5. Many for the price of one


If you have several equations with the same matrix A on the left and different vectors on the right, you can solve them all in the process of rowreducing A. This is Example 2.2.10, also done in Mathematica. Row reduction is more efficient than computing A1 , and it works even when A is
not invertible. Here is simple example with a non-invertible A:
x + 2y = 3
2x + 4y = 6
x + 2y = 3
2x + 4y = 7
The first pair has infinitely many solutions: choose any y and take x =
3 2y. The second set has none.
We must row-reduce the 2 4 matrix


1 2 3 3
.
2 4 6 7
This quickly gives


1 2 3 3
0 0 0 1
and then


1 2 3 0
0 0 0 1
The last column has a pivotal 1 no solution for the second set.
The third column has no pivotal 1, and the second column is also nonpivotal,
so there are multiple solutions for the first set of equations. Make a free
choice of the active variable y that goes with nonpivotal column 2.
How does the first row now determine the passive unknown x?

12

6. When is a matrix invertible?


Our definition of the inverse A1 of a matrix A requires it to be both a left
inverse and a right inverse: A1 A = I and AA1 = I.
We have also proved that the inverse of a matrix, if it exists, must be unique.
The notation I for the identity obscures the fact that one identity matrix
might be m m, the other n n, in which case we would have an invertible
non-square matrix. Now is the time to prove that this cannot happen: only
a square matrix can be invertible. This theorem is the key to Hubbards
proof of the most important theorem of linear algebra, which says that the
dimension of a vector space is well defined. The proof relies explicitly on
row reduction.
If A is invertible, a unique solution to A~x = ~b exists.
Existence: Prove that ~x = A1 ~b is a solution.

Uniqueness: Argue that the uniqueness of the solution follows from


the uniqueness of A1 .

Now we must show that if A~x = ~b has a unique solution, the number
of rows m must equal the number of columns n. Consider solving
A~x = ~b by row reduction, converting A to matrix A in echelon form.
To show that m = n, show that m n and n m.
If A has more rows than columns, there is no existence. Row reduction
must leave at least one row of zeroes at the bottom, and there exists
~b for which A~x = ~b has no solution.
If A has more columns than rows, there is no uniqueness. Row reduction must leave at least one nonpivotal column, and the solution to
A~x = ~b is not unique.
So if A is invertible, and A~x = ~b therefore has a unique solution, A
must be a square matrix.

13

7. Matrix inversion by row reduction


If A is square and you choose each standard basis vector in turn for the
right-hand side, then row reduction constructs the inverse of A if it exists.


1 2
As a simple example, we invert A =
.
2 5
Begin by appending the standard basis vectors as third and fourth columns
to get


1 2 1 0
.
2 5 0 1
Now row-reduce this in two easy steps:

The right two columns of the row-reduced matrix are the desired inverse:
check it!

For matrices larger than 2 2, row reduction is a more efficient way of constructing a matrix inverse than any techniques involving determinants that
you may have
learned! Hubbard,
Example 2.3.4, is done in Mathematica.

2 1 3 1 0 0
1 0 0 3 1 4
The matrix 1 1 1 0 1 0 row reduces to 0 1 0 1 1 1 .
1 1 2 0 0 1
0 0 1 2 1
3
1
What are A and A ?

14

8. Elementary matrices:
Each basic operation in the row-reduction algorithm can be achieved by
multiplication on the left by an appropriate invertible elementary matrix.
Here are examples of the three types of elementary matrix.
For each, figure
2 4

out what row operation is achieved by converting A = 1 1 to EA.


1 0

0 0
Type 1: E1 = 0 1 0
0 0 1

1 2 0
Type 2: E2 = 0 1 0
0 0 1

0 0 1
Type 3: E3 = 0 1 0
1 0 0
2

In practice, use of elementary matrices does not speed up computation, but


it provides a nice way to think about row reduction for purposes of doing
proofs.
For example, as on page 180 of Hubbard, suppose that A|I row-reduces to
I|B.
Then EA = I and EI = B, where
E = Ek E2 E1 is a product of elementary matrices. Since each elementary
matrix is invertible, so is E. Clearly E = B, which means that we can
construct E during the row-reduction process.
It is by no means obvious that E is unique, and in fact the general proof is
left as an exercise (2.4.12) in Hubbard. But in the case where A row-reduces
to the identity there is an easy proof.
Start with EA = I.
Multiply by E 1 on the left, E on the right, to get
E 1 EAE = E 1 E,
from which it follows that AE = I. So E is also a right inverse of A. But
we earlier proved that if a matrix A has a right inverse and a left inverse,
both are unique.

15

9. Row reduction and elementary matrices


We want to solve the equations
3x + 6y = 21
2x + 5y = 16.



 
3 6 ~
21
Then A =
,b=
,
2 5
16



3 6 21
and we must row-reduce the 2 3 matrix
.
2 5 16
Use an elementary matrix to accomplish each of the three steps needed to
accomplish row reduction.
Matrix E1 divides the top row by 3.
Matrix E2 subtracts twice row 1 from row 2.
Matrix E3 subtracts twice row 2 from row 1.

Interpret the result as a pair of equations and solve them (by inspection)
for x and y.

Show that the product E3 E2 E1 is the inverse of A.

16

10. Linear combinations and span


The defining property of a linear function T : for any collection of k vectors
in F n , ~v1 , ~vk , and any collection of coefficients a1 ak in field F ,
k
k
X
X
T(
ai~vi ) =
ai T (~vi ).
i=1

The sum

Pk

i=1

i=1

ai~vi is called a linear combination of the vectors ~v1 , v~k .

The set of all the linear combinations of ~v1 , v~k is called the span of the
set ~v1 , ~vk .
Prove that it is a subspace of F n .





1
0
3
2
1

Suppose ~v1 = 2 , ~v2 = 1 , ~v3 = 1 ,~


w1 = 3 ,~
w2 = 0
0
1
1
2
1

~ 1 is a linear combination of ~v1 and ~v2 .


Show that w

Invent an easy way to describe the span of ~v1 , ~v2 , and ~v3 . (Hint:
consider the sum of the components.)

~ 2 is not in the span of ~v1 , ~v2 , and ~v3 .


Show that w

1
0
3
2 1
1

The matrix 2 1 1 3 0 row reduces to 0


1 1 2 1 0
0
How does this result answer the question of whether or
is in the span of ~v1 , ~v2 , and ~v3 ?

17

0 3 2 0
1 5 2 0.
0 0 0 1
~ 1 or w
~2
not w

11. Special cases:


~ is in the span of ~u if and only if it is a multiple of ~u.
In F n , w
~ is in the span
In F 2 , if ~v is not a multiple of ~u, then every vector w
of ~u and ~v.
Write an equivalent statement using negation, and use it to construct
an example.

~ is in the span of ~u and ~v if and only if it is orthogonal


In F 3 , a vector w
to ~u ~v.
Give a geometrical interpretation of this statement.

~ is in the span of ~u and ~v, then it is


Prove algebraically that if w
orthogonal to ~u ~v. (Proof strategy: interchange dot and cross)

If matrix [T ] represents linear transformation T , the image of T is the


span of the columns of [T ].

~ is in the span of ~v1 , ~v2 , ~vk if the system of


In general, a vector w
equations
~ has at least one solution. To check this,
x1~v1 + x2~v2 + xk ~vk = w
make all the vectors into a matrix and row-reduce it. If the last column
~ ) has a pivotal 1, then w
~ is not in the span of the
(corresponding to w
others. You have already seen one example, and there is another in
the Mathematica file.

18

12. Linear independence


~v1 , ~v2 , ~vk are linearly independent if the system of equations
~ has at most one solution.
x1~v1 + x2~v2 + + xk ~vk = w
To test for linear independence, make the vectors ~v1 , ~v2 , ~vk into a matrix
and row-reduce it. If any column is nonpivotal, then the vectors are linearly
dependent. There is an example in the Mathematica file.



1
2
0
1
0
2



The vectors to test for independence are ~v1 =
2, ~v2 = 1, ~v3 = 3.
1
1
1
~ is irrelevant and might as well be zero, so we just make a
The vector w
matrix from the three given vectors:

1 2 0
1 0 2
1 0 2
0 1 1

2 1 3 reduces to 0 0 0
1 1 1
0 0 0
The third column is nonpivotal; so the given vectors are linearly dependent.
How can you write the third one as a linear combination of the first two?


0
2

Change ~v3 to
1 and test again.
1

1 2 0
1 0
1 0 2
0 1

Now
2 1 1 reduces to 0 0
1 1 1
0 0

0
0

1
0

There is no nonpivotal column. The three vectors are linearly independent.


~ = ~0, as we have already done, leads to the standard definition of
Setting w
linear independence: if
a1~v1 + a2~v2 + ak ~vk = ~0
then a1 = a2 = = ak = 0.

19

13. Constructing a vector outside the span


The vectors are


4
2

~v1 = 2 , ~v2 = 1
3
2

4 2
1 0
A = 2 1 reduces to EA = 0 1, and the matrix that does the job is
3 2
0 0

1 0 1

E = 23 0 2 .
1
1 0
2
We want to append a third column ~b such that when we row reduce the
square matrix A|~b, the resulting matrix EA|E ~b will have a pivotal 1 in the
third column. In this case it will be in the bottom row. Since E, being a
product of elementary matrices, must be invertible, we compute

0
0
1

0 = 1
E
1
0

0

We have found a vector, 1, that is not in the span of ~v1 and ~v2 .
0
Key point: the proof relies on the fact that this procedure will always work,
because the matrix E that accomplishes row reduction is guaranteed to be
invertible!

20

14. Two key theorems; your proof 3.1


In Rn , a set of n + 1 vectors cannot be linearly independent.
If we start with n + 1 vectors in Rn , make a matrix that has these
vectors as its columns, and row-reduce, the best we can hope for is to
get a pivotal 1 in each of n columns. There must be at least one nonpivotal column (not necessarily the last column), and the n + 1 vectors
must be linearly dependent: they cannot be linearly independent.
Show what the row-reduced matrix looks like and how it is possible
for the non-pivotal column not to be the last column.

In Rn , a set of n 1 vectors cannot span.


Remember that span means
~ has at least one solution.
~
w, x1~v1 + x2~v2 + xk ~vk = w
Since exists is easier to work with than for all, convert this into a
definition of does not span. A set of k vectors does not span if
~ has no solution.
~
w such that x1~v1 + x2~v2 + xk ~vk = w
~ , using elementary matrices.
We invent a method for constructing w
Make a matrix A whose columns are ~v1 , ~v2 , ~vk , and row-reduce it
by elementary matrices whose product can be called E. Then EA is
in echelon form.
If A has only n 1 columns, it cannot have more than n 1 pivotal
1s, and there cannot be a pivotal 1 in the bottom row. That means
~ that row-reduced to a pivotal 1 in the last
that if we had chosen a w
row, the set of equations
~
x1~v1 + x2~v2 + xk ~vk = w
would have had no solution.
Now E is the product of invertible elementary matrices, hence invert~ = E 1 e~n as an example of a vector that is not
ible. Just construct w
in the span of the given n 1 vectors.

21

15. Proof 3.1 start to finish


Prove that in Rn , n + 1 vectors are never linearly independent and n 1
vectors never span.

22

16. Definition of basis


This is Hubbard, Definition 2.4.12. It is really a definition plus two theorems, but it can conveniently be left ambiguous which is which!
A basis for a subspace V Rn has the following equivalent properties:
(a) It is a maximal set of linearly independent vectors in V : if you add any
other vector in V to the set, it will no longer be linearly independent.
(b) It is a minimal spanning set: it spans V , but if you remove any vector
from the set, it will no longer span.
(c) It is a set of linearly independent vectors that spans V .
To show that any of these three properties implies the other two would
require six proofs. Lets do a couple. Call the basis vectors ~v1 , ~v2 , ~vk .
Prove that (a) implies (b) (this is your proof 3.2).
~ to the basis set, the resulting set
When we add any other vector w
is linearly dependent. Express this statement as an equation that
includes the term b~
w.

~ as a linear combination of the


Show that if b 6= 0, we can express w
basis set. This will prove spanning set.

To prove that b 6= 0, assume the contrary, and show that the vectors
~v1 , ~v2 , ~vk would be linearly dependent.

To prove minimal spanning set, just exhibit a vector that is not in


the span of ~v1 , ~v2 , ~vk1 .

23

Prove that (c) implies (a).


This is easier, since all we have to show is maximal. Add another
~ to the linearly independent spanning set ~v1 , ~v2 , ~vk . How
vector w
do we argue that this set is linearly dependent?

Prove that (c) implies (b).


All we have to show is minimal. Imagine removing the last vector.
To show that the set ~v1 , ~v2 , ~vk1 is not a spanning set, we need to
find one vector that cannot be a linear combination of these.

Now we combine this definition of basis with what we already know about
sets of vectors in Rn .
Our conclusions:
In Rn , a basis cannot have fewer than n elements, since they would not
span.
In Rn , a basis cannot have more than n elements, since they would not be
linearly independent.
So any basis must, like the standard basis, have exactly n elements.

24

17. Basis for a subspace


Consider any subspace E Rn . We need to prove the following:
E has a basis.
Any two bases for E have the same number of elements, called the
dimension of E.
Before the proof, consider an example.
E R3 is the set of vectors for which x1 + x2 + x3 = 0.


1
0
One basis is 0 and 1 .
1
1


1
1

Another basis is 2 and 1.


1
0
Its obvious that either basis is linearly independent, since neither basis
vector is zero, and one is not a multiple of the other.
How could we establish linear independence by using row reduction?

To show that each spans is less trivial. Fortunately, in this


simple
case we
a
can write an expression for the general element of E as b
a b
How would we express this general element as a linear combination of basis
vectors?

25

Now we proceed to the proof. First we must prove the existence of a basis
by explaining how to construct one.
How to make a basis for a non-empty subspace E in general:
Choose any ~v1 to get started. Notice that we need not specify a method for
doing this! The justification for this step is the so-called axiom of choice.
If ~v1 does not span E, choose ~v2 that is not in the span of ~v1 (not a multiple
of it). Again, we do not say how to do this, but it must be possible since
~v1 does not span E.
If ~v1 and ~v2 do not span E, choose ~v3 that is not in the span of ~v1 and ~v2
(not a linear combination).
Keep going until you have spanned the space. By construction, the set is
linearly independent. So it is a basis.
Second, we must prove that every basis has the same number of vectors.
Imagine that two people have done this and come up with bases of possibly
different sizes.
One is ~v1 , ~v2 , ~vm .
~ 1, w
~ 2, w
~ p.
The other is w
~ j as a linear combination of
Since each basis spans E, we can write each w
the ~v. It takes m coefficients to do this for each of the p vectors, so we end
~ j.
up with an m p matrix A, each of whose columns is one of the w
~ j . It takes p
We can also write each ~vi as a linear combination of the w
coefficients to do this for each of the m vectors, so we end up with a p m
matrix B, each of whose columns is one of the ~vi .
Clearly AB = I and BA = I. So A is invertible, hence square, and m = p.

26

18. Kernels and Images


Consider linear transformation T : Rn Rm . This can be represented by
a matrix, but we want to stay abstract for the moment.
The kernel of T , Ker T , is the set of vectors ~x for which T (~x) = 0.
A system of equations T (~x) = ~b has a unique solution if and only if
Ker T is zero-dimensional.
Assume that T (~x1 ) = ~b and T (~x2 ) = ~b.
Since T is linear,
T (~x1 ~x2 ) = ~b ~b = 0.
If the kernel is zero-dimensional, it contains only the zero vector, and
~x1 = ~x2 .
Conversely, if the solution is unique: the only way that ~x1 and ~x2 can
both be solutions is ~x1 = ~x2 , the kernel is zero-dimensional.
Ker T is a subspace of Rn .
Proof:
If ~x and ~y are elements of Ker T , then, because T is linear,
T (a~x + b~y) = aT (~x) + bT (~y) = 0.
~ for which ~v such that
The image of T , Img T , is the set of vectors w
~ = T (~v).
w
Img T is a subspace of Rm .
Proof:
~ 1 and w
~ 2 are elements of Img T , then
If w
~ 1 = T (~v1 ) and
~v1 such that w
~ 2 = T (~v2 )
~v2 such that w
~1 + b~
T (a~v1 + b~v2 ) = aT (~v1 ) + bT (v~2 ) = aw
w2 .
We have shown that any linear combination of elements of Img T is
also an element of Img T .

27

19. Basis for the image


To find a basis for the image of T , we must find a linearly independent
set of vectors that span the image. Spanning the image is not a problem:
the columns of the matrix for T do that. The hard problem is to choose a
linearly independent set. The secret is to use row reduction.
Each nonpivotal column is a linear combination of the columns to its left,
hence inappropriate to include in a basis. It follows that the pivotal columns
of T form a basis for the image. Of course, you can permute the columns
and come up with a different basis: no one said that a basis is unique.
This process of finding a basis for Img T is carried out in Mathematica.

1 2 1 1
1 2 0 2
The matrix T = 0 0 1 1 row reduces to 0 0 1 1.
2 4 1 3
0 0 0 0
By inspecting these two matrices, find a basis for Img T . Notice that the
dimension of Img T is 2, which is less than the number of rows, and that
the two leftmost columns do not form a basis.

28

20. Basis for the kernel

1 2 1 1
1 2 0 2
The matrix T = 0 0 1 1 row reduces to 0 0 1 1.
2 4 1 3
0 0 0 0
To find a basis for Ker T , look at the row-reduced matrix and identify
the nonpivotal columns. For each nonpivotal column i in turn, put a 1
in the position of that column, a 0 in the position of all other nonpivotal
columns, and leave blanks in the other positions. The resulting vectors must
be linearly independent, since for each of them, there is a position where
it has a 1 and where all the others have a zero. What are the resulting
(incomplete) basis vectors for Ker T ?

Now fill in the blanks: assign values in the positions of all the pivotal
columns so that T (v~i ) = 0. The vectors v~i span the kernel, since assigning a
value for each nonpivotal variable is precisely the technique for constructing
the general solution to T (~v) = 0.

29

21. Rank - nullity theorem


The matrix of T : Rn Rm has n columns. We row-reduce it and find r
pivotal columns and n r nonpivotal columns. The integer r is called the
rank of the matrix.
Each pivotal column gives rise to a basis vector for the image; so the dimension of Img T is r.
Each nonpivotal column gives rise to a basis vector for the kernel; so the
dimension of Ker T is n r.
Clearly, dim(Ker T ) + dim(Img T ) = n.
In the special case of a linear transformation T : Rn Rn , represented by
a square n n matrix, if the rank r = n then
any equation T (~v) = ~b has a solution, since the image is n-dimensional.
any equation T (~v) = ~b has a unique solution, since the kernel is 0dimensional.
T is invertible.

30

22. Linearly independent rows


Hubbard (page 200) gives two arguments that the number of linearly independent rows of a matrix equals its rank. Here is yet another.
Swap rows to put a nonzero row as the top row. Then swap a row that is
linearly independent of the top row into the second position. Swap a row
that is linearly independent of the top two rows into the third position.
Continue until the top r rows are a linearly independent set, while each of
the bottom m r rows is a linear combination of the top r rows.
Now, continuing with elementary row operations, subtract appropriate multiples of the top r rows from each of the bottom rows in succession, reducing
it to zero. (This is easy in principle but hard in practice!). The top rows,
still untouched, are linearly independent, so there is no way for row reduction to convert any of them to a zero row. In echelon form, the matrix will
have r pivotal 1s: its rank is r.
It follows that r is both the number of linearly independent columns and
the number of linearly independent rows: the rank of A is equal to the rank
of its transpose AT .

31

23. Orthonormal basis:


If we have a dot product, then we can convert any spanning set of vectors
into a basis. Here is the algorithm, sometimes called the Gram-Schmidt
process. We will apply it to the 3-dimensional subspace of R4 for which
the components sum to zero. Details of the computation are in the Mathematica file.
~ 1 and divide it by its length to make the first basis
Choose any vector w
vector ~v1 .

1
1

~1 =
If w
1 , what is ~v1 ?
1
~ 2 that is linearly independent of ~v1 and subtract off
Choose any vector w
a multiple of ~v1 to make a vector ~x that is orthogonal to ~v1 . Divide this
vector by its length to make the second basis vector ~v2 .

2
1

~2 =
~ 2 (~
If w
w2 ~v1 )~v1
1, calculate ~x = w
0

~ 3 that is linearly independent of ~v1 and ~v2 , and subtract


Choose any vector w
off multiples of ~v1 and ~v2 to make a vector ~x that is orthogonal to both ~v1
and ~v2 . Divide this vector by its length to make the third basis vector ~v3 .
Continue until you can no longer find any vector that is linearly independent
of your basis vectors.
3
1
1

25
2 5
2
1
3
1
2 5
2 5
2 , ~
~
Mathematica gives ~v1 =
v
=
,
v
=

.
1 2 3 3
21 5
2
2 5
1
3
12

2 5

2 5

A nice feature of an orthogonal basis (no need for it to be orthonormal) is


that any set of orthogonal vectors is linearly independent.
Proof: assume a1 v~1 + a2 v~2 + ak v~k = ~0.
Choose any v~i and take the dot product with both sides of this equation.
You get ai = 0 for all i, which establishes independence.

32

Group Problems
1. Row reduction and elementary matrices
(a) By row reducing an appropriate matrix to echelon form, solve the
system of equations
2x + y + z = 2
x + y + 2z = 2
x + 2y + 2z = 1
where all the coefficients and constants are elements of the finite field
Z3 . If there is no solution, say so. If there is a unique solution, specify
the values of x, y, and z. If there is more than one solution, determine
all solutions by giving formulas for two of the variables, perhaps in
terms of the third one.


1
2
(b) Find the inverse of A =
by using row reduction by means of
3 7
elementary matrices, as was done in sample problem 2. Confirm that
the product of the three elementary matrices that you use is indeed
the inverse. Use the familiar rule method for finding a 2 2 inverse to
check your answer!
(c) The matrix

0 1 2
A = 1 2 3
2 3 4
is not invertible. Nonetheless, there is a product E of three elementary
matrices, applied as was done in sample problem 2, that will reduce it
to echelon form. Find these three matrices and their product E.

33

2. Some short proofs


(a) Show that type 3 elementary matrices are not strictly necessary, because it is possible to swap rows of a matrix by using only type 1 and
type 2 elementary matrices. (If you can devise a way to swap the two
rows of a 2 2 matrix, that it sufficient, since it is obvious how the
technique generalizes.)
(b) Prove that if a set of linearly independent vectors spans a vector space
W, it is both a maximal linearly independent set and a minimal spanning set.
(c) Prove that in a vector space spanned by a single vector ~v, any two
vectors are linearly dependent. Then using this result, prove that in
~ 1, w
~2
a space spanned by two vectors ~v1 and ~v2 , any three vectors w
~ 3 must be linearly dependent. In the interest of simplicity. you
and w
~ 1 = a1~v1 + a2~v2 with a1 6= 0.
may assume that w
~ 1 and w
~ 2 and
Hint: Show how to construct a linear combination of w
~ 1 and w
~ 3 , neither of which involves ~v1 .
a linear combination of w

34

3. Problems to be solved by writing or editing R scripts.

(a) The director of a budget office has to make changes to four line items
in the budget, but her boss insists that they must sum to zero. Three
of her subordinates make the following suggestions, all of which lie in
the subspace of acceptable changes:



$$\vec{w}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \\ -6 \end{pmatrix}, \quad \vec{w}_2 = \begin{pmatrix} 3 \\ 2 \\ -2 \\ -3 \end{pmatrix}, \quad \vec{w}_3 = \begin{pmatrix} 3 \\ 1 \\ -2 \\ -2 \end{pmatrix}.$$
The boss proposes $\vec{y} = \begin{pmatrix} 1 \\ 1 \\ -2 \\ 0 \end{pmatrix}$, also acceptable on the grounds that it is simpler.
Express $\vec{y}$ as a linear combination of the $\vec{w}_i$. Then convert the $\vec{w}_i$ to an orthonormal basis $\vec{v}_i$ and express $\vec{y}$ as a linear combination of the $\vec{v}_i$.
(b) Find a basis for the image and the kernel of the matrix
$$A = \begin{pmatrix} 3 & 1 & 1 & 0 & 4 \\ 1 & 0 & 1 & 1 & 2 \\ 0 & 1 & 2 & 0 & 1 \\ 2 & 0 & 0 & 1 & 3 \end{pmatrix}.$$
Express the columns that are not in the basis for the image as linear combinations of the ones that are in the basis.
(c) Find two different solutions to the following set of equations in Z5 :
2x + y + 3z + w = 3
3x + 4y + 3w = 1
x + 4y + 2z + 4w = 2
(d) The R function
sample(0:2, n, replace=TRUE)
generates n random numbers, each equally likely to be 0, 1, or 2. Use
it to generate three equations of the form ax + by + cz + dw = e with
coefficients in Z3 , and solve them by row reduction. If the solution is
not unique, find two different solutions.
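Scripts for problems like this one have to do the modular arithmetic explicitly, since the usual rref() works over the reals. Here is a minimal sketch of row reduction over $Z_p$; rref.modp and modinv are helper names invented for this sketch, not functions from the course scripts.

# Row reduction over Z_p (here p = 3). R's %% operator keeps entries in 0..p-1.
modinv <- function(a, p) which((a * (1:(p - 1))) %% p == 1)  # inverse of a mod p
rref.modp <- function(M, p) {
  M <- M %% p; pivot <- 1
  for (col in 1:ncol(M)) {
    if (pivot > nrow(M)) break
    r <- which(M[pivot:nrow(M), col] != 0)  # look for a nonzero pivot entry
    if (length(r) == 0) next
    r <- r[1] + pivot - 1
    M[c(pivot, r), ] <- M[c(r, pivot), ]  # swap rows
    M[pivot, ] <- (M[pivot, ] * modinv(M[pivot, col], p)) %% p  # scale pivot to 1
    for (i in seq_len(nrow(M))[-pivot])
      M[i, ] <- (M[i, ] - M[i, col] * M[pivot, ]) %% p  # clear the rest of the column
    pivot <- pivot + 1
  }
  M
}
M <- matrix(sample(0:2, 15, replace = TRUE), 3)  # augmented matrix of three random equations
rref.modp(M, 3)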


Homework

In working on these problems, you may collaborate with classmates and consult
books and general online references. If, however, you encounter a posted solution
to one of the problems, do not look at it, and email Paul, who will try to get it
removed.
For the first three problems, do the row reduction by hand. That should give
you enough practice so that you can do row reduction by hand on exams. Then
you can use R to do subsequent row reduction.
1. By row reducing an appropriate matrix to echelon form, solve the system
of equations
2x + 4y + z = 2
3x + y = 1
3y + 2z = 3
over the finite field Z5 . If there is no solution, say so. If there is a unique
solution, specify the values of x, y, and z and check your answers. If there
is more than one solution, express two of the variables in terms of an arbitrarily chosen value of the third one. For full credit you must reduce the
matrix to echelon form, even if the answer becomes obvious!
2. (a) By using elementary matrices, find a vector that is not in the span of
$$\vec{v}_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad \vec{v}_2 = \begin{pmatrix} 0 \\ 2 \\ -2 \end{pmatrix}, \quad \vec{v}_3 = \begin{pmatrix} 2 \\ 4 \\ 0 \end{pmatrix}.$$
(b) In the process, you will determine that the given three vectors are linearly dependent. Find a linear combination of them, with the coefficient of $\vec{v}_3$ equal to 1, that equals the zero vector.
(c) Find a $1 \times 3$ matrix A such that $A\vec{v}_1 = A\vec{v}_2 = A\vec{v}_3 = 0$, and use it to check your answer to part (a).


3. This problem illustrates how you can use row reduction to express a specified vector as a linear combination of basis vectors.
Your bakery uses flour, sugar, and chocolate to make cookies, cakes, and brownies. The ingredients for a batch of each product are described by a vector, as follows:
$$\vec{v}_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \vec{v}_2 = \begin{pmatrix} 4 \\ 2 \\ 7 \end{pmatrix}, \quad \vec{v}_3 = \begin{pmatrix} 7 \\ 8 \\ 11 \end{pmatrix}.$$
This means, for example, that a batch of cookies takes 1 pound of flour, 2 of sugar, and 3 of chocolate.
You are about to shut down for vacation and want to clear out your inventory of ingredients, described by the vector $\vec{w} = \begin{pmatrix} 21 \\ 18 \\ 38 \end{pmatrix}$.
Use row reduction to find a combination of cookies, cakes, and brownies that uses up the entire inventory.
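Here is a quick base-R check of the setup. It uses solve() on the coefficient matrix, which is a shortcut; the row-reduction approach that the problem asks for reduces the augmented matrix [v1 v2 v3 | w] instead.

# Coefficients a with a[1] v1 + a[2] v2 + a[3] v3 = w.
v1 <- c(1, 2, 3); v2 <- c(4, 2, 7); v3 <- c(7, 8, 11)
w <- c(21, 18, 38)
solve(cbind(v1, v2, v3), w)  # batches of cookies, cakes, and brownies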
4. Hubbard, exercises 2.3.8 and 2.3.11 (column operations: a few brief comments about the first problem will suffice for the second. These column operations will be used in the spring term to evaluate $n \times n$ determinants.)
5. (This result will be needed in Math 23b.)
Suppose that a $2n \times 2n$ matrix T has the following properties:
The first n columns are a linearly independent set.
The last n columns are a linearly independent set.
Each of the first n columns is orthogonal to each of the last n columns.
Prove that T is invertible.
Hint: Write $\vec{w} = \vec{u} + \vec{v}$, where $\vec{u}$ is a linear combination of the first n columns and $\vec{v}$ is a linear combination of the last n columns. Start by showing that $\vec{u}$ is orthogonal to $\vec{v}$. Then exploit the fact that if $\vec{w} = \vec{0}$, $\vec{w} \cdot \vec{w} = 0$.


6. (This result will be the key to proving the implicit function theorem, which is central to many economic applications.)
Suppose that the $m \times n$ matrix C, where n > m, has m linearly independent columns and that these columns are placed on the left. Then we can split off a square matrix A and write C = [A|B].
(a) Let $\vec{y}$ be the (n-m)-component vector of the active variables, and let $\vec{x}$ be the m-component vector of passive variables such that $C\begin{pmatrix} \vec{x} \\ \vec{y} \end{pmatrix} = \vec{0}$. Prove that $\vec{x} = -A^{-1}B\vec{y}$.
(b) Use this approach to solve the system of equations
$$5x + 2y + 3z + w = 0$$
$$7x + 3y + z - 2w = 0$$
by inverting a $2 \times 2$ matrix, without using row reduction or any other elimination technique. The solution will express the passive variables x and y in terms of the active variables z and w.
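A sketch of part (b) in R, assuming the signs of the system as printed above; the split into a passive block A and an active block B follows part (a).

# x and y in terms of z and w via x = -A^{-1} B y.
A <- matrix(c(5, 7, 2, 3), 2)   # coefficients of the passive variables x, y
B <- matrix(c(3, 1, 1, -2), 2)  # coefficients of the active variables z, w
passive <- function(z, w) -solve(A) %*% B %*% c(z, w)
passive(1, 0)  # values of x and y when z = 1, w = 0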
The remaining problems are to be solved by writing R scripts. You may
use the rref() function whenever it works.
7. (Like group problem 3b, but in a finite field, so rref will not help!)
In R, the statement
A<-matrix(sample(0:4, 24, replace = TRUE),4)
was used to create a $4 \times 6$ matrix A with 24 entries in $Z_5$. Each entry randomly has the value 0, 1, 2, 3, or 4.
Here is the resulting matrix:
$$A = \begin{pmatrix} 3 & 0 & 4 & 0 & 2 & 2 \\ 1 & 1 & 3 & 3 & 2 & 1 \\ 0 & 2 & 1 & 1 & 4 & 2 \\ 1 & 0 & 2 & 0 & 3 & 4 \end{pmatrix}.$$
Use row reduction to find a basis for the image of A and a basis for the kernel. Please check your answer for the kernel.
8. One of the seventeen problems on the first Math 25a problem set for 2014 was to find all the solutions of the system of equations
$$2x_1 - 3x_2 - 7x_3 + 5x_4 + 2x_5 = 2$$
$$x_1 - 2x_2 - 4x_3 + 3x_4 + x_5 = 2$$
$$2x_1 - 4x_3 + 2x_4 + x_5 = 3$$
$$x_1 - 5x_2 - 7x_3 + 6x_4 + 2x_5 = 7$$
without the use of a computer.
Solve this problem using R (like script 3.1A).

9. (Like script 3.1C and group problem 3a) A neo-Cubist sculptor wants to use a basis for $\mathbb{R}^3$ with the following properties:
The first basis vector $\vec{w}_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ lies along the body diagonal of the cube.
The second basis vector $\vec{w}_2 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}$ lies along a face diagonal of the cube.
The third basis vector $\vec{w}_3 = \begin{pmatrix} 3 \\ 4 \\ 12 \end{pmatrix}$ has length 13.
Convert these three basis vectors to an orthonormal basis. Then make a $3 \times 3$ reflection matrix F by using this basis, and confirm that the transpose of F is equal to its inverse.

MATHEMATICS 23a/E-23a, Fall 2015


Linear Algebra and Real Analysis I
Module #1, Week 4 (Eigenvectors and Eigenvalues)
Author: Paul Bamberg
R scripts by Paul Bamberg
Last modified: June 18, 2015 by Paul Bamberg
Reading
Hubbard, Section 2.7
Hubbard, pages 474-475
Proofs to present in section or to a classmate who has done them.
4.1 Prove that if $\vec{v}_1, \ldots, \vec{v}_n$ are eigenvectors of $A : \mathbb{R}^n \to \mathbb{R}^n$ with distinct eigenvalues $\lambda_1, \ldots, \lambda_n$, they are linearly independent. Conclude that an $n \times n$ matrix cannot have more than n distinct eigenvalues.
4.2
For a real $n \times n$ matrix A, prove that if all the polynomials $p_i(t)$ have simple real roots, then there exists a basis for $\mathbb{R}^n$ consisting of eigenvectors of A.
Prove that if there exists a basis for $\mathbb{R}^n$ consisting of eigenvectors of A, then all the polynomials $p_i(t)$ have simple real roots.
Note: Theorem 2.7.6 in Hubbard is more powerful, because it applies to the complex case. The proof is the same. Our proof is restricted to the real case only because we are not doing examples with complex eigenvectors.

R Scripts
1.4A-EigenvaluesCharacteristic.R
Topic 1 - Eigenvectors for a 2x2 matrix
Topic 2 - Not every 2x2 matrix has real eigenvalues
1.4B-EigenvectorsAxler.R
Topic 1 - Finding eigenvectors by row reduction
Topic 2 - Eigenvectors for a 3 x 3 matrix
1.4C-Diagonalization.R
Topic 1 - Basis of real eigenvectors
Topic 2 - Raising a matrix to a power
Topic 3 - What if the eigenvalues are complex?
Topic 4 - What if there is no eigenbasis?
1.4X-EigenvectorApplications.R
Topic 1 - The special case of a symmetric matrix
Topic 2 - Markov Process (from script 1.1D)
Topic 3 - Eigenvectors for a reflection
Topic 4 - Sequences defined by linear recurrences

Executive Summary

1.1 Eigenvalues and eigenvectors

If $A\vec{v} = \lambda\vec{v}$, $\vec{v}$ is called an eigenvector for A, and $\lambda$ is the corresponding eigenvalue.
For example, if $A = \begin{pmatrix} -1 & 4 \\ -2 & 5 \end{pmatrix}$, we can check that $\vec{v} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ is an eigenvector of A with eigenvalue 3.
If A is a $2 \times 2$ or $3 \times 3$ matrix, there is a quick, well-known way to find eigenvalues by using determinants.
Rewrite $A\vec{v} = \lambda\vec{v}$ as $A\vec{v} = \lambda I\vec{v}$, where I is the identity matrix.
Equivalently, $(A - \lambda I)\vec{v} = \vec{0}$.
Suppose that $\lambda$ is an eigenvalue of A. Then the eigenvector $\vec{v}$ is a nonzero vector in the kernel of the matrix $(A - \lambda I)$.
It follows that the matrix $(A - \lambda I)$ is not invertible. But we have a formula for the inverse of a $2 \times 2$ or $3 \times 3$ matrix, which can fail only if the determinant is zero. Therefore a necessary condition for the existence of an eigenvalue is that $\det(A - \lambda I) = 0$.
The polynomial $\chi_A(\lambda) = \det(A - \lambda I)$ is called the characteristic polynomial of matrix A. It is easy to compute in the $2 \times 2$ or $3 \times 3$ case, where there is a simple formula for the determinant. For larger matrices $\chi_A(\lambda)$ is hard to compute efficiently, and this approach should be avoided.
Conversely, suppose that $\chi_A(\lambda) = 0$ for some real number $\lambda$. It follows that the columns of the matrix $(A - \lambda I)$ are linearly dependent. If we row reduce the matrix, we will find at least one nonpivotal column, which in turn implies that there is a nonzero vector in the kernel. This vector is an eigenvector.
This was the standard way of finding eigenvectors until 1995, but it has two drawbacks:
It requires computation of the determinant of a matrix whose entries are polynomials. Efficient algorithms for calculating the determinant of large square matrices use row-reduction techniques, which might require division by a pivotal element that is a polynomial in $\lambda$.
Once you have found the eigenvalues, finding the corresponding eigenvectors is a nontrivial linear algebra problem.

1.2 Finding eigenvalues - a simple example

Let $A = \begin{pmatrix} -1 & 4 \\ -2 & 5 \end{pmatrix}$. Then $A - \lambda I = \begin{pmatrix} -1-\lambda & 4 \\ -2 & 5-\lambda \end{pmatrix}$
and $\chi_A(\lambda) = \det(A - \lambda I) = (-1-\lambda)(5-\lambda) + 8 = \lambda^2 - 4\lambda + 3$.
Setting $\lambda^2 - 4\lambda + 3 = (\lambda - 1)(\lambda - 3) = 0$, we find two eigenvalues, 1 and 3.
Finding the corresponding eigenvectors still requires a bit of algebra.
For $\lambda = 1$, $A - I = \begin{pmatrix} -2 & 4 \\ -2 & 4 \end{pmatrix}$.
By inspection we see that $\vec{v}_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$ is in the kernel of this matrix.
Check: $A\vec{v}_1 = \begin{pmatrix} -1 & 4 \\ -2 & 5 \end{pmatrix}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$: eigenvector with eigenvalue 1.
For $\lambda = 3$, $A - 3I = \begin{pmatrix} -4 & 4 \\ -2 & 2 \end{pmatrix}$, and $\vec{v}_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ is in the kernel.
Check: $A\vec{v}_2 = \begin{pmatrix} -1 & 4 \\ -2 & 5 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \end{pmatrix}$: eigenvector with eigenvalue 3.
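A quick numerical check of this example with base R's eigen():

# eigen() returns eigenvalues (largest magnitude first) and unit eigenvectors.
A <- matrix(c(-1, -2, 4, 5), 2)  # columns c(-1, -2) and c(4, 5)
eigen(A)
# $values: 3 1; the columns of $vectors are proportional to (1,1) and (2,1).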

1.3 A better way to find eigenvectors

Given matrix A, pick an arbitrary vector $\vec{w}$. Keep computing $A\vec{w}$, $A^2\vec{w}$, $A^3\vec{w}$, etc. until you find a vector that is a linear combination of its predecessors. This situation is easily detected by row reduction.
Now you have found a polynomial p of degree m such that $p(A)\vec{w} = 0$. Furthermore, this is the nonzero polynomial of lowest degree for which $p(A)\vec{w} = 0$.
Over the complex numbers, this polynomial is guaranteed to have a root by virtue of the fundamental theorem of algebra (Hubbard theorem 1.6.13). Over the real numbers or a finite field, it will have a root in the field only if you are lucky. Assuming that the root exists, factor it out: $p(t) = (t - \lambda)q(t)$.
Now $p(A)\vec{w} = (A - \lambda I)q(A)\vec{w} = 0$.
Thus $q(A)\vec{w}$ is an eigenvector with eigenvalue $\lambda$.
Again, let $A = \begin{pmatrix} -1 & 4 \\ -2 & 5 \end{pmatrix}$.
As the arbitrary vector $\vec{w}$ choose $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$. Then $A\vec{w} = \begin{pmatrix} -1 \\ -2 \end{pmatrix}$ and $A^2\vec{w} = \begin{pmatrix} -7 \\ -8 \end{pmatrix}$.
We need to express the third of these vectors, $A^2\vec{w}$, as a linear combination of the first two. This is done by row reducing the matrix
$$\begin{pmatrix} 1 & -1 & -7 \\ 0 & -2 & -8 \end{pmatrix} \quad\text{to}\quad \begin{pmatrix} 1 & 0 & -3 \\ 0 & 1 & 4 \end{pmatrix}$$
to find that $A^2\vec{w} = 4A\vec{w} - 3I\vec{w}$.
Equivalently, $(A^2 - 4A + 3I)\vec{w} = 0$.
$p(A) = A^2 - 4A + 3I$ or $p(t) = t^2 - 4t + 3 = (t-1)(t-3)$: eigenvalues 1 and 3.
To get the eigenvector for eigenvalue 1, apply the remaining factor of p(A), A - 3I, to $\vec{w}$: $\begin{pmatrix} -4 & 4 \\ -2 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -4 \\ -2 \end{pmatrix}$. Divide by -2 to get $\vec{v}_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$.
To get the eigenvector for eigenvalue 3, apply the remaining factor of p(A), A - I, to $\vec{w}$: $\begin{pmatrix} -2 & 4 \\ -2 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -2 \\ -2 \end{pmatrix}$. Divide by -2 to get $\vec{v}_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
In this case the polynomial p(t) turned out to be the same as the characteristic polynomial, but that is not always the case.
If we choose $\vec{w} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, we find $A\vec{w} = 3\vec{w}$, so p(A) = A - 3I and p(t) = t - 3. We need to start over with a different $\vec{w}$ to find the other eigenvalue.
If we choose $A = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$, then any vector is an eigenvector with eigenvalue 2. So p(t) = t - 2. But the characteristic polynomial is $(t-2)^2$.
If we choose $A = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}$, the characteristic polynomial is $(t-2)^2$. But now there is only one independent eigenvector. If we choose $\vec{w} = \vec{e}_1$ we find p(t) = t - 2 and the eigenvector $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$. But if we choose a different $\vec{w}$ we find $p(t) = (t-2)^2$ and we fail to find a second, independent eigenvector.
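The whole procedure takes only a few lines in R. This is a minimal sketch for the $2 \times 2$ example above, not the course's script 1.4B; polyroot() and solve() stand in for factoring and row reduction.

# Axler's method, sketched: find p(t) with p(A)w = 0, then apply the
# remaining factor of p(A) to w to get each eigenvector.
A <- matrix(c(-1, -2, 4, 5), 2)
w <- c(1, 0)
M <- cbind(w, A %*% w)                    # w and Aw are independent here
cf <- as.vector(solve(M, A %*% A %*% w))  # A^2 w = cf[1] w + cf[2] Aw
lambda <- sort(Re(polyroot(c(-cf, 1))))   # roots of t^2 - cf[2] t - cf[1]: 1 and 3
v1 <- (A - lambda[2] * diag(2)) %*% w     # kill the factor for the other root
v2 <- (A - lambda[1] * diag(2)) %*% w
cbind(v1 / v1[1], v2 / v2[1])             # proportional to (2, 1) and (1, 1)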

1.4 When is there an eigenbasis?

Choose $\vec{w}$ successively to equal $\vec{e}_1, \vec{e}_2, \ldots, \vec{e}_n$.
In searching for eigenvectors, we find successively polynomials $p_1(t), p_2(t), \ldots, p_n(t)$.
There is a basis of real eigenvectors if and only if each of the polynomials $p_i(t)$ has simple real roots, e.g. $p(t) = t(t-2)(t+4)(t-2.3)$. No repeated factors are allowed!
A polynomial like $p(t) = t^2 + 1$, although it has no repeated factors, has no real roots: $p(t) = (t+i)(t-i)$.
If we allow complex roots, then any polynomial can be factored into linear factors (Fundamental Theorem of Algebra, Hubbard page 113).
There is a basis of complex eigenvectors if and only if each of the polynomials $p_i(t)$ has simple roots, e.g. $p(t) = t(t-i)(t+i)$. No repeated factors are allowed!
Our technique for finding eigenvectors works also for matrices over finite fields, but in that case it is entirely possible for a polynomial to have no linear factors whatever. In that case there are no eigenvectors and no eigenbasis. This is one of the few cases where linear algebra over a finite field is fundamentally different from linear algebra over the real or complex numbers.

1.5 Matrix Diagonalization

In the best case we can find a basis of n eigenvectors $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_n\}$ with associated eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$. Although the eigenvectors must be independent, some of the eigenvalues may repeat.
Create a matrix P whose columns are the eigenvectors. Since the eigenvectors form a basis, they are independent and the matrix P has an inverse $P^{-1}$.
The matrix $D = P^{-1}AP$ is a diagonal matrix.
Proof: $D\vec{e}_k = P^{-1}A(P\vec{e}_k) = P^{-1}A\vec{v}_k = P^{-1}\lambda_k\vec{v}_k = \lambda_k P^{-1}\vec{v}_k = \lambda_k\vec{e}_k$.
The matrix A can be expressed as $A = PDP^{-1}$.
Proof: $A\vec{v}_k = PD(P^{-1}\vec{v}_k) = PD\vec{e}_k = P(\lambda_k\vec{e}_k) = \lambda_k P\vec{e}_k = \lambda_k\vec{v}_k$.
A diagonal matrix D is easy to raise to an integer power. For example, if
$$D = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}, \quad\text{then}\quad D^k = \begin{pmatrix} \lambda_1^k & 0 & 0 \\ 0 & \lambda_2^k & 0 \\ 0 & 0 & \lambda_3^k \end{pmatrix}.$$
But now $A = PDP^{-1}$ is also easy to raise to a power, because $A^k = PD^kP^{-1}$ (will be proved by induction).
The same result extends to kth roots of matrices, where $B = A^{1/k}$ means that $B^k = A$.
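In R, for the running example:

# Diagonalize A and raise it to a power via A^k = P D^k P^{-1}.
A <- matrix(c(-1, -2, 4, 5), 2)
P <- cbind(c(2, 1), c(1, 1))  # eigenvectors for eigenvalues 1 and 3
D <- solve(P) %*% A %*% P     # the diagonal matrix diag(1, 3)
P %*% D^5 %*% solve(P)        # A^5; for diagonal D, elementwise ^5 is the matrix power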

1.6 Properties of an eigenbasis

Even if all the eigenvalues are distinct, an eigenbasis is not unique. Any eigenvector in the basis can be multiplied by a nonzero scalar and remain an eigenvector.
Eigenvectors that correspond to distinct eigenvalues are linearly independent (your proof 4.1).
If the matrix A is symmetric, eigenvectors that correspond to distinct eigenvalues are orthogonal.

1.7 What if there is no eigenbasis?

We consider only the case where A is a $2 \times 2$ matrix. If a real polynomial p(t) does not have two distinct real roots, then it either has a repeated real root or it has a pair of conjugate complex roots.
Case 1: Repeated root: $p(t) = (t - \lambda)^2$.
So $p(A) = (A - \lambda I)^2 = 0$.
Set $N = A - \lambda I$, and $N^2 = 0$. The matrix N is called nilpotent.
Now $A = \lambda I + N$, and $A^2 = (\lambda I + N)^2 = \lambda^2 I + 2\lambda N$.
It is easy to prove by induction that $A^k = (\lambda I + N)^k = \lambda^k I + k\lambda^{k-1}N$.
Case 2: Conjugate complex roots:
If a $2 \times 2$ real matrix A has eigenvalues $a \pm ib$, then it can be expressed in the form $A = PCP^{-1}$, where C is the conformal matrix $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$ and P is a change of basis matrix. Since a conformal matrix is almost as easy as a diagonal matrix to raise to the nth power by virtue of De Moivre's theorem, $(r(\cos\theta + i\sin\theta))^n = r^n(\cos n\theta + i\sin n\theta)$, this representation is often useful.
Here is an algorithm for constructing the matrices C and P:
Suppose that the eigenvalues of A are $a \pm ib$. Then A has no real eigenvectors, and for any real $\vec{w}$ we will find the polynomial
$$p(t) = (t - a - ib)(t - a + ib) = (t - a)^2 + b^2.$$
So $p(A) = (A - aI)^2 + b^2 I = 0$, or $\left(\frac{A - aI}{b}\right)^2 = -I$.
Now we need to construct a new basis, which will not be a basis of eigenvectors but which will still be useful.
Set $\vec{v}_1 = \vec{e}_1$, $\vec{v}_2 = \left(\frac{A - aI}{b}\right)\vec{e}_1$.
Then $(A - aI)\vec{v}_1 = b\vec{v}_2$ and $A\vec{v}_1 = a\vec{v}_1 + b\vec{v}_2$.
Also, $\left(\frac{A - aI}{b}\right)\vec{v}_2 = \left(\frac{A - aI}{b}\right)^2\vec{v}_1 = -\vec{v}_1$, so $(A - aI)\vec{v}_2 = -b\vec{v}_1$ and $A\vec{v}_2 = a\vec{v}_2 - b\vec{v}_1$.
With respect to the new basis, the matrix that represents A is the conformal matrix $C = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}$.
If we define P in the usual way with columns $\vec{v}_1$ and $\vec{v}_2$, then $A = PCP^{-1}$, and the matrices P and C are real.
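Here is the algorithm in R, run on the matrix of lecture problem 15 below (eigenvalues $3 \pm 2i$, so a = 3 and b = 2):

# Build P and C for a matrix with eigenvalues a +/- ib.
A <- matrix(c(7, 2, -10, -1), 2)
a <- 3; b <- 2
v1 <- c(1, 0)
v2 <- (A - a * diag(2)) %*% v1 / b  # v2 = ((A - aI)/b) e1
P <- cbind(v1, v2)
solve(P) %*% A %*% P  # the conformal matrix C = (a -b; b a) = (3 -2; 2 3)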

1.8 Applications of eigenvectors

Markov processes
Suppose that a system can be in one of two or more states and goes through a number of steps, in each of which it may make a transition from one state to another in accordance with specified transition probabilities.
For a two-state process, the vector $\vec{v}_n = \begin{pmatrix} p_n \\ q_n \end{pmatrix}$ specifies the probabilities for the system to be in state 1 or state 2 after n steps of the process, where $0 \le p_n, q_n \le 1$ and $p_n + q_n = 1$. The transition probabilities are specified by a matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, where all the entries are between 0 and 1 and $a + c = b + d = 1$.
After a large number of steps, the state of the system is specified by $\vec{v}_n = A^n\vec{v}_0$.
The easy way to calculate $A^n$ is by diagonalizing A. If there is a stationary state $\vec{v}$ into which the system settles down, it corresponds to an eigenvector with eigenvalue 1, since $\vec{v}_{n+1} = A\vec{v}_n$ and $\vec{v}_{n+1} = \vec{v}_n = \vec{v}$.
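A short R illustration, using the transition matrix from homework problem 4 below:

# A two-state Markov process settling into its stationary state.
A <- matrix(c(0.1, 0.9, 0.7, 0.3), 2)  # columns sum to 1
v <- c(1, 0)                           # initial state
for (n in 1:50) v <- A %*% v           # v_n = A^n v_0
v                                      # close to the stationary state (7/16, 9/16)
eigen(A)$vectors[, 1]                  # eigenvector for eigenvalue 1, same direction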
Reflections
If a $2 \times 2$ matrix F represents reflection in a line through the origin with direction vector $\vec{v}$, then $\vec{v}$ must be an eigenvector with eigenvalue 1, and a vector perpendicular to $\vec{v}$ must be an eigenvector with eigenvalue -1.
If a $3 \times 3$ matrix F represents reflection in a plane P through the origin with normal vector $\vec{N}$, then $\vec{N}$ must be an eigenvector with eigenvalue -1, and there must be a two-dimensional subspace of vectors in P, all with eigenvalue +1.
Linear recurrences and Fibonacci-like sequences.
In computer science, it is frequently the case that the first two terms of a sequence, $a_0$ and $a_1$, are specified, and subsequent terms are specified by a linear recurrence of the form $a_{n+1} = ba_{n-1} + ca_n$. The best-known example is the Fibonacci sequence (Hubbard, pages 223-225), where $a_0 = a_1 = 1$ and $b = c = 1$.
Then
$$\begin{pmatrix} a_n \\ a_{n+1} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ b & c \end{pmatrix}\begin{pmatrix} a_{n-1} \\ a_n \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ b & c \end{pmatrix}^n\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}.$$
The easy way to raise the matrix $A = \begin{pmatrix} 0 & 1 \\ b & c \end{pmatrix}$ to the nth power is to diagonalize it. (See the short R example below.)
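For example, in R:

# Fibonacci terms by repeated matrix multiplication (a_0 = a_1 = 1, b = c = 1).
A <- matrix(c(0, 1, 1, 1), 2)  # the matrix (0 1; 1 1)
v <- c(1, 1)                   # (a_0, a_1)
for (n in 1:8) v <- A %*% v
v                              # (a_8, a_9) = (34, 55)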
Solving systems of linear differential equations
This topic, of crucial importance to physics, will be covered after we have
done some calculus and infinite series.

Lecture Outline
1. Using the characteristic polynomial to find eigenvalues and eigenvectors
If $A\vec{v} = \lambda\vec{v}$, $\vec{v}$ is called an eigenvector for A, and $\lambda$ is the corresponding eigenvalue.
If A is a $2 \times 2$ or $3 \times 3$ matrix, there is a quick, well-known way to find eigenvalues by using determinants.
Rewrite $A\vec{v} = \lambda\vec{v}$ as $A\vec{v} = \lambda I\vec{v}$, where I is the identity matrix.
Equivalently, $(A - \lambda I)\vec{v} = \vec{0}$.
Suppose that $\lambda$ is an eigenvalue of A. Then the eigenvector $\vec{v}$ is a nonzero vector in the kernel of the matrix $(A - \lambda I)$.
It follows that the matrix $(A - \lambda I)$ is not invertible. But we have a formula for the inverse of a $2 \times 2$ or $3 \times 3$ matrix, which can fail only if the determinant is zero. Therefore a necessary condition for the existence of an eigenvalue is that $\det(A - \lambda I) = 0$.
The polynomial $\chi_A(\lambda) = \det(A - \lambda I)$ is called the characteristic polynomial of matrix A. It is easy to compute in the $2 \times 2$ or $3 \times 3$ case, where there is a simple formula for the determinant. For larger matrices $\chi_A(\lambda)$ is hard to compute efficiently, and this approach should be avoided.
Conversely, suppose that $\chi_A(\lambda) = 0$ for some real number $\lambda$. It follows that the columns of the matrix $(A - \lambda I)$ are linearly dependent. If we row reduce the matrix, we will find at least one nonpivotal column, which in turn implies that there is a nonzero vector in the kernel. This vector is an eigenvector.

2. A better way to find eigenvectors
Given matrix A, pick an arbitrary vector $\vec{w}$. Keep computing $A\vec{w}$, $A^2\vec{w}$, $A^3\vec{w}$, etc. until you find a vector that is a linear combination of its predecessors. This situation is easily detected by row reduction.
Now you have found a polynomial p of degree m such that $p(A)\vec{w} = 0$. Furthermore, this is the nonzero polynomial of lowest degree for which $p(A)\vec{w} = 0$.
Over the complex numbers, this polynomial is guaranteed to have a root by virtue of the fundamental theorem of algebra (Hubbard theorem 1.6.13). Over the real numbers or a finite field, it will have a root in the field only if you are lucky.
Citing your source: This technique was brought to the world's attention by Sheldon Axler's 1995 article "Down with Determinants" (see Hubbard page 224). Unlike most of what is taught in undergraduate math, it should probably be cited when you use it in other courses. An informal comment like "Using Axler's method for finding eigenvectors..." would suffice.



3. Consider the matrix $A = \begin{pmatrix} 3 & 2 \\ 3 & 3 \end{pmatrix}$ with entries from the finite field $Z_5$.
(a) Find the eigenvalues of A by solving the characteristic equation $\det(A - \lambda I) = 0$, then find the corresponding eigenvectors. Solving a quadratic equation over $Z_5$ is easy: in a pinch, just try all five possible roots!
(b) Find the eigenvalues of A by using the technique of example 2.7.5 of Hubbard. You will get the same equation for the eigenvalues, of course, but it will be more straightforward to find the eigenvectors.
(c) Write down the matrix P whose columns are the basis of eigenvectors, and check your answer by showing that $P^{-1}AP$ is a diagonal matrix.


4. Concocting a $2 \times 2$ matrix without a basis of eigenvectors
Let $D = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$, $N = \begin{pmatrix} 1 & -1 \\ 1 & -1 \end{pmatrix}$. The matrix N is a so-called nilpotent matrix: because its kernel is the same as its image, $N^2$ is the zero matrix.
(a) Show that the matrix A = D + N has the property that if we choose any $\vec{w}$ that is not in the kernel of N, then the polynomial p(A) is $(A - 2I)^2$ and so there is no basis of eigenvectors.
(b) Prove by induction that $A^k = D^k + kD^{k-1}N$.

5. Eigenbases
To construct the matrix P, we need a basis of eigenvectors. A sufficient, but not necessary, condition is that the matrix A has n distinct eigenvalues. In examples, these will be real numbers, but the result is valid also in $\mathbb{C}^n$.
Here is your proof 4.1.
If $\vec{v}_1, \ldots, \vec{v}_n$ are eigenvectors of $A : \mathbb{R}^n \to \mathbb{R}^n$ with distinct eigenvalues $\lambda_1, \ldots, \lambda_n$, they are linearly independent.
Suppose, for a contradiction, that the eigenvectors are linearly dependent. There exists a first eigenvector (the jth one) that is a linear combination of its predecessors:
$$\vec{v}_j = a_1\vec{v}_1 + \cdots + a_{j-1}\vec{v}_{j-1}.$$
Multiply both sides by $A - \lambda_j I$. You get zero on the left, and on the right you get the linear combination $\sum_i a_i(\lambda_i - \lambda_j)\vec{v}_i$. Since $\vec{v}_1, \ldots, \vec{v}_{j-1}$ are linearly independent and each $\lambda_i - \lambda_j \neq 0$, every $a_i$ must be zero. But then $\vec{v}_j = \vec{0}$, in contradiction to the fact that an eigenvector is nonzero.
Since in $\mathbb{R}^n$ there cannot be more than n linearly independent vectors, there are at most n distinct eigenvalues.
Proof 4.1, start to finish:


6. Finding eigenvectors
This method is guaranteed to succeed only for the field of complex numbers, but the algorithm is valid for any field, and it finds the eigenvectors whenever they exist.
Given matrix A, pick an arbitrary vector $\vec{w}$. If you are really lucky, $A\vec{w}$ is a multiple of $\vec{w}$ and you have stumbled across an eigenvector. If not, keep computing $A^2\vec{w}$, $A^3\vec{w}$, etc. until you find a vector that is a linear combination of its predecessors. This situation is easily detected by row reduction.
Now you have found a polynomial p of degree m such that $p(A)\vec{w} = 0$. Furthermore, this is the nonzero polynomial of lowest degree for which $p(A)\vec{w} = 0$.
Over the complex numbers, this polynomial is guaranteed to have a root by virtue of the fundamental theorem of algebra (Hubbard theorem 1.6.13). Over the real numbers or a finite field, it will have a root in the field only if you are lucky. Assuming that the root exists, factor it out: $p(t) = (t - \lambda)q(t)$.
Now $p(A)\vec{w} = (A - \lambda I)q(A)\vec{w} = 0$.
Thus $q(A)\vec{w}$ is an eigenvector with eigenvalue $\lambda$.
Here is a $2 \times 2$ example where the calculation is easy.
Let $A = \begin{pmatrix} -1 & 4 \\ -2 & 5 \end{pmatrix}$.
As the arbitrary vector $\vec{w}$ choose $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$. Compute $A\vec{w}$ and $A^2\vec{w}$.
Use row reduction to express the third of these vectors, $A^2\vec{w}$, as a linear combination of the first two.
$$\begin{pmatrix} 1 & -1 & -7 \\ 0 & -2 & -8 \end{pmatrix}$$
Write the result in the form $p(A)\vec{w} = 0$.
Factor: p(t) =
To get the eigenvector for eigenvalue 1, apply the remaining factor of p(A), A - 3I, to $\vec{w}$.
To get the eigenvector for eigenvalue 3, apply the remaining factor of p(A), A - I, to $\vec{w}$.


7. Change of basis
Our old basis consists of the standard basis vectors $\vec{e}_1$ and $\vec{e}_2$.
Our new basis consists of one eigenvector for each eigenvalue.
Let's choose $\vec{v}_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$ and $\vec{v}_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
It would be all right to multiply either of these vectors by a constant or to reverse their order.
Write down the change of basis matrix P whose columns express the new basis vectors in terms of the old ones.
Calculate the inverse change of basis matrix $P^{-1}$ whose columns express the old basis vectors in terms of the new ones.
We are considering a linear transformation that is represented, relative to the standard basis, by the matrix A. What diagonal matrix D represents this linear transformation relative to the new basis of eigenvectors?
Confirm that $A = PDP^{-1}$. We have diagonalized the matrix A.
$$D = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}, \quad P^{-1} = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}, \quad P = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$$

8. Eigenvectors for a $3 \times 3$ matrix
For Hubbard Example 2.7.5, the calculation is best subcontracted to Mathematica. The matrix is
$$A = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}$$
Since we have help with the computation, make the choice $\vec{w} = \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix}$.
The matrix to row reduce, with columns $\vec{w}, A\vec{w}, A^2\vec{w}, A^3\vec{w}$, is
$$\begin{pmatrix} 2 & -1 & 0 & 3 \\ 3 & -1 & -3 & -9 \\ 5 & 2 & 3 & 6 \end{pmatrix},$$
different from the matrix in Hubbard. The result of row reduction is the same:
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -3 \\ 0 & 0 & 1 & 4 \end{pmatrix}$$
The rest of the work is easily done by hand.
Using the last column, write the polynomial p(t), and factor it.
Find an eigenvector that corresponds to the smallest positive eigenvalue. It is not necessary to use the same $\vec{w}$; any vector will do, as long as it is not in the subspace spanned by the other eigenvectors. Hubbard uses $\vec{e}_1$. Use $\vec{e}_3$ instead.

9. When is there an eigenbasis?
This is a difficult issue in general. The simple case is where we are lucky and find a polynomial p of degree n that has n distinct roots. In that case we can find n eigenvectors, and it has already been proved that they are linearly independent. They form an eigenbasis. If the roots are real, the eigenvectors are elements of $\mathbb{R}^n$. If the roots are distinct but not all real, the eigenvectors are still a basis of $\mathbb{C}^n$.
Suppose we try each standard basis vector in turn as $\vec{w}$. Using $\vec{e}_i$ leads to a polynomial $p_i$. If every $p_i$ is a polynomial of degree $m_i < n$, the situation is more complicated. Theorem 2.7.6 in Hubbard states the result:
There exists an eigenbasis of $\mathbb{C}^n$ if and only if all the roots of all the $p_i$ are simple.
Before doing the difficult proof, look at the simplest examples of matrices that do not have n distinct eigenvalues.
Let $A = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$. In this case every vector in $\mathbb{R}^2$ is an eigenvector with eigenvalue 2. There is only one eigenvalue, but any basis is an eigenbasis.
If we choose $\vec{w} = \vec{e}_1$ and form the matrix whose columns are $\vec{w}$ and $A\vec{w}$,
$$\begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix},$$
the matrix is already in echelon form.
What is $p_1$?
What eigenvector do we find?
What eigenvector do we find if we choose $\vec{w} = \vec{e}_2$?
Key point: we found a basis of eigenvectors, even though there was only one eigenvalue, and the polynomial $(t-2)^2$ never showed up.




Let $A = \begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix}$. In this case there is only one eigenvalue and there is no eigenbasis.
What happens if we choose $\vec{w} = \vec{e}_2$?
If we choose $\vec{w} = \vec{e}_1$, confirm that
$$\begin{pmatrix} 1 & 2 & 4 \\ 0 & 1 & 4 \end{pmatrix} \quad\text{row reduces to}\quad \begin{pmatrix} 1 & 0 & -4 \\ 0 & 1 & 4 \end{pmatrix}.$$
What is $p_1$?
What happens when we carry out the procedure that usually gives an eigenvector?
Key point: There was only one eigenvalue, the polynomial $(t-2)^2$ showed up, and we were unable to find a basis of eigenvectors.


10. An instructive $3 \times 3$ example
The surprising case, and the one that makes the proof difficult, is the one where there exists a basis of eigenvectors but there are fewer than n distinct eigenvalues. A simple example is $A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}$.
Here each standard basis vector is an eigenvector. For the first one the eigenvalue is 1; for the second and third, it is 2.
A less obvious example is
$$A = \begin{pmatrix} 2 & 1 & -1 \\ 0 & 2 & 0 \\ 0 & 1 & 1 \end{pmatrix}$$
The procedure for finding eigenvectors is carried out in the Mathematica file, with the following results:
Using $\vec{w} = \vec{e}_1$, we get $p_1(t) = t - 2$ and find an eigenvector $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$ with eigenvalue 2.
Using $\vec{w} = \vec{e}_2$, we get $p_2(t) = (t-1)(t-2)$ and find two eigenvectors: $\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$ with eigenvalue 1, $\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$ with eigenvalue 2.
At this point we have found three linearly independent eigenvectors and we have a basis.
If we use $\vec{w} = \vec{e}_3$, we get $p_3(t) = (t-1)(t-2)$ and find two eigenvectors: $\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$ with eigenvalue 1, $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$ with eigenvalue 2.
In general, if we use some arbitrary $\vec{w}$, we will get $p(t) = (t-1)(t-2)$ and we will find the eigenvector with eigenvalue 1 along with some linear combination of the eigenvectors with eigenvalue 2.
Key points about this case:
The polynomial $p_i(t)$, in order to be simple, must have degree less than n.
We need to use more than one standard basis vector in order to find a basis of eigenvectors.

11. Proof that if all roots are simple there is an eigenbasis
Assume that whenever we choose $\vec{w} = \vec{e}_i$, the polynomial $p_i$ of degree $m_i$ has simple roots. The columns of the matrix that we row reduce are $\vec{e}_i, A\vec{e}_i, \ldots, A^{m_i}\vec{e}_i$. The image of this matrix has three properties:
It is a subspace $E_i$ of $\mathbb{R}^n$.
It includes $m_i$ eigenvectors. Since these correspond to distinct eigenvalues, they are linearly independent, and therefore they span $E_i$.
It includes $\vec{e}_i$.
Now take the union of all the $E_i$. This union has the following properties:
It includes each standard basis vector $\vec{e}_i$, so it spans all of $\mathbb{R}^n$.
It is spanned by the union of the sets of eigenvectors. In general there will be more than n vectors in this set. Use them as columns of a matrix. The image of this matrix is all of $\mathbb{R}^n$. We can find a basis for the image consisting of n columns, which are all eigenvectors.
12. Proof that if there is an eigenbasis, each $p_i$ has simple roots.
There are k distinct eigenvalues, $\lambda_1, \ldots, \lambda_k$. It is entirely possible that $k < n$, since different eigenvectors may have the same eigenvalue.
Since there is a basis of eigenvectors, we can express each $\vec{e}_i$ as a linear combination of eigenvectors.
Define $p_i(t) = \prod_j (t - \lambda_j)$. The product extends just over the set of eigenvalues that are associated with the eigenvectors needed to express $\vec{e}_i$ as a linear combination, so there may be fewer than k factors.
Form $p_i(A) = \prod_j (A - \lambda_j I)$. The factors can be in any order. If $\vec{w}$ is any eigenvector whose eigenvalue $\lambda_j$ is included in the product, then $(A - \lambda_j I)\vec{w} = \vec{0}$ and so $p_i(A)\vec{w} = \vec{0}$. Since those eigenvectors form a basis for a subspace that includes $\vec{e}_i$, it follows that $p_i(A)\vec{e}_i = \vec{0}$.
If we form a nonzero polynomial $p_i'(t)$ of lower degree by omitting one factor from the product, then $p_i'(A)\vec{e}_i \neq \vec{0}$, since the eigenvectors that correspond to the omitted eigenvalue do not get killed off.
So $p_i(t)$ is the nonzero polynomial of lowest degree for which $p_i(A)\vec{e}_i = \vec{0}$, and by construction it has simple roots.


13. Proof 4.2, first half
Assume that whenever we choose $\vec{w} = \vec{e}_i$, the polynomial $p_i$ of degree $m_i$ has simple roots. Consider the subspace E that is the image of the matrix whose columns are
$$\vec{e}_1, A\vec{e}_1, \ldots, A^{m_1}\vec{e}_1, \ \vec{e}_2, A\vec{e}_2, \ldots, A^{m_2}\vec{e}_2, \ \ldots, \ \vec{e}_n, A\vec{e}_n, \ldots, A^{m_n}\vec{e}_n.$$
Prove that $E = \mathbb{R}^n$ (easy) and that there exists a basis for E that consists entirely of eigenvectors (harder).

14. Proof 4.2, second half
Assume that there is a basis of $\mathbb{R}^n$ consisting of eigenvectors of the $n \times n$ matrix A, but that A has only $k \le n$ distinct eigenvalues. Prove that for any basis vector $\vec{w} = \vec{e}_i$, the polynomial $p_i(t)$ has simple roots.


15. Conformal matrices and complex numbers
(a) Show that the polynomial p(t) for the matrix $A = \begin{pmatrix} 7 & -10 \\ 2 & -1 \end{pmatrix}$ has roots $3 \pm 2i$.
(b) Show that $\left(\frac{A - 3I}{2}\right)^2 = -I$.
(c) Choose a new basis with $\vec{v}_1 = \vec{e}_1$, $\vec{v}_2 = \left(\frac{A - 3I}{2}\right)\vec{e}_1$.
Use these basis vectors as the columns of matrix P.
Confirm that $A = PCP^{-1}$, where C is conformal and P is real.

16. Change of basis - nice neat case
Let $A = \begin{pmatrix} 1 & 1 \\ -2 & 4 \end{pmatrix}$, and find an eigenvector, starting with $\vec{e}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$.
Then $A\vec{e}_1 = \begin{pmatrix} 1 \\ -2 \end{pmatrix}$ and $A^2\vec{e}_1 = \begin{pmatrix} -1 \\ -10 \end{pmatrix}$.
We row-reduce
$$\begin{pmatrix} 1 & 1 & -1 \\ 0 & -2 & -10 \end{pmatrix} \quad\text{to}\quad \begin{pmatrix} 1 & 0 & -6 \\ 0 & 1 & 5 \end{pmatrix}$$
and conclude that $A^2\vec{e}_1 = -6\vec{e}_1 + 5A\vec{e}_1$, or $A^2\vec{e}_1 - 5A\vec{e}_1 + 6\vec{e}_1 = 0$.
Complete the process of finding two eigenvalues and show that $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 2 \end{pmatrix}$ are a pair of eigenvectors that form a basis for $\mathbb{R}^2$.
p(t) =
For $\lambda = 2$,
For $\lambda = 3$,
The change of basis matrix P expresses the new basis (eigenvectors) in terms of the old (standard); so its columns are the eigenvectors. Write down P and calculate its inverse.
Now we can check the formula
$$[T]_{\{v'\},\{v'\}} = [P_{\{v' \to v\}}]^{-1}[T]_{\{v\},\{v\}}[P_{\{v' \to v\}}].$$
Calculate $P^{-1}AP$ to get the diagonal matrix D relative to the new basis.
$$P^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}, \quad A = \begin{pmatrix} 1 & 1 \\ -2 & 4 \end{pmatrix}, \quad P = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$$

17. Fibonacci numbers by matrices
The usual way to generate the Fibonacci sequence is to set $a_0 = 1$, $a_1 = 1$, then calculate $a_2 = a_0 + a_1 = 2$, $a_3 = a_1 + a_2 = 3$, etc.
In matrix notation this can be written
$$\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
and more generally
$$\begin{pmatrix} a_n \\ a_{n+1} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^n\begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$
Use this approach to determine $a_2$ and $a_3$, doing the matrix multiplication first.
$$\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$$
Determine $a_6$ and $a_7$ by using the square of the matrix that was just constructed.
$$\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$$
We have found a slight computational speedup, but it would be nicer to have a general formula for $\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^n$.

18. Powers of a diagonal matrix.
For a $2 \times 2$ diagonal matrix,
$$\begin{pmatrix} c_1 & 0 \\ 0 & c_2 \end{pmatrix}^n = \begin{pmatrix} c_1^n & 0 \\ 0 & c_2^n \end{pmatrix}.$$
The generalization to a diagonal matrix of any size is obvious.
Now suppose that we want to compute $A^n$ and can find P such that
$$P^{-1}AP = \begin{pmatrix} c_1 & 0 \\ 0 & c_2 \end{pmatrix}.$$
Prove by induction that
$$(P^{-1}AP)^n = P^{-1}A^nP.$$
Now show that
$$A^n = P\begin{pmatrix} c_1^n & 0 \\ 0 & c_2^n \end{pmatrix}P^{-1}.$$
For the Fibonacci example, this approach works with
$$c_1 = \frac{1+\sqrt{5}}{2}, \quad c_2 = \frac{1-\sqrt{5}}{2}, \quad\text{and}\quad P = \begin{pmatrix} 2 & 2 \\ 1+\sqrt{5} & 1-\sqrt{5} \end{pmatrix}.$$
The accompanying Mathematica notebook file Outline8.nb confirms this.
We need to find a systematic way to construct the matrix P.

Group Problems
1. Some interesting examples with $2 \times 2$ matrices
(a) Since a polynomial equation with real (or complex) coefficients always has a root (the fundamental theorem of algebra), a real matrix is guaranteed to have at least one complex eigenvalue. No such theorem holds for polynomial equations with coefficients in a finite field, so zero eigenvalues is a possibility. This is one of the few results in linear algebra that depends on the underlying field.
Consider the matrix $A = \begin{pmatrix} 3 & 1 \\ n & 3 \end{pmatrix}$ with entries from the finite field $Z_5$.
By considering the characteristic equation, find values of n that lead to 2, 1, or 0 distinct eigenvalues. For the case of 1 eigenvalue, find an eigenvector.
Hint: After writing the characteristic equation with n isolated on the right side of the equals sign, make a table of the value of $t^2 + 4t + 4$ for each of the five possible eigenvalues. That table lets you determine how many solutions there are for each of the five possible values of n. When the characteristic polynomial is the square of a linear factor, there is only one eigenvector and it is easy to construct.


(b) The matrix $A = \begin{pmatrix} 1 & -1 \\ 4 & -3 \end{pmatrix}$ has only a single eigenvalue and only one independent eigenvector.
Find the eigenvalue and eigenvector, show that A = D + N where D is diagonal and N is nilpotent, and use this analysis to calculate $A^3$ without ever multiplying A by itself (unless you want to check your answer).
(c) Extracting square roots by diagonalization.
The matrix $A = \begin{pmatrix} 2 & 1 \\ 2 & 3 \end{pmatrix}$ conveniently has two eigenvalues that are perfect squares. Find a basis of eigenvectors and construct a matrix P such that $P^{-1}AP$ is a diagonal matrix.
Thereby find two independent square roots of A, i.e. find matrices $B_1$ and $B_2$ such that $B_1^2 = B_2^2 = A$, with $B_2 \neq B_1$. Hint: use the negative square root of one of the eigenvalues, the positive square root of the other.
If you take Physics 15c next year, you may encounter this technique when you study coupled oscillators.


2. Some proofs. In doing these, you may use the fact that an eigenbasis exists if and only if all the $p_i(t)$ have simple roots.
(a) Suppose that a $5 \times 5$ matrix has a basis of eigenvectors, but that its only eigenvalues are 1 and 2. Using Hubbard Theorem 2.7.6, convince yourself that you must make at least three different choices of $\vec{e}_i$ in order to find all the eigenvectors.
(b) An alternative approach to proof 4.1 uses induction.
Identify a base case (easy). Then show that if a set of k-1 eigenvectors with distinct eigenvalues is linearly independent and you add to the set an eigenvector $\vec{v}_k$ with an eigenvalue $\lambda_k$ that is different from any of the preceding eigenvalues, the resulting set of k eigenvectors with distinct eigenvalues is linearly independent.
(c) In general, the square matrix A that represents a Markov process has the property that all the entries are between 0 and 1 and each column sums to 1. Prove that such a matrix A has an eigenvalue of 1 and that there is a stationary vector that is transformed into itself by A.
You may use the fact, which we have proved so far only for $2 \times 2$ and $3 \times 3$ matrices, that if a matrix has a nonzero vector in its kernel, its determinant is zero.


3. Problems with $3 \times 3$ matrices, to be solved by writing or editing R scripts
(a) Sometimes you don't find all the eigenvectors on the first try.
The matrix $A = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ has three real, distinct eigenvalues, and there is a basis of eigenvectors.
Find what polynomial equation for the eigenvalues arises from each of the following choices, and use it to construct as many eigenvectors as possible:
$\vec{w} = \vec{e}_1$.
$\vec{w} = \vec{e}_3$.
$\vec{w} = \vec{e}_1 + \vec{e}_3$.
(b) Find two eigenvectors for the matrix $A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 2 & 2 & 0 \end{pmatrix}$, and confirm that using each of the three standard basis vectors will not produce a third independent eigenvector.
Clearly the columns of A are not independent; so 0 is an eigenvalue. This property makes the algebra really easy.
(c) Use the technique of example 2.7.5 in Hubbard to find the eigenvalues and eigenvectors of the matrix $A = \begin{pmatrix} 3 & 4 & -4 \\ 1 & 3 & -1 \\ 3 & 6 & -4 \end{pmatrix}$.

Homework
1. Consider the sequence of numbers described, in a manner similar to the Fibonacci numbers, by
$$b_3 = 2b_1 + b_2, \quad b_4 = 2b_2 + b_3, \quad b_{n+2} = 2b_n + b_{n+1}.$$
(a) Write a matrix B to generate this sequence in the same way that Hubbard generates the Fibonacci numbers.
(b) By considering the case $b_1 = 1, b_2 = 2$ and the case $b_1 = 1, b_2 = -1$, find the eigenvectors and eigenvalues of B.
(c) Express the vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ as a linear combination of the two eigenvectors, and thereby find a formula for $b_n$ if $b_1 = 1, b_2 = 1$.
2. (This is similar to group problem 1c.)
Consider the matrix $A = \begin{pmatrix} 10 & -9 \\ 18 & -17 \end{pmatrix}$.
(a) By using a basis of eigenvectors, find a matrix P such that $P^{-1}AP$ is a diagonal matrix.
(b) Find a cube root of A, i.e. find a matrix B such that $B^3 = A$.
3. (a) Prove that if $\vec{v}_1$ and $\vec{v}_2$ are eigenvectors of matrix A, both with the same eigenvalue $\lambda$, then any linear combination of $\vec{v}_1$ and $\vec{v}_2$ is also an eigenvector.
(b) Suppose that A is a $3 \times 3$ matrix with a basis of eigenvectors but with only two distinct eigenvalues. Prove that for any $\vec{w}$, the vectors $\vec{w}$, $A\vec{w}$, and $A^2\vec{w}$ are linearly dependent. (This is another way to understand why all the polynomials $p_i(t)$ are simple when A has a basis of eigenvectors but a repeated eigenvalue.)


4. Harvard graduate Ivana Markov, who concentrated in English and mathematics with economics as a secondary field, just cannot decide whether she wants to be a poet or an investment banker, and so her career path is described by the following Markov process:
If Ivana works as a poet in year n, there is a probability of 0.9 that she will feel poor at the end of the year and take a job as an investment banker for year n + 1. Otherwise she remains a poet.
If Ivana works as an investment banker in year n, there is a probability of 0.7 that she will feel overworked and unfulfilled at the end of the year and take a job as a poet for year n + 1. Otherwise she remains an investment banker.
Thus, if $\begin{pmatrix} p_n \\ q_n \end{pmatrix}$ describes the probabilities that Ivana works as a poet or a banker respectively in year n, the corresponding probabilities for year n + 1 are given by
$$\begin{pmatrix} p_{n+1} \\ q_{n+1} \end{pmatrix} = A\begin{pmatrix} p_n \\ q_n \end{pmatrix}, \quad\text{where}\quad A = \begin{pmatrix} 0.1 & 0.7 \\ 0.9 & 0.3 \end{pmatrix}.$$
(a) Find the eigenvalues and eigenvectors of A.
(b) Construct the matrix P whose columns are the eigenvectors, invert it, and thereby express the vector $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ as a linear combination of the eigenvectors.
(c) Suppose that in year 0 Ivana works as a poet, so that $\begin{pmatrix} p_0 \\ q_0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$.
Find an explicit formula for $\begin{pmatrix} p_n \\ q_n \end{pmatrix}$ and use it to determine $\begin{pmatrix} p_{10} \\ q_{10} \end{pmatrix}$. What happens in the limit of large n?


5. (a) Prove by induction (no "..." allowed!) that if $F = PCP^{-1}$, then $F^n = PC^nP^{-1}$ for all positive integers n.
(b) Suppose that the $2 \times 2$ real matrix F has complex eigenvalues $re^{\pm i\theta}$. Show that, for integer n, $F^n$ is a multiple of the identity matrix if and only if $n\theta = m\pi$ for some integer m. Hint: write $F = PCP^{-1}$ where C is conformal. This hint also helps with the rest of the problem.
(c) If $F = \begin{pmatrix} 3 & -7 \\ 1 & -1 \end{pmatrix}$, find the smallest n for which $F^n$ is a multiple of the identity. Check your answer by matrix multiplication.
(d) If $G = \begin{pmatrix} -2 & -15 \\ 3 & 10 \end{pmatrix}$, use half-angle formulas to find a matrix A for which $A^2 = G$. Check your answer by matrix multiplication.
Problems that require writing or editing R scripts
6. (This is similar to group problem 3b.)
Use the technique of example 2.7.5 in Hubbard to find the eigenvalues and eigenvectors of the following two matrices. One has a repeated eigenvalue and will require you to use the technique with two different basis vectors.
(a) $A = \begin{pmatrix} 3 & 4 & -4 \\ 1 & 3 & -1 \\ 3 & 6 & -4 \end{pmatrix}$
(b) $B = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 3 & 1 \\ 1 & -2 & 0 \end{pmatrix}$

7. The matrix $A = \begin{pmatrix} 5 & -1 & 1 \\ 1 & 3 & 1 \\ 0 & 0 & 4 \end{pmatrix}$ has only one eigenvalue, 4, and so its characteristic polynomial must be $(t-4)^3$.
(a) Show that A has a two-dimensional subspace of eigenvectors but that there is no other eigenvector.
(b) Write A = D + N where D is diagonal and N is nilpotent, and confirm that $N^2$ is the zero matrix.


8. Here is a symmetric matrix, which is guaranteed to have an orthonormal basis of eigenvectors. For once, the numbers have not been rigged to make the eigenvalues be integers.
$$A = \begin{pmatrix} 4 & 1 & 1 \\ 1 & 3 & 2 \\ 1 & 2 & 3 \end{pmatrix}$$
Express A in the form $PDP^{-1}$, where D is diagonal and P is an isometry matrix whose columns are orthogonal unit vectors.
A similar example is in script 1.4X.


MATHEMATICS 23a/E-23a, Fall 2015


Linear Algebra and Real Analysis I
Module #2, Week 1 (Number Systems and Sequences)
Authors: Paul Bamberg and Kate Penner (based on their course MATH S-322)
R scripts by Paul Bamberg
Last modified: July 24, 2015 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-7 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.
Reading
Ross, Chapter 1, sections 1 through 5 (number systems)
Ross, Chapter 2, sections 7 through 9 (sequences)
Hubbard, section 0.2 (quantifiers and negation)
Hubbard, section 0.6 (infinite sets)
Warmups (to be done before lecture)
(Subsection 1.1) Prove by induction (review how to do this if necessary) that
$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}.$$
Then do the proof differently by assuming that there are positive integers n for which the given formula is not true, letting m be the smallest such value, and showing that your assumption led to a contradiction because the formula would also have to be false for m - 1.
(Subsection 1.2) Look at the axioms for an ordered field (Ross, p. 14). Identify one of the axioms that is not satisfied by the complex numbers, which form a field but not an ordered field.
(Subsection 1.2) You are given an unlimited budget to build a podium, one foot in height, for the gold medal winner in your school's track meet. Your only available construction material is squares of gold foil, which are very thin. Show that the Archimedean property of the real numbers guarantees that you can succeed.
(Subsection 1.2) Find a way to express $\sqrt{2}$ (which is irrational) as the least upper bound of a set of rational numbers. Hint: you can write $\sqrt{2}$ in decimal notation.

(Subsection 1.4) After a careful reading of example 1 in section 8, write out a Formal Proof that
$$\lim_{n\to\infty} \frac{1}{n} = 0.$$
(Subsection 1.4) Invent an example of a sequence $(s_n)$ of positive numbers that is strictly decreasing ($s_{n+1} < s_n$ for all n) but whose limit is not zero.
(Subsection 1.5) Students of calculus readily accept the statement "if |a| < 1, then $\lim_{n\to\infty} a^n = 0$" on the basis that "when I add 1 to n, $a^n$ gets smaller in magnitude." However, the preceding example shows that this observation is not good enough! Look at page 48 to see how Ross does the proof.
(Subsection 1.5) Invent sequences $(s_n)$ and $(t_n)$ such that $\lim(s_n) = 0$ but $\lim(s_n t_n) = 2$. Hint: look at theorem 9.4. You need to invent a $(t_n)$ that does not satisfy the hypotheses of this theorem.
Proofs to present in section or to a classmate who has done them.
5.1 Define "countably infinite." Prove that the set of positive rational numbers is countably infinite, but that the set of real numbers in the interval [0,1], as represented by infinite decimals, is not countable.
5.2 Suppose that $s_n \neq 0$ for all n and that $s = \lim s_n > 0$.
Prove that $\exists N$ such that $\forall n > N$, $s_n > s/2$, and that $\frac{1}{s_n}$ converges to $\frac{1}{s}$.
Additional proofs (may appear on quiz; students will post pdfs or videos)
5.3 (Ross, p. 25; the Archimedean Property of R)
The completeness axiom for the real numbers states that every nonempty subset $S \subseteq \mathbb{R}$ that is bounded above has a least upper bound sup S. Use it to prove that for any two positive real numbers a and b, there exists a positive integer n such that na > b.
5.4 (Ross, page 52)
Suppose that $\lim s_n = +\infty$ and $\lim t_n > 0$. Prove that $\lim s_n t_n = +\infty$.

R Scripts
Script 2.1A-Countability.R
Topic 1 - The set of ordered pairs of natural numbers is countable
Topic 2 - The set of positive rational numbers is countable
Script 2.1B-Uncountability.R
Topic 1 - Cantor's proof of uncountability
Topic 2 - A different-looking version of the same argument
Script 2.1C-Denseness.R
Topic 1 - Placing rational numbers between any two real numbers
Script 2.1D-Sequences.R
Topic 1 - Limit of an infinite sequence
Topic 2 - Limit of sum = sum of limits
Topic 3 - Convergence of sequence of inverses (proof 5.2)

Executive Summary

1.1 Natural Numbers and Rational Numbers

The natural numbers $\mathbb{N}$ are 1, 2, 3, .... They have the following rather obvious properties. What is not obvious is that these five properties (the Peano axioms) are sufficient to prove any other property of the natural numbers.
N1. 1 belongs to N.
N2. If $n \in \mathbb{N}$, then $n + 1 \in \mathbb{N}$.
N3. 1 is not the successor of any element of N.
N4. If n and $m \in \mathbb{N}$ have the same successor, then n = m.
N5. A subset $S \subseteq \mathbb{N}$ which contains 1, and which contains n + 1 whenever it contains n, must equal N.
Axiom N5 is related to proof by induction, where you want to prove an infinite set of propositions $P_1, P_2, P_3, \ldots$.
You do this by proving $P_1$ (the base case) and then proving that $P_n$ implies $P_{n+1}$ (the inductive step).
The least number principle states that any nonempty subset of $\mathbb{N}$ has a least element. This statement, along with the assumption that any natural number except 1 has a predecessor, can be used to replace N5.
Practical application: instead of doing a proof by induction, you can assert that k > 1 is the smallest integer for which $P_k$ is false, then get a contradiction by showing that $P_{k-1}$ is also false, thereby proving that the set for which $P_k$ is false must be empty.
The familiar rational numbers can be regarded as fractions in lowest terms: e.g. $\frac{m}{n}$ and $\frac{2m}{2n}$ represent the same rational number. The rational number $r = \frac{m}{n}$ satisfies the first-degree polynomial equation $nx - m = 0$. More generally, a number that satisfies a polynomial equation of any (finite) degree, like $x^2 - 2 = 0$ or $x^5 + x - 1 = 0$, is called an algebraic number.
The rational numbers form a countably infinite set, which means that there is a bijection between them and the natural numbers. Many proofs rely on the fact that the rational numbers, or a subset of them, can be enumerated as $q_1, q_2, \ldots$.
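A tiny R sketch of such an enumeration, in the spirit of script 2.1A: list the pairs (p, q) diagonal by diagonal, so every positive rational p/q is reached after finitely many steps.

# Enumerate pairs (p, q) along diagonals where p + q is constant.
pairs <- NULL
for (s in 2:6)            # diagonal s = p + q
  for (p in 1:(s - 1))
    pairs <- rbind(pairs, c(p, s - p))
head(pairs, 10)           # (1,1), (1,2), (2,1), (1,3), (2,2), ...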

1.2 Rational Numbers and Real Numbers

The rational numbers and the real numbers each form an ordered field, which means that there is a relation $\le$ with properties
O1. Given a and b, either $a \le b$ or $b \le a$.
O2. If $a \le b$ and $b \le a$, then a = b.
O3. If $a \le b$ and $b \le c$, then $a \le c$.
O4. If $a \le b$, then $a + c \le b + c$.
O5. If $a \le b$ and $0 \le c$, then $ac \le bc$.
Many important properties of infinite sequences of real numbers can be proved on the basis of ordering.
If we think of the rational numbers or the real numbers as lying on a number line, we can interpret the absolute value |a - b| as the distance between point a and point b: dist(a, b) = |a - b|. In two dimensions the statement dist(a, b) $\le$ dist(a, c) + dist(c, b) means that the length of one side of a triangle cannot exceed the sum of the lengths of the other two sides. The name "triangle inequality" is also applied to the one-dimensional special case where c = 0; i.e. $|a + b| \le |a| + |b|$.
Many well-known rules of algebra are not included on the list of field axioms. Usually, as for (-a)(-b) = ab, this is because they are easily provable theorems. However, there are properties of the real numbers that cannot be proved from the field axioms alone because they rely on the axiom that the real numbers are complete. The Completeness Axiom states that every nonempty subset S of $\mathbb{R}$ that is bounded above has a least upper bound.
This least upper bound sup S is not necessarily a member of the set S.
The Archimedean property of the real numbers states that for any two positive real numbers a and b, there exists a positive integer n such that na > b. Its proof requires the Completeness Axiom.
The rational numbers are a dense subset of the real numbers. This means that if $a, b \in \mathbb{R}$ and a < b, there exists $r \in \mathbb{Q}$ such that a < r < b.
Again the proof relies on the completeness of the real numbers.
It is not unreasonable to think of real numbers as infinite decimals (though there are complications). In this view, $\pi$ (which is not even algebraic) is the least upper bound of the set
$$S = \{3, 3.1, 3.14, 3.141, 3.1415, 3.14159, \ldots\}.$$
The real numbers form an uncountable set. This means that there is no bijection between them and the natural numbers: they cannot be enumerated as $r_1, r_2, \ldots$.

1.3 Quantifiers and Negation

Quantifiers are not used by Ross, but they are conventional in mathematics and save space when you are writing proofs.
$\exists$ is read "there exists." It is usually followed by "such that" or "s.t."
Example: the proposition $\exists x$ s.t. $x^2 = 4$ is true, since either 2 or -2 has the desired property.
$\forall$ is read "for all" or "for each" or "for every." It is used to specify that some proposition is true for every member of a possibly infinite set or sequence.
Example: $\forall x \in \mathbb{R}, x^2 \ge 0$ is true, but $\forall x \in \mathbb{R}, x^2 > 0$ is false.
Quantifiers and negation: useful in doing proofs by contradiction.
The negation of "$\exists x$ such that P(x) is true" is "$\forall x$, P(x) is false."
The negation of "$\forall x$, P(x) is true" is "$\exists x$ such that P(x) is false."

1.4 Sequences and their limits

A sequence is really a function whose domain is a subset $n \ge m$ of the integers, usually starting with m = 0 or 1, and whose codomain (in this module) is $\mathbb{R}$. Later we will consider sequences of vectors in $\mathbb{R}^n$.
A specific element is denoted $s_n$. The entire sequence can be denoted $(s_1, s_2, \ldots)$ or $(s_n)_{n \in \mathbb{N}}$ or even just $(s_n)$.
Although a sequence is infinite, the set of values in the sequence may be finite; e.g. for $s_n = \cos n\pi$ the set of values is just $\{1, -1\}$.
"Limit of a sequence" always refers to the limit as n becomes very large; so it is unambiguous to write it $\lim s_n$ instead of $\lim_{n\to\infty} s_n$.
Sequence $(s_n)$ is said to converge to the limit s if
$$\forall \epsilon > 0, \exists N \in \mathbb{N} \text{ such that } \forall n > N, |s_n - s| < \epsilon.$$
To prove that a sequence $(s_n)$ converges by using this definition, we have to know or guess the value of the limit s. The rest is algebra, frequently rather messy algebra.
If the limit of a sequence exists, it is unique. The proof is a classic application of the triangle inequality.
A "formal proof" should be as concise as possible while omitting nothing that is essential. Sometimes it obscures the chain of thought that led to the invention of the proof. Formal proofs are nice, and you should learn how to write them (Ross has six examples in section 8 and six more in section 9), but if your goal is to convince or instruct the reader, a longer version of the proof may be preferable.
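The definition can also be explored numerically, in the spirit of script 2.1D. Here is a small illustration for $s_n = 1/n$: given $\epsilon$, exhibit an N beyond which every term is within $\epsilon$ of the limit.

# For s_n = 1/n and limit 0: 1/n < eps whenever n > 1/eps.
eps <- 0.001
N <- ceiling(1 / eps)
n <- seq(N + 1, N + 5)
abs(1 / n - 0) < eps  # TRUE for every n > N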

1.5 Theorems about sequences and their limits

Theorems about limits, all provable from the definition. These will be especially useful for us after we define continuity in terms of sequences.
If $\lim s_n = s$, then $\lim(ks_n) = ks$.
If $\lim s_n = s$ and $\lim t_n = t$, then $\lim(s_n + t_n) = s + t$.
Any convergent sequence is bounded: if $\lim s_n = s$, $\exists M$ such that $\forall n, |s_n| < M$.
If $\lim s_n = s$ and $\lim t_n = t$, then $\lim(s_n t_n) = st$.
If $\lim s_n = 0$ and $(t_n)$ is bounded, then $\lim(s_n t_n) = 0$.
If $s_n \neq 0$ for all n and $s = \lim s_n \neq 0$, then $\inf |s_n| > 0$ and $\frac{1}{s_n}$ converges to $\frac{1}{s}$.
Using the limit theorems above is usually a much more efficient way to find the limit of a sequence than doing a brute-force calculation of N in terms of $\epsilon$. Ross has six diverse examples.
The symbol $+\infty$ has a precise meaning when used to specify a limit. We say that the sequence $s_n$ diverges to $+\infty$ if
$$\forall M > 0, \exists N \text{ such that } \forall n > N, s_n > M.$$
Similarly, we say that the sequence $s_n$ diverges to $-\infty$ if
$$\forall M < 0, \exists N \text{ such that } \forall n > N, s_n < M.$$
Theorems about infinite limits:
If $\lim s_n = +\infty$ and $\lim t_n > 0$ (could be $+\infty$), then $\lim s_n t_n = +\infty$.
If $(s_n)$ is a sequence of positive real numbers, then $\lim s_n = +\infty$ if and only if $\lim \frac{1}{s_n} = 0$.
If $\lim s_n = +\infty$, then $\lim (s_n + t_n) = +\infty$ if $t_n$ has any of the following properties:
$\lim t_n > -\infty$
$t_n$ is bounded (but does not necessarily converge).
$\inf t_n > -\infty$ (who cares whether $t_n$ is bounded above?).
Lecture Outline
1. Peano axioms for the natural numbers N = 1, 2, 3, ...
N1. 1 belongs to N.
N2. If $n \in \mathbb{N}$, then $n + 1 \in \mathbb{N}$.
N3. 1 is not the successor of any element of N.
N4. If n and $m \in \mathbb{N}$ have the same successor, then n = m.
N5. A subset $S \subseteq \mathbb{N}$ which contains 1, and which contains n + 1 whenever it contains n, must equal N.
Axiom N5 is related to proof by induction, where you want to prove an infinite set of propositions $P_1, P_2, P_3, \ldots$.
You do this by proving $P_1$ (the base case) and then proving that $P_n$ implies $P_{n+1}$ (the inductive step).
A well-known example: the formula $1 + 2 + 3 + \cdots + n = \frac{1}{2}n(n+1)$.
For proposition $P_1$ simply set n = 1: it is true that $1 = \frac{1}{2}n(n+1)$.
Write down proposition $P_n$, and use a little algebra to show that if $P_n$ is in the sequence of true propositions, then so is $P_{n+1}$.
A surprising replacement for axiom N5:
Every nonempty subset of N has a smallest element.
Any element of N except 1 has a predecessor.
Use these two statements (plus N1 through N4) to prove N5.
Practical application: instead of doing a proof by induction, you can denote by k the smallest integer for which $P_k$ is false, then get a contradiction by showing that $P_{k-1}$ is also false, thereby proving that the set for which $P_k$ is false must be empty.
How this works in our example:
Suppose that $1 + 2 + 3 + \cdots + n = \frac{1}{2}n(n+1)$ is not always true. Then there is a nonempty subset of natural numbers for which it is false. This subset includes a smallest number k.
Using our analysis from the previous page:
How do we know that k cannot be 1?
Given that k cannot be 1, how do we know that k cannot in fact be the smallest element for which $P_k$ is false?
There is less to this approach than meets the eye. Instead of proving that $P_k$ implies $P_{k+1}$ for $k \ge 1$, we showed that NOT $P_k$ implies NOT $P_{k-1}$ for $k \ge 2$.
But these two statements are logically equivalent: quite generally, for propositions p and q, $p \Rightarrow q$ if and only if $\neg q \Rightarrow \neg p$ (principle of contraposition).
A practical rule of thumb:
If it is easier to prove that $P_k \Rightarrow P_{k+1}$, use induction.
If it is easier to prove that $\neg P_k \Rightarrow \neg P_{k-1}$, use the least-number principle.

2. Proof by induction and least number principle
Students of algebra are aware that for any positive integer n, $x^n - y^n$ is divisible by $x - y$.
Give a formal inductive proof of this theorem (formal means no use of "...").
Give an alternative proof using the fact that any nonempty set of positive integers contains a smallest element.

3. (Ross, page 16; consequences of the ordered field axioms)
Using the fact that a set of numbers F (could be Q or R) satisfies the ordered field axioms
O1. Given a and b, either $a \le b$ or $b \le a$.
O2. If $a \le b$ and $b \le a$, then a = b.
O3. If $a \le b$ and $b \le c$, then $a \le c$.
O4. If $a \le b$, then $a + c \le b + c$.
O5. If $a \le b$ and $0 \le c$, then $ac \le bc$.
prove the following:
If $a \le b$, then $-b \le -a$.
$\forall a \in F$, $a^2 \ge 0$.
4. (Countability of the rational numbers - first part of proof 5.1 - script 2.1A)
Use the diagonal trick to prove that the positive rational numbers form
a countably infinite set.
5. (Ross, p. 25; the Archimedean Property of R and the denseness of Q corollary in script 2.1C)
The completeness axiom for the real numbers states that every nonempty
subset S R that is bounded above has a least upper bound sup S. Use
it to prove that for any two positive real numbers a and b, there exists a
positive integer n such that na > b.
6. (Uncountability of the real numbers - second part of proof 5.1 - script 2.1B)
Prove that the real numbers between 0 and 1, as represented by infinite
decimals, form an uncountably infinite set.
7. (Ross, page 37 - to be done in LaTeX)
Prove that if lim sn = s and lim sn = t, then s = t.
8. (Ross, page 46 - script 2.1D)
Prove that if lim sn = s and lim tn = t, then lim(sn + tn ) = s + t.
9. (Ross, pages 45 and 47)
Prove that any convergent sequence is bounded, then use this result to show
that if lim sn = s and lim tn = t, then lim(sn tn ) = st.
10. (Ross, pages 43 and 47 - script 2.1D)
Suppose that sn ≠ 0 for all n and that s = lim sn > 0.
Prove that ∃N such that ∀n > N, sn > s/2, and that 1/sn converges to 1/s.
11. (Ross, page 48)
Using the binomial expansion, prove that lim(n^{1/n}) = 1.
12. (Ross, page 52 - to be done in LaTeX)
Suppose that lim sn = +∞ and lim tn > 0. Prove that lim sn tn = +∞.
13. Proofs based on nothing but the ordered field axioms


O1. Given a and b, either a ≤ b or b ≤ a.
O2. If a ≤ b and b ≤ a, then a = b.
O3. If a ≤ b and b ≤ c, then a ≤ c.
O4. If a ≤ b, then a + c ≤ b + c.
O5. If a ≤ b and 0 ≤ c, then ac ≤ bc.
(a) Using the axioms for an ordered field, prove that the sum of two positive numbers is a positive number.
(b) Using the axioms for an ordered field, prove that the product of two
positive numbers is a positive number.
(c) Prove that Z5 is not an ordered field.
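A sketch of one route for (c), assuming the result of (a): in any ordered field 1 = 1² is positive, so by (a) the sum 1 + 1 + 1 + 1 + 1 would be positive; but in Z5 that sum is 0, which is not positive.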

14. Least upper bound principle works for R but not for Q.
Your students at Springfield North are competing with a rival team from
Springfield South to draw up a business plan for a company with m scientists
and n other employees. Entries with m² > 2n² get rejected. The entry with
the highest possible ratio of scientists to other employees wins the contest.
Will this competition necessarily have a winner?

15. Use quantifiers to express the following concepts:


(a) No matter how large a positive number M you choose, the sequence (sn) has infinitely many elements that are greater than M.
Does this statement imply that lim sn = +∞?
(b) No matter how small a positive number ε you choose, the sequence (sn) has only finitely many elements that lie outside the interval (a − ε, a + ε).
Does this statement imply that lim sn = a?

16. Proving limits by brute force


Prove by brute force that the sequence
1/3, 2/5, 3/7, 4/9, …
converges to 1/2.
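A quick numerical sanity check in R, in the spirit of the course's R scripts (the general term n/(2n + 1) and the variable names are read off from the pattern above, not taken from any script):

    n <- 1:20
    s <- n / (2 * n + 1)    # the terms 1/3, 2/5, 3/7, 4/9, ...
    abs(s - 1/2)            # equals 1/(2(2n+1)), which suggests the N to use for a given epsilon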

17. Using limit theorems and trickery to prove limits
(a) Evaluate
lim n(√(n² + 1) − √(n² − 1)).
Note: 100(√10001 − √9999) = 0.99999999874999999...
(b) Evaluate
lim((n + 1)^{4/3} − n^{4/3}).
Note: 101^{4/3} − 100^{4/3} = 6.19907769...; ∛100 = 4.6415...
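For (a), one possible first step (the "trickery" the title promises): multiply and divide by the conjugate to get
n(√(n² + 1) − √(n² − 1)) = 2n/(√(n² + 1) + √(n² − 1)),
and then apply the limit theorems to the right-hand side, which approaches 1.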

Group Problems
1. Proofs that use induction
(a) Prove that for all nonnegative integers n
∑_{i=1}^{n} i³ = (∑_{i=1}^{n} i)².
Hint: the following identity from warmup #1 may be useful:
∑_{i=1}^{n} i = n(n + 1)/2.

(b) Starting from xy ≤ |xy|, which looks like Cauchy-Schwarz, prove the triangle inequality |a + b| ≤ |a| + |b| for an ordered field.
Starting from the triangle inequality, prove that for n numbers a1, a2, …, an,
|a1 + a2 + ⋯ + an| ≤ |a1| + |a2| + ⋯ + |an|.

(c) Use the Archimedean property of the real numbers to prove that if a and b are positive real numbers and a < b, there exists r ∈ Q such that a < r < b.
If you need a hint, look at section 4.8 in Ross or run script 2.1C. The fact that a and b are positive makes the proof easier than the one in Ross.
By induction, prove that in any open interval (a, b) there are infinitely many rational numbers.

2. Properties of sequences (to be done in LaTeX)


(a) The squeeze lemma
Consider three sequences (an), (sn), (bn) such that an ≤ sn ≤ bn for all n and lim an = lim bn = s. Prove that lim sn = s.
(b) Using quantifiers to describe sequences
Let sn denote the number of inches of snowfall in Cambridge in year n, e.g. s2013 = 90. Using the quantifiers ∃ (there exists) and ∀ (for all), convert the following English sentences into mathematical notation.
i. There will be infinitely many years in which the Cambridge snowfall exceeds 100 inches.
ii. If you wait long enough, there will come a year after which Cambridge never again gets more than 20 inches of snow.
iii. The snowfall in Cambridge will approach a limit of zero.
(c) Prove that if sequence (tn) is bounded and lim(sn) = 0, then lim(tn sn) = 0.

3. Some slightly computational problems


(a) Proving limits by brute force
Let
sn = (6n − 4)/(2n + 8).
Determine lim sn and prove your answer by brute force, directly from the definition of limit. (For a model, see Ross, Example 2 on page 39.) Then get the same answer more easily by using limit theorems.
(b) Finding limits by using limit theorems
Determine lim(√(n(n + 2)) − n), stating what limit theorems you are using in each step.
Hint: Use the same trick of irrationalizing the denominator as in Ross, section 8, example 5. However, that example requires using the definition of limit. You can invoke limit theorems, which makes things much easier.

(c) (Ross, 9.4) Let s1 = 1 and for n ≥ 1 let sn+1 = √(sn + 1).
List the first four terms of (sn).
It turns out that (sn) converges to a limit s. Assuming this fact, prove that s = ½(1 + √5).
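A short R sketch for listing terms (assuming the recursion as reconstructed above; the names are my own):

    s <- numeric(20)
    s[1] <- 1
    for (n in 1:19) s[n + 1] <- sqrt(s[n] + 1)
    s[1:4]               # the first four terms
    (1 + sqrt(5)) / 2    # the claimed limit, for comparison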

Homework
1. Ross, exercise 1.1. Do the proof both by induction (with base case and inductive step) and by the least-number principle (show that the assumption that there is a nonempty set of positive integers for which the formula is not true leads to a contradiction).
2. Using quantifiers to describe infinite sequences
A Greek hero enters the afterlife and is pleased to learn that the goddess Artemis is going to be training him for eternity. He will be shooting an
infinite sequence of arrows. The distance that the nth arrow travels is sn .
Use quantifiers ∀ and ∃ to convert the following to mathematical notation.
(a) He will shoot only finitely many arrows more than 200 meters.
(b) The negation of (a): he will shoot infinitely many arrows more than
200 meters. (You can do this mechanically by using the rules for
negation of statements with quantifiers.)
(c) No matter how small a positive number ε Artemis chooses, all the rest of his shots will travel more than 200 − ε meters. (Off the record: this idea can be expressed as lim inf sn ≥ 200.)
(d) He will become so consistent that eventually any two of his subsequent
shots will differ in distance by less than 1 meter. (This idea will
resurface next week as the concept of Cauchy sequence.)
3. Denseness of Q
This problem is closely related to group problem 1c.
(a) Find a rational number x such that
355/113 < x < 22/7.
(b) Find a rational number x such that
π < x < 355/113.
Hint: π = 4 arctan 1, which any decent calculator can evaluate.
4. Ross, exercise 3.6.
5. Ross, exercise 4.8. If you like this problem, you might enjoy reading enrichment section 6 in Ross, which explains how to construct the real numbers
using Dedekind cuts.
6. Ross, Exercise 8.2(c) and 8.2(e). You might want to use the limit theorems
from section 9 to determine the limit, but then do a Formal Proof in the
style of the examples from section 8, working directly from the definition
of limit.

The last three problems must be done in LaTeX. Print the pdf file and
attach it to your handwritten solutions.
7. Ross, Exercise 8.9. The star on the exercise means that it is referred to in
many places.
8. Ross, Exercise 9.12. This ratio test may be familiar from a calculus
course. There is a similar, better known test for infinite series that is
slightly more difficult to prove.
9. Ross, Exercises 9.15 and 9.16(a). The first of these results is invoked frequently in calculus courses, especially in conjunction with Taylor series, but
surprisingly few students can prove it. If you are working the problems in
order, both should be easy.

MATHEMATICS 23a/E-23a, Fall 2015


Linear Algebra and Real Analysis I
Module #2, Week 2 (Series, Convergence, Power Series)
Authors: Paul Bamberg and Kate Penner (based on their course MATH S-322)
R scripts by Paul Bamberg
Last modified: July 24, 2014 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-8 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.
Reading from Ross
Chapter 2, sections 10 and 11(pp. 56-77) (monotone and Cauchy sequences,
subsequences, introduction to lim sup and lim inf)
Chapter 2, sections 14 and 15 (pp. 95-109) (series and convergence tests)
Chapter 4, section 23 (pp.187-192) (convergence of power series)
Warmups (to be done before lecture)
Give an example of:
a set that contains its supremum and infimum.
a set that contains only its supremum.
a set that contains neither its supremum nor infimum.
From an analytic perspective, if given a series or sequence, how do you
show convergence? (If I gave you a sequence or series and asked whether
it converged, what would you need to compute or demonstrate to conclude
that it was convergent?) How would you show divergence?
Review chapter 2, section 16 on decimal expansion of real numbers.
Given a repeating decimal, you can write this number as a geometric series.
Write the repeating decimal 0.363636... as a geometric series, and use the formula
a/(1 − r)
for the sum to show that it is equal to 4/11.
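For instance (a sketch of the intended computation):
0.363636... = 0.36 + 0.36(0.01) + 0.36(0.01)² + ⋯,
a geometric series with a = 0.36 and r = 0.01, whose sum is 0.36/(1 − 0.01) = 36/99 = 4/11.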
Give a brief explanation as to why the harmonic series ∑ 1/n diverges. It need not be rigorous - we will be exploring this in full in this class.

Review the convergence tests you can remember and any specific criteria for their applications. Use one to show that
∑_{n=0}^∞ e^{−n}
is convergent and another to show that
∑_{n=1}^∞ (−1)^n/n
is convergent. Try applying the Root Test (which may be unfamiliar) to a geometric series.
Proofs to present in section or to a classmate who has done them.
6.1 Bolzano-Weierstrass
Prove that any bounded increasing sequence converges. (You can
assume without additional proof the corresponding result, that any
bounded decreasing sequence converges.)
Prove that every sequence (sn ) has a monotonic subsequence.
Prove the Bolzano-Weierstrass Theorem: every bounded sequence has
a convergent subsequence.
6.2 The Root Test
Consider the infinite series ∑ an and the lim sup |an|^{1/n}, referred to as α. Prove the following statements about ∑ an:
The series converges absolutely if α < 1.
The series diverges if α > 1.
If α = 1, then nothing can be deduced conclusively about the behavior of the series.
Additional proofs (may appear on quiz; students will post pdfs or videos)
6.3 (Cauchy sequences) A Cauchy sequence is defined as a sequence where
∀ε > 0, ∃N s.t. m, n > N ⟹ |sn − sm| < ε.
Prove that any Cauchy sequence is bounded.
Prove that any convergent sequence is Cauchy.
Prove that any Cauchy sequence of real numbers is convergent. You will need to use something that follows from the completeness of the real numbers. This could be the Bolzano-Weierstrass theorem, or it could be the fact that, for a sequence of real numbers, if lim inf sn = lim sup sn = s, then lim sn is defined and
lim sn = s.
6.4 (Ross, p.188, Radius of Convergence)
Consider the power series ∑ an x^n. Let us refer to lim sup |an|^{1/n} as β and 1/β as R. (If β = 0, R = +∞, and if β = +∞, R = 0.)
Prove the following:
If |x| < R, the power series converges.
If |x| > R, the power series diverges.

R Scripts
Script 2.2A-MoreSequences.R
Topic 1 Cauchy Sequences
Topic 2 Lim sup and lim inf of a sequence
Script 2.2B-Series.R
Topic 1 Series and partial sums
Topic 2 Passing and failing the root test
Topic 3 Why the harmonic series diverges

Executive Summary
1.1 Monotone sequences

A sequence (sn) is increasing if sn ≤ sn+1 ∀n.
A sequence (sn) is strictly increasing if sn < sn+1 ∀n.
A sequence (sn) is decreasing if sn ≥ sn+1 ∀n.
A sequence (sn) is strictly decreasing if sn > sn+1 ∀n.
A sequence that is either increasing or decreasing is called a monotone sequence.
All bounded monotone sequences converge.
For an unbounded increasing sequence, lim_{n→∞} sn = +∞.
For an unbounded decreasing sequence, lim_{n→∞} sn = −∞.

1.2 Supremum, infimum, maximum, minimum

The supremum of a subset S (which is a subset of some set T) is the least element of T that is greater than or equal to all of the elements that are in the subset S. The supremum of the subset S definitely lives in the set T. It may also be in S, but that is not a requirement.
The supremum of a sequence is the least upper bound of its set of elements.
The maximum is the largest value attained within a set or sequence.
It is easy to find examples of sets or sequences for which no supremum exists, or
for which a supremum exists but a maximum does not.
The infimum of a sequence is the greatest lower bound, or the greatest
element of T that is less than or equal to all of the elements that are in the
subset S. It is not the same as a minimum, because the minimum must be
achieved in S, while the infimum may be an element of only T .

1.3 Cauchy sequences

A sequence is a Cauchy sequence if
∀ε > 0, ∃N s.t. ∀m, n > N, |sn − sm| < ε.
Both convergent and Cauchy sequences must be bounded.
A convergent sequence of real numbers or of rational numbers is Cauchy.
A Cauchy sequence of real numbers is convergent.
It is easy to invent a Cauchy sequence of rational numbers whose limit is an
irrational number.
Off the record: quantum mechanics is done in a Hilbert space, one of the requirements for which is that every Cauchy sequence is convergent. Optimization problems in economics are frequently formulated in a Banach space, which has the same requirement.
1.4 lim inf and lim sup

Given any bounded sequence, the tail of the sequence, which consists of the
infinite number of elements beyond the N th element, has a well-defined supremum
and infimum.
Let us combine the notion of limit with the definitions of supremum and
infimum. The limit infimum and limit supremum are written and defined as
follows:
lim inf sn = lim_{N→∞} inf{sn : n > N},
lim sup sn = lim_{N→∞} sup{sn : n > N}.

The limit supremum is defined in a parallel manner, only considering the supremum of the tails instead of the infimum.
Now that we know the concepts of lim inf and lim sup, we find the following
properties hold:
If lim sn is defined as a real number or ±∞, then
lim inf sn = lim sn = lim sup sn.
If lim inf sn = lim sup sn , then lim sn is defined and
lim sn = lim inf sn = lim sup sn
For a Cauchy sequence of real numbers, lim inf sn = lim sup sn , and so the
sequence converges.

1.5 Subsequences and the Bolzano-Weierstrass theorem

A subsequence is a sequence obtained by selecting an infinite number of terms from the parent sequence, in order.
If (sn ) converges to s, then any subsequence selected from it also converges to s.
Given any sequence, we can construct from it a monotonic subsequence: either an increasing sequence whose limit is lim sup sn, a decreasing sequence whose limit is lim inf sn, or both. If the original sequence is bounded, such a monotonic subsequence must converge, even if the original sequence does not.
This construction proves one of the most useful results in all of mathematics, the
Bolzano-Weierstrass theorem:
Every bounded sequence has a convergent subsequence.

1.6 Infinite series, partial sums, and convergence

Given an infinite series ∑ an we define the partial sum
sn = ∑_{k=m}^{n} ak.
The lower limit m is usually either 0 or 1.


The series ∑_{k=m}^∞ ak is said to converge when the limit of its partial sums as n → ∞ equals some number S. If a series does not converge, it is said to diverge. The sum ∑ an has no meaning unless its sequence of partial sums either converges to a limit S or diverges to either +∞ or −∞.
A series with all positive terms will either converge or diverge to +∞.
A series with all negative terms will either converge or diverge to −∞.
For a series with both positive and negative terms, the sum ∑ an may have no meaning.
A series is called absolutely convergent if the series ∑ |an| converges. Absolutely convergent series are also convergent.

1.7 Familiar examples
A geometric series is of the form
a + ar + ar² + ar³ + …
If |r| < 1, then
∑_{n=0}^∞ ar^n = a/(1 − r).
A p-series is of the form
∑_{n=1}^∞ 1/n^p
for some positive real number p. It converges if p > 1, diverges if p ≤ 1.

1.8 Cauchy criterion
We say that a series satisfies the Cauchy criterion if the sequence of its partial sums is a Cauchy sequence. Writing this out with quantifiers, we have
∀ε > 0, ∃N s.t. ∀m, n > N, |sn − sm| < ε.
Here is a restatement of the Cauchy criterion, which proves more useful for some proofs:
∀ε > 0, ∃N s.t. ∀n ≥ m > N, |∑_{k=m}^{n} ak| < ε.
A series converges if and only if it satisfies the Cauchy criterion.
1.9 Convergence tests
Limit of the terms. If a series converges, the limit of its terms is 0.
Comparison Test. Consider the series ∑ an of all positive terms.
If ∑ an converges and |bn| < an for all n, then ∑ bn also converges.
If ∑ an diverges to +∞ and bn ≥ an for all n, then ∑ bn also diverges to +∞.
Ratio Test. Consider the series ∑ an of nonzero terms.
This series converges if lim sup |an+1/an| < 1.
This series diverges if lim inf |an+1/an| > 1.
If lim inf |an+1/an| ≤ 1 ≤ lim sup |an+1/an|, then we have no information and need to perform another test to determine convergence.
Root Test. Consider the series ∑ an, and evaluate lim sup |an|^{1/n}.
If lim sup |an|^{1/n} < 1, the series ∑ an converges absolutely.
If lim sup |an|^{1/n} > 1, the series ∑ an diverges.
If lim sup |an|^{1/n} = 1, the test gives no information.
Integral Test. Consider a series of nonnegative terms for which the other tests seem to be failing. In the event that we can find a decreasing function f(x) such that f(n) = an ∀n, we may look at the behavior of this function's integral to tell us whether the series converges.
If lim_{n→∞} ∫₁ⁿ f(x)dx = +∞, then the series will diverge.
If lim_{n→∞} ∫₁ⁿ f(x)dx < +∞, then the series will converge.
Alternating Series Test. If the absolute value of each term in an alternating series is decreasing and has a limit of zero, then the series converges.
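A small R sketch of the root test in action, loosely in the spirit of script 2.2B (the two series and all names here are my own illustrations, not the script's):

    n <- 1:60
    a <- (2/3)^n          # terms of a convergent geometric series
    b <- (3/2)^n          # terms of a divergent one
    tail(abs(a)^(1/n))    # approaches 2/3 < 1: converges absolutely
    tail(abs(b)^(1/n))    # approaches 3/2 > 1: diverges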

1.10 Convergence tests for power series
Power series are series of the form
∑_{n=0}^∞ an x^n,
where the sequence (an) is a sequence of real numbers. A power series defines a function of x whose domain is the set of values of x for which the series converges. That, of course, depends on the coefficients (an). There are three possibilities:
Converges ∀x ∈ R.
Converges only for x = 0.
Converges ∀x in some interval centered at 0. The interval may be open (−R, R), closed [−R, R], or a mix of the two like (−R, R]. The number R is called the radius of convergence. Frequently the series converges absolutely in the interior of the interval, but the convergence at an endpoint is only conditional.

Lecture Outline
1. (Ross, p. 62, convergent & Cauchy sequences) A Cauchy sequence is defined as a sequence where ∀ε > 0, ∃N s.t. m, n > N ⟹ |sn − sm| < ε.
(a) Prove that any Cauchy sequence is bounded.
(b) Prove that any convergent sequence is Cauchy.
2. (Ross, pp. 60-62, limits of the supremum and infimum)
The limit of the supremum, written lim sup, is defined as follows:
lim sup sn = lim_{N→∞} sup{sn : n > N}.
The limit of the infimum, written lim inf, is defined as follows:
lim inf sn = lim_{N→∞} inf{sn : n > N}.
(We do not restrict sn to be a bounded sequence, so if it is not bounded above, lim sup sn = +∞, and if it is not bounded below, lim inf sn = −∞.)
Let (sn) be a sequence in R. Prove that if lim inf sn = lim sup sn = s, then lim sn is defined and
lim sn = s.
3. (Ross, p. 64, convergent & Cauchy sequences)
Using the result of the preceding proof, which relies on the completeness axiom for the real numbers, prove that any Cauchy sequence of real numbers is convergent.

4. (Convergent subsequences, Bolzano-Weierstrass)


Given a sequence (sn)_{n∈N}, a subsequence of this sequence is a sequence (tk)_{k∈N}, where for each k there is a positive integer nk such that
n1 < n2 < ⋯ < nk < nk+1 < ⋯
and tk = s_{nk}. So (tk) is just a sampling of some, or all, of the (sn) terms, with order preserved.
A term sn is called dominant if it is greater than any term that follows it.
(a) Use the concept of dominant term to prove that every sequence (sn )
has a monotonic subsequence.
(b) Prove that any bounded increasing sequence converges to its least
upper bound.
(c) Prove the Bolzano-Weierstrass Theorem: every bounded sequence has
a convergent subsequence.
5. (Ross, p. 96, Example 1, geometric series (refers also to p. 98))
Prove that
∑_{k=0}^∞ ar^k = a/(1 − r) if |r| < 1,
and that the series diverges if |r| ≥ 1.
For the sake of novelty, do the first part of the proof by using the least-number principle instead of by induction.

6. (Ross, p.99-100, The Root Test)
Consider the infinite series ∑ an and the lim sup |an|^{1/n}, referred to as α. Prove the following statements about ∑ an (you may assume the Comparison Test as proven):
The series converges absolutely if α < 1.
The series diverges if α > 1.
If α = 1, then nothing can be deduced conclusively about the behavior of the series.
7. (Ross, pp. 99-100, The Ratio Test)
Let ∑ an be an infinite series of nonzero terms. Prove the following (you may assume the Root Test as proven). You may also use without proof the following result from Ross (theorem 12.2):
lim inf |sn+1/sn| ≤ lim inf |sn|^{1/n} ≤ lim sup |sn|^{1/n} ≤ lim sup |sn+1/sn|.
If lim sup |an+1/an| < 1, then the series converges absolutely.
If lim inf |an+1/an| > 1, then the series diverges.
If lim inf |an+1/an| ≤ 1 ≤ lim sup |an+1/an|, then the test gives no information.
8. (Ross, p.188, Radius of Convergence)
Consider the power series ∑ an x^n. Let us refer to lim sup |an|^{1/n} as β and 1/β as R. (Logically, it follows that if β = 0, R = +∞, and if β = +∞, R = 0.)
Prove the following:
If |x| < R, the power series converges.
If |x| > R, the power series diverges.
(You may recognize R here as the radius of convergence.)

9. Defining a sequence recursively (model for group problems, set 1)
John's rich parents hope that a track record of annual gifts to Harvard will enhance his chance of admission. On the day of his birth they set up a trust fund with a balance s0 = 1 million dollars. On each birthday they add another million dollars to the fund, and the trustee immediately donates 1/3 of the fund to Harvard in John's name. After the donation, the balance is therefore
sn+1 = (2/3)(sn + 1).
Use R to find the annual fund balance up through s18 (see the sketch below).
Use induction to show sn < 2 for all n.
Show that (sn) is an increasing sequence.
Show that lim sn exists and find lim sn.
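A minimal R sketch for the first bullet (the vector indexing convention, with s[k] holding s_{k−1}, is my own):

    s <- numeric(19)     # s[1] holds s0, ..., s[19] holds s18
    s[1] <- 1            # s0 = 1 (million dollars)
    for (n in 1:18) s[n + 1] <- (2/3) * (s[n] + 1)
    round(s, 4)          # the balances creep up toward 2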

10. What is the fallacy in the following argument?
loge 2 = 1 − 1/2 + 1/3 − 1/4 + 1/5 − 1/6 + 1/7 − 1/8 + ⋯
(1/2) loge 2 = 1/2 − 1/4 + 1/6 − 1/8 + ⋯
Adding,
(3/2) loge 2 = 1 − 1/4 − 1/4 + 1/3 − 1/8 − 1/8 + 1/5 − ⋯ = loge 2.
Hence 3/2 = 1; 3 = 2; 1 = 0.

11. Clever proofs for p-series.
(a) Prove that ∑ 1/n = +∞ by showing that the sequence of partial sums is not a Cauchy sequence.
(b) Evaluate
∑_{n=2}^∞ 1/(n(n − 1))
by exploiting the fact that this is a telescoping series.
(c) Prove that
∑_{n=2}^∞ 1/n²
is convergent.
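A sketch of the telescoping step in (b): since 1/(n(n − 1)) = 1/(n − 1) − 1/n, the partial sum from n = 2 to N collapses to 1 − 1/N, so the series converges to 1.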

12. For the sequence
sn = ((n + 2)/(n + 1)) sin(nπ/4),
give three examples of a subsequence, find the lim sup and the lim inf, and determine whether it converges.
13. A case where the root test outperforms the ratio test (Ross, Example 8 on page 103)
∑_{n=0}^∞ 2^{(−1)^n − n} = 2 + 1/4 + 1/2 + 1/16 + 1/8 + 1/64 + ⋯
(a) Show that the ratio test fails totally.
(b) Show that the root test correctly concludes that the series is convergent.
(c) Find a simpler argument using the comparison test.
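A hedged R sketch of why the two tests disagree here (my own code, not Ross's or the course scripts'):

    n <- 1:40
    a <- 2^((-1)^n - n)
    range(abs(a[-1] / a[-length(a)]))   # ratios oscillate between 1/8 and 2: ratio test inconclusive
    tail(abs(a)^(1/n))                  # nth roots approach 1/2 < 1: root test says converges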

14. (Model for group problems, set 3) Find the radius of convergence and the exact interval of convergence for the series
∑_{n=0}^∞ (n 3^n / 2^n) x^n,
(a) by using the Root Test.
(b) by using the Ratio Test.

Group Problems
1. Subsequences, monotone sequences, lim sup and lim inf
(a) (Ross, 11.4) Here are four sequences:
an = (−2)^n, xn = 5^{(−1)^n}, yn = 1 + (−1)^n, dn = n cos(nπ/4).
For each sequence, give
i. an example of a monotone subsequence.
ii. its set of subsequential limits.
iii. its lim sup and lim inf.
iv. Which of the sequences is bounded? converges? diverges to +∞? diverges to −∞?

(b) (Ross, 12.4)
Show that lim sup(sn + tn) ≤ lim sup sn + lim sup tn for bounded sequences (sn) and (tn), and invent an example where lim sup(sn + tn) < lim sup sn + lim sup tn. There is a hint on page 82 of Ross.
(c) The following famous series, known as Gregory's series but discovered by the priest-mathematicians of southwest India long before James Gregory (1638-1675) was born, converges to π/4.
π/4 = 1 − 1/3 + 1/5 − 1/7 + 1/9 − ⋯
i. For the sequence of partial sums (sn), find an increasing subsequence and a decreasing subsequence.
ii. Prove that lim sup sn = lim inf sn.
iii. Prove that the series is not absolutely convergent by showing that it fails the Cauchy test with ε = 1/2.

2. Sequences, defined recursively
Feel free to use R to calculate the first few terms of the sequence instead of doing it by hand. Using a for loop, you can easily calculate as many terms as you like. By modifying script 2.2C, you can easily plot the first 20 or so terms. If you come up with a good R script, please upload it to the solutions page. (A minimal sketch follows part (a) below.)
(a) (Ross, 10.9) Let s1 = 1 and sn+1 = (n/(n + 1)) sn² for n ≥ 1.
Find s2, s3, s4 if working by hand. If using R, use a for loop to go at least as far as s20.
Show that lim sn exists.
Prove that lim sn = 0.
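A minimal R sketch for (a), assuming the recursion as reconstructed above (adapt it for (b) and (c)):

    s <- numeric(20)
    s[1] <- 1
    for (n in 1:19) s[n + 1] <- (n / (n + 1)) * s[n]^2
    s                       # the terms shrink rapidly toward 0
    plot(s, type = "b")     # a plot in the spirit of script 2.2C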
(b) (Ross, 10.10) Let s1 = 1 and sn+1 = (1/3)(sn + 1) for n ≥ 1.
Find s2, s3, s4 if working by hand. If using R, use a for loop to go at least as far as s20.
Use induction to show sn > 1/2 for all n.
Show that (sn) is a decreasing sequence.
Show that lim sn exists and find lim sn.
(c) (Ross, 10.12) Let t1 = 1 and tn+1 = [1 − 1/(n + 1)²] tn for n ≥ 1.
Find t2, t3, t4 if working by hand. If using R, use a for loop to go at least as far as t20.
Show that lim tn exists.
Use induction to show tn = (n + 1)/(2n) for all n.
Find lim tn.

This last set of problems should be done using LaTeX. They provide good
practice with summations, fractions, and exponents.
3. Applying convergence tests to power series (Ross, 23.1 and 23.2)
Find the radius of convergence R and the exact interval of convergence.
In each case, you can apply the root test (works well with powers) or the
ratio test (works well with factorials) to get an equation that can be solved
for x to get the radius of convergence R. Since you have an xn , the root test,
which you may not have encountered in AP calculus, is especially useful. At
the endpoints you may need to apply something like the alternating series
test or the integral test.
Remember that lim n^{1/n} = 1.
(a)
∑ (2^n/n!) x^n and ∑ x^{n!}.
(b)
∑ (3^n/(n 4^n)) x^n and ∑ n x^n.
(c)
∑ ((−1)^n/(n² 4^n)) x^n and ∑ (3^n/√n) x^n.

Homework
1. Ross, 10.2 (Prove all bounded decreasing sequences converge.)
2. Ross, 10.6
3. Ross, 11.8.
4. Suppose that (sn) is a Cauchy sequence and that the subsequence (s1, s2, s4, s8, s16, …) converges to s. Prove that lim sn = s. Hint: use the standard bag of tricks: the triangle inequality, epsilon-over-2, etc.
5. Sample problem 2 shows that in general, the order of terms in a series must be respected when calculating the sum. However, addition is commutative and associative, which makes it surprising that order should matter.
Prove that if a series (an) has only positive terms, then its sum is equal to the least upper bound of the numbers that can be obtained by summing over any finite subset of the terms.
Hint: Call this least upper bound S′. Call the sum as defined by Ross S. Prove that S′ ≤ S and that S ≤ S′.
Suppose that a series includes both positive and negative terms and its sum is S. It looks as though you can split it into a series of nonnegative terms and a series of negative terms, sum each separately, then combine the results. Will this approach work for the series in sample problem 2?
6. Ross, 14.3 (Determining whether a series converges. Apologies to those who
have already done hundreds of these in a high-school course.)
7. Ross, 14.8.
8. Ross, 15.6
9. Ross, 23.4. You might find it useful to have R generate some terms of the
series.

10. Ross, 23.5

MATHEMATICS 23a/E-23a, Fall 2015


Linear Algebra and Real Analysis I
Module #2, Week 3 (Limits and continuity of functions)
Authors: Paul Bamberg and Kate Penner (based on their course MATH S-322)
R scripts by Paul Bamberg
Last modified: July 24, 2014 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-8 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.
Reading from Ross
Chapter 3, sections 17 and 18. (continuity)
Chapter 3, sections 19 and 20 (uniform continuity and limits of functions)
Warmups (to be done before lecture)
Study example 1 on page 125, then invent a similar argument for the function f(x) = x² − 2x + 1. It is important to realize that a proof can be done for all sequences.

The function g(x) = sin(1/x) for x ≠ 0, g(0) = 0,
is discontinuous at x = 0. Show that the sequence xn = 1/n can be used as a bad sequence to prove this assertion.

Suppose that a function f(x) has the property that the image of the interval I = [0, 2] is the set J = [0, 1] ∪ [2, 3]. Invent a discontinuous function f with this property and convince yourself that no continuous function can have this property.
When you define the arc sine function in a calculus course, you begin by restricting the domain of the sine function to the interval [−π/2, π/2]. Convince yourself that this restriction makes Theorems 18.4 and 18.5 apply, while restricting the domain to [0, π] would not work. Which restricted domain works for defining the arc cosine function?
Read through examples 1-3 in section 19.1 of Ross. You can skip over the computational details. The key issue is this:
On the interval (0, ∞) the function f(x) = 1/x² is continuous for any specified x0. However, when x0 is very small, the δ that is needed to prove continuity must be proportional to x0³. There is no one-size-fits-all δ that is independent of x0. Example 3 shows that even with ε = 1, it is impossible to meet the requirement for uniform continuity. When you draw the graph of f(x), you see what the problem is: the derivative of f(x), which is essentially the ratio of ε to δ, is unbounded.
The squaring function f(x) = x² is continuous. However, its derivative is unbounded on [0, ∞), and the function is not uniformly continuous. Convince yourself that no matter how small you require |y − x| to be, you can always make |f(y) − f(x)| be as large as you like simply by making y and x be large.
Now you have seen two ways to select a function and an interval so that the
function is continuous but not uniformly continuous on the interval. Read
through the rest of section 19.1 to see how to avoid this situation. There
are four ways:
Make the interval be closed and bounded.
If the interval is not closed, make it closed by including its endpoints,
and extend the function so that it remains continuous.
The problem is related to an unbounded derivative: if f′(x) is bounded, it goes away.
If f turns a Cauchy sequence (xn) into a Cauchy sequence (f(xn)), there is no problem.
Think hard about definition 20.1. This is not the definition of limit that is found in most calculus texts, but it is in some ways better because it incorporates the ideas of limits at infinity and of increasing without limit.
Look at theorems 20.4 and 20.5, and convince yourself that they are crucial
for proving the well-known formulas for derivatives that are in every calculus
course. If you are fond of entertaining counterexamples, look at example 7
on page 158.
Proofs to present in section or to a classmate who has done them.
7.1 Suppose that a < b, f is continuous on [a, b], and f(a) < y < f(b).
Prove that there exists at least one x ∈ [a, b] such that f(x) = y.
Use Ross's "no bad sequence" definition of continuity, not the epsilon-delta definition.
7.2 Using the Bolzano-Weierstrass theorem, prove that if function f is continuous on the closed interval [a, b], then f is uniformly continuous on [a, b].

Additional proofs (may appear on quiz; students will post pdfs or videos)
7.3 Prove that if f and g are real-valued functions that are continuous at x0 ∈ R, then f + g is continuous at x0. Do the proof twice: once using the "no bad sequence" definition of continuity and once using the epsilon-delta definition of continuity.
7.4 (Ross, page 146; uniform continuity and Cauchy sequences)
Prove that if f is uniformly continuous on a set S and (sn ) is a Cauchy
sequence in S, then (f (sn )) is a Cauchy sequence. Invent an example where
f is continuous but not uniformly continuous on S and (f (sn )) is not a
Cauchy sequence.

R Scripts
Script 2.3A-Continuity.R
Topic 1 - Two definitions of continuity
Topic 2 Uniform continuity
Script 2.3B-IntermediateValue.R
Topic 1 - Proving the intermediate value theorem
Topic 2 - Corollaries of the IVT

Executive Summary
1.1 Two equivalent definitions of continuity
Continuity in terms of sequences

This definition is not standard: Ross uses it, but many authors use the
equivalent epsilon-delta definition. Here is some terminology that students
find useful when discussing the concept:
If lim xn = x0 and lim f (xn ) = f (x0 ), we call xn a good sequence.
If lim xn = x0 but lim f(xn) ≠ f(x0), we call xn a bad sequence.
Then "function f is continuous at x0" means every sequence is a good sequence; i.e. there are no bad sequences.
The more conventional definition:
Let f be a real-valued function with domain U ⊆ R. Then f is continuous at x0 ∈ U if and only if
∀ε > 0, ∃δ > 0 such that if x ∈ U and |x − x0| < δ, then |f(x) − f(x0)| < ε.
Which definition to use?
To prove that a function is continuous, it is often easier to use the second version of the definition. Start with a specified ε, and find a δ (not "the" δ) that does the job. However, as Ross Example 1a on page 125 shows, the first definition, combined with the limit theorems that we have already proved, can let us prove that an arbitrary sequence is good.
To prove that a function is discontinuous, the first definition is generally
more useful. All you have to do is to construct one bad sequence.

1.2 Useful properties of continuous functions

New continuous functions from old ones.


If f is continuous at x0 , then |f | is continuous at x0 .
If f is continuous at x0 , then kf is continuous at x0 .
If f and g are continuous at x0 , then f + g is continuous at x0 .
If f and g are continuous at x0 , then f g is continuous at x0 .
If f and g are continuous at x0 and g(x0) ≠ 0, then f/g is continuous at x0.
If f is continuous at x0 and g is continuous at f(x0), then the composite function g ∘ f is continuous at x0.
Once you know that the identity function and elementary functions like nth root, sine, cosine, exponential, and logarithm are continuous (Ross has not yet defined most of these functions!), you can state the casual rule:
If you can write a formula for a function that does not involve division by zero, that function is continuous everywhere.
Theorems about a continuous function on a closed interval [a, b] (an example of a compact set), easy to prove by using the Bolzano-Weierstrass
theorem.
f is a bounded function.
f achieves its maximum and minimum values on the interval (i.e. they
are not just approached as limiting values).
The Intermediate Value Theorem and some of its corollaries.
It is impossible to do calculus without either proving these theorems or
stating that they are obvious!
Now f is assumed continuous on an interval I that is not necessarily closed (e.g. 1/x on (0, 1]).
IVT: If a < b and y lies between f (a) and f (b), there exists at least
one x in (a, b) for which f (x) = y.
The image of an interval I is either a single point or an interval J.
If f is a strictly increasing function on I, there is a continuous strictly increasing inverse function f⁻¹ : J → I.
If f is a strictly decreasing function on I, there is a continuous strictly decreasing inverse function f⁻¹ : J → I.
If f is one-to-one on I, it is either strictly increasing or strictly decreasing.

1.3 Continuity versus uniform continuity

It's all a matter of the order of quantifiers. For continuity, y is agreed upon before the epsilon-delta game is played. For uniform continuity, a challenge is made using some ε > 0, then a δ has to be chosen that meets the challenge independent of y.
For function f whose domain is a set S:
Continuity: ∀y ∈ S, ∀ε > 0,
∃δ > 0 such that ∀x ∈ S, |x − y| < δ implies |f(x) − f(y)| < ε.
Uniform continuity: ∀ε > 0,
∃δ > 0 such that ∀x, y ∈ S, |x − y| < δ implies |f(x) − f(y)| < ε.
On [0, ∞) (not a bounded set), the squaring function is continuous but not uniformly continuous.
On (0, 1) (not closed) the function f(x) = 1/x is continuous but not uniformly continuous.
On a closed, bounded interval [a, b], continuity implies uniform continuity.


The proof uses the Bolzano-Weierstrass theorem.
By definition, if a function is continuous at s ∈ S and (sn) converges to s, then (f(sn)) converges to f(s). If (sn) is merely Cauchy, we know that it converges, but not what it converges to. To guarantee that (f(sn)) is also Cauchy, we must require f to be uniformly continuous.
On an open interval (a, b) a function can be continuous without being uniformly continuous. However, if we can extend f to a function f̃, defined so that f̃ is continuous at a and b, then f̃ is uniformly continuous on [a, b] and f is uniformly continuous on (a, b). The most familiar example is f(x) = (sin x)/x on (0, π), extended by defining f̃(0) = 1.
Alternative criterion for uniform continuity (sufficient but not necessary):
f is differentiable on (a, b), with f′ bounded on (a, b).

1.4 Limits of functions

1. Definitions of limit
Ross's definition of limit, consistent with the definition of continuity:
S is a subset of R, f is a function defined on S, and a and L are real numbers, or ±∞. Then lim_{x→a^S} f(x) = L means:
for every sequence (xn) in S with limit a, we have lim(f(xn)) = L.
The conventional epsilon-delta definition:
f is a function defined on S ⊆ R, a is a real number in the closure of S (not ±∞) and L is a real number (not ±∞). lim_{x→a} f(x) = L means
∀ε > 0, ∃δ > 0 such that if x ∈ S and |x − a| < δ, then |f(x) − L| < ε.
2. Useful theorems about limits, useful for proving differentiation rules.
Note: a can be ±∞ but L has to be finite.
Suppose that L1 = lim_{x→a^S} f1(x) and L2 = lim_{x→a^S} f2(x) exist and are finite. Then
lim_{x→a^S} (f1 + f2)(x) = L1 + L2.
lim_{x→a^S} (f1 f2)(x) = L1 L2.
lim_{x→a^S} (f1/f2)(x) = L1/L2, provided L2 ≠ 0 and f2(x) ≠ 0 for x ∈ S.

3. Limit of the composition of functions
Suppose that L = lim_{x→a^S} f(x) exists and is finite.
Then lim_{x→a^S} (g ∘ f)(x) = g(L) provided
g is defined on the set {f(x) : x ∈ S}.
g is defined at L (which may just be a limit point of the set {f(x) : x ∈ S}).
g is continuous at L.
4. One-sided limits
We can modify either definition to provide a definition for L = lim_{x→a+} f(x).
With Ross's definition, choose the set S to include only values that are greater than a.
With the conventional definition, consider only x > a: i.e.
a < x < a + δ implies |f(x) − L| < ε.
It is easy to prove that
lim_{x→a} f(x) = L if and only if lim_{x→a+} f(x) = lim_{x→a−} f(x) = L.
Lecture outline
1. (Ross, page 124)
For specified x0 and function f, define the following terminology:
If lim xn = x0 and lim f(xn) = f(x0), we call xn a good sequence.
If lim xn = x0 but lim f(xn) ≠ f(x0), we call xn a bad sequence.
Then Ross's definition of continuity is "every sequence is a good sequence."
Prove the following, which is the more conventional definition:
Let f be a real-valued function with domain U ⊆ R. Then f is continuous at x0 ∈ U if and only if
∀ε > 0, ∃δ > 0 such that if x ∈ U and |x − x0| < δ, then |f(x) − f(x0)| < ε.
2. (Ross, page 128)
Prove that if f and g are real-valued functions that are continuous at x0 ∈ R, then f + g is continuous at x0.
3. (Ross, page 133)
Let f be a continuous real-valued function on a closed interval [a, b]. Using the Bolzano-Weierstrass theorem, prove that f is bounded and that f achieves its maximum value: i.e., ∃y0 ∈ [a, b] such that f(x) ≤ f(y0) for all x ∈ [a, b].
4. (Ross, page 134: the intermediate value theorem)
Suppose that a < b, f is continuous on [a, b], and f(a) < y < f(b). Prove that there exists at least one x ∈ [a, b] such that f(x) = y.
Use Ross's "no bad sequence" definition of continuity, not the epsilon-delta definition.
5. (Ross, page 143)
Using the Bolzano-Weierstrass theorem, prove that if function f is continuous on the closed interval [a, b], then f is uniformly continuous on [a, b].
6. (Ross, page 146)
Prove that if f is uniformly continuous on a set S and (sn ) is a Cauchy
sequence in S, then (f (sn )) is a Cauchy sequence. Invent an example where
f is continuous but not uniformly continuous on S and (f (sn )) is not a
Cauchy sequence.

7. (Ross, page 156)
Use Ross's non-standard but excellent definition of limit.
S is a subset of R, f is a function defined on S, and a and L are real numbers, or ±∞.
Then lim_{x→a^S} f(x) = L means:
for every sequence (xn) in S with limit a, we have lim(f(xn)) = L.
Suppose that L1 = lim_{x→a^S} f1(x) and L2 = lim_{x→a^S} f2(x) exist and are finite.
Prove that lim_{x→a^S} (f1 + f2)(x) = L1 + L2 and
lim_{x→a^S} (f1 f2)(x) = L1 L2.
8. (Ross, page 159; conventional definition of limit)
Let f be a function defined on S ⊆ R, let a be in the closure of S, and let L be a real number.
Prove that lim_{x→a} f(x) = L if and only if
∀ε > 0, ∃δ > 0 such that
if x ∈ S and |x − a| < δ, then |f(x) − L| < ε.

9. Using the bad sequence criterion to show that a function is discontinuous.
The signum function sgn(x) is defined as x/|x| for x ≠ 0, 0 for x = 0.
Invent a bad sequence, none of whose elements is zero, to prove that sgn(x) is discontinuous at 0; then show that for any positive x, no such bad sequence can be constructed.
Restate this proof that sgn(x) is discontinuous at x = 0, continuous for positive x, in terms of the epsilon-delta definition.

10. Prove that the function
C(x) = 1 − x²/2 + x⁴/24
is equal to zero for one and only one value x ∈ [1, 2].
This result will be useful when we define π without trigonometry.
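A numerical preview in R (a sketch; uniroot merely locates the sign change whose existence and uniqueness the problem asks you to prove):

    C <- function(x) 1 - x^2 / 2 + x^4 / 24
    uniroot(C, c(1, 2))$root   # about 1.59; compare pi/2 = 1.5708...

The root is the first zero of this Taylor polynomial for cosine, which is why it will matter when π is defined without trigonometry.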

11. Uniform continuity (or lack thereof)
Let f(x) = x² + 1/x².
Determine whether f is or is not uniformly continuous on each of the following intervals:
(a) [1, 2]
(b) (0, 1]
(c) [2, ∞)
(d) (1, 2)

12. Uniform continuity
Show that on the open interval (0, π) the function
f(x) = (1 − cos x)/x²
is uniformly continuous by using the extension approach.
13. Limits by brute force
(a) Use the epsilon-delta definition of limit to prove that lim_{x→0} √|x| = 0.
(b) Use the sequence definition of limit to show that lim_{x→0} x/|x| does not exist.
14. Limits that involve roots
Use the sum and product rules for limits to evaluate
lim_{x→1} (x^{1/3} − 1)/(x − 1).
Group Problems
1. Proofs about continuity
For (a) and (b), do two different versions of the proof:
Use the no bad sequence definition and invoke a result for sequences
from week 1.
Use the epsilon-delta definition and mimic the proof for sequences from
week 1.
(a) Prove that if f and g are real-valued functions that are continuous at x0 ∈ R, then fg is continuous at x0. (Hint: on any closed interval [x0 − a, x0 + b] in the domain of f, the continuous function f is bounded.)
(b) Prove that if f is continuous at x0 ∈ R, and g is continuous at f(x0), then the composite function g ∘ f is continuous at x0.
(c) The Heaviside function H(x) is defined by H(x) = 0 for x < 0, H(x) = 1 for x ≥ 0. Using the "no bad sequence" definition, prove that H is discontinuous at x = 0.
Using the epsilon-delta definition of continuity, prove that f(x) = x³ is continuous for arbitrary x0. (Hint: first deal with the special case x0 = 0, then notice that for small enough δ, |x| < 2|x0|.)

2. Uniform continuity; intermediate-value theorem
(a) Uniform continuity, or lack thereof
Show that f(x) = x² is not uniformly continuous on the interval [0, ∞).
Show that f(x) = 1/(1 − x) is not uniformly continuous on [0, 1).
Show that f(x) = sin x is uniformly continuous on the open interval (0, π).
(b) Using the intermediate-value theorem
As a congressional intern, you are asked to propose a tax structure for
families with incomes in the range 2 to 4 million dollars inclusive. Your
boss, who feels that proposing a tax rate of exactly 50% for anyone
would be political suicide, wants a function T (x) with the following
properties:

It is continuous.
Its domain is [2,4].
Its codomain is [1,2].
There is no x for which 2T (x) = x.

Prove that this set of requirements cannot be met by applying the intermediate-value theorem to the function x − 2T(x), which is negative if the tax rate exceeds 50%.
Then prove "from scratch" that this set of requirements cannot be met, essentially repeating the proof of the IVT. Hint: Consider the least upper bound of the set of incomes S ⊆ [2, 4] for which the tax rate is less than 50%, and construct a pair of good sequences.
(c) Continuous functions on an interval that is not closed
Let S = [0, 1). Invent a sequence xn ∈ S that converges to a number x0 ∉ S. Hint: try x1 = 1/2, x2 = 3/4. Then, using this sequence, invent an unbounded continuous function on S and invent a bounded continuous function on S that has no maximum.

3. Calculation of limits (do these in LaTeX to get practice with fractions and
functions)
(a) Limits by brute force
i. Use the epsilon-delta definition of limit to prove that lim_{x→0} x sin(1/x) = 0.
ii. Use the sequence definition of limit to show that lim_{x→0} sin(1/x) does not exist.
(b) Limits that involve square roots; use the sum and product rules for limits
Evaluate
lim_{h→0} ((x + h)^{1/2} − x^{1/2})/h.
Evaluate
lim_{x→∞} (√(x + 1) − √x).
(c) Limits that involve trig functions; use the sum and product rules for limits and the fact that lim_{x→0} (sin x)/x = 1.
Evaluate
lim_{x→0} (cos 2x − 1)/x².
Evaluate
lim_{x→0} (tan x − sin x)/x³.

Homework

Special offer: if you do the entire problem set, with one problem omitted, in LaTeX and hand in a printout of the PDF file, you will receive full credit for the omitted problem.
1. Ross, exercises 19.2(b) and 19.2(c). Be sure that you prove uniform continuity, not just continuity!
2. Ross, exercise 19.4.
3. Ross, exercises 20.16 and 20.17. This squeeze lemma is a cornerstone of
elementary calculus, and it is nice to be able to prove it!
4. Ross, exercise 20.18. Be sure to indicate where you are using various limit
theorems.
5. Ross, exercise 17.4. It is crucial that the value of δ is allowed to depend on x.
6. Ross, exercises 17-13a and 17-14. These functions will be of interest when
we come to the topic of integration in the spring term.
7. Ross, exercise 18-4. To show that something exists, describe a way to
construct it.
8. Ross, exercise 18-10. You may use the intermediate-value theorem to prove
the result.

MATHEMATICS 23a/E-23a, Fall 2015


Linear Algebra and Real Analysis I
Module #2, Week 4 (Derivatives, Inverse functions, Taylor series)
Authors: Paul Bamberg and Kate Penner (based on their course MATH S-322)
R scripts by Paul Bamberg
Last modified: July 24, 2015 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-8 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.
Reading from Ross
Chapter 5, sections 28 and 29 (pp.223-240)
Chapter 5, sections 30 and 31, but only up through section 31.7.
Chapter 7, section 37 (logarithms and exponentials)
Warmups (to be done before lecture)
Review the derivative rules, and the limit definition of the derivative.
Be able to compute derivatives of polynomials such as x² − 2x from the limit definition of the derivative.
Read the last paragraph of section 29.8, which begins "We next show how to ...". Apply the argument to the case f(x) = sin x, I = (−π/2, π/2) to do the standard derivation of the derivative of the arc sine function. Then be sure that you understand what else needs to be proved.
Read the statement of L'Hospital's rule at the start of section 30.2. Then look at examples 2 through 5 and identify the values of s and L.
Look through examples 6 through 9 of section 30.2. Don't worry about the details: just notice that there are tricks that can be used to convert a limit into a form to which L'Hospital's rule applies. Which example uses the common denominator trick? Which uses the exponential trick?
Read Example 3 on page 257, which describes a function that does not
equal the sum of its Taylor series! Once you are aware of the existence of
such functions, you will appreciate why it is necessary to prove Taylor's theorem with remainder. Only by showing that the remainder approaches
a limit of zero can you prove that the Taylor series converges to the function.

Look at example 1 of section 31.4, where the familiar Taylor series for the exponential function and the sine function are derived. By looking at the corollary at the start of the section and the theorem that precedes it, figure out the importance of the statement "the derivatives are bounded."
Skim the proof of the binomial theorem in Section 31.7. Notice that it is not sufficient just to crank out derivatives and get the Taylor series. We will need to prove that, for any |x| < 1, the series for (1 + x)^α converges to the function, and this requires a different form of the remainder. Look at Corollary 31.6 and Corollary 31.4 and figure out which relies on the mean-value theorem and which relies on integration by parts.
Proofs to present in section or to a classmate who has done them.
8.1 Suppose that f is a one-to-one continuous function on open interval I (either strictly increasing or strictly decreasing). Let open interval J = f(I), and define the inverse function f⁻¹ : J → I for which
(f⁻¹ ∘ f)(x) = x for x ∈ I; (f ∘ f⁻¹)(y) = y for y ∈ J.
Use the chain rule to prove that if f⁻¹ is differentiable at y0 = f(x0), then
(f⁻¹)′(y0) = 1/f′(x0).
Let g = f⁻¹; it has already been shown that g is continuous at y0.
Prove that, if f is differentiable at x0, then
lim_{y→y0} (g(y) − g(y0))/(y − y0) = 1/f′(x0).

8.2 Taylor's Theorem with remainder: Let f be defined on (a, b) with a < 0 < b. Suppose that the nth derivative f⁽ⁿ⁾ exists on (a, b).
Define the remainder
Rn(x) = f(x) − ∑_{k=0}^{n−1} (f⁽ᵏ⁾(0)/k!) x^k.
Prove, by repeated use of Rolle's theorem, that for each x ≠ 0 in (a, b), there is some y between 0 and x for which
Rn(x) = (f⁽ⁿ⁾(y)/n!) x^n.

Additional proofs (may appear on quiz; students will post pdfs or videos)
8.3 (Ross, pp.233-234, Rolle's Theorem and the Mean Value Theorem)
Prove Rolle's Theorem: if f is a continuous function on [a, b] that is differentiable on (a, b) and satisfies f(a) = f(b), then there exists at least one x in (a, b) such that f′(x) = 0.
Using Rolle's Theorem, prove the Mean Value Theorem: if f is a continuous function on [a, b] that is differentiable on (a, b), then there exists at least one x in (a, b) such that
f′(x) = (f(b) − f(a))/(b − a).
8.4 (Ross, p. 228, The Chain Rule, easy special case) Assume the following:
Function f is differentiable at a.
Function g is differentiable at f(a).
There is an open interval J containing a on which f is defined and f(x) ≠ f(a) for x ≠ a (without this restriction, you need the messy Case 2 on page 229).
Function g is defined on the open interval I = f(J), which contains f(a).
Using the sequential definition of a limit, prove that the composite function g ∘ f is defined on J and differentiable at a and that
(g ∘ f)′(a) = g′(f(a)) · f′(a).

R Scripts
Script 2.4A-Taylor Series.R
Topic 1 - Convergence of the Taylor series for the cosine function
Topic 2 - A function that is not the sum of its Taylor series
Topic 3 - Illustrating Rosss proof of Taylor series with remainder.
Script 2.4B-LHospital.R
Topic 1 - Illustration of proof 6 from Week 8
Script 2.4C-SampleProblems.R

Executive Summary
1.1 The Derivative - Definition and Properties

A function f is differentiable at some point a if the limit
lim_{x→a} (f(x) − f(a))/(x − a)
exists and is finite. It is referred to as f′(a). If a function is differentiable at a point a, then it is continuous at a as well.

Derivatives, being defined in terms of limits, share many properties with limits. Given two functions f and g, both differentiable at some point a, the following properties hold:
scalar multiples: (cf)′(a) = c · f′(a)
sums of functions: (f + g)′(a) = f′(a) + g′(a)
Product Rule: (fg)′(a) = f(a)g′(a) + f′(a)g(a)
Quotient Rule: (f/g)′(a) = [g(a)f′(a) − f(a)g′(a)]/g²(a) if g(a) ≠ 0
The most memorable derivative rule is the Chain Rule, which states that if f is differentiable at some point a, and g is differentiable at f(a), then their composite function g ∘ f is also differentiable at a, and
(g ∘ f)′(a) = g′(f(a)) · f′(a).

1.2 Increasing and decreasing functions
The terminology is the same as what we used for sequences. It applies to functions whether or not they are differentiable or even continuous.
A function f is strictly increasing on an interval I if x1, x2 ∈ I and x1 < x2 ⟹ f(x1) < f(x2).
A function f is strictly decreasing on an interval I if x1, x2 ∈ I and x1 < x2 ⟹ f(x1) > f(x2).
A function f is increasing on an interval I if x1, x2 ∈ I and x1 < x2 ⟹ f(x1) ≤ f(x2).
A function f is decreasing on an interval I if x1, x2 ∈ I and x1 < x2 ⟹ f(x1) ≥ f(x2).

1.3 Behavior of differentiable functions
These justify our procedures when we are searching for the critical points of a given function. They are the main properties we draw on when reasoning about a function's behavior.
If f is defined on an open interval, achieves its maximum or minimum at some x0, and is differentiable there, then f′(x0) = 0.
Rolle's Theorem. If f is continuous on some interval [a, b] and differentiable on (a, b) with f(a) = f(b), then there exists at least one x ∈ (a, b) such that f′(x) = 0.
Mean Value Theorem. If f is continuous on some interval [a, b] and differentiable on (a, b), then there exists at least one x ∈ (a, b) such that
f′(x) = (f(b) − f(a))/(b − a).
If f is differentiable on (a, b) and f′(x) = 0 ∀x ∈ (a, b), then f is a constant function on (a, b).
If f and g are differentiable functions on (a, b) such that f′ = g′ on (a, b), then there exists a constant c such that
∀x ∈ (a, b), f(x) = g(x) + c.

1.4 Inverse functions and their derivatives
Review of a corollary of the intermediate value theorem: If function f is continuous and one-to-one on an interval I (which means it must be either strictly increasing or strictly decreasing), then there is a continuous inverse function f⁻¹, whose domain is the interval J = f(I), such that f ∘ f⁻¹ and f⁻¹ ∘ f are both the identity function.
Not quite a proof: Since (f ∘ f⁻¹)(y) = y, the chain rule states that
f′(f⁻¹(y)) · (f⁻¹)′(y) = 1 and, if f′(f⁻¹(y)) ≠ 0,
(f⁻¹)′(y) = 1/f′(f⁻¹(y)).

Example: if f(x) = tan x with I = (−π/2, π/2), then f⁻¹(y) = arctan y and
(arctan)′(y) = 1/(tan)′(arctan y) = 1/sec²(arctan y) = 1/(1 + tan²(arctan y)) = 1/(1 + y²).
The problem: we need to prove that f⁻¹ is differentiable.

1.5 Defining the logarithm and exponential functions
Define the natural logarithm as an antiderivative:
L(y) = ∫₁^y (1/t) dt, and define e so that ∫₁^e (1/t) dt = 1.
From this definition it is easy to prove that L′(y) = 1/y and not hard to prove that L(xy) = L(x) + L(y).
Now the exponential function can be defined as the inverse function, so that E(L(y)) = y. From this definition it follows that E(x + y) = E(x)E(y) and that E′(x) = E(x).
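A quick numerical check in R of both defining facts (a sketch; the integrand and bounds come straight from the definitions above):

    integrate(function(t) 1 / t, lower = 1, upper = exp(1))   # should return 1: this is how e is defined
    log(2) + log(3); log(2 * 3)                               # L(x) + L(y) = L(xy)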

1.6 L'Hospital's rule
Suppose that f and g are differentiable functions and that
lim_{x→a+} f′(x)/g′(x) = L; lim_{x→a+} f(x) = lim_{x→a+} g(x) = 0; g′(a) < 0.
Then
lim_{x→a+} f(x)/g(x) = L.
Replace x → a+ by x → a− or x → a or x → ∞ and the result is still valid. It is also possible to have lim_{x→a+} f(x) = lim_{x→a+} g(x) = ∞. The restriction to g′(a) < 0 is just to make the proof easier; the result is also true if g′(a) > 0.
Once you understand the proof in one special case, the proof in all the other cases is essentially the same.
Here is the basic strategy: given that
lim_{x→a} f′(x)/g′(x) = L,
use the mean value theorem to construct an interval (a, β) on which
|f(x)/g(x) − L| < ε.
1.7 Taylor series
If a function f is defined by a convergent power series, i.e.
f(x) = ∑_{k=0}^∞ ak x^k for |x| < R,
then it is easy to show that
f(x) = ∑_{k=0}^∞ (f⁽ᵏ⁾(0)/k!) x^k for |x| < R.

The challenge is to extend this formula to functions that are differentiable many times but that are not defined by power series, like trig functions defined geometrically, or the function √(1 + x).
Taylor's theorem with remainder, version 1
By the mean value theorem, f(x) − f(0) = f′(y)x for some y ∈ (0, x). The generalization is that
f(x) − f(0) − f′(0)x − (f″(0)/2!)x² − ⋯ − (f⁽ⁿ⁻¹⁾(0)/(n − 1)!)x^{n−1} = (f⁽ⁿ⁾(y)/n!)x^n
for some y between 0 and x. It is proved by induction, using Rolle's theorem n times.
If the right-hand side approaches zero in the limit of large n, then the Taylor series converges to the function. This is true if all the derivatives f⁽ⁿ⁾ are bounded by a single constant C. This criterion is sufficient to establish familiar Taylor expansions like
e^x = 1 + x + x²/2! + x³/3! + ⋯
cos x = 1 − x²/2! + x⁴/4! − ⋯
Taylor's theorem with remainder, version 2
The fundamental theorem of calculus says that f(x) − f(0) = ∫_0^x f'(t) dt. The generalization is that

f(x) − f(0) − f'(0)x − (f''(0)/2!)x² − ⋯ − (f^(n−1)(0)/(n−1)!)x^(n−1) = ∫_0^x ((x − t)^(n−1)/(n−1)!) f^(n)(t) dt.

It is proved by induction, using integration by parts, but not by us!


A famous counterexample.
The function f(x) = e^(−1/x) for x > 0 and f(x) = 0 for x ≤ 0 has the property that the remainder does not approach a limit of zero. It does not equal the sum of its Taylor series.
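You can watch the counterexample numerically; a minimal R sketch (the exponent 10 below is an arbitrary choice):

# Every derivative of f at 0 is 0, so its Taylor series at 0 is identically 0,
# yet f(x) > 0 for every x > 0.
f <- function(x) ifelse(x > 0, exp(-1/x), 0)
x <- c(0.5, 0.1, 0.01)
f(x)           # positive, though the Taylor series predicts 0
f(x) / x^10    # still tends to 0: f is flatter at 0 than any power of x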

Lecture Outline
1. (Ross, p. 226, Sum and Product Rule for Derivatives)
Consider two functions f and g. Prove that if both functions are differentiable at some point a, then both f + g and f g are differentiable at a as well, and:
(f + g)'(a) = f'(a) + g'(a)
(f g)'(a) = f(a)g'(a) + f'(a)g(a)
2. (Ross, p. 228, The Chain Rule, easy special case) Assume the following:
Function f is differentiable at a.
Function g is differentiable at f(a).
There is an open interval J containing a on which f is defined and f(x) ≠ f(a) for x ≠ a (without this restriction, you need the messy Case 2 on page 229).
Function g is defined on the open interval I = f(J), which contains f(a).
Using the sequential definition of a limit, prove that the composite function g ∘ f is defined on J and differentiable at a and that
(g ∘ f)'(a) = g'(f(a)) f'(a).
3. The derivative at a maximum or minimum (Ross, page 232)
Prove that if f is defined on an open interval containing x0, if f has its maximum or minimum at x0, and if f is differentiable at x0, then f'(x0) = 0.
4. (Ross, pp. 233-234, Rolle's Theorem and the Mean Value Theorem)
Prove Rolle's Theorem: if f is a continuous function on [a, b] that is differentiable on (a, b) and satisfies f(a) = f(b), then there exists at least one x in (a, b) such that f'(x) = 0.
Using Rolle's Theorem, prove the Mean Value Theorem: if f is a continuous function on [a, b] that is differentiable on (a, b), then there exists at least one x in (a, b) such that

f'(x) = (f(b) − f(a)) / (b − a).
5. (Ross, theorem 29.9 on pages 237-238, with the algebra done in reverse order)
Suppose that f is a one-to-one continuous function on an open interval I (either strictly increasing or strictly decreasing). Let open interval J = f(I), and define the inverse function f⁻¹ : J → I for which
(f⁻¹ ∘ f)(x) = x for x ∈ I;  (f ∘ f⁻¹)(y) = y for y ∈ J.
Use the chain rule to prove that if f⁻¹ is differentiable at y0 = f(x0), then

(f⁻¹)'(y0) = 1 / f'(x0).

Let g = f⁻¹; it has already been shown that g is continuous at y0. Prove that

lim_{y→y0} (g(y) − g(y0)) / (y − y0) = 1 / f'(x0).
6. (L'Hospital's Rule; based on Ross, 30.2, but simplified to one special case)
Suppose that f and g are differentiable functions and that

lim_{z→a⁺} f'(z)/g'(z) = L;  f(a) = 0, g(a) = 0;  g'(a) > 0.

Choose x > a so that for a < z ≤ x, g(z) > 0 and g'(z) > 0. (You do not have to prove that this can always be done!)
By applying Rolle's Theorem to h(z) = f(z)g(x) − g(z)f(x), prove that

lim_{x→a⁺} f(x)/g(x) = L.

7. (Ross, page 250; version 1 of Taylor's Theorem with remainder, setting c = 0)
Let f be defined on (a, b) with a < 0 < b. Suppose that the nth derivative f^(n) exists on (a, b).
Define the remainder

R_n(x) = f(x) − Σ_{k=0}^{n−1} (f^(k)(0)/k!) x^k.

Prove, by repeated use of Rolle's theorem, that for each x ≠ 0 in (a, b), there is some y between 0 and x for which

R_n(x) = (f^(n)(y)/n!) x^n.
8. (Ross, pp. 342-343; defining the natural logarithm)
Define

L(y) = ∫_1^y (1/t) dt.

Prove from this definition the following properties of the natural logarithm:
L'(y) = 1/y for y ∈ (0, ∞).
L(yz) = L(y) + L(z) for y, z ∈ (0, ∞).
lim_{y→∞} L(y) = +∞.

9. Calculating derivatives
Let f(x) = ∛x.
(a) Calculate f'(x) using the definition of the derivative.
(b) Calculate f'(x) by applying the chain rule to (f(x))³ = x.

10. Using the Mean Value Theorem
(a) Suppose f is differentiable on R and f(0) = 0, f(1) = 1, and f(2) = 1.
Show that f'(x) = 1/2 for some x ∈ (0, 2).
Then, by applying the Intermediate Value Theorem and Rolle's Theorem to g(x) = f(x) − (1/4)x, show that f'(x) = 1/4 for some x ∈ (0, 2).
(b) Prove that if f is a differentiable function on an interval (a, b) and f'(x) > 0 ∀x ∈ (a, b), then f is strictly increasing.

11. Using L'Hospital's rule: tricks of the trade
(a) Conversion to a quotient: evaluate

lim_{x→0⁺} x log_e x².

(b) Evaluate

lim_{x→0} (x e^x − sin x) / x²

both by using L'Hospital's rule and by expansion in a Taylor series.

12. Applying the inverse-function rule
The function g(y) = arctan y², y ≥ 0, is continuous and strictly increasing, hence invertible.
Calculate its derivative by finding a formula for the inverse function f(x), which is easy to differentiate, then using the rule for the derivative of an inverse function. You can confirm your answer by using the known derivative of the arctan function.

13. Definition and properties of the exponential function
Denote the function inverse to L by E, i.e.
E(L(y)) = y for y ∈ (0, ∞)
L(E(x)) = x for x ∈ R
Prove from this definition the following properties of the exponential function E:
E'(x) = E(x) for x ∈ R.
E(u + v) = E(u)E(v) for u, v ∈ R.

14. Hyperbolic functions, defined by their Taylor series

sinh x = x + x³/3! + x⁵/5! + ⋯ ;  cosh x = 1 + x²/2! + x⁴/4! + ⋯

Calculate sinh' x and cosh' x, and prove that cosh² x − sinh² x = 1.
Use Taylor's theorem to prove that
sinh(a + x) = sinh a cosh x + cosh a sinh x.

Group Problems
1. Proving differentiation rules
(a) Trig functions
Prove that (sin x)' = cos x from scratch, using the fact that

lim_{x→0} (sin x)/x = 1.

Let f(x) = csc x, so that sin x f(x) = 1. Use the product rule to prove that
(csc x)' = −csc x cot x.
(b) Integer exponents
Positive: use induction and the product rule to prove that for all positive integers n,
(x^n)' = n x^(n−1).
Hint: start with a base case of n = 1.
Negative: let f(x) = x^(−n), so that x^n f(x) = 1. Use the product rule to prove that for all positive integers n,
(x^(−n))' = −n x^(−n−1).
(c) Non-integer exponents
Rational exponent: Let f(x) = x^(m/n), so that (f(x))^n = x^m. Prove that

f'(x) = (m/n) x^(m/n − 1).

Irrational exponent: Let p be any real number and define f(x) = x^p = E(pL(x)). Prove that f'(x) = p x^(p−1).

2. MVT, L'Hospital, inverse functions
(a) When a local minimum is also a global minimum
Suppose that f is twice differentiable on (a, b), with f'' > 0, and that there exists x ∈ (a, b) for which f'(x) = 0, so that x is a local minimum of f. Consider y ∈ (x, b). By using the mean value theorem twice, prove that f(y) > f(x). This, along with a similar result for y ∈ (a, x), establishes that x is also the global minimum of f on (a, b).
(b) Using L'Hospital's rule
i. Evaluate the limit

lim_{x→0} (1 − cos x) / (e^x − x − 1)

by using L'Hospital's rule, then confirm your answer by expanding both numerator and denominator in a Taylor series.
ii. Evaluate the limit

lim_{x→0} (csc x − cot x).

It takes a little bit of algebraic work to rewrite this in a form to which L'Hospital's rule can be applied.
(c) Applying the inverse-function rule
The function g(y) = arcsin √y, 0 < y < 1, is important in the theory of random walks.
Calculate its derivative by finding a formula for the inverse function f(x), which is easy to differentiate, then using the rule for the derivative of an inverse function. You can confirm your answer by using the known derivative of the arcsin function.

3. Taylor series
(a) Using the Taylor series for the trig functions
Define functions S(x) and C(x) by the power series

S(x) = x − x³/3! + x⁵/5! − ⋯ ;  C(x) = 1 − x²/2! + x⁴/4! − ⋯

Calculate S'(x) and C'(x), and prove that S²(x) + C²(x) = 1.
Use Taylor's theorem to prove that
C(a + x) = C(a)C(x) − S(a)S(x).
(b) Using the remainder to prove convergence
Define f(x) = log_e(1 + x) for x ∈ (−1, ∞).
Using the remainder formula

R_n(x) = (f^(n)(y)/n!) x^n,

prove that

log_e 2 = 1 − 1/2 + 1/3 − 1/4 + 1/5 − ⋯.

Show that the remainder does not go to zero if you set x = −1.
(c) Derive the Taylor series for the function f(x) = cos x. Prove that the series converges for all x. Then use an appropriate form of remainder to prove that it converges to the cosine function.

Homework

Again, if you do the entire assignment in TeX, you may omit one problem and receive full credit for it.
1. Ross, 28.2
2. Ross, 28.8
3. Ross, 29.12
4. Ross, 29.18
5. Ross, exercises 30-1(d) and 30-2(d). Do these two ways: once by using L'Hospital's rule, once by replacing each function by the first two or three terms of its Taylor series.
6. Ross, 30-4. Use the result to convert exercise 30-5(a) into a problem that involves a limit as y → ∞.
7. One way to define the exponential function is as the sum of its Taylor series:

e^x = 1 + x + x²/2! + x³/3! + ⋯.

Using this definition and Taylor's theorem, prove that e^(a+x) = e^a e^x.
8. Ross, exercise 31.5. For part (a), just combine the result of example 3 (whose messy proof you need not study) with the chain rule.
9. Ross, exercise 37.9.

MATHEMATICS 23a/E-23a, Fall 2015


Linear Algebra and Real Analysis I
Module #3, Week 1
Author: Paul Bamberg
R scripts by Paul Bamberg
Last modified: July 27, 2015 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-7 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.
Reading
Hubbard, Section 1.5. The only topology that is treated is the open-ball
topology.
Alas, Hubbard does not mention either finite topology or differential equations. I have included a set of notes on these topics that I wrote for Math 121.
Warmups (intended to be done before lecture)
Go to the page of the Math 23 Web site called "Finite topology example." Roam around the six pages by clicking links, and convince yourself that the site is represented by the graph and the matrix T on page 3.
Look at the three axioms for topology on page 4, and decide whether or not open intervals on the line and open disks in R² appear to satisfy them. In each case, invent an infinite intersection of open sets that consists of a single point, which is a closed set.
Review matrix diagonalization and its generalizations. In order to solve differential equations, you will need to be able to express a 2×2 matrix A in one of three ways (a short R sketch follows this list):
A = P DP⁻¹ where D is diagonal (for real distinct eigenvalues)
A = bI + N where N is nilpotent (if p(t) = (t − b)²)
A = P CP⁻¹ where C is conformal (for complex conjugate eigenvalues)
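Here is the promised sketch, a minimal R check of the first decomposition (my own illustration; the sample matrix is the one used in section 1.9 below):

# Distinct real eigenvalues: A = P D P^{-1}, recovered with eigen().
A <- matrix(c(1, -2, 1, 4), 2, 2)   # column-major, so A = (1 1; -2 4)
e <- eigen(A)
P <- e$vectors; D <- diag(e$values)
P %*% D %*% solve(P)                # reproduces A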

Proofs to present in section or to a classmate who has done them.
9.1 Define Hausdorff space, and prove that in a Hausdorff space the limit of a sequence is unique.
Prove that Rⁿ, with the topology defined by open balls, is a Hausdorff space.
9.2 Starting from the triangle inequality for two vectors, prove the triangle inequality for n vectors, then prove the infinite triangle inequality for Rⁿ:

|Σ_{i=1}^∞ ~a_i| ≤ Σ_{i=1}^∞ |~a_i|

under the assumption that the infinite series on the right is convergent, which in turn implies that the infinite series of vectors on the left is convergent.

R Scripts
Script 3.1A-FiniteTopology.R
Topic 1 - The standard Web site graph, used in notes and examples
Topic 2 - Drawing a random graph to create a different topology on the
same set
Script 3.1B-SequencesSeriesRn.R
Topic 1 - A convergent sequence of points in R2
Topic 2 - A convergent infinite series of vectors
Topic 3 - A convergent geometric series of matrices
Script 3.1C-DiffEquations.R
Topic 1 - Two real eigenvalues
Topic 2 - A repeated real eigenvalue
Topic 3 - Complex conjugate eigenvalues

Executive Summary

1.1 Axioms of Topology

In topology, we start with a set X and single out some of its subsets as open sets. The only requirement on a topology is that the collection of open sets satisfies the following rules (axioms):
The empty set and the set X are both open.
The union of any finite or infinite collection of open sets is open.
The intersection of two open sets is open. It follows by induction that the intersection of n open sets is open, but the intersection of infinitely many open sets is not necessarily open.

1.2 A Web-site model for finite topology

A model for a set of axioms is a set of real-world objects that satisfy the axioms. Consider a Web site of six pages, linked together as follows:

[Figure: graph of the six-page Web site]

In this model, an open set is defined by the property that no page in the set can be reached by a link from outside the set. We need to show that this definition is consistent with the axioms for open sets.
The empty set is open. Since it contains no pages, it contains no page that can be reached by an outside link.
The set X of all six pages is open, because there is no other page on the site from which an outside link could come.
If sets A and B are open, no page in either can be reached by an outside link, and so their union is also open.
If sets A and B are open, so is their intersection A ∩ B. Proof by contraposition: Suppose that A ∩ B is not open. Then it contains a page that can be reached by an outside link. If that link comes from A, then B is not open. If that link comes from B, then A is not open. If that link comes from outside both A and B, then both A and B are not open.

1.3 Topology in R and Rⁿ

The usual way to introduce a topology for the set R is to decree that any open interval is an open set and so is the empty set. Equivalently, we can decree that the set of points for which |x − x0| < ε, with ε > 0, is an open set. Notice that the infinite intersection of the open sets (−1/n, 1/n) is the single point 0, a closed set!
The usual way to introduce a topology for the set Rⁿ is to decree that any open ball, the set of points for which |x − x0| < ε, with ε > 0, is an open set.

1.4 More concepts of general topology

These definitions are intuitively reasonable for R and Rⁿ, but they also apply to the Web-site finite topology.
Closed sets
A closed set A is one whose complement Aᶜ = X − A is open. Careful: this is different from "one that is not open." There are lots of sets that are neither open nor closed, and there are sets that are both open and closed.
A neighborhood of a point is any set that has as a subset an open set containing the point. A neighborhood does not have to be open.
The closure of set A ⊆ Rⁿ, denoted Ā, is the smallest closed set that contains A, i.e. the intersection of all the closed sets that contain A.
The interior of a set A ⊆ Rⁿ, denoted Å, is the largest open set that is contained in A, i.e. the union of all the open subsets of A.
The boundary of A, denoted ∂A, is the set of all points x with the property that any neighborhood of x includes points of A and also includes points of the complement Aᶜ.
The boundary of A is the difference between the closure of A and its interior.

1.5 A topological definition of convergence

Sequence s_n converges to a limit s if for every open set A containing s, ∃N such that ∀n > N, s_n ∈ A. In other words, the points of the sequence eventually get inside A and stay there.
Specialize to R and Rⁿ:
A sequence a_n of real numbers converges to a limit a if ∀ε > 0, ∃N such that ∀n > N, |a − a_n| < ε. (open sets defined as open intervals)
A sequence a_1, a_2, ... in Rⁿ converges to the limit a if ∀ε > 0, ∃M such that if m > M, |a_m − a| < ε. (open sets defined by open balls)
The sequence converges if and only if the sequences of coordinates all converge.

1.6 Something special about the open ball topology

For the Web diagram above, the sequence (6,5,4,6,5,4,5,4,5,4,...) converges both to 4 and to 5. Both {456} and {45} are open sets (no incoming links), but {4}, {5}, {46}, and {56} are not.
This cannot happen in Rⁿ. If the sequence a_1, a_2, ... in Rⁿ converges to a and the same sequence also converges to the limit b, we can prove that a = b.
Why? The open ball topological space is Hausdorff. Given any two distinct points a and b, we can find open sets A and B with a ∈ A, b ∈ B, and A ∩ B = ∅.
In a Hausdorff space, the limit of a sequence is unique.

1.7 Infinite sequences and series of vectors and matrices

We need something that can be made less than ε. For vectors the familiar length is just fine. The infinite triangle inequality (proof 9.2) states that

|Σ_{i=1}^∞ ~a_i| ≤ Σ_{i=1}^∞ |~a_i|.

We define the length of a matrix by viewing the matrix as a vector. Since an m × n matrix A is an element of R^(mn), we can view it as a vector and define its length |A| as the square root of the sum of the squares of all its entries. This definition has the following useful properties:
|A~b| ≤ |A||~b|
|AB| ≤ |A||B|
Let A be a square matrix, and define its exponential by

exp(At) = Σ_{r=0}^∞ A^r t^r / r!

Denoting the length of matrix A by |A|, we have

|exp(At)| ≤ Σ_{r=0}^∞ (|A|t)^r / r! + √n − 1, or |exp(At)| ≤ exp(|A|t) + √n − 1,

so the series is convergent for all t.
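A minimal R sketch of this convergence (my own illustration; matexp is a made-up name, and 30 terms is an arbitrary cutoff):

# Truncated series for exp(At), plus the length bound from above.
matexp <- function(A, t, nterms = 30) {
  S <- diag(nrow(A)); term <- diag(nrow(A))
  for (r in 1:nterms) { term <- term %*% A * (t / r); S <- S + term }
  S
}
A <- matrix(c(1, -2, 1, 4), 2, 2); t <- 0.5
sqrt(sum(matexp(A, t)^2))               # |exp(At)| ...
exp(sqrt(sum(A^2)) * t) + sqrt(2) - 1   # ... is below exp(|A|t) + sqrt(n) - 1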

1.8 Calculating the exponential of a matrix

If D = (b 0; 0 c) (rows separated by semicolons), then Dt = (bt 0; 0 ct) and

exp(Dt) = (1 0; 0 1) + (bt 0; 0 ct) + ((bt)²/2 0; 0 (ct)²/2) + ⋯ = (e^(bt) 0; 0 e^(ct)).

If there is a basis of eigenvectors for A, then A = P DP⁻¹, A^r = P D^r P⁻¹, and exp(At) = P exp(Dt)P⁻¹.
Replace D by a conformal matrix C = aI + bJ, where J² = −I, and exp(Ct) = exp(aIt) exp(bJt) can be expressed in terms of sin bt and cos bt.
If A = bI + N and N² = 0, then exp(At) = exp(bt) exp(Nt) = e^(bt)(I + Nt).
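A minimal R sketch of the eigenbasis case (my own illustration, assuming distinct real eigenvalues):

# exp(At) = P exp(Dt) P^{-1} when A = P D P^{-1}.
A <- matrix(c(1, -2, 1, 4), 2, 2)
e <- eigen(A); P <- e$vectors
expAt <- function(t) P %*% diag(exp(e$values * t)) %*% solve(P)
expAt(0.5)   # agrees with a truncated series for exp(At)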

1.9 Solving systems of linear differential equations

We put a dot over a quantity to denote its time derivative.
The solution to the differential equation ẋ = kx is x = exp(kt)x₀.
Suppose that there is more than one variable, for example
ẋ = x + y
ẏ = −2x + 4y.
If we set ~v = (x; y), then this pair of equations becomes

d~v/dt = A~v, where A = (1 1; −2 4).

The solution is the same as in the single-variable case: ~v = exp(At)~v₀.
Proof:

exp At = Σ_{r=0}^∞ A^r t^r / r!

(d/dt) exp At = Σ_{r=1}^∞ r A^r t^(r−1) / r!.

Set s = r − 1.

(d/dt) exp At = Σ_{s=0}^∞ A^(s+1) t^s / s! = A Σ_{s=0}^∞ A^s t^s / s! = A exp At.

So

(d/dt) ~v = (d/dt) (exp At) ~v₀ = A exp At ~v₀ = A~v.
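A numerical check of this conclusion in R (my own sketch; the time t = 1 and the step h are arbitrary):

# v(t) = exp(At) v0 should satisfy dv/dt = A v.
A <- matrix(c(1, -2, 1, 4), 2, 2); v0 <- c(0, 1)
e <- eigen(A); P <- e$vectors
v <- function(t) P %*% diag(exp(e$values * t)) %*% solve(P) %*% v0
h <- 1e-6
(v(1 + h) - v(1)) / h   # numerical derivative of v at t = 1 ...
A %*% v(1)              # ... matches A v(1)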

Lecture outline
1. Proof 9.1
Define Hausdorff space, and prove that in a Hausdorff space the limit of a sequence is unique.
Prove that Rⁿ, with the topology defined by open balls, is a Hausdorff space.
2. Convergent sequences in Rⁿ:
A sequence a_1, a_2, ... in Rⁿ converges to the limit a if
∀ε > 0, ∃M such that if m > M, |a_m − a| < ε.
Prove that the sequence converges if and only if the sequences of coordinates all converge.
Then state and prove the corresponding result for infinite series of vectors in Rⁿ.
3. Proof 9.2
Starting from the triangle inequality for two vectors, prove the triangle inequality for n vectors, then prove the infinite triangle inequality for Rⁿ:

|Σ_{i=1}^∞ ~a_i| ≤ Σ_{i=1}^∞ |~a_i|

under the assumption that the infinite series on the right is convergent, which in turn implies that the infinite series of vectors on the left is convergent.
4. Prove that if every element of the convergent sequence (x_n) is in the closed subset C ⊆ Rⁿ, then the limit x₀ of the sequence is also in C.
5. Proof of inequalities involving matrix length
The length of a matrix is calculated by treating it as a vector: take the square root of the sum of the squares of all the entries.
If matrix A consists of a single row, then |A~b| ≤ |A||~b| is just the Cauchy-Schwarz inequality.
Prove the following:
|A~b| ≤ |A||~b| when A is an m × n matrix.
|AB| ≤ |A||B|
|I| = √n for the n × n identity matrix.

6. Constructing a finite topology
Axioms for general topology:
The empty set and the set X are both open.
The union of any finite or infinite collection of open sets is open.
The intersection of two open sets is open.
Suppose that we start with X = {123456} and choose a subbasis consisting of {123}, {245}, and {456}.
Find all the other sets that must be open because of the intersection axiom and the empty-set axiom.
Find all the other sets that must be open because of the union axiom and the axiom that set X is open.
We now have the smallest collection of open sets that satisfies the axioms and includes the subbasis. A closed set is one whose complement is open. List all the closed sets.
What is the smallest legal collection of open sets in the general case? What is the largest legal collection of open sets in the general case?

7. Web site topology. A set of pages is open if there are no incoming links from elsewhere on the site. A set of pages is closed if no outgoing link leads to a page outside the set (i.e. if the complement is an open set).
Open: {2}, {45}, {123}, {456}, {245}, {12345}, {2456}
Closed: {13456}, {1236}, {456}, {123}, {136}, {6}, {13}
Both: Empty set and {123456}
Is {345} a neighborhood of page 4?
What is the closure of {23}? Of {26}?
What is the interior of {23}? Of {23456}?
What is the boundary of {23}?
A sequence s_n converges to page a if, for any open set S that contains page a, ∃N such that ∀n > N, s_n ∈ S.
To which page or pages does sequence (1, 2, 3, 2, 1, 2, 2, 2, 2, 2, ...) converge?
To which page or pages does sequence (4, 5, 6, 4, 5, 6, 4, 5, 4, 5, ...) converge?

8. The open ball definition of an open set satisfies the axioms of topology.
A set U ⊆ Rⁿ is open if ∀x ∈ U, ∃r > 0 such that the open ball B_r(x) ⊆ U.
Prove that the empty set is open.
Prove that all of Rⁿ is open.
Prove that the union of any collection of open sets is open.
Prove that the intersection of two open sets is open.
Prove that in R², the boundary of the open disc x² + y² < 1 is the circle x² + y² = 1.
Find the infinite intersection of open balls of radius 1/n around the origin, for all positive integers n. Is it open, closed, or neither?

9. A geometric series of matrices
The geometric series formula for a square matrix A is

(I − A)⁻¹ = I + A + A² + ⋯.

Let A = (0 1/2; 1/2 0), so that A² = (1/4 0; 0 1/4).
(a) Evaluate I + A² + A⁴ + ⋯.
(b) Evaluate A + A³ + A⁵ + ⋯ = A(I + A² + A⁴ + ⋯).
(c) Evaluate I + A + A² + ⋯.
(d) Evaluate (I − A)⁻¹ and compare.
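You can preview the answer numerically; a minimal R sketch (50 terms is an arbitrary cutoff):

# Partial sums of I + A + A^2 + ... versus (I - A)^{-1}.
A <- matrix(c(0, 1/2, 1/2, 0), 2, 2)
S <- diag(2); term <- diag(2)
for (k in 1:50) { term <- term %*% A; S <- S + term }
S                    # the summed series
solve(diag(2) - A)   # (I - A)^{-1}, for comparison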

10. Calculating and using the exponential of a matrix
The matrix A = (1 1; −2 4) has eigenvector (1; 1) with eigenvalue 2 and eigenvector (1; 2) with eigenvalue 3.
(a) Write A in the form A = P DP⁻¹, and work out exp(At) = P exp(Dt)P⁻¹.
(b) As initial conditions, take ~v₀ = (0; 1). Calculate exp(At)~v₀.
(c) Differentiate the answer with respect to t and check that
ẋ = x + y
ẏ = −2x + 4y.

11. Solving a differential equation when there is no eigenbasis.
The system of differential equations
ẋ = 3x − y
ẏ = x + y
can be written d~v/dt = A~v, where A = (3 −1; 1 1).
Our standard technique leads to p(t) = t² − 4t + 4 = (t − 2)², so there is only one eigenvalue.
Let N = A − 2I = (1 −1; 1 −1).
We have found that p(A) = A² − 4A + 4I = (A − 2I)² = 0, so N² = 0.
Since matrices 2I and N commute, exp(At) = exp(2It) exp(Nt).
Show that exp At = e^(2t)(I + Nt), and confirm that (exp At)~e₁ is a solution to the differential equation.

12. Solving the harmonic oscillator differential equation (if time permits)
Applying Newton's second law of motion to a mass of 1 attached to a spring with spring constant 4 leads to the differential equation
ẍ = −4x.
Solve this equation by using matrices for the case where x(0) = 1, v(0) = 0.
The trick is to consider a vector
~w = (x(t); v(t)), where v = ẋ.

Group Problems
1. Topology
(a) We can use the same conventions as for the ferryboat graph of week 1. Column j shows the links going out of page j. If T_{i,j} = 1, there is a link from page j to page i. If T_{i,j} = 0, there is no link from page j to page i.

T =
( 0 1 0 0 0 0 )
( 1 0 0 0 0 0 )
( 0 1 0 1 0 0 )
( 0 0 0 0 0 0 )
( 0 0 0 1 0 0 )
( 0 1 0 1 0 0 )

Draw the Web site graph that this matrix represents.
i. Open sets include {12} and {4}. List all the other open sets and all the closed sets.
ii. Determine the interior, closure, and boundary of {123}.
iii. Determine to what point or points (if any) the sequence (1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 4, 6, 4, 6, 4, 6, ...) converges.
(b) Recall the axioms of topology, which refer only to open sets:
The empty set and the set X are both open.
The union of any collection of open sets is open.
The intersection of two open sets is open.
A closed set C is defined as a set whose complement Cᶜ is open.
You may use the following well-known properties of set complements, sometimes called De Morgan's Laws:
(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ,  (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.
i. Prove directly from the axioms of topology that the union of two closed sets is closed.
ii. In the Web site topology, a closed set of pages is one that has no outgoing links to other pages on the site. Prove that in this model, the union of two closed sets is closed.
iii. Prove that if A and B are closed subsets of R² (with the topology specified by open balls), their union is also closed.
(c) Subsets of R
i. Let A = {0} ∪ (1, 2]. Determine Aᶜ, the closure Ā, the interior Å, and the boundary ∂A.
ii. What interval is equal to ⋃_{n=2}^∞ [−1 + 1/n, 1 − 1/n]? Is it a problem that this union of closed sets is not a closed set?
iii. Let Q₁ denote the set of rational numbers in the interval (−1, 1). Determine the closure, interior, and boundary of this set.

2. Convergence in Rⁿ
(a) The sequence a_1, a_2, ... in Rⁿ converges to a.
The sequence b_1, b_2, ... in Rⁿ converges to b.
Define c_n = a_n + b_n, c = a + b.
Prove that the sequence c_1, c_2, ... in Rⁿ converges to c. Use the triangle inequality for vectors: the proof strategy is similar to the one that you learned for sequences of real numbers.
(b) Suppose that the sequence a_1, a_2, ... in Rⁿ converges to 0, and the sequence of real numbers k_1, k_2, ..., although not necessarily convergent, is bounded: ∃K > 0 such that ∀n ∈ N, |k_n| < K.
Prove that the sequence k_1 a_1, k_2 a_2, ... in Rⁿ converges to 0.
(c) Prove that if J = (0 −1; 1 0), then exp(Jt) = I cos t + J sin t. Show that this is consistent with the Taylor series for e^(it).
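For part (c), a numerical sketch in R (my own illustration; 30 series terms is an arbitrary cutoff):

# Compare the truncated series for exp(Jt) with I cos t + J sin t.
J <- matrix(c(0, 1, -1, 0), 2, 2); t <- 0.8
S <- diag(2); term <- diag(2)
for (r in 1:30) { term <- term %*% J * (t / r); S <- S + term }
S
diag(2) * cos(t) + J * sin(t)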

3. Differential equations
(a) The original patriarchal differential equation problem
Isaac has established large flocks of sheep for his sons Jacob and Esau. Anticipating sibling rivalry, he has arranged that the majority of the growth of each son's flock will come from lambs born to the other son. So, if x(t) denotes the total weight of all of Jacob's sheep and y(t) denotes the total weight of all of Esau's sheep, the time evolution of the weight of the flocks is given by the differential equations
ẋ = x + 2y
ẏ = 2x + y
i. Calculate exp(At), where A = (1 2; 2 1).
ii. Show that if the flocks are equal in size, they will remain that way. What has this got to do with the eigenvectors of A?
iii. Suppose that when t = 0, the weight of Jacob's flock is S while the weight of Esau's flock is 2S. Find formulas for the sizes as functions of time, and show that the flocks will become more nearly equal in weight as time passes.
(b) Suppose that d~v/dt = A~v, where A = (3 −1; 1 1). Since p(t) = (t − 2)², there is no basis of eigenvectors. By writing A as the sum of a multiple of the identity matrix and a nilpotent matrix, calculate exp(At).
(c) Convert ẍ + 4ẋ + 5x = 0 to a first-order equation of the form d~w/dt = A~w, and show that A = P CP⁻¹, where the first column of P is (1; 0) and C is conformal. Thereby determine x(t) for initial position x₀ = 5 and initial velocity v₀ = 10. Don't multiply out the matrices; let each in turn act on the vector of initial conditions.

Homework
1. Suppose that you want to construct a Web site of six pages numbered 1 through 6, where the open sets of pages, defined as in lecture, include {126}, {124}, and {56}.
(a) Prove that in the Web site model of finite topology, the intersection of two open sets is open.
(b) What other sets must be open in order for the family of open sets to satisfy the intersection axiom?
(c) What other sets must be open in order for the family of open sets to satisfy the union axiom?
(d) List the smallest family of open sets that includes the three given sets and satisfies all three axioms. (You have already found all these sets!)
(e) Draw a diagram showing how six Web pages can be linked together so that only the sets in this family are open. This is tricky. First deal with 5 and 6. Then deal with 1 and 2. Then incorporate 4 into the network, and finally 3. There are many correct answers since, for example, if page 1 links to page 2 and page 2 links to page 3, then adding a direct link from page 1 to page 3 does not change the topology.
2. In R², in addition to defining an open ball B_r around x, we can define an open diamond D_r around x by

D_r(x) = {y ∈ R² such that |x₁ − y₁| + |x₂ − y₂| < r}

and we can define an open square S_r around x by

S_r(x) = {y ∈ R² such that max(|x₁ − y₁|, |x₂ − y₂|) < r}.

(a) For x = (3; 2), r = 1, make a sketch showing B₁(x), D₁(x), and S₁(x).
(b) Suppose that, in Hubbard definition 1.5.2, you replace "open ball" by "open diamond" or "open square." Prove that the topology remains the same: i.e. that an open set according to one definition is an open set according to either of the others.
(c) (Optional) Show that if, instead of two-component vectors, you use infinite sequences, there is an open square of radius 1 centered on the zero vector that is not contained in any open ball and an open ball of radius 1 that is not contained in any open diamond. You can learn more about infinite-dimensional vector spaces by taking Math 110, Math 116, or Physics 143.
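For part (a) of problem 2, a plotting sketch in base R (purely illustrative):

# Draw B1(x), D1(x), and S1(x) around x = (3, 2).
x0 <- c(3, 2); th <- seq(0, 2*pi, length.out = 200)
plot(x0[1] + cos(th), x0[2] + sin(th), type = "l", asp = 1,
     xlab = "x", ylab = "y")                          # open ball
polygon(x0[1] + c(1, 0, -1, 0), x0[2] + c(0, 1, 0, -1),
        border = "red")                               # open diamond
rect(x0[1] - 1, x0[2] - 1, x0[1] + 1, x0[2] + 1,
     border = "blue")                                 # open square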

3. More theorems about limits of sequences
The sequence ~a_1, ~a_2, ... in Rⁿ converges to ~a.
The sequence ~b_1, ~b_2, ... in Rⁿ converges to ~b.
(a) Prove that the sequence of lengths |~b_1|, |~b_2|, ... in R is bounded: ∃K such that ∀n, |~b_n| < K. Hint: write ~b_m = ~b_m − ~b + ~b, then use the triangle inequality.
(b) Define the sequence of dot products: c_n = ~a_n · ~b_n.
Prove that c_1, c_2, ... converges to ~a · ~b.
Hint: Subtract and add ~a · ~b_n, then use the triangle inequality and the Cauchy-Schwarz inequality.
4. Let A = (1/3 1/3; 1/3 1/3).
(a) By considering the length of A, show that lim_{n→∞} Aⁿ must be the zero matrix.
(b) Find a formula for Aⁿ when n ≥ 1, and prove it by induction. Note that the formula is not valid for n = 0.
(c) Verify the formula

(I − A)⁻¹ = I + A + A² + ⋯

for this choice of A. As was the case for sample problem 4, you can evaluate the infinite sum on the right by summing a geometric series, but you should split off the first term and start the geometric series with the second term.

5. The differential equation ẍ = −3ẋ − 2x describes the motion of an overdamped oscillator. The acceleration ẍ is the result of the sum of a force proportional to ẋ, supplied by a shock absorber, and a force proportional to x, supplied by a spring.
(a) Introduce v = ẋ as a new variable, and define the vector ~w = (x; v). Find a matrix A such that d~w/dt = A~w.
(b) Calculate the matrix exp(At).
(c) Graph x(t) for the following three sets of initial values that specify position and velocity when t = 0:
Release from rest: ~w₀ = (1; 0).
Quick shove: ~w₀ = (0; 1).
Push toward the origin: ~w₀ = (1; −3).



6. Suppose that S is a matrix of the form S = (a b; b a). Prove that

exp(St) = e^(at) (cosh(bt) sinh(bt); sinh(bt) cosh(bt)).

Then use this result to solve
ẋ = x + 2y
ẏ = 2x + y
without having to diagonalize the matrix S.


7. Let B = (−1 9; −1 5). Show that there is only one eigenvalue, and find an eigenvector for it. Then show that N = B − 2I is nilpotent.
(a) By writing B = 2I + N, calculate B².
(b) By writing B = 2I + N, solve the system of equations
ẋ = −x + 9y
ẏ = −x + 5y
for arbitrary initial conditions ~v₀ = (x₀; y₀).


8. Week 4, sample problem 6, showed how to write A = (7 −10; 2 −1) in the form A = P CP⁻¹, where C = (3 −2; 2 3) is conformal and P = (1 2; 0 1).
Follow up on this analysis to solve the differential equation d~v/dt = A~v for initial conditions ~v₀ = (1; 0).
9. Let A be a 2 × 2 matrix which has two distinct real eigenvalues λ₁ and λ₂, with associated eigenvectors ~v₁ and ~v₂.
(a) Show that the matrix P₁ = (A − λ₂I)/(λ₁ − λ₂) is a projection onto the subspace spanned by eigenvector ~v₁. Find its image and kernel, and show that P₁² = P₁.
(b) Similarly, the matrix P₂ = (A − λ₁I)/(λ₂ − λ₁) is a projection onto the subspace spanned by eigenvector ~v₂. Show that P₁P₂ = P₂P₁ = 0, that P₁ + P₂ = I, and that λ₁P₁ + λ₂P₂ = A.
(c) Show that exp(tλ₁P₁ + tλ₂P₂) = exp(λ₁t)P₁ + exp(λ₂t)P₂, and use this result to solve the equations
ẋ = −4x + 5y
ẏ = −2x + 3y
for arbitrary initial conditions ~v₀ = (x₀; y₀).

MATHEMATICS 23a/E-23a, Fall 2015


Linear Algebra and Real Analysis I
Module #3, Week 2
Author: Paul Bamberg
R scripts by Paul Bamberg
Last modified: July 27, 2015 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-6 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.
Reading
Hubbard, section 1.5, pages 92 through 99 (limits and continuity)
Hubbard, section 1.6 up through page 112.
Hubbard, Appendix A.3 (Heine-Borel)
Hubbard, section 1.7 up through page 133.
Proofs to present in section or to a classmate who has done them.
10.1 Let X ⊆ R² be an open set, and consider f : X → R². Let x₀ be a point in X. Prove that f is continuous at x₀ if and only if for every sequence x_i converging to x₀,

lim f(x_i) = f(x₀).

10.2 Using the Bolzano-Weierstrass theorem, prove that a continuous real-valued function f defined on a compact subset C ⊆ Rⁿ has a supremum M and that there is a point a ∈ C (a maximum) where f(a) = M.
You may wish to feature Ötzi the Iceman as the protagonist of your proof.

R Scripts
Script 3.2A-LimitFunctionR2.R
Topic 1 - Sequences that converge to the origin
Topic 2 - Evaluating functions along these sequences
Script 3.2B-AffineApproximation.R
Topic 1 - The tangent-line approximation for a single variable
Topic 2 - Displaying a contour plot for a function
Topic 3 - The gradient as a vector field
Topic 4 - Plotting some pathological functions

Executive Summary

1.1 Limits in Rⁿ

To define lim_{x→x₀} f(x), we need not require that x₀ is in the domain of f. We require only that x₀ is in the closure of the domain of f. This requirement guarantees that for any δ > 0 we can find an open ball of radius δ around x₀ that includes points in the domain of f. There is no requirement that all points in that ball be in the domain.
Limit of a function f from Rⁿ to Rᵐ:
We assume that the domain is a subset X ⊆ Rⁿ.
Definition: Function f : X → Rᵐ has the limit a at x₀:

lim_{x→x₀} f(x) = a

if x₀ is in the closure of X and ∀ε > 0, ∃δ > 0 such that ∀x ∈ X that satisfy |x − x₀| < δ, |f(x) − a| < ε.
lim_{x→x₀} f(x) = a if and only if for all sequences with lim x_n = x₀, lim f(x_n) = a. To show that a function f does not have a limit as x → x₀, invent two different sequences, both of which converge to x₀, for which the sequences of function values do not approach the same limit. Or just invent one sequence for which the sequence f(x_n) does not converge!
If lim_{x→x₀} f(x) = a and lim_{x→x₀} f(x) = b, then a = b.
Suppose f(x) = (f₁(x); f₂(x)). Then lim_{x→x₀} f(x) = a if and only if lim_{x→x₀} f₁(x) = a₁ and lim_{x→x₀} f₂(x) = a₂.
Properties of limits
These are listed on p. 95 of Hubbard. The proofs are almost the same as for functions of one variable:
Limit of sum = sum of limits.
Limit of product = product of limits.
Limit of quotient = quotient of limits if you do not have zero in the denominator.
Limit of dot product = dot product of limits. (proved on pages 95-96)
These last two useful properties involve a vector-valued function f(x) and a scalar-valued function h(x), both with domain U:
If f is bounded and h has a limit of zero, then hf also has a limit of zero.
If h is bounded and f has a limit of zero, then hf also has a limit of zero.

1.2 Continuous functions in topology and in Rⁿ

Function f is continuous at x₀ if, for any open set U in the codomain that contains f(x₀), the preimage (inverse image) of U, i.e. the set of points x in the domain for which f(x) ∈ U, is also an open set.
Here is the definition that lets us extend real analysis to n dimensions.
f : Rⁿ → Rᵐ is continuous at x₀ if, for any open codomain ball of radius ε centered on f(x₀), we can find an open domain ball of radius δ centered on x₀ such that if x is in the domain ball, f(x) is in the codomain ball.
An equivalent condition (your proof 10.1):
f is continuous at x₀ if and only if every sequence that converges to x₀ is a "good" sequence. We will need to prove this for f : Rⁿ → Rᵐ, but the proof is almost identical to the proof for f : R → R, which we have already done.
As was the case in R, sums, products, compositions, etc. of continuous functions are continuous. If you can write a formula for a function of several variables that does not appear to involve division by zero, the theorems on pages 98 and 99 will show that it is continuous.
To show that a function is discontinuous, construct a "bad" sequence!

1.3 Compact subsets and Bolzano-Weierstrass

A subset X ⊆ Rⁿ is bounded if there is some ball, centered on the origin, of which it is a subset. If a nonempty subset C ⊆ Rⁿ is closed as well as bounded, it is called compact.
Bolzano-Weierstrass theorem in Rⁿ
The theorem says that given any sequence of points x_1, x_2, ... from a compact set C, we can extract a convergent subsequence whose limit is in C.
Easy proof (Ross, section 13.5):
In Rⁿ, using the theorem that we have proved for R, extract a subsequence where the first components converge. Then extract a further subsequence where the second components converge, continuing for n steps.
Hubbard, theorem 1.6.3, offers an alternative but nonconstructive proof.
Existence of a maximum
The supremum M of function f on set C is the least upper bound of the values of f. The maximum, if it exists, is a point of evaluation: a point a ∈ C such that f(a) = M. Infimum and minimum are defined similarly.
A continuous real-valued function f defined on a compact subset C ⊆ Rⁿ has a supremum M, and there is a point a ∈ C (a maximum) where f(a) = M. The proof (your proof 10.2) is similar to the proof in R.

1.4 The nested compact set theorem

Suppose X_k ⊆ Rⁿ is a decreasing sequence of nonempty compact sets: X₁ ⊇ X₂ ⊇ ⋯. For example, in R, X_n = [−1/n, 1/n]. In R², we can use nested squares.
The theorem states that

⋂_{k=1}^∞ X_k ≠ ∅.

If X_k = (0, 1/k) (not compact!), the infinite intersection is the empty set.
The proof (Hubbard, Appendix A.3) starts by choosing a point x_k from each set X_k, then invokes the Bolzano-Weierstrass theorem to select a convergent subsequence y_i that converges to a point a that is contained in each of the X_k and so is also an element of their intersection ⋂_{m=1}^∞ X_m.

1.5 The Heine-Borel theorem

The Heine-Borel theorem states that for a compact subset X ⊆ Rⁿ, any open cover contains a finite subcover. In other words, if someone gives you a possibly infinite collection of open sets U_i whose union includes every point in X, you can select a finite number of them whose union still includes every point in X:

X ⊆ ⋃_{i=1}^m U_i.

The proof (Hubbard, Appendix A.3) uses the nested compact set theorem.
In general topology, where the sets that are considered are not necessarily subsets of Rⁿ, the statement "every open cover contains a finite subcover" is used as the definition of compact set.

1.6 Partial derivatives

If U is an open subset of Rⁿ and function f : U → R is defined by a formula f(x₁, x₂, ..., x_n), then its partial derivative with respect to the ith variable is

∂f/∂x_i = D_i f(a) = lim_{h→0} (1/h) (f(a₁, ..., a_i + h, ..., a_n) − f(a₁, ..., a_i, ..., a_n)).

This does not give the generalization we want. It specifies a good approximation to f only along a line through a, whereas we would like an approximation that is good in a ball around a.

1.7 Directional derivative, Jacobian matrix, gradient

Let ~v be the direction vector of a line through a. Imagine a moving particle whose position as a function of time t is given by a + t~v on some open interval that includes t = 0. Then f(a + t~v) is a function of the single variable t. The derivative of this function with respect to t is the directional derivative.
More generally, we use h instead of t and define the directional derivative as

D~v f(a) = lim_{h→0} (f(a + h~v) − f(a)) / h.

If the directional derivative is a linear function of ~v, in which case f is said to be differentiable at a, then the directional derivative can be calculated if we know its value for each of the standard basis vectors. Since

D~e_i f(a) = lim_{h→0} (f(a + h~e_i) − f(a)) / h = D_i f(a),

we can write

D~v f(a) = D₁f(a)v₁ + D₂f(a)v₂ + ⋯ + D_n f(a)v_n.

For a more compact notation, we can make the partial derivatives into a 1 × n matrix, called the Jacobian matrix

[Jf(a)] = [D₁f(a) D₂f(a) ⋯ D_n f(a)],

whereupon

D~v f(a) = [Jf(a)]~v.

Alternatively, we can make the partial derivatives into a column vector, the gradient vector

grad f(a) = (D₁f(a); D₂f(a); ...; D_n f(a)),

so that

D~v f(a) = grad f(a) · ~v.

We now have, for differentiable functions (and we will soon prove that if the partial derivatives of f are continuous, then f is differentiable), a useful generalization of the tangent-line approximation of single-variable calculus:

f(a + h~v) ≈ f(a) + [Jf(a)](h~v).

This sort of approximation (a constant plus a linear approximation) is called an affine approximation.
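A numerical sketch of these formulas in R (my own illustration; the sample function x²y also appears in group problem 3(a) below):

# Numerical partials assemble [Jf(a)]; the directional derivative is [Jf(a)] v.
f <- function(x) x[1]^2 * x[2]
a <- c(2, 0.5); h <- 1e-6
J <- c((f(a + c(h, 0)) - f(a)) / h,   # D1 f(a)
       (f(a + c(0, h)) - f(a)) / h)   # D2 f(a)
v <- c(1, 3)
sum(J * v)                 # [Jf(a)] v
(f(a + h*v) - f(a)) / h    # direct difference quotient along v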

Lecture outline
1. Given that function f : Rᵏ → Rᵐ is continuous at x₀, prove that every sequence such that x_n → x₀ is a good sequence in the sense that f(x_n) converges to f(x₀). (This is half of proof 10.1.)
2. Given that function f : Rᵏ → Rᵐ is discontinuous at x₀, show how to construct a bad sequence such that x_i → x₀ but f(x_i) does not converge to f(x₀). (This is the other half of proof 10.1.)

3. A fanciful version of proof 10.2: a continuous real-valued function f defined on a compact subset C ⊆ Rⁿ has a supremum M and there is a point a ∈ C (a maximum) where f(a) = M.
Ötzi the Iceman, whose mummy is the featured exhibit at the archaeological museum in Bolzano, Italy, has a goal of camping at the greatest altitude M on the Tyrol, a compact subset of the earth's surface on which altitude is a continuous function f of latitude and longitude.
(a) Assume that there is no supremum M. Then Ötzi can select a sequence of campsites in C such that f(x₁) > 1, f(x₂) > 2, ..., f(x_n) > n, .... Show how to use Bolzano-Weierstrass to construct a bad sequence, in contradiction to the assumption that f is continuous.
(b) On night n, Ötzi chooses a campsite whose altitude exceeds M − 1/n. From this sequence, extract a convergent subsequence, and call its limit a. Show that f(a) = M, so a is a maximum, and M is not merely a supremum but a maximum value.

4. Nested compact sets
You have purchased a nice chunk of Carrara marble from which to carve the term project for your GenEd course on Italian Renaissance sculpture. On day 1 the marble occupies a compact subset X₁ of the space in your room. You chip away a bit every evening, hoping to reveal the masterpiece that is hidden in the marble, and you thereby create a decreasing sequence of nonempty compact sets: X₁ ⊇ X₂ ⊇ ⋯.
Your understanding instructor gives you an infinite extension of time on the project. Prove that there is a point a that forever remains in the marble, no matter how much you chip away; i.e. that

⋂_{k=1}^∞ X_k ≠ ∅.

5. Heine-Borel theorem (proved in R², but the proof is the same for Rⁿ).
Suppose that you need security guards to guard a compact subset X ⊆ R². Heine-Borel Security, LLC proposes that you should hire an infinite number of their guards, each of whom will patrol an open subset U_i of R². These guards protect all of X: the union of their patrol zones is an open cover.
Prove that you can fire all but a finite number m of the security guards (not necessarily the first m) and your property will still be protected:

X ⊆ ⋃_{i=1}^m U_i.

Break up the part of the city where your property lies into closed squares, each 1 kilometer on a side. There will exist a square B₀ that needs infinitely many guards (the infinite pigeonhole principle).
Break up this square into 4 closed subsquares: again, at least one will need infinitely many guards. Choose one subsquare and call it B₁. Continue this procedure to get a decreasing sequence B_i of nested compact sets, whose intersection includes a point a.
Now show that any guard whose open patrol zone includes a can replace all but a finite number of other guards.

6. Cauchy sequences in Rⁿ
Prove that every Cauchy sequence of vectors ~a₁, ~a₂, ... ∈ Rⁿ is bounded: i.e. ∃M such that ∀n, |~a_n| < M.
Hint: ~a_n = ~a_n − ~a_m + ~a_m. When showing that a sequence is bounded, you can ignore the first N terms.
Prove that if a sequence a₁, a₂, ... ∈ Rⁿ converges to a, it is a Cauchy sequence. Hint: a_m − a_n = a_m − a + a − a_n. Use the triangle inequality.
Prove that every convergent sequence of vectors ~a₁, ~a₂, ... ∈ Rⁿ is bounded (very easy, given the preceding results).

7. Using sequences to show that a limit does not exist.

f(x; y) = (x² − y²) / (x² + y²)

Construct sequences (x_n), all of which converge to the origin, with the following properties:
(a) lim f(x_n) = 1.
(b) lim f(x_n) = 0.
(c) lim f(x_n) = 3/5.
(d) lim f(x_n) does not exist.
Express f in terms of polar coordinates to make it clear what is going on.
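In the spirit of Script 3.2A, here is a sketch that evaluates f along three such sequences (the sequences are my own choices):

f <- function(x, y) (x^2 - y^2) / (x^2 + y^2)
n <- 1:6
f(1/n, 0*n)   # along the x-axis: limit 1
f(1/n, 1/n)   # along y = x: limit 0
f(2/n, 1/n)   # along y = x/2: limit 3/5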

8. A challenging bad-sequence construction, from Hubbard pp. 96-97.

f(x; y) = (|y| e^(−|y|/x²)) / x²

(a) Evaluate f on the sequence x_n = 1/n, y_n = m/n for arbitrary m.
(b) Evaluate f on the sequence x_n = 1/n, y_n = 1/n².

9. Continuity and discontinuity in R³
(a) Define

F(x; y; z) = xyz / (x² + y² + z²),  F(0; 0; 0) = 0.

Prove that F is continuous at the origin.
(b) Define

g(x; y; z) = (xy + xz + yz) / (x² + y² + z²),  g(0; 0; 0) = 0.

Prove that g is discontinuous at the origin.

10. Converse of Heine-Borel in R
The converse of Heine-Borel says that if the U.S. government is hiring Heine-Borel Security to guard a subset X of the road from Mosul to Damascus and wants to be sure that they do not have to pay an infinite number of guards, then X has to be closed and bounded.
(a) What happens if Heine-Borel assigns guard k to patrol the open interval (−k, k)?
(b) What happens if Heine-Borel selects a point x₀ that is not in X and assigns guard k to patrol the interval (x₀ − 1/k, x₀ + 1/k)?

 
11. Let f(x; y) = √(xy³).
Evaluate the Jacobian matrix of f at (4; 1) and use it to find the best affine approximation to f((4; 1) + t(2; 1)) for small t.
By defining g(t) = f((4; 1) + t(2; 1)), you can convert this problem to one in single-variable calculus. Show that using the tangent-line approximation near t = 0 leads to exactly the same answer.
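A numerical sketch of the comparison (my own illustration; t = 0.01 is an arbitrary small value, and the Jacobian entries are computed by hand):

# f(x, y) = sqrt(x y^3); at (4, 1), [Jf] = [1/4  3].
f <- function(x, y) sqrt(x * y^3)
J <- c(1/4, 3)
t <- 0.01
f(4, 1) + t * sum(J * c(2, 1))   # affine approximation along (2, 1)
f(4 + 2*t, 1 + t)                # actual value, very close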

12. A clever application of the gradient vector
The Cauchy-Schwarz inequality says that
grad f · ~v ≤ |grad f||~v|, with equality when grad f and ~v are proportional.
If ~v is a unit vector, the maximum value of the directional derivative occurs when ~v is a multiple of grad f.
Suppose that the temperature T in an open subset of the plane is given by T(x; y) = 25 + 0.1x²y³. If you are at x = 1, y = 2, along what direction should you walk to have temperature increase most rapidly?

Group Problems
1. Theorems related to Bolzano-Weierstrass and Heine-Borel
(a) You are working for Heine-Borel Security and are bidding on a project to guard the interior of one mile of Pennsylvania Avenue between the Capitol and the White House, modeled as the open interval I = (0, 1). Show that you can create a countably infinite set of disjoint open patrol zones which cover only a subset of I, so that no finite subcover will be possible. Then show that you cannot do the same with an uncountably infinite set of disjoint open patrol zones. (Hint: each zone includes a different rational number.)
(b) A school playground is a compact subset C ⊆ R². Two aspiring quarterbacks are playing catch with a football, and they want to get as far apart as possible. Show that if sup |x − y| = D for any two points in C, they can find a pair of points x₀ and y₀ such that |x₀ − y₀| = D. Then invent simple examples to show that this cannot be done if the playground is unbounded or is not closed.
(c) The converse of the Heine-Borel theorem states that if every open cover of set X ⊆ Rⁿ contains a finite subcover, then X must be closed and bounded.
i. By choosing as the open cover a set of open balls of radius 1, 2, ..., prove that X must be bounded.
ii. To show that X is closed, show that its complement Xᶜ must be open. Hint: choose any x₀ ∈ Xᶜ and choose an open cover of X in which the kth set consists of points whose distance from x₀ is greater than 1/k. This open cover of X must have a finite subcover. If you need a further hint, look on pages 90 and 91 of Chapter 2 of Ross.

2. Limits and continuity in R²
(a) Define

f(x; y) = x³y / (x⁶ + y²),  f(0; 0) = 0.

Show that the sequence (1/i; 1/i) is good but that (1/i; 1/i³) is bad.
(b) Let

f(x; y) = xy(x² − y²) / (x² + y²)²,  f(0; 0) = 0.

Invent a bad sequence of points (a₁, a₂, ...) that converges to (0; 0) for which

lim_{i→∞} f(a_i) ≠ 0.

This bad sequence proves that f is discontinuous at (0; 0).
(c) Let

g(x; y) = xy(x² − y²) / (x² + y²),  g(0; 0) = 0.

By introducing polar coordinates, prove that g is continuous at (0; 0).

3. Using partial derivatives to find approximate function values
(a) Let f(x; y) = x²y. Evaluate the Jacobian matrix of f at (2; 0.5) and use it to find the best affine approximation to f(1.98; 0.51) and to f(1.998; 0.501).
Using a calculator or R, find the remainder (the difference between the actual function value and the best affine approximation) in each case. You should find that the remainder decreases by a factor that is much greater than 10.
(b) Let

f(x; y) = x²y / (x⁴ + y²).

f is defined to be 0 at (0; 0). Show that both partial derivatives are zero at (0; 0) but that the function is not continuous there.
(c) Let f(x; y) = y + log(xy) (natural logarithm) for x, y > 0. Evaluate the Jacobian matrix of f at (0.5; 2) and use it to find the best affine approximation (constant plus linear approximation) to f(0.51; 2.02).

Homework
1. A rewrite of Ötzi the Iceman, with lots of sign changes.
Joe the Plumber, who became a minor celebrity in the 2008 presidential campaign, has hit the jackpot. Barack Obama enrolls him in a health plan, formerly available only to members of Congress, that makes him immortal, and gives him a special 401(k) that delivers $10K per month of tax-free income. Joe retires to pursue his lifelong dream of camping at the lowest spot in Death Valley.
Assume that Death Valley National Park is a closed set and that altitude f(x) in the Park is a continuous function. Prove that the altitude in Death Valley has a greatest lower bound (even though that is obvious on geographical grounds) and that there is a place where that lower bound is achieved, so that Joe can achieve his goal.
2. You are the mayor of El Dorado. Not all the streets are paved with gold, only the interval [0,1] on Main Street, but you still have a serious security problem, and you ask Heine-Borel Security LLC to submit a proposal for keeping the street safe at night. Knowing that the city coffers are full, they come up with the following pricey plan for meeting your requirements by using a countable infinity of guards:
Guard 0 patrols the interval (−1/N, 1/N), where you may choose any value greater than 100 for the integer N. She is paid 200 dollars.
Guard 1 patrols the interval (0.4, 1.2) and is paid 100 dollars.
Guard 2 patrols the interval (0.2, 0.6) and is paid 90 dollars.
Guard 3 patrols the interval (0.1, 0.3) and is paid 81 dollars.
Guard k patrols the interval (0.8/2^k, 2.4/2^k) and is paid 100(0.9)^(k−1) dollars.
(a) Calculate the total cost of hiring this infinite set of guards (sum a geometric series).
(b) Show that the patrol regions of the guards form an open cover of the interval [0,1].
(c) According to the Heine-Borel theorem, this infinite cover has a finite subcover. Explain clearly how to construct it. (Hint: look at the proof of the Heine-Borel theorem.)
(d) Suppose that you want to protect only the open interval (0,1), which is not a compact subset of Main Street. In what very simple way can Heine-Borel Security modify their proposal so that you are forced to hire infinitely many guards?

3. Prove the Heine-Borel theorem in R2 by contraposition. Assume that you


have been given a countably infinite collection of open sets Ui that cover
a compact set X, and assume that no finite subcollection covers X. Show
(for a contradiction) that you can identify a single U that replaces all but
finitely many of the Ui .
4. Hubbard, Exercise 1.6.6. You might want to work parts (b) and (c) before
attempting part (a). The function f (x) is defined for all of R, which is not
a compact set, so you will have to do some work before applying theorem
1.6.9. Notice that a maximum does not have to be unique: a function
could achieve the same maximum value at more than one point.
5. Singular Point, California is a spot in the desert near Death Valley that is reputed to have been the site of an alien visit to Earth. In response to a campaign contribution from AVSIG, the Alien Visitation Special Interest Group, the government has agreed to survey the region around the site.
In the vicinity, the altitude is given by the function

f(x; y) = 2x²y / (x⁴ + y²).

A survey team that traveled through the Point going west to east declares that the altitude at the Point itself is zero. A survey team that went south to north would comment only that zero was perhaps a reasonable interpolation.
(a) Suppose you travel through the Point along the line y = mx, passing through the point at time t = 0 and moving with a constant velocity such that x = t: in other words, (x; y) = (t; mt). Find a function g(m, t) that gives your altitude as a function of time on this journey. Sketch graphs of g as a function of t for m = 1 and for m = 3. Is what happens for large m consistent with what happens on the y axis?
(b) Find a sequence of points that converges to (0; 0), for which x_n = 1/n and f = 1 for every point in the sequence. Do the same for f = −1.
(c) Is altitude a continuous function at Singular Point? Explain.

6. (a) Hubbard, exercise 1.7.12. This is good practice in approximating a function by using its derivative and seeing how fast the remainder goes to zero.
(b) Hubbard, exercise 1.7.4. These are all problems in single-variable calculus, but they cannot be solved by using standard differentiation formulas. You have to use the definition of the derivative as a limit.
7. Linearity of the directional derivative.
Suppose that, near the point a = (2; 1), the Celsius temperature is specified by the function f(x; y) = 20 + xy².
(a) Suppose that you drive with a constant velocity vector ~v₁ = (1; 3), passing through the point (2; 1) at time t = 0. Express the temperature outside your car as a function g(t) and use single-variable calculus to calculate g'(0), the rate at which the reading on your car's thermometer is changing. You have calculated the directional derivative of f along the vector ~v₁ by using single-variable calculus.
(b) Do the same for the velocity vector ~v₂ = (1; 1).
(c) As it turns out, the given function f is differentiable, and the directional derivative is therefore a linear function of velocity. Use this fact to determine the directional derivative of f along the standard basis vector ~e₂ = (0; 1) from your earlier answers, and confirm that your answer agrees with the partial derivative D₂f(a).
(d) Remove all the mystery from this problem by recalculating the directional derivatives using the formula [Df(a)]~v.
 
 

8. Let f(x; y) = x√y. Evaluate the Jacobian matrix of f at (2; 4) and use it to find the best affine approximation to f(1.98; 4.06).
As you can confirm by using a calculator, 1.98 × √4.06 = 3.989589452...

9. (a) Hubbard, Exercise 1.7.22. This is a slight generalization of a topic that was presented in lecture. The statement is in terms of derivatives, but it is equivalent to the version that uses gradients.
(b) An application: suppose that you are skiing on a mountain where the height above sea level is described by the function f(x; y) = 1 − 0.2x² − 0.4y² (with the kilometer as the unit of distance, this is not unreasonable). You are located at the point (x; y) = (1; 1). Find a unit vector ~v along the direction in which you should head if you want to head straight down the mountain, and two unit vectors ~w₁ and ~w₂ that specify directions for which your rate of descent is only 3/5 of the maximum rate.
(c) Prove that in general, the unit vector for which the directional derivative is greatest is orthogonal to the direction along which the directional derivative is zero, and use this result to find a unit vector ~u appropriate for a timid but lazy skier who wants to head neither down nor up.

MATHEMATICS 23a/E-23a, Fall 2015


Linear Algebra and Real Analysis I
Module #3, Week 3
Differentiability, Newton's method, inverse functions
Author: Paul Bamberg
R scripts by Paul Bamberg
Last modified: July 26, 2015 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-6 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.
Reading
Hubbard, section 1.7 (you have already read most of this)
Hubbard, sections 1.8 and 1.9 (computing derivatives and differentiability)
Hubbard, section 2.8, pages 233-235 and page 246 (Newton's method)
Hubbard, section 2.10 up through page 264 (inverse function theorem)
Proofs to present in section or to a classmate who has done them.
11.1 Let U ⊂ R^n be an open set, and let f and g be functions from U to R. Prove that if f and g are differentiable at a then so is f g, and that
[D(f g)(a)] = f(a)[Dg(a)] + g(a)[Df(a)].
11.2 Using the mean value theorem, prove that if a function f : R² → R has partial derivatives D1f and D2f that are continuous at a, it is differentiable at a and its derivative is the Jacobian matrix [D1f(a) D2f(a)].

R Scripts
Script 3.3A-ComputingDerivatives.R
Topic 1 - Testing for differentiability
Topic 2 - Illustrating the derivative rules
Script 3.3B-NewtonsMethod.R
Topic 1 - Single variable
Topic 2 - 2 equations, 2 unknowns
Topic 3 - Three equations in three unknowns
Script 3.3C-InverseFunction.R
Topic 1 - A parametrization function and its inverse
Topic 2 - Visualizing coordinates by means of a contour plot
Topic 3 - An example that is economic, not geometric

Executive Summary

1.1 Definition of the derivative

Converting the derivative to a matrix


The linear function f(h) = mh is represented by the 1 × 1 matrix [m].
When we say that f′(a) = m, what we mean is that the function f(a + h) − f(a) is well approximated, for small h, by the linear function mh. The error made by using the approximation is a remainder r(h) = f(a + h) − f(a) − mh. If f is differentiable, this remainder approaches 0 faster than h, i.e.
lim_{h→0} r(h)/h = lim_{h→0} (f(a + h) − f(a) − mh)/h = 0.
This definition leads to the standard rule for calculating the number m,
m = lim_{h→0} (f(a + h) − f(a))/h.

Extending this definition to f : R^n → R^m
A linear function L(~h) is represented by an m × n matrix.
When we say that f is differentiable at a, we mean that the function f(a + ~h) − f(a) is well approximated, for any ~h whose length is small, by a linear function L, called the derivative [Df(a)].
The error made by using the approximation is a remainder
r(~h) = f(a + ~h) − f(a) − [Df(a)](~h).
f is called differentiable if this remainder approaches 0 faster than |~h|, i.e.
lim_{~h→~0} (1/|~h|) r(~h) = lim_{~h→~0} (1/|~h|)(f(a + ~h) − f(a) − [Df(a)](~h)) = 0.
In that case, [Df(a)] is represented by the Jacobian matrix [Jf(a)].
Proof: Since L exists and is linear, it is sufficient to consider its action on each standard basis vector. We choose ~h = t~ei so that |~h| = t. Knowing that the limit exists, we can use any sequence that converges to the origin to evaluate it, and so
lim_{t→0} (1/t)(f(a + t~ei) − f(a) − tL(~ei)) = 0 and L(~ei) = lim_{t→0} (1/t)(f(a + t~ei) − f(a)).
What is hard is proving that f is differentiable (that L exists), since that requires evaluating a limit where ~h → ~0. Eventually we will prove that f is differentiable at a if all its partial derivatives are continuous there.
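To see the definition in action, here is a minimal R sketch of the remainder test (in the spirit of script 3.3A, topic 1; the example function and names are ad hoc):
f <- function(v) c(v[1]^2 - v[2], v[1] * v[2])   # an example map from R^2 to R^2
a <- c(1, 2)
Jf <- matrix(c(2 * a[1], a[2], -1, a[1]), 2, 2)  # Jacobian at a, filled by columns
for (s in c(0.1, 0.01, 0.001)) {
  h <- s * c(1, 1) / sqrt(2)                     # shrink h along a fixed direction
  r <- f(a + h) - f(a) - Jf %*% h                # the remainder r(h)
  print(sqrt(sum(r^2)) / s)                      # |r(h)|/|h| should shrink like |h|
}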

1.2 Proving differentiability and calculating derivatives

In every case f is a function from U to R^m, where U is an open subset of R^n.
f is constant: f = c. Then [Df(a)] is the zero linear transformation, since
lim_{~h→~0} (1/|~h|)(f(a + ~h) − f(a) − [Df(a)]~h) = lim_{~h→~0} (1/|~h|)(c − c − ~0) = ~0.
f is affine: a constant plus a linear function, f = c + L. Then [Df(a)] = L, since
lim_{~h→~0} (1/|~h|)(f(a + ~h) − f(a) − [Df(a)]~h) = lim_{~h→~0} (1/|~h|)(c + L(a + ~h) − (c + L(a)) − L(~h)) = 0.
f has differentiable components: if f = (f1; ...; fm), then [Df(a)] is obtained by stacking the rows of the derivatives [Df1(a)], ..., [Dfm(a)].
f + g is the sum of two functions f and g, both differentiable at a. The derivative of f + g is the sum of the derivatives of f and g. (easy to prove)
f g is the product of scalar-valued function f and vector-valued g, both differentiable. Then
[D(f g)(a)]~v = f(a)([Dg(a)]~v) + ([Df(a)]~v)g(a).
g/f is the quotient of vector-valued function g and scalar-valued f, both differentiable, with f(a) ≠ 0. Then
[D(g/f)(a)]~v = ([Dg(a)]~v)/f(a) − (([Df(a)]~v)g(a))/(f(a))².
The chain rule: U ⊂ R^n and V ⊂ R^m are open sets, and a is a point in U at which we want to evaluate a derivative. g : U → V is differentiable at a, and [Dg(a)] is an m × n Jacobian matrix. f : V → R^p is differentiable at g(a), and [Df(g(a))] is a p × m Jacobian matrix.
The chain rule states that [D(f ∘ g)(a)] = [Df(g(a))] [Dg(a)].
The combined effect of all these rules is that if a function is defined by well-behaved formulas (no division by zero), it is differentiable, and its derivative is represented by its Jacobian matrix.

1.3 Connection between Jacobian matrix and derivative

If f : R^n → R^m is defined on an open set U ⊂ R^n, and
f(x) = f(x1; ...; xn) = (f1(x); ...; fm(x)),
the Jacobian matrix [Jf(x)] is made up of all the partial derivatives of f:
[Jf(a)] = (D1f1(a) ... Dnf1(a); ... ; D1fm(a) ... Dnfm(a)).
We can invent pathological cases where the Jacobian matrix of f exists (because all the partial derivatives exist), but the function f is not differentiable. In such a case, using the formula
D~v f(a) = [Jf(a)]~v
generally gives the wrong answer for the directional derivative! You are trying to use a linear approximation where none exists.
Using the Jacobian matrix of partial derivatives to get a good affine approximation for f(a + ~h) is tantamount to assuming that you can reach the point a + ~h by moving along lines that are parallel to the coordinate axes and that the change in the function value along the solid horizontal line is well approximated by the change along the dotted horizontal line. With the aid of the mean value theorem, you can show that this is the case if (proof 11.2) the partial derivatives of f at a are continuous.
[Figure: a rectangle with corners (a1, a2), (a1 + h1, a2), (a1, a2 + h2), and (a1 + h1, a2 + h2), illustrating the two-step path from a to a + ~h.]

1.4 Newton's method - one variable

Newton's method is based on the tangent-line approximation. Function f is differentiable. We are trying to solve the equation f(x) = 0, and we have found a value a0 that is close to the desired x. So we use the best affine approximation
f(x) ≈ f(a0) + f′(a0)(x − a0).
Then we find a value a1 for which this tangent-line approximation equals zero:
f(a0) + f′(a0)(a1 − a0) = 0, and a1 = a0 − f(a0)/f′(a0).
When f(a0) is small, f′(a0) is large, and f′(a0) does not change too rapidly, a1 is a much improved approximation to the desired solution x. Details, for which Kantorovich won the Nobel prize in economics, are in Hubbard.

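A minimal R version of the iteration (ad hoc code; script 3.3B does this more carefully):
newton1 <- function(f, fprime, a, n = 10) {
  for (i in 1:n) a <- a - f(a) / fprime(a)   # a1 = a0 - f(a0)/f'(a0), iterated
  a
}
newton1(function(x) x^2 - 2, function(x) 2 * x, 1)   # converges to sqrt(2)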
1.5 Newton's method - more than one variable

Example: we are trying to solve a system of n nonlinear equations in n unknowns, e.g.
x² e^y sin(y) − 0.3 = 0
tan x + x² y² − 1 = 0.
Ordinary algebra is no help: there is no nonlinear counterpart to row reduction.
U is an open subset of R^n, and we have a differentiable function ~f(x) : U → R^n. In the example, ~f(x; y) = (x² e^y sin(y) − 0.3; tan x + x² y² − 1), which is differentiable.
We are trying to solve the equation ~f(x) = ~0.
Suppose we have found a value a0 that is close to the desired x.
Again we use the best affine approximation
~f(x) ≈ ~f(a0) + [D~f(a0)](x − a0).
We set out to find a value a1 for which this affine approximation equals zero:
~f(a0) + [D~f(a0)](a1 − a0) = ~0.
This is a linear equation, which we know how to solve!
If [D~f(a0)] is invertible (and if it is not, we look for a better a0), then
a1 = a0 − [D~f(a0)]⁻¹ ~f(a0).
Iterating this procedure is the best known method for solving systems of nonlinear equations. Hubbard has a detailed discussion (which you are free to ignore) of how to use Kantorovich's theorem to assess convergence.
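Here is a rough R sketch of the iteration, applied to the example system above, with the Jacobian estimated by finite differences (ad hoc code, not script 3.3B itself; the starting guess is arbitrary and the iteration is only guaranteed to behave well near a solution):
f <- function(v) c(v[1]^2 * exp(v[2]) * sin(v[2]) - 0.3,
                   tan(v[1]) + v[1]^2 * v[2]^2 - 1)
numJ <- function(f, a, h = 1e-6) {            # finite-difference Jacobian
  m <- length(f(a)); J <- matrix(0, m, length(a))
  for (j in seq_along(a)) {
    e <- rep(0, length(a)); e[j] <- h
    J[, j] <- (f(a + e) - f(a - e)) / (2 * h)
  }
  J
}
a <- c(0.5, 0.5)                               # an arbitrary starting guess
for (i in 1:10) a <- a - solve(numJ(f, a), f(a))  # a1 = a0 - [Df(a0)]^-1 f(a0)
a; f(a)                                        # f(a) should now be near zero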

1.6 The inverse function theorem - short version

For a function f : [a, b] → [c, d], we know that if f is strictly increasing or strictly decreasing on the interval [a, b], there is an inverse function g for which g ∘ f and f ∘ g are both the identity function. We can find g(y) for a specific y by solving f(x) − y = 0, perhaps by Newton's method. If f(x0) = y0 and f′(x0) ≠ 0, we can prove that g is differentiable at y0 and that g′(y0) = 1/f′(x0).
"Strictly monotone" does not generalize, but "nonzero f′(x0)" generalizes to invertible [Df(x0)]. Start with a function f : R^n → R^n whose partial derivatives are all continuous, so that we know that it is differentiable everywhere. Choose a point x0 where the derivative [Df(x0)] is an invertible matrix. Set y0 = f(x0).
Then there is a differentiable local inverse function g = f⁻¹ such that
g(y0) = x0.
f(g(y)) = y if y is close enough to y0.
[Dg(y)] = [Df(g(y))]⁻¹ (follows from the chain rule)

Lecture outline
1. (Proof 11.1)
Let U ⊂ R^n be an open set, and let f and g be functions from U to R. Prove that if f and g are differentiable at a then so is f g, and that
[D(f g)(a)] = f(a)[Dg(a)] + g(a)[Df(a)].
(This is simpler than the version in Hubbard because both f and g are
scalar-valued functions)
2. (Chain rule in R² - not a proof, but still pretty convincing)
U ⊂ R² and V ⊂ R² are open sets, and a is a point in U at which we want to evaluate a derivative.
g : U → V is differentiable at a, and [Dg(a)] is a 2 × 2 Jacobian matrix.
f : V → R² is differentiable at g(a), and [Df(g(a))] is a 2 × 2 Jacobian matrix.
The chain rule states that [D(f ∘ g)(a)] = [Df(g(a))] [Dg(a)].
Draw a diagram to illustrate what happens when you use derivatives to find a linear approximation to (f ∘ g)(a). This can be done in a single step or in two steps.

3. (Proof 11.2) Using the mean value theorem, prove that if a function f : R² → R has partial derivatives D1f and D2f that are continuous at a, it is differentiable at a and its derivative is the Jacobian matrix [D1f(a) D2f(a)].
4. Newton's method
(a) One variable: Function f is differentiable. You are trying to solve the equation f(x) = 0, and you have found a value a0, close to the desired x, for which f(a0) is small. Derive the formula a1 = a0 − f(a0)/f′(a0) for an improved estimate.
(b) n variables: U is an open subset of R^n, and function ~f(x) : U → R^n is differentiable. You are trying to solve the equation ~f(x) = ~0, and you have found a value a0, close to the desired x, for which ~f(a0) is small. Derive the formula
a1 = a0 − [D~f(a0)]⁻¹ ~f(a0).
for an improved estimate.


5. Derivative of inverse function
Suppose that f : R^n → R^n is a continuously differentiable function. Choose a point x0 where the derivative [Df(x0)] is an invertible matrix. Set y0 = f(x0). Let g be the differentiable local inverse function g = f⁻¹ such that g(y0) = x0 and f(g(y)) = y if y is close enough to y0.
Prove that [Dg(y0)] = [Df(x0)]⁻¹.

6. Jacobian matrix for a parametrization function
Here is the function that converts the latitude u and longitude v of a point on the unit sphere to the Cartesian coordinates of that point:
f(u; v) = (cos u cos v; cos u sin v; sin u).
Work out the Cartesian coordinates of the point with sin u = 3/5 (37 degrees North latitude) and sin v = 1 (90 degrees East longitude), and calculate the Jacobian matrix at that point. Then find the best affine approximation to the Cartesian coordinates of the nearby point where u is 0.01 radians less (going south) and v is 0.02 radians greater (going east).
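A numeric R check of this computation (ad hoc code, in the spirit of script 3.3C):
fsph <- function(u, v) c(cos(u) * cos(v), cos(u) * sin(v), sin(u))
u0 <- asin(3/5); v0 <- pi/2
J <- cbind(c(-sin(u0) * cos(v0), -sin(u0) * sin(v0), cos(u0)),  # d/du column
           c(-cos(u0) * sin(v0), cos(u0) * cos(v0), 0))         # d/dv column
fsph(u0, v0)                          # the point: (0, 4/5, 3/5)
fsph(u0, v0) + J %*% c(-0.01, 0.02)   # best affine approximation to the nearby point
fsph(u0 - 0.01, v0 + 0.02)            # exact coordinates, for comparison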

7. Derivative of a function of a matrix (Example 1.7.17 in Hubbard):
A matrix is also a vector. When we square an n × n matrix A, the entries of S(A) = A² are functions of all the entries of A. If we change A by adding to it a matrix H of small length, we will make a change in the function value A² that is a linear function of H plus a small remainder.
We could in principle represent A by a column vector with n² components and the derivative of S by a very large matrix, but it is more efficient to leave H in matrix form and use matrix multiplication to find the effect of the derivative on a small increment matrix H. The derivative is still a linear function, but it is represented by matrix multiplication in a different way.
(a) Using the definition of the derivative, show that the linear function
that we want is DS(H) = AH + HA.
(b) Confirm that DS is a linear function of H.
(c) Check that DS(H) is a good approximation to S(A + H) − S(A) for the following simple case, where the matrices A and H do not commute:
A = (1 1; 0 1), H = (0 h; k 0).
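A quick R check for part (c), with particular small values of h and k (ad hoc code):
A <- matrix(c(1, 0, 1, 1), 2, 2)         # A = (1 1; 0 1), filled by columns
H <- matrix(c(0, 1e-4, 1e-4, 0), 2, 2)   # H = (0 h; k 0) with h = k = 1e-4
S <- function(M) M %*% M
S(A + H) - S(A)                           # the exact change
A %*% H + H %*% A                         # DS(H); the difference is H %*% H, of size |H|^2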


8. Two easy chain rule examples
(a) g : R → R² maps time into the position of a particle moving around the unit circle:
g(t) = (cos t; sin t).
f : R² → R maps a point into the temperature at that point:
f(x; y) = x² − y².
The composition f ∘ g maps time directly into temperature.
Confirm that [D(f ∘ g)(t)] = [Df(g(t))] [Dg(t)].
(b) Let φ : R → R be any differentiable function. You can make a function f : R² → R that is constant on any circle centered at the origin by forming the composition f(x; y) = φ(x² + y²).
Show that f satisfies the partial differential equation yD1f − xD2f = 0.
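A hedged numeric check of example (a) at a single time t0 (ad hoc names):
t0 <- 0.7
Dg <- c(-sin(t0), cos(t0))            # [Dg(t0)]
Df <- c(2 * cos(t0), -2 * sin(t0))    # [Df(g(t0))] = [2x  -2y] at (cos t0, sin t0)
sum(Df * Dg)                          # the chain-rule product
-2 * sin(2 * t0)                      # derivative of f(g(t)) = cos 2t, for comparison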


9. Chain rule for functions of matrices
In sample problem 2 we showed that the derivative of the squaring function S(A) = A² is DS(H) = AH + HA.
Proposition 1.7.19 (tedious proof on pp. 136-137) establishes the similar rule that for T(A) = A⁻¹, the derivative is DT(H) = −A⁻¹HA⁻¹.
Now the function U(A) = A⁻² can be expressed as the composition U = S ∘ T.
Find the derivative DU(H) by using the chain rule.
The chain rule says the derivative of a composition is the composition of the derivatives, even in a case like this where composition is not represented by matrix multiplication.


10. A non-differentiable function


Consider a surface where the height z is given by the function
f(x; y) = (3x²y − y³)/(x² + y²); f(0; 0) = 0.
This function is not differentiable at the origin, and so you cannot calculate its directional derivatives there by using the Jacobian matrix!
(a) Along the first standard basis vector, the directional derivative at the origin is zero. Find two unit vectors along other directions that also have this property.
(b) Along the second standard basis vector, the directional derivative at the origin is −1. Find two unit vectors along other directions that also have this property. (This surface is sometimes called a monkey saddle, because a monkey could sit comfortably on it with its two legs and its tail placed along these three downward-sloping directions.)
(c) Calculate the directional derivative along an arbitrary unit vector ~e = (cos θ; sin θ). Using the trig identity sin 3θ = 3 sin θ cos²θ − sin³θ, quickly rederive the special cases of parts (a) and (b).
(d) Using the definition of the derivative, give a convincing argument that
this function is not differentiable at the origin.


11. Newton's method
We want an approximate solution to the equations
log x + log y = 3
x² − y = 1
i.e. f(x; y) = (log x + log y − 3; x² − y − 1) = (0; 0).
Knowing that log 3 ≈ 1.1, show that x0 = (3; 9) is an approximate solution to this equation, then use Newton's method to improve the approximation.
Here is a check:
log 2.81 + log 6.87 = 2.96
2.81² − 6.87 = 1.02


12. An economic example of the inverse-function theorem:
Your model: providing x in health benefits and y in educational benefits leads to happiness H and cost C according to the equation
(H; C) = f(x; y) = (x + x^(1/2) y; x^(3/2) + y^(1/2)).
Currently, x = 4, y = 9, H = 22, C = 11. Your budget is cut, and you are told to adjust x and y to reduce C to 10 and H to 19. Find an approximate solution by using the inverse-function theorem.
We cannot find formulas for the inverse function g(H; C) that would solve the problem exactly, but we can calculate the derivative of g.
(a) Check that [Df] = (1 + y/(2√x)  √x; (3/2)√x  1/(2√y)) = (13/4  2; 3  1/6) is invertible.
(b) Use the derivative [Dg] = (−0.03  0.36; 0.55  −0.6) to approximate g(19; 10).
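Here is a minimal R sketch of the whole computation (ad hoc code, not an official script):
Df <- matrix(c(13/4, 3, 2, 1/6), 2, 2)   # [Df] at x = 4, y = 9, filled by columns
Dg <- solve(Df)                          # [Dg] = [Df]^-1; compare the matrix quoted above
c(4, 9) + Dg %*% c(19 - 22, 10 - 11)     # approximate values of x and y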


Group Problems
1. Chain rule
(a) Chain rule for matrix functions
In sample problem 4, we obtained the differentiation formula for U(A) = A⁻² by writing U = S ∘ T with S(A) = A², T(A) = A⁻¹. Prove the same formula from the chain rule in a different way, by writing U = T ∘ S. You may reuse the formulas for the derivatives of S and T:
If S(A) = A² then [DS(A)](H) = AH + HA.
If T(A) = A⁻¹ then [DT(A)](H) = −A⁻¹HA⁻¹.
(b) Let U ⊂ R² be the set of points whose coordinates are both positive. Suppose that f : U → R can be written f(x; y) = φ(y/x), for some differentiable φ : R → R.
Show that f satisfies the partial differential equation
xD1f + yD2f = 0.
(c) Chain rule with 2 × 2 matrices
Start with a pair of polar coordinates (r; θ).
Function g converts them to Cartesian coordinates (x; y).
Function f then converts (x; y) to (2xy; x² − y²).
Confirm that [D(f ∘ g)(r; θ)] = [Df(g(r; θ))] [Dg(r; θ)].


2. Issues of differentiability
(a) Let
f(x; y) = x²y²/(x² + y²).
f is defined to be 0 at (0; 0). State, in terms of limits, what it means to say that f is differentiable at (0; 0), and prove that its derivative [Df(0; 0)] is the zero linear transformation.
(b) Suppose that A is a matrix and S is the cubing function given by the formula S(A) = A³. Prove that the derivative of S(A) is
[DS(A)](H) = A²H + AHA + HA².
The proof consists in showing that the length of the remainder goes to zero faster than the length of the matrix H.
(c) A continuous but non-differentiable function
f(x; y) = x²y/(x² + y²), f(0; 0) = 0.
i. Show that both partial derivatives vanish at the origin, so that the Jacobian matrix at the origin is the zero matrix [0 0], but that the directional derivative along (1; 1) is not zero. How does this calculation show that the function is not differentiable at the origin?
ii. For all points except the origin, the partial derivatives are given by the formulas
D1f(x; y) = 2xy³/(x² + y²)², D2f(x; y) = (x⁴ − x²y²)/(x² + y²)².
Construct a bad sequence of points approaching the origin to
show that D1 f is discontinuous at the origin.


3. Inverse functions and Newtons method


(to be done in R, by modifying R script 3.3B)
(a) An approximate solution to the equations
x³ + y² − xy = 1.08
x²y + y² = 2.04
is x0 = 1, y0 = 1.
Use one step of Newton's method to improve this approximation.
(b) You are in charge of building the parking lots for a new airport. You
have ordered from amazon.com enough asphalt to pave 1 square kilometer, plus 5.6 kilometers of chain-link fencing. Your plan is to build
two square, fenced lots. The short-term lot is a square of side x=0.6
kilometers; the long-term lot is a square of side y=0.8 kilometers. The
amount of asphalt A and the amount C of chain-link fencing required
are then specified by the function
(A; C) = F(x; y) = (x² + y²; 4x + 4y).
Alas, Amazon makes a small shipping error. They deliver enough
asphalt to pave 1.03 square kilometers but only 5.4 kilometers of fence.
i. Use the inverse-function theorem to find approximate new values
for x and y that use exactly what was shipped to you.
In this simple case you can check your answer by solving algebraically for x and y.
ii. Find a case where A = 1 but the value of C is such that this
approach will fail because [DF ] is not onto. (This case corresponds
to the maximum amount of fencing.)
(c) Saving Delos
The ancient citizens of Delos, threatened with a plague, consulted the
oracle of Delphi, who told them to construct a new cubical altar to
Apollo whose volume was double the size of the original cubical altar.
(For details, look up Doubling the cube on Wikipedia.)
If the side of the original altar was 1, the side of the new altar had to be the real solution to f(x) = x³ − 2 = 0.
Numerous solutions to this problem have been invented. One uses a marked ruler or neusis; another uses origami.
Your job is to use multiple iterations of Newton's method to find an approximate solution for which x³ − 2 is less than 10⁻⁸ in magnitude.
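A minimal R iteration for this equation (modifying script 3.3B would serve equally well):
x <- 1                                   # initial guess
for (i in 1:8) x <- x - (x^3 - 2) / (3 * x^2)
x; x^3 - 2                               # the residual ends up far below 1e-8 in magnitude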


Homework
1. (similar to group problem 1a)
We know the derivatives of the matrix-squaring function S and the matrix-inversion function T:
If S(A) = A² then [DS(A)](H) = AH + HA.
If T(A) = A⁻¹ then [DT(A)](H) = −A⁻¹HA⁻¹.
(a) Use the chain rule to find a formula for the derivative of the function U(A) = A⁴.
(b) Use the chain rule to find a formula for the derivative of the function W(A) = A⁻⁴.
2. (a) Hubbard, Exercise 1.7.21 (derivative of the determinant function).
This is really easy if you work directly from the definition of the derivative.
(b) Generalize this result to the 3 × 3 case. Hint: consider a matrix whose columns are ~e1 + h~a1, ~e2 + h~a2, ~e3 + h~a3, and use the definition of the determinant as a triple product.
3. Hubbard, Exercise 1.8.6, part (b) only. In the case where f and g are
functions of time t, this formula finds frequent use in physics. You can
either do the proof as suggested in part (a) or model your proof on the one
for the dot product on page 143.
4. (similar to group problem 1b)
Hubbard, Exercise 1.8.9. The equation that you prove can be called a
first-order partial differential equation.


5. (similar to group problem 2c)


As a summer intern, you are given the job of reconciling the Democratic and
Republican proposals for tax reform. Both parties agree on the following
model:
x is the change in the tax rate for the middle class.
y is the change in the tax rate for the well-off.
The net impact on revenue is given by the function
f(x; y) = x(x² − y²)/(x² + y²), f(0; 0) = 0.
The Republican proposal is y = x, while the Democratic proposal is y = −x.
(a) Show that f is continuous at the origin.
(b) Show that both proposals are revenue neutral by calculating two appropriate directional derivatives. You will have to use the definition
of the directional derivative, not the Jacobian matrix.
(c) At the request of the White House, you investigate a 50-50 mix of the
two proposals, the compromise case where y = 0, and you discover
that it is not revenue neutral! Confirm this surprising conclusion by
showing that the directional derivatives at the origin cannot be given
by a linear function; i.e. that f is not differentiable.
(d) Your final task is to explain the issue in terms that legislators can understand: the function is not differentiable because its partial derivatives are not continuous. Demonstrate that one of the partial derivatives of f is discontinuous at the origin. (D2 f is less messy.)


6. Chain rule: an example with 2 × 2 matrices
A similar example with a 3 × 3 matrix is on page 151 of Hubbard.
The function
f(x; y) = ((x + y)/2; √(xy))
was invented by Gauss about 200 years ago to deal with integrals of the form
∫ dt / √((t² + x²)(t² + y²)).

It was revived in the late 20th century as the basis of the AGM (arithmetic-geometric mean) method for calculating π. You can get 1 million digits with a dozen or so iterations.
The function is meant to be composed with itself; so it will be appropriate to compute the derivative of f ∘ f by the chain rule.
(a) f is differentiable whenever x and y are positive; so its derivative is given by its Jacobian matrix. Calculate this matrix.
We choose to evaluate the derivative of f ∘ f at the point (8; 2). Conveniently, f(8; 2) = (5; 4). The chain rule says that
[D(f ∘ f)(8; 2)] = [Df(5; 4)] [Df(8; 2)].
Evaluate the two numerical Jacobian matrices. Because the derivative of f is evaluated at two different points, they will not be the same.
(b) Write the formula for f ∘ f, compute and evaluate the lower left-hand entry in its Jacobian matrix, and check that it agrees with the value given by the chain rule.
7. (Related to group problem 3c)
The quintic equation x(x² − 1)(x² − 4) = 0 clearly has five real roots that are all integers. So does the equation x(x² − 1)(x² − 4) − 1 = 0, but you have to find them numerically. Get all five roots using Newton's method, carrying out enough iterations to get an error of less than .001. Use R to do Newton's method and to check your answers. If you have R plot a graph, it will be easy to find an initial guess for each of the five roots.
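One possible sketch of such a script (initial guesses eyeballed from the graph; all names are ad hoc):
f  <- function(x) x * (x^2 - 1) * (x^2 - 4) - 1
fp <- function(x) 5 * x^4 - 15 * x^2 + 4          # derivative of x^5 - 5x^3 + 4x - 1
curve(f, -2.5, 2.5); abline(h = 0)                # suggests one guess near each root
roots <- sapply(c(-2, -1, 0, 1, 2), function(x0) {
  for (i in 1:20) x0 <- x0 - f(x0) / fp(x0)       # Newton iteration
  x0
})
roots; f(roots)                                   # residuals should be well under .001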


8. (Related to group problem 3b, but involves extra iterations)


The CEO of a chain of retail stores will get a big bonus if she hits her volume
and profit targets for December exactly. Her microeconomics consultant,
fresh out of Harvard, tells her that both her target figures are functions
of two variables, investment x in Internet advertising and investment y in
television advertising. The former attracts savvier customers and so tends
to contribute to volume more than to profit.
The function that determines volume V and profit P is
(V; P) = (x^(3/4) y^(1/3) + x; x^(1/4) y^(2/3) + y).
With x = 16, y = 8, V = 32, P = 16, our CEO figures she is set for a
big bonus. Suddenly, the board of directors, feeling that Wall Street is
looking as much for profit as for volume this year, changes her targets to
V = 24, P = 24. She needs to modify x and y to meet these new targets.
(a) Near V = 32, P = 16, there is an inverse function such that (x; y) = g(V; P). Find its derivative [Dg], and use the derivative to find
values of x and y that are an approximate solution to the problem.
Because the increments to V and P are large, you should not expect
the approximate solution to be very good, but it will be better than
doing nothing.
(b) Use multiple iterations of Newton's method in R to find accurate values
of x and y that meet the revised targets. Feel free to modify Script
3.3C.


9. (a) Hubbard, problem 2.10.2. Make a sketch to show how this mapping
defines an alternative coordinate system for the plane, in which a point
is defined by the intersection of two hyperbolas.
(b) The point x = 3, y = 2 is specified in this new coordinate system
by the coordinates u = 6, v = 5. Use the derivative of the inverse
function to find approximate values of x and y for a nearby point
where u = 6.5, v = 4.5. (This is essentially one iteration of Newton's
method.)
(c) Find h such that the point u = 6 + h, v = 5.1 has nearly the same
x-coordinate as u = 6, v = 5.
(d) Find k such that the point x = 3 + k, y = 2.1 has nearly the same
u-coordinate as x = 3, y = 2.
(e) For this mapping, you can actually find a formula for the inverse function that works in the region of the plane where x, y, u, and v are all
positive. Find the rather messy formulas for x and y as functions of
u and v, and use them to answer the earlier questions. Once you calculate the Jacobian matrix and plug in appropriate numerical values,
you will be back on familiar ground.
I could get Mathematica Solve[] to find the inverse function only after
I eliminated y by hand. At this point the quadratic formula does the
job anyway!


MATHEMATICS 23a/E-23a, Fall 2015


Linear Algebra and Real Analysis I
Module #3, Week 4
Implicit functions, manifolds, tangent spaces, critical points
Author: Paul Bamberg
R scripts by Paul Bamberg
Last modified: July 27, 2015 by Paul Bamberg
The lecture outline and problems have not yet been revised for 2015. Pages
1-8 are in final form. Print them if you are watching lecture preview videos or R
script videos before the course starts.
Reading
Hubbard, Section 3.1 (Implicit functions and manifolds)
Hubbard, Section 3.2 (Tangent spaces)
Hubbard, Section 3.6 (Critical points)
Hubbard, Section 3.7 through page 354 (constrained critical points)
Proofs to present in section or to a classmate who has done them.
12.1 (Hubbard Theorem 3.2.4) Suppose that U ⊂ R^n is an open subset, F : U → R^(n−k) is a C¹ mapping, and manifold M can be described as the set of points that satisfy F(z) = 0. Use the implicit function theorem to show that if [DF(c)] is onto for c ∈ M, then the tangent space Tc M is the kernel of [DF(c)]. You may assume that the variables have been numbered so that when you row-reduce [DF(c)], the first n − k columns are pivotal.
12.2 (Hubbard, theorems 3.6.3 and 3.7.1) Let U ⊂ R^n be an open subset and let f : U → R be a C¹ (continuously differentiable) function.
First prove, using a familiar theorem from single-variable calculus, that if x0 ∈ U is an extremum, then [Df(x0)] = [0].
Then prove that if M ⊂ R^n is a k-dimensional manifold, and c ∈ M ∩ U is a local extremum of f restricted to M, then Tc M ⊂ ker[Df(c)].

R Scripts
Script 3.4A-ImplicitFunction.R
Topic 1 - Three variables, one constraint
Topic 2 - Three variables, two constraints
Script 3.4B-Manifolds2D.R
Topic 1 - A one-dimensional submanifold of R² - the unit circle
Topic 2 - Interesting examples from the textbook
Topic 3 - Parametrized curves in R2
Topic 4 - A two-dimensional manifold in R2
Topic 5 - A zero-dimensional manifold in R²
Script 3.4C-Manifolds3D.R
Topic 1 - A manifold as a function graph
Topic 2 - Graphing a parametrized manifold
Topic 3 - Graphing a manifold that is specified as a locus
Script 3.4D-CriticalPoints
Topic 1 - Behavior near a maximum or minimum
Topic 2 - Behavior near a saddle point
Script 3.5A-LagrangeMultiplier.R
Topic 1 - Constrained critical points in R2

Executive Summary

1.1 Implicit functions - review of the linear case

We have n unknowns and n − k equations, e.g. for n = 3, k = 1:
2x + 3y − z = 0, 4x − 2y + 3z = 0.
Create an (n − k) × n matrix: T = (2 3 −1; 4 −2 3).
If the matrix T is not onto, its rows (the equations) are linearly dependent. Otherwise, when we row reduce, we will find n − k = 2 pivotal columns and k = 1 nonpivotal columns. We assign values arbitrarily to the active variables that correspond to the nonpivotal columns, and then the values of the passive variables that correspond to the pivotal columns are determined.
Suppose that we reorder the unknowns so that the active variables come last. Then, after we row reduce the matrix, the first n − k columns will be pivotal. So the first n − k columns will be linearly independent, and they form an invertible square matrix. The matrix is now of the form T = [A|B], where A is invertible. The solution vector is of the form ~v = (~x; ~y), where the passive variables ~x come first and the active variables ~y come second.
A solution to T~v = ~0 is obtained by choosing ~y arbitrarily and setting ~x = −A⁻¹B~y. Our system of equations determines ~x implicitly in terms of ~y.

1.2 Implicit function theorem - the nonlinear case

We have a point c ∈ R^n, a neighborhood W of c, and a function F : W → R^(n−k) for which F(c) = 0 and [DF(c)] is onto. F imposes constraints.
The variables are ordered so that the n − k pivotal columns in the Jacobian matrix, which correspond to the passive variables, come first. Let a denote the passive variables at c; let b denote the active variables at c.
The implicit function g expresses the passive variables in terms of the active variables, and g(b) = a. For y near b, x = g(y) determines passive variables such that F(x; y) = 0. Tweak y, and g specifies how to tweak x so that the constraints are still satisfied.
Although we usually cannot find a formula for g, we can find its derivative at b by the same recipe that worked in simple cases.
Evaluate the Jacobian matrix [DF(c)].
Extract the first n − k columns to get an invertible square matrix A.
Let the inverse of this matrix act on the remaining k columns (matrix B) and change the sign to get the (n − k) × k Jacobian matrix for g.
That is, [Dg(b)] = −A⁻¹B.
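Here is a hedged R sketch of this recipe, using the sphere-and-cylinder locus that reappears in group problem 1a, with x and y passive and z active (ad hoc code, in the spirit of script 3.4A):
F <- function(v) c(v[1]^2 + v[2]^2 + v[3]^2 - 3, v[1]^2 + v[3]^2 - 2)
c0 <- c(1, 1, 1)                          # a point with F(c0) = 0
J <- matrix(c(2, 2, 2, 0, 2, 2), 2, 3)    # [DF(c0)], filled by columns
A <- J[, 1:2]; B <- J[, 3, drop = FALSE]  # split [DF] = [A|B]
-solve(A) %*% B                           # [Dg(b)] = -A^-1 B: the derivatives dx/dz, dy/dz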

1.3 Curves, Surfaces, Graphs, and Manifolds

Manifolds are a generalization of smooth curves and surfaces.
The simplest sort of manifold is a flat one, described by linear equations. An example is the line of slope 2 that passes through the point x = 0, y = −2: a one-dimensional submanifold of R².
There are three equivalent ways to describe such a manifold.
(The definition) As the graph of a function that expresses the passive variables in terms of the active variables: either y = f(x) = −2 + 2x or x = g(y) = (1/2)(y + 2).
As a locus defined by a constraint equation F(x; y) = 2x − y − 2 = 0.
By a parametrization function g(t) = (1; 0) + t(1; 2).
Definition: A subset M ⊂ R^n is a smooth manifold if locally it is the graph of a C¹ function (the partial derivatives are continuous). "Locally" means that for any point x ∈ M we can find a neighborhood U of x such that within M ∩ U, there is a C¹ function that expresses n − k passive variables in terms of the other k active variables. The number k is the dimension of the manifold. In R³ there are four possibilities:
k = 3. Any open subset M ⊂ R³ is a smooth 3-dimensional manifold. In this case the manifold is the graph of a function f : R³ → {~0}, whose codomain is the trivial vector space {~0} that contains just a single point. Such a function is necessarily constant, and its derivative is zero.
k = 2. The graph of z = f(x; y) = x² + y² is a paraboloid.
k = 1. The graph of the function (x; y) = ~f(z) = (cos 2πz; sin 2πz) is a helix.
k = 0. In this case the manifold consists of one or more isolated points. Near any of these points x0, it is the graph of a function ~f : {~0} → R³ whose domain is a zero-dimensional vector space and whose image is the point x0 ∈ R³. This function is differentiable because, since its domain contains only one point (the zero vector), you cannot find nearby points to show that it is not differentiable.
There is no requirement that a manifold be the graph of a single function, or that the active variables be the same at every point on the manifold. The unit circle, the locus of x² + y² − 1 = 0, is the union of four function graphs, two of which have x as the active variable, two of which have y. By using a parameter t that is not one of the variables, we can represent it by the parametrization (x; y) = g(t) = (cos t; sin t).

1.4 Using the implicit function theorem

Start with an open subset U ⊂ R^n and a C¹ function F : U → R^(n−k). Consider the locus M ∩ U, the set of solutions of the equation F(z) = 0.
If [DF(z)] is onto (surjective) for every z ∈ M ∩ U, then M ∩ U is a smooth k-dimensional manifold embedded in R^n.
Proof: the implicit function theorem says precisely this. The statement that [DF(z)] is onto guarantees the differentiability of the implicitly defined function.
If [DF(z)] does not exist or fails to be onto, perhaps even just at a single point, the locus is not a manifold. We use the notation M ∩ U because F may define just part of a larger manifold M that cannot be described as the locus of a single function. To say that M itself is a manifold, we have to find an appropriate U and F for every point z in the manifold.

1.5 Parametrizing a manifold

For a k-dimensional submanifold of R^n, the parametrization function is γ : U → M, where U ⊂ R^k is an open set. The variables in R^k are called parameters.
The parametrization function must be C¹, one-to-one, and onto M. In other words, we want γ to give us the entire manifold. Finding a local parametrization that gives part of the manifold is of no particular interest, because there is, by definition, a function graph that does that.
An additional requirement: the derivative of the parametrization function is one-to-one for all parameter values. This requirement guarantees that the columns of the Jacobian matrix [Dγ] are linearly independent.

1.6 Tangent space as graph, kernel, or image

Locally, a k-dimensional submanifold M of R^n is the graph of a function g : R^k → R^(n−k). The derivative of g, [Dg(b)], is an (n − k) × k matrix that converts a vector of increments to the k active variables into a vector of increments to the n − k passive variables: an increment ~y to the active variables produces the increment ~x = [Dg(b)]~y to the passive variables.
A point c of M is specified by the active variables b and the accompanying passive variables a. The tangent space Tc M is the graph of this derivative. It is a k-dimensional subspace of R^n.
The k-dimensional manifold M can also be specified as the locus of the equation F(z) = 0, for F : R^n → R^(n−k). The tangent space Tc M is the kernel of the linear transformation [DF(c)].
Finally, the manifold M can also be described as the image of a parametrization function γ : U ⊂ R^k → R^n. In this case any point of M is the image of some point u in the parameter space, and the tangent space is T_γ(u) M = Img [Dγ(u)].
Whether specified as graph, kernel, or image, the tangent space Tc M is the same! It contains the increment vectors that lead from c to nearby points that are almost on the manifold.

1.7 Critical points

Suppose that function f : R^n → R is differentiable at point x0 and that the derivative [Df(x0)] is not zero. Then there exists a vector ~v for which the directional derivative is not zero, the function g(t) = f(x0 + t~v) − f(x0) has a nonzero derivative at t = 0, and, even if we just consider points that lie on a line through x0 with direction vector ~v, the function f cannot have a maximum or minimum at x0. So in searching for a maximum or minimum of f at points where it is differentiable, we need to consider only critical points, where [Df(x0)] = 0.
A critical point is not necessarily a maximum or minimum, but for f : R^n → R there is a useful test that generalizes the second-derivative test of single-variable calculus. The proof relies on sections 3.3-3.5 of Hubbard, which we are skipping.
Form the Hessian matrix of second partial derivatives (Hubbard, p. 348), evaluated at the critical point x of interest:
H_{i,j}(x) = DiDjf(x).
H is a symmetric matrix. If it has a basis of eigenvectors and none of the eigenvalues are zero, we can classify the critical point.
If H has a basis of eigenvectors, all with positive eigenvalues, the critical point is a minimum.
If H has a basis of eigenvectors, all with negative eigenvalues, the critical point is a maximum.
If H has a basis of eigenvectors, some with positive eigenvalues and some with negative eigenvalues, the critical point is a saddle: it is neither a maximum nor a minimum.

1.8 Constrained critical points

These are of great importance in physics, economics, and other areas to which mathematics is applied.
Consider a point c on manifold M where the function f : R^n → R is differentiable. Perhaps f has a maximum or minimum at c when its value is compared to the value at nearby points on M, even though there are points not on M where f is larger or smaller. In that case we should not consider all increment vectors, but only those increment vectors ~v that lie in the tangent space to the manifold.
The derivative [Df(c)] does not have to be the zero linear transformation, but it has to give zero when applied to any increment that lies in the tangent space Tc M, or
Tc M ⊂ Ker[Df(c)].
When manifold M is specified as the locus where some function F = 0, there is an ingenious way of finding constrained critical points by using Lagrange multipliers, but not this week!

1.9 Constrained critical points - three approaches

We have proved the following:
If M ⊂ R^n is a k-dimensional manifold, and c ∈ M ∩ U is a local extremum of f restricted to M, then Tc M ⊂ ker[Df(c)].
Corresponding to each of the three ways that we can know the manifold M, there is a technique for finding the critical points of f restricted to M.
Manifold as a graph
Near the critical point, the passive variables x are a function g(y) of the active variables y. Define the graph-making function
γ(y) = (g(y); y).
Now f(γ(y)) specifies values of f only at points on the manifold. Just search for unconstrained critical points of this function by setting [D(f ∘ γ)(y)] = 0.
This approach works well if you can represent the entire manifold as a single function graph.
Parametrized manifold
Points on the manifold are specified by a parametrization γ(u).
Now f(γ(u)) specifies values of f only at points on the manifold. Just search for unconstrained critical points of this function by setting [D(f ∘ γ)(u)] = 0.
This approach works well if you can parametrize the entire manifold.
Manifold specified by constraints
Points on the manifold all satisfy the constraints F(x) = 0.
In this case we know that Tc M = Ker[DF(c)], so the rule for a critical point becomes
Ker[DF(c)] ⊂ Ker[Df(c)].
If there is just a single constraint F(x) = 0, both derivative matrices consist of just a single row, and we can represent the condition for a critical point as Ker α ⊂ Ker β, where α = [DF(c)] and β = [Df(c)].
Suppose that ~v ∈ Ker α and that β = λα. The quantity λ is called a Lagrange multiplier. Then by linearity, [Df(c)]~v = β~v = λα~v = 0.
So [Df(c)]~v = 0 for any vector in the tangent space of F = 0, and we have a constrained critical point.
It is not quite so obvious that the condition β = λα is necessary as well as sufficient. We will need to do a proof by contradiction (proof 13.1).

1.10 Equality of crossed partial derivatives

Let U ⊂ R^n be open. Suppose that f : R^n → R is differentiable at a and has the property that each of its partial derivatives Dif is also differentiable at a. Then
Dj(Dif)(a) = Di(Djf)(a).
The proof consists in using the mean value theorem to show that
Dj(Dif)(a) = Di(Djf)(a) = lim_{t→0} (1/t²)(f(a + t~ei + t~ej) − f(a + t~ei) − f(a + t~ej) + f(a)).

Proofs
1. Let W be an open subset of R^n, and let F : W → R^(n−k) be a C¹ mapping such that F(c) = 0. Assume that [DF(c)] is onto.
Prove that the n variables can be ordered so that the first n − k columns of [DF(c)] are linearly independent, and that [DF(c)] = [A|B] where A is an invertible (n − k) × (n − k) matrix.
Set c = (a; b), where a are the n − k passive variables and b are the k active variables.
Let g be the implicit function from a neighborhood of b to a neighborhood of a such that g(b) = a and F(g(y); y) = 0.
Prove that [Dg(b)] = −A⁻¹B.

2. (Proof 12.1 - Hubbard Theorem 3.2.4)
Suppose that U ⊂ R^n is an open subset, F : U → R^(n−k) is a C¹ mapping, and manifold M can be described as the set of points that satisfy F(z) = 0. Use the implicit function theorem to show that if [DF(c)] is onto for c ∈ M, then the tangent space Tc M is the kernel of [DF(c)]. You may assume that the variables have been numbered so that when you row-reduce [DF(c)], the first n − k columns are pivotal.


3. (Hubbard, Proposition 3.2.7) Let U ⊂ R^k be open, and let γ : U → R^n be a parametrization of manifold M. Show that
T_γ(u) M = img[Dγ(u)].
You may take it as proved that if subspaces V and W both have dimension k and V ⊂ W, then V = W (for the simple reason that k basis vectors for V are k independent vectors in W and therefore also form a basis for W).


4. (Proof 12.2 - Hubbard, theorems 3.6.3 and 3.7.1)
Let U ⊂ R^n be an open subset and let f : U → R be a C¹ (continuously differentiable) function.
First prove, using a familiar theorem from single-variable calculus, that if x0 ∈ U is an extremum, then [Df(x0)] = [0].
Then prove that if M ⊂ R^n is a k-dimensional manifold, and c ∈ M ∩ U is a local extremum of f restricted to M, then Tc M ⊂ ker[Df(c)].


Sample Problems
1. A cometary-exploration robot is fortunate enough to land on an ellipsoidal comet whose surface is described by the equation
x² + y²/4 + z²/9 = 9.
Its landing point is x = 2, y = 4, z = 3.
Prove that the surface of the comet is a smooth manifold.
The controllers of the robot want it to move to a nearby point on the surface where y = 4.02, z = 3.06. Use the implicit function theorem to determine the approximate x coordinate of this point.
(Check: 1.98² + 4.02²/4 + 3.06²/9 = 9.0009.)
Find a basis for the tangent space at the landing point.
Find the equation of the tangent plane at the landing point.
(Check: 4(1.98) + 2(4.02) + (2/3)(3.06) = 18.)


2. The plane x + 2y − 3z + 4 = 0 and the cone x² + y² − z² = 0 intersect in a curve that includes the point c = (3; 4; 5). Near that point this curve is the graph of a function (x; y) = g(z).
Use the implicit function theorem to determine g′(5), then find the approximate coordinates of a point on the curve with z = 5.01.
Check: 2.89 + 2(4.07) − 3(5.01) = −4; 2.89² + 4.07² = 24.917.


3. Assume that, at the top level, there are nine categories x1, x2, ..., x9 in the Federal budget. They must satisfy four constraints:
One simply fixes the total dollar amount.
One comes from your political advisors: it makes the budget look good to likely voters in swing states.
One comes from Congress: it guarantees that everyone can have his or her earmarks.
One comes from the Justice Department: it guarantees compliance with all laws.
These four constraints together define a function F whose derivative is onto for budgets that satisfy the constraints. The acceptable budgets, for which F(x) = 0, form a k-dimensional submanifold M of R^n.
Specify the dimension of the domain and codomain for
(a) A function g that specifies the passive variables in terms of the active variables.
(b) The function F that specifies the constraints.
(c) A parametrization function that generates a valid budget from a set
of parameters.
For each alternative, specify the shape of the matrix that represents the
derivative of the relevant function and explain how, given a valid budget c,
it could be used to find a basis for the tangent space Tc M.


4. (Hubbard, exercise 3.1.17) Consider the situation described by Example 3.1.8 in Hubbard, where four linked rigid rods form a quadrilateral in the plane. The distance from vertex x1 to x2 is l1, the distance from vertex x2 to x3 is l2, the distance from vertex x3 to x4 is l3, and the distance from vertex x4 to x1 is l4.
Show that knowing the positions x1 and x3 of two opposite vertices determines exactly four possible positions of the linkage if the distance from x1 to x3 is less than both l1 + l2 and l3 + l4 but greater than both |l1 − l2| and |l3 − l4|. Draw diagrams to illustrate what can happen if these conditions are not satisfied.


5. Critical points
f(x; y) = (1/2)x² + (1/3)y³ − xy
Calculate the partial derivatives as functions of x and y, and show that the only critical points are (0; 0) and (1; 1).
Calculate the Hessian matrix H and evaluate it numerically at each critical point to get matrices H0 and H1.
Find the eigenvalues of H0 and classify the critical point at (0; 0).
Find the eigenvalues of H1 and classify the critical point at (1; 1).
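A minimal R sketch for the classification step (ad hoc code, in the manner of script 3.4D):
Hess <- function(x, y) matrix(c(1, -1, -1, 2 * y), 2, 2)  # second partials of f
eigen(Hess(0, 0))$values   # one positive, one negative eigenvalue: a saddle
eigen(Hess(1, 1))$values   # both positive: a local minimum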


Group Problems
1. Implicitly defined functions
(a) The nonlinear equation
F(x; y; z) = (x² + y² + z² − 3; x² + z² − 2) = 0
implicitly determines x and y as a function of z. The first equation describes a sphere of radius √3, the second describes a cylinder of radius √2 whose axis is the y-axis. The intersection is a circle in the plane y = 1.
Near the point x = 1, y = 1, z = 1, there is a function that expresses the two passive variables x and y in terms of the active variable z:
g(z) = (√(2 − z²); 1).
Calculate g′(z) and determine the numerical value of g′(1).
Then get the same answer without using the function g by forming the Jacobian matrix [DF], evaluating it at x = y = z = 1, and using the implicit function theorem to determine g′(1) = −A⁻¹[B].
(b) Dean Smith is working on a budget in which he will allocate x to the library, y to pay raises, and z to the Houses. He is constrained.
The Library Committee, happy to see anyone get more funds as long as the library does even better, insists that x² − y² − z² = 1.
The Faculty Council, content to see the Houses do well as long as other areas benefit equally, recommends that x + y − 2z = 1.
To comply with these constraints, the dean tries x = 3, y = 2, z = 2.
Given the constraints, x and y are determined by an implicitly defined function (x; y) = g(z).
Use the implicit function theorem to calculate g′(2), and use it to find approximate values of x and y if z is increased to 2.1.
(c) The nonlinear equation F(x; y; z) = x² − 4z² − 4y² − 1 = 0 implicitly determines x as a function of y and z, but we need to know whether x is positive or negative to choose the right square root in the function.
Find the appropriate function g(y; z) near the point x = 3, y = 1, z = 1, and calculate [Dg(1; 1)].
Then get the same answer by calculating the Jacobian matrix [DF] at x = 3, y = 1, z = 1, splitting off a square matrix A on the left, and computing [Dg] = −A⁻¹B.


2. Manifolds and tangent spaces, investigated with help from R
(a) Manifold M is known by the equation
F(x; y; z) = xz − y² = 0 near the point c = (4; 2; 1).
It can also be described parametrically by
γ(s; t) = (s²; st²; t⁴) near s = 2, t = 1.
i. Use the parametrization to find a basis for the tangent space Tc M.
ii. Use the function F to confirm that your basis vectors are indeed in the tangent space Tc M.
iii. Use the parametrization to do a wireframe plot of the parametrized manifold near s = 2, t = 1. See script 3.4C, topic 2.
(b) Manifold M is known by the equation F(x; y; z) = x²y + xy² − z² + 3 = 0 near the point c = (2; 1; 3).
i. Find a basis for the tangent space Tc M.
ii. Locally, M is the graph of a function x = g(y; z). Determine [Dg(1; 3)] by using the implicit function theorem.
iii. Solve for z in terms of x and y, and use R to do a wireframe plot of the manifold. See script 3.4C, topic 1.

(c) (Hubbard, Example 3.1.14)
F(z1; z2; z3) = (z1 − z3; z3 − z1z2) = 0
Construct [DF]. It has two rows.
Find the point for which [DF] is not onto. Use R to find points on the manifold near this point, and try to figure out what is going on. See the end of script 3.4C for an example of how to find points on a 1-dimensional manifold in R³.


3. Critical points (rigged to make the algebra work, but you should also plot contour lines in R and use them to find the critical points)
Calculate the Jacobian matrix and the Hessian both by using R and with pencil and paper.
(a) i. Find the one and only critical point of f(x; y) = 4x² + (1/2)y² + 8/(x²y) on the square 1/4 ≤ x ≤ 4, 1/4 ≤ y ≤ 4.
ii. Use second derivatives (the Hessian matrix) to determine whether this critical point is a maximum, minimum, or neither.
(b) The domain of the function F(x; y) = y² + (x² − 3x) log y is the upper half-plane y > 0. Find all the critical points of F, and use the Hessian matrix to classify each as maximum, minimum, or saddle point.
(c) The function F(x; y) = x²y − 3xy + (1/2)x² + y² has three critical points, two of which lie on the line x = y. Find each and use the Hessian matrix to classify it as maximum, minimum, or saddle point.


Homework - due on December 2

Although all of these problems except the last one were designed so that they could be done with pencil and paper, it makes sense to do a lot of them in R, and the Week 12 scripts provide good models. For each problem that you choose to do in R, include a "see my script" reference in the paper version. Put all your R solutions into a single script, and upload it to the homework dropbox on the week 12 page.
When you use R, you will probably want to include some graphs that are not required by the statement of the problem.
Do appreciate that problems 3 and 4, which use only androgynous names, are sexual-orientation neutral as well as gender-neutral and avoid the use of third-person singular pronouns.
1. (Hubbard, exercise 3.1.2)
Let X ⊂ R³ be the set of midpoints of segments joining a point of the curve C1 of equation y = x², z = 0 to a point of the curve C2 of equation z = y², x = 0.
(a) Parametrize C1 and C2 .
(b) Parametrize X.
(c) Find an equation for X (i.e. describe X as a locus)
(d) Show that X is a smooth surface.

2. Manifold M is known by the equation F(x; y; z) = x² + y⁴ − 2z² − 2 = 0 near the point c = (3; 1; 2).
(a) Locally, near c, M is the graph of a function x = g(y; z). Determine [Dg(c)] by using the implicit function theorem.
(b) Use [Dg(c)] to find the approximate value of x for a point of M near c for which y = 1.1, z = 1.8.
(c) Check your answers by finding an explicit formula for g and taking its derivative.


3. Pat and Terry are in charge of properties for the world premiere of the student-written opera "Goldfinger" at Dunster House. In the climactic scene the anti-hero takes the large gold brick that he has made by melting down chalices that he stole from the Vatican Museum and places it in a safety deposit box in a Swiss bank while singing the aria "Papal gold, now rest in peace."
The gold brick is supposed to have length x = 8, height y = 2, and width z = 4. With these dimensions in mind, Pat and Terry have spent their entire budget on 112 square inches of gold foil and 64 cubic inches of an alloy that melts at 70 degrees Celsius. They plan to fabricate the brick by melting the alloy in a microwave oven and casting it in a sand mold.
Alas, the student mailboxes that they have borrowed to simulate safety-deposit boxes turn out to be not quite 4 inches wide. Fortunately, the equation
F(x; y; z) = (xyz − 64; xy + xz + yz − 56) = 0
specifies x and y implicitly in terms of z.
(a) Use the implicit function theorem to find [Dg(4)], where g is the function that specifies (x; y) in terms of z, and find the approximate dimensions of a brick with the same volume and surface area as the original but with a width of only 3.9 inches.
(b) Show that if the original dimensions had been x = 2, y = 2, z = 16,
then the constraints of volume 64, surface area 136 specify y and z in
terms of x but fail to specify x and y in terms of z.
(c) Show that if the original brick had been a cube with x = y = z = 4,
then, with the constraints of volume 64, surface area 96, we cannot
show the existence of any implicit function. In fact there is no implicit
function, but our theorem does not prove that fact. This happens
because this cube has minimum surface area for the given volume.


4. This problem is an example of a two-dimensional submanifold of R⁴.


For their term project in the freshman seminar Nuclear Terrorism and the
Third World, Casey and Chris decide to investigate whether plutonium
can be handled safely using only bronze-age technology. They acquire two
bronze spears, each 5 meters long, and design a system where the plutonium
container is connected to the origin by one spear and to the operator by
the other. Everything is in a plane. Now the coordinates x1 and y1 of
the plutonium and the coordinates x2 and y2 of the operator satisfy the
equation
F(x1; y1; x2; y2) = (x1² + y1² − 25; (x1 − x2)² + (y1 − y2)² − 25) = 0.
One solution to this equation is x1 = 3, y1 = 4, x2 = 0, y2 = 8.
(You can build a model with a couple of ball-point pens and some Scotch
tape).
(a) Show that near the given solution, the constraint equation specifies x1
and y1 as a function of x2 and y2 , but not vice-versa.
(b) Calculate the derivative of the implicit function and show that it is not
onto. Determine in what direction the plutonium container will move
if x2 and y2 are both increased by equal small amounts (or changed
in any other way.) This system is not really satisfactory, because the
plutonium container can move only along a circle.
(c) Casey and Chris come up with a new design in which one spear has its end confined to the x-axis (coordinate x2 can be changed, but y2 = 0). The other spear has its end confined to the y-axis (coordinate y3 can be changed, but x3 = 0). For this new setup, one solution is x1 = 3, y1 = 4, x2 = 6, y3 = 0. Show that x1 and y1 are now specified locally by a function ~g(x2; y3). Calculate [Dg] and show that it is onto.
(d) Are x2 and y3, near the same solution, now specified locally by a function ~f(x1; y1)? If so, what is [Df]?
(e) For the new setup, another solution is x1 = 3, y1 = 4, x2 = 6, y3 = 8. Show that in this case, although [DF] is onto, the choice of x1 and y1 as passive variables is not possible, and there is no implicitly defined function ~g(x2; y3) as there was in part (c). Draw a diagram to illustrate the problem.


5. (Physics version) In four-dimensional spacetime, a surface is specified as the intersection of the hypersphere x² + y² + z² = t² − 2 and the hyperplane 3x + 2y + z − 2t = 2.
(Economics version) A resource is consumed at rate t to manufacture goods at rates x, y, and z, and production is constrained by the equation x² + y² + z² = t² − 2. Furthermore, the expense of extracting the resource is met by selling the goods, so that 2t = 3x + 2y + z − 2.
In either case, we have a manifold that is the locus of
F(x; y; z; t) = (x² + y² + z² − t² + 2; 3x + 2y + z − 2t − 2) = 0.
t
(a) Show that this surface is a smooth 2-dimensional manifold.
(b) One point on the manifold is x = 1, y = 2, z = 3, t = 4. Near this
point the manifold is the graph of a function g that expresses x and y
as functions of z and t. Using the implicit function theorem, determine
[Dg] at the point z = 3, t = 4.
6. Consider the manifold specified by the parametrization
g(t) = (x; y) = (t + e^t; t + e^(2t)), −∞ < t < ∞.
Find where it intersects the line 2x + y = 10. You can get an initial estimate by using the graph in script 3.4B, then use Newton's method to improve the estimate.


7. Manifold X, a hyperboloid, can be parametrized as
γ(u; v) = (x; y; z) = (sec u; tan u cos v; tan u sin v).
If you use R, you can do a wireframe plot the same way that the sphere was plotted in script 3.4C, topic 2.
(a) Find the coordinates of the point c on this manifold for which u = π/4, v = π/2.
(b) Find the equation of the tangent plane Tc X as the image of [Dγ(π/4; π/2)].
(c) Find an equation F(x; y; z) = 0 that describes the same manifold near c, and find the equation of the tangent plane Tc X as the kernel of [DF(c)].
(d) Find an equation x = g(y; z) that describes the same manifold near c, and find the equation of the tangent plane Tc X as the graph of [Dg(0; 1)].
8. Hubbard, Exercise 3.6.2. This is the only problem of this genre on the
homework that can be done with pencil and paper, but you must be prepared to do one like it on the final exam!
9. Here is another function that has one maximum, one minimum, and two saddle points, for all of which x and y are less than 3 in magnitude.
f(x; y) = x³ − y³ + 2xy − 5x + 6y.
Locate and classify all four critical points using R, in the manner of script
3.4D. A good first step is to plot contour lines with x and y ranging from
-3 to 3. If you do
contour(x,y,z, nlevels = 20)
you will learn enough to start zooming in on all four critical points.
An alternative, more traditional, approach is to take advantage of the fact
that the function f is a polynomial. If you set both partial derivatives equal
to zero, you can eliminate either x or y from the resulting equations, then
find approximate solutions by plotting a graph of the resulting fourth-degree
polynomial in x or y.