
4.1 Recurrence Relations
I assume that you have learnt the chapter Sequences & Series in Maths T
before you arrive at this chapter.
A recursive definition of a sequence specifies one or more initial terms and a
rule for determining subsequent terms from those that precede them.
A recurrence relation for the sequence $\{a_n\}$ is an equation that
expresses $a_n$ in terms of one or more of the previous terms of the sequence,
namely $a_0, a_1, \ldots, a_{n-1}$, for all integers $n$ with $n \ge n_0$, where $n_0$ is a non-negative
integer.
That was the formal definition of a recurrence relation. When you say that
something is recursive, it means that there is a repetition. So a recurrence
relation is basically just an equation which relates a term to the terms before
it. Let's take the arithmetic sequence 1, 2, 3, 4, 5, and so on to infinity. The term 2 is
obtained from the term 1 by adding 1 to it, and the same relationship (add 1)
holds between every pair of consecutive terms. We shall denote the first term 1
as $a_0$, the initial term. Then the term $a_1$ is related
to the initial term by the equation
$a_1 = a_0 + 1$
Generalizing, we conclude that this arithmetic
sequence can be represented by the recurrence relation
$a_n = a_{n-1} + 1$
where $n \ge 1$. Using this equation and the initial
condition $a_0 = 1$, you can write down the rest of the terms by adding 1 all the
way up (just imagine if I asked you to find the term $a_{109}$!). So now you know that
a recurrence relation is just an equation which contains $a_n$ and at least one earlier
term $a_{n-x}$. Examples of recurrence relations are
$a_n = 6a_{n-2}$
$a_n = 5a_{n+4} - 2a_{n+3} + n$

We say that a recurrence relation is homogeneous when it contains only
terms of the form $a_{n-x}$. For example, $a_n = 6a_{n-2}$ is homogeneous, while $a_n = 5a_{n-1} - 2a_{n-2} + 3$
is not, as 3 is not an $a_{n-x}$ term.
We say that a recurrence relation is linear when the maximum power of the $a_{n-x}$
terms is 1. For example, $a_n = 6a_{n-2}$ is linear, but $a_n = 6(a_{n-2})^2$ is not, as its
maximum power is 2.
The order (or degree) of a recurrence relation tells us how many terms back the
furthest term that $a_n$ is related to lies. For example, $a_n = 6a_{n-1}$ is a first
order recurrence relation, while $a_n = 6a_{n-1} + a_{n-3}$ is a third order recurrence
relation. A recurrence relation of order $k$ needs $k$ initial
conditions to be solved. For example, the equation $a_n = 8a_{n-1} + 9a_{n-2}$ needs 2 initial conditions, $a_0$ and $a_1$, to be defined.

In STPM, you will only be dealing with linear recurrence relations of order 1 or 2,
both homogeneous and non-homogeneous.

Now that you know what a recurrence relation is, I will guide you with some basic
modelling. You need to learn how to use recurrence relations in a given situation,
or question. Let me start with 2 very famous examples, the Fibonacci
Numbers and the Tower of Hanoi.

RABBITS, AND THE FIBONACCI NUMBERS


Leonardo Pisano, also known as Fibonacci, came up with this problem in the
13th century. Suppose a young pair of rabbits (one male and one female) is
placed on an island. A pair of rabbits does not breed until they are 2 months old.
After they are 2 months old, each pair of rabbits produces another pair each
month. He wanted to find a recurrence relation for the number of pairs of rabbits
on the island after n months, assuming that no rabbits ever die.
Let's try counting. In the beginning, there are only 2 rabbits. In the first
month, there are still 2 rabbits on the island, because they are not yet old
enough to breed. In the second month, the pair starts to breed
and produces another 2 rabbits, making 4 rabbits. In the third
month, there will be 6, because the old pair reproduces again, but the young
pair does not. Counting in pairs, the rabbits grow according to the
sequence 1, 1, 2, 3, 5, 8, 13, and so on. Take a look at the bunny diagram
below.

Now, here is the hard part. Let $f_n$ be the number of pairs after $n$ months. There are 2
initial conditions, $f_0$ and $f_1$, which are both 1 ($f_0$ is the starting point, which I will call
month 0, and $f_1$ is for the first month). As we step into month 2, the number
of pairs is the number of pairs in the previous month (month 1), since no rabbits die,
plus the new pairs born this month. Only pairs that are at least 2 months old breed,
and those are exactly the pairs that already existed in month 0, so the number of new
pairs equals $f_0$. The same reasoning works every time we reach a new
month: we add the number of pairs in the previous month to
the number of pairs in the month before the previous month. So in the
end, we come up with the famous Fibonacci Sequence, which is represented
by the recurrence relation
$f_n = f_{n-1} + f_{n-2}$
I bet you got lost somewhere, but this is the best explanation I could come up
with. You can try reading the textbooks, and you might not even understand it at
all. We see that the Fibonacci recurrence is a 2nd order homogeneous linear
recurrence relation. This chapter really needs you to think a lot.
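If the counting argument still feels slippery, a short simulation may help. The Python sketch below (function names are my own, hypothetical) computes $f_n$ from $f_n = f_{n-1} + f_{n-2}$ with $f_0 = f_1 = 1$, and separately simulates the island by tracking the age in months of every pair, assuming exactly the rules above: no rabbits die, and each pair produces one new pair per month once it is at least 2 months old.

```python
def rabbit_pairs(months):
    """Return [f_0, ..., f_months] from f_n = f_(n-1) + f_(n-2), f_0 = f_1 = 1."""
    f = [1, 1]
    for n in range(2, months + 1):
        f.append(f[n - 1] + f[n - 2])
    return f[: months + 1]


def simulate_pairs(months):
    """Count pairs month by month by tracking the age of each pair directly."""
    ages = [0]  # month 0: one newborn pair
    counts = [len(ages)]
    for _ in range(months):
        ages = [age + 1 for age in ages]               # every pair gets a month older
        newborns = sum(1 for age in ages if age >= 2)  # each pair aged 2+ produces one pair
        ages += [0] * newborns
        counts.append(len(ages))
    return counts


print(rabbit_pairs(6))    # [1, 1, 2, 3, 5, 8, 13]
print(simulate_pairs(6))  # [1, 1, 2, 3, 5, 8, 13]
```

The simulation never uses the recurrence, so the two lists agreeing is a small independent check of the reasoning.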
Did you know that Fibonacci numbers also appear in sunflower patterns, pinecones
and spiral seashells? Get to know more about Fibonacci Numbers in Nature.

4.2 Homogeneous Linear Recurrence Relations


Recall that in the previous section you learnt how to model a situation using
recurrence relations. The equations are helpful, but they don't help
much if you are looking for a term far down the sequence. For example, with the relation $a_n = 2a_{n-1}$
and the initial condition $a_0 = 1$, finding the term $a_{109}$ by repeated substitution would be tiring, as it would take
you forever to get there. When we say that we solve a recurrence relation, we
mean that we convert the relation into a formula for $a_n$ in terms
of $n$ only, which makes it much easier to calculate the $n$th
term.
In this section, I'll be showing you how to solve 2nd order homogeneous linear
recurrence relations. The non-homogeneous case builds on this in the next
section.

2 DISTINCT ROOTS
Given the recurrence relation $a_n = 5a_{n-1} - 6a_{n-2}$, with initial conditions $a_0 = 1$, $a_1 = 0$. To start off with, we let $a_n = r^n$. This is a smart guess which we will eventually find to be correct. We can then deduce that $a_{n-1} = r^{n-1}$ and $a_{n-2} = r^{n-2}$. Substituting everything back into the equation, we have
$r^n = 5r^{n-1} - 6r^{n-2}$
Dividing the equation by $r^{n-2}$ (which is the smallest power), we get
$r^2 = 5r - 6$
$r^2 - 5r + 6 = 0$
which is a quadratic equation! This equation is called the characteristic
equation, and its solutions are called the characteristic roots. Solving the equation, we get $r = 2, 3$. Using another smart guess, we deduce that the term $a_n$ can be
represented by the equation
$a_n = c_1 2^n + c_2 3^n$
Notice that $2^n$ and $3^n$ come from the characteristic roots found
earlier on. This is the general solution of the recurrence relation. The
terms $c_1$ and $c_2$ are just 2 constants, which we will find by using the initial
conditions.
Using $a_0 = 1$,
$a_0 = c_1 + c_2 = 1$   (1)

Using $a_1 = 0$,
$a_1 = 2c_1 + 3c_2 = 0 \implies 2c_1 = -3c_2$   (2)
Now you have 2 simultaneous equations. Using the calculator, you can easily find
that $c_1 = 3$, $c_2 = -2$. Substituting the constants back into the equation, you get
$a_n = 3(2^n) - 2(3^n)$

which is what we call the particular solution. This is the final answer that
we are looking for. Now if you substitute $n = 109$, you get $a_{109}$
straight away! As a check, try finding the first 5 or 6
terms using both the recurrence relation $a_n = 5a_{n-1} - 6a_{n-2}$ and the
formula $a_n = 3(2^n) - 2(3^n)$. Do they agree? Congratulations,
you just learnt how to solve homogeneous recurrence relations!
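Here is the suggested check done in Python, as a minimal sketch (the function names are my own). It generates terms of $a_n = 5a_{n-1} - 6a_{n-2}$ with $a_0 = 1$, $a_1 = 0$ by brute force and compares them with $a_n = 3(2^n) - 2(3^n)$. Python integers have no size limit, so $a_{109}$ comes out exactly.

```python
def by_recurrence(count, a0=1, a1=0):
    """First `count` terms of a_n = 5a_(n-1) - 6a_(n-2)."""
    terms = [a0, a1]
    while len(terms) < count:
        terms.append(5 * terms[-1] - 6 * terms[-2])
    return terms[:count]


def closed_form(n):
    """Particular solution a_n = 3(2^n) - 2(3^n)."""
    return 3 * 2**n - 2 * 3**n


print(by_recurrence(6))                    # [1, 0, -6, -30, -114, -390]
print([closed_form(n) for n in range(6)])  # [1, 0, -6, -30, -114, -390]
print(closed_form(109))                    # a_109 in one step, no iteration needed
```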

2 EQUAL ROOTS
However, the above method only works when the characteristic equation has 2 distinct roots. Take another example, $a_n = -4a_{n-1} - 4a_{n-2}$, $a_0 = 0$, $a_1 = 1$. You get the characteristic equation $r^2 + 4r + 4 = 0$, which has the repeated root $r = -2$. If you take the
general solution as $a_n = c_1(-2)^n$, then you are totally wrong. The correct general solution
is $a_n = c_1(-2)^n + c_2 n(-2)^n$. Notice the extra factor $n$ in the second
term. To summarize:
1. If the characteristic roots $r_1$ and $r_2$ are distinct, the general solution is $a_n = c_1 r_1^n + c_2 r_2^n$.
2. If the characteristic roots are equal, $r_1 = r_2 = r$, the general solution is $a_n = c_1 r^n + c_2 n r^n$.
Distinct roots could be either real or complex. The method for both is the same.
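To see the repeated-root rule in action, take the example as $a_n = -4a_{n-1} - 4a_{n-2}$ with $a_0 = 0$, $a_1 = 1$ (the signs that give the characteristic equation $r^2 + 4r + 4 = 0$). The initial conditions give $c_1 = 0$ and $-2c_1 - 2c_2 = 1$, so $c_2 = -\frac{1}{2}$. This Python sketch (names are mine) uses exact fractions to compare the resulting formula with the recurrence.

```python
from fractions import Fraction


def by_recurrence(count, a0=0, a1=1):
    """First `count` terms of a_n = -4a_(n-1) - 4a_(n-2)."""
    terms = [a0, a1]
    while len(terms) < count:
        terms.append(-4 * terms[-1] - 4 * terms[-2])
    return terms[:count]


# From a_0 = c1 = 0 and a_1 = -2*c1 - 2*c2 = 1 we get c2 = -1/2.
C1, C2 = Fraction(0), Fraction(-1, 2)


def closed_form(n):
    """General solution for the repeated root r = -2: c1(-2)^n + c2 n(-2)^n."""
    return C1 * (-2) ** n + C2 * n * (-2) ** n


print(by_recurrence(6))  # [0, 1, -4, 12, -32, 80]
```

Without the extra $n$, the guess $c_1(-2)^n$ cannot work here: $a_0 = 0$ forces $c_1 = 0$, and then $a_1$ would be 0 instead of 1.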

I will not discuss the methods to solve higher order recurrence relations here.
However, the method is actually the same: let $a_n = r^n$, and you will
get a cubic, quartic or higher degree characteristic equation, which you can solve to get
the answer. Simple?
4.3 Non-Homogeneous Linear Recurrence Relations
Consider the following non-homogeneous linear recurrence relation:
$a_n = \underbrace{\{a_{n-1} + a_{n-2}\}}_{(1)} + \underbrace{\{3^n + n3^n + n^2 + n + 3\}}_{(2)}$
Part (1) is the homogeneous part of the recurrence relation, which we now call
the associated linear homogeneous recurrence relation. Part (2), the non-homogeneous
part, is of our interest in this section. Solving this kind of
question is simple: you just need to solve the associated recurrence relation
(just like you did in the previous section), then work on the non-homogeneous
part to find its particular solution. The two parts are solved separately,
and we combine the results at the end.

Example 1 (terms of the form $k^n$):

$a_n = 3a_{n-1} + 2^n$
We first solve the associated linear recurrence relation (a.l.r.r.),
which is
$a_n = 3a_{n-1}$
The characteristic equation gives us $r = 3$, and therefore
$a_n = c_1(3^n)$
Now that the associated part is solved, we proceed to solve the non-homogeneous part. Using a smart guess, we let
$a_n = c_2 2^n$
From here, we deduce that $a_{n-1} = c_2 2^{n-1}$. Putting these 2 equations back into
the original recurrence relation $a_n = 3a_{n-1} + 2^n$, we have
$c_2 2^n = 3c_2 2^{n-1} + 2^n$
$(c_2 - 1)2^n = 3c_2 2^{n-1}$
$2(c_2 - 1) = 3c_2$
And so we have $c_2 = -2$, which gives us $a_n = -2(2^n) = -2^{n+1}$. Combining
the answers for the associated and non-homogeneous parts, we have
our general solution
$a_n = c_1(3^n) - 2^{n+1}$
If we were given the initial condition $a_0 = 2$, then $c_1 - 2 = 2$, so $c_1 = 4$ and our particular solution will be
$a_n = 4(3^n) - 2^{n+1}$
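A quick Python sketch (function names are my own) that checks Example 1 in two steps: the trial solution $-2(2^n)$ on its own satisfies $a_n = 3a_{n-1} + 2^n$, and the full answer $a_n = 4(3^n) - 2^{n+1}$ matches the terms generated from $a_0 = 2$.

```python
def by_recurrence(count, a0=2):
    """First `count` terms of a_n = 3a_(n-1) + 2^n."""
    terms = [a0]
    for n in range(1, count):
        terms.append(3 * terms[-1] + 2**n)
    return terms


def particular(n):
    """Trial solution c2 * 2^n with c2 = -2 (solves the recurrence, ignores a_0)."""
    return -2 * 2**n


def solution(n):
    """a.l.r.r. part 4(3^n) plus the trial part -2^(n+1), fitted to a_0 = 2."""
    return 4 * 3**n - 2 ** (n + 1)


print(by_recurrence(5))                 # [2, 8, 28, 92, 292]
print([solution(n) for n in range(5)])  # [2, 8, 28, 92, 292]
```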
This is the general rule that we follow: for each term of the
form $k^n$, we let the trial solution contain $k^n$ multiplied by a constant. So if the non-homogeneous part is $5^n + 78^n$, then we let the trial solution be $a_n = c_1 5^n +
c_2 78^n$, in which $c_1$ and $c_2$ are constants to be found. For a term of the
form $nk^n$, let $a_n = (c_1 n + c_2)k^n$. However, there is an exception, when $k$
is itself a root $r$ of the characteristic equation. For example,
$a_n = 2a_{n-1} + 2^n$
You get $r = 2$, so the a.l.r.r. gives $a_n = c_1 2^n$, which has the same form
as the non-homogeneous part! In this case, you need to multiply your trial solution for the non-homogeneous part by $n$. This means you let
$a_n = cn2^n$ and $a_{n-1} = c(n - 1)2^{n-1}$
Using the same method, you put them back into the original equation,
$cn2^n = 2c(n - 1)2^{n-1} + 2^n$
and you find $c$ from here (it simplifies to $c2^n = 2^n$, so $c = 1$).
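You can watch the clash happen numerically. In this sketch (helper names are hypothetical), `residual` measures how far a trial solution is from satisfying $a_n = 2a_{n-1} + 2^n$. With the naive guess $c2^n$ the $c$ terms cancel, leaving $-2^n$ whatever $c$ is; with $n2^n$ (that is, $c = 1$) the residual is zero.

```python
def residual(trial, n):
    """trial(n) - 2*trial(n-1) - 2^n: zero for every n exactly when trial solves a_n = 2a_(n-1) + 2^n."""
    return trial(n) - 2 * trial(n - 1) - 2**n


def naive_trial(n, c=7):
    """c * 2^n for an arbitrary c; the c-terms cancel, so no choice of c can work."""
    return c * 2**n


def corrected_trial(n):
    """c * n * 2^n with c = 1, found from c * 2^n = 2^n."""
    return n * 2**n


print([residual(naive_trial, n) for n in range(1, 5)])      # [-2, -4, -8, -16]
print([residual(corrected_trial, n) for n in range(1, 6)])  # [0, 0, 0, 0, 0]
```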
Similarly, if
$a_n = 2a_{n-1} + 3(2^n) + 5n(2^n)$
you let your non-homogeneous part be
$a_n = c_1 n 2^n + c_2 n^2 (2^n)$
and if
$a_n = 4a_{n-1} - 4a_{n-2} + 3(2^n) + 5n(2^n)$
which has a double root $r = 2$, then you will have a non-homogeneous part of
$a_n = c_1 n^2 (2^n) + c_2 n^3 (2^n)$
In short, if $k$ is a root of the characteristic equation of the a.l.r.r. once, multiply every $k^n$ or $nk^n$ trial term by
$n$; if it is a double root, multiply by $n^2$. If you are curious why it is
so, try solving without following this rule. You will find that you can't get
the correct answer.
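For the double-root example $a_n = 4a_{n-1} - 4a_{n-2} + 3(2^n) + 5n(2^n)$, substituting $(c_1 n^2 + c_2 n^3)2^n$ and comparing coefficients gives $6c_2 = 5$ and $2c_1 - 6c_2 = 3$, so $c_1 = 4$, $c_2 = \frac{5}{6}$ (worked out by hand for this sketch, not taken from the text). The Python check below (names are mine) confirms these, and also shows why the un-multiplied guess fails: $(c_1 + c_2 n)2^n$ makes the left-hand side vanish identically.

```python
from fractions import Fraction

# Constants found by hand by comparing coefficients (see lead-in).
C1, C2 = Fraction(4), Fraction(5, 6)


def lhs(trial, n):
    """Left side a_n - 4a_(n-1) + 4a_(n-2) evaluated on a trial solution."""
    return trial(n) - 4 * trial(n - 1) + 4 * trial(n - 2)


def particular(n):
    """Trial solution (c1 n^2 + c2 n^3) 2^n, i.e. multiplied by n^2."""
    return (C1 * n**2 + C2 * n**3) * 2**n


def plain_trial(n):
    """The un-multiplied guess (c1 + c2 n) 2^n with c1 = c2 = 1."""
    return (1 + n) * 2**n


print([lhs(plain_trial, n) for n in range(2, 6)])  # [0, 0, 0, 0]
```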

Example 2 (polynomial terms, such as $n^2 + n + c$)


$a_n = 3a_{n-1} + n^2 + 5n + 3$
The a.l.r.r. is the same, $a_n = c_1(3^n)$. But for the non-homogeneous part, we
let
$a_n = c_2 n^2 + c_3 n + c_4$   (1)
$a_{n-1} = c_2(n - 1)^2 + c_3(n - 1) + c_4$   (2)
I think you might have got the pattern by now. Note that even if the equation were
$a_n = 3a_{n-1} + n^2 + 3$,
$a_n = 3a_{n-1} + n^2 + 5n$ or
$a_n = 3a_{n-1} + n^2$,
we would still need to use the full trial solution $a_n = c_2 n^2 + c_3 n + c_4$. This is because the particular
solution may contain terms that are missing from the non-homogeneous part.
So, just like Example 1, substitute both equations (1) and (2) into the
original recurrence relation, find $c_2$ to $c_4$ by comparing coefficients, then combine with the a.l.r.r. to
find $c_1$ from the given initial condition, say $a_0 = 1$. However, there is also
an exception in this case, which is when one or both of the characteristic roots are $r = 1$. For example,
$a_n = 2a_{n-1} - a_{n-2} + n^2 + 5n + 3$
You obtain a double root $r = 1$ for your a.l.r.r. Since $1^n = 1$, your a.l.r.r. solution will
be of the form
$a_n = c_1 + c_2 n$
which will clash with your trial solution for the non-homogeneous part if you use the
same one as above, $a_n = c_2 n^2 + c_3 n + c_4$. Instead, you should use $a_n =
c_3 n^4 + c_4 n^3 + c_5 n^2$, which is the previous trial solution multiplied by $n^2$. Similarly, if it were a first order
recurrence relation with the root $r = 1$, then you would multiply by $n$, and if it were a third
order recurrence relation with a triple root
$r = 1$, then you would multiply by $n^3$ (notice the similarity with Example 1). Again, you
can try solving without following the rule, and you will not get the
required answer.
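For reference, here are the numbers for both cases of Example 2, worked out by hand for this sketch by comparing coefficients (they are not from the original text): for $a_n = 3a_{n-1} + n^2 + 5n + 3$ with $a_0 = 1$, $c_2 = -\frac{1}{2}$, $c_3 = -4$, $c_4 = -\frac{27}{4}$ and $c_1 = \frac{31}{4}$; for the double-root case, $c_3 = \frac{1}{12}$, $c_4 = \frac{7}{6}$, $c_5 = \frac{53}{12}$. The Python check below uses exact fractions (names are hypothetical).

```python
from fractions import Fraction

# Example 2: a_n = 3a_(n-1) + n^2 + 5n + 3, a_0 = 1.
# Comparing coefficients of n^2, n and 1:
#   c2 = 3c2 + 1,  c3 = -6c2 + 3c3 + 5,  c4 = 3c2 - 3c3 + 3c4 + 3
C2, C3, C4 = Fraction(-1, 2), Fraction(-4), Fraction(-27, 4)
C1 = 1 - C4  # a_0 = c1 + c4 = 1, so c1 = 31/4


def example2(n):
    return C1 * 3**n + C2 * n**2 + C3 * n + C4


def example2_recurrence(count):
    terms = [Fraction(1)]
    for n in range(1, count):
        terms.append(3 * terms[-1] + n**2 + 5 * n + 3)
    return terms


# Exception: a_n = 2a_(n-1) - a_(n-2) + n^2 + 5n + 3 (double root r = 1),
# trial c3 n^4 + c4 n^3 + c5 n^2.
D3, D4, D5 = Fraction(1, 12), Fraction(7, 6), Fraction(53, 12)


def exception_trial(n):
    return D3 * n**4 + D4 * n**3 + D5 * n**2


print([int(t) for t in example2_recurrence(4)])  # [1, 12, 53, 186]
```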

You need to practise this section, as the recurrence relation
questions in STPM mainly come from this section. Later on, when you do 2nd order
differential equations, you will see that the solving methods of both topics are
actually quite similar.
5.1 Inverse Trigonometric Functions
This chapter has fewer words, but more formulas. What you need to do in this
chapter is:
1. memorize the useful graphs, identities and formulas.
2. spend your time trying to derive all the identities.
With these 2 points done, you are sure to score in this chapter. STPM questions
will be about proving identities, sketching graphs, or differentiating and integrating
these functions (which will be covered in the next chapter).

You have learnt about trigonometric functions throughout your secondary
school years. Now, let $\sin y = x$. An inverse trigonometric
function reverses the trigonometric function, and is denoted as $y = \sin^{-1} x$.
Note that $\sin^{-1} x$ is not the same as $(\sin x)^{-1} = \frac{1}{\sin x}$. This is only one of
the 6 inverse trigonometric functions; the rest of them are $\cos^{-1} x$, $\tan^{-1} x$, $\sec^{-1} x$, $\csc^{-1} x$ and $\cot^{-1} x$.
Following are the graphs of the 6 inverse trigonometric functions:

The domain and the range of the functions are as follows:

Now that you know the details of these 6 inverse trigonometric functions, it's time for
formulas and identities. Try to remember as many as you can. In fact, make sure
you know how to derive every single one of them.

Prove the first one by letting $x = \cos y$; the rest follow similarly.


Inverse-Forward Identities

Forward-Inverse Identities

Proving this one is not hard either. Let $x = \cos y$, and make use of the
identity $\cos^2 y + \sin^2 y = 1$. The rest follow similarly, although the $\tan(\cos^{-1} x)$
one will probably be harder. Give it a try.
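The identity table itself is not reproduced here, so as an assumption I take three standard forward-inverse results, $\sin(\cos^{-1} x) = \sqrt{1 - x^2}$, $\cos(\sin^{-1} x) = \sqrt{1 - x^2}$ and $\tan(\cos^{-1} x) = \frac{\sqrt{1 - x^2}}{x}$, and spot-check them numerically with Python's `math` module. This is only a sanity check, not a proof.

```python
import math


def check_forward_inverse(samples=201):
    """Spot-check sin(cos^-1 x), cos(sin^-1 x) and tan(cos^-1 x) on [-1, 1]."""
    for k in range(samples):
        x = -1 + 2 * k / (samples - 1)  # evenly spaced points in [-1, 1]
        root = math.sqrt(1 - x * x)
        y = math.acos(x)                # y in [0, pi], so sin y >= 0
        assert math.isclose(math.sin(y), root, abs_tol=1e-12)
        assert math.isclose(math.cos(math.asin(x)), root, abs_tol=1e-12)
        if abs(x) > 1e-9:               # tan(cos^-1 x) is undefined at x = 0
            assert math.isclose(math.tan(y), root / x, rel_tol=1e-9, abs_tol=1e-12)
    return True


print(check_forward_inverse())
```

The positive square root is the right choice because $\cos^{-1} x$ lies in $[0, \pi]$, where $\sin y \ge 0$.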

Inverse Sum Identities

Prove the first one by letting $x = \cos(\pi/2 - y) = \sin y$. Try figuring out the rest
yourself.
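Assuming the first inverse sum identity is $\sin^{-1} x + \cos^{-1} x = \frac{\pi}{2}$ (which is what the hint $x = \cos(\pi/2 - y) = \sin y$ proves), here is a numerical spot-check in Python. As a hedged extra, it also checks $\tan^{-1} x + \cot^{-1} x = \frac{\pi}{2}$ for $x > 0$, using $\cot^{-1} x = \tan^{-1}\frac{1}{x}$, which holds for positive $x$.

```python
import math


def check_sum_identities(samples=101):
    for k in range(samples):
        x = -1 + 2 * k / (samples - 1)
        # sin^-1 x + cos^-1 x = pi/2 on [-1, 1]
        assert math.isclose(math.asin(x) + math.acos(x), math.pi / 2, abs_tol=1e-12)
    for x in [0.1, 0.5, 1.0, 3.0, 50.0]:
        # for x > 0, cot^-1 x = tan^-1(1/x), so tan^-1 x + cot^-1 x = pi/2
        assert math.isclose(math.atan(x) + math.atan(1 / x), math.pi / 2, abs_tol=1e-12)
    return True


print(check_sum_identities())
```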
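$\sin^{-1}(-x) = -\sin^{-1} x$
$\csc^{-1}(-x) = -\csc^{-1} x$
$\cos^{-1}(-x) = \pi - \cos^{-1} x$
$\sec^{-1}(-x) = \pi - \sec^{-1} x$
$\tan^{-1}(-x) = -\tan^{-1} x$
$\cot^{-1}(-x) = \pi - \cot^{-1} x$ (taking the range of $\cot^{-1}$ to be $(0, \pi)$)
The first one is proven by letting $\sin y = x$, so that $\sin(-y) = -x$. The rest follow similarly.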
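Python's `math` module only provides $\sin^{-1}$, $\cos^{-1}$ and $\tan^{-1}$, so this sketch defines hypothetical helpers `asec`, `acsc` and `acot` under the usual principal ranges (assumed: $\sec^{-1} x = \cos^{-1}\frac{1}{x}$, $\csc^{-1} x = \sin^{-1}\frac{1}{x}$, and $\cot^{-1}$ with range $(0, \pi)$) and checks all six reflection identities numerically.

```python
import math


# Hypothetical helpers; math has no sec^-1, csc^-1 or cot^-1.
def asec(x):
    """sec^-1 x for |x| >= 1, range [0, pi] without pi/2."""
    return math.acos(1 / x)


def acsc(x):
    """csc^-1 x for |x| >= 1, range [-pi/2, pi/2] without 0."""
    return math.asin(1 / x)


def acot(x):
    """cot^-1 x for all real x, range (0, pi)."""
    return math.pi / 2 - math.atan(x)


def check_reflections():
    for x in [0.0, 0.3, 0.8, 1.0]:
        assert math.isclose(math.asin(-x), -math.asin(x), abs_tol=1e-12)
        assert math.isclose(math.acos(-x), math.pi - math.acos(x), abs_tol=1e-12)
    for x in [1.0, 1.5, 4.0]:
        assert math.isclose(acsc(-x), -acsc(x), abs_tol=1e-12)
        assert math.isclose(asec(-x), math.pi - asec(x), abs_tol=1e-12)
    for x in [0.0, 0.7, 5.0]:
        assert math.isclose(math.atan(-x), -math.atan(x), abs_tol=1e-12)
        assert math.isclose(acot(-x), math.pi - acot(x), abs_tol=1e-12)
    return True


print(check_reflections())
```

With a different convention for $\cot^{-1}$ (range $(-\frac{\pi}{2}, \frac{\pi}{2}]$ without 0), the last identity would instead read $\cot^{-1}(-x) = -\cot^{-1} x$, so check which range your syllabus uses.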

I don't think this one will come out in exams. However, the proof requires you to
learn inverse hyperbolic functions (in the next section) first.

I'll leave this proof to you to try.

This one is the hardest to prove. Try proving it using the formula

You probably didn't even know that this formula exists.


5.2 Hyperbolic Functions
The hyperbolic functions, of which there are six, are so named because they
are related to the parametric equations for a hyperbola.
The 2 main hyperbolic functions are sinh x and cosh x (and so now you know
what the hyp button on your calculator is for). The hyperbolic functions are
actually defined in terms of the natural exponential function $e^x$ through the following equations:
$\sinh x = \frac{e^x - e^{-x}}{2}$
$\cosh x = \frac{e^x + e^{-x}}{2}$

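We now relate the hyperbolic functions to the hyperbola. The equation of the
hyperbola is
$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$
We let
$x = a \cosh u$
$y = b \sinh u$
Substituting, we need $\cosh^2 u - \sinh^2 u = 1$, which is true (this can be proven by
substituting the $e^x$ definitions into the equation). Now that we have 2 hyperbolic functions,
we use them to derive the other four, following the same convention
as the trigonometric functions:
$\tanh x = \frac{\sinh x}{\cosh x}$, $\operatorname{sech} x = \frac{1}{\cosh x}$, $\operatorname{csch} x = \frac{1}{\sinh x}$, $\coth x = \frac{\cosh x}{\sinh x}$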
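A small numerical sketch of the two facts above: the $e^x$ definitions agree with Python's built-in `math.sinh` and `math.cosh`, and the point $(a\cosh u, b\sinh u)$ lands on $\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$. The semi-axes $a = 3$, $b = 2$ and the sample values of $u$ are arbitrary choices for illustration.

```python
import math


def sinh_exp(u):
    """sinh u from its exponential definition."""
    return (math.exp(u) - math.exp(-u)) / 2


def cosh_exp(u):
    """cosh u from its exponential definition."""
    return (math.exp(u) + math.exp(-u)) / 2


def check_hyperbola(a=3.0, b=2.0, params=(-2.0, -0.5, 0.0, 1.0, 2.5)):
    """Check the e^x definitions and that (a cosh u, b sinh u) lies on the hyperbola."""
    for u in params:
        assert math.isclose(sinh_exp(u), math.sinh(u), rel_tol=1e-12, abs_tol=1e-15)
        assert math.isclose(cosh_exp(u), math.cosh(u), rel_tol=1e-12)
        assert math.isclose(cosh_exp(u) ** 2 - sinh_exp(u) ** 2, 1.0, rel_tol=1e-12)
        x, y = a * cosh_exp(u), b * sinh_exp(u)
        assert math.isclose(x * x / (a * a) - y * y / (b * b), 1.0, rel_tol=1e-12)
    return True


print(check_hyperbola())
```

Since $\cosh u \ge 1$, this parametrization only traces the right-hand branch of the hyperbola.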

All these 6 hyperbolic functions have their own special pronunciation: sinh is read as
"shine", cosh as "cosh", tanh as "than", sech as "sheck", csch as "co-sheck"
and coth as "cough".

Now we shall see the graphs of the 6 hyperbolic functions. Note that they are all
derived from the exponential function:

[Figure: graphs of cosh x, sinh x, tanh x, sech x, csch x and coth x]

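Their domains and ranges are as follows:
$\sinh x$: domain $\mathbb{R}$, range $\mathbb{R}$
$\cosh x$: domain $\mathbb{R}$, range $[1, \infty)$
$\tanh x$: domain $\mathbb{R}$, range $(-1, 1)$
$\operatorname{sech} x$: domain $\mathbb{R}$, range $(0, 1]$
$\operatorname{csch} x$: domain $x \neq 0$, range $y \neq 0$
$\coth x$: domain $x \neq 0$, range $(-\infty, -1) \cup (1, \infty)$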

Now that you know the basic information about these functions, it's time to
memorize formulas. But before you start, I need to introduce a special rule which
makes the memorizing easier.
Osborne's rule states that to change a standard trigonometric
identity into the equivalent hyperbolic identity, replace each trigonometric function
by the corresponding hyperbolic function, and change the sign of every
term which contains a product of two sines (including hidden ones, such as
$\tan^2 x = \frac{\sin^2 x}{\cos^2 x}$). For example, $\cos^2 x + \sin^2 x = 1$ becomes
$\cosh^2 x - \sinh^2 x = 1$. This means that if you remember all the trigonometric
identities, you can remember the hyperbolic identities. Please note that the
trigonometric formulas which rely on periodicity (for example,
the R formula and the phase shifts) do not apply to hyperbolic functions, as they
are not periodic.
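A minimal Python sketch of Osborne's rule at work: each line below takes a familiar trigonometric identity, applies the rule by hand, and evaluates the resulting hyperbolic identity numerically. The residuals should all be essentially zero. The choice of identities is mine; the last one has no product of two sines, so its sign does not change.

```python
import math


def osborne_residuals(x):
    """Residuals of hyperbolic identities obtained from trig ones by Osborne's rule."""
    s, c, t = math.sinh(x), math.cosh(x), math.tanh(x)
    return [
        c**2 - s**2 - 1,                    # from cos^2 x + sin^2 x = 1
        1 / c**2 - (1 - t**2),              # from sec^2 x = 1 + tan^2 x (tan^2 hides sin^2)
        math.cosh(2 * x) - (1 + 2 * s**2),  # from cos 2x = 1 - 2 sin^2 x
        math.sinh(2 * x) - 2 * s * c,       # from sin 2x = 2 sin x cos x (no sign change)
    ]


for x in [-2.0, -0.5, 0.3, 1.7]:
    print(x, osborne_residuals(x))
```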
For each identity, you should be able to derive it. Proving them is simple: just
substitute the $e^x$ definitions and you are sure to get it.
The formulas and identities are as follows:

Double-Angle Formula

Besides all these formulas, you should also know the relations between
hyperbolic functions and trigonometric functions. Use the following to derive
those for tanh x, sech x, csch x and coth x too. Bear in mind that $i \cdot i = i^2 = -1$.
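The relations themselves are not reproduced above. Assuming they are the standard ones, $\cos(ix) = \cosh x$ and $\sin(ix) = i\sinh x$, this sketch checks them with Python's `cmath` module and also confirms the consequence $\tan(ix) = i\tanh x$, which is how you would derive the tanh relation from the other two.

```python
import cmath
import math


def check_complex_relations(points=(-1.5, -0.2, 0.0, 0.9, 2.0)):
    for x in points:
        # cos(ix) = cosh x and sin(ix) = i sinh x, so tan(ix) = i tanh x
        assert cmath.isclose(cmath.cos(1j * x), math.cosh(x), abs_tol=1e-12)
        assert cmath.isclose(cmath.sin(1j * x), 1j * math.sinh(x), abs_tol=1e-12)
        assert cmath.isclose(cmath.tan(1j * x), 1j * math.tanh(x), abs_tol=1e-12)
    return True


print(check_complex_relations())
```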

5.3 Inverse Hyperbolic Functions


Inverse hyperbolic functions are obtained in the same way as the inverse
trigonometric functions. I don't think I need to explain much, so I'll show you
the graphs straight away:

[Figure: graphs of $\cosh^{-1} x$, $\sinh^{-1} x$, $\tanh^{-1} x$, $\operatorname{sech}^{-1} x$, $\coth^{-1} x$ and $\operatorname{csch}^{-1} x$]

Note that since a function must be single-valued, we only take the non-negative y values
for $\cosh^{-1} x$ and $\operatorname{sech}^{-1} x$. The domains and ranges are as follows:

There are not many formulas and identities in this section. But there is one very
important thing that you are supposed to learn how to prove, which is
the logarithmic form of the inverse hyperbolic functions.

I'll show you the proof for $\sinh^{-1} x$:

Please promise me that you will learn how to prove the rest, this
is super important.
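In outline, setting $x = \sinh y = \frac{e^y - e^{-y}}{2}$ gives the quadratic $e^{2y} - 2xe^y - 1 = 0$ in $e^y$, and taking the positive root gives $\sinh^{-1} x = \ln\left(x + \sqrt{x^2 + 1}\right)$. The same method gives $\cosh^{-1} x = \ln\left(x + \sqrt{x^2 - 1}\right)$ for $x \ge 1$ and $\tanh^{-1} x = \frac{1}{2}\ln\frac{1 + x}{1 - x}$ for $|x| < 1$. Once you have proved them, you can sanity-check the results against Python's built-in `math.asinh`, `math.acosh` and `math.atanh`:

```python
import math


def log_forms_match():
    for x in [-3.0, -0.5, 0.0, 0.7, 4.0]:
        assert math.isclose(math.asinh(x), math.log(x + math.sqrt(x * x + 1)),
                            rel_tol=1e-9, abs_tol=1e-12)
    for x in [1.0, 1.2, 3.0, 10.0]:  # cosh^-1 needs x >= 1 (non-negative branch)
        assert math.isclose(math.acosh(x), math.log(x + math.sqrt(x * x - 1)),
                            rel_tol=1e-9, abs_tol=1e-12)
    for x in [-0.9, -0.3, 0.0, 0.5, 0.99]:  # tanh^-1 needs |x| < 1
        assert math.isclose(math.atanh(x), 0.5 * math.log((1 + x) / (1 - x)),
                            rel_tol=1e-9, abs_tol=1e-12)
    return True


print(log_forms_match())
```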
Here are some identities to remember. Note that they are quite similar to the
inverse trigonometric ones:

For all the above identities, please try to prove every one of them. Refer to the
section on inverse trigonometric functions for some hints on the proofs.

That's all for this chapter. Just remember how to prove the identities, sketch the graphs,
and manipulate these functions. You will need to master this chapter before you
can proceed to the next one.
6.1 Differentiability of a Function

In Maths T, you already learnt how to prove whether a function is continuous.


Now you need to know the relationship between continuity and differentiability.

A differentiable function has to be continuous, but a continuous function is not
necessarily differentiable. In terms of logical propositions, this means
that if $f$ is differentiable at a point, then it is continuous there, but not conversely. Normally,
non-differentiability occurs on graphs at
1. a corner
2. a vertical tangent line
3. a discontinuity
4. an end point

For piecewise-defined functions, it is easy to check differentiability at the joints.
If the sub-functions have different gradients at a joint, then the function is
definitely not differentiable there. However, there is also a
formal definition of differentiability. For a number $a$ in the domain of the
function $f$, we say that $f$ is differentiable at $a$, or that the derivative of $f$ exists
at $a$, if
$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$
or
$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}$
exists.

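You can go on to prove that both formulas are actually the same thing (put $x = a + h$). Of course,
differentiability is not restricted to single points. We can also say that a function
is differentiable on an interval $(a, b)$, or differentiable everywhere, on $(-\infty, +\infty)$. I'll
give you one example:
Prove that $f(x) = |x|$ is not differentiable at $x = 0$.
Using the first formula, $\frac{f(0 + h) - f(0)}{h} = \frac{|h|}{h}$, which equals $1$ for $h > 0$ and $-1$ for $h < 0$. The right-hand and left-hand limits are different, so the limit does not exist.
So, $f(x) = |x|$ is not differentiable at $x = 0$. [proven]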
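You can see the two one-sided limits numerically. This Python sketch (the helper name is my own) evaluates the difference quotient $\frac{f(a + h) - f(a)}{h}$ for $f(x) = |x|$ at $a = 0$ with shrinking $h$ from both sides. The right-hand values stay at 1 and the left-hand values stay at $-1$, so there is no single limit.

```python
def difference_quotient(f, a, h):
    """(f(a + h) - f(a)) / h, the quantity whose limit defines f'(a)."""
    return (f(a + h) - f(a)) / h


for h in [0.1, 0.01, 0.001]:
    right = difference_quotient(abs, 0.0, h)  # h -> 0 from above
    left = difference_quotient(abs, 0.0, -h)  # h -> 0 from below
    print(h, right, left)  # right is always 1.0, left is always -1.0
```

A numerical table like this only suggests the answer; the proof still needs the limit argument.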

These 2 formulas suit different situations, so if one doesn't work, use
the other. Differentiability is not a common question in STPM, but you should still
be able to make use of this important information.
6.2 Derivatives of a Function Defined Implicitly or Parametrically
You have probably learnt how to differentiate functions defined implicitly
and parametrically, but only up to the first derivative. Here, we will be learning how to
continue to the 2nd derivative. It is actually very easy and straightforward, so
there is nothing too important in this section.

IMPLICITLY
I don't think I need to tell you how to do it: differentiating a function implicitly for
the 2nd derivative is just the same as for the 1st. I'll show you an example:
Find the 2nd derivative $\frac{d^2y}{dx^2}$ for the curve $x^2 + y^2 = 2$.

Note the use of the product rule in this question. Just do more exercises, and
you will get used to these kinds of questions.
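The worked solution isn't reproduced here, but in outline: differentiating gives $2x + 2y\frac{dy}{dx} = 0$, so $\frac{dy}{dx} = -\frac{x}{y}$; differentiating again and then using $x^2 + y^2 = 2$ gives $\frac{d^2y}{dx^2} = -\frac{x^2 + y^2}{y^3} = -\frac{2}{y^3}$. This Python sketch checks that answer on the upper half of the circle, $y = \sqrt{2 - x^2}$, against a central-difference estimate of the second derivative (the step size `h` is an arbitrary choice).

```python
import math


def y_upper(x):
    """Upper half of the circle x^2 + y^2 = 2, i.e. y = sqrt(2 - x^2)."""
    return math.sqrt(2 - x * x)


def second_difference(f, x, h=1e-4):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def implicit_result(x):
    """d^2y/dx^2 = -2 / y^3 from implicit differentiation."""
    return -2 / y_upper(x) ** 3


for x in [-1.0, -0.3, 0.0, 0.5, 1.2]:
    print(x, second_difference(y_upper, x), implicit_result(x))
```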

PARAMETRICALLY
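Probably there's something new in this section. Again, I'll show you an example:
Consider the parametric equations $x = t + 1$ and $y = t^3$.
Differentiating each with respect to $t$ gives
$\frac{dx}{dt} = 1$, $\frac{dy}{dt} = 3t^2$, so $\frac{dy}{dx} = \frac{dy/dt}{dx/dt} = 3t^2$
To find $\frac{d^2y}{dx^2}$,
$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(3t^2\right)$
But we cannot differentiate $3t^2$ with respect to $x$ directly. Therefore, using the chain rule,
$\frac{d}{dx}\left(3t^2\right) = \frac{d}{dt}\left(3t^2\right) \cdot \frac{dt}{dx} = 6t \cdot 1 = 6t$
To summarize, the 2nd derivative for parametric
equations $x$ and $y$ is found using the equation:
$\frac{d^2y}{dx^2} = \frac{\frac{d}{dt}\left(\frac{dy}{dx}\right)}{\frac{dx}{dt}}$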
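Expanding $\frac{d}{dt}\left(\frac{dy}{dx}\right)$ with the quotient rule turns the summary formula into $\frac{d^2y}{dx^2} = \frac{x'y'' - y'x''}{(x')^3}$, where primes mean $\frac{d}{dt}$. This Python sketch (names are hypothetical) applies that form to the example and cross-checks it by eliminating $t$: since $y = (x - 1)^3$, a central second difference in $x$ should also give $6t$.

```python
def parametric_d2y_dx2(t, dx, dy, d2x, d2y):
    """d^2y/dx^2 for x = x(t), y = y(t), given the t-derivatives as functions.

    d/dt(dy/dx) = (x' y'' - y' x'') / x'^2; dividing by dx/dt = x' gives the result.
    """
    xp, yp = dx(t), dy(t)
    return (xp * d2y(t) - yp * d2x(t)) / xp**3


def d2y_dx2_example(t):
    # x = t + 1, y = t^3: x' = 1, x'' = 0, y' = 3t^2, y'' = 6t
    return parametric_d2y_dx2(t, lambda s: 1.0, lambda s: 3 * s**2,
                              lambda s: 0.0, lambda s: 6 * s)


def d2y_dx2_direct(x, h=1e-3):
    """Eliminate t (y = (x - 1)^3) and take a central second difference in x."""
    def y_of_x(u):
        return (u - 1) ** 3
    return (y_of_x(x + h) - 2 * y_of_x(x) + y_of_x(x - h)) / h**2


for t in [-2.0, 0.5, 2.5]:
    print(t, d2y_dx2_example(t), d2y_dx2_direct(t + 1), 6 * t)
```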

6.3 Derivatives & Integrals of Trigonometric & Inverse Trigonometric Functions


The derivatives and integrals of trigonometric functions are covered in Maths
T, so in this section I'll only teach you how to differentiate inverse
trigonometric functions. A warning here: you must study the
chapter Integration (especially the part on integration by parts) in Maths T
before you come to this section, otherwise you will get really confused.

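To find the derivative of $\sin^{-1} x$, we make use of our knowledge of
implicit differentiation. Let $y = \sin^{-1} x$, so $x = \sin y$. Differentiating
implicitly with respect to $x$, we have
$1 = \cos y \frac{dy}{dx} \implies \frac{dy}{dx} = \frac{1}{\cos y} = \frac{1}{\sqrt{1 - x^2}}$
(taking the positive root because $\cos y \ge 0$ for $-\frac{\pi}{2} \le y \le \frac{\pi}{2}$). So as a result, we get
$\frac{d}{dx}\left(\sin^{-1} x\right) = \frac{1}{\sqrt{1 - x^2}}$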
From here, you can deduce that the derivatives of the other
inverse trigonometric functions follow the same recipe: differentiate
the functions implicitly, then make use of the trigonometric identities. The list
of derivatives of all the inverse trigonometric functions is as follows:

where $a$ is a constant. You should try to prove each and every one of them as an
exercise.
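The list is not reproduced here, so as an assumption I take the usual forms with the constant $a$: $\frac{d}{dx}\sin^{-1}\frac{x}{a} = \frac{1}{\sqrt{a^2 - x^2}}$, $\frac{d}{dx}\cos^{-1}\frac{x}{a} = -\frac{1}{\sqrt{a^2 - x^2}}$ and $\frac{d}{dx}\tan^{-1}\frac{x}{a} = \frac{a}{a^2 + x^2}$. After proving them, you can spot-check them in Python with a central difference (here $a = 2$ is an arbitrary choice).

```python
import math


def central_diff(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def check_inverse_trig_derivatives(a=2.0):
    for x in [-1.5, -0.4, 0.0, 0.9, 1.7]:  # need |x| < a for sin^-1 and cos^-1
        root = math.sqrt(a * a - x * x)
        assert math.isclose(central_diff(lambda u: math.asin(u / a), x), 1 / root, rel_tol=1e-6)
        assert math.isclose(central_diff(lambda u: math.acos(u / a), x), -1 / root, rel_tol=1e-6)
        assert math.isclose(central_diff(lambda u: math.atan(u / a), x), a / (a * a + x * x), rel_tol=1e-6)
    return True


print(check_inverse_trig_derivatives())
```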

You should further try to differentiate these functions with more complicated arguments
using all the differentiation rules you learnt. For example,

while

Take note that once you differentiate an inverse trigonometric function, it
becomes an algebraic expression (a fraction involving polynomials and square roots). Do not worry about the antiderivatives of
these algebraic functions now, as I will give you a summary table in the
section on Reduction Formulae.
However, I want to discuss the antiderivative of the inverse trigonometric
function itself. For example, I want to find

To do this, you need to make use of integration by parts. If you followed the
formula in the Maths T formula sheet, it would be

However, I suggest that you use this formula, which is easier to remember:
Before I continue, let me explain this formula. Normally, you only use integration
by parts when you are trying to integrate a product of 2 functions, most
likely logarithmic, exponential, polynomial or trigonometric
functions. In any case, you let one function be $u$, and the other function
be $v$. Notice that $v$ has to be a function that is easy to integrate, while $u$ has to
be the other one, which is hard to integrate but easy to differentiate. In words, this
formula can be read as
Integral of $uv$ = [ $u$ × integral of $v$ ] − integral of ( derivative of $u$ × integral of $v$ ),
that is, $\int uv\,dx = u\int v\,dx - \int\left(\frac{du}{dx}\int v\,dx\right)dx$
Never mind if you don't get it, as long as you have your own version of integration by parts. So,
continuing with integrating $\sin^{-1} x$, we let $u = \sin^{-1} x$ and $v = 1$. We have

Get it? So the important tip for this question is to put $v = 1$ (you might recall
that this is the method you use to integrate $\ln x$). The rest of the functions,
after integration, give

Try to derive all of them as an exercise. Note that the term $\ln\left[x + \sqrt{x^2 - 1}\right]$ is
actually the $\cosh^{-1} x$ function.
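Carrying the $v = 1$ trick through gives the standard results $\int \sin^{-1} x\,dx = x\sin^{-1} x + \sqrt{1 - x^2} + C$ and $\int \tan^{-1} x\,dx = x\tan^{-1} x - \frac{1}{2}\ln(1 + x^2) + C$ (the text's own list is not reproduced, so these two are my examples). This Python sketch checks them by comparing a composite Simpson's rule estimate of a definite integral with the difference of the antiderivative at the limits.

```python
import math


def simpson(f, a, b, strips=1000):
    """Composite Simpson's rule; strips is rounded up to an even number."""
    if strips % 2:
        strips += 1
    h = (b - a) / strips
    total = f(a) + f(b)
    for k in range(1, strips):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def antiderivative_asin(x):
    """x sin^-1 x + sqrt(1 - x^2), from integration by parts with v = 1."""
    return x * math.asin(x) + math.sqrt(1 - x * x)


def antiderivative_atan(x):
    """x tan^-1 x - (1/2) ln(1 + x^2), from integration by parts with v = 1."""
    return x * math.atan(x) - 0.5 * math.log(1 + x * x)


print(simpson(math.asin, 0.0, 0.8), antiderivative_asin(0.8) - antiderivative_asin(0.0))
print(simpson(math.atan, 0.0, 3.0), antiderivative_atan(3.0) - antiderivative_atan(0.0))
```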
6.4 Derivatives & Integrals of Hyperbolic & Inverse Hyperbolic Functions
The derivatives and integrals of hyperbolic functions and inverse hyperbolic
functions are very similar to those of trigonometric and inverse
trigonometric functions, just with a difference of a negative sign somewhere
within the formulas. There is no simple rule telling you where the minus sign has
changed, so this section requires a lot of memory work.

HYPERBOLIC FUNCTIONS
The derivatives of hyperbolic functions can be derived easily by converting the
functions into their exponential form. I'll leave it to you as an exercise to derive
all of them. The list of derivatives is as follows:

As you can see, the derivative of sinh x is cosh x, and the derivative of cosh x is sinh x, which is
different from the trigonometric case ($\frac{d}{dx}\cos x = -\sin x$) by a minus sign. The functions whose
derivatives have minus signs are the reciprocal hyperbolic functions, csch x,
sech x and coth x.
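The list is not reproduced above; assuming it contains the standard results ($\frac{d}{dx}\tanh x = \operatorname{sech}^2 x$, $\frac{d}{dx}\operatorname{sech} x = -\operatorname{sech} x \tanh x$, $\frac{d}{dx}\operatorname{csch} x = -\operatorname{csch} x \coth x$, $\frac{d}{dx}\coth x = -\operatorname{csch}^2 x$), this Python sketch checks all six derivatives with a central difference. The helpers `sech`, `csch` and `coth` are my own, since `math` does not provide them.

```python
import math


def central_diff(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def sech(x):
    return 1 / math.cosh(x)


def csch(x):
    return 1 / math.sinh(x)


def coth(x):
    return math.cosh(x) / math.sinh(x)


# name -> (function, claimed derivative)
DERIVATIVES = {
    "sinh": (math.sinh, math.cosh),
    "cosh": (math.cosh, math.sinh),
    "tanh": (math.tanh, lambda x: sech(x) ** 2),
    "sech": (sech, lambda x: -sech(x) * math.tanh(x)),
    "csch": (csch, lambda x: -csch(x) * coth(x)),
    "coth": (coth, lambda x: -csch(x) ** 2),
}


def check_hyperbolic_derivatives(points=(0.3, 0.8, 1.5, -1.1)):
    """Points avoid x = 0, where csch and coth are undefined."""
    for name, (f, df) in DERIVATIVES.items():
        for x in points:
            assert math.isclose(central_diff(f, x), df(x), rel_tol=1e-6, abs_tol=1e-9), name
    return True


print(check_hyperbolic_derivatives())
```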
The integrals, again, are very similar to trigonometric integration.

The integrals for sech x and csch x may look a little weird. You should try to
differentiate the right hand side and see whether you get the expression on the
left. Again, you should do some homework to derive all of them.

INVERSE HYPERBOLIC

Again, the inverse hyperbolic functions have derivatives similar to those of the
inverse trigonometric functions, and it is just a matter of a minus sign, either outside or
inside the square roots. Deriving them is similar: differentiate implicitly and make use
of the hyperbolic identities (do not confuse them with the trigonometric ones;
remember Osborne's rule). Here you go:

The integrals, as usual, are harder to do. You need to use integration by parts, as
in the previous section. Try doing them the way you did in the previous
section. As a matter of fact, the long ln terms in the integrals of $\operatorname{csch}^{-1} x$
and $\operatorname{sech}^{-1} x$ are just logarithmic forms of $\cosh^{-1} x$ and $\sinh^{-1} x$.

This section can only be mastered by doing an adequate number of exercises.
Frankly, the integrals of inverse functions don't need to be memorized, but you
must make sure you can derive them on the spot. You may start to get confused by
so many kinds of derivatives and antiderivatives. But that's not the end yet, as I
haven't combined some results that can be obtained from both these sections;
you will see them in the next section. Beware, the next section is not as
easy.
6.5 Reduction Formulae
WARNING: You need to fully master integration and differentiation before you
continue on this section.

SUMMARY OF PREVIOUS SECTION


Before I start, let me give you some results from combining the
derivatives and integrals of trigonometric, inverse trigonometric,
hyperbolic and inverse hyperbolic functions. This will give you a clearer picture of what
you have learnt in the past 2 sections:
1. The Integrals of the Inverse Polynomials
Here I reorganize the tables of integrals for your reference:

As you can see, there is a pattern that you can easily memorize. The expression is of the
form $a^2 - x^2$, $x^2 - a^2$ or $a^2 + x^2$, with or without the square root. You also see
that they are all quadratic expressions, so you can use the method
of completing the square to solve similar cases. For example,

Also, make sure that the coefficient of $x^2$ is always 1 (factor it out first). Another example,

Notice that if you didn't, you would have got a different answer.
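The text's own examples are not reproduced, so here are two hypothetical ones checked numerically in Python. Completing the square, $x^2 + 2x + 5 = (x + 1)^2 + 2^2$, so $\int \frac{dx}{x^2 + 2x + 5} = \frac{1}{2}\tan^{-1}\frac{x + 1}{2} + C$. For $\int \frac{dx}{4x^2 + 9}$, factoring out the 4 first gives $\frac{1}{4}\int \frac{dx}{x^2 + (3/2)^2} = \frac{1}{6}\tan^{-1}\frac{2x}{3} + C$, whereas wrongly reading it as $a^2 + x^2$ with $a = 3$ would give $\frac{1}{3}\tan^{-1}\frac{x}{3}$.

```python
import math


def simpson(f, a, b, strips=1000):
    """Composite Simpson's rule; strips is rounded up to an even number."""
    if strips % 2:
        strips += 1
    h = (b - a) / strips
    total = f(a) + f(b)
    for k in range(1, strips):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


# Completing the square: (1/2) tan^-1((x + 1)/2) evaluated from 0 to 1.
numeric = simpson(lambda x: 1 / (x * x + 2 * x + 5), 0.0, 1.0)
exact = 0.5 * (math.atan(1.0) - math.atan(0.5))

# Coefficient of x^2 made 1 first: (1/6) tan^-1(2x/3) from 0 to 1.
numeric2 = simpson(lambda x: 1 / (4 * x * x + 9), 0.0, 1.0)
right = math.atan(2 / 3) / 6
wrong = math.atan(1 / 3) / 3  # result of ignoring the coefficient 4

print(numeric, exact)
print(numeric2, right, wrong)
```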

2. Trigonometric & Hyperbolic Substitution


Integrals like

can't be solved in the usual ways. You might have learnt one trigonometric
substitution for this kind of question in Maths T. But now that you have
learnt hyperbolic functions, your vocabulary of substitutions increases to 3.
Whenever you face integrals of this kind, you will:
3. Some extra tips on integration


These are just some short notes that I jotted down while I was studying for this
chapter a few years ago. I thought I would share them with you all:
a.

This kind of integration makes use of the half angle formula. This applies to
hyperbolics as well.
b.

From here, you do integration by parts, with $t^2$ as $u$ and the term in the bracket
as $v$.
c.

Notice that it must be $e^{2x}$. Here you use the substitution $e^x = \sinh u$. Similarly, if
the term in the square root were $e^{2x} - 1$ or $1 - e^{2x}$, you would substitute $e^x = \cosh u$
or $e^x = \sin u$ respectively. Try and see whether it works.
d.

You might want to try proving this before you use it. This will be useful for the
next section.

e.

I actually learnt this at university. You should memorize it; it might
come in useful.
Alright, let's get into the topic:

REDUCTION FORMULAE
A reduction formula relates a definite integral involving $n$ to a similar integral
of the same form with a smaller value of $n$. For example,

which can be represented as

Notice that firstly, it is a definite integral, which means that it has upper and
lower limits. Secondly, it relates the integral to itself with a lower power. These
formulae can be very helpful, especially when you calculate high powers of these
functions. So if you want to find

You can use the reduction formula to get

which is easily solvable.
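The example integral is not reproduced here. Assuming it is the standard one, $I_n = \int_0^{\pi/2} \sin^n x\,dx = \frac{n - 1}{n} I_{n-2}$ with $I_0 = \frac{\pi}{2}$ and $I_1 = 1$ (the $\sin^n x$ term in the proof below suggests so), this Python sketch applies the reduction formula with exact fractions and compares the result with a numerical integral. The function names are my own.

```python
import math
from fractions import Fraction


def reduce_sin_power(n):
    """Apply I_n = (n-1)/n * I_(n-2) down to I_0 or I_1.

    Returns (coefficient, ends_at_zero): I_n = coefficient * (pi/2 if ends_at_zero else 1).
    """
    coeff = Fraction(1)
    while n >= 2:
        coeff *= Fraction(n - 1, n)
        n -= 2
    return coeff, n == 0  # I_0 = pi/2, I_1 = 1


def reduction_value(n):
    coeff, ends_at_zero = reduce_sin_power(n)
    return float(coeff) * (math.pi / 2 if ends_at_zero else 1.0)


def simpson(f, a, b, strips=2000):
    """Composite Simpson's rule; strips is rounded up to an even number."""
    if strips % 2:
        strips += 1
    h = (b - a) / strips
    total = f(a) + f(b)
    for k in range(1, strips):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


print(reduce_sin_power(6))  # (Fraction(5, 16), True), i.e. I_6 = 5*pi/32
print(reduce_sin_power(5))  # (Fraction(8, 15), False), i.e. I_5 = 8/15
```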


Solving is easy, but the harder part is the proof. It can be very, very complicated
and tedious if you are doing this for the first time. It is not easy to see straight away
how to integrate (as in, which is $u$ and which is $v$ if you're using
integration by parts), and sometimes you take hours to solve just a simple
question. I'll show you the proof for the above example so you'll know what I
mean. Using my famous colour-coded integration by parts formula,

we have

moving the $\sin^n x$ integral from the right to the left, we get
Complicated? Unfortunately, most exam questions on Reduction Formulae are
about proving them. Since you need A LOT of exercises (seriously, I bold it
because this is no joke), I'll give you some examples to prove.

Not enough? There's more:

and more

Not hard enough? Try 2 variables then:

Hope you haven't started to freak out yet. I seriously haven't tried proving all these
reduction formulae, so if you have done so, I salute you. I can give you some
tips here though:
1. Break down $\cos^n x = \cos x \cos^{n-1} x$ and $\tan^n x = \tan^2 x \tan^{n-2} x$.
2. Check the expressions on the right. When there is an $n - 1$, you know
that the term with the power of $n$ needs to be differentiated once, and $n - 2$ means it will
be differentiated twice. $m + 1$ means that term will be integrated.
3. For those which involve polynomials and roots, you will find formula
d. above very useful.
6.6 Applications of Integration
You have probably learned how to find the area enclosed between a
function $f(x)$ and the axes, or between 2 functions. You have also learned
the volume of revolution of a function $f(x)$ with the x- or y-axis as the axis of
rotation. In this section, you'll be learning 2 new applications, which are the arc
length and the surface area of revolution.

ARC LENGTH

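Consider 2 points, P and Q, on a curve. P is the point $(x, y)$ and Q is the point $(x + \delta x, y + \delta y)$. Let $s$ be the length of the arc measured from a point on the y-axis, and $\delta s$ the length of the arc PQ. Since $\delta s$ is very small, we can approximate the arc PQ by a straight line. Hence, using Pythagoras' theorem, we have
$$(\delta s)^2 \approx (\delta y)^2 + (\delta x)^2$$
Dividing by $(\delta x)^2$, we obtain
$$\left(\frac{\delta s}{\delta x}\right)^2 \approx \left(\frac{\delta y}{\delta x}\right)^2 + 1$$
As $\delta x \to 0$, this gives
$$\left(\frac{ds}{dx}\right)^2 = 1 + \left(\frac{dy}{dx}\right)^2$$
and after square-rooting both sides and integrating from $x = a$ to $x = b$, we end up with
$$\frac{ds}{dx} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}, \qquad s = \int_a^b \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\, dx$$
The parametric form can be obtained by dividing the equation $(\delta s)^2 \approx (\delta y)^2 + (\delta x)^2$ by $(\delta t)^2$ instead, which gives
$$s = \int_{t_1}^{t_2} \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\, dt$$
The polar form is probably not in your syllabus, so don't worry too much about it. To find the arc length of a particular curve, just differentiate $y$ with respect to $x$, then substitute it into the formula above.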
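If you want to check an arc-length answer numerically, a minimal sketch is to evaluate $\int_a^b \sqrt{1 + (dy/dx)^2}\,dx$ with composite Simpson's rule. This assumes you supply $dy/dx$ yourself as a Python function; the names `simpson` and `arc_length` are my own, not from any library. The test curve $y = \cosh x$ on $[0, 1]$ is convenient because $1 + \sinh^2 x = \cosh^2 x$, so its exact arc length is $\sinh 1$.

```python
import math

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b] (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        # interior points alternate weights 4, 2, 4, 2, ...
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def arc_length(dydx, a, b, n=1000):
    """s = integral from a to b of sqrt(1 + (dy/dx)^2) dx."""
    return simpson(lambda x: math.sqrt(1 + dydx(x) ** 2), a, b, n)

# y = cosh x has dy/dx = sinh x, so the exact length on [0, 1] is sinh 1
s = arc_length(math.sinh, 0.0, 1.0)
print(s, math.sinh(1.0))
```

Simpson's rule is just one reasonable choice of numerical integrator here; for an exam answer you still need the exact integral by hand.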

SURFACE AREA OF REVOLUTION

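Let $A$ be the area of the surface formed by rotating the curve $y = f(x)$, between the lines $x = a$ and $x = b$, about the x-axis. Let the curved surface area of the blue ring shown (the strip generated by rotating the arc PQ) be $\delta A$. Treating the strip as being bounded by 2 cylinders, we have
$$2\pi y\, \delta s \le \delta A \le 2\pi (y + \delta y)\, \delta s$$
Dividing by $\delta s$ and letting $\delta x \to 0$ (so that $\delta s \to 0$ and $\delta y \to 0$), we have
$$\frac{dA}{ds} = 2\pi y$$
which gives us the formula
$$A = \int 2\pi y\, ds = \int_a^b 2\pi y \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\, dx$$
Again, differentiate the function and substitute it into the formula to find the surface area of revolution.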
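The same idea gives a rough numerical check of the surface-area formula. Again this is only a sketch: `surface_area` is a hypothetical helper, and both $y = f(x)$ and $dy/dx$ are supplied by hand. Rotating $y = x$ for $0 \le x \le 1$ about the x-axis gives a cone of radius 1 and slant height $\sqrt{2}$, whose curved surface area is $\pi \cdot 1 \cdot \sqrt{2}$.

```python
import math

def surface_area(f, dfdx, a, b, n=1000):
    """A = integral from a to b of 2*pi*y*sqrt(1 + (dy/dx)^2) dx, by composite Simpson."""
    if n % 2:
        n += 1
    h = (b - a) / n
    g = lambda x: 2 * math.pi * f(x) * math.sqrt(1 + dfdx(x) ** 2)
    # Simpson weights: 1 at the ends, then 4, 2, 4, 2, ... inside
    total = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return total * h / 3

cone = surface_area(lambda x: x, lambda x: 1.0, 0.0, 1.0)
print(cone, math.pi * math.sqrt(2))
```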
7.1 Taylor Polynomial
A power series expresses a function as an infinite sum of powers of $(x - x_0)$. Many functions $f(x)$ can be approximated, when $x$ is close to $x_0$, by a polynomial of the form
$$f(x) \approx a + b(x - x_0) + c(x - x_0)^2 + d(x - x_0)^3 + e(x - x_0)^4 + \cdots + f(x - x_0)^n$$
where $a, \ldots, f$ are constants. If you remember the Binomial Expansion for real powers, the function $(1 + x)^r$ can be represented by the series
$$(1 + x)^r = 1 + rx + \frac{r(r-1)}{2!}x^2 + \frac{r(r-1)(r-2)}{3!}x^3 + \cdots, \qquad |x| < 1$$
Compare the Binomial Series above with the formula for $f(x)$. You see that it is just a special case of it, where $x_0 = 0$ and the constants follow a special pattern.
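The "special pattern" is that each coefficient comes from the previous one by multiplying by $(r - k)/(k + 1)$. A small sketch (the function name `binomial_series` is my own) sums the first few terms and compares them with $(1 + x)^r$; for non-integer $r$ this is only expected to converge when $|x| < 1$.

```python
def binomial_series(r, x, terms):
    """Partial sum 1 + r x + r(r-1)/2! x^2 + ... using `terms` terms."""
    coeff, total = 1.0, 0.0
    for k in range(terms):
        total += coeff * x ** k
        # next coefficient: r(r-1)...(r-k) / (k+1)!
        coeff *= (r - k) / (k + 1)
    return total

for n in (2, 4, 8):
    print(n, binomial_series(0.5, 0.1, n), 1.1 ** 0.5)
```

For a positive integer $r$ the coefficients eventually become 0, which is why the ordinary binomial theorem gives a finite sum.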

Our question is this: since we can represent the bracketed function $(1 + x)^r$ above as an infinite series of powers of $x$, is it possible to represent other functions, like $\sin x$, $\ln x$, $e^x$ or anything else, in the same way? If it is, how do we determine the constants $a$, $b$, $c$ and so on in the formula for $f(x)$ above?

Before we get into Taylor polynomials, let me introduce you to Taylor's Theorem with Remainder. The theorem states that if a function $f(x)$ is $(n+1)$-times differentiable, then
$$f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + R_n(x)$$
Let me explain this a little. The number $a$ is the point we expand about: the expression approximates $f(x)$ well for $x$ close to $a$. For example, when $a = 0$, the expression will be quite accurate for estimating values at $x$ close to 0 (of course, for certain functions the series is accurate for any value of $x$; we'll discuss this in a later section). This means we can vary $a$ to approximate the same function near different points.
The terms $f'(a)$ and $f''(a)$ are the 1st and 2nd derivatives of $f$ evaluated at $a$. Note that in $f^{(n)}(x)$ the $n$ is in brackets, to tell us that it is not the $n$th power of $f$, but the $n$th derivative of $f$. When the terms go on forever, the series is called the Taylor series. All the terms between the equals sign and $R_n(x)$ form the Taylor polynomial, and we sometimes denote this whole chunk by $p_n(x)$. Writing the whole equation in another form, we have
$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k + R_n(x) = p_n(x) + R_n(x)$$
Now, $R_n(x)$ is what we call the remainder term. Since the Taylor series is an infinite series, we can't possibly write down all its terms. So we stop at some point; for example, we might want the series correct up to the 6th order. In this case, $R_6(x)$ is the difference between $f(x)$ and $p_6(x)$, the sum of the terms up to $(x - a)^6$.
The remainder term can also be written as
$$R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}(x - a)^{n+1} \quad \text{for some } c \text{ between } a \text{ and } x$$
I'll give you an illustration to help you understand how this Taylor series thingy works. By the way, we are not required to prove the formula for the Taylor series. For example, take the function
$$f(x) = \frac{x}{1 - x}$$
Using Taylor's Theorem, we find the Taylor series expanded at $x = 0$ (which means $a = 0$) for this function. There is a special name for a Taylor series expanded at $x = 0$: the Maclaurin series. We find $f'(x)$, $f''(x)$ and so on, and substituting them into the formula, we get
$$f(x) = x + x^2 + x^3 + x^4 + \cdots$$
Notice that this function could also be expanded by the binomial expansion, which is faster. Now look at the graph below.

The blue line sketches the exact graph of the function $f(x)$. As I said earlier, a Taylor polynomial is only an estimate: the more terms we keep, the more accurately it estimates $f(x)$. Look and see that the graphs of degree 1 and degree 2 are actually quite far off from $f(x)$ overall, but are quite accurate for values of $x$ near 0. As the degree of the polynomial increases, the graph of the Taylor polynomial gets closer and closer to the actual function $f(x)$ (for this function, only when $-1 < x < 1$).
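Here is a rough numerical version of the graph, assuming $f(x) = x/(1 - x)$ as above (`maclaurin_partial` is a hypothetical helper name). At $x = 0.5$ each extra term halves the error; at $x = 1.5$, outside $|x| < 1$, the partial sums run away instead of settling near $f(1.5) = -3$.

```python
def f(x):
    return x / (1 - x)

def maclaurin_partial(x, degree):
    """p_n(x) = x + x^2 + ... + x^degree, the Maclaurin polynomial of x/(1 - x)."""
    return sum(x ** k for k in range(1, degree + 1))

for degree in (1, 2, 5, 10, 20):
    err = abs(f(0.5) - maclaurin_partial(0.5, degree))
    print(degree, maclaurin_partial(0.5, degree), err)

print(f(1.5), maclaurin_partial(1.5, 20))   # far apart: outside |x| < 1
```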

So now, we want to learn how to find the series for some functions that we know. Let's try $e^x$. Since a function has a different Taylor series for every choice of $a$, we shall focus on deriving the Maclaurin series of functions. Recalling the formula,
$$f(x) = f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \cdots$$
We find that $e^x$ stays the same no matter how many times we differentiate it, and $e^0 = 1$. So plugging in what we have, we get the Maclaurin series
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{k=0}^{\infty} \frac{x^k}{k!}$$
Try finding the Maclaurin expansions of other functions, such as $\ln(1 - x)$, $\sinh x$ and any other functions you can think of. Note that not every function has such a beautiful series. Some might end up with coefficients that follow no simple pattern.

Below is a list of common Maclaurin expansions:

I want you to note a few things:
1. There is no Maclaurin expansion for $\ln x$, because $\ln 0$ is not defined.
2. Notice the similarities between the Maclaurin expansions of the trigonometric and hyperbolic functions. With them, you are able to prove the identities that relate the 2 kinds of functions.
3. Some expansions contain only odd powers or only even powers (for odd and even functions respectively). In other cases, some powers might be missing as well, so it is normal for a series not to contain every power of $x$.
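To see points 2 and 3 concretely, here is a sketch comparing partial sums of $\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$ and $\sinh x = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \cdots$. They share the same odd powers and factorials and differ only in the signs. The function names are illustrative, and 10 terms is an arbitrary cut-off.

```python
import math

def sin_series(x, terms=10):
    # only odd powers, alternating signs
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1) for k in range(terms))

def sinh_series(x, terms=10):
    # the same odd powers and factorials, but every sign positive
    return sum(x ** (2 * k + 1) / math.factorial(2 * k + 1) for k in range(terms))

x = 0.7
print(sin_series(x), math.sin(x))
print(sinh_series(x), math.sinh(x))
print(sin_series(-x), -sin_series(x))   # odd function: only odd powers appear
```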

REMAINDER ESTIMATION THEOREM


If a function $f(x)$ can be differentiated $n + 1$ times on an interval $I$ containing $a$, and if $M$ is an upper bound for $|f^{(n+1)}(x)|$ on $I$, i.e. $|f^{(n+1)}(x)| \le M$ for all $x$ in $I$, then
$$|R_n(x)| \le \frac{M\,|x - a|^{n+1}}{(n+1)!} \quad \text{for all } x \text{ in } I$$
Ignore the alien language first. Continuing from the previous part, the remainder of the series is actually quite significant. When you use a Taylor series to estimate something, you are interested in knowing the error of your estimate, that is, the difference between your estimate and the actual value. If you remember from the previous section, the remainder is given by the formula
$$R_n(x) = \frac{f^{(n+1)}(c)}{(n+1)!}(x - a)^{n+1} \quad \text{for some } c \text{ between } a \text{ and } x$$
This formula gives the exact error when $f(x)$ is approximated by the $n$th Taylor polynomial. The problem is that it is too difficult to evaluate it this way (we don't know $c$), so we are going to find an overestimate of the remainder instead. We look at the magnitude of the $(n + 1)$th derivative $f^{(n+1)}(t)$ as $t$ varies between $a$ and $x$, and overestimate it by a single number $M$ (known as an upper bound, as stated above). So here, we are saying that the remainder is definitely smaller than or equal to the bound, which is the formula above:
$$|R_n(x)| \le \frac{M\,|x - a|^{n+1}}{(n+1)!}$$
This information is important, as we will use it to
1. Estimate the error between the function and its series
2. Approximate a function to n decimal places
I understand that this might be hard to catch, so I will give you 2 examples here.
EXAMPLE 1 (ESTIMATE ERROR)
Find the Taylor series of the function ln x expanded at x = 1, to get a
cubic approximation, and estimate the error for ln 2.

Have I taught you how to find a Taylor series for a function?


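We first write the function in the form we are looking for. In this case, since it is expanded at $x = 1$, the terms are powers of $(x - 1)$. It would be in terms of $x$ or $(x + 5)$ if it were expanded at $x = 0$ or $x = -5$ respectively. So
$$\ln x \approx a + b(x - 1) + c(x - 1)^2 + d(x - 1)^3$$
Now, we need to find the constants $a$, $b$, $c$ and $d$. You can find all of them by substituting $x = 1$, and by differentiating both sides repeatedly. Which means,
$$\frac{1}{x} \approx b + 2c(x - 1) + 3d(x - 1)^2, \qquad -\frac{1}{x^2} \approx 2c + 6d(x - 1), \qquad \frac{2}{x^3} \approx 6d$$
and substituting $x = 1$ into each line gives you
$$a = \ln 1 = 0, \quad b = 1, \quad c = -\tfrac{1}{2}, \quad d = \tfrac{1}{3}$$
and then
$$\ln x \approx (x - 1) - \frac{(x - 1)^2}{2} + \frac{(x - 1)^3}{3}, \qquad \ln 2 \approx 1 - \frac{1}{2} + \frac{1}{3} = \frac{5}{6}$$
To go on, we need to use the formula above, with $n = 3$ and $a = 1$. To find $M$, we first need $f^{(n+1)}(x) = f^{(4)}(x)$, which is $-6x^{-4}$. Remember the part above which says $|f^{(n+1)}(x)| \le M$: the maximum value of $|-6x^{-4}|$ is 6 for $1 \le x \le 2$ (an interval $I$ containing $a$), so we have
$$|R_3(2)| \le \frac{6\,|2 - 1|^4}{4!} = \frac{1}{4}$$
Thus, $\ln 2 \approx \frac{5}{6}$ to within $\frac{1}{4}$.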
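A quick sanity check of Example 1, as a sketch: evaluate the cubic Taylor polynomial of $\ln x$ about 1 exactly with fractions, then compare it with `math.log(2)` and with the bound $M|x - a|^4/4!$ using $M = 6$. The helper name `cubic_ln` is my own.

```python
import math
from fractions import Fraction

def cubic_ln(x):
    """Cubic Taylor polynomial of ln x about a = 1."""
    t = x - 1
    return t - t ** 2 / 2 + t ** 3 / 3

approx = cubic_ln(Fraction(2))                              # exactly 5/6
actual = math.log(2)
bound = Fraction(6, math.factorial(4)) * (2 - 1) ** 4       # M |x - a|^4 / 4!
print(approx, float(approx), actual, abs(float(approx) - actual), bound)
```

The actual error is about 0.14, comfortably inside the guaranteed bound of 1/4; the bound is an overestimate, as explained above.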


EXAMPLE 2 (Approximating decimal places)
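Use an $n$th Maclaurin polynomial for $e^x$ to approximate $e$ to 5 decimal places of accuracy. Find $n$.
(Note that if $f^{(n+1)}(x)$ is $\pm\sin x$ or $\pm\cos x$, as happens for $\sin x$ and $\cos x$, you can simply take $M = 1$. Useful information.) Here, $f^{(n+1)}(t) = e^t$, and for $0 \le t \le 1$ we have $e^t \le e < 3$, so we can take $M = 3$. Now, the difference compared with the previous example is that we don't know $n$, so we can't substitute a value for $n$ (in fact, $n$ is what we are looking for). But we do have another piece of information: 5 decimal places. For 5 decimal places of accuracy, the error must be less than half a unit in the 5th decimal place, so we know that the remainder must be smaller than 0.000005. So we have
$$|R_n(1)| \le \frac{3 \cdot 1^{n+1}}{(n+1)!} < 0.000005$$
By trial and error, we find that the inequality holds when $n = 9$ (since $10! = 3628800$), but not when $n = 8$. Therefore,
$$e \approx 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{9!} \approx 2.71828$$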
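The trial-and-error search for $n$ can be automated. This sketch assumes the bound $M = 3$ (because $e^t \le e < 3$ on $0 \le t \le 1$) and a target error of 0.000005; `smallest_n` is a hypothetical name.

```python
import math

def smallest_n(M=3, tol=5e-6):
    """Smallest n with M / (n+1)! < tol: the remainder bound for e^x at x = 1."""
    n = 0
    while M / math.factorial(n + 1) >= tol:
        n += 1
    return n

n = smallest_n()
estimate = sum(1 / math.factorial(k) for k in range(n + 1))
print(n, round(estimate, 5), math.e)
```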

8.1 First Order Linear Differential Equations

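In Maths T, you learnt how to solve 2 types of differential equations, namely the separable-variables and the homogeneous differential equations. In FMT, you will learn how to solve linear differential equations.
A first order differential equation is linear if it is of the form
$$\frac{dy}{dx} + ay = b$$
where $a$ and $b$ are functions of $x$ (either may be a constant). It can be solved by introducing an Integrating Factor, $e^{\int a\,dx}$. This term is multiplied to both sides of the equation; since $\frac{d}{dx}\left(y e^{\int a\,dx}\right) = e^{\int a\,dx}\left(\frac{dy}{dx} + ay\right)$, we get
$$\frac{d}{dx}\left(y e^{\int a\,dx}\right) = b\, e^{\int a\,dx}$$
and integrating both sides, we get
$$y e^{\int a\,dx} = \int b\, e^{\int a\,dx}\, dx + C$$
which is an expression of $y$ in terms of $x$. This method is very simple; let me give you an example: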
Find the general solution of the differential equation

We start by expressing it in the standard form

which is

Now that we know $a$, we can find the integrating factor,

Note that the integral in the integrating factor doesn't need a constant, because the constant would cancel out later anyway. So, multiplying both sides by it,

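As a check on the integrating-factor method, here is a sketch using a hypothetical equation of my own (not the one from the example): $\frac{dy}{dx} + y = x$ with $y(0) = 1$. Here $a = 1$, the integrating factor is $e^x$, and $\frac{d}{dx}(ye^x) = xe^x$ gives $y = x - 1 + Ce^{-x}$; the condition $y(0) = 1$ gives $C = 2$. The code compares that formula with a classical fourth-order Runge-Kutta integration, which is only my choice of numerical checker.

```python
import math

def exact(x):
    # from the integrating factor e^x:  y = x - 1 + 2 e^(-x)
    return x - 1 + 2 * math.exp(-x)

def rk4(f, x0, y0, x1, steps=1000):
    """Classical Runge-Kutta integration of dy/dx = f(x, y) from x0 to x1."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

numeric = rk4(lambda x, y: x - y, 0.0, 1.0, 1.0)   # dy/dx = x - y
print(numeric, exact(1.0))
```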
8.2 Second Order Linear Differential Equations


In this section, we will be learning how to solve second order linear
differential equations, both homogeneous and non-homogeneous.

HOMOGENEOUS CASE
A second order homogeneous linear differential equation has the form
$$a\frac{d^2y}{dx^2} + b\frac{dy}{dx} + cy = 0$$
where $a$, $b$ and $c$ are constants. We first make a smart guess (ansatz) that the solution has the form $y = Ae^{nx}$, where $A$ is a constant and $n$ is a constant to be determined (not necessarily an integer, or even real). Differentiating it yields
$$\frac{dy}{dx} = nAe^{nx}, \qquad \frac{d^2y}{dx^2} = n^2Ae^{nx}$$
and once we substitute these into the differential equation and divide out $Ae^{nx}$ (which is never zero when $A \ne 0$), we get a quadratic equation of the form
$$an^2 + bn + c = 0$$
which we call the auxiliary equation. From here we can see that $y = Ae^{nx}$ is indeed a solution of the 2nd order differential equation, provided that $n$ satisfies this equation. Once we find the values of $n$, we can write down the general solution of the differential equation.
The auxiliary equation has 3 possible outcomes: 2 distinct real roots, 2 equal roots or 2 complex roots.
Case 1: 2 Distinct Roots
In this case, suppose the auxiliary equation gives you 2 roots $n_1$ and $n_2$. Your answer for $y$ will be in the form of
$$y = Ae^{n_1 x} + Be^{n_2 x}$$
Remember that your initial guessed solution for the differential equation was $y = Ae^{nx}$? Notice that if $y = Ae^{nx}$ and $y = Be^{mx}$ are both solutions of the differential equation, then their sum, $y = Ae^{nx} + Be^{mx}$, is also a solution, because the equation is linear. That is why our solution for $y$ is the sum of both solutions. You may want to prove it. Given the differential equation
$$\frac{d^2y}{dx^2} + 3\frac{dy}{dx} + 2y = 0$$
you find that the auxiliary equation $n^2 + 3n + 2 = 0$ has the roots $n = -1, -2$. Try substituting $y = Ae^{-x}$, $y = Be^{-2x}$ and $y = Ae^{-x} + Be^{-2x}$ into the equation. All of them work, don't they?
Case 2: 2 Equal Roots
Suppose your auxiliary equation gives you only one (repeated) value of $n$. Your answer will be in the form of
$$y = (A + Bx)e^{nx}$$
When there is a repeated root, you multiply one of the terms by $x$. Try recalling the connection between this chapter and what you learnt in the chapter Recurrence Relations.
Case 3: Complex Roots
Suppose you get 2 complex roots, $m + in$ and $m - in$. Your answer will then be in the form of
$$y = Ae^{(m+in)x} + Be^{(m-in)x} = e^{mx}(C\cos nx + D\sin nx)$$
Notice the right-hand side of the equation. Remember the fact that $e^{(m \pm in)x} = e^{mx}(\cos nx \pm i\sin nx)$, and you get $y = e^{mx}[(A + B)\cos nx + i(A - B)\sin nx]$, in which you represent the terms $(A + B)$ and $i(A - B)$ as $C$ and $D$ respectively. For $y$ to be real, $C$ and $D$ must be real constants, so, perhaps surprisingly, $A$ and $B$ must have been complex (conjugates of each other) all along.
As I said, these are the forms of general solutions that you can get. To get a particular solution, you need initial conditions, such as $y = 1$ when $x = 0$; since there are 2 arbitrary constants, you need 2 conditions (for example, the values of $y$ and $\frac{dy}{dx}$ at $x = 0$). The particular solution replaces the arbitrary constants $A$, $B$ (or $C$, $D$) with actual numbers.
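The three cases can be summarised in a short sketch that solves the auxiliary equation $an^2 + bn + c = 0$ and returns the matching form of the general solution as a string. `general_solution` is an illustrative name, and it assumes real coefficients with $a \ne 0$.

```python
import cmath
import math

def general_solution(a, b, c):
    """Form of the general solution of a y'' + b y' + c y = 0 (a, b, c real, a != 0)."""
    disc = b * b - 4 * a * c
    if disc > 0:                                  # Case 1: two distinct real roots
        n1 = (-b + math.sqrt(disc)) / (2 * a)
        n2 = (-b - math.sqrt(disc)) / (2 * a)
        return f"y = A e^({n1:g}x) + B e^({n2:g}x)"
    if disc == 0:                                 # Case 2: repeated root (exact inputs assumed)
        n = -b / (2 * a)
        return f"y = (A + Bx) e^({n:g}x)"
    root = (-b + cmath.sqrt(disc)) / (2 * a)      # Case 3: complex roots m +/- i n
    m, n = root.real, abs(root.imag)
    return f"y = e^({m:g}x) (C cos {n:g}x + D sin {n:g}x)"

print(general_solution(1, 3, 2))    # roots -1 and -2
print(general_solution(1, -4, 4))   # repeated root 2
print(general_solution(1, 0, 16))   # roots +/- 4i
```

The `disc == 0` test is only reliable for exact (e.g. integer) coefficients; with rounded decimals you would compare against a small tolerance instead.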

NON-HOMOGENEOUS CASE
A second order non-homogeneous linear differential equation has the form
$$a\frac{d^2y}{dx^2} + b\frac{dy}{dx} + cy = f(x)$$
Again, $a$, $b$ and $c$ are constants, and $f(x)$ is a function of $x$, which is either a polynomial, a constant, an exponential function, a cosine or sine function, or a combination of any 2. Functions like $\tan x$ or $\ln x$ are out of your syllabus; solving differential equations with these requires the Method of Variation of Parameters. Try googling it if you want to know more.
The solving method is easy. First, you split the differential equation into 2 parts. You set the right-hand side to 0,
$$a\frac{d^2y}{dx^2} + b\frac{dy}{dx} + cy = 0$$
and this is solved just as above, by finding the auxiliary equation and writing the answer in the form $y = g(x) = Ae^{nx} + Be^{mx}$ (or one of the other 2 forms). This solution is called the complementary function (CF). The other part, $f(x)$, gives a particular solution $y = h(x)$ of the full equation, which is called the particular integral (PI). Since $g(x)$ gives 0 on the left-hand side and $h(x)$ gives $f(x)$, their sum gives $0 + f(x) = f(x)$, so our final answer will be
$$y = g(x) + h(x)$$
Since you already know what to do with the CF, we will introduce methods to find the PI below, which depend on the form of $f(x)$.
Case 1: h(x) is a Polynomial Function
You should just take the PI to be a polynomial function. For example,
$$\frac{d^2y}{dx^2} + 3\frac{dy}{dx} + 2y = x^2 + 4x - 3$$
You already know the CF from above, which is $y = Ae^{-x} + Be^{-2x}$. Then to find the PI, you let $y = Ax^2 + Bx + C$, matching the degree of the polynomial. Differentiating, you get
$$\frac{dy}{dx} = 2Ax + B, \qquad \frac{d^2y}{dx^2} = 2A$$
Substituting it back, we get $2A + 3(2Ax + B) + 2(Ax^2 + Bx + C) = x^2 + 4x - 3$. Comparing the coefficients of $x^2$, $x$ and the constants gives $2A = 1$, $6A + 2B = 4$ and $2A + 3B + 2C = -3$, so $A = \frac{1}{2}$, $B = \frac{1}{2}$ and $C = -\frac{11}{4}$. So in the end, our PI is
$$y = \frac{1}{2}x^2 + \frac{1}{2}x - \frac{11}{4}$$
and the general solution, being the sum of the CF and the PI, will be
$$y = Ae^{-x} + Be^{-2x} + \frac{1}{2}x^2 + \frac{1}{2}x - \frac{11}{4}$$
Try not to get confused between the constants of the CF and the PI; here, I have 2 As and 2 Bs. I suggest that you name the constants of the PI C, D and E instead. The same rule applies to a polynomial of any degree n: try a general polynomial of degree n.
However, there is an exception: when your auxiliary equation has a root $n = 0$. Since $Ae^0 = A$, you already have a constant term in the CF. So for your PI, you need to multiply your trial polynomial by an extra $x$. So if your $f(x)$ is $4x + 3$, your PI should be $Bx^2 + Cx$ instead of $Bx + C$. Similarly, you can guess that if the CF has a double root $n = 0$, you will then multiply your PI by $x^2$. Try relating this information to the chapter on Recurrence Relations.
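The coefficient-matching step can be checked with exact fractions. This sketch (the name `polynomial_pi` is hypothetical) handles $y'' + py' + qy = $ polynomial, assuming $q \ne 0$, i.e. $n = 0$ is not a root of the auxiliary equation. It compares the coefficients of $x^k$ from the highest power down: $qc_k + p(k+1)c_{k+1} + (k+2)(k+1)c_{k+2} = r_k$.

```python
from fractions import Fraction

def polynomial_pi(p, q, rhs):
    """
    Particular integral of y'' + p y' + q y = rhs, where rhs = [r0, r1, ..., rd]
    stands for r0 + r1 x + ... + rd x^d.  Assumes q != 0.
    Returns [c0, c1, ..., cd] for the PI y = c0 + c1 x + ... + cd x^d.
    """
    d = len(rhs) - 1
    c = [Fraction(0)] * (d + 3)          # two spare zeros for c_{d+1}, c_{d+2}
    for k in range(d, -1, -1):
        # coefficient of x^k:  q c_k + p (k+1) c_{k+1} + (k+2)(k+1) c_{k+2} = r_k
        c[k] = (Fraction(rhs[k]) - p * (k + 1) * c[k + 1]
                - (k + 2) * (k + 1) * c[k + 2]) / Fraction(q)
    return c[: d + 1]

# y'' + 3y' + 2y = x^2 + 4x - 3  ->  [constant, x coefficient, x^2 coefficient]
print(polynomial_pi(3, 2, [-3, 4, 1]))
```

For $x^2 + 4x - 3$ this gives the PI $\frac{1}{2}x^2 + \frac{1}{2}x - \frac{11}{4}$.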
Case 2: h(x) is an Exponential Function
This is easy. If $f(x) = 5e^{2x}$, our PI will just be $y = Ce^{2x}$. Just differentiate $y$ to get $\frac{dy}{dx}$ and $\frac{d^2y}{dx^2}$, substitute them into the equation, and find $C$. Again, in this case there are exceptions. If your CF already has a term $Ae^{2x}$, then, like the above, you multiply the PI by $x$ to give you $y = Cxe^{2x}$. If your CF is $y = Ae^{2x} + Bxe^{2x}$, then your PI will be $y = Cx^2e^{2x}$, multiplying by $x^2$ this time. Not hard, I think. If you are given

Your CF is the same, $y = Ae^{-x} + Be^{-2x}$. Your PI will be $y = Ce^{x} + Dxe^{-2x}$, and you should solve the rest of the equation yourself.
Case 3: h(x) is a Cosine or Sine Function
If $f(x) = 5\sin 2x$, or $f(x) = 4\cos 2x$, or $f(x) = 6\sin 2x + 7\cos 2x$, your PI will be the same: $y = C\cos 2x + D\sin 2x$. Notice that whether you have only sines or only cosines, you still have to include both cosines and sines in your PI. The reason is simple: differentiating turns sines into cosines and vice versa, so if you only include one of them, you generally cannot match the coefficients and the constants cannot be found. Again, there is an exception: when your auxiliary equation has purely imaginary roots $\pm ki$ with the same $k$ as in $f(x)$, your CF already contains $\cos kx$ and $\sin kx$. As usual, just multiply your PI by $x$. For example,

You get an auxiliary equation with roots $n = \pm 4i$, and a CF of $y = A\cos 4x + B\sin 4x$. So your PI should be in the form $y = Cx\cos 4x + Dx\sin 4x$. Differentiate it (it might be complicated), substitute it, find the constants $C$ and $D$, and give the general solution by adding the PI and CF. Should be straightforward.
Combinations of functions, like $f(x) = x\cos 3x$, $f(x) = xe^{4x}$ or $f(x) = e^{4x}\sin 3x$, shouldn't be hard for you to solve. The basic rule is: if your CF already has a term of the same form as $f(x)$, just multiply that term of the PI by $x$. If it still doesn't work, multiply by $x^2$ instead.
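To see why the extra $x$ is needed, here is a sketch with a hypothetical right-hand side of my own: $y'' + 16y = \cos 4x$. Substituting $y = Cx\cos 4x + Dx\sin 4x$ gives $y'' + 16y = -8C\sin 4x + 8D\cos 4x$, so $C = 0$ and $D = \frac{1}{8}$. The code checks the residual with a central-difference second derivative, and also shows that anything of the form $C\cos 4x + D\sin 4x$ (the CF's form) gives $y'' + 16y = 0$, so it can never produce $\cos 4x$.

```python
import math

def residual(y, x, h=1e-4):
    """Approximate y''(x) + 16 y(x), using a central difference for y''."""
    second = (y(x + h) - 2 * y(x) + y(x - h)) / h ** 2
    return second + 16 * y(x)

def pi_with_x(x):
    # trial PI multiplied by x, with C = 0 and D = 1/8:  y = (x/8) sin 4x
    return x / 8 * math.sin(4 * x)

def cf_shaped(x):
    # any C cos 4x + D sin 4x (here C = 0.3, D = 0.7) has the same form as the CF
    return 0.3 * math.cos(4 * x) + 0.7 * math.sin(4 * x)

for x in (0.2, 0.9, 1.7):
    print(x, residual(pi_with_x, x), math.cos(4 * x), residual(cf_shaped, x))
```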

SUBSTITUTION
If you recall what you learned in Maths T, you have already learned how to use the substitutions $v = ax + by$ and $y = vx$ to transform a complicated-looking differential equation into one that is solvable. You can apply those skills to 2nd order differential equations too. Other kinds of substitution include $x = u^{0.5}$ and $u = xy$, but I want your attention on solving differential equations of the form

You need to use the substitution

From here, find $\frac{dy}{dx}$ and $\frac{d^2y}{dx^2}$ using the chain rule.

Which, in the end, gives you a differential equation of the form

which is solvable.

PROBLEM MODELLING
Seriously, I have looked through many books, but none of them really teaches modelling with 2nd order differential equations. You should be familiar with modelling using 1st order differential equations, though. So here, I have no choice but to introduce you to some university-level stuff.
1. LRC Circuits
The potential differences across an inductor, a resistor and a capacitor are given by
$$V_L = L\frac{dI}{dt} = L\frac{d^2Q}{dt^2}, \qquad V_R = RI = R\frac{dQ}{dt}, \qquad V_C = \frac{Q}{C}$$
So the total voltage across the 3 elements connected in series is equal to
$$L\frac{d^2Q}{dt^2} + R\frac{dQ}{dt} + \frac{Q}{C} = V(t)$$
I assume you know that $L$, $R$, $C$ and $Q$ mean inductance, resistance, capacitance and charge respectively (and $I = \frac{dQ}{dt}$ is the current). Here the voltage $V$ is a function of time, which makes this a non-homogeneous 2nd order linear differential equation. Solving the differential equation means finding an equation which relates the charge to time.
2. Oscillators
Remember from physics that a simple harmonic oscillator has the equation
$$m\ddot{x} + kx = 0$$
where $m$ is the mass and $k$ is the spring constant, and the dots denote derivatives with respect to time $t$. Notice that this is a 2nd order differential equation! Solving it gives you $x$ in terms of $t$. A damped oscillator has an extra term in it,
$$m\ddot{x} + b\dot{x} + kx = 0$$
where $b$ is the damping (drag) constant. A forced oscillator, in turn, would be
$$m\ddot{x} + kx = F(t)$$
where the force $F$ is a function of time, often a sine or cosine function. As you could have guessed, a forced damped oscillator would be
$$m\ddot{x} + b\dot{x} + kx = F(t)$$
With this information, you are able to model a 2nd order differential equation once you know all the factors $m$, $b$, $k$ and $F$.
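As a rough illustration of such a model, this sketch integrates a damped oscillator numerically and compares it with the answer from the auxiliary-equation method. The values $m = 1$, $b = 2$, $k = 5$, $x(0) = 1$, $\dot{x}(0) = 0$ are my own choices: the auxiliary equation $n^2 + 2n + 5 = 0$ gives $n = -1 \pm 2i$, and the initial conditions give $x = e^{-t}(\cos 2t + \frac{1}{2}\sin 2t)$. The Runge-Kutta stepping is just one convenient numerical method.

```python
import math

m, b, k = 1.0, 2.0, 5.0          # hypothetical mass, damping and spring constants

def accel(x, v, t):
    # m x'' + b x' + k x = F(t), with F = 0 here (unforced)
    return -(b * v + k * x) / m

def simulate(t_end, steps=2000):
    """RK4 on the system x' = v, v' = accel(x, v, t), starting from x = 1, v = 0."""
    h = t_end / steps
    x, v, t = 1.0, 0.0, 0.0
    for _ in range(steps):
        k1x, k1v = v, accel(x, v, t)
        k2x, k2v = v + h * k1v / 2, accel(x + h * k1x / 2, v + h * k1v / 2, t + h / 2)
        k3x, k3v = v + h * k2v / 2, accel(x + h * k2x / 2, v + h * k2v / 2, t + h / 2)
        k4x, k4v = v + h * k3v, accel(x + h * k3x, v + h * k3v, t + h)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        t += h
    return x

def exact(t):
    # from the auxiliary equation: roots -1 +/- 2i
    return math.exp(-t) * (math.cos(2 * t) + 0.5 * math.sin(2 * t))

print(simulate(2.0), exact(2.0))
```

To model a forced oscillator, you would return `(F(t) - b * v - k * x) / m` from `accel` instead, with your own `F`.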
There are a whole lot more physics equations which require differential equations, like the famous Schrödinger Equation, but those require higher-level physics. I'd better stop here before I turn this into a physics lecture instead.
9.1 Divisibility
Number Theory is considered one of the hardest sections in Mathematics. It is the study of the very fundamentals of numbers, yet it can be very complicated. Information on this chapter at this level of study is very rare, so I hope you will appreciate everything that I have for you here.

We have been learning division since Standard 2. But today, we will look at it in a different manner. If $a$ and $b$ are integers with $a \ne 0$, then we say that $a$ divides $b$ if there is an integer $c$ such that $b = ac$. When $a$ divides $b$, we say that $a$ is a factor of $b$ and that $b$ is a multiple of $a$. The notation $a \mid b$ denotes that $a$ divides $b$ (which means there is no remainder). We write $a \nmid b$ when $a$ doesn't divide $b$. For example, $2 \mid 4$, but $4 \nmid 2$. Take note that the notations $2 \mid 4$ and $2/4$ are 2 different things. The former is the notation for divisibility, while the latter is simply a fraction.
There are certain rules of divisibility that you should know. These are:
1. If $a \mid b$ and $b \mid c$, then $a \mid c$.
You should know how to prove this. As above, $a \mid b$ and $b \mid c$ can be written as $ak = b$ and $bl = c$, and therefore $akl = bl = c$, so $a \mid c$. Here, $k$ and $l$ are integers.
2. If $a \mid b$ and $a \mid c$, then $a \mid (b + c)$ and $a \mid (mb + nc)$ for any integers $m$ and $n$.
3. If $a \mid b$, then $a \mid bc$ for any integer $c$.
The above 2 can also be proven with notation similar to that used for 1.
Not every number divides every other number. For example, 2 does not divide 7, as it leaves a remainder of 1. We represent this in an equation, which is
$$7 = 2 \cdot 3 + 1$$
Here, 3 is the quotient and 1 is the remainder. In general, if $a = bq + r$ with $0 \le r < b$, we denote the quotient by $q = a \text{ div } b$ and the remainder by $r = a \bmod b$; in this case, $7 \text{ div } 2 = 3$ and $7 \bmod 2 = 1$. Note that a remainder cannot be negative. For example, $-7 = 2 \cdot (-3) - 1$ is wrong, because it would give us $-7 \text{ div } 2 = -3$ and $-7 \bmod 2 = -1$, a negative remainder. It should be $-7 = 2 \cdot (-4) + 1$, which in turn gives $-7 \text{ div } 2 = -4$ and $-7 \bmod 2 = 1$. Compare these with $7 \text{ div } 2$ and $7 \bmod 2$, and see how the answers differ.
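In Python, the `//` and `%` operators follow exactly this convention when the divisor is positive: the remainder is never negative. A minimal sketch that makes the rule explicit (`div_mod` is an illustrative name; the built-in `divmod` gives the same pair):

```python
def div_mod(a, d):
    """Return (a div d, a mod d) with a = d*q + r and 0 <= r < d; assumes d > 0."""
    if d <= 0:
        raise ValueError("divisor must be positive")
    q = a // d          # floor division rounds down, even for negative a
    r = a - d * q       # so r always lands in 0 <= r < d
    return q, r

print(div_mod(7, 2))    # (3, 1)
print(div_mod(-7, 2))   # (-4, 1)
```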

A prime number is an integer greater than 1 that is divisible only by 1 and by itself. An integer greater than 1 which is not prime is called a composite number. The smallest prime number is 2, and it goes on as 3, 5, 7, 11, 13, 17, 19 and so on. The interesting thing about prime numbers is that no simple formula is known that generates the sequence of primes. So if we want to check whether a very large number is prime, we need to slowly divide it by lots of possible divisors before we can say that it is prime. One very famous method from ancient times is the sieve of Eratosthenes, which can be used, for example, to find all the primes below 100. It is done by first listing all the numbers from 1 to 100 and crossing out 1. Then cross out the multiples of 2 (other than 2 itself), then those of 3, then those of the next number that has not been crossed out, and so on, until there is nothing left to cross out (for numbers up to 100, you only need to do this for 2, 3, 5 and 7). The remaining numbers are primes! Another famous result is the Prime Number Theorem. You might wanna google it.
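The sieve described above can be sketched as follows. Two standard shortcuts are assumed: crossing out starts at $p^2$ (smaller multiples of $p$ were already crossed out by smaller primes), and we stop once $p^2$ exceeds the limit. The function name `sieve` is my own.

```python
def sieve(limit):
    """Return all primes <= limit using the Sieve of Eratosthenes (limit >= 2)."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False          # 0 and 1 are not prime
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # cross out multiples of p, starting at p*p
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]

primes = sieve(100)
print(len(primes), primes[:10])
```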
So how do you know whether a relatively small number is prime? There is a way to find out that is at least a little faster than trying to divide the number by every number smaller than itself: if a number $n > 1$ is not divisible by any prime less than or equal to its square root, then it is a prime number. This can be proven. If $n$ is composite, write $n = ab$ with $1 < a \le b < n$. If $a > \sqrt{n}$, then $b \ge a > \sqrt{n}$ too, and $ab > \sqrt{n} \cdot \sqrt{n} = n$, which is a contradiction. So $a \le \sqrt{n}$, and any prime factor of $a$ is a prime $\le \sqrt{n}$ that divides $n$. Although this does speed up the process of finding primes, it is still quite a slow method.
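A sketch of the square-root test by trial division. For simplicity it tries 2 and then every odd $d$ with $d^2 \le n$, not just the primes; that is still correct, because if a composite $d$ divided $n$, one of its smaller prime factors would already have been found. The name `is_prime` is my own.

```python
def is_prime(n):
    """Trial division by 2 and by odd d with d*d <= n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:          # "<=" matters: 25 = 5 * 5 needs d = 5
        if n % d == 0:
            return False
        d += 2
    return True

print([n for n in range(2, 50) if is_prime(n)], is_prime(641))
```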
Prime numbers are the building blocks of all numbers. The Fundamental Theorem of Arithmetic states that:
Every integer greater than 1 can be written uniquely as a prime or as the product of 2 or more primes, where the prime factors are written in order of nondecreasing size.
This is what we call prime factorisation. For example, $4 = 2^2$, $100 = 2^2 \cdot 5^2$, $641 = 641$ and so on. We can write any such number as a product of prime powers, $a = 2^x 3^y 5^z 7^w \cdots$.
There's a lot to talk about on prime numbers. One famous argument proves that there are infinitely many primes. Suppose there were only finitely many, and label them $p_1, p_2, p_3, \ldots, p_n$, where $p_n$ is the greatest prime number in the world. Now consider the number $a = p_1 p_2 p_3 \cdots p_n + 1$. Dividing $a$ by any of $p_1, \ldots, p_n$ leaves a remainder of 1, so none of them divides $a$. But $a > 1$, so by the Fundamental Theorem of Arithmetic it has at least one prime factor, and that prime is not in our list. This contradicts our assumption that the list contained every prime, and therefore proves that there are indeed infinitely many primes.
Another 2 interesting things about prime numbers are Goldbach's Conjecture and the Twin Prime Conjecture. Go look them up if you are free.

Now, let's move on to the gcd and lcm. Try recalling whether these sound familiar from your Form 1 Mathematics. gcd is the greatest common divisor (you are probably more familiar with the name highest common factor, or HCF), while lcm is the lowest common multiple. Here we write $k = \gcd(a, b)$ to mean that $k$ is the greatest common divisor of the integers $a$ and $b$. Similarly, $k = \text{lcm}(a, b)$ means that $k$ is the lowest common multiple of the integers $a$ and $b$. For example, $\gcd(4, 6) = 2$ and $\text{lcm}(5, 6) = 30$.
Relating this back to prime numbers, for any 2 integers $a$ and $b$, if $\gcd(a, b) = 1$, we say that they are relatively prime. For example, 5 and 6 are relatively prime.
Do you still remember the method to find the lcm and gcd in Form 1? You had to draw something like a ladder. But here, we will use another method, which makes use of prime factorisation. For example,
Find $\gcd(120, 500)$ and $\text{lcm}(120, 500)$.
We first start by writing the numbers 120 and 500 in terms of primes.
$$120 = 2^3 \cdot 3 \cdot 5, \qquad 500 = 2^2 \cdot 5^3$$
Now, the formulas to find the gcd and lcm are easy. If $a = p_1^{a_1} p_2^{a_2} \cdots p_n^{a_n}$ and $b = p_1^{b_1} p_2^{b_2} \cdots p_n^{b_n}$ (allowing exponents of 0), then
$$\gcd(a, b) = p_1^{\min(a_1, b_1)} p_2^{\min(a_2, b_2)} \cdots p_n^{\min(a_n, b_n)}$$
$$\text{lcm}(a, b) = p_1^{\max(a_1, b_1)} p_2^{\max(a_2, b_2)} \cdots p_n^{\max(a_n, b_n)}$$
You first compare the primes present in the 2 numbers 120 and 500. $p_1^{\max(a_1, b_1)}$ means we take the larger of the powers of that particular prime $p_1$ in the 2 numbers $a$ and $b$, while $p_1^{\min(a_1, b_1)}$ means we take the smaller. So plugging in the numbers, we have
$$\gcd(120, 500) = 2^{\min(3,2)} \cdot 3^{\min(1,0)} \cdot 5^{\min(1,3)} = 2^2 \cdot 3^0 \cdot 5^1 = 20$$
$$\text{lcm}(120, 500) = 2^{\max(3,2)} \cdot 3^{\max(1,0)} \cdot 5^{\max(1,3)} = 2^3 \cdot 3^1 \cdot 5^3 = 3000$$
From here, since $\min(a_i, b_i) + \max(a_i, b_i) = a_i + b_i$ for every prime, we obtain a new formula:
$$ab = \gcd(a, b) \cdot \text{lcm}(a, b)$$
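The min/max recipe translates directly into code. This sketch factorises by trial division and stores exponents in a `collections.Counter`, so a prime missing from one number automatically has exponent 0. The helper names `factorise` and `gcd_lcm` are my own; `math.gcd` is only printed for comparison.

```python
import math
from collections import Counter

def factorise(n):
    """Prime factorisation of n > 1 as a Counter {prime: exponent}, by trial division."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1          # any leftover n > 1 is itself prime
    return factors

def gcd_lcm(a, b):
    fa, fb = factorise(a), factorise(b)
    g = l = 1
    for p in set(fa) | set(fb):  # a missing prime has exponent 0 in a Counter
        g *= p ** min(fa[p], fb[p])
        l *= p ** max(fa[p], fb[p])
    return g, l

print(factorise(120), factorise(500))
print(gcd_lcm(120, 500), math.gcd(120, 500))
```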

The method described for computing the greatest common divisor of 2 integers, using the prime factorisations of these integers, is inefficient, because it is time consuming to find prime factorisations. Now I will teach you a more efficient method of finding the gcd, called the Euclidean Algorithm (also Euclid's Algorithm). It is named after the ancient Greek mathematician Euclid, who included a description of this algorithm in his book The Elements. Let's start with an example.
Find $\gcd(91, 287)$.
First, we divide the bigger number by the smaller number. Then, we take the divisor and the remainder of that division, and repeat the process until the remainder is 0. The last non-zero remainder is the gcd that we are finding. So we have
$$287 = 91 \cdot 3 + 14$$
$$91 = 14 \cdot 6 + 7$$
$$14 = 7 \cdot 2$$
$$\gcd(91, 287) = 7$$


You might be puzzled as to how this method works. Basically, it is based on the result
if $a = bq + r$, then $\gcd(a, b) = \gcd(b, r)$
From
$$a = bq + r$$
I know that if some integer $k$ divides both $b$ and $r$, it must divide $a$ as well. Now I turn the equation around:
$$a - bq = r$$
If some integer divides both $a$ and $b$, then it must divide $r$. So the common divisors of $a$ and $b$ are exactly the common divisors of $b$ and $r$, and in particular the biggest of them is the same: $\gcd(a, b) = \gcd(b, r)$. Applying this at every step, the gcd never changes, and at the last step $\gcd(r, 0) = r$. Therefore, the Euclidean Algorithm is valid.
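The repeated division can be written as a short loop based on $\gcd(a, b) = \gcd(b, r)$. This is a minimal sketch for non-negative integers (`euclid_gcd` and its `show` flag are my own); with `show=True` it prints each division step in the same layout as above.

```python
def euclid_gcd(a, b, show=False):
    """gcd(a, b) for non-negative integers, using gcd(a, b) = gcd(b, a mod b)."""
    a, b = max(a, b), min(a, b)
    while b != 0:
        q, r = divmod(a, b)
        if show:
            print(f"{a} = {b} x {q} + {r}")
        a, b = b, r          # the divisor and the remainder move up one place
    return a

print(euclid_gcd(91, 287, show=True))
```

For (91, 287) this prints the three lines `287 = 91 x 3 + 14`, `91 = 14 x 6 + 7`, `14 = 7 x 2 + 0`, followed by 7.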

9.2 Modular Arithmetic


You'll absolutely love this section.

Consider how you read the time on a clock. Every time the short hand goes round once, 12 hours pass. So when the short hand goes past another hour, it would be 13 hours, and the time might be 13 o'clock. We know, however, that 13 o'clock is actually 1 o'clock. The same goes for 25 o'clock: it still means the same thing. We say that the clock follows a modular system.
Modular Arithmetic is calculation with numbers in a modular system. The clock's system is modulo 12. When two numbers $a$ and $b$ are congruent to each other modulo $m$, we denote it by
$$a \equiv b \pmod{m}$$
This is read as "$a$ is congruent to $b$ modulo $m$". For example, $13 \equiv 1 \pmod{12}$ means that 13 is the same as 1 in a modulo 12 system. Note that the main relation is the part on the left, $13 \equiv 1$, while the part on the right, $\pmod{12}$, tells you that this relation is valid only in modulo 12. There is another way to explain this notation: $a \equiv b \pmod{m}$ means that $a$ and $b$ give the same remainder when divided by $m$. Notice that 13 divided by 12 gives remainder 1, while 1 divided by 12 also gives remainder 1. Using the mod notation, we say that
$$a \bmod m = b \bmod m$$
Take note that $a \equiv b \pmod{m}$ and $a = b \bmod m$ have different meanings. The latter says that $a$ is the remainder when $b$ is divided by $m$.
Now, bringing divisibility in, we say that
$$a \equiv b \pmod{m} \iff m \mid (a - b)$$
Can you see that $m$ divides $a - b$? If that is the case, $a$ and $b$ differ by a multiple of $m$. So this means that $49 \equiv 37 \equiv 25 \equiv 13 \equiv 1 \pmod{12}$. If you add 12 to a number, you get another number which is congruent to it modulo 12.
If I convert the notation $a \equiv b \pmod{m}$ into algebra, it can be written as $a = b + km$, where $k$ is an integer (try verifying this with the divisibility notation above). So to sum things up:
When $a \equiv b \pmod{m}$, then
$a \bmod m = b \bmod m$
$m \mid (a - b)$
$a = b + km$ for some integer $k$

Before we go into solving linear congruences, we need to know some basic rules of modular arithmetic. You can prove the rules below yourself, so try doing it.
If $a \equiv b \pmod{m}$ and $c \equiv d \pmod{m}$, then
1. $a + c \equiv b + d \pmod{m}$
2. $a - c \equiv b - d \pmod{m}$
In the same modulus $m$, the addition and subtraction rules work as usual. This will be useful when you are solving simultaneous congruences. These can be proven using the algebraic forms $a = b + km$ and $c = d + lm$.
3. $ac \equiv bd \pmod{m}$
This is also important, and uses the same method above to prove.
4. $a^k \equiv b^k \pmod{m}$
where $k$ is a positive integer. I'll prove this one here for you:
When $a - b = jm$ for some integer $j$, then $a^k - b^k = (a - b)(a^{k-1} + a^{k-2}b + a^{k-3}b^2 + \cdots + ab^{k-2} + b^{k-1})$, which is a multiple of $a - b$, and hence of $m$. Therefore $a^k - b^k = lm$ for some integer $l$, and therefore
$$a^k - b^k \equiv 0 \pmod{m}$$
$$a^k \equiv b^k \pmod{m}$$
5. $ak \equiv bk \pmod{m}$
The congruence still holds when both sides are multiplied by the same integer $k$. Same proof as 1, 2 and 3.
Next, try proving both the equations below (make use of the fact that $a \equiv (a \bmod m) \pmod{m}$):
6. $(a + b) \bmod m \equiv [(a \bmod m) + (b \bmod m)] \pmod{m}$
7. $ab \bmod m \equiv [(a \bmod m)(b \bmod m)] \pmod{m}$
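These rules should be proved algebraically as suggested, but a brute-force check over small numbers is a cheap way to catch a misremembered rule. This sketch (`check_rules` is an illustrative name) builds pairs with $a \equiv b$ and $c \equiv d$ by adding multiples of $m$, then tests rules 1 to 7 using Python's `%` (non-negative for $m > 0$) and the built-in three-argument `pow`. In rules 6 and 7 the variable `c` plays the role of $b$.

```python
from itertools import product

def check_rules(m, span=range(-12, 13), k=4):
    """Brute-force check of rules 1-7 for modulus m (m > 0) on small integers."""
    for a, c in product(span, repeat=2):
        # build b ≡ a and d ≡ c (mod m) by adding multiples of m
        for b, d in product((a - m, a, a + 2 * m), (c + m, c - 3 * m)):
            assert (a + c) % m == (b + d) % m                  # rule 1
            assert (a - c) % m == (b - d) % m                  # rule 2
            assert (a * c) % m == (b * d) % m                  # rule 3
            assert pow(a, k, m) == pow(b, k, m)                # rule 4
            assert (a * k) % m == (b * k) % m                  # rule 5
            assert (a + c) % m == ((a % m) + (c % m)) % m      # rule 6
            assert (a * c) % m == ((a % m) * (c % m)) % m      # rule 7
    return True

print(all(check_rules(m) for m in (2, 7, 12)))
```

A check like this is not a proof; it only shows that no counterexample exists among the numbers tried.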
8. The Simplification Law
If $c > 0$, $c \mid a$, $c \mid b$, $c \mid m$, and $a \equiv b \pmod{m}$, then
$$\frac{a}{c} \equiv \frac{b}{c} \pmod{\frac{m}{c}}$$
To summarize this rule, a constant $c$ can only be divided out of $a$, $b$ and $m$ if it divides all of them. Provable too.
Here's another one not to be confused with the former, the cancellation law.
9. If $\gcd(c, m) = 1$, then $ac \equiv bc \pmod{m} \implies a \equiv b \pmod{m}$
You can prove this too. Suppose $ac - bc = (a - b)c = km$. Since $c$ divides $km$ and $\gcd(c, m) = 1$, $c$ and $m$ have no common factors, so $c \mid k$; write $k = cn$ for some integer $n$. Then $(a - b)c = cnm$, and cancelling the non-zero $c$ gives $a - b = nm$. Here we see that $a \equiv b \pmod{m}$, which was to be shown. The condition $\gcd(c, m) = 1$ matters: for example, $1 \cdot 2 \equiv 5 \cdot 2 \pmod{8}$, but $1 \not\equiv 5 \pmod{8}$.

FINDING THE INVERSE


In ordinary arithmetic, the multiplicative inverse of a number $a$ is a number $b$ such that $ab = 1$; here, $b$ is just the reciprocal of $a$. In modular arithmetic, we are going to look for an inverse $b$ of $a$ such that
$$ab \equiv 1 \pmod{m}$$
Let us recall the Euclidean Algorithm. We learnt that we could find $\gcd(a, m)$ by dividing the bigger number by the smaller number, then dividing the smaller number by the remainder, and so on until there is no remainder. Indeed, we can make use of these steps to write the gcd as a linear combination of the 2 integers, such that
$$\gcd(a, m) = mn + ab$$
where $n$ and $b$ are integers. If $\gcd(a, m) = 1$, then an inverse of $a$ exists, and the integer $b$ happens to be an inverse of $a$. We will see why this is true after the following example:
Find $\gcd(123, 2347)$ and write it as a linear combination of these integers, and further find the inverse of 123 modulo 2347.
$$2347 = 123 \cdot 19 + 10$$
$$123 = 10 \cdot 12 + 3$$
$$10 = 3 \cdot 3 + 1$$
$$3 = 1 \cdot 3$$
$$\gcd(123, 2347) = 1$$
Now, to get the linear combination thingy, we have to reverse all of the above equations. Let me rewrite them:
$$10 = 2347 - 123 \cdot 19 \qquad (1)$$
$$3 = 123 - 10 \cdot 12 \qquad (2)$$
$$1 = 10 - 3 \cdot 3 \qquad (3)$$
Now we will do some back substitution. We want the equation for $\gcd(123, 2347)$ (which is 1) to be in terms of 123 and 2347. We start with equation (3) and substitute equation (2), giving
$$1 = 10 - 3(123 - 10 \cdot 12) = 10 - 3 \cdot 123 + 10 \cdot 36 = 10 \cdot 37 - 3 \cdot 123$$
Repeating the process with equation (1),
$$1 = (2347 - 123 \cdot 19) \cdot 37 - 3 \cdot 123 = 2347 \cdot 37 - 123 \cdot 706$$
We have now written $\gcd(123, 2347)$ as a linear combination of the 2 numbers. This is what we call the extended Euclidean Algorithm. Here, we find that an inverse of 123 modulo 2347 is $-706$. We see that
$$-706 \cdot 123 = -86838 \equiv 1 \pmod{2347}$$
Note that every integer congruent to $-706$ modulo 2347 is also an inverse of 123, and we find it best to represent the inverse of 123 as 1641, a positive integer less than 2347.
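The back substitution can be done automatically by carrying the coefficients along while dividing. This is a sketch of the extended Euclidean Algorithm (the names `extended_gcd` and `inverse` are my own). It keeps the invariant that every remainder equals $m \cdot n + a \cdot b$ for the current $n$ and $b$. Python 3.8 and later also accept `pow(a, -1, m)` for a modular inverse, which makes a handy cross-check.

```python
def extended_gcd(a, m):
    """Return (g, n, b) with g = gcd(a, m) = m*n + a*b (extended Euclidean Algorithm)."""
    # invariants: r0 = m*n0 + a*b0  and  r1 = m*n1 + a*b1
    r0, n0, b0 = m, 1, 0
    r1, n1, b1 = a, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        n0, n1 = n1, n0 - q * n1
        b0, b1 = b1, b0 - q * b1
    return r0, n0, b0

def inverse(a, m):
    g, _, b = extended_gcd(a, m)
    if g != 1:
        raise ValueError(f"{a} has no inverse modulo {m}")
    return b % m          # the representative between 0 and m - 1

print(extended_gcd(123, 2347))
print(inverse(123, 2347))
```

For 123 and 2347 this reproduces $1 = 2347 \cdot 37 - 123 \cdot 706$, and reducing $-706$ gives 1641.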
I haven't yet explained why this works. Since $\gcd(a, m) = 1$, and we know that it can be written as a linear combination $1 = mn + ab$, we can show that
$$mn + ab \equiv 1 \pmod{m}$$
You should understand this step: if $1 = 3 - 2$, then $1 \equiv 3 - 2$ in any modulus, which makes sense. Here $mn \equiv 0 \pmod{m}$ for any integer $n$, since $m$ divides $mn$. So in the end we have $ab \equiv 1 \pmod{m}$, which is what we used just now. Note that not every integer has an inverse modulo a particular $m$: an inverse of $a$ exists only when $\gcd(a, m) = 1$.
By the way, for small moduli the inverse can also be found by trial and error. For example, to find the inverse of 2 modulo 3, multiply 2 by each of 1 and 2, and you find that $2 \times 2 = 4 \equiv 1 \pmod{3}$. Thus 2 is the inverse of 2 modulo 3.
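The back substitution above can be automated. Below is a minimal Python sketch of the iterative form of the extended Euclidean Algorithm; the function names `extended_gcd` and `mod_inverse` are hypothetical names chosen for illustration. Instead of substituting backwards at the end, it carries the coefficients along at every division step, which yields the same linear combination.

```python
def extended_gcd(a, b):
    """Return (g, s, t) such that g = gcd(a, b) and g == a*s + b*t."""
    old_r, r = a, b      # remainders
    old_s, s = 1, 0      # running coefficients of a
    old_t, t = 0, 1      # running coefficients of b
    while r != 0:
        q = old_r // r   # quotient of this division step
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def mod_inverse(a, m):
    """Return the inverse of a modulo m as an integer in 0..m-1."""
    g, s, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError(f"{a} has no inverse modulo {m} (gcd is {g})")
    return s % m


print(extended_gcd(2347, 123))  # (1, 37, -706): 1 = 2347*37 + 123*(-706)
print(mod_inverse(123, 2347))   # 1641, i.e. -706 reduced modulo 2347
print(mod_inverse(2, 3))        # 2
```

If you use Python 3.8 or later, the built-in `pow(123, -1, 2347)` also returns 1641.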

SOLVING LINEAR CONGRUENCES


A linear congruence has the form
$$ax \equiv b \pmod{m}$$
in which we want to find $x$. Relating this to the section above, when $\gcd(a, m) = 1$ the congruence always has a solution, which we find by multiplying both sides by an inverse of $a$. Let's try an example:
Solve the linear congruence $3x \equiv 4 \pmod{7}$.
We check that $\gcd(3, 7) = 1$, so an inverse of 3 exists, and thus a solution exists. Using the extended Euclidean Algorithm, we get $-2$ as an inverse of 3. So multiplying both sides by $-2$,
$$-2 \times 3x \equiv -2 \times 4 \pmod{7}$$
We know that $-2 \times 3 = -6 \equiv 1 \pmod{7}$, and therefore
$$x \equiv -8 \equiv 6 \pmod{7}$$
Substituting 6 back into $x$ confirms the answer, since $3 \times 6 = 18 \equiv 4 \pmod{7}$. Besides, any integer congruent to 6 modulo 7, like 13, 20, $-8$, $-1$ and so on, is also a solution of the linear congruence.
In cases where $\gcd(a, m) \neq 1$, there are solutions only if $\gcd(a, m) \mid b$, and then there are exactly $\gcd(a, m)$ solutions modulo $m$. For example, $2x \equiv 6 \pmod{8}$ has $\gcd(2, 8) = 2$ solutions, but $2x \equiv 5 \pmod{8}$ has no solution, since 2 does not divide 5.
Let's solve the linear congruence $2x \equiv 6 \pmod{8}$. You can solve it as follows:
Using the simplification law, you see that 2 divides 2, 6 and 8, and therefore
$$x \equiv 3 \pmod{4}$$
which is in another modulo system. If you want the solutions in the original modulo system, you need to do some modification. By looking at the equation, you know that
$$x \equiv 3 \pmod{8}$$
is one solution. The other solution is obtained by adding the new modulus, 4, to 3. You get another solution,
$$x \equiv 7 \pmod{8}$$
So the solutions of the linear congruence $2x \equiv 6 \pmod{8}$ are $x \equiv 3 \pmod{8}$ and $x \equiv 7 \pmod{8}$.
The same method applies in general: when there are, say, 10 solutions, you keep adding the new modulus to the previous answer until you have 10 solutions.
Let's try another one, $2x \equiv 6 \pmod{9}$. Using rule number 9 above, you can quickly see that $\gcd(2, 9) = 1$, so we may divide both sides by 2, and therefore $x \equiv 3 \pmod{9}$ is the only solution. Try not to confuse this one with the one above.
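The whole procedure (check that $\gcd(a, m) \mid b$, simplify, invert, then keep adding the new modulus) fits in a few lines. This is a minimal Python sketch; `solve_linear_congruence` is a hypothetical name, and it relies on `pow(x, -1, m)` for the modular inverse, which needs Python 3.8 or later.

```python
from math import gcd


def solve_linear_congruence(a, b, m):
    """Return every x in 0..m-1 with a*x ≡ b (mod m), in increasing order."""
    g = gcd(a, m)
    if b % g != 0:
        return []                      # gcd(a, m) must divide b
    # Simplification law: divide a, b and m by g.
    a1, b1, m1 = a // g, b // g, m // g
    # Now gcd(a1, m1) = 1, so a1 has an inverse modulo m1.
    x0 = (b1 * pow(a1, -1, m1)) % m1 if m1 > 1 else 0
    # Keep adding the new modulus m1 until there are g solutions.
    return [x0 + k * m1 for k in range(g)]


print(solve_linear_congruence(3, 4, 7))  # [6]
print(solve_linear_congruence(2, 6, 8))  # [3, 7]
print(solve_linear_congruence(2, 5, 8))  # []
print(solve_linear_congruence(2, 6, 9))  # [3]
```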

SIMULTANEOUS LINEAR CONGRUENCES


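Similar to the above, now you have 2 congruences in 2 unknowns, under the same modulus. Let's consider a system of linear congruences with 2 unknowns:
$$ax + by \equiv k \pmod{m}$$
$$cx + dy \equiv n \pmod{m}$$
We first write this in matrix form:
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \equiv \begin{pmatrix} k \\ n \end{pmatrix} \pmod{m}$$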
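For this system of congruences to have a unique solution, the matrix must have an inverse modulo $m$. This means that $ad - bc$ must not be zero, and its inverse modulo $m$ must exist. Let's multiply the left and right hand sides by the adjoint matrix:
$$\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \equiv \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \begin{pmatrix} k \\ n \end{pmatrix} \pmod{m}$$
Since the product of the adjoint and the matrix is $(ad - bc)I$, we now get 2 linear congruences,
$$(ad - bc)x \equiv dk - bn \pmod{m}$$
$$(ad - bc)y \equiv an - ck \pmod{m}$$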
and for these linear congruences to have a unique solution, again we must make sure that $\gcd(ad - bc, m) = 1$. With that you can solve the above 2 linear congruences for $x$ and $y$. This kind of question came out in STPM 2009, my year. Try solving it with the method I just showed you.
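The two congruences for $x$ and $y$ can be solved mechanically once the inverse of $ad - bc$ is known. The Python sketch below follows the adjoint method above; the function name is hypothetical, the example system $3x + 4y \equiv 5$, $2x + 5y \equiv 7 \pmod{13}$ is made up for illustration (it is not the STPM 2009 question), and `pow(..., -1, m)` needs Python 3.8 or later.

```python
from math import gcd


def solve_2x2_congruence(a, b, c, d, k, n, m):
    """Solve ax + by ≡ k, cx + dy ≡ n (mod m) using the adjoint method.

    Assumes gcd(ad - bc, m) = 1, so that the solution is unique modulo m.
    """
    det = (a * d - b * c) % m
    if gcd(det, m) != 1:
        raise ValueError("gcd(ad - bc, m) != 1: this sketch needs a unique solution")
    det_inv = pow(det, -1, m)              # inverse of ad - bc modulo m
    x = (det_inv * (d * k - b * n)) % m    # (ad - bc) x ≡ dk - bn
    y = (det_inv * (a * n - c * k)) % m    # (ad - bc) y ≡ an - ck
    return x, y


print(solve_2x2_congruence(3, 4, 2, 5, 5, 7, 13))  # (7, 9)
```

Here $ad - bc = 7$, whose inverse modulo 13 is 2, so $x \equiv 2(25 - 28) \equiv 7$ and $y \equiv 2(21 - 10) \equiv 9 \pmod{13}$.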

QUADRATIC RESIDUE MODULO M


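A quadratic congruence modulo $m$ has the form
$$x^2 \equiv q \pmod{m}$$
and when it has a solution, $q$ is called a quadratic residue modulo $m$. You are supposed to solve the equation for $x$. I don't know of any shortcut to solve such a problem, but one way is to list out all the possible values in a table and read off the answer. Example,
Solve $x^2 \equiv 2 \pmod{7}$.
We proceed to draw a table:
$$\begin{array}{c|ccccccc} x & 0 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline x^2 \bmod 7 & 0 & 1 & 4 & 2 & 2 & 4 & 1 \end{array}$$
Therefore, we conclude that $x \equiv 3 \pmod{7}$ and $x \equiv 4 \pmod{7}$.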


As you may have noticed, we could actually solve linear congruences with the above trial and error method too.
If $m$ is large but composite, we can break the modulo system up. For example,
$$x^2 \equiv 14 \pmod{35}$$
Since $35 = 7 \times 5$, we can make it into 2 congruences, namely $x^2 \equiv 14 \equiv 0 \pmod{7}$ and $x^2 \equiv 14 \equiv 4 \pmod{5}$.
Tabulating as before,
$x^2 \equiv 0 \pmod{7}$ has the solution $x \equiv 0 \pmod{7}$.
$x^2 \equiv 4 \pmod{5}$ has the solutions $x \equiv 2 \pmod{5}$ and $x \equiv 3 \pmod{5}$.
$x \equiv 0 \pmod{7}$ means that
$x \equiv 0, 7, 14, 21, 28 \pmod{35}$ are the candidates modulo 35.
$x \equiv 2 \pmod{5}$ means that
$x \equiv 2, 7, 12, 17, 22, 27, 32 \pmod{35}$ are the candidates modulo 35.
$x \equiv 3 \pmod{5}$ means that
$x \equiv 3, 8, 13, 18, 23, 28, 33 \pmod{35}$ are the candidates modulo 35.
We find the intersection of $x \equiv 0 \pmod{7}$ and $x \equiv 2 \pmod{5}$, and get
$$x \equiv 7 \pmod{35}$$
And the intersection of $x \equiv 0 \pmod{7}$ and $x \equiv 3 \pmod{5}$ gives
$$x \equiv 28 \pmod{35}$$
And our final solution is
$$x \equiv 7, 28 \pmod{35}$$
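The table method is simply "try every residue". A minimal Python sketch of that brute force (the function name is hypothetical) is enough to check answers for small moduli; for a large composite modulus you would still split it up as shown above.

```python
def solve_quadratic_congruence(q, m):
    """Trial and error: return every x in 0..m-1 with x^2 ≡ q (mod m)."""
    return [x for x in range(m) if (x * x - q) % m == 0]


print(solve_quadratic_congruence(2, 7))    # [3, 4]
print(solve_quadratic_congruence(14, 35))  # [7, 28]
print(solve_quadratic_congruence(0, 7))    # [0]
print(solve_quadratic_congruence(4, 5))    # [2, 3]
```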

MODULAR EXPONENTIATION
I don't think this is in the syllabus, but it is good for you to know. Modular exponentiation problems are of the form $a^n \bmod m$, and you are normally asked to compute them for a very large value of $n$. For example,
Find $3^{101} \bmod 100$.
First, do you still remember binary numbers? Express $n$ in binary form by repeatedly dividing the number by 2 and writing the remainder at the side. Recall your Form 4 Maths:

101 ÷ 2 = 50 remainder 1
50 ÷ 2 = 25 remainder 0
25 ÷ 2 = 12 remainder 1
12 ÷ 2 = 6 remainder 0
6 ÷ 2 = 3 remainder 0
3 ÷ 2 = 1 remainder 1
1 ÷ 2 = 0 remainder 1

Reading the remainders from bottom to top, we get $101 = (1100101)_2 = 2^6 + 2^5 + 2^2 + 2^0 = 64 + 32 + 4 + 1$.
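Substituting it back into the congruence,
$$3^{64+32+4+1} \bmod 100 = 3^{64} \cdot 3^{32} \cdot 3^{4} \cdot 3^{1} \bmod 100$$
Now we need to tabulate the residues of $3^{64}$, $3^{32}$, $3^{4}$ and $3^{1}$ modulo 100, squaring repeatedly:
$$3^2 \equiv 9$$
$$3^4 \equiv 9^2 = 81$$
$$3^8 \equiv 81^2 = 6561 \equiv 61$$
$$3^{16} \equiv 61^2 = 3721 \equiv 21$$
$$3^{32} \equiv 21^2 = 441 \equiv 41$$
$$3^{64} \equiv 41^2 = 1681 \equiv 81$$
Now that you know all the values, substitute them back into the expression. Since $81 \times 41 = 3321 \equiv 21$ and $21 \times 81 = 1701 \equiv 1$,
$$3^{64} \cdot 3^{32} \cdot 3^{4} \cdot 3^{1} \equiv 81 \times 41 \times 81 \times 3 \equiv 1 \times 3 = 3 \pmod{100}$$
Spend some time understanding my calculations. If not, just pray that it won't come out in exams.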
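The repeated-squaring table above is the square-and-multiply method. Here is a minimal Python sketch of it (the name `mod_pow` is hypothetical): it reads the binary digits of $n$ from the right, keeps squaring, and multiplies in the squares whose binary digit is 1.

```python
def mod_pow(a, n, m):
    """Compute a**n mod m by repeated squaring, reading n in binary from the right."""
    result = 1 % m
    square = a % m            # holds a^(2^0), then a^(2^1), a^(2^2), ...
    while n > 0:
        if n & 1:             # this binary digit of n is 1
            result = (result * square) % m
        square = (square * square) % m
        n >>= 1               # move on to the next binary digit
    return result


print(mod_pow(3, 101, 100))   # 3
```

Python's built-in three-argument `pow(3, 101, 100)` computes the same value; the sketch only makes the steps visible.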

CHINESE REMAINDER THEOREM


In the 1st century, the Chinese mathematician Sun-Tsu asked:
There are certain things whose number is unknown. When divided by 3, the
remainder is 2; when divided by 5, the remainder is 3; and when divided by 7,
the remainder is 2. What will be the number of things?
This puzzle can be translated into the following question: what are the solutions of the system of congruences
$$x \equiv 2 \pmod{3}$$
$$x \equiv 3 \pmod{5}$$
$$x \equiv 2 \pmod{7}\,?$$

The Chinese Remainder Theorem, named after the Chinese heritage of problems involving systems of linear congruences, states that when the moduli of a system of linear congruences are pairwise relatively prime, there is a unique solution of the system modulo the product of the moduli.
I will omit the proof, because I don't understand it either. Here are the steps to solve this kind of problem:
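Firstly, for a system of linear congruences with pairwise relatively prime moduli,
$$x \equiv a \pmod{m}$$
$$x \equiv b \pmod{n}$$
$$x \equiv c \pmod{o}$$
we construct a number $M$, the product of the moduli,
$$M = mno$$
Then, we construct the numbers $M_m$, $M_n$ and $M_o$, each being the product of all the moduli in the system except its own. This means
$$M_m = \frac{M}{m} = no, \qquad M_n = \frac{M}{n} = mo, \qquad M_o = \frac{M}{o} = mn$$
Then, find the inverses $\overline{M}_m$, $\overline{M}_n$ and $\overline{M}_o$ of $M_m$, $M_n$ and $M_o$ respectively:
$$M_m \overline{M}_m \equiv 1 \pmod{m}$$
$$M_n \overline{M}_n \equiv 1 \pmod{n}$$
$$M_o \overline{M}_o \equiv 1 \pmod{o}$$
And finally, your answer will be:
$$x \equiv a M_m \overline{M}_m + b M_n \overline{M}_n + c M_o \overline{M}_o \pmod{M}$$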
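Let's try to solve Sun-Tsu's problem.
$$x \equiv 2 \pmod{3}$$
$$x \equiv 3 \pmod{5}$$
$$x \equiv 2 \pmod{7}$$
$M = 105$,
$M_3 = 35 \equiv 2 \pmod{3}$, so the inverse $\overline{M}_3$ is 2.
$M_5 = 21 \equiv 1 \pmod{5}$, so the inverse $\overline{M}_5$ is 1.
$M_7 = 15 \equiv 1 \pmod{7}$, so the inverse $\overline{M}_7$ is 1.
$$x \equiv (2 \times 2 \times 35) + (1 \times 3 \times 21) + (1 \times 2 \times 15) = 233 \equiv 23 \pmod{105}$$
Note that you cannot replace $M_3$, $M_5$ and $M_7$ by their residues 2, 1 and 1 in this sum, as you will get a totally different answer. However, each inverse $\overline{M}_k$ can be replaced by any other number congruent to it modulo $k$.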
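The recipe above translates almost line by line into code. This is a minimal Python sketch (the name `crt` is hypothetical; `math.prod` and `pow(..., -1, m)` need Python 3.8 or later). It assumes pairwise relatively prime moduli; if they are not, some $M_k$ has no inverse and `pow` raises a `ValueError`.

```python
from math import prod


def crt(residues, moduli):
    """Solve x ≡ residues[i] (mod moduli[i]) for pairwise relatively prime moduli.

    Returns the unique solution in 0..M-1, where M is the product of the moduli.
    """
    M = prod(moduli)                       # M = product of the moduli
    total = 0
    for r, m in zip(residues, moduli):
        M_k = M // m                       # product of the other moduli
        M_k_bar = pow(M_k, -1, m)          # inverse of M_k modulo m
        total += r * M_k * M_k_bar
    return total % M


print(crt([2, 3, 2], [3, 5, 7]))  # 23, Sun-Tsu's answer
```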

FERMAT'S LITTLE THEOREM


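If $p$ is a prime number and $a$ is an integer not divisible by $p$, then
$$a^{p-1} \equiv 1 \pmod{p}$$
Multiplying both sides by $a$ gives
$$a^p \equiv a \pmod{p}$$
which in fact holds for every integer $a$.
This theorem helps you spot when a congruence involving a large power can be simplified easily. Similarly, I won't prove it, so just keep this theorem in mind and use it if needed.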
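One way the theorem simplifies a power: since $a^{p-1} \equiv 1 \pmod{p}$, the exponent can be reduced modulo $p - 1$. For example, $3^{101} = (3^6)^{16} \cdot 3^5 \equiv 3^5 = 243 \equiv 5 \pmod{7}$. The Python sketch below (hypothetical function name) does exactly this reduction; it assumes $p$ is prime and does not check it.

```python
def power_mod_prime(a, n, p):
    """Compute a**n mod p for a prime p not dividing a, using Fermat's little theorem.

    Since a^(p-1) ≡ 1 (mod p), the exponent n can be reduced modulo p - 1.
    Assumes p is prime; primality is not checked here.
    """
    if a % p == 0:
        raise ValueError("p must not divide a")
    return pow(a, n % (p - 1), p)


print(power_mod_prime(3, 101, 7))   # 5, since 101 = 6*16 + 5 and 3^5 = 243 ≡ 5 (mod 7)
```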
10.1 Graphs
In mathematics and computer science, graph theory is the study of graphs,
mathematical structures used to model pairwise relations between objects from
a certain collection. A graph $G = (V, E)$ consists of $V$, a nonempty set of vertices (or nodes), and $E$, a set of edges. In other words, a graph is a discrete structure consisting of vertices and edges that connect these vertices. Each edge has either one or two vertices associated with it, called its endpoints. An edge is said to connect its endpoints. A graph looks something like this:

As you can see, a and b are vertices, while e and f are edges. The edge g is called a loop. The vertex set is $V = \{a, b\}$.

In this section, there will be many terms which you should remember and be able to define in your exam. Here we will learn the different kinds of graphs and their names:
An infinite graph is a graph with an infinite vertex set (in other words, an infinite number of vertices). A finite graph is the opposite: its vertex set and edge set are finite. Throughout this section, we will only study graphs with finitely many edges and vertices.
A simple graph is a graph in which each edge connects two different vertices and where no two edges connect the same pair of vertices.
A multigraph is a graph that has multiple edges connecting the same pair of vertices, while a pseudograph is a graph that may include loops as well as multiple edges connecting the same pair of vertices. The 3 pictures below illustrate a simple graph, a multigraph and a pseudograph:

Notice that in the multigraph, there are 2 edges connecting a to b, 2 edges connecting a to c, and 3 edges connecting e to f. As for the pseudograph, there are loops at the vertices e and f.
The complement of a simple graph $G$, written $\overline{G}$, has the same vertices as $G$, but two distinct vertices a and b are adjacent in $\overline{G}$ exactly when they are not adjacent in $G$: wherever there is an edge between a and b, it is removed, and wherever there isn't one, an edge is added. This only applies to simple graphs. For example, below is a graph and its complement:

All the above graphs are undirected, which means that one can traverse an edge in both directions. A directed graph (or digraph) consists of a nonempty set of vertices and a set of directed edges (or arcs). Each directed edge is associated with an ordered pair of vertices. The directed edge associated with the ordered pair $(u, v)$ is said to start at $u$ and end at $v$. In other words, we say that $u$ is adjacent to $v$, while $v$ is adjacent from $u$. Notice the different uses of $\{\ \}$ and $(\ )$ brackets for undirected and directed edges. Below is a directed graph:

For the ordered pair of vertices $(u, v)$, we say that $u$ and $v$ are adjacent, and that the edge is incident with (connects) $u$ and $v$. $u$ is known as the initial vertex and $v$ as the terminal vertex. Using a similar naming convention, we can describe a simple directed graph as a directed graph in which each edge connects two different vertices and where no two edges connect the same ordered pair of vertices. Similarly, a directed multigraph can be defined.
An underlying undirected graph is the undirected graph that results from ignoring the directions of the edges. It is just the same graph without the arrows.
A mixed graph is a graph with both directed and undirected edges.
The converse of a directed graph is the graph obtained by reversing all of its arrows.

For every graph, we could come up with subgraphs, which are graphs whose vertices and edges are subsets of the vertices and edges of the original graph. For example, the graph

can be broken down into the 11 subgraphs below:

As an exercise, try to figure out whether the total number of subgraphs can be determined from the values of $|V|$ and $|E|$ alone.

A bipartite graph is a simple graph whose vertex set $V$ can be partitioned into 2 disjoint sets $V_1$ and $V_2$ such that every edge in the graph connects a vertex in $V_1$ and a vertex in $V_2$. Consider the bipartite graph below:

Notice that I coloured the vertices with 2 colours, red and blue. No blue vertex is connected to another blue vertex, and likewise no red vertex is connected to another red vertex. The graph is partitioned into two sets (or parties) of vertices. Identifying a bipartite graph is simple: as long as you can colour the vertices with only 2 colours so that adjacent vertices get different colours, it is a bipartite graph. For example, colour the first vertex blue. The vertices adjacent to it must then be coloured red, and so on; if you can colour all the vertices with 2 colours such that no two adjacent vertices have the same colour, then it is a bipartite graph. Notice also that a graph is bipartite if and only if it has no odd cycles. We will learn about cycles in the next section.
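The colouring test can be carried out systematically. The Python sketch below is one way to do it, using a breadth-first search as the visiting order (the notes do not prescribe an order; this is an assumption of the sketch). Vertices are assumed to be numbered 0 to n-1, and `two_colour` is a hypothetical name.

```python
from collections import deque


def two_colour(n, edges):
    """Try to colour vertices 0..n-1 with colours 0 and 1 so that adjacent vertices differ.

    Returns the list of colours if the graph is bipartite, otherwise None.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colour = [None] * n
    for start in range(n):                 # handles graphs made of several pieces
        if colour[start] is not None:
            continue
        colour[start] = 0                  # "colour the first vertex blue"
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if colour[v] is None:
                    colour[v] = 1 - colour[u]   # neighbours get the other colour
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return None            # two adjacent vertices forced to share a colour
    return colour


print(two_colour(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # [0, 1, 0, 1]  (C4 is bipartite)
print(two_colour(3, [(0, 1), (1, 2), (2, 0)]))          # None  (C3 is an odd cycle)
```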

Now I want to introduce 5 types of special simple graphs:


1. Complete Graph $K_n$
This is a simple graph that contains exactly 1 edge between each pair of distinct vertices. In other words, it has the maximum number of edges a simple graph can have, and adding any further edge between 2 vertices will turn it into a multigraph. The graphs look as follows:

The $K_4$ graph has 4 vertices, and every vertex is connected to the other 3 vertices. By a simple count, a $K_n$ graph has $n$ vertices and $\frac{n(n-1)}{2}$ edges.
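2. Cycle Graph $C_n$
This graph, where $n \geq 3$, consists of $n$ vertices $v_1, v_2, \ldots, v_n$ and the $n$ edges $\{v_1, v_2\}, \{v_2, v_3\}, \ldots, \{v_{n-1}, v_n\}, \{v_n, v_1\}$.

Strictly speaking, $C_2$ is not a cycle graph, as $n < 3$. Notice that every vertex is connected to exactly two other vertices. It looks like a regular polygon with $n$ sides.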
3. Wheel Graph $W_n$
This graph looks like a wheel with $n$ sides. We obtain the wheel $W_n$ when we add an additional vertex to the cycle $C_n$, for $n \geq 3$, and connect this new vertex to each of the $n$ vertices in $C_n$ by new edges.

A $W_n$ graph has $n + 1$ vertices and $2n$ edges.


4. n-Dimensional Hypercube $Q_n$
This graph, also known as the $n$-cube, is the graph whose vertices represent the $2^n$ bit strings of length $n$. Two vertices are adjacent if and only if the bit strings that they represent differ in exactly one bit position. I don't think this graph is in the syllabus, but I think it will be good for you to know:

This graph has $2^n$ vertices and $n \cdot 2^{n-1}$ edges. Try proving this if you are free.
5. Complete Bipartite Graph $K_{m,n}$
This graph is a bipartite graph with $|V_1| = m$ and $|V_2| = n$, in which there is exactly 1 edge between each vertex of $V_1$ and each vertex of $V_2$. Note that the number of edges is $|E(K_{m,n})| = mn$, and there are $m + n$ vertices.
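The edge counts of the five special graphs can be checked by building their edge sets explicitly. In this Python sketch each edge is a `frozenset` of its two endpoints, so repeated pairs collapse automatically; the construction functions and vertex labels are hypothetical choices for illustration. It is a check for small $n$, not a proof.

```python
from itertools import combinations


def complete_graph(n):
    return {frozenset(pair) for pair in combinations(range(n), 2)}


def cycle_graph(n):                       # n >= 3
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def wheel_graph(n):                       # C_n plus a hub vertex labelled n
    return cycle_graph(n) | {frozenset((i, n)) for i in range(n)}


def hypercube(n):
    # Vertex v stands for the n-bit string of v; flipping bit b gives a neighbour.
    return {frozenset((v, v ^ (1 << b))) for v in range(2 ** n) for b in range(n)}


def complete_bipartite(m, n):
    return {frozenset((("V1", i), ("V2", j))) for i in range(m) for j in range(n)}


for n in range(3, 8):
    assert len(complete_graph(n)) == n * (n - 1) // 2
    assert len(cycle_graph(n)) == n
    assert len(wheel_graph(n)) == 2 * n
    assert len(hypercube(n)) == n * 2 ** (n - 1)
assert len(complete_bipartite(3, 4)) == 3 * 4
print("edge counts agree with the formulas")
```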

Now that we know the structure of graphs, we shall get into a little calculation. The degree of a vertex is the number of edges incident with it, except that a loop at a vertex contributes 2 to the degree of that vertex. The degree of a vertex $v$ is denoted by $\deg(v)$. When $\deg(v) = 0$, we say that the vertex is isolated, and when $\deg(v) = 1$, we say that the vertex is pendant.
We now want to find the relationship between the sum of the degrees of the vertices and the number of edges. The Handshaking Theorem states that the sum of the degrees of the vertices is twice the number of edges. In equation form, we have
$$\sum_{v \in V} \deg(v) = 2|E|$$
This theorem has many implications. One of them is that no graph can have an odd sum of degrees; equivalently, every graph has an even number of vertices of odd degree.
In the case of directed graphs, we denote by $\deg^+(v)$ the out-degree, meaning the number of arcs pointing away from the vertex, while the in-degree, denoted by $\deg^-(v)$, is the number of arcs pointing towards the vertex. Modifying the handshaking theorem, we have
$$\sum_{v \in V} \deg^-(v) = \sum_{v \in V} \deg^+(v) = |E|$$
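Both versions of the Handshaking Theorem can be checked on a small example. The Python sketch below (hypothetical function names and example graphs) counts degrees from an edge list, with a loop contributing 2 to its vertex. Isolated vertices never appear in the counters, which does not affect the sums.

```python
from collections import Counter


def degrees(edges):
    """Degree of every vertex of an undirected (pseudo)graph given as a list of edges."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1        # for a loop, u == v, so the vertex gains 2
    return deg


def in_out_degrees(arcs):
    """In-degrees and out-degrees of a directed graph given as (initial, terminal) pairs."""
    in_deg, out_deg = Counter(), Counter()
    for u, v in arcs:
        out_deg[u] += 1
        in_deg[v] += 1
    return in_deg, out_deg


# A small pseudograph: two parallel edges a-b and a loop at a.
edges = [("a", "b"), ("a", "b"), ("a", "a")]
deg = degrees(edges)
print(dict(deg))                            # {'a': 4, 'b': 2}
print(sum(deg.values()) == 2 * len(edges))  # True (Handshaking Theorem)
```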

11.1 Transformation
A transformation is a correspondence between 2 sets of points in a plane. A transformation $T$ of $n$-dimensional space is called a linear transformation when it has the properties
$$T(\alpha \mathbf{x}) = \alpha T(\mathbf{x}), \qquad T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x}) + T(\mathbf{y})$$
for all vectors $\mathbf{x}$, $\mathbf{y}$ and every constant $\alpha$. Recalling your Form 4 Mathematics, you learned how to find the image of points on the Cartesian plane under a certain transformation. Here you will further learn how to use matrices and some simple linear algebra to represent transformations, in 2 dimensions only.

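An equation of a transformation looks like this:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = M \begin{pmatrix} x \\ y \end{pmatrix}$$
where $M$ is the matrix of transformation. The matrix
$$M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
determines how the point $(x, y)$ is transformed into its image $(x', y')$. The matrix $M$ is easy to compose. Basically, its first column is the image of $(1, 0)$ and its second column is the image of $(0, 1)$:
$$M \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} a \\ c \end{pmatrix}, \qquad M \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} b \\ d \end{pmatrix}$$
where $(1, 0)$ and $(0, 1)$ are the unit vectors in the $x$ and $y$ directions respectively (or rather, you can treat these 2 vectors as points on the $xy$-plane). For example, if I want to transform the point $(1, 0)$ to $(2, 0)$, and the point $(0, 1)$ to $(0, 2)$, then my matrix of transformation will be
$$\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$
So if you want to find the image of the unit square with vertices $(0, 0)$, $(1, 0)$, $(0, 1)$ and $(1, 1)$, just pre-multiply each point (as a column vector) by this matrix, and you will get the image of the transformation. An example will be given in the next section.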
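The column rule ("images of the unit vectors become the columns") is easy to try out. This Python sketch uses plain nested lists for a $2 \times 2$ matrix; `matrix_from_images` and `apply` are hypothetical helper names.

```python
def matrix_from_images(image_of_e1, image_of_e2):
    """Build M whose columns are the images of (1, 0) and (0, 1)."""
    return [[image_of_e1[0], image_of_e2[0]],
            [image_of_e1[1], image_of_e2[1]]]


def apply(M, point):
    """Pre-multiply the column vector (x, y) by M."""
    x, y = point
    return (M[0][0] * x + M[0][1] * y, M[1][0] * x + M[1][1] * y)


M = matrix_from_images((2, 0), (0, 2))      # (1,0) -> (2,0), (0,1) -> (0,2)
unit_square = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(M)                                     # [[2, 0], [0, 2]]
print([apply(M, p) for p in unit_square])    # [(0, 0), (2, 0), (0, 2), (2, 2)]
```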

Knowing how a transformation matrix works, we now want to learn how to represent a few types of transformation with matrices. We learned the 3 isometries, translation, rotation and reflection, in Form 4. Now we will go through them again, and then learn some new ones too. By the way, an isometry is a distance-preserving map between metric spaces. Geometric figures which can be related by an isometry are called congruent. This means that, after an isometric transformation, the shape and area remain unchanged.
1. Translation

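Translation is just the moving of an object from one position to another, without altering its size, shape and orientation. A translation by $a$ units along the $x$-axis and $b$ units along the $y$-axis is represented by adding a column vector:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a \\ b \end{pmatrix}$$
where $a$ and $b$ are the amounts of shift of the object. For example, $\begin{pmatrix} 1 \\ 2 \end{pmatrix}$ translates the point $(x, y)$ one step right and 2 steps upward, while negative values shift it left or downward. Note that a translation moves the origin, so it is not a linear transformation and cannot be written as a $2 \times 2$ matrix product on its own.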
2. Rotation

Given an angle $\theta$, a point is rotated about the origin either clockwise or anticlockwise. An anticlockwise rotation through the angle $\theta$ is represented by the matrix
$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
and a clockwise rotation through $\theta$ by the same matrix with $\theta$ replaced by $-\theta$. Note that this matrix only describes rotation about the origin. We will discuss later what to do if the centre of rotation is not the origin. The area and the shape of the object are unchanged, and once rotated through $360^\circ$, the object gets back to its initial position.
3. Reflection

For a reflection, you need a line which acts like a mirror, such that every point of the object is reflected to the other side of the line, at an equal distance and along a perpendicular to that line. This line, in this case, must pass through the origin. Again, the shape of the object doesn't change, and neither does its area. A few common reflection matrices are as follows:
$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad \text{reflection in the } x\text{-axis}$$
$$\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{reflection in the } y\text{-axis}$$
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{reflection in the line } y = x$$
It is actually a little tedious to find the matrix of reflection when you are only given a line in the form $y = mx$. First, you find the normal lines, $y = -\frac{1}{m}x + c$. Substitute the points $(1, 0)$ and $(0, 1)$ to find the two parallel normal lines which pass through these 2 points. Next, find the intersection point of each of these 2 lines with the line of reflection. Taking each intersection point as the midpoint, you can work out where the reflected points of $(1, 0)$ and $(0, 1)$ are, and thus complete your matrix.
But there is a faster way. Let the line of reflection $y = mx$ be written in the form $y = (\tan\theta)x$, so that the gradient is $m = \tan\theta$. With this information, we find $\theta$, and the reflection matrix is just
$$\begin{pmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{pmatrix}$$
You can try figuring out why this is true. This has something to do with the angle subtended by the point at the origin, the angle of the line, the uses of cosine and sine and so on. To find $\cos 2\theta$ and $\sin 2\theta$, you could either calculate $\theta = \tan^{-1} m$, or make use of the trigonometric identities
$$\cos 2\theta = \frac{1 - m^2}{1 + m^2}, \qquad \sin 2\theta = \frac{2m}{1 + m^2}$$
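Using the identities for $\cos 2\theta$ and $\sin 2\theta$ in terms of $m$, the reflection matrix can be computed exactly, without finding $\theta$. This Python sketch (hypothetical names) uses `fractions.Fraction` so the entries stay exact; it does not handle the vertical mirror line $x = 0$, whose gradient is undefined.

```python
from fractions import Fraction


def reflection_matrix(m):
    """Matrix of reflection in the line y = m x, using
    cos 2θ = (1 - m^2)/(1 + m^2) and sin 2θ = 2m/(1 + m^2) with m = tan θ.
    Vertical mirror lines (x = 0) are not covered."""
    m = Fraction(m)
    c2 = (1 - m * m) / (1 + m * m)   # cos 2θ
    s2 = 2 * m / (1 + m * m)         # sin 2θ
    return [[c2, s2], [s2, -c2]]


def apply(M, point):
    x, y = point
    return (M[0][0] * x + M[0][1] * y, M[1][0] * x + M[1][1] * y)


print(reflection_matrix(1) == [[0, 1], [1, 0]])   # True: the y = x matrix
print(apply(reflection_matrix(2), (1, 0)))        # image of (1, 0) in y = 2x: (-3/5, 4/5)
```

As a check on the second line, the midpoint of $(1, 0)$ and $(-\frac{3}{5}, \frac{4}{5})$ is $(\frac{1}{5}, \frac{2}{5})$, which lies on $y = 2x$.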
4. Scaling

Scaling does not preserve the size, but it preserves the shape (the ratios of lengths) of the object. This scaling is centred at the origin. Scaling can be represented by the matrix
$$\begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix}$$
where $a$ is a constant. If $|a| > 1$, then it is an enlargement. If $|a| < 1$, then it is a contraction, which means the size decreases. A negative value of $a$ makes the object enlarge or contract in the opposite direction. In the case of the red box above, it will enlarge into the 3rd quadrant instead of the 1st. $a$ is also the factor of enlargement: $a = 2$ means that every length in the image is twice the corresponding length in the object, and $a = \frac{1}{2}$ means every length is halved.

5. Stretch

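A stretch looks similar to an enlargement, but this time the ratio of the sides, and hence the shape, is not preserved. It can be a stretch along the $x$-axis, along the $y$-axis, or along both axes with different proportions. A stretch is represented as below:
$$\begin{pmatrix} a & 0 \\ 0 & 1 \end{pmatrix} \quad \text{along the } x\text{-axis}$$
$$\begin{pmatrix} 1 & 0 \\ 0 & a \end{pmatrix} \quad \text{along the } y\text{-axis}$$
and $\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$ stretches along both axes. You probably could have guessed that a value with $|a| < 1$ turns the stretch into a compression, while a negative value of $a$ stretches the object the other way. A stretch taken from a point other than the origin gives an image of the same shape, only shifted; to get its exact position, use the method for a reference point that is not the origin, described below.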
6. Shear

A shear deforms a shape a little. It turns a square into a parallelogram, as shown above. It looks as if we are flattening something sideways. The shear can be represented by the matrices below:
$$\begin{pmatrix} 1 & \tan\theta \\ 0 & 1 \end{pmatrix} \quad \text{parallel to the } x\text{-axis}$$
$$\begin{pmatrix} 1 & 0 \\ \tan\theta & 1 \end{pmatrix} \quad \text{parallel to the } y\text{-axis}$$
$$\begin{pmatrix} 1 & \tan\theta \\ \tan\phi & 1 \end{pmatrix} \quad \text{2-way shear at different angles}$$
The angle is measured from the opposite axis. For example, the box above undergoes a shear parallel to the $x$-axis, and the angle is measured clockwise from the positive $y$-axis. If the angle was $45^\circ$, we say that it is a shear of $45^\circ$ parallel to the $x$-axis. Similarly, it can be a shear of $\theta$ parallel to the $y$-axis, which looks like the one below:

Like the other transformations above, a shear is taken relative to the origin.

WHEN THE REFERENCE POINT IS NOT THE ORIGIN


As I said earlier, these transformations are taken with respect to the origin: rotations, reflections, scaling and shears all have their reference points at the origin. To perform them about another point, we translate the reference point to the origin (translating the coordinates of the object together with it), do the transformation, then translate the coordinates back again. I don't know what the terminology for this is, since this is something I figured out myself. If the centre of rotation / scaling / shear is $(a, b)$, with $M$ as the transformation matrix, then $(x, y)$ is transformed as follows:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = M \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \begin{pmatrix} a \\ b \end{pmatrix}$$
In the case of a reflection, as I said earlier, the reflection matrix above applies only to lines passing through the origin, $y = mx$. To find the reflection of an object in the line $y = mx + c$, we take $(0, c)$ as the reference point to be subtracted and added. The transformation becomes
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = M \begin{pmatrix} x \\ y - c \end{pmatrix} + \begin{pmatrix} 0 \\ c \end{pmatrix}$$
You can try it out and see whether this is true. You will find that using any point $(a, b)$ on the line $y = mx + c$ as the reference point gives the correct answer, since translating by it moves the line onto $y = mx$, which passes through the origin.
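The "translate, transform, translate back" recipe is short in code. This Python sketch (hypothetical names; the two matrices are the rotation through $90^\circ$ anticlockwise and the reflection in $y = x$ from above) rotates $(2, 1)$ about $(1, 1)$ and reflects the origin in $y = x + 1$.

```python
def apply(M, point):
    x, y = point
    return (M[0][0] * x + M[0][1] * y, M[1][0] * x + M[1][1] * y)


def transform_about(M, point, ref):
    """Translate ref to the origin, apply M, then translate back."""
    a, b = ref
    x, y = point
    u, v = apply(M, (x - a, y - b))
    return (u + a, v + b)


ROT90 = [[0, -1], [1, 0]]          # rotation through 90° anticlockwise
REFLECT_Y_EQ_X = [[0, 1], [1, 0]]  # reflection in y = x

# Rotate (2, 1) through 90° anticlockwise about (1, 1).
print(transform_about(ROT90, (2, 1), (1, 1)))           # (1, 2)
# Reflect the origin in y = x + 1, using (0, c) = (0, 1) as the reference point.
print(transform_about(REFLECT_Y_EQ_X, (0, 0), (0, 1)))  # (-1, 1)
```

Using another point on the mirror line, such as $(1, 2)$, as the reference point gives the same image $(-1, 1)$.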

SIMILARITY TRANSFORMATION
Two square matrices $A$ and $B$ that are related by $A = P^{-1}BP$, where $P$ is a square non-singular matrix, are said to be similar. A transformation of the form $P^{-1}BP$ is called a similarity transformation, or conjugation by $P$. Try recalling what you learnt about similar triangles in Maths T. A similarity transformation simply means that the 2 transformations $A$ and $B$ are similar to each other: they describe the same transformation with respect to a different basis (or coordinate system). I don't have much information on this, so I won't elaborate much here (please share with me if you have good information on this, I will add it in here some day). However, if you are asked to find whether 2 matrices $A$ and $B$ are similar, write $PA = BP$ for an unknown matrix $P$ and solve: if the equations are consistent with some non-singular $P$, then $A$ and $B$ are similar; if not, they are not.
11.2 Matrix Representations
Knowing all the different types of transformation, we shall now do the algebra of transformations. Let's begin with a simple example:
Find and describe the image of the triangle $ABC$, where $A(1, 0)$, $B(2, 0)$ and $C(2, 3)$, under the transformation matrix
$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
The images are $A'(1, 0)$, $B'(2, 0)$ and $C'(2, -3)$. Plotting the new coordinates $A'$, $B'$ and $C'$, we find that the transformation is a reflection in the $x$-axis (or reflection in $Ox$).

A singular transformation in 2 dimensions maps every shape to either a line or a point, and a line may be mapped to a single point. In other words, the area of the object is reduced to zero. Consider the matrices below:

The first one maps all shapes to the line $y = x$. The second matrix maps all points to the $x$-axis, while the last one maps everything to the origin. You will know that a matrix $M$ is a singular matrix when $|M| = 0$. There is a way to tell whether a singular matrix maps to a line or to a point. Consider a singular matrix
$$M = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$
with column vectors $(a, b)$ and $(c, d)$. If both column vectors are zero, $(a, b) = (c, d) = (0, 0)$, then the matrix maps all shapes to a point, the origin.
If at least one column vector is non-zero (the columns are then parallel, $(a, b) \parallel (c, d)$, because $|M| = 0$), then the matrix maps all shapes to a line through the origin in the direction of that column.

AREA SCALE-FACTOR AND THE DETERMINANT


Throughout our discussion on transformations, we haven't discussed how a transformation affects the area of an object. We want to know whether a certain transformation enlarges or diminishes a certain object. It turns out that the determinant of the matrix of transformation tells us how the area changes. With the matrix of transformation $M$, we see that
$$\text{Area of object} \times \left|\det(M)\right| = \text{Area of image}$$
where we take the absolute value of the determinant (a negative determinant, as for a reflection, only means the orientation is reversed). In the case when $|M| = 0$, the transformation maps shapes to a line or a point, and the area is destroyed, which agrees with the part earlier on.
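A quick numerical check of the area scale-factor rule: the Python sketch below computes polygon areas with the shoelace formula (a standard formula swapped in here for convenience; it is not from these notes) before and after a transformation. The matrix $\begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$ with determinant 5 is a hypothetical choice; the triangle is $ABC$ from the earlier example.

```python
def shoelace_area(points):
    """Area of a polygon whose vertices are listed in order (shoelace formula)."""
    n = len(points)
    twice_area = 0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def apply(M, point):
    x, y = point
    return (M[0][0] * x + M[0][1] * y, M[1][0] * x + M[1][1] * y)


M = [[2, 1], [1, 3]]                       # hypothetical matrix, det = 5
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
triangle = [(1, 0), (2, 0), (2, 3)]        # triangle ABC from the earlier example
image = [apply(M, p) for p in triangle]
print(shoelace_area(triangle))             # 1.5
print(shoelace_area(image))                # 7.5 = 1.5 * |det|
```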

Invariant points are points which map to themselves under the transformation. This means that
$$M \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}$$
As you might have noticed, this reminds you of the chapter on eigenvalues and eigenvectors: here the eigenvalue is one. To find the invariant points of a transformation $M$, for example
$$M = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$
you substitute it into the equation above, and you get
$$x = x$$
$$-y = y$$
So this tells us that the invariant points of this transformation are all the points $(x, 0)$, or simply the points on the $x$-axis. Verify for yourself whether this is true.
An invariant line is a line that maps to the same line, although the points on it are not necessarily mapped to themselves. In our study, we first look for invariant lines through the origin; any invariant line that does not pass through the origin must be parallel to (have the same gradient as) an invariant line which does. To find the invariant lines under a certain transformation, we make use of the parametric form of the line, $x = t$, $y = mt$. We substitute these into the transformation and require the image to lie on a line of the same gradient:
$$M \begin{pmatrix} t \\ mt \end{pmatrix} = \begin{pmatrix} T \\ mT \end{pmatrix}$$
or, to make life easier, we write
$$M \begin{pmatrix} x \\ mx \end{pmatrix} = \begin{pmatrix} X \\ mX \end{pmatrix}$$
Note that the variable $x$ maps to another variable $X$, not to itself. I'll show you an example:
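Find the invariant lines of the transformation
$$M = \begin{pmatrix} 0 & 1 \\ 5 & -4 \end{pmatrix}$$
So we have two equations
$$mx = X$$
$$x(5 - 4m) = mX$$
Dividing the second equation by the first, we get $\frac{5 - 4m}{m} = m$, a quadratic equation
$$m^2 + 4m - 5 = 0, \qquad m = 1, -5$$
We have the lines $y = x$ and $y = -5x$.
You might want to test whether the lines $y = x + c$ or $y = -5x + c$ are invariant too. Substitute them back into the equation.
For $m = 1$, the point $(x, x + c)$ maps to $(X, Y)$, and we require $Y = X + c$:
$$x + c = X$$
$$5x - 4x - 4c = X + c$$
Substituting the first equation into the second gives $x - 4c = x + 2c$, so $c = 0$: the lines $y = x + c$ with $c \neq 0$ are not invariant.
For $m = -5$,
$$-5x + c = X$$
$$25x = -5X + 5c$$
Multiplying the first equation by $-5$ gives exactly the second, so both are just 1 equation. It holds whatever the value of $c$, and thus the lines $y = -5x + c$ are invariant lines.
The invariant lines are $y = x$ and $y = -5x + c$, where $c$ is an arbitrary constant.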
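The same working can be done symbolically for a general matrix $M = \begin{pmatrix} p & q \\ r & s \end{pmatrix}$. Substituting $(x, mx)$ gives the quadratic $qm^2 + (p - s)m - r = 0$, and for such an $m$ the image of $(x, mx + c)$ satisfies $Y - mX = (s - mq)c$, so every line $y = mx + c$ is invariant exactly when $s - mq = 1$. The Python sketch below encodes this derivation (hypothetical name; floating-point roots; it assumes $q \neq 0$ and ignores vertical lines $x = k$).

```python
import math


def invariant_lines(M):
    """Slopes of invariant lines y = mx + c of the 2x2 matrix M = [[p, q], [r, s]].

    Returns (m, every_c) pairs: every_c is True when y = mx + c is invariant for
    every c, and False when only y = mx (c = 0) is invariant.
    Assumes q != 0; vertical lines x = k are not covered by y = mx + c.
    """
    (p, q), (r, s) = M
    if q == 0:
        raise ValueError("this sketch assumes q != 0")
    # (x, mx) must map to (X, mX):  q m^2 + (p - s) m - r = 0
    disc = (p - s) ** 2 + 4 * q * r
    if disc < 0:
        return []                      # no real invariant lines of this form
    root = math.sqrt(disc)
    slopes = sorted({(-(p - s) + root) / (2 * q), (-(p - s) - root) / (2 * q)})
    # For such m, (x, mx + c) maps to a point with Y - mX = (s - m q) c,
    # so every c works exactly when s - m q = 1.
    return [(m, math.isclose(s - m * q, 1.0)) for m in slopes]


print(invariant_lines([[0, 1], [5, -4]]))  # [(-5.0, True), (1.0, False)]
```

This matches the worked example: $y = -5x + c$ is invariant for every $c$, while only $y = x$ works for gradient 1.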

TRANSFORMING LINES
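Knowing how to transform points, we shall now learn how to transform lines. As in the part on invariant lines, we substitute the parametric form of the line; for a line $y = mx + c$,
$$M \begin{pmatrix} x \\ mx + c \end{pmatrix} = \begin{pmatrix} X \\ Y \end{pmatrix}$$
then we eliminate $x$ to get an equation in terms of $X$ and $Y$.
Example,
Find the image of the line $y = 2 - 2x$ under the transformation
$$M = \begin{pmatrix} 2 & 1 \\ 4 & 2 \end{pmatrix}$$
We first substitute the line into the transformation,
$$\begin{pmatrix} 2 & 1 \\ 4 & 2 \end{pmatrix} \begin{pmatrix} x \\ 2 - 2x \end{pmatrix} = \begin{pmatrix} X \\ Y \end{pmatrix}$$
$$X = 2x + 2 - 2x = 2$$
$$Y = 4x + 4 - 4x = 4$$
The line is transformed into the point $(X, Y) = (2, 4)$.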

Notice that in this case, the line is transformed into a point. In other cases, if it transforms into another line, remember to find an equation that relates $X$ with $Y$. You should be aware that this is the very same method you would use to find the image of circles, parabolas, hyperbolas, ellipses or other curves: make use of their parametric equations and substitute them into the equation. Recall the parametric forms of these curves.
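For straight lines the substitution can be packaged once: the line $y = mx + c$ is the point $(0, c)$ plus multiples of the direction $(1, m)$, so its image is $M(0, c)$ plus multiples of $M(1, m)$. If that image direction is zero, the whole line collapses to a point. This Python sketch (hypothetical names, exact `Fraction` arithmetic) returns either the point or the image line's gradient and intercept.

```python
from fractions import Fraction


def image_of_line(M, m, c):
    """Image of the line y = m x + c under the 2x2 matrix M = [[p, q], [r, s]]."""
    (p, q), (r, s) = M

    def apply(x, y):
        return (p * x + q * y, r * x + s * y)

    P = apply(Fraction(0), Fraction(c))   # image of the point (0, c) on the line
    D = apply(Fraction(1), Fraction(m))   # image of the direction (1, m)
    if D == (0, 0):
        return ("point", P)               # the whole line collapses to P
    if D[0] == 0:
        return ("vertical line", P[0])    # X = constant
    slope = D[1] / D[0]
    return ("line", slope, P[1] - slope * P[0])   # Y = slope * X + intercept


print(image_of_line([[2, 1], [4, 2]], -2, 2))  # the point (2, 4), as in the example
print(image_of_line([[1, 0], [0, 2]], 1, 0))   # the line Y = 2X
```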

INVERSE TRANSFORMATION
I don't think I need to elaborate too much on this. An inverse transformation helps us to find the object when the image is given. You find the inverse of the matrix of transformation, and the equation becomes
$$\begin{pmatrix} x \\ y \end{pmatrix} = M^{-1} \begin{pmatrix} x' \\ y' \end{pmatrix}$$
From here you should recall that a singular transformation has no inverse. In other words, you can't find a matrix that transforms a single point back into 4 different points, or a line back into a pentagon.
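Recovering an object point from its image is just multiplication by $M^{-1} = \frac{1}{ps - qr}\begin{pmatrix} s & -q \\ -r & p \end{pmatrix}$ for $M = \begin{pmatrix} p & q \\ r & s \end{pmatrix}$. The Python sketch below (hypothetical names and example matrix) keeps the entries exact with `Fraction` and refuses singular matrices.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of a 2x2 matrix [[p, q], [r, s]] as exact fractions."""
    (p, q), (r, s) = M
    det = p * s - q * r
    if det == 0:
        raise ValueError("singular transformation: no inverse")
    return [[Fraction(s, det), Fraction(-q, det)],
            [Fraction(-r, det), Fraction(p, det)]]


def apply(M, point):
    x, y = point
    return (M[0][0] * x + M[0][1] * y, M[1][0] * x + M[1][1] * y)


M = [[2, 1], [1, 3]]                   # hypothetical transformation
print(apply(inverse_2x2(M), (7, 11)))  # recovers the object point (2, 3)
```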

ADDITION, SUBTRACTION, SCALAR MULTIPLICATION, COMPOSITION


The addition and subtraction of transformations $M$ and $N$ are defined by
$$M(\mathbf{x}) + N(\mathbf{x}) = (M + N)(\mathbf{x})$$
$$M(\mathbf{x}) - N(\mathbf{x}) = (M - N)(\mathbf{x})$$
Although they are defined like this, they have no geometrical meaning. For example, adding a matrix of rotation through $45^\circ$ to a matrix of reflection in the line $y = x$ gives you some awkward transformation, which doesn't really have a relation to either. But scalar multiplication of a matrix does mean something,
$$(cM)(\mathbf{x}) = c\,(M(\mathbf{x}))$$
as it has the effect of scaling. I assume you already know how to do both these operations, as they are covered in the chapter Matrices in Maths T. We are more interested in the composition of transformations. Given two transformations $M$ and $N$, if an object undergoes transformation $M$, then transformation $N$, it can be written as
$$\mathbf{x}' = N(M\mathbf{x}) = (NM)\mathbf{x}$$
Or we could also write it as $(N \circ M)(\mathbf{x}) = \mathbf{x}'$.

You probably remember from Form 4 that the transformation $NM$ means transform by $M$ first, then by $N$. This is quite straightforward, I think. In exams, you will be asked to find the matrix of the combined transformation of 2 or more transformations. Alternatively, you will be given the points of the object and the image, together with one of the transformations, and asked to find and describe the other, missing transformation. Just make use of what you learnt about Matrices.
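The order of composition matters. This Python sketch (hypothetical helper names) composes the rotation through $90^\circ$ anticlockwise, $M$, with the reflection in the $x$-axis, $N$, in both orders: $NM$ turns out to be the reflection in $y = -x$, while $MN$ is the reflection in $y = x$.

```python
def matmul(A, B):
    """Product AB of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def apply(M, point):
    x, y = point
    return (M[0][0] * x + M[0][1] * y, M[1][0] * x + M[1][1] * y)


M = [[0, -1], [1, 0]]   # rotation through 90° anticlockwise about O
N = [[1, 0], [0, -1]]   # reflection in the x-axis

NM = matmul(N, M)       # M first, then N
MN = matmul(M, N)       # N first, then M
print(NM)               # [[0, -1], [-1, 0]]  (reflection in y = -x)
print(MN)               # [[0, 1], [1, 0]]    (reflection in y = x)
print(apply(NM, (1, 2)) == apply(N, apply(M, (1, 2))))  # True
```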
12.1 3D Vectors
This chapter will be a continuation and combination of what you learnt in the chapters Coordinate Geometry and Vectors. As we come into 3 dimensions, we make use of vectors, as they make our analysis much easier. Here, we introduce the coordinate system for three-dimensional space. The study of 3-dimensional space leads us to the setting for the calculus of functions of two and three variables, which you will study later in university.
We set up the 3D coordinate system by fixing a point $O$ in space (called the origin) and taking three lines passing through $O$ that are perpendicular to each other. These lines are labelled as the $x$-axis, $y$-axis and $z$-axis respectively. The direction of the $z$-axis is determined by the right-hand rule:

I think you should be familiar with this rule from Physics. When your fingers point in the direction of the $x$-axis and curl towards the $y$-axis, your thumb points along the $z$-axis. Try to get used to this setting: with the $z$-axis pointing upwards, $x$ on the left, $y$ on the right.
A point $P$ in space can be represented by an ordered triple $(a, b, c)$, where $a$, $b$ and $c$ are the projections of the point $P$ onto the $x$-, $y$- and $z$-axis respectively. The three-dimensional space is also called $xyz$-space.
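You probably know how we represent a vector in 3D. Using the same convention of unit vectors $\mathbf{i}$ and $\mathbf{j}$, we just add one more, $\mathbf{k}$, to represent the unit vector in the $z$ direction (e.g., $2\mathbf{i} + 3\mathbf{j} - 5\mathbf{k}$). Everything about a vector in 2D works in much the same way in 3D. The length of the position vector of $P(a, b, c)$ follows the Pythagorean relation
$$|\overrightarrow{OP}| = \sqrt{a^2 + b^2 + c^2}$$
And similarly, the distance between 2 points $A(a_1, a_2, a_3)$ and $B(b_1, b_2, b_3)$ with position vectors $\mathbf{a}$ and $\mathbf{b}$ can be found by the equation
$$|AB| = |\mathbf{b} - \mathbf{a}| = \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2 + (b_3 - a_3)^2}$$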
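Let's do a little revision on the properties of vectors: scalar multiplication, addition, subtraction and so on. We let $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ be 3 vectors, and $k$ and $h$ be 2 constants; then we have
(1) $\mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a}$
(2) $\mathbf{a} + (\mathbf{b} + \mathbf{c}) = (\mathbf{a} + \mathbf{b}) + \mathbf{c}$
(3) $\mathbf{a} + \mathbf{0} = \mathbf{a}$
(4) $\mathbf{a} + (-\mathbf{a}) = \mathbf{0}$
(5) $k(\mathbf{a} + \mathbf{b}) = k\mathbf{a} + k\mathbf{b}$
(6) $(k + h)\mathbf{a} = k\mathbf{a} + h\mathbf{a}$
(7) $(kh)\mathbf{a} = k(h\mathbf{a})$
(8) $1\mathbf{a} = \mathbf{a}$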

SCALAR PRODUCT
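Scalar product, also known as the dot product, is a multiplication of 2 vectors $(a, b, c)$ and $(d, e, f)$ such that
$$(a, b, c) \cdot (d, e, f) = ad + be + cf$$
The scalar product yields an answer in the form of a scalar, which is a value instead of a vector. In trigonometric form, it can be written as
$$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}||\mathbf{b}| \cos\theta$$
where $\theta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$. I believe all these are not new to you, as you have studied them in Maths T. However, in this section, we will go into quite some detail on the algebra of vectors, unlike in Maths T where you focused more on the applications, namely resultant force / velocity and relative velocity. Let us look at the properties of scalar products. Given that $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are vectors and $d$ is a constant, we have
(i) $\mathbf{a} \cdot \mathbf{b} = \mathbf{b} \cdot \mathbf{a}$ (commutativity)
(ii) $\mathbf{a} \cdot (\mathbf{b} + \mathbf{c}) = \mathbf{a} \cdot \mathbf{b} + \mathbf{a} \cdot \mathbf{c}$ (distributive law)
(iii) $(d\mathbf{a}) \cdot \mathbf{b} = d(\mathbf{a} \cdot \mathbf{b}) = \mathbf{a} \cdot (d\mathbf{b})$
(iv) $\mathbf{0} \cdot \mathbf{a} = 0$
(v) $\mathbf{a} \cdot \mathbf{a} = |\mathbf{a}|^2$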
We say that two vectors are orthogonal to each other when they are perpendicular to each other. Two non-zero vectors a and b are orthogonal if and only if a · b = 0. In 3D, we say that a vector a is orthogonal to vectors b and c if a is perpendicular to both b and c.
The component of b onto a (or scalar projection) is the resolved part of b in the direction of a. This means that when we have 2 vectors a and b pointing in 2 different directions, with their tails joined together, the component of b onto a is the length of the orthogonal projection of b onto a.
We write comp_a b for the component of b onto a, and mathematically it has the value
$$\operatorname{comp}_{\mathbf{a}}\mathbf{b} = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}|}$$
(it is negative when the angle between a and b is obtuse). According to the picture above, it is the length of PS.


The vector projection of b onto a is just the vector PS itself. It has the formula
$$\operatorname{proj}_{\mathbf{a}}\mathbf{b} = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}|^2}\,\mathbf{a}$$
We write proj_a b for the projection of b onto a. Remember that the answer is a VECTOR, not just a VALUE.
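The component and projection formulas can be checked with a short sketch (plain Python tuples; `dot`, `comp` and `proj` are hypothetical helper names). It also confirms that what is left of b after removing its projection is perpendicular to a:

```python
import math


def dot(u, v):
    """Scalar (dot) product of two 3D vectors."""
    return sum(x * y for x, y in zip(u, v))


def comp(a, b):
    """Scalar component of b onto a: (a . b) / |a| (negative if the angle is obtuse)."""
    return dot(a, b) / math.sqrt(dot(a, a))


def proj(a, b):
    """Vector projection of b onto a: ((a . b) / |a|^2) a."""
    k = dot(a, b) / dot(a, a)
    return tuple(k * x for x in a)


a, b = (3, 0, 4), (2, 5, 1)   # a . b = 10, |a| = 5
print(comp(a, b))             # 2.0
print(proj(a, b))             # approximately (1.2, 0.0, 1.6)

# b minus its projection onto a is perpendicular to a (up to rounding)
residual = tuple(bi - pi for bi, pi in zip(b, proj(a, b)))
print(abs(dot(a, residual)) < 1e-12)  # True
```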
For a vector a = (a1, a2, a3), the direction ratio is written as a1 : a2 : a3, and you may give it in simplest form (divided by the highest common factor). The direction cosines of the vector a are
$$\frac{a_1}{|\mathbf{a}|},\qquad \frac{a_2}{|\mathbf{a}|},\qquad \frac{a_3}{|\mathbf{a}|}$$
respectively.
The angle γ between the vector and the z-axis can be found using the equation
$$\cos\gamma = \frac{a_3}{|\mathbf{a}|}$$
and in the same way you can deduce the angles between the vector and the x-axis and the y-axis.
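A small sketch of direction ratios, direction cosines and the angles with the axes. The helper names are my own, and integer components are assumed for the ratio. As a handy check, the three direction cosines squared always add up to 1:

```python
import math
from functools import reduce


def direction_ratio(a):
    """Direction ratio a1 : a2 : a3 in simplest form (integers, not all zero)."""
    h = reduce(math.gcd, a)        # highest common factor of the components
    return tuple(x // h for x in a)


def direction_cosines(a):
    """(a1/|a|, a2/|a|, a3/|a|)."""
    n = math.sqrt(sum(x * x for x in a))
    return tuple(x / n for x in a)


def angles_with_axes(a):
    """Angles in degrees between a and the x-, y- and z-axes."""
    return tuple(math.degrees(math.acos(c)) for c in direction_cosines(a))


a = (4, 6, 12)                  # |a| = 14
print(direction_ratio(a))       # (2, 3, 6)
print(direction_cosines(a))     # 2/7, 3/7 and 6/7 as floats
print(angles_with_axes(a))      # the angle with the z-axis is about 31.0 degrees
```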
Recalling that the dot product of 2 vectors is a · b = |a||b| cos θ, we can easily find the angle between 2 vectors,
$$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|}$$
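The angle formula translates directly into code. This is a sketch with a hypothetical `angle_between` helper; the clamp to [−1, 1] is my addition, to stop rounding errors from pushing the cosine just outside the domain of `acos`:

```python
import math


def angle_between(u, v):
    """Angle in degrees between u and v, from cos(theta) = u.v / (|u||v|)."""
    d = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    c = max(-1.0, min(1.0, d / (nu * nv)))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(c))


print(angle_between((1, 0, 0), (1, 1, 0)))    # about 45 degrees
print(angle_between((1, 2, -1), (3, -1, 1)))  # about 90 degrees: the dot product is 3 - 2 - 1 = 0
```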
VECTOR PRODUCT
Also known as the cross product, the vector product is something new for you, as it only makes sense for vectors in 3D space, not in a 2D plane. We define the vector product of 2 vectors (a, b, c) and (d, e, f) to be
$$(a, b, c) \times (d, e, f) = (bf - ce,\; cd - af,\; ae - bd)$$
The cross product yields a vector (it has a magnitude and a direction), which is orthogonal to both the original vectors. Its magnitude is |a × b| = |a||b| sin θ, where θ is the angle between a and b.

You can use the right-hand rule to determine the direction of the cross product. Point your fingers in the direction of a and curl them towards the direction of b; your thumb then points in the direction of a × b. This information is very important when we come to the section on planes.
Unlike the dot product, any vector crossed with itself yields the zero vector:
i × i = 0, j × j = 0, k × k = 0
In other words, the cross product of 2 parallel vectors is zero (since sin 0 = 0). You can use your right-hand rule to verify this. For the unit vectors, you also get the following results:
i × j = k, j × k = i, k × i = j
j × i = −k, k × j = −i, i × k = −j
We shall now look at the properties of the cross product. If a, b and c are vectors and d is a scalar, then
(i) a × b = −(b × a)
(ii) (da) × b = d(a × b) = a × (db)
(iii) a × (b + c) = a × b + a × c
(iv) (a + b) × c = a × c + b × c
(v) a · (b × c) = (a × b) · c
(vi) a × (b × c) = (a · c)b − (a · b)c
(vii) (a × b) · a = 0
Property (vi) is probably the hardest to remember. (vii) holds because a × b is orthogonal to a, and the dot product of 2 orthogonal vectors equals zero. Also take note that the cross product is not commutative: swapping a and b introduces a minus sign, as (i) shows.
The cross product has many applications, especially in physics, where you use it to find torque, magnetic force, etc. In geometry, the area of a triangle whose vertices have position vectors a, b and c is
$$\text{Area} = \tfrac{1}{2}\,\big|(\mathbf{b} - \mathbf{a}) \times (\mathbf{c} - \mathbf{a})\big|$$
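The sketch below implements the component formula for the cross product and checks the unit-vector results, properties (i), (vi) and (vii), and the triangle-area formula on small integer examples. The helper names are hypothetical; integers are used so that every equality check is exact:

```python
def cross(u, v):
    """Vector product (a, b, c) x (d, e, f) = (bf - ce, cd - af, ae - bd)."""
    a, b, c = u
    d, e, f = v
    return (b * f - c * e, c * d - a * f, a * e - b * d)


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def triangle_area(A, B, C):
    """Area of the triangle with vertices A, B, C: half of |(B - A) x (C - A)|."""
    n = cross(sub(B, A), sub(C, A))
    return 0.5 * dot(n, n) ** 0.5


i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
assert cross(i, j) == k and cross(j, k) == i and cross(k, i) == j

a, b, c = (1, 2, 3), (4, 5, 6), (-2, 0, 7)
print(cross(a, b))                                    # (-3, 6, -3)
assert cross(a, b) == tuple(-x for x in cross(b, a))  # property (i)
assert dot(cross(a, b), a) == 0                       # property (vii)

# property (vi): a x (b x c) = (a . c) b - (a . b) c
lhs = cross(a, cross(b, c))
rhs = sub(tuple(dot(a, c) * x for x in b), tuple(dot(a, b) * x for x in c))
assert lhs == rhs

print(triangle_area((0, 0, 0), (3, 0, 0), (0, 4, 0)))  # 6.0
```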
The scalar triple product of vectors a, b and c is a · (b × c). As you might have noticed, you have to do the cross product first, before the dot product. If you did the dot product first, you would get a scalar crossed with a vector, which by definition does not exist. Note also that a · (b × c) = (a × b) · c. We can evaluate a · (b × c) using a determinant,
$$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$$
where a = (a1, a2, a3), b = (b1, b2, b3) and c = (c1, c2, c3). We use the scalar triple product to find the volumes of various solids. Since |b × c| is the area of the parallelogram spanned by b and c (the base of the solid), dotting b × c with another vector a multiplies this base area by |a| cos φ, where φ is the angle between a and b × c; the absolute value of |a| cos φ is exactly the perpendicular height. So the formulas for different solids are as below:

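1. volume of parallelepiped (including the cuboid): |a · (b × c)|
2. volume of tetrahedron: (1/6)|a · (b × c)|
3. volume of triangular prism: (1/2)|a · (b × c)|
4. volume of pyramid (with a parallelogram base spanned by b and c): (1/3)|a · (b × c)|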
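A sketch of the scalar triple product and the four volume formulas, checking that the "cross first, then dot" computation agrees with the determinant and with property (v). The helper names are hypothetical, and the vectors (1, 0, 0), (0, 2, 0), (0, 0, 3) span a 1 × 2 × 3 cuboid:

```python
def cross(u, v):
    a, b, c = u
    d, e, f = v
    return (b * f - c * e, c * d - a * f, a * e - b * d)


def triple(a, b, c):
    """Scalar triple product a . (b x c): cross product first, then dot product."""
    return sum(x * y for x, y in zip(a, cross(b, c)))


def det3(a, b, c):
    """The 3x3 determinant with rows a, b, c (expanded along the first row)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


a, b, c = (1, 0, 0), (0, 2, 0), (0, 0, 3)
V = abs(triple(a, b, c))
assert V == abs(det3(a, b, c))
print(V)       # parallelepiped (here a 1 x 2 x 3 cuboid): 6
print(V / 6)   # tetrahedron: 1.0
print(V / 2)   # triangular prism: 3.0
print(V / 3)   # pyramid: 2.0
```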
12.2 Straight Lines


Straight lines in 3 dimensions aren't as easy to describe as they are in 2 dimensions. To construct a straight line in space, we need to know the direction it points in, and at least one point that it passes through.

EQUATION OF A LINE
Let r be the position vector of a general point on a line in xyz-space, let a and b be 2 vectors, and let t be an arbitrary scalar (a parameter). The vector equation of a line can be represented by the equation
$$\mathbf{r} = \mathbf{a} + t\mathbf{b}$$
The vector a = (x0, y0, z0) is a position vector: a point in space that the line passes through. The vector b is a direction vector; it determines the direction of the line. As t runs through all real numbers, a + tb runs through every point of the line, and any non-zero scalar multiple of the direction vector is an equally valid direction vector. Summing up, with b = (p, q, r) you actually get this:
$$\mathbf{r} = (x_0, y_0, z_0) + t\,(p, q, r)$$
You need some visualization here. Look at the diagram below. The green line L first needs a point a in space. Then you need a direction vector b to tell you where the line extends to. So if you analyse carefully, the equation of a line is not unique. You can use infinitely many different position vectors, or infinitely many direction vectors in the same ratio, to construct different line equations which all refer to the same line. This is unlike the familiar 2D form y = mx + c, which is unique for a given (non-vertical) line.

You might also have noticed that the vector equation of a line is actually a parametric equation of a line. If you break it down,
$$x = x_0 + tp,\qquad y = y_0 + tq,\qquad z = z_0 + tr$$
Here (x0, y0, z0) is the position vector a, and (p, q, r) is the direction vector b. Now you can probably figure out why the line equation is not unique, since parametric equations are not unique. By the way, we can also write the vector equation as r = x0i + y0j + z0k + t(pi + qj + rk). I don't like this method, as we waste too much time writing the i, j, k's and +/− signs.
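Now if we make t the subject of each of the 3 parametric equations and equate them, we get the cartesian equation:
$$\frac{x - x_0}{p} = \frac{y - y_0}{q} = \frac{z - z_0}{r} \;(= t)$$
We normally write this chunk of equalities without the "= t"; I only show it here for clarity. The cartesian equation of a line in 3D space has 2 equals signs. So what if p, or q, or both are 0? We cannot divide by zero, so the corresponding coordinate is simply constant. For example, if p = 0 the line is x = x0, (y − y0)/q = (z − z0)/r, and if p = q = 0 the line is x = x0, y = y0 (a line parallel to the z-axis).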
You might want to substitute it back into the vector equation to check this. You can probably guess now why we prefer the vector equation to the cartesian equation. With all this information, you should be able to construct a line equation given only 2 points it passes through: use one point as the position vector, and the difference of the two points as the direction vector.
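A minimal sketch of building r = a + tb from two points, plus the "substitute and check that t is consistent" test used in the next section. The function names are hypothetical, and integer coordinates are assumed so that `Fraction` gives exact values of t:

```python
from fractions import Fraction


def line_through(P, Q):
    """Vector equation r = a + t b of the line through points P and Q (P != Q)."""
    a = tuple(P)
    b = tuple(q - p for p, q in zip(P, Q))   # direction vector Q - P
    return a, b


def lies_on_line(point, a, b):
    """True if point = a + t b for one single value of t."""
    t = None
    for x, ai, bi in zip(point, a, b):
        if bi == 0:
            if x != ai:           # this coordinate is fixed along the line
                return False
        else:
            ti = Fraction(x - ai, bi)
            if t is None:
                t = ti
            elif ti != t:         # the values of t are inconsistent
                return False
    return True


a, b = line_through((1, 2, 3), (4, 0, 5))
print(a, b)                             # (1, 2, 3) (3, -2, 2)
print(lies_on_line((7, -2, 7), a, b))   # True (t = 2)
print(lies_on_line((1, 2, 4), a, b))    # False
```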

SKEW, PARALLEL, INTERSECT?


In 2D, lines are either parallel to each other, or they intersect. However, in 3D there exists another relationship between 2 lines: they do not intersect and are not parallel to each other. These lines are called skew lines.
Our question is this: how do we determine whether 2 lines are parallel, intersect one another, or are skew?
To show that 2 lines are parallel, we show that their direction vectors are the same (or scalar multiples of each other). The 2 lines below

are parallel, because they have the same direction vector. You can further check whether the lines coincide (that is, whether they are both the same line). To do this, we take the point (1, 2, 3) on the first line and substitute it for (x, y, z) in the second equation. Doing some algebra, we find that the values of s from the 3 parametric equations are not consistent. Therefore, the lines do not coincide: they are distinct parallel lines. This method also tells us whether a particular point lies on a line. So here we see that the point (1, 2, 3) does not lie on the second line.
To find out whether 2 lines intersect, we equate the equation of line 1 with that of line 2, which gives us 3 equations. Consider the two lines below:
$$\mathbf{r}_1 = (-3, -5, -4) + t\,(4, 3, 1), \qquad \mathbf{r}_2 = (0, -9, 13) + s\,(1, 2, -3)$$
We have
−3 + 4t = s
−5 + 3t = −9 + 2s
−4 + t = 13 − 3s
If we can find values of s and t that satisfy all 3 equations, the lines intersect. If the values of s and t found from two of the equations contradict the third, then the (non-parallel) lines are skew. We can further find the point of intersection by substituting the values of s and t back into the line equations. Here, the first two equations give t = 2 and s = 5, which also satisfy the third (−4 + 2 = −2 = 13 − 15), so the point of intersection is (5, 1, −2).
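The whole procedure (parallel test, coincidence test, solving two equations for t and s, then checking the third) can be sketched as below. `classify_lines` is a hypothetical name; it uses exact `Fraction` arithmetic and assumes integer or `Fraction` components:

```python
from fractions import Fraction
from itertools import combinations


def cross(u, v):
    a, b, c = u
    d, e, f = v
    return (b * f - c * e, c * d - a * f, a * e - b * d)


def classify_lines(a, b, c, d):
    """Compare r1 = a + t b with r2 = c + s d."""
    ca = tuple(ci - ai for ai, ci in zip(a, c))        # c - a
    if cross(b, d) == (0, 0, 0):                        # same direction: parallel
        same = cross(ca, b) == (0, 0, 0)                # does c lie on line 1?
        return ("coincident" if same else "parallel"), None
    # Solve t b - s d = c - a from two components whose 2x2 determinant is non-zero
    for i, j in combinations(range(3), 2):
        det = -b[i] * d[j] + d[i] * b[j]
        if det != 0:
            t = Fraction(-ca[i] * d[j] + d[i] * ca[j], det)
            s = Fraction(b[i] * ca[j] - ca[i] * b[j], det)
            break
    # The remaining equation decides between "intersect" and "skew"
    if all(a[k] + t * b[k] == c[k] + s * d[k] for k in range(3)):
        return "intersect", tuple(a[k] + t * b[k] for k in range(3))
    return "skew", None


print(classify_lines((-3, -5, -4), (4, 3, 1), (0, -9, 13), (1, 2, -3)))  # intersect at (5, 1, -2)
print(classify_lines((0, 0, 0), (1, 0, 0), (0, 1, 1), (0, 1, 0)))        # ('skew', None)
```

Because b × d is non-zero for non-parallel lines, at least one pair of components always gives a non-zero 2×2 determinant, so t and s are always found.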

DISTANCE FROM POINT TO LINE


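Given a line r1 = a + tb and a point with position vector r2, to find the distance from the point to the line, we make use of the sine of the angle θ between the line r1 and the vector (r2 − a). Look at the diagram below.
Recalling that |u × v| = |u||v| sin θ, the distance between the line and the point r2 is
$$D = |\mathbf{r}_2 - \mathbf{a}|\sin\theta = \frac{|(\mathbf{r}_2 - \mathbf{a}) \times \mathbf{b}|}{|\mathbf{b}|}$$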
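A direct sketch of the point-to-line formula (hypothetical helper names). Scaling the direction vector does not change the answer, as expected. For two parallel lines r1 = a + tb and r2 = c + sb, the same function gives their distance if you pass c as `p`:

```python
import math


def cross(u, v):
    a, b, c = u
    d, e, f = v
    return (b * f - c * e, c * d - a * f, a * e - b * d)


def point_line_distance(p, a, b):
    """Distance from point p to the line r = a + t b: |(p - a) x b| / |b|."""
    n = cross(tuple(pi - ai for pi, ai in zip(p, a)), b)
    return math.sqrt(sum(x * x for x in n)) / math.sqrt(sum(x * x for x in b))


print(point_line_distance((3, 4, 0), (0, 0, 0), (1, 0, 0)))  # 4.0
print(point_line_distance((3, 4, 0), (0, 0, 0), (2, 0, 0)))  # 4.0 again
```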
DISTANCE BETWEEN 2 LINES


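To find the distance between 2 lines, we have 2 situations:
1. the lines are parallel
Given the two lines r1 = a + tb and r2 = c + sb, we can make use of what we learnt in the part above, and find that the distance between these 2 lines is just the distance from the point c to the first line,
$$D = \frac{|(\mathbf{c} - \mathbf{a}) \times \mathbf{b}|}{|\mathbf{b}|}$$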
2. the lines are skew
Given the 2 lines r1 = a + tb and r2 = c + sd, the shortest distance between them can be found through the equation
$$(\mathbf{c} + s\mathbf{d}) - (\mathbf{a} + t\mathbf{b}) = k\,(\mathbf{b} \times \mathbf{d})$$
where k is a constant. Let me explain this a little. The shortest distance between the two lines is the length of r2 − r1 for the right choice of s and t. This shortest connecting vector is perpendicular to both lines, so it is parallel to the normal vector b × d, and that is why we write it as k(b × d). So after setting up the equation c + sd − a − tb = k(b × d), we actually have 3 equations (one per component) in the 3 variables t, s and k. From here, we solve for s, t and k, and multiply |k| by the magnitude of b × d,
$$D = |k|\,|\mathbf{b} \times \mathbf{d}|$$
and thus you get the shortest distance between 2 skew lines.
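A sketch of the k-method above: it writes (c + sd) − (a + tb) = k(b × d) as the 3×3 linear system −tb + sd − k(b × d) = a − c and solves it by Cramer's rule with exact fractions (my choice of solver; any linear solver would do). Integer components and non-parallel lines are assumed, and `skew_distance` is a hypothetical name:

```python
import math
from fractions import Fraction


def cross(u, v):
    a, b, c = u
    d, e, f = v
    return (b * f - c * e, c * d - a * f, a * e - b * d)


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def skew_distance(a, b, c, d):
    """Shortest distance between r1 = a + t b and r2 = c + s d (non-parallel)."""
    n = cross(b, d)
    rows = [[-b[i], d[i], -n[i]] for i in range(3)]   # unknowns: t, s, k
    rhs = [a[i] - c[i] for i in range(3)]
    D = det3(rows)                                   # non-zero: b, d, b x d independent
    t, s, k = (
        Fraction(det3([r[:j] + [rhs[i]] + r[j + 1:] for i, r in enumerate(rows)]), D)
        for j in range(3)
    )
    return abs(k) * math.sqrt(sum(x * x for x in n))


print(skew_distance((0, 0, 0), (1, 0, 0), (0, 1, 1), (0, 1, 0)))  # 1.0
```

As a cross-check, dotting both sides of the equation with b × d gives k|b × d|² = (c − a) · (b × d), so the result also equals |(c − a) · (b × d)| / |b × d|.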

ANGLE BETWEEN 2 LINES


Recalling the formula you learnt in the previous section,
$$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|}$$
you can find the angle between two lines by substituting a and b as the direction vectors of the two lines. This shouldn't be a problem for you, I think.
12.3 Planes
A plane is simply a flat surface in space. We start by introducing the vector equation of a plane,
$$\mathbf{r} = \mathbf{a} + s\mathbf{b} + t\mathbf{c}$$
where a is a position vector, b and c are 2 non-parallel direction vectors, and s and t are 2 arbitrary scalars. Consider the diagram below.
We need at least 2 direction vectors to show the orientation of the plane, and then a point to know where exactly the plane lies. We multiply the 2 direction vectors by different constants, to show that any vector proportional to either of them is also a direction vector. Similarly, this form of the plane equation is not unique. Again, this form can be written in the ijk form, which looks ugly and long.
There is another vector equation of the plane. Though it is not named properly, I call it the normal form. We first find the normal vector of the plane, i.e., a vector which is perpendicular to both direction vectors. You obtain the normal vector by taking the cross product of b and c. Suppose that the normal vector is n = (a, b, c); the normal form of the equation will be
$$\mathbf{r} \cdot \mathbf{n} = d, \qquad\text{i.e.}\qquad (x, y, z) \cdot (a, b, c) = d$$
where d is a constant which determines the position of the plane. d has a significant meaning. If the normal vector (a, b, c) is a unit vector (magnitude = 1), then |d| is the perpendicular distance from the plane to the origin. For 2 planes with the same normal vector, if their values of d have opposite signs, they lie on opposite sides of the origin. Finding the value of d is simple: just plug a point lying on the plane into x, y, z.
If we evaluate the dot product above, we get the cartesian form,
$$ax + by + cz = d$$
Unlike the other forms, this cartesian form is unique up to multiplying the whole equation by a non-zero constant. It is the most common form of the equation of a plane. You can see that this equation is linear, and that equations such as y = mx + c or x = a are also equations of planes in 3-dimensional space (the missing variable can take any value).
So to sum up, to construct a plane equation, you need one of the following sets of information:
1. 3 points lying on the plane (not all on one line).
2. 2 points lying on the plane, and 1 direction vector (not parallel to the line through the 2 points).
3. 2 lines lying on the plane.
4. a point lying on the plane, and the normal vector of the plane.
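There is a fast way to get the equation of the plane when 3 points (x1, y1, z1), (x2, y2, z2) and (x3, y3, z3) are given. I haven't tried this before, but you could make use of the determinant below to find your equation:
$$\begin{vmatrix} x - x_1 & y - y_1 & z - z_1 \\ x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\ x_3 - x_1 & y_3 - y_1 & z_3 - z_1 \end{vmatrix} = 0$$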
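A sketch of constructing the plane from 3 points via the normal vector n = (B − A) × (C − A). Expanding the determinant above gives exactly (P − A) · n = 0, so this is the same equation written as ax + by + cz = d. The name `plane_through` is hypothetical:

```python
def cross(u, v):
    a, b, c = u
    d, e, f = v
    return (b * f - c * e, c * d - a * f, a * e - b * d)


def plane_through(A, B, C):
    """Plane ax + by + cz = d through A, B, C, with normal n = (B - A) x (C - A)."""
    AB = tuple(q - p for p, q in zip(A, B))
    AC = tuple(q - p for p, q in zip(A, C))
    n = cross(AB, AC)
    if n == (0, 0, 0):
        raise ValueError("the three points lie on one line")
    d = sum(ni * ai for ni, ai in zip(n, A))   # plug a point on the plane into r . n
    return n, d


print(plane_through((1, 0, 0), (0, 1, 0), (0, 0, 1)))  # ((1, 1, 1), 1), i.e. x + y + z = 1
```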
LINE LIES IN / PARALLEL / INTERSECT A PLANE


We shall now discuss how to determine whether a line lies in / is parallel to / intersects a plane. Let the equations of the line and the plane be
$$\mathbf{r}_1 = \mathbf{a} + t\mathbf{b}, \qquad \mathbf{r}_2 \cdot \mathbf{n} = d$$
We first check whether the direction vector of the line is parallel to the plane. In other words, we want to know whether the direction vector of the line is perpendicular to the normal vector of the plane. If b · n = 0, then the line is parallel to the plane. We might want to know whether the parallel line actually lies in the plane. We can do this by substituting the position vector a of the line for r2: if LHS = RHS, i.e. a · n = d, then the line lies in the plane; otherwise it is parallel to the plane without touching it.
So if b · n ≠ 0, the line definitely intersects the plane, at exactly one point. The point of intersection can be found by letting r1 = r2, that is,
$$(\mathbf{a} + t\mathbf{b}) \cdot \mathbf{n} = d$$
This is a single linear equation in t, giving t = (d − a · n)/(b · n). Finally, to find the point of intersection, we substitute t back into the line equation to find (x, y, z).
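The three cases above fit in one small function. This is a sketch with a hypothetical name `line_meets_plane`, using exact fractions for t and assuming integer or `Fraction` components:

```python
from fractions import Fraction


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def line_meets_plane(a, b, n, d):
    """Line r = a + t b against the plane r . n = d."""
    bn = dot(b, n)
    if bn == 0:                              # direction perpendicular to the normal
        return ("lies in plane" if dot(a, n) == d else "parallel"), None
    t = Fraction(d - dot(a, n), bn)          # from (a + t b) . n = d
    return "intersects", tuple(ai + t * bi for ai, bi in zip(a, b))


print(line_meets_plane((1, 2, 3), (1, 1, 1), (1, 1, 1), 12))  # intersects at (3, 4, 5), t = 2
print(line_meets_plane((1, 2, 3), (1, -1, 0), (1, 1, 1), 6))  # ('lies in plane', None)
print(line_meets_plane((1, 2, 3), (1, -1, 0), (1, 1, 1), 7))  # ('parallel', None)
```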

PLANE PARALLEL TO / INTERSECTS ANOTHER PLANE


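Since the cartesian equation is unique (up to a constant multiple), 2 planes coincide only if their equations are the same after scaling. 2 planes are parallel only if their normal vectors are parallel (scalar multiples of each other), which is also easy to check. Planes that are not parallel have to intersect somewhere, along a line, and we can determine this line of intersection.
Consider the 2 plane equations below, with n1 = (a1, b1, c1) and n2 = (a2, b2, c2):
$$\mathbf{r} \cdot \mathbf{n}_1 = d_1, \qquad \mathbf{r} \cdot \mathbf{n}_2 = d_2$$
We first find the common direction by using
$$\mathbf{b} = \mathbf{n}_1 \times \mathbf{n}_2$$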
This will be the direction vector of the line of intersection, since it is perpendicular to both normal vectors. To find a position vector of the line, we make use of the cartesian equations of both planes,
$$a_1x + b_1y + c_1z = d_1, \qquad a_2x + b_2y + c_2z = d_2$$
We need to solve this system of linear equations for x, y and z. Recalling the chapter on Matrices, this system of equations has infinitely many solutions. As usual, let one of the variables be t, solve for the other two in terms of t, and then substitute any convenient value of t to get a position vector on the line. The line equation is thus found.
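A sketch of this procedure with a slight shortcut: instead of letting a variable be t, it sets one coordinate to 0 (equivalent to choosing a convenient value of t). It picks a coordinate whose component of n1 × n2 is non-zero, because that component is exactly the determinant of the remaining 2×2 system, so the system is guaranteed to be solvable. `planes_intersection` is a hypothetical name:

```python
from fractions import Fraction


def cross(u, v):
    a, b, c = u
    d, e, f = v
    return (b * f - c * e, c * d - a * f, a * e - b * d)


def planes_intersection(n1, d1, n2, d2):
    """Line of intersection of r . n1 = d1 and r . n2 = d2, as (point, direction)."""
    b = cross(n1, n2)                     # common direction, perpendicular to both normals
    if b == (0, 0, 0):
        raise ValueError("planes are parallel or coincident")
    m = next(idx for idx in range(3) if b[idx] != 0)
    i, j = (m + 1) % 3, (m + 2) % 3       # b[m] = n1[i] n2[j] - n1[j] n2[i]
    det = b[m]
    point = [0, 0, 0]                     # coordinate m set to 0; solve for coordinates i, j
    point[i] = Fraction(d1 * n2[j] - n1[j] * d2, det)
    point[j] = Fraction(n1[i] * d2 - d1 * n2[i], det)
    return tuple(point), b


# x + y + z = 6 and x - y + z = 2
print(planes_intersection((1, 1, 1), 6, (1, -1, 1), 2))  # point (0, 2, 4), direction (2, 0, -2)
```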

DISTANCE FROM A POINT / PARALLEL LINE TO A PLANE


I won't prove this one, as it is similar to the proof in 2D. To find the distance from a point (x1, y1, z1) to the plane ax + by + cz = d, make use of the equation in your Data Booklet:
$$D = \frac{|ax_1 + by_1 + cz_1 - d|}{\sqrt{a^2 + b^2 + c^2}}$$
Notice that there is something different in my equation: it has −d instead of +d, because I used the cartesian equation ax + by + cz = d instead of ax + by + cz + d = 0. Please DO NOT CONFUSE THEM.
If you want to find the distance between a line and a plane parallel to it (note that the line has to be parallel to the plane; otherwise it meets the plane and the distance is zero), you substitute the position vector of the line for (x1, y1, z1) in the above equation.
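The Data Booklet formula in code, written for the ax + by + cz = d convention used here (hence "− d"). The function name is hypothetical. The same function also gives the distance between two parallel planes if you feed it any point of one plane:

```python
import math


def point_plane_distance(p, n, d):
    """Distance from point p to the plane ax + by + cz = d, where n = (a, b, c)."""
    return abs(sum(ni * pi for ni, pi in zip(n, p)) - d) / math.sqrt(sum(x * x for x in n))


print(point_plane_distance((1, 2, 3), (2, -1, 2), 1))  # |2 - 2 + 6 - 1| / 3 = 5/3
# (3, 0, 0) lies on x + 2y + 2z = 3; its distance to the parallel plane x + 2y + 2z = -6:
print(point_plane_distance((3, 0, 0), (1, 2, 2), -6))  # 3.0
```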

DISTANCE BETWEEN 2 PLANES


Given 2 parallel planes with the same normal vector n,
$$\mathbf{r} \cdot \mathbf{n} = d, \qquad \mathbf{r} \cdot \mathbf{n} = e$$
we can find the distance between them by finding
$$D = \frac{|d - e|}{|\mathbf{n}|}$$
I will explain why this makes sense. Firstly, recall that d/|n| and e/|n| are the signed perpendicular distances from the planes to the origin: they have the same sign if both planes lie on the same side of the origin, and opposite signs otherwise. Subtracting them handles both cases, and we take the modulus because distance is never negative.

ANGLE BETWEEN LINE AND PLANE

Consider a line with direction vector a and a plane with normal vector n. The angle θ between the line and the plane can be found by using the equation
$$\sin\theta = \frac{|\mathbf{a} \cdot \mathbf{n}|}{|\mathbf{a}||\mathbf{n}|}$$
Note that if you used cos θ instead, you would have gotten the angle between the line and the normal vector, which is 90° − θ.
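A sketch of the sine formula (hypothetical helper name). A line parallel to the plane gives 0°, and a line along the normal gives 90°:

```python
import math


def line_plane_angle(a, n):
    """Angle in degrees between a line with direction a and a plane with normal n."""
    s = abs(sum(x * y for x, y in zip(a, n))) / (
        math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in n)))
    return math.degrees(math.asin(min(1.0, s)))   # min() guards against rounding above 1


print(line_plane_angle((1, 0, 1), (0, 0, 1)))  # about 45 degrees
print(line_plane_angle((1, 1, 0), (0, 0, 1)))  # 0.0: the line is parallel to the plane
```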

ANGLE BETWEEN PLANES

The angle between 2 planes is the same as the angle between their 2 normal vectors. So given 2 planes with normal vectors m and n respectively, we can find the angle between the 2 planes by using the dot product,
$$\cos\theta = \frac{\mathbf{m} \cdot \mathbf{n}}{|\mathbf{m}||\mathbf{n}|}$$
Recall that this is the same formula used to find the angle between 2 lines.

Now that you know how to construct planes, you might be curious about how 3D shapes are constructed. Again, you could make use of the applet I shared with you in the previous post: from the drop-down menu of new graph, choose z = f(x, y) surfaces. Fiddle around with it and have fun creating awkward shapes. This is obviously outside your syllabus, but let me just give you the equations of some very common shapes in 3D:
cylinder: x² + y² = r²
hyperbolic paraboloid: z = x² − y²
elliptic paraboloid: z = x² + y²
ellipsoid: ax² + by² + cz² = 1 (with a, b, c > 0)
cone: x² + y² − z² = 0
hyperboloid of one sheet: x² + y² − z² = 1

13.1 Random Samples


In statistics, we are always interested in getting information about a particular group, be it people, animals, or even non-living things. This group of interest is what we call a population. A population is a particular group which we need information about in a statistical enquiry. A population can be very big, for example, the hairs growing on one's head, or the people in a country. So sometimes, we can only gather information from a sample. A sample of size n is a simple random sample if every possible sample of size n is equally likely to be selected. So here, we differentiate the terms population and sample, the sample being a subset of the population.
A parameter is a (known or unknown) numerical characteristic of a population, such as the mean μ and the standard deviation σ. A statistic is a value computed from a sample, such as the mean x̄ and the standard deviation s. Notice that the symbols for the two cases are different, and we will make use of this convention. So here we can conclude that a parameter is the actual value for a population, while a statistic is a value obtained from a sample, which is supposed to be quite close in value to the parameter.
In order to get the information required, we need to do surveys. There are 2 main kinds of surveys:
1. Census
A census surveys every single member of a population. For a country, a census is needed to count how many people there are in it. Or in a class, we need everyone to submit their health report in order to know which blood types the students have. However, there are situations where a census can't be used. One is an infinite (or practically infinite) population: for example, there is a practically unlimited number of stars, and we can't measure the brightness of every star to find their mean brightness or mean distance from the Earth. Another example is testing the durability of light bulbs. To test the average lifespan of light bulbs, you can't test every light bulb, or else you'll destroy the whole population!
2. Sample Survey
A sample survey is done by interviewing / collecting data from only a small group of members within the population, which is the sample. A sample is always less than 100% of the population. For example, we could survey 100 residents of Petaling Jaya to see whether they would like it if we replaced the McDonald's outlet in SS2 with an A&W outlet.
Both the census and the sample survey have their advantages and disadvantages. To sum up, a census is good for a small population, and a sample survey is more suitable for a big population. Look at the table below:

Before you start sampling, you need to do a few things. First, you need to identify the target population, i.e. where and whom you want to survey. Next, you determine the sampling units, the people / items to be sampled. If your population is all the primary schools in Malaysia, is your sampling unit the student, the teacher, or the canteen worker? You have to make it clear. Then, you need a sampling frame: a list in which the sampling units within a population are individually named or numbered. Of course, the list may not be complete, or sometimes it simply can't be generated, because the units change or move in and out, or, if they are fish in a pond, they can't be listed down at all!
Once you are done, you can start your survey.

Knowing that we can start surveying, we need to know the possible sampling methods. We shall not focus on the census in this chapter (the title says so). Now we shall look at a few types of sampling methods:
1. Random Sampling
I believe you are familiar with the term random. It means that you do not choose a sample on purpose; you simply pick one by chance. There are 3 kinds of random sampling:
Simple Random Sample
As its name suggests, it is simple: you don't need to do any homework to get that sample. You could draw lots, or use random numbers to choose which units take part in the survey. You can make use of a random number table to choose your units. It acts as a large die, and looks something like the one below:

You can read the numbers from left to right, in the order given. Or you could close your eyes and use a pencil to point at a starting number on the table. For example, in a group of students numbered 1 to 100, you want to choose 5 random students. You can take 2-digit numbers starting from the left of the table, namely 82, 03, 14, 58 and 21, to be the students you want.
You could also use your calculator as a random number generator. On your CASIO fx-570MS, press SHIFT and then Ran#, and you will get a random number with 3 decimal places, from 0.000 to 0.999. You can use multiplication or division to scale the random number to the range you want.
Note that there are 2 kinds of simple random samples: with replacement and without replacement.
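If you have a computer instead of a random number table, Python's standard `random` module can play the same role. This is a sketch: the seed is arbitrary (only there so the run is repeatable), and `ran_to_student` is a hypothetical helper showing how to scale a Ran#-style number r (0 ≤ r < 1) to a student number from 1 to 100:

```python
import random

rng = random.Random(2024)          # fixed seed only so the example is repeatable

students = list(range(1, 101))     # students numbered 1 to 100

# Without replacement: 5 different students, every group of 5 equally likely
without = rng.sample(students, 5)

# With replacement: the same student may be picked more than once
with_repl = rng.choices(students, k=5)


def ran_to_student(r, n=100):
    """Scale a random number 0 <= r < 1 to a whole number from 1 to n."""
    return int(r * n) + 1


print(without, with_repl)
print(ran_to_student(0.372))       # 38
```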
Systematic Random Sample
In systematic sampling, you use a fixed pattern to choose your sample. For example, from a list of 1000 people, you take every kth person for the survey, where k depends on your sample size (k = 1000/n for a sample of size n), starting from a randomly chosen person among the first k.
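A sketch of systematic sampling, assuming k = N // n and a random starting position among the first k units (the names and the seed are illustrative only):

```python
import random


def systematic_sample(population, n, rng=random):
    """Every k-th unit, k = N // n, starting from a random unit among the first k."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]


people = list(range(1, 1001))      # a list of 1000 people
sample = systematic_sample(people, 50, random.Random(7))
print(len(sample), sample[:3])     # 50 people, consecutive picks 20 apart
```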
Stratified Random Sample
In stratified sampling, the population is divided into distinguishable layers (strata). For example, a population of people has different age groups, different occupations, etc. We randomly take a few units from each age group, and combine them into one sample in the end.
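A sketch of stratified sampling. The notes only say to take a few units from each group; the proportional allocation used here (each stratum contributes in proportion to its size) is my assumption, and the age-group sizes are hypothetical. Within each stratum the units are picked by simple random sampling without replacement. In general, rounding means the stratum sizes may not add up exactly to n:

```python
import random


def stratified_sample(strata, n, rng=random):
    """Proportional allocation: each stratum gets about n * (stratum size / N) units."""
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        size = round(n * len(units) / N)
        sample[name] = rng.sample(units, size)   # simple random sample within the stratum
    return sample


# Hypothetical population split into age groups
strata = {
    "under 20": [f"A{i}" for i in range(200)],
    "20 to 59": [f"B{i}" for i in range(500)],
    "60 and over": [f"C{i}" for i in range(300)],
}
result = stratified_sample(strata, 20, random.Random(1))
print({name: len(units) for name, units in result.items()})
# {'under 20': 4, '20 to 59': 10, '60 and over': 6}
```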
2. Non-Random Sampling
I don't think I need to elaborate much on this. It is not random, so you choose a unit for a solid and particular reason. There are 2 kinds here:
Clusters
Clusters are natural sub-groups of a population. For example, in a primary school there are 6 classes in Standard 1, with all the kids having the same status. Note that this differs from the stratified random sample, since strata are different from one another, whereas these classes are alike. You choose to study one cluster, which means that you don't randomly pick students from every class in the school. You save a lot of effort, time and money, as you don't need to collect survey forms from every class.

Quotas
Quota sampling is widely used in market research, where the population is divided into groups in terms of age, sex, income level, etc. When you are about to survey, you already have your plans in mind: "I want to survey one person who has a high income and a big family, and another with a low income and a small family", etc. You have already set specific requirements for the members of the population that you are about to interview or collect data from.
All these sampling methods have their pros and cons. I summarize them in the table below:

In every survey, there will surely be some sources of bias. Obviously, when you are collecting data from a population, you want it to be as accurate as possible, so you should eliminate any bias in the process of sampling. Bias makes the survey or data collection inaccurate, and gives a wrong picture of what the population really is. Examples of sources of bias are:
1. lack of a good sampling frame
It's like using a list of friends generated from your Twitter account: you will miss those friends who don't use Twitter. You need a good sampling frame so that everyone has an equal chance of being sampled.
2. wrong choice of sampling unit
In a survey on who has a car at home, choosing people as the sampling unit is the wrong choice; a better sampling unit would be the household, since children don't drive.
3. no response from some chosen units
Some people simply choose not to answer your survey questions, for God-knows-what reason. Also, your questionnaire might have questions they can't really answer. For example, they may not respond to the question "Do you like Subway sandwiches? Yes / No" when they don't even know that such an outlet exists.
4. bias introduced by the person conducting the survey
The person conducting the survey might already have a conclusion in mind, and try to make the survey results suit that mindset. For example, on the question "Which party will do a better job in the next General Election?", if the surveyor is a Pakatan Rakyat supporter, he might influence the respondent to agree with his stand.

SIMULATING RANDOM SAMPLES


There are many ways to get random samples, just like what we did above: using a random number table, or the random number generator on the calculator. But now, we want to simulate random samples from a given distribution. There are 2 kinds of distributions from which we can obtain a simulated random sample:
1. Frequency Distribution
A frequency distribution looks something like this:

It has values x and their frequencies. Let's say I would like to generate a sample of size 6 from this population. For data like this, we cannot simply use a calculator to pick the numbers 1 to 4 at random as our sample, because each value has a frequency, or rather a weighting, that determines how likely it is to be chosen. So what we can do is tabulate a table using its cumulative frequency.

Using this table, we can finally tabulate the random sample. For example, given the random digits 04938581365399, we can take the 2-digit numbers 04, 93, 85, 81, 36, 53, which correspond to the values of x being 1, 4, 3, 3, 2, 3 respectively. We have finally got our random sample from the frequency distribution.
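The cumulative-frequency lookup can be sketched as below. The original frequency table is not reproduced here, so the frequencies 20, 30, 40, 10 are hypothetical: they total 100, so the 2-digit random numbers 00 to 99 can be used directly, and they happen to map the digits in the example to the same values. Each value covers the random numbers from the previous cumulative frequency up to one less than its own:

```python
from bisect import bisect_right
from itertools import accumulate


def simulate_from_frequencies(values, freqs, random_numbers):
    """Map random integers 0 .. total-1 to values using cumulative frequencies."""
    upper = list(accumulate(freqs))   # cumulative frequencies, e.g. [20, 50, 90, 100]
    total = upper[-1]
    sample = []
    for r in random_numbers:
        if not 0 <= r < total:
            raise ValueError(f"random number {r} outside 0..{total - 1}")
        sample.append(values[bisect_right(upper, r)])
    return sample


digits = "04938581365399"
pairs = [int(digits[i:i + 2]) for i in range(0, 12, 2)]  # 04, 93, 85, 81, 36, 53
values = [1, 2, 3, 4]
freqs = [20, 30, 40, 10]   # hypothetical frequencies (total 100)
print(simulate_from_frequencies(values, freqs, pairs))   # [1, 4, 3, 3, 2, 3]
```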
2. Probability Distribution
The method is the same as above: we build a cumulative distribution, this time with probabilities (so it runs from 0 up to 1 instead of up to the total frequency), then use the generated random numbers to find the random samples. There are a few kinds of probability distributions:
Probability distribution (given as a table)

This one is not hard. We find the cumulative probabilities, then match the random numbers to them as before.
Binomial distribution X ~ B(n, p)
Hope you still remember the formula,
$$P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x}, \qquad q = 1 - p$$
For example, we take X ~ B(3, 0.4); then we have

Poisson distribution X ~ Po(λ)
The formula is
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$
We tabulate the table for X ~ Po(4)

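The two cumulative tables can be computed with the standard library, and a random number r is then matched to the smallest x with r < F(x). This is a sketch with hypothetical helper names; since Po(4) has no largest value, the table is cut off at x = 15 (my choice; the tail beyond that is tiny), and any r beyond the last entry is mapped to 15:

```python
import math
from bisect import bisect_right
from itertools import accumulate


def binomial_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)


def sample_from_cumulative(cum, r):
    """Smallest x with r < F(x), for a random number 0 <= r < 1."""
    return min(bisect_right(cum, r), len(cum) - 1)


binom_cum = list(accumulate(binomial_pmf(x, 3, 0.4) for x in range(4)))
print([round(c, 3) for c in binom_cum])      # [0.216, 0.648, 0.936, 1.0]

pois_cum = list(accumulate(poisson_pmf(x, 4) for x in range(16)))
print([round(c, 4) for c in pois_cum[:5]])   # [0.0183, 0.0916, 0.2381, 0.4335, 0.6288]

print(sample_from_cumulative(binom_cum, 0.70))  # 2, since 0.648 <= 0.70 < 0.936
```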
Probability density function


It can be something like

We should find its cumulative distribution function,
$$F(x) = \int_{-\infty}^{x} f(u)\,du$$
From here, we let the randomly generated number r (0 ≤ r ≤ 1) equal F(x), and solve for x.
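A sketch of this "set r = F(x) and solve for x" idea. The original example pdf is not reproduced here, so this uses a hypothetical one, f(x) = 2x on 0 ≤ x ≤ 1, for which F(x) = x² and therefore x = √r. The simulated mean should come out close to E(X) = 2/3:

```python
import math
import random


def simulate_from_pdf(r):
    """Inverse of F(x) = x**2 for the hypothetical pdf f(x) = 2x on 0 <= x <= 1."""
    return math.sqrt(r)


rng = random.Random(3)
sample = [simulate_from_pdf(rng.random()) for _ in range(10000)]
print(round(sum(sample) / len(sample), 2))   # close to E(X) = 2/3, about 0.67
```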
Normal distribution X ~ N(μ, σ²)
Making reference to the formula
$$Z = \frac{X - \mu}{\sigma}$$
we let the randomly generated number r (0 < r < 1) equal the cumulative probability of the normal distribution, P(Z ≤ z) = r. Then by using normal tables (or your calculator), you can find z, and therefore x = μ + σz.
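In Python, `statistics.NormalDist().inv_cdf` plays the role of reading the normal table backwards. A minimal sketch (the helper name is hypothetical; `inv_cdf` needs 0 < r < 1):

```python
from statistics import NormalDist


def simulate_normal(r, mu, sigma):
    """Set P(Z <= z) = r, find z from the inverse normal, then x = mu + sigma z."""
    z = NormalDist().inv_cdf(r)
    return mu + sigma * z


print(round(simulate_normal(0.975, 100, 8), 2))  # z is about 1.96, so x is about 115.68
```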
13.2 Sampling Distributions
When we find sample means or standard deviations, we might also want to know how these values are distributed. So following the distributions we have learnt (Binomial, Poisson and Normal), we are learning a new one here: the sampling distribution of the mean.

SAMPLING DISTRIBUTIONS OF A SAMPLE MEAN


Before we start, we need to recall some expectation algebra. Remember that for a population, the expected value E(X) is the mean itself, μ, while the variance Var(X) is the variance of the population, σ². Now we are going to find the expected value of the sample mean, E(X̄).
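We all know that the mean of a sample of size n can be represented by the equation
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
where X1, X2, ..., Xn are independent observations from the population, each with mean μ and variance σ². So we further find that the expected value of the sample mean is
$$E(\bar{X}) = \frac{1}{n}\big(E(X_1) + E(X_2) + \cdots + E(X_n)\big) = \frac{1}{n}(n\mu) = \mu$$
which is the same value as the population mean. What this means is that, on average, the sample mean equals the population mean. However, the variance of the sample mean is different from the population variance. Using the fact that, for independent X and Y and constants a and b,
$$\operatorname{Var}(aX + bY) = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y)$$
we find the variance of the sample mean to be
$$\operatorname{Var}(\bar{X}) = \frac{1}{n^2}\big(\operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n)\big) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$$
So the standard deviation of the sampling distribution is
$$\frac{\sigma}{\sqrt{n}}$$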
which we call the standard error of the mean. However, remember that this standard error is for samples taken with replacement (or from a very large population). For samples taken without replacement from a finite population, the variance would be
$$\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}\cdot\frac{N - n}{N - 1}$$
where N is the size of the finite population and n is the sample size. I do not know how to derive this, and I don't think it will appear in exams. I put it here for your reference.
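Both versions of the standard error in one sketch (the function name and the optional `N` parameter are my own). Note that the finite-population factor (N − n)/(N − 1) is less than 1, so sampling without replacement gives a smaller standard error, and it drops to 0 when the whole population is sampled:

```python
import math


def standard_error(sigma, n, N=None):
    """sigma / sqrt(n); for sampling without replacement from a finite population
    of size N, the variance is multiplied by (N - n) / (N - 1)."""
    var = sigma**2 / n
    if N is not None:
        var *= (N - n) / (N - 1)
    return math.sqrt(var)


print(standard_error(12, 10))        # sqrt(14.4), about 3.795
print(standard_error(12, 10, N=50))  # smaller, since (50 - 10)/(50 - 1) < 1
```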
So now, whenever we have a normal distribution X ~ N(μ, σ²), we have a sampling distribution of
$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
Consider the distribution X ~ N(100, 64)

and consider the following:

Notice that the sample size affects the sampling distribution: the larger n is, the smaller the variance σ²/n. So now, when answering questions, unlike in Maths T, you have to be very careful about whether the question is talking about a population or a sample. Let me give you an example:
The volume of wine in bottles is normally distributed with a mean of 758 ml and a standard deviation of 12 ml. A random sample of 10 bottles is taken and the mean volume found. Calculate the probability that the sample mean is less than 750 ml.
Let X be the volume of wine in a bottle.
X ~ N(758, 12²)
Since X is normally distributed, the sampling distribution with n = 10 is
X̄ ~ N(758, 12²/10)
X̄ ~ N(758, 14.4)
P(X̄ < 750) = P(Z < (750 − 758)/√14.4) = P(Z < −2.108)
= 0.0175
I assume that you have fully studied the chapters Discrete Probability Distributions & Continuous Probability Distributions in Maths T. Now that you know the difference between samples and populations, remember that the final answer will be wrong if you use the wrong distribution.
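The wine example can be checked with `statistics.NormalDist` from the Python standard library. The second line computes the probability for a single bottle instead, to show how different the answer is when the wrong (population) distribution is used:

```python
import math
from statistics import NormalDist

mu, sigma, n = 758, 12, 10
xbar = NormalDist(mu, sigma / math.sqrt(n))   # sampling distribution N(758, 14.4)
p_sample = xbar.cdf(750)
p_single = NormalDist(mu, sigma).cdf(750)     # one bottle, NOT the sample mean

print(round(p_sample, 4))   # 0.0175
print(round(p_single, 4))   # about 0.25, the wrong distribution for this question
```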

We were assuming that the sample was taken from a population which follows the normal distribution. So what if it doesn't? Maybe the sample was taken from a Binomial, Poisson or even a Uniform distribution?
Let's do a little experiment. Suppose you have an unfair coin, such that every time you toss it, it has a 25% chance of landing heads. So if you toss it 10 times, the number of heads follows a binomial distribution, X ~ B(10, 0.25). We plot the probability graph below. The red bars are the Binomial probabilities, while the blue line is the normal approximation.
So now, we find the sampling distribution of X̄. That means we repeat the experiment to build up a sample, find its mean, do this for many samples, and tabulate the means as a distribution. With a sample size of 30 we get a graph like the one below:

then with a sample size of 50, we get

It gets closer to a normal distribution, doesn't it?


Now we try a Poisson distribution: say the average number of monkeys seen along the road every day is 4, so X ~ Po(4). The probability of seeing x monkeys in a day can be tabulated as follows:

Again, we carry out a serious investigation of how many monkeys appear every day, take the means of samples of size 30, and find the sampling distribution of X̄ to be as follows:

Once again it is close to the blue normal curve. Remember that the y-axis stands for probability. So this sampling distribution simply tells us the probabilities of the possible mean numbers of monkeys seen on the road daily, for a sample size of n.
We now try a uniform distribution. A uniform distribution X ~ R(a, b) means that X is uniformly distributed over the range a ≤ x ≤ b. It has the following expectation and variance:
$$E(X) = \frac{a + b}{2}, \qquad \operatorname{Var}(X) = \frac{(b - a)^2}{12}$$
Assume X ~ R(0, 27), representing a lucky draw in which every number from 0 to 27 is equally likely. We can plot its distribution as

then again, we find the sampling distribution of X̄. Taking samples of size 30, we find that it actually looks like a normal distribution!

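The uniform experiment can be reproduced without the applet. The sketch below (seed and number of repetitions are arbitrary choices) draws 20000 samples of size 30 from R(0, 27) and checks that the sample means have mean close to (0 + 27)/2 = 13.5 and variance close to (27 − 0)²/(12 × 30) = 2.025, as the formulas for E(X̄) and Var(X̄) predict:

```python
import random
import statistics

rng = random.Random(42)
a, b, n, repeats = 0, 27, 30, 20000

# Each sample mean is the average of n = 30 draws from R(0, 27)
means = [statistics.fmean(rng.uniform(a, b) for _ in range(n)) for _ in range(repeats)]

print(statistics.fmean(means))      # close to (a + b) / 2 = 13.5
print(statistics.pvariance(means))  # close to (b - a)**2 / (12 * n) = 2.025
```

A histogram of `means` would show the bell shape described above.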
All these graphs were made with this applet. So after doing all this, we find that even when the population is not normally distributed, the sampling distribution of X̄ takes on a normal shape as the sample size increases. In other words, for a large sample size n, it is approximately normal. And here, we introduce the central limit theorem:
When samples of size n are taken from a non-normal population with mean μ and known variance σ², then for large values of n, the distribution of X̄ is approximately normal, such that
$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \text{ approximately}$$
In statistics, we define a large sample to be n ≥ 30. You will be using this convention for the rest of the chapters. Let me show you an example of the use of the central limit theorem:
The average number of telephone calls made in an evening to a
counselling service is 4.5 calls. 30 random observations are taken. Find
the probability that the sample mean exceeds 5.
X ~ Po(4.5), so E(X) = Var(X) = 4.5
Since n ≥ 30, by the central limit theorem, X̄ is approximately normal, so
X̄ ~ N(4.5, 4.5/30) = N(4.5, 0.15)
P(X̄ > 5) = P[Z > (5 − 4.5)/√0.15] = P(Z > 1.291) = 0.098
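If you want to check this on a computer, here is the same calculation as a small Python sketch. `statistics.NormalDist` (Python 3.8 or later) stands in for the normal tables; it is not something you can use in the exam, of course.

```python
from math import sqrt
from statistics import NormalDist

lam, n = 4.5, 30                       # X ~ Po(4.5): mean = variance = 4.5
se = sqrt(lam / n)                     # standard deviation of X̄, √0.15
xbar_dist = NormalDist(mu=lam, sigma=se)   # X̄ ~ N(4.5, 0.15) by the CLT

z = (5 - lam) / se
p = 1 - xbar_dist.cdf(5)               # P(X̄ > 5)
print(round(z, 3), round(p, 3))        # 1.291 0.098
```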

SAMPLING DISTRIBUTIONS OF SAMPLE PROPORTIONS


Suppose a random sample of n observations is taken from a population in which
the proportion of successes is p and the proportion of failures is q = 1 − p.
If X is the number of successes in the sample, then X follows a binomial
distribution,
X ~ B(n, p). You should recall that E(X) = np and Var(X) = npq. Using the
same method we used to find the expectation of the sample mean X̄, we now find
the expectation of the sample proportion Ps.
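We know that
$P_s = \frac{X}{n}$
So finding E(Ps) and Var(Ps), we get
$E(P_s) = \frac{E(X)}{n} = p, \quad \mathrm{Var}(P_s) = \frac{\mathrm{Var}(X)}{n^2} = \frac{pq}{n}$
This in turn gives us the distribution of the sample proportion, for large n,
$P_s \sim N\left(p, \frac{pq}{n}\right)$
and we define the term
$\sqrt{\frac{pq}{n}}$
as the standard error of proportion.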


When using a distribution of sample proportions, we need to take continuity
corrections into account (try recalling what you learned in Maths T). For this
case, the continuity correction is
$\pm\frac{1}{2n}$
I'll show you an example:


It is known that 3% of frozen pies delivered to a canteen are broken.
What is the probability that, on a morning when 500 pies are delivered,
5% or more are broken?
Let p be the probability that a pie is broken, p = 0.03.
Let Ps be the proportion of pies in the sample that are broken.
q = 0.97, n = 500, we have
Ps ~ N(0.03, 0.0000582)
P(Ps ≥ 0.05) = P(Ps > 0.05 − 0.001) [continuity correction, 1/(2 × 500) = 0.001]
= P(Ps > 0.049) = P(Z > 2.491) = 0.0064
If you noticed, there is another way of solving this problem, just by
using the binomial distribution alone.
Let X be the number of broken pies in the sample.
X ~ B(500, 0.03)
Since n ≥ 30, np > 5 and nq > 5, it is approximately normal.
X ~ N(np, npq)
X ~ N(15, 14.55)
500 × 5% = 25
P(X ≥ 25) = P(X > 24.5) = P(Z > 2.491) = 0.0064
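Both methods can be checked in a few lines. This is a sketch assuming Python's `statistics.NormalDist` for the normal probabilities; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.03, 500
q = 1 - p

# Method 1: sample proportion Ps ~ N(p, pq/n), continuity correction 1/(2n)
z1 = (0.05 - 1 / (2 * n) - p) / sqrt(p * q / n)
prob1 = 1 - NormalDist().cdf(z1)

# Method 2: number broken X ~ B(500, 0.03) ≈ N(np, npq), continuity correction 0.5
z2 = (0.05 * n - 0.5 - n * p) / sqrt(n * p * q)
prob2 = 1 - NormalDist().cdf(z2)

print(round(z1, 3), round(prob1, 4))   # 2.491 0.0064
print(round(z2, 3), round(prob2, 4))   # 2.491 0.0064
```

The two z values are identical: multiplying the proportion (and the correction 1/(2n)) by n simply rescales the same normal distribution, so the correction becomes 0.5.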
If I were you, I would choose the second method. However, if an exam
question asks you to work with the proportion, you had better use the first
method to avoid losing marks. Note that in either case, the sampling distribution
of the proportion can only be used for a large sample size n.
13.3 Point Estimates
To define a certain distribution, be it binomial, Poisson or normal, you need to
know its population parameters. And of course, if you don't know the
parameters beforehand, you would want to use sampling to estimate them. An
estimate is unbiased if the average (or expectation) of a large number of values
taken in the same way is the true value of the parameter. Among unbiased
estimates, the best one is the one with the smallest variance.
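So here in this section, we are focusing on point estimates: we estimate each
parameter by a single value calculated from the data collected in the sample.
Look at the 3 equations below.
$\hat{p} = p_s, \quad \hat{\mu} = \bar{x}, \quad \hat{\sigma}^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$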
We denote an unbiased estimate with a hat. So the unbiased estimates of the
proportion of successes in the population, the population mean and the population
variance are denoted by p̂, μ̂ and σ̂² respectively. It turns out that the best
unbiased estimates for the population proportion and the population mean are the
sample proportion ps and the sample mean x̄ themselves. However, the best
unbiased estimate for the population variance is not the sample variance itself; it
is given by the formula above. This unbiased estimate of the variance can also take
the following forms, where s² = (1/n)Σ(x − x̄)² is the sample variance:
$\hat{\sigma}^2 = \frac{n}{n-1}s^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$
That is all you need to know about this section. Let me give you a short example:
The concentrations, in milligrams per litre, of a trace element in 7
randomly chosen samples of water from a spring were
240.8, 237.3, 236.7, 236.6, 234.2, 233.9, 232.5
Determine the unbiased estimates of the mean and the variance of the
concentration of the trace element per litre of water from the spring.
To answer this question, we need to make use of our calculator. Set your CASIO
570MS to SD mode and input all the data, pressing each number followed by
the DT button, until you have entered everything. Next, press
SHIFT + S-VAR. It gives you the options x̄, xσn and xσn−1. The first one gives
you the unbiased estimate of the mean, while the last one gives σ̂, whose
square is the unbiased estimate of the variance. Just show a little working,
even though you know the answers straight away:
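If you don't have the calculator with you, the same two numbers can be computed with Python's `statistics` module. This is a sketch for checking your working, not a replacement for it: `statistics.variance` uses the divisor n − 1 (the unbiased estimate), while `statistics.pvariance` uses the divisor n (the sample variance s²).

```python
import statistics

data = [240.8, 237.3, 236.7, 236.6, 234.2, 233.9, 232.5]
n = len(data)

mu_hat = statistics.mean(data)        # unbiased estimate of the mean, x̄
var_hat = statistics.variance(data)   # divisor n - 1: unbiased estimate σ̂²
s2 = statistics.pvariance(data)       # divisor n: sample variance s²

# The alternative form σ̂² = n/(n - 1) · s² gives the same value
assert abs(var_hat - n / (n - 1) * s2) < 1e-9
print(round(mu_hat, 2), round(var_hat, 2))   # 236.0 7.58
```

So the unbiased estimates are x̄ = 236.0 and σ̂² = 7.58.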

13.4 Interval Estimates


Point estimates might not be accurate. There is always a possibility that the
unbiased estimate of the population mean is far away from the actual mean.
Another approach is to construct an interval, known as
a confidence interval. This confidence interval tells us that there is a certain
probability that the interval contains the true population mean. We usually write
this interval as (a, b), where a and b are the confidence
limits, or end-values of the interval. Consider a normal curve:

We define a confidence interval in terms of a percentage. For example, a 95%
confidence interval, like the one above, means that, roughly speaking, there is a 95%
probability that the interval contains the population mean. Here we shall learn how
to construct a confidence interval for a population proportion and for a population mean.

CONFIDENCE INTERVAL FOR POPULATION PROPORTION


Here you want to find p, the proportion of successes in a particular population.
You take a sample of size n, and then find the best unbiased estimate p̂ = ps. You
need to recall quite a lot of information from the last 2 parts, bearing in mind that
when we are dealing with population proportions, whether the population is
normal or non-normal, we must use a large sample (n ≥ 30). Recall that the
sampling distribution of the sample proportion is
$P_s \sim N\left(p, \frac{pq}{n}\right)$
The confidence limits will be
$p_s \pm a\sqrt{\frac{p_s q_s}{n}}$
and the confidence interval will be
$\left(p_s - a\sqrt{\frac{p_s q_s}{n}},\ p_s + a\sqrt{\frac{p_s q_s}{n}}\right)$
Okay, I need to explain this a little. If you observe closely, the
confidence interval is constructed from the unbiased estimate of the population
proportion and the standard error. The value a determines the confidence level
(the percentage) you want. This value a can be obtained from the normal tables (or the
Buku Sifir given in STPM). It looks something like this:

I'll teach you how to read this table in the example below:
In order to assess the probability of a successful outcome, an
experiment was performed 200 times. The number of successful
outcomes was 72. Find a 95% confidence interval for p, the population
proportion of success.
We start by listing the important values: ps, qs and n, and the distribution.
ps = 72 ÷ 200 = 0.36, qs = 0.64, n = 200
Ps ~ N(0.36, 0.001152)
To find a, we refer to the table. Note that the table gives the lower tail
probability P(Z ≤ a), but we are looking for P(−a ≤ Z ≤ a). So a central 95% of the
distribution leaves an upper and a lower tail of 2.5% each. This diagram might help
to explain a little:

The diagram on the left shows the lower tail probability, which is what the table
in your Buku Sifir gives. We want the one on the right; from the positions of the
red lines, you can see that the two are different. So here,
the value of a comes from the column 0.975, which is 1.960. So your
confidence interval shall be
(0.36 − 1.960√0.001152, 0.36 + 1.960√0.001152) = (0.293, 0.427)
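Here is a sketch of the same interval in Python, useful for checking the arithmetic. The value a = 1.960 is taken from the table, as in the working above.

```python
from math import sqrt

successes, n = 72, 200
ps = successes / n            # 0.36
qs = 1 - ps                   # 0.64
a = 1.960                     # 95% confidence: column 0.975 of the normal table

half_width = a * sqrt(ps * qs / n)
interval = (ps - half_width, ps + half_width)
print(tuple(round(x, 3) for x in interval))   # (0.293, 0.427)
```

Note that the interval is centred on ps = 0.36, which is a quick sanity check on any answer you get.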
You might have noticed that the continuity correction is omitted. Yes,
this is indeed the case. You need to get used to reading the table to avoid
using the wrong value of a. A 90% interval uses the value of a with a
lower tail probability of 0.95, an 80% interval uses a lower tail
probability of 0.90, and so on. To make things faster, I suggest you memorize
the values of a for the 4 most common confidence levels:
90% confidence level: a = 1.645
95% confidence level: a = 1.960
98% confidence level: a = 2.326
99% confidence level: a = 2.576
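If you want to see where these four numbers come from, the sketch below recomputes them with the inverse of the normal cumulative distribution, `statistics.NormalDist().inv_cdf` (Python 3.8 or later), in place of reading the Buku Sifir.

```python
from statistics import NormalDist

std_normal = NormalDist()   # Z ~ N(0, 1)

# A central interval with confidence `level` leaves (1 - level)/2 in each tail,
# so a is the point with lower-tail probability 1 - (1 - level)/2.
critical = {
    level: std_normal.inv_cdf(1 - (1 - level) / 2)
    for level in (0.90, 0.95, 0.98, 0.99)
}
for level, a in critical.items():
    print(f"{level:.0%} confidence level: a = {a:.3f}")
```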

CONFIDENCE INTERVALS FOR POPULATION MEAN


This section is not so straightforward. Although it shares a lot of similarities with
the part above, the construction of confidence intervals for a population mean
depends on the variance (known or unknown), the distribution (normal or
non-normal) and the sample size. So in this section, there are 5 cases:
1. Normal with known variance σ²
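The sampling distribution will be
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
Using the best unbiased estimate of the population mean, μ̂ = x̄, the confidence
interval is
$\left(\bar{x} - a\frac{\sigma}{\sqrt{n}},\ \bar{x} + a\frac{\sigma}{\sqrt{n}}\right)$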
2. Non-normal with known variance σ² (n ≥ 30)
In this case, the sample may be taken from a binomial, Poisson or other distribution.
Since the sample size is large, according to the central limit theorem,
we approximate the sampling distribution of X̄ by a normal distribution, using the
mean and variance of the original distribution, for example:
X ~ B(n, p) has mean np and variance npq
X ~ Po(λ) has mean λ and variance λ
X ~ R(a, b) has mean ½(a + b) and variance (b − a)²/12
and so on. From here, after finding the sampling distribution X̄ ~ N(μ, σ²/n), again
using the best unbiased estimate of the population mean, μ̂ = x̄, the confidence
interval is the same as above,
$\left(\bar{x} - a\frac{\sigma}{\sqrt{n}},\ \bar{x} + a\frac{\sigma}{\sqrt{n}}\right)$
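3. Normal with unknown variance σ² (n ≥ 30)
The method of solving this is just the same as method 1, but here we do not
know the population variance. Using the unbiased estimate of the population mean,
μ̂ = x̄, and the unbiased estimate of the population variance,
$\hat{\sigma}^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$
our confidence interval will be
$\left(\bar{x} - a\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + a\frac{\hat{\sigma}}{\sqrt{n}}\right)$
or we could rewrite it in terms of s,
$\left(\bar{x} - a\frac{s}{\sqrt{n-1}},\ \bar{x} + a\frac{s}{\sqrt{n-1}}\right)$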
4. Non-normal with unknown variance σ² (n ≥ 30)
Similar to method 2, we approximate a normal distribution, and after finding
the sampling distribution of X̄, we use the unbiased estimates μ̂ and σ̂² with the
same confidence interval formula as method 3,
$\left(\bar{x} - a\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + a\frac{\hat{\sigma}}{\sqrt{n}}\right)$
5. Normal with unknown variance σ² (n < 30)
It is interesting to note that when the sample size is small, the sampling
distribution of
$T = \frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}}$
does not follow a normal distribution. Instead, it follows a t-distribution.

The distribution of T is a member of the family of t-distributions. All t-distributions are
symmetric about zero and have a single parameter ν (pronounced "new"), which is a
positive integer. ν is known as the number of degrees of freedom of the
distribution, and if, for example, T has a t-distribution with 5 degrees of freedom,
you would write T ~ t(5). For a sample size n, it can be shown that T follows a
t-distribution with (n − 1) degrees of freedom. Take a look at the t-distribution
curves below.

Notice that we only use the t-distribution when the sample size is small; as ν
tends to infinity, the t-distribution looks more and more like the normal curve. In other words,
nothing much has changed, we are just using a new distribution for small sample
sizes. Once we know that our sample size is small, we use the t-distribution with
(n − 1) degrees of freedom, use the unbiased estimates for both the mean and
the variance, and our new formula will be
$\left(\bar{x} - t\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + t\frac{\hat{\sigma}}{\sqrt{n}}\right)$
where t can be obtained from the t-distribution tables. It looks something
like this. You use it in exactly the same way as the table of critical values for the
normal distribution; it's just that there is a column of degrees of freedom.

HOW TO SOLVE EXAM QUESTIONS


It isn't hard. All you need to do is identify the quantities stated in the question,
and classify which of the 5 methods you should use to solve it.
I'll put here a few example questions, and show you how to
analyse them:
A plant produces steel sheets whose weights are known to be normally
distributed with a standard deviation of 2.4kg. A random sample of 36
sheets had a mean weight of 31.4kg. Find the 99% confidence interval
for the population mean.
It is normally distributed, variance = 2.4² kg² (known), sample size = 36, sample
mean = 31.4kg. Use method 1.
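For checking your answer, here is a small sketch of method 1 for this question, with a = 2.576 for a 99% interval taken from the table.

```python
from math import sqrt

sigma, n, xbar = 2.4, 36, 31.4
a = 2.576                          # 99% confidence (lower-tail probability 0.995)

half_width = a * sigma / sqrt(n)   # a·σ/√n = 2.576 × 0.4
print(round(xbar - half_width, 2), round(xbar + half_width, 2))   # 30.37 32.43
```

So the 99% confidence interval is (30.37kg, 32.43kg).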
The heights of men in a particular district are distributed with mean
μ cm and standard deviation σ cm. On the basis of results obtained from a
random sample of 100 men from the district, the 95% confidence
interval for μ was calculated and found to be (177.22cm, 179.18cm).
Find the values of σ and x̄.
Unknown distribution, variance known, but sample size large. Approximate
normal, method 2. You need to work backwards using the confidence interval
formula, get 2 simultaneous equations, and solve for σ and x̄. Give it a try.
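If you want to check your answer: the two confidence limits are x̄ − aσ/√n and x̄ + aσ/√n, so adding them gives 2x̄ and subtracting them gives 2aσ/√n. A sketch in Python:

```python
from math import sqrt

lower, upper = 177.22, 179.18   # the given 95% confidence interval
n, a = 100, 1.960

xbar = (lower + upper) / 2          # adding the two limits: 2x̄ = lower + upper
half_width = (upper - lower) / 2    # subtracting: half-width = a·σ/√n
sigma = half_width * sqrt(n) / a
print(round(xbar, 2), round(sigma, 2))   # 178.2 5.0
```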
The fuel consumption of a new model of car is being tested. In one trial,
50 cars, chosen at random, were driven under identical conditions and
the distances, x km, covered on 1 litre of petrol were recorded. The
results gave the following totals:
Σx = 525, Σx² = 5625
Calculate a 95% confidence interval for the mean petrol consumption,
in km/l, of cars of this type.
Unknown distribution, variance unknown, large sample. Approximate normal, use the
unbiased estimate of the population variance (you have to calculate it this time), and use
method 4.
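Here is a sketch of method 4 for the fuel question, computing σ̂² from the totals with the formula σ̂² = (Σx² − (Σx)²/n)/(n − 1).

```python
from math import sqrt

n, sum_x, sum_x2 = 50, 525, 5625
xbar = sum_x / n                                # 10.5
var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # 112.5/49, unbiased estimate of σ²
a = 1.960                                       # 95% confidence

half_width = a * sqrt(var_hat / n)
print(round(xbar - half_width, 2), round(xbar + half_width, 2))   # 10.08 10.92
```

So the interval is (10.08, 10.92) km/l.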
A sample of 8 independent observations of a normally distributed
variable gave the following values: 3.6, 3.9, 4.5, 3.8, 4.4, 4.9, 4.2, 3.8.
Determine a 99% confidence interval for the population mean μ.
Normal distribution, unknown variance, and small sample. Method 5. In your
answer, you need to write this sentence very clearly:
since n < 30, a t(n − 1) distribution is used, T ~ t(7).
Then you continue to find the confidence interval.
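Here is a sketch of method 5 for this question. Python's standard library has no t-distribution, so the critical value t = 3.499 for t(7) (0.5% in each tail) is typed in from the tables; with SciPy installed you could use `scipy.stats.t.ppf(0.995, 7)` instead.

```python
import statistics
from math import sqrt

data = [3.6, 3.9, 4.5, 3.8, 4.4, 4.9, 4.2, 3.8]
n = len(data)

xbar = statistics.mean(data)          # 4.1375
sigma_hat = statistics.stdev(data)    # square root of the unbiased variance (divisor n - 1)
t_crit = 3.499                        # t(7) value with 0.5% in the upper tail, from tables

half_width = t_crit * sigma_hat / sqrt(n)
print(round(xbar - half_width, 3), round(xbar + half_width, 3))
```

This gives a 99% confidence interval of about (3.59, 4.68).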

1. Interval width
The width of a confidence interval can be obtained from the expression
$\text{width} = 2a\frac{\sigma}{\sqrt{n}}$
Also remember, the width decreases when
a. the sample size n increases,
b. the confidence level decreases, or
c. the variance decreases.
2. Assumption
You may often be asked to state the assumptions you made. You
probably only have one assumption, which is: we assume that it is a random
sample.
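The width formula makes the effect of n, a and σ easy to see. A small sketch, reusing the steel-sheet numbers (σ = 2.4, n = 36) purely as an illustration; `ci_width` is a helper name of my own.

```python
from math import sqrt


def ci_width(a, sigma, n):
    """Width 2·a·σ/√n of a normal-based confidence interval for the mean."""
    return 2 * a * sigma / sqrt(n)


base = ci_width(1.960, 2.4, 36)                # 95% interval, n = 36
print(round(base, 3))                          # 1.568
print(round(ci_width(1.960, 2.4, 144), 3))     # larger n: 0.784
print(round(ci_width(1.645, 2.4, 36), 3))      # lower confidence level (90%): 1.316
print(round(ci_width(1.960, 1.2, 36), 3))      # smaller σ: 0.784
```

Notice that quadrupling n only halves the width, because n appears under a square root.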

To summarize this section, I made a chart to help you remember things more easily.

Take note that this is the most important section of this chapter. Be sure you are
clear with all the distributions, don't confuse the sample size n with the number
of trials n in a binomial distribution, and practise more on population
proportions.
14.1 Hypotheses
Let's imagine this story.
One day in town, you met this awkward-looking Mathematics tuition teacher. He
brags that 95% of his pupils get As for their Mathematics T in STPM every year.
Since you love Mathematics so much, you thought that maybe you might want to
take his tuition class. But being sceptical by nature, you wondered whether
95% of his students getting As is a little too much. So you decided to
put this teacher to the test. You managed to get information from 15
of his ex-students, and found out that 11 of them got an A for Maths T in the previous
year.
Now your question is: is the Maths tuition teacher's claim a little bit overboard?
Is 11 out of 15 equal to 95%? Obviously it isn't, but since you are only taking a sample,
you can't be sure that you are right. What if 13 or 14 students had got
As? You know that if only 2 or 3 students got As, he is definitely lying. Then how
about 10 students? 8 or 9 students? There must be a cut-off point beyond which you
are VERY SURE whether he is lying or not, right?
Or let's think of another story. Suppose you are an athlete, participating in the
MSSM 400m race. You find that your running speed follows a normal
distribution with a mean of 40km/h. Bored of running every day, you decide to
test whether drinking 2 cups of milk every morning helps improve your
running. After drinking milk for 5 days, you find your mean speed turns out
to be 40.9km/h.
Again you question yourself: did you really improve? It might just happen
that you ran a little faster this time, and it had nothing to do with the milk. You
might also wonder how much of an increase in speed counts as an
improvement. You need a cut-off point, again.

NULL AND ALTERNATIVE HYPOTHESIS


If you didn't notice, you were actually carrying out a hypothesis test, or significance
test. You were testing a hypothesis to determine whether you can
conclude something: whether the claim that 95% of the students get As is true, and
whether your running has improved. The initial assumption is called
the null hypothesis, H0. It is very important, as it provides the model for the
calculations. The null hypothesis for the first case is that 95% of the students get As
for Maths T. If your results are consistent with 95% of the students getting As in
Maths T, then you have no reason to reject the hypothesis. The point is this: you can't reject his claim
if you don't have enough evidence to do so. If, after your test, you have enough
evidence to reject his claim, then you need an alternative hypothesis, H1. The
alternative hypothesis for this case is that less than 95% of the students get As in
Maths T. This is a binomial problem, so in mathematical terms, we have
H0: p = 0.95
H1: p < 0.95
Notice that you are only interested in whether the probability is less than 95% or
not, so we are interested in the left-hand end of the distribution.
This is known as the lower tail. In the second case, we are interested in
the upper tail, that is, whether you have improved or not. There are cases where
you want to know whether there is any change in the values, e.g. whether there is a
change in support for Barisan Nasional, Pakatan Rakyat, etc. For this case,
we use a two-tailed test.

TEST STATISTIC & TEST VALUE


So now you have a null and an alternative hypothesis. The next thing you need is
a test statistic. A test statistic is the variable X that you are looking at. In the
first case, it is the number of students who get As in Maths T,
while in the second case, it is your mean running speed. A test value is found when
you have conducted the experiment. The test value in the first case is 11, out of the
15 people you asked, while in the second case it is 40.9km/h. You definitely want
to know what you are experimenting on, don't you?
14.2 Critical Regions
You previously learnt how to formulate a null and alternative hypothesis, and
determine your test statistic and test value. This information alone is still not
enough. We shall now proceed to setting the significance level and determining
the critical region.

When making a hypothesis test, you have to make a decision about
the significance level, which is the probability at or below which an event is
considered unlikely or rare. As a guide, events that have a probability of 5%
or less are regarded as unlikely, and events having a probability of 1% or less are
regarded as very unlikely. Other significance levels used are 10% and 2%.
Try not to confuse this with what you learned in the previous
chapter, which was confidence intervals, in terms of 90+%.
Let's say that in a test of 10 true-false questions written in Hindi, your
friend got 6 questions correct, and you want to know whether he was guessing,
or whether he really studied Hindi. You formulate the hypotheses as below:
H0: Your friend is guessing. He relies on the 50% luck.
H1: Your friend seriously studied Hindi before. He scores more than
50%.
Mathematically, this is a binomial problem again, X ~ B(10, 0.50).
H0: p = 0.50
H1: p > 0.50
Notice that the expression for H0 always has an '=' sign, while H1 should have
either a <, > or ≠ sign. To start our test, we need to define our significance level.
We can say, for example, that we want to test, at the 5% level, whether he could have
obtained this score by guessing all the answers. We can also choose to test at the
1% level or 10% level, and obviously, you might get different results.
So from here you can see that in the last section, you can't reach any conclusion if you
don't set a significance level. You can't say whether your running has improved
unless you decide how unlikely a result must be, for example 5% or less, before you
call it significant. Only once the significance level is set can the test be carried out.
For the example above, say we want to test it at the 5% level. We first need to find
the probability of each number of questions he could get correct. We plot a
cumulative binomial distribution for X ~ B(10, 0.50).

Notice that it is a decreasing cumulative binomial plot: it shows P(X ≥ n), which
decreases as n increases.

This curve tells us the probability that he gets at least n questions correct. So we see that
there is a 99.9% probability that he gets at least one question correct, and a 62.3%
probability that he gets at least 5 questions correct, etc. Even if your friend gets 8
questions correct, the probability of getting 8 or more correct by guessing is 5.5%, which is still
above our required significance level. But if he gets 9 questions correct, it
must be a really rare event, as he has only a 1.1% probability of getting 9 or more correct
if he was guessing. We say that the numbers 9 and 10 lie in the critical
region, which is the group of observations that are considered to be unusual or
unlikely (rare) events. We also say that 9 is the critical value, or cut-off
point, since a score of 9 or above is considered a rare event.
So what can we conclude from here? If your friend got 0 to 8
questions correct, we have no evidence that he studied Hindi, as
these are not rare events (they have > 5% probability). We say that the null
hypothesis H0 is not rejected, which is the case here, since he got 6 correct. But if he gets 9 or 10
questions correct, we say that there is evidence, at the 5% significance level, that
your friend did study Hindi. In other words, the null hypothesis H0 is rejected
in favour of the alternative hypothesis H1.
Notice that if we did a test at the 10% significance level, 8 would now lie in the
critical region! So the choice is subjective, and it really depends on you (or
the question in your test paper) to determine what is considered significant and
what is not.
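The probabilities quoted above can be recomputed exactly. A minimal sketch; `upper_tail` and `critical_region` are my own helper names, and I take the critical region to be every k with P(X ≥ k) below the significance level, which matches the rule used here.

```python
from math import comb


def upper_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))


def critical_region(n, p, alpha):
    """Values k whose upper-tail probability P(X >= k) is below alpha."""
    return [k for k in range(n + 1) if upper_tail(k, n, p) < alpha]


for k in (1, 5, 8, 9):
    print(k, round(upper_tail(k, 10, 0.5), 3))   # 0.999, 0.623, 0.055, 0.011

print(critical_region(10, 0.5, 0.05))   # [9, 10]
print(critical_region(10, 0.5, 0.10))   # [8, 9, 10]
```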

Let me sum up what you understand about hypothesis testing by now:

1. To test something, you first need to define your null hypothesis H0, something
that is claimed, or happening.
2. Then you define your alternative hypothesis H1, in case H0 is not true.
3. Find your test statistic and test value.
4. Identify which distribution the test statistic follows.
5. Choose a significance level at which to reject or not reject the claim.
6. Plot or use a given cumulative distribution to find the critical region.
7. Determine whether the test value lies in the critical region. If it does, H0 is
rejected. If not, H0 is not rejected.
This is only a rough idea of how hypothesis testing works. You might still be a little
confused about what is happening; I have to apologize for that, because I broke
this chapter down in quite a weird way.
14.3 Tests Of Significance
A hypothesis test is a test of significance. In this section, we will be looking
at all the possible types of hypothesis tests that can appear in STPM. Before we
start, note that every hypothesis test follows a general rule. You need to state these 7 steps
(or workings) in your answer sheet:
1. Define the variable X.
Let X be …,
X ~ B(n, p) / X ~ N(μ, σ²) / X ~ Po(λ)
2. Define H0 and H1.
H0: p / μ / λ / μ1 − μ2 = ?
H1: p / μ / λ / μ1 − μ2 <, > or ≠ ?
3. Write down the case if H0 is true.
If H0 is true, then p / μ / λ / μ1 − μ2 = ?
and X ~ B(n, ?) / X ~ N(?, σ²) / X ~ Po(?)
4. Define your type of test and significance level.
Use an upper / lower / two-tailed test, at ?% level.
5. Set the criteria to reject H0.
Reject H0 if P(X ≥ x) < ? / P(X ≤ x) < ? / z < ? / z > ? / |z| > ? / T < ?
6. Do the calculations.
P(X ≥ ?) = ? / P(X ≤ ?) = ? / z = ? / T = ?
7. Conclude your results.
Since P(X ≥ x) = ? / P(X ≤ x) = ? / z = ? / T = ?, x lies / doesn't lie in the
critical region.
H0 is rejected in favour of H1 / not rejected. We conclude that … at
?% level.
If you have all these 7 steps on your answer sheet, then you will probably get
90% of the marks. Don't make calculation mistakes though.

TYPES OF SIGNIFICANCE TESTS


In this part, there are 12 kinds of significance tests that you might face, be they lower
tail, two-tailed or upper tail tests. I will go through this section with an example
for each one, giving the question first, followed by the answer:
1. Binomial Proportion p (n < 30)
A certain type of seed has a germination rate of 70%. The seeds undergo a new
treatment, after which 9 germinate in a packet of 10 seeds. Test, at the 5%
level, whether this is evidence of an increase in the germination rate.

Let X be the number of seeds that germinate in a packet of 10, X ~ B(10, p)


H0: p = 0.7 [the germination rate is 70%]
H1: p > 0.7 [the germination rate increases]
If H0 is true, then p = 0.7, and X ~ B(10, 0.7)
Use an upper tail test, at 5% level.
Reject H0 if P(X ≥ x) < 0.05 [0.05 stands for 5%]
P(X ≥ 9) = P(X = 9) + P(X = 10) = 0.1211 + 0.0282 = 0.1493 = 14.93%
Since P(X ≥ 9) = 14.93% > 5%, x doesn't lie in the critical region.
H0 is not rejected.
We conclude that there is no evidence of an increase in the germination
rate, at 5% level.
A binomial proportion with a small sample isn't hard. The thing that bothers you
might be the calculation of P(X ≥ 9). Remember the formula for the
binomial distribution, $P(X = x) = \binom{n}{x} p^x q^{n-x}$.
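Here is a quick check of P(X ≥ 9) for X ~ B(10, 0.7), using `math.comb` for ⁿCₓ (a sketch; it needs Python 3.8 or later).

```python
from math import comb

n, p = 10, 0.7

# P(X >= 9) = P(X = 9) + P(X = 10), each term from nCx p^x q^(n - x)
tail = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in (9, 10))
reject = tail < 0.05            # upper tail test at the 5% level
print(round(tail, 4), reject)   # 0.1493 False
```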
2. Binomial Proportion p (n ≥ 30)
For this case, an approximation to the normal distribution is used. Remember that a
continuity correction is applied so that the observed value itself is included in the
tail probability, e.g. P(X ≤ 85) is approximated using 85.5.
A manufacturer claims that 8 out of 10 dogs prefer its brand of dog food to any
other. In a random sample of 120 dogs, it was found that 85 appeared to prefer
that brand. Test, at the 5% level, whether you would accept the manufacturer's
claim.
Let X be the number of dogs which prefer the manufacturer's brand of dog food,
X ~ B(120, p)
H0: p = 0.8
H1: p ≠ 0.8 [notice that we are using the ≠ sign. This is because we are testing
whether the claim is exactly correct. That means the claim is wrong if more than
8 out of 10 dogs prefer the brand, and also if fewer than 8 out of 10 do.]
If H0 is true, then p = 0.8 and X ~ B(120, 0.8)
Since n ≥ 30, np > 5 and nq > 5, X is approximately normal,
X ~ N(np, npq), which is X ~ N(96, 19.2).
Use a two-tailed test, at 5% level.
Reject H0 if |z| > 1.960 [Still remember how to get this value 1.960? Remember
that a two-tailed test at 5% means that both ends of the bell curve have 2.5%
each. Refer to the critical values for the normal distribution at the end of this
post.]
$z = \frac{85.5 - 96}{\sqrt{19.2}} = -2.396$
[85.5 comes from the continuity correction: P(X ≤ 85) is approximated using 85.5,
so that the observed value 85 is included in the tail.]
Since z = −2.396, |z| > 1.960, so z lies in the critical region.
H0 is rejected in favour of H1. There is evidence that the proportion is less than 0.8, and
therefore the manufacturer's claim is not accepted, at 5% level.
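A sketch of the z calculation for this question, with the continuity-corrected value 85.5 and the two-tailed 5% critical value taken from `statistics.NormalDist().inv_cdf(0.975)` instead of the table:

```python
from math import sqrt
from statistics import NormalDist

n, p0, observed = 120, 0.8, 85
mean = n * p0                     # np = 96
var = n * p0 * (1 - p0)           # npq = 19.2, so X ≈ N(96, 19.2) under H0

z = (observed + 0.5 - mean) / sqrt(var)   # P(X <= 85) uses 85.5
z_crit = NormalDist().inv_cdf(0.975)      # 1.960 for a two-tailed 5% test
print(round(z, 3), abs(z) > z_crit)       # -2.396 True
```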
3. Poisson Mean

The number of white corpuscles on a slide has a Poisson distribution with mean
3.5. After treatment, a sample was taken and the number of white corpuscles was
found to be 8. Test, at the 5% level of significance, whether the number of white
corpuscles has increased.
Let X be the number of white corpuscles on a slide, X ~ Po(λ).
H0: λ = 3.5
H1: λ > 3.5
If H0 is true, then λ = 3.5, and X ~ Po(3.5).
Use an upper tail test, at 5% level.
Reject H0 if P(X ≥ x) < 0.05.
P(X ≥ 8) = 1 − P(X ≤ 7) = 1 − 0.9733 = 0.0267 = 2.7% [I hope you remember the
Poisson formula. Some formula booklets include Poisson cumulative
probability tables; they help too.]
Since P(X ≥ 8) = 2.7% < 5%, x lies in the critical region.
H0 is rejected in favour of H1. There is evidence, at 5% level, that the number of
white corpuscles has increased.
Not a hard one, I suppose. Remember that if λ > 5, you can actually use a normal
approximation, X ~ N(λ, λ).
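The cumulative probability 0.9733 can be checked by adding up the Poisson formula for x = 0 to 7. A sketch:

```python
from math import exp, factorial

lam, observed = 3.5, 8

# P(X <= 7) = sum of e^(-λ) λ^k / k! for k = 0, 1, ..., 7
p_le_7 = sum(exp(-lam) * lam ** k / factorial(k) for k in range(observed))
p_value = 1 - p_le_7                          # P(X >= 8)
print(round(p_le_7, 4), round(p_value, 4))    # 0.9733 0.0267
```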
4. Population Mean (Normal, σ² known)
A machine fills cans with soft drinks so that the volume of liquid in the cans
follows a normal distribution with mean 335ml and standard deviation 3ml. A
setting on the machine is altered, following which the operator suspects that the
mean volume of liquid discharged by the machine into the cans has decreased.
He takes a random sample of 50 cans and finds that the mean volume of liquid
in these cans is 334.6ml. Does this confirm his suspicion? Perform a significance
test at the 5% level and assume that the standard deviation remains unchanged.
Let X be the volume of liquid in the cans, X ~ N(μ, 3²)
H0: μ = 335
H1: μ < 335
The sample size is 50, so X̄ ~ N(μ, 3²/50) [recall what you learned in the previous
chapter]
If H0 is true, then μ = 335, and X̄ ~ N(335, 9/50)
Use a lower tail test, at 5% level.
Reject H0 if z < −1.645
$z = \frac{334.6 - 335}{\sqrt{9/50}} = -0.9428$
Since z = −0.9428 > −1.645, z doesn't lie in the critical region.


H0 is not rejected. There is no evidence to confirm the suspicion of the operator,
at 5% level.
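A quick check of the z value (a sketch):

```python
from math import sqrt

mu0, sigma, n, xbar = 335, 3, 50, 334.6

z = (xbar - mu0) / (sigma / sqrt(n))   # standardise using X̄ ~ N(335, 9/50)
print(round(z, 4), z < -1.645)         # -0.9428 False
```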
For hypothesis types 4 to 8, you might want to recall what you learned in the
previous chapter. Remember when to use the t-distribution, when to approximate a
normal distribution, etc. These types make use of the sampling distribution.

5. Population Mean (Non-normal, σ² known)
I don't think I need to show you an example of this one. It is similar to number 4.
Since the sample is large, you approximate the sampling distribution of X̄ from that
non-normal (or sometimes unnamed, or unknown) distribution by a normal distribution,
and follow exactly the same steps as type 4.
6. Population Mean (Normal, σ² unknown, n ≥ 30)
When the variance is unknown, you make use of the best unbiased estimate of the
population variance,
$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$
and the rest of the steps follow.


7. Population Mean (Non-normal, σ² unknown, n ≥ 30)
Similar to type 6, you make use of the best unbiased estimate of the population
variance. The following example illustrates both types 6 and 7:
A random sample of 75 11-year-olds performed a simple task and the time
taken, x minutes, was noted for each. The results were summarized as follows:
Σx = 1215, Σx² = 21708
Test, at the 1% level, whether there is evidence that the mean time taken to
perform the task is greater than 15 minutes.
Let X be the time taken by an 11-year-old to perform the simple task.
H0: μ = 15
H1: μ > 15
The distribution is unknown. But since n = 75 is large, by the central limit
theorem, X̄ is approximately normally distributed, X̄ ~ N(μ, σ²/75), with σ²
unknown. We estimate it by σ̂² = (21708 − 1215²/75)/74 = 2025/74 = 27.36
If H0 is true, then μ = 15, and X̄ ~ N(15, 27.36/75)
Use an upper tail test, at 1% level.
Reject H0 if z > 2.326
$z = \frac{16.2 - 15}{\sqrt{27.36/75}} = 1.987$
Since z = 1.987 < 2.326, z doesn't lie in the critical region.
H0 is not rejected. There is no evidence, at 1% level, that the mean time is greater
than 15 minutes.
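A sketch of the calculation for this example, working straight from Σx and Σx²:

```python
from math import sqrt

n, sum_x, sum_x2, mu0 = 75, 1215, 21708, 15

xbar = sum_x / n                                # 16.2
var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # 2025/74 ≈ 27.36
z = (xbar - mu0) / sqrt(var_hat / n)
print(round(z, 3), z > 2.326)                   # 1.987 False
```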
8. Population Mean (Normal, σ² unknown, n < 30)
As you might have guessed, you should use the t-distribution to
do this kind of significance test.

Family packs of bacon slices are sold in 1.5kg packs. A sample of 12 packs was
selected at random and their masses, measured in kilograms, were noted. The
following results were obtained: Σx = 17.81, Σx² = 26.4357
Assuming that the masses of the packs, in kg, follow a normal distribution
with unknown variance σ², test at the 1% level whether the packs are
underweight.
Let X be the mass of a pack of bacon slices, X ~ N(μ, σ²)
H0: μ = 1.5
H1: μ < 1.5
Since σ² is unknown and n < 30, a t-distribution is used, T ~ t(n − 1)
If H0 is true, then μ = 1.5, T ~ t(11), where
$T = \frac{\bar{X} - 1.5}{\hat{\sigma}/\sqrt{12}}$
Use a lower tail test, at 1% level.
Reject H0 if t < −2.718 [refer to the t-distribution tables]
x̄ = 1.484, σ̂² = 0.0002447, so t = −3.506 < −2.718
t lies in the critical region.
H0 is rejected in favour of H1. There is evidence that the packs are underweight,
at 1% level.
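A sketch of the t calculation. As before, the critical value −2.718 for t(11) is typed in from the tables, since Python's standard library has no t-distribution.

```python
from math import sqrt

n, sum_x, sum_x2, mu0 = 12, 17.81, 26.4357, 1.5

xbar = sum_x / n                                # about 1.484
var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # unbiased estimate σ̂², about 0.0002447
t = (xbar - mu0) / sqrt(var_hat / n)
t_crit = -2.718                                 # lower 1% point of t(11), from tables
print(round(xbar, 3), round(t, 3), t < t_crit)  # 1.484 -3.506 True
```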
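9. Difference between Means μ1 − μ2 (different variances σ1² and σ2² known)
This is something new. Types 9, 10 and 11 are only for 2 normal populations,
X1 and X2, with unknown means μ1 and μ2. So here you have 2 independent
samples, with the new test statistic X̄1 − X̄2, and we consider its sampling
distribution. Doing some expectation algebra,
$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2, \quad \mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
and so our sampling distribution of the difference of means will be
$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$
and therefore, in standardised form,
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0, 1)$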
Let's try one example:

Due to differences in the environment, the masses of a certain species of small


animal are believed to be greater in Region A than in Region B. It is known that
the masses in both regions are normally distributed with masses in Region A
having a standard deviation of 0.04kg and masses in region B having a standard
deviation of 0.09kg.
To test this theory, random samples are taken: 60 animals from Region A had a
mean mass of 3.03kg and 50 animals from Region B had a mean mass of 3.00kg.
Does this provide evidence, at the 1% level that the animals of this species in
Region A have a greater mass than those in Region B?
Let $X_1$ be the mass (kg) of an animal in Region A, with population mean $\mu_1$: $X_1 \sim N(\mu_1, 0.04^2)$.
Let $X_2$ be the mass (kg) of an animal in Region B, with population mean $\mu_2$: $X_2 \sim N(\mu_2, 0.09^2)$.
H0: $\mu_1 - \mu_2 = 0$ [there is no difference in the masses between the regions]
H1: $\mu_1 - \mu_2 > 0$ [the animals in Region A have greater mass]
Consider the distribution of the difference between the means, $\bar{X}_1 - \bar{X}_2$, with $n_1 = 60$, $n_2 = 50$.
If H0 is true, then $\mu_1 - \mu_2 = 0$ [in other questions the hypothesised difference need not be 0], so
$$\bar{X}_1 - \bar{X}_2 \sim N\left(0,\ \frac{0.04^2}{60} + \frac{0.09^2}{50}\right)$$
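Use an upper tail test at the 1% level.
Reject H0 if $z > 2.326$.
$$z = \frac{(3.03 - 3.00) - 0}{\sqrt{\dfrac{0.04^2}{60} + \dfrac{0.09^2}{50}}} \approx \frac{0.03}{0.01374} \approx 2.18 < 2.326$$
z does not lie in the critical region.
H0 is not rejected. There is insufficient evidence, at the 1% level, that the animals in Region A have a greater mass than those in Region B.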
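The calculation above can be checked with a short Python sketch using only the standard library. `z_two_means_known` is a hypothetical helper name; `statistics.NormalDist().inv_cdf(0.99)` replaces the table look-up for the upper 1% point of N(0, 1).

```python
from statistics import NormalDist
import math

def z_two_means_known(x1bar, x2bar, sigma1, sigma2, n1, n2, diff0=0.0):
    """Z for the difference of two means with known (possibly different) variances."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # standard error of X1bar - X2bar
    return ((x1bar - x2bar) - diff0) / se

z = z_two_means_known(3.03, 3.00, 0.04, 0.09, 60, 50)
z_crit = NormalDist().inv_cdf(0.99)  # upper 1% point of N(0, 1), about 2.326
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {z > z_crit}")
```

Running it reports z ≈ 2.184, below the critical value, so H0 is not rejected, matching the working above.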
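10. Difference between Means $\mu_1 - \mu_2$ (common $\sigma^2$ known)
This case differs little from the one above: the two populations now share a common, known variance $\sigma^2$. Setting $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the distribution becomes
$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$$
and the test statistic is
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0, 1)$$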
By the way, you can also construct confidence intervals for $\mu_1 - \mu_2$ in situations like this; try it out yourself. Two-tail, upper-tail and lower-tail tests are all possible as well.
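As a starting point for the exercise above, here is a hedged Python sketch of a two-sided confidence interval for μ1 − μ2 when the common variance σ² is known, using the standard form (x̄1 − x̄2) ± z·σ·√(1/n1 + 1/n2). The function name and the numbers in the usage line are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from statistics import NormalDist
import math

def ci_diff_means_common_sigma(x1bar, x2bar, sigma, n1, n2, confidence=0.95):
    """Two-sided CI for mu1 - mu2 when both populations share a known sigma."""
    # Two-sided interval: (1 - confidence)/2 of the probability in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma * math.sqrt(1 / n1 + 1 / n2)
    centre = x1bar - x2bar
    return centre - half_width, centre + half_width

# Hypothetical data: sample means 10 and 8, common sigma 2, eight observations each.
low, high = ci_diff_means_common_sigma(10.0, 8.0, 2.0, 8, 8)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

With these numbers the standard error is exactly 1, so the interval is simply 2 ± 1.96.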

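11. Difference between Means $\mu_1 - \mu_2$ (common $\sigma^2$ unknown)
Here the common variance is not known and has to be estimated from the samples. If both populations have a common unknown variance $\sigma^2$, its unbiased estimate $s_p^2$, also known as the pooled two-sample estimate, has the formula
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
where $n_1$ and $n_2$ are the sample sizes and $s_1^2$ and $s_2^2$ are the unbiased variance estimates from the two samples respectively. For large samples the distribution will be approximately
$$\bar{X}_1 - \bar{X}_2 \approx N\left(\mu_1 - \mu_2,\ s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$$
and the test statistic,
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \approx N(0, 1)$$
This is, however, not always the case. When the samples are small, we should use the t-distribution instead. The test statistic is then
$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where $T \sim t(n_1 + n_2 - 2)$. We should only use the t-distribution when $n_1 + n_2 - 2 < 30$.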
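The pooled estimate and the choice between the normal and t reference distributions can be sketched in Python as follows. The function names are hypothetical; the sketch assumes s1 and s2 are the unbiased sample standard deviations, and it only names the reference distribution, since t critical values still come from tables.

```python
import math

def pooled_variance(s1: float, s2: float, n1: int, n2: int) -> float:
    """Pooled two-sample estimate of a common variance; s1, s2 are unbiased sample SDs."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_statistic(x1bar, x2bar, s1, s2, n1, n2, diff0=0.0):
    """Return the test statistic and the distribution to compare it with."""
    sp = math.sqrt(pooled_variance(s1, s2, n1, n2))
    stat = ((x1bar - x2bar) - diff0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    # Rule of thumb from the text: use t(n1 + n2 - 2) when n1 + n2 - 2 < 30,
    # otherwise treat the statistic as approximately N(0, 1).
    reference = f"t({df})" if df < 30 else "N(0, 1)"
    return stat, reference

# Hypothetical small samples: 8 observations each, both sample SDs equal to 1.
stat, reference = pooled_statistic(5.0, 4.0, 1.0, 1.0, 8, 8)
print(f"statistic = {stat:.3f}, compare with {reference}")
```

Here n1 + n2 − 2 = 14 < 30, so the statistic is compared with t(14) rather than N(0, 1).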
A large group of sunflowers is growing on the shady side of a garden. A random sample of 36 of these sunflowers is measured. The sample mean height is found to be 2.86 m, and the sample standard deviation is found to be 0.60 m. A second group
of sunflowers is growing on the sunny side of the garden. A random sample of 36 of these sunflowers is measured. The sample mean height is found to be 3.29 m and the sample standard deviation is found to be 0.9 m. Treating the
samples as large samples from normal distributions having the same variance but possibly different means, obtain a pooled estimate of the variance and test whether the results provide significant evidence, at the 5% level, that the sunny-side sunflowers grow taller, on average, than the shady-side sunflowers.
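Let $X_1$ be the height (m) of a sunflower on the shady side, $X_1 \sim N(\mu_1, \sigma^2)$.
Let $X_2$ be the height (m) of a sunflower on the sunny side, $X_2 \sim N(\mu_2, \sigma^2)$,
where $\sigma^2$ is unknown.
H0: $\mu_2 - \mu_1 = 0$
H1: $\mu_2 - \mu_1 > 0$ [the sunny-side sunflowers are taller on average]
Consider the distribution of the difference between the means, $\bar{X}_2 - \bar{X}_1$, with $n_1 = n_2 = 36$.
If H0 is true, then $\mu_2 - \mu_1 = 0$. The pooled estimate of the variance is
$$s_p^2 = \frac{35(0.60^2) + 35(0.9^2)}{36 + 36 - 2} = \frac{12.6 + 28.35}{70} = 0.585$$
and therefore
$$\bar{X}_2 - \bar{X}_1 \approx N\left(0,\ 0.585\left(\frac{1}{36} + \frac{1}{36}\right)\right) = N(0,\ 0.0325)$$
Use an upper tail test at the 5% level.
Reject H0 if $z > 1.645$.
$$z = \frac{(3.29 - 2.86) - 0}{\sqrt{0.0325}} \approx \frac{0.43}{0.1803} \approx 2.39 > 1.645$$
z lies in the critical region.
H0 is rejected in favour of H1. There is evidence, at the 5% level, that the sunny-side sunflowers grow taller, on average, than the shady-side sunflowers.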
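A short Python sketch reproducing the sunflower working, assuming, as the working above does, that n1 = n2 = 36 and that 0.60 m and 0.9 m are unbiased sample standard deviations. Because n1 + n2 − 2 = 70 ≥ 30, the statistic is compared with N(0, 1).

```python
from statistics import NormalDist
import math

n1, xbar1, s1 = 36, 2.86, 0.60   # shady side (X1)
n2, xbar2, s2 = 36, 3.29, 0.9    # sunny side (X2)

# Pooled estimate of the common variance.
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# Upper-tail test of H1: mu2 - mu1 > 0, so the statistic uses xbar2 - xbar1.
z = (xbar2 - xbar1) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
z_crit = NormalDist().inv_cdf(0.95)  # upper 5% point of N(0, 1), about 1.645

print(f"pooled variance = {sp2:.3f}, z = {z:.3f}, reject H0: {z > z_crit}")
```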

When you perform a significance test, you may make an error. If H0 is correct and you accept it, or if H0 is false and you reject it, then you've made a correct decision. However, there are two kinds of error that you can make:
1. A Type I Error, which is made when you reject H0 when it is true
2. A Type II Error, which is made when you accept H0 when it is false.
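Questions usually ask for the probability of making these errors. The first one is easy: P(Type I error) = level of significance when the test statistic is continuous. For a discrete distribution such as the binomial, P(Type I error) is the actual probability of the critical region under H0, which is at most the significance level. For a Type II error, things are not so straightforward: a specific value of the parameter under H1 must be stated before the probability of this error can be found. I'll show you an example below: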
A random observation is taken from a binomial distribution X ~ B(20, p) and used to test the null hypothesis p = 0.8 against the alternative hypothesis p > 0.8. The significance level of the test is 7%. Find the probability of making a Type I error. Find also the probability of making a Type II error if in fact p = 0.85.

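The probability of making a Type I error is $P(X \ge 19 \text{ when } p = 0.80) = 0.069 = 6.9\%$, using the critical region $X \ge 19$ found below. [For a discrete distribution such as this one, it is the actual size of the critical region, which is at most, not exactly, the 7% significance level.]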
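You make a Type II error if you accept H0 when p is the value specified in H1.
For the Type II error,
H0: p = 0.80
H1: p = 0.85
First find the critical region under H0, $X \sim B(20, 0.80)$:
$P(X = 20) = 0.012 = 1.2\%$
$P(X \ge 19) = 0.069 = 6.9\%$
$P(X \ge 18) = 0.206 = 20.6\%$
Since $P(X \ge 19) \le 7\% < P(X \ge 18)$, the critical region is $X \ge 19$.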
So P(Type II error) = P(accept H0 when H1 is true)
$= P(X < 19 \text{ when } p = 0.85)$
$= P(X \le 18 \text{ when } X \sim B(20, 0.85))$
$= 1 - P(X = 20) - P(X = 19) = 0.824 = 82.4\%$ [Note that in this part of the calculation you are using p = 0.85, not 0.80 as when you were finding the critical region above.]
The probability of making a Type II error is 82.4%.
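The whole example can be checked with a small Python sketch using `math.comb` for the binomial probabilities. `upper_tail` is a hypothetical helper name; the critical region is taken as the smallest c with P(X ≥ c) ≤ 7% under p = 0.80, as in the working above.

```python
from math import comb

def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

n, p0, p1, alpha = 20, 0.80, 0.85, 0.07

# Critical region X >= c: smallest c whose tail probability under H0 is at most alpha.
c = next(k for k in range(n + 2) if upper_tail(k, n, p0) <= alpha)
type_1 = upper_tail(c, n, p0)       # P(reject H0 | p = 0.80)
type_2 = 1 - upper_tail(c, n, p1)   # P(accept H0 | p = 0.85)
print(f"critical region X >= {c}")
print(f"P(Type I error) = {type_1:.4f}, P(Type II error) = {type_2:.4f}")
```

It gives the critical region X ≥ 19, P(Type I error) ≈ 0.0692 and P(Type II error) ≈ 0.8244.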
Let me summarize how you find the probability of a Type II error:
1. Define your new H1 (a specific value of the parameter).
2. Find the critical region, using the value in H0.
3. Using the value in H1, find the probability that the test statistic lies outside the critical region you found.
By the way, the expression 1 − P(Type II error) is known as the Power of the Test.
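The three-step recipe and the power can be written as a small Python sketch for an upper-tail binomial test. The function names are hypothetical, and the loop over p = 0.85, 0.90, 0.95 only illustrates how the power grows as the true p moves away from the H0 value; only p = 0.85 comes from the example.

```python
from math import comb

def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ B(n, p); an empty sum (c > n) gives 0."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def critical_value(n: int, p0: float, alpha: float) -> int:
    """Step 2: smallest c with P(X >= c | p = p0) <= alpha, for an upper-tail test."""
    return next(c for c in range(n + 2) if upper_tail(c, n, p0) <= alpha)

def power(n: int, p0: float, p1: float, alpha: float) -> float:
    """1 - P(Type II error) = P(X lands in the critical region | p = p1)."""
    c = critical_value(n, p0, alpha)
    return upper_tail(c, n, p1)

for p1 in (0.85, 0.90, 0.95):
    beta = 1 - power(20, 0.80, p1, 0.07)
    print(f"p = {p1:.2f}: P(Type II error) = {beta:.3f}, power = {1 - beta:.3f}")
```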

For your reference.

Study hard, and make sure you don't make mistakes in this section, which is meant for scoring marks.
We will be going ALL INTO it in the next section, where the calculations come in.
