
Research Methodology and Biostatistics

Dr. K.P. Suresh, Ph.D (Biostatistics)
National Institute of Veterinary Epidemiology and
Disease Informatics (NIVEDI), Hebbal, Bangalore 560024

A statistician is someone who, with his head in an oven and his feet
in a bucket of ice water, when asked how he feels, responds: "On the
average, I feel fine."

What is Research?
Research is the systematic process of collecting and analyzing
information to increase our understanding of the phenomenon under
study. It is the function of the researcher to contribute to the
understanding of the phenomenon and to communicate that
understanding to others.
Invention: Invest money to generate knowledge
Innovation: Invest knowledge to generate money

The logic of scientific reasoning

The whole point of science is to uncover the truth.
We have our senses, through which we experience the world and make
observations.
We have the ability to reason, which enables us to make logical
inferences.
In science we impose logic on those observations.

Inductive Inference:
Statistics as the Technology of the
Scientific Method
Statistical methods are objective methods by which
group trends are
abstracted from observations on many separate
individuals.
Summarizing data: Averages, percentages, presentation of tables and
charts
A major part of statistics involves the drawing of
inferences from samples to a population in regard to
some characteristic of interest
In statistical reasoning, then, we make inductive
inferences, from the particular (sample) to the general
(population). Thus, statistics may be said to be the
technology of the scientific method.

CLINICAL RESEARCH PROCESS


Pre-clinical testing
Investigational New Drug Application (IND)
Phase I (assess safety)
Phase II (test for effectiveness)
Phase III (large-scale testing)
Licensing (approval to use)
Approval (available for prescription)
Post-marketing studies (special studies and
long-term effectiveness/use)

Scientific enterprise
Values, Ethics and Standards in Scientific
Research
Research is based on the same ethical
values that apply in everyday life, including
honesty, fairness, objectivity, openness,
trustworthiness, and respect for others.
A scientific standard refers to the
application of these values in the context of
research. Examples are openness in
sharing research materials, fairness in
reviewing grant proposals, respect for
one's colleagues and students, and honesty
in reporting research results.

Scientific misconduct
The most serious violations of standards have come to be known as
scientific misconduct. The U.S. government defines misconduct as
fabrication, falsification, or plagiarism (FFP) in proposing, performing, or
reviewing research, or in reporting research results.
Scientists who violate standards other than FFP are said to engage in
questionable research practices. Scientists and their institutions should
act to discourage questionable research practices (QRPs) through a broad
range of formal and informal methods in the research environment
Fabrication is making up data or results.
Falsification is manipulating research materials, equipment, or processes,
or changing or omitting data or results such that the research is not
accurately represented in the research record.
Plagiarism is the appropriation of another person's ideas, processes,
results, or words without giving appropriate credit.
Questionable Research Practices: deliberately dividing research
results into the least publishable units to increase the count of one's
publications

Intellectual Property rights in Research
Discoveries made through scientific research can have great
value to researchers in advancing knowledge, to
governments in setting public policy, and to industry in
developing new products.
Researchers should be aware of this potential value and of
the interest of their laboratories and institutions in it, know
how to protect their own interests, and be familiar with the
rules governing the fair and proper use of ideas.
Intellectual Property rights:
benefiting from a new idea may require establishing
intellectual property rights through patents and copyrights,
or by treating the idea as a trade secret. Intellectual
property is a legal right to control the application of an idea
in a specific context
Patent: Control the Application of ideas
Copyright: Control the expression of ideas

Research methods vs Research methodology

Research methods: usually refers to the specific activities designed
to generate data (questionnaires, interviews, focus groups,
observation, experiments).
Research methodology: is more about your attitude to, and your
understanding of, research and the strategy you choose to carry it
out correctly.

Why do research?
Research allows you to gain appreciation for the practical
applications of knowledge, and to step outside your
classroom and learn about the theories, tools, resources, and
ethical issues that scholars and professionals encounter on a
daily basis.
1. Fascination
2. Answers to unsolved problems
3. Gain insight into a particular issue
4. Develop many transferable skills; research is a method of making
new ideas to be implemented into community practice and checking if
they work
5. Personal satisfaction and achievement

Increasing Mind Power

NEGATIVE: Anger, Irritability, Jealousy, Blame, Complain, Anxiety,
Inferiority, Ego personality
POSITIVE: Preoccupation/Work, Laughing therapy, Yoga, Music therapy,
Read, Play

How to be Happy
Keep your heart free from hate, your mind from worry. Live simply,
expect little, give much, sing often, pray always. Fill your life
with love, scatter sunshine, forget...

Stages of Scientific Knowledge

1. Explorative research
Undertaken when few or no previous studies exist. The aim is to look
for patterns or hypotheses that can be tested and will form the basis
for further research: meta-analysis.

2. Descriptive research
Describes the data-generating process. Description would answer
questions such as: 1. What is the range of prostate volumes (ml) for
a sample of urology patients? 2. What is the difference in average
volume between patients with negative biopsy results and those with
positive results?

3. Explanation or analytical research
We seek to infer characteristics of the data-generating process.
Inference would answer questions such as: for a sample of patients
with prostate problems, can we expect the average volume of patients
with positive biopsy results to be less than that of patients with
negative biopsy results?

4. Predictive research
We seek to make predictions about a characteristic of the
data-generating process. Such prediction would answer questions such
as: on the basis of a patient's negative digital rectal examination,
Prostate Specific Antigen...

Internal Validity
Internal validity is a crucial measure in quantitative studies, where it
ensures that a researcher's experiment design closely follows the
principle of cause and effect.
The researcher can eliminate almost all of the potential confounding
variables and set up strong controls to isolate other factors.

External Validity
External validity is one of the most difficult of the validity types
to achieve, and is at the foundation of every good experimental
design.
External validity asks the question of generalizability: to what
populations, settings, treatment variables and measurement variables
can this effect be generalized?

Confounding: Confounding is the distortion of the effect of one risk
factor by the presence of another. Confounding occurs when another
risk factor for a disease is also associated with the risk factor
being studied but acts separately.
Confounding can be controlled by
restriction, by matching on the
confounding variable or by including it in
the statistical analysis.
Bias (Systematic Error): Any process or
effect at any stage of a study from its design to
its execution to the application of information
from the study, that produces results or
conclusions that differ systematically from the
truth.

Extraneous variables
Extraneous variables are those having a relationship with the main
variables (X and Y):
Confounder: a variable Z associated with both X and Y
Mediator: a variable Z lying on the pathway from X to Y
Moderator: a variable Z where X and Y have a different relationship
at different levels of Z
Suppressor
Covariate

Decision theory and Statistics

Decision theory is the theory of decisions; the subject is not a
unified one. Decision theory is concerned with goal-directed
behavior in the presence of options.

Statistics for decision making
Data analysis
Statistical tests
Predictions
Simulations

Accuracy and Precision

Accuracy is the degree of veracity (adherence to truthfulness).
Precision is the degree of reproducibility.
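The distinction can be illustrated with a small sketch (all readings are invented for this illustration): a measurement process is accurate when its average is close to the truth, and precise when its repeated readings agree closely with one another.

```python
import statistics

# Hypothetical repeated measurements of a quantity whose true value
# is 10.0 (all numbers invented for this sketch).
true_value = 10.0
accurate_imprecise = [9.2, 10.9, 9.5, 10.6, 9.8]     # centred on truth, but scattered
precise_inaccurate = [11.1, 11.2, 11.1, 11.3, 11.2]  # tightly clustered, but biased

def accuracy_error(readings, truth):
    """Accuracy: how far the average reading is from the truth."""
    return abs(statistics.mean(readings) - truth)

def precision_spread(readings):
    """Precision: reproducibility, measured here as the sample SD."""
    return statistics.stdev(readings)

print(accuracy_error(accurate_imprecise, true_value))   # small bias
print(precision_spread(accurate_imprecise))             # large spread
print(accuracy_error(precise_inaccurate, true_value))   # large bias
print(precision_spread(precise_inaccurate))             # small spread
```

The first instrument is accurate but imprecise; the second is precise but inaccurate.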

Waiting hurts. Forgetting hurts. But not


knowing which decision to take can
sometimes be the most painful...

(Figure: decisions based on sample data feed into a decision-making
system.)

Testing a Surveillance System

                             Actual condition (in population)
Surveillance system result   Infection present   Infection absent
Infection present            True Positive       False Positive
                                                 (Type II error)
Infection absent             False Negative      True Negative
                             (Type I error)

                             Reality
Conclusion                   Effect does not exist   Effect exists
Effect does not exist        Correct decision        Type 2 error
Effect exists                Type 1 error            Correct decision

How to control errors


Definition of population
Study design
Research hypothesis
Null hypothesis
Alternative hypothesis
One tailed/two tailed study
Sample size
Sample selection
Randomization
Blinding
Data collection procedures
Data management procedures
Suitable statistical method/s
Interpretations

Testing a Surveillance System

                             Actual condition (in population)
Surveillance system result   Infection present (H0)   Infection absent (H1)
Infection present            True Positive (TP)       False Positive (FP)
                                                      (Type II error)
Infection absent             False Negative (FN)      True Negative (TN)
                             (Type I error)

Sensitivity = TP/(TP+FN)
Specificity = TN/(TN+FP)
PPV = TP/(TP+FP)
NPV = TN/(TN+FN)
Accuracy = (TP+TN)/N
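These formulas can be sketched in a few lines of code; the counts below are invented purely for illustration:

```python
def surveillance_metrics(tp, fp, fn, tn):
    """Compute test-evaluation measures from the four cells of a
    2x2 table: true/false positives and true/false negatives."""
    n = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # TP / (TP + FN)
        "specificity": tn / (tn + fp),   # TN / (TN + FP)
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / n,       # (TP + TN) / N
    }

# Illustrative counts (invented): 100 truly infected, 900 truly free.
m = surveillance_metrics(tp=80, fp=30, fn=20, tn=870)
print(m["sensitivity"])  # 0.8
print(m["accuracy"])     # 0.95
```

A highly specific test keeps false positives rare; a highly sensitive one keeps false negatives rare.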

...infection or disease; or to detect an exotic or new disease so
that control action can be quickly instituted.
Monitoring: systematic collection, analysis and dissemination of
information about the level (occurrence, incidence and prevalence) of
infections or diseases that are known to occur in a specified
population.
Surveys are the tools for data collection: an investigation using the
systematic collection of information from a population that is not
under the control of the investigator.
Passive surveillance: passive or general surveillance typically takes
the form of a disease reporting system. If a producer notices a
disease problem, this is reported and recorded in a systematic
manner.
Active surveillance: uses structured disease surveys to collect
high-quality disease information quickly and cheaply.
Representative sample: one that is similar to the population.
Inference is valid only when a representative sample is chosen.
Random sample: every element in the population has the same
probability of being selected in the sample.

Universal TRUTH we learnt: the sun rises in the east.
Fact: the sun neither rises nor sets; only the earth rotates.
Moral: Education spoils our...

Applying Inclusion and Exclusion Criteria
for defining the study population
All clinical trials have guidelines about who can participate; these
are specified in the inclusion/exclusion criteria:
Factors that allow someone to participate in a clinical trial are
"inclusion criteria".
Factors that exclude or do not allow participation in a clinical
trial are "exclusion criteria".
These factors may include: age, gender, the type and stage of
disease, previous treatment history, specific lab values, other
medical conditions.
Inclusion and exclusion criteria are not used to reject people
personally. The criteria are used to:
1. Identify appropriate participants
2. Keep them safe
3. Help ensure that researchers can answer the questions they want
answered
One of the crucial components of successful research or trial is the
careful definition of the study population.

Testing as Scientific Knowledge
1. Specify clearly, completely and unequivocally the question you
are asking
2. Identify, specify in detail and plan how to measure the
variable(s) to answer that question
3. Review your definitions of population and sample and verify the
appropriateness of generalization
4. Review the sampling scheme to obtain the data
5. Specify exactly the null and alternative hypotheses
6. Select the risks for Type I and Type II error
7. Choose the form of statistical test
8. Verify that your sample size is adequate to achieve the proposed
statistical power
9. At this point, obtain your data
10. Identify, list and test possible biases
11. Perform the statistical test and form the conclusion

Study designs

In many ways the design of a study is more important than the
analysis. A badly designed study can never be retrieved, whereas a
poorly analysed one can usually be re-analysed.
Consideration of design is also important because the design of a
study will govern how the data are to be analysed.

Study Designs
Observational studies: nature determines who is exposed to the
factor of interest and who is not
  Cross-sectional studies
  Case-control studies
  Prospective studies
Experimental study designs: the investigator determines who is
exposed
Correlation studies and modeling
Feasibility studies: in vitro studies, case series studies, pilot
studies

Case series
Descriptive account of an interesting
characteristic
In one patient
In a small group of patients
Usually involves patients seen over a short
period of time
Does not involve controls
No research hypothesis
Leads to formulation of hypotheses, other
types of studies

Cross sectional studies


Analyze data collected at a single
point in Time
Provide information on status quo
(e.g. prevalence of a condition, or
disease characteristics)
Quick to complete, cheap
Cannot examine outcomes
May lead to biased conclusions about disease progression

Case control studies


Longitudinal, retrospective
design
Starts with the outcome
Cases: those with the outcome
Controls: those without the
outcome

Case-control advantages
Shorter, Cheaper and Useful
to study rare diseases or
diseases that take a
long time to manifest, or to
explore preliminary
Hypotheses
Case-control
disadvantages
Difficult to control for bias
May depend entirely on
quality of existing records
Can be difficult to designate
appropriate control group

Advantages/Disadvantages of Case-Control Study

Advantages                            Disadvantages
Quick                                 Uncertain if exposure preceded disease
Require reasonably small numbers      Potential for recall bias
Reasonably economical                 Selection bias (recruitment
                                      influenced by exposure)
Sensible for study of rare disease    Unable to estimate disease incidence
No loss to follow-up                  Case-control studies are less
Can test current hypothesis           reliable than either randomized
Consistency of measurement easily     controlled trials or cohort
maintained                            studies

RETROSPECTIVE STUDY

Experimental Design
An experimental design is one where the investigator determines who
is exposed. These may prove causation.
Determine causes:
True experimental design is regarded as the most accurate form of
experimental research, in that it tries to prove or disprove a
hypothesis mathematically, with statistical analysis.
A double-blind experiment is an experimental method used to ensure
impartiality and avoid errors arising from bias.
Quasi-experimental design: an experimental design where
randomization is lacking.

Prospective Study Designs
The most powerful studies are prospective studies, and the paradigm
for these is the randomised controlled trial, in which subjects with
a disease are randomised to one of two (or more) treatments, one of
which may be a control treatment.

Parallel Study designs


A parallel group design is one in which
treatment and control are allocated to different
individuals. To allow for the therapeutic effect of
simply being given treatment, the control may
consist of a placebo, an inert substance that is
physically identical to the active compound.

Randomized Controlled Study

1. Randomized controlled trials require one or more control groups
for purposes of comparison.
2. The selection of control groups depends on the objectives of the
study. In the evaluation of traditional medicine, a concurrent
control group should be used.
The control groups may involve (not in order of priority):
well established treatment
non-treatment
different doses of the same
treatment
sham or placebo treatment
full-scale treatment
minimal treatment
alternative treatment.
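The random allocation step of such a trial can be sketched in a few lines; the subject IDs, arm names, and seed below are invented for illustration:

```python
import random

def randomize(subject_ids, arms=("treatment", "control"), seed=None):
    """Randomly allocate subjects to study arms in (near-)equal
    numbers: shuffle the subjects, then deal them out arm by arm."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    return {subject: arms[i % len(arms)] for i, subject in enumerate(ids)}

# Hypothetical subject labels; a fixed seed makes the run reproducible.
alloc = randomize(["S01", "S02", "S03", "S04", "S05", "S06"], seed=42)
for subject, arm in sorted(alloc.items()):
    print(subject, arm)
```

With an even number of subjects and two arms, this scheme guarantees exactly equal group sizes while keeping the assignment itself random.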

Cross-over Study designs


A crossover study is one in which two or more treatments
are applied sequentially to the same subject.
The advantages are that each subject then acts as their own
control and so fewer subjects may be required.
The main disadvantage is that there may be a carry over
effect in that the action of the second treatment is affected
by the first treatment.

Cohort studies
Cohort: a group of people who have something in common and who
remain part of a group over an extended period of time
Outcomes determined after follow-up: longitudinal, prospective
studies

COHORT STUDY

Advantages/Disadvantages of Cohort Study

Advantages                            Disadvantages
Can collect exposure information      Duration of study: may take
as exposure happens                   decades to complete
Can collect multiple different        Subjects must be followed over
exposures                             time
Exposure information can be           Losses potentially invalidate
relatively reliable                   the study
Can collect information as outcome    Very expensive; can you afford
happens                               to wait decades for your answer?

Comparison of Case-Control and Cohort Study Designs

Case-control works from outcome (or presence of disease) to
treatment (or exposure).
Cohort works from treatment (or exposure) to outcome (or presence of
disease).

Nested Case-Control Study

Cases of a disease that occur in a defined cohort are identified
and, for each, a specified number of matched controls is selected
from among those in the cohort who have not developed the disease by
the time of disease occurrence in the case.
This design potentially offers impressive reductions in the costs
and efforts of data collection and analysis compared with the full
cohort approach, with relatively minor loss in statistical
efficiency.

Case-Cohort Studies
The case-cohort design is most useful in analyzing time to
failure in a large cohort in which failure is rare.
Covariate information is collected from all failures and a
representative sample of censored observations.
Sampling is done without respect to time or disease status,
and, therefore, the design is more flexible than a nested
case-control design.
Despite the efficiency of the methods, case-cohort designs
are not often used because of perceived analytic
complexity.

IMPROVED EXPERIMENTAL DESIGN

Shift effect (symptomatic effect) vs slope effect (disease-modifying
effect).
(Figure: withdrawal design plotting performance over time for active
and placebo arms, illustrating the slope effect / disease-modifying
effect.)

Blinding
To further eliminate bias, randomized trials are sometimes "blinded"
(also called masked).
Single-blinded trials are those in which participants do not know
which group they are in, and therefore which intervention they are
receiving, until the conclusion of the study.
Double-blinded trials are those in which neither the participant nor
the investigators know to which group the participant has been
assigned until the conclusion of the study.
Triple-blinded trials are those in which the statistician is also
blinded.

1. A study is done to examine the association between a mother's
education and risk of a congenital heart defect in her offspring.
The investigator enrolls a group of mothers of babies
investigator enrolls a group of mothers of babies
with birth defects and a group of mothers of
babies without birth defects. The mothers are
then asked a series of questions about their
education
1. Case series
2. Case-control study
3. Nested case-control study
4. Prospective cohort study
5. Retrospective cohort study
6. Randomized clinical trial

2.A study on the association of coffee


consumption and performance on a
memory test randomly assigns half of the
enrolled subjects to drink coffee one hour
before taking the memory test and the
other half to not drink coffee one hour
before taking the memory test.
1. Case series
2. Case-control study
3. Nested case-control study
4. Prospective cohort study
5. Retrospective cohort study
6. Randomized clinical trial
7. Cross-sectional study
8. Ecological study
9. Case-cohort study

3.A study examining the association


between meat consumption and heart
disease compares the average number of
kilograms of meat consumed per person
for 50 different countries to the incidence
rate of heart disease in the same 50
countries
1. Case series
2. Case-control study
3. Nested case-control study
4. Prospective cohort study
5. Retrospective cohort study
6. Randomized clinical trial
7. Cross-sectional study
8. Ecological study
9. Case-cohort study

4.A study describes a group of hospital patients all of


whom suffer from migraine with aura and experienced an
ischemic stroke.
1. Case series
2. Case-control study
3. Nested case-control study
4. Prospective cohort study
5. Retrospective cohort study
6. Randomized clinical trial
7. Cross-sectional study
8. Ecological study
9. Case-cohort study

5.An investigator enrolls a group of healthy


individuals and distributes questionnaires to
collect information on sex and blood type. The
investigator then examines the association
between sex and blood type.
1. Case series
2. Case-control study
3. Nested case-control study
4. Prospective cohort study
5. Retrospective cohort study
6. Randomized clinical trial
7. Cross-sectional study
8. Ecological study
9. Case-cohort study

6.A researcher uses a database of medical


records to identify a group of retired factory
workers. He reviews each person's medical
records to follow their factory exposures over
time and see which of these subjects has
developed skin cancer in the past 25 years.
1. Case series
2. Case-control study
3. Nested case-control study
4. Prospective cohort study
5. Retrospective cohort study
6. Randomized clinical trial
7. Cross-sectional study
8. Ecological study
9. Case-cohort study

Boy: Where Are You Going?


Girl: For Suicide..
Boy: Then, Why so Much Make-Up?
Girl: You Idiot..!! Tomorrow My Photo
will Come In Newspaper..........

Superiority, Equivalence and Non-Inferiority Studies

Superiority study:
A trial or research with the objective of showing that the response
to the investigational product is superior to a comparative agent
(active or placebo control).
Significant results are a good outcome.

Equivalence study:
Research with the objective of showing that the differences between
the control and study treatments are not large in either direction.
The investigational product is compared to a reference treatment
without the objective of showing superiority.
Non-significant/significant results are a good outcome.

Non-inferiority study:
If the study objective is to demonstrate that the investigational
product is not worse than the comparator by more than a
pre-specified margin...

Measurement error
If more than one operator is used in a study, then measurement
(gauge) error has two components of variance:

σ²_total = σ²_product + σ²_gauge
σ²_gauge = σ²_repeatability + σ²_reproducibility

Repeatability (σ²_repeatability): variance due to the measuring
instrument
Reproducibility (σ²_reproducibility): variance due to different
operators

What is Hypothesis
testing?

A statistical hypothesis is an assumption about a population


parameter. This assumption may or may not be true. Hypothesis
testing refers to the formal procedures used by statisticians to
accept or reject statistical hypotheses.
Statistical Hypothesis
The best way to determine whether a statistical hypothesis is true
would be to examine the entire population. Since that is often
impractical, researchers typically examine a random sample from
the population. If sample data are not consistent with the
statistical hypothesis, the hypothesis is rejected.
Null hypothesis. The null hypothesis, denoted by H0, is usually the
hypothesis that sample observations result purely from chance.
Alternative hypothesis. The alternative hypothesis, denoted by H1
or Ha, is the hypothesis that sample observations are influenced by
some non-random cause.

Example
For example, suppose we wanted to determine whether a coin was fair
and balanced. A null hypothesis might be that half the flips would result in
Heads and half, in Tails. The alternative hypothesis might be that the
number of Heads and Tails would be very different. Symbolically, these
hypotheses would be expressed as
H0: P = 0.5
Ha: P ≠ 0.5
Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails.
Given this result, we would be inclined to reject the null hypothesis. We
would conclude, based on the evidence, that the coin was probably not
fair and balanced.
A hypothesis test can have one of two outcomes: reject the null
hypothesis or fail to reject the null hypothesis.
What is the distinction between "acceptance" and "failure to reject"?
Acceptance implies that the null hypothesis is true.
Failure to reject implies that the data are not sufficiently persuasive for
us to prefer the alternative hypothesis over the null hypothesis.
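The coin example above can be checked with an exact binomial test. This is a minimal sketch, not the slides' own procedure; the helper name and the small numerical tolerance are mine:

```python
from math import comb

def two_sided_binom_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: the total probability, under
    H0, of every outcome at least as unlikely as the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    # Sum the probabilities of all outcomes no more likely than k's
    # (the 1e-12 guards against floating-point ties).
    return sum(q for q in probs if q <= observed + 1e-12)

# The slide's example: 40 heads in 50 flips of a supposedly fair coin.
p_value = two_sided_binom_p(40, 50)
print(p_value < 0.05)  # True: reject H0 that the coin is fair
```

For 40 heads out of 50, the p-value is far below 0.05, matching the slide's conclusion that the coin is probably not fair and balanced.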

Null hypothesis/Alternative hypothesis


The null hypothesis, H0, represents a theory that has been put
forward, either because it is believed to be true or because it
is to be used as a basis for argument, but has not been
proved. For example, in a clinical trial of a new drug, the null
hypothesis might be that the new drug is no better, on
average, than the current drug. We would write
H0: there is no difference between the two drugs on average.
We give special consideration to the null hypothesis. This is due
to the fact that the null hypothesis relates to the statement
being tested, whereas the alternative hypothesis relates to
the statement to be accepted if / when the null is rejected.
The final conclusion once the test has been carried out is always
given in terms of the null hypothesis. We either "Reject H0 in
favour of H1" or "Do not reject H0"; we never conclude
"Reject H1", or even "Accept H1".
If we conclude "Do not reject H0", this does not necessarily
mean that the null hypothesis is true, it only suggests that
there is not sufficient evidence against H0 in favour of H1.

Null Hypothesis: Example
Suppose we are testing the efficacy of a new drug on patients with
myocardial infarction (heart attack).


We divide the patients into two groups, drug and no drug, according
to good design procedures, and use as our criterion measure the
mortality in the two groups.
It is our hope that the drug lowers mortality, but to test the
hypothesis statistically, we have to set it up in a sort of backward
way.
We say our hypothesis is that the drug makes no difference, and
what we hope to do is to reject the no difference
hypothesis, based on
evidence from our sample of patients.
This is known as the null hypothesis.

Alternative Hypothesis
We test this against an alternate hypothesis, known as HA, that the
difference in death rates between the two groups does not equal 0.
We then gather data and note the observed difference in
mortality between group A and group B.
If this observed difference is sufficiently greater than zero, we
reject the null hypothesis.
If we reject the null hypothesis of no difference, we accept the alternate
hypothesis, which is that the drug does make a difference.
1.I will assume the hypothesis that there is no difference is true;
2. I will then collect the data and observe the difference between
the two groups;
3. If the null hypothesis is true, how likely is it that by chance
alone I would get results such as these?
4. If it is not likely that these results could arise by chance under
the assumption that the null hypothesis is true, then I will
conclude it is false, and I will accept the alternate hypothesis.

Why Do We Test the Null Hypothesis?


Suppose we believe that drug A is better than drug B in preventing death
from a heart attack.
Why don't we test that belief directly and see which drug is better, rather
than testing the hypothesis that drug A is equal to drug B?
The reason is that there is an infinite number of ways in which
drug A can be better than drug B, so we would have to test an
infinite number of hypotheses.
If drug A causes 10% fewer deaths than drug B, it is better. So first
we would have to see if drug A causes 10% fewer deaths.
If it doesn't cause 10% fewer deaths, but if it causes 9% fewer deaths, it is
also better.
Then we would have to test whether our observations are
consistent with a 9% difference in mortality between the two
drugs.

One tailed vs two tailed


A one-tailed test looks for an increase or
decrease in the parameter whereas
a two-tailed test looks for any change in
the parameter (which can be any
change- increase or decrease).

One tailed or two tailed tests


When is a one-tailed test appropriate?
Because the one-tailed test provides more power to detect an effect, you
may be tempted to use a one-tailed test whenever you have a hypothesis
about the direction of an effect. Before doing so, consider the
consequences of missing an effect in the other direction. Imagine you
have developed a new drug that you believe is an improvement over an
existing drug. You wish to maximize your ability to detect the
improvement, so you opt for a one-tailed test. In doing so, you
fail to test for the possibility that the new drug is less effective
than the existing drug. The consequences in this example are
extreme, but they illustrate a danger of inappropriate use of a
one-tailed test.
So when is a one-tailed test appropriate? If you consider the
consequences of missing an effect in the untested direction and conclude
that they are negligible and in no way irresponsible or unethical,
then you can proceed with a one-tailed test.
When is a one-tailed test NOT appropriate?
Choosing a one-tailed test for the sole purpose of attaining
significance is not appropriate. Choosing a one-tailed test after
running a two-tailed test that failed to reject the null hypothesis
is not appropriate, no matter how "close" to significant the
two-tailed test was.
Examples
We could use a one-tailed test to see if the stream has a higher pH
than one year ago, for which we would use the alternate hypothesis
HA: prev < current. However, we may want a more rigorous test of the
hypothesis that HA: prev ≠ current. This covers both HA: prev <
current and HA: prev > current, and we could be sure that there is a
significant difference between the means.

Factorial Designs
In an experiment, the process engineer's goal is to determine how
the yield of an adhesive application process can be improved by
adjusting three (3) process parameters: mixture ratio, curing
temperature, and curing time. For each of these input parameters,
two levels will be defined for use in this 2-level experiment. For
the mix ratio, the high level is set at 55%, while the low level is
set at 45%. For the curing temperature, the high level is set at 150
deg C, while the low level is set at 100 deg C. For the curing time,
the high level is set at 90 minutes, while the low level is set at
30 minutes. As mentioned, the output response monitored is process
yield. Assume further that the data were gathered by performing just
a single replicate (n=1) per treatment combination.
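Enumerating the treatment combinations of this 2^3 experiment is straightforward to sketch in code; the dictionary keys are invented labels for the three factors described above:

```python
from itertools import product

# The three 2-level factors from the adhesive example, with the low
# and high levels as stated in the text.
factors = {
    "mix_ratio_pct": (45, 55),
    "cure_temp_C": (100, 150),
    "cure_time_min": (30, 90),
}

# Full 2^3 factorial: every combination of low/high levels, one run
# each (a single replicate, n=1).
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]
print(len(runs))  # 8 treatment combinations
```

The full factorial therefore needs 2 × 2 × 2 = 8 runs, one per treatment combination.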

ADVANTAGES
Factorial designs are the
ultimate designs of choice
whenever we are interested in
examining treatment
variations.
Factorial designs are efficient. Instead of conducting a series of
independent studies, we are effectively able to combine these
studies into one.

P value
The probability value (p-value) of a
statistical hypothesis test is the
probability of getting a value of
the test statistic as extreme as or
more extreme than that observed
by chance alone, if the null
hypothesis H0, is true.
It is the probability of wrongly
rejecting the null hypothesis if it
is in fact true.

Significant figures

+ Suggestive significance (P value: 0.05 < P < 0.10)
* Moderately significant (P value: 0.01 < P ≤ 0.05)
** Strongly significant (P value: P ≤ 0.01)

The Confidence Level

The confidence level is the statistical measure of the number of times out of 100 that results can be expected to be within a specified range. For example, a confidence level of 90% means that results of an action will probably meet expectations 90% of the time.
The basic idea described in the Central Limit Theorem is that when a population is repeatedly sampled, the average value of an attribute obtained approaches the true population value. In other words, with a confidence level of 95%, 95 out of 100 samples will contain the true population value within their range of precision.
Degree of Variability
Depending upon the target population and the attributes under consideration, the degree of variability varies considerably. The more heterogeneous a population is, the larger the sample size required to reach an optimum level of precision.

Measurements
Identity: each value on the measurement scale has a unique meaning.
Magnitude: values on the measurement scale have an ordered relationship to one another; that is, some values are larger and some are smaller.
Equal intervals: scale units along the scale are equal to one another. This means, for example, that the difference between 1 and 2 is equal to the difference between 19 and 20.
A minimum value of zero: the scale has a true zero point, below which no values exist.

Types of Measurements
1. Nominal Scale of Measurement
The nominal scale of measurement only satisfies the identity property
of measurement.
2. Ordinal Scale of Measurement
The ordinal scale has the property of both identity and magnitude
3. Interval Scale of Measurement
The interval scale of measurement has the properties of identity,
magnitude, and equal intervals.
4. Ratio Scale of Measurement
The ratio scale of measurement satisfies all four of the properties of
measurement: identity, magnitude, equal intervals, and a minimum
value of zero.

Normal distribution and Standard Deviation

Effect size (d):
No effect: d < 0.20
Mild effect: 0.20 < d < 0.50
Moderate effect: 0.50 < d < 0.80
Large effect: 0.80 < d < 1.20
Very large effect: d > 1.20

In statistics, an effect size is a measure of the strength of the relationship between two variables in a statistical population, or a sample-based estimate of that quantity. An effect size calculated from data is a descriptive statistic that conveys the estimated magnitude of a relationship without making any statement about whether the apparent relationship in the data reflects a true relationship in the population.

Effect size is simply the change in the scale from before to after treatment, divided by the standard deviation at baseline.
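This definition can be computed directly. A minimal sketch: the function names, the example numbers, and the handling of the boundary at exactly d = 0.80 are illustrative choices, not from the slides.

```python
def effect_size(mean_before, mean_after, sd_baseline):
    """Change from before to after treatment, divided by the SD at baseline."""
    return (mean_after - mean_before) / sd_baseline

def classify(d):
    """Labels follow the d thresholds given in the slides
    (ties at a boundary are assigned to the larger category here)."""
    d = abs(d)
    if d < 0.20:
        return "no effect"
    if d < 0.50:
        return "mild effect"
    if d < 0.80:
        return "moderate effect"
    if d <= 1.20:
        return "large effect"
    return "very large effect"

# Illustrative numbers: SBP drops from 150 to 138 mmHg, baseline SD 15.
d = effect_size(150, 138, 15)
print(d, classify(d))
```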

Question asked in an exam for 10 marks: "How to kill an ant??"

Student:
Mix chilli powder with sugar and keep it outside the ant's hole.
After eating, the ant will search for some water near a water tank. Push the ant into it.
Now the ant will go to dry itself near a fire. When it reaches the fire, put a bomb into the fire.
Then admit the wounded ant to the ICU.
And then remove the oxygen mask from its mouth and kill the ant!
MORAL:
Don't play with students!

Sample Size
Determining the sample size to be selected is an important step in any
research study. For example let us suppose that some researcher wants to
determine prevalence of eye problems in school children and wants to
conduct a survey.
The important question that should be answered in all sample
surveys is "How many participants should be chosen for a survey"?
However, the answer cannot be given without considering the
objectives and circumstances of investigations.
The choosing of sample size depends on non-statistical
considerations and statistical considerations. The non-statistical
considerations may include availability of resources, manpower,
budget, ethics and sampling frame.
The statistical considerations will include the desired precision of
the estimate of prevalence and the expected prevalence of eye
problems in school children.
The Level of Precision: Also called sampling error, the level of precision is the range in which the true value of the population is estimated to be. This range is expressed in percentage points. Thus, if a researcher finds that 60% of respondents have a given attribute with a precision of ±5%, the true proportion in the population is estimated to lie between 55% and 65%.

Estimation of Optimum Sample Size

Too small a sample size leads to insignificant results; too large a sample produces unnecessarily significant results and a costly experiment.
Criteria for estimation of sample size:
1. Type I error (0.05 or 5%)
2. Statistical power (0.80 or 80%)
3. Expected difference
4. Standard deviation

Sample size: Dose Escalation
Dose limiting toxicity (DLT) must be defined.
Decide a few dose levels (e.g. 4).
At least three patients will be treated at each dose level (cohort).
This is not a power or sample size calculation issue.

Dose Escalation
Enroll 3 patients.
If 0/3 patients develop DLT: escalate to the next dose.
If DLT is observed in 1 of 3 patients: expand the cohort to 6; escalate if none of the 3 new patients develop DLT (i.e. 1/6 overall develop DLT).
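The escalation rule above can be sketched as a small decision function. The function name and return strings are illustrative, not from the slides.

```python
def three_plus_three(dlt_first3, dlt_next3=None):
    """Decision for one dose level under the classic 3+3 rule sketched above.
    dlt_first3: DLTs among the first 3 patients;
    dlt_next3: DLTs among the 3 expansion patients, if enrolled."""
    if dlt_first3 == 0:
        return "escalate"
    if dlt_first3 == 1:
        if dlt_next3 is None:
            return "expand cohort to 6"
        # Escalate only if 0/3 additional patients have a DLT (1/6 overall).
        return "escalate" if dlt_next3 == 0 else "stop: MTD exceeded"
    return "stop: MTD exceeded"   # 2 or more DLTs in the first 3

print(three_plus_three(0))      # escalate
print(three_plus_three(1))      # expand cohort to 6
print(three_plus_three(1, 0))   # escalate (1/6 with DLT)
```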

Dose Escalation
Maximum Tolerated Dose (MTD): the dose level immediately below the level at which 2 or more patients in a cohort of 3 to 6 patients experienced a DLT.
Usually the trial proceeds with a safe dose: the MTD, or a maximum dosage that is pre-specified in the protocol.

Phase II/III: Number of Patients to Enroll?
Ratio of the two arms: 1:1, 1:1.5 or 1:2
Power of study: minimum 80.0%, i.e. 1 − β = 0.80
Difference of outcome
Standard deviation
One-tailed/two-tailed
Type I error: α = 0.05 or 0.01

Sample size: Example, survey-type study

N = Zα/2² × P(1 − P) × D / E²

Where P is the prevalence or proportion of the event of interest for the study, and E is the precision (or margin of error) with which a researcher wants to measure it. Generally E will be 10% of P, and Zα/2 is the normal deviate for a two-tailed alternative hypothesis at α level of significance; for example, Zα/2 is 1.96 for a 5% level of significance and 2.58 for a 1% level of significance, as shown in table 2.
D is the design effect, which reflects the sampling design used in the survey-type study. This is 1 for simple random sampling and higher (usually 1 to 2) for other designs such as stratified, systematic, or cluster random sampling, estimated to compensate for the deviation from a simple random sampling procedure. The design effect for cluster random sampling is taken as 1.5 to 2.
For purposive, convenience, or judgment sampling, such formulas do not strictly apply.

1. Sample size estimation for proportion in


survey type of studies
Example: A researcher wants to know the sample size for conducting a survey measuring the prevalence of obesity in a certain community. Previous literature gives the estimate of obesity at 20% in the population to be surveyed; assuming a 95% confidence interval (5% level of significance) and a 10% margin of error, the sample size can be calculated as follows.

Assuming a simple random sampling design (D = 1), a sample size of 1537 is required to conduct the community-based survey to estimate the prevalence of obesity. Note: E is the margin of error; in the present example it is 10% × 0.20 = 0.02.
To find the final adjusted sample size, allowing a non-response rate of 10% in the above example, the adjusted sample size will be 1537/(1 − 0.10) = 1537/0.90 = 1708.
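The worked example can be reproduced in a few lines. The formula is the one given in this section; the function name is illustrative.

```python
import math

def survey_sample_size(p, e, z=1.96, deff=1.0):
    """N = Z^2 * P * (1 - P) * D / E^2 (survey-type formula from the text),
    rounded up to the next whole subject."""
    return math.ceil(z**2 * p * (1 - p) * deff / e**2)

p = 0.20              # expected prevalence of obesity
e = 0.10 * p          # margin of error = 10% of P = 0.02
n = survey_sample_size(p, e)          # 1537
n_adj = math.ceil(n / (1 - 0.10))     # allow 10% non-response -> 1708
print(n, n_adj)
```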

2. Sample size estimation with a single group mean

N = Zα/2² × s² / d²,

where s is the standard deviation obtained from a previous study or pilot study, d is the accuracy of the estimate (how close to the true mean), and Zα/2 is the normal deviate for a two-tailed alternative hypothesis at α level of significance.
Example: In a study estimating the weight of a population, we want the error of estimation to be less than 2 kg of the true mean (that is, an expected difference of 2 kg). The sample standard deviation was 5, and with a probability of 95% (that is, at an error rate of 5%) the sample size is estimated as N = (1.96)² (5)² / 2², which gives a sample of 24 subjects; with an allowance of 10% for missing data and losses to follow-up, the final sample size is about 27.
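A sketch reproducing the worked numbers, rounding to the nearest whole subject as the text does:

```python
import math

def n_single_mean(sd, d, z=1.96):
    """N = (Z_{alpha/2})^2 * s^2 / d^2 (single-group-mean formula above)."""
    return z**2 * sd**2 / d**2

n = round(n_single_mean(sd=5, d=2))   # (1.96^2 * 25) / 4 = 24.01 -> 24
n_adj = math.ceil(n / 0.90)           # 10% allowance for losses -> 27
print(n, n_adj)
```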

3. Sample size estimation with two means

N = (r + 1)(Zα/2 + Z1−β)² σ² / (r d²)

Where Zα/2 is the normal deviate at α level of significance (1.96 for the 5% level and 2.58 for the 1% level) and Z1−β is the normal deviate at 1 − β power, with β% type II error (0.84 at 80% power and 1.28 at 90% statistical power). r = n1/n2 is the ratio of the sample sizes required for the two groups; generally it is one, keeping equal sample sizes in the two groups, while r = 0.5 gives a 1:2 sample size distribution between the two groups.

Let us say a clinical researcher wants to compare the effect of two drugs, A and B, on systolic blood pressure (SBP). On a literature search the researcher found the mean SBP in the two groups to be 120 and 132, with a common standard deviation of 15. The total sample size for the study with r = 1 (equal sample sizes), α = 5% and 80% power is computed as 24, and for 90% statistical power the sample size will be 32. With unequal sample sizes of 1:2 (r = 0.5) and 90% statistical power at the 5% level, the required sample size is correspondingly larger.
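The two power levels above can be checked numerically with the formula from this section. Note the raw values are 24.5 and about 32.8, which the slides round to 24 and 32.

```python
def n_two_means(sd, d, z_a=1.96, z_b=0.84, r=1.0):
    """Total N = (r + 1) * (Z_{alpha/2} + Z_{1-beta})^2 * sd^2 / (r * d^2)."""
    return (r + 1) * (z_a + z_b)**2 * sd**2 / (r * d**2)

# Drug A vs B on SBP: means 120 vs 132 (d = 12), common SD 15, alpha = 5%.
n80 = n_two_means(sd=15, d=12)              # 80% power: 24.5
n90 = n_two_means(sd=15, d=12, z_b=1.28)    # 90% power: ~32.8
print(n80, n90)
```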

4. Sample size estimation with two proportions

N = [Zα/2 √(2 p̄(1 − p̄)) + Z1−β √(p1(1 − p1) + p2(1 − p2))]² / (p1 − p2)²

Where p1 and p2 are the proportions of the event of interest (outcome) for group I and group II, p̄ = (p1 + p2)/2 is the pooled proportion, Zα/2 is the normal deviate at α level of significance, and Z1−β is the normal deviate at 1 − β power with β% type II error; normally the type II error is kept at 20% or less.
If a researcher is planning to conduct a study with unequal groups, he or she must first calculate N as if using equal groups, and then calculate the modified sample size for the chosen ratio r = n1/n2 of the two group sizes.

4. Sample size estimation with two proportions


Example: It is believed that the proportion of patients who develop complications after undergoing one type of surgery is 5%, while the proportion who develop complications after a second type of surgery is 15%. How large should the sample be in each of the two groups if an investigator wishes to detect, with a power of 90%, whether the second procedure has a complication rate significantly higher than the first at the 5% level of significance?
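The slide leaves the arithmetic to the reader. A sketch using the two-proportions formula above with Zα/2 = 1.96 and Z1−β = 1.28; with these constants it gives roughly 187 patients per group.

```python
import math

def n_two_proportions(p1, p2, z_a=1.96, z_b=1.28):
    """Per-group N = [Z_{a/2}*sqrt(2*pbar*(1-pbar))
                      + Z_{1-b}*sqrt(p1*(1-p1) + p2*(1-p2))]^2 / (p1 - p2)^2."""
    pbar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * pbar * (1 - pbar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))**2
    return num / (p1 - p2)**2

# Surgery example: 5% vs 15% complication rates, 90% power, alpha = 5%.
n = n_two_proportions(0.05, 0.15)
print(math.ceil(n))
```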

5. Sample size estimation with correlation coefficient

N = [(Zα/2 + Z1−β) / C]² + 3, where C = ½ logₑ((1 + r)/(1 − r))

Example: According to the literature, the correlation between salt intake and systolic blood pressure is around 0.30. A study is conducted to test this correlation in a population, with a significance level of 1% and power of 90%. The sample size for such a study can be estimated by substituting Zα/2 = 2.58, Z1−β = 1.28 and r = 0.3:

N = [(2.58 + 1.28) / (½ logₑ((1 + 0.3)/(1 − 0.3)))]² + 3

The sample size for 90% power at the 1% level of significance was 99 for a two-tailed alternative test and 87 for a one-tailed test.
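A sketch of the standard Fisher z-transformation formula for this calculation. Note that with these inputs it yields N ≈ 159, so the figures quoted above may come from a different approximation or rounding.

```python
import math

def n_correlation(r, z_a, z_b):
    """Fisher z approach: N = [(Z_{a/2} + Z_{1-b}) / C]^2 + 3,
    where C = 0.5 * ln((1 + r)/(1 - r))."""
    c = 0.5 * math.log((1 + r) / (1 - r))
    return ((z_a + z_b) / c)**2 + 3

# r = 0.30, alpha = 1% (Z = 2.58), power = 90% (Z = 1.28), two-tailed.
n = n_correlation(0.30, 2.58, 1.28)
print(math.ceil(n))
```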

6. Sample size estimation with odds ratio

Sample size (per group) estimation: a thumb rule

Two groups: n = 16 s²/d²
Three groups: n = 22 s²/d²
Four groups: n = 26 s²/d²
where s is the within-group standard deviation and d is the smallest difference of means.

Adjustments:
One-tailed test: 20% less
Pre-post design: 50% less
Cross-over design: 25% of the two-tailed sample size

k:1 ratio for unequal sample sizes: increase the total sample size by (k+1)²/4k. Example: the total sample size for two equal groups is 26, but a 2:1 allocation is wanted; then 26 × 9/8 ≈ 30 (i.e. 20:10).

Factors affecting the sample size:
1. The sample size increases as the SD increases.
2. The sample size increases as the significance level is made smaller (<0.05).
3. The sample size increases as the required power increases (>0.80).
4. The sample size increases with a decrease in the difference to be detected.
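The thumb rule and the unequal-allocation inflation can be sketched together; the function names are illustrative.

```python
import math

def thumb_rule_n(sd, d, groups=2):
    """Per-group n = {16, 22, 26} * s^2 / d^2 for 2, 3, 4 groups (thumb rule)."""
    factor = {2: 16, 3: 22, 4: 26}[groups]
    return factor * sd**2 / d**2

def unequal_total(total_equal, k):
    """Inflate a 1:1 total N for a k:1 allocation by (k+1)^2 / (4k)."""
    return math.ceil(total_equal * (k + 1)**2 / (4 * k))

# Worked example from the text: total of 26 in 1:1, wanted in 2:1.
print(unequal_total(26, 2))   # 26 * 9/8 = 29.25 -> 30, split 20:10
```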

Sample Size calculation
[Figure: worked sample size calculations (n = 111, n = 141, n = 69); details not recoverable from the text]

Yoga teacher to a woman: Has yoga had any effect on your husband's drinking habit?
Woman: Yes, an amazing funny effect! Now he drinks the whole bottle standing upside down on his head.

Selection of Controls
Generally the control:cases ratio is 1:1.
1. Historical controls: comparison with a group treated in an earlier period using another form of therapy/intervention.
2. Geographical controls: comparison with a group treated elsewhere with a different form of therapy/intervention.
3. Volunteer controls: may not be a matched group.
4. Concurrent controls: control group observed simultaneously with the treated group.
Placebo control vs active control: a placebo is an inactive pill, liquid or powder that has no treatment value. The control is the standard by which the experimental observations are evaluated; in many clinical trials the control arm receives either a placebo or the standard (active) treatment.

How many controls? Case-Control Study
The sample size calculation says n = 13 cases and controls are needed, but only 11 cases are available. To keep the same precision with n₀ = 11 cases, use k·n₀ controls, where

k = n / (2n₀ − n)

k = 13 / (2 × 11 − 13) = 13/9 = 1.44, so k·n₀ = 1.44 × 11 ≈ 16 controls (and 11 cases) gives the same precision as 13 controls and 13 cases.
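The controls-per-case calculation above, as a sketch (function name illustrative):

```python
def controls_per_case(n_equal, n0_cases):
    """k = n / (2*n0 - n): controls-per-case ratio that preserves the
    precision of an n:n design when only n0 cases are available."""
    return n_equal / (2 * n0_cases - n_equal)

k = controls_per_case(13, 11)   # 13/9 = 1.44
controls = round(k * 11)        # about 16 controls for the 11 cases
print(k, controls)
```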

Question by a student !!
If a single teacher can't
teach us all the subjects,
Then...
How could you expect a single
student
to learn all subjects ?

Randomization
An important aspect of any research, which should be clearly stated in the final report, is the method used to assign treatments (or other interventions) to participants. Random assignment has been used for more than 50 years and is the preferred method of assignment.
Randomization eliminates sources of bias in treatment assignment: subjects in the various groups should not differ in any systematic way. If treatment groups are systematically different, the research results will be biased. Inadequate randomization can overestimate the treatment effect by as much as 40%.

Commonly used terms in research
Protocol

A protocol is a study plan on which all clinical trials are based. The plan is carefully designed to safeguard the health of the participants as well as answer specific research questions. A protocol describes what types of people may participate in the trial; the schedule of tests, procedures, medications, and dosages; and the length of the study. While in a clinical trial, participants following a protocol are seen regularly by the research staff to monitor their health and to determine the safety and effectiveness of their treatment.

Informed Consent Form
Informed consent: an agreement signed by all volunteers participating in a clinical research study, indicating their understanding of:
(1) why the research is being done;
(2) what researchers hope to learn;
(3) what will be done during the trial and for how long;
(4) what risks are involved;
(5) what, if any, benefits can be expected from the trial;
(6) what other interventions are available;
(7) the participant's right to leave the trial at any time.

Two principles of data analysis
Intent-to-treat analysis (ITT): full analysis data set
Patients in a trial are assigned to one treatment group but, for a variety of reasons (withdrawal or failure to comply), may receive the other treatment. If this occurs, subjects should still be analyzed as if they had completed the study in their assigned treatment group. If the composition of the treatment groups is altered, one negates the intention of the randomized trial: to have a random distribution of unmeasured characteristics that may affect the outcome (confounders). Regardless of protocol deviations, subject compliance or withdrawal, the analysis is performed according to the assigned treatment group. ITT admits non-compliance and protocol deviations.
Per-protocol analysis (PP): per-protocol analysis data set, also called the efficacy sample or evaluable sample
Only patients who sufficiently complied with the trial protocol are considered in the analysis. Compliance covers exposure to treatment, availability of measurements and absence of major protocol violations.

Types of Clinical Trials

Treatment trials: testing new drugs or new approaches.
Prevention trials: better ways to prevent diseases (vaccines, medicines, minerals etc.).
Diagnostic trials: finding better tests for diagnosing diseases.
Screening trials: testing the best way to detect certain diseases or health conditions.
Quality of life trials: exploring ways to improve comfort and quality of life for individuals with chronic illness.

Phases of Clinical Trials

Phase I
Researchers test an experimental drug or treatment in a small group of healthy people (20-80) for the first time to evaluate its safety, determine a safe dosage range, and identify side effects.

Phase II
The experimental study drug or treatment is given to a larger group of patients (100-300) to see if it is effective and to further evaluate its safety.

Phase III
The experimental study drug or treatment is given to large groups of patients (1,000-3,000) to confirm its effectiveness, monitor side effects, compare it to commonly used treatments, and collect information that will allow the experimental drug or treatment to be used safely.

Phase IV
Post-marketing studies delineate additional information including the drug's risks, benefits, and optimal use.

A teacher fell asleep in class and a naughty little boy walked up to him.
Little boy: "Teacher, are you sleeping in class?"
Teacher: "No, I am not sleeping in class."
Little boy: "What were you doing, sir?"
Teacher: "I was talking to God."
The next day the naughty boy fell asleep in class and the same teacher walked up to him...
Teacher: "Young man, you are sleeping in my class."
Little boy: "No, not me sir, I am not sleeping."
Angry teacher: "What were you doing??"
Little boy: "I was talking to God."
Angry teacher: "What did He say??"

Data management
Involves screening for missing data:
Is the missingness due to incomplete data collection?
Is the missingness due to non-response?
Is the pattern of missingness random?
Data validity: screening for data validity (wrong entries etc.)
Outliers: screening for outliers
Normality tests: Kolmogorov-Smirnov normality test; Shapiro-Wilk W test

Missing values, outliers and strategies

Since the 1960s, the principle of intent-to-treat (ITT) has become widely accepted for the analysis of controlled clinical trials. In this context, the question of how to perform such an analysis in the presence of missing information about a main endpoint is of major importance. 10-20% additional samples are required to adjust for the drop-out rate. Missing data cause a drastic increase in the Type I error and a substantial decrease in the power of the study.
Types of missing data:
1. Missing completely at random (MCAR): missing values are randomly distributed across all observations. This can be confirmed by dividing respondents into those with and without missing data, then using the t-test or chi-square test to establish that the two groups do not differ significantly.
2. Missing at random (MAR): a condition which exists when missing values are not randomly distributed across all observations but are randomly distributed within one or more subsamples.
Methods of estimating missing values:
1. LOCF (last observation carried forward)
2. Mean value
3. Regression methods
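LOCF, the first method listed, can be sketched in a few lines of plain Python. This is an illustrative toy, not a recommendation: LOCF is known to bias results and should be used with care.

```python
def locf(values, missing=None):
    """Last Observation Carried Forward: replace each missing value with
    the most recent observed value. A value missing before any observation
    stays missing."""
    filled, last = [], missing
    for v in values:
        if v is not missing:
            last = v
        filled.append(last)
    return filled

# Visits for one subject; None marks missed assessments.
print(locf([120, 118, None, None, 115]))   # [120, 118, 118, 118, 115]
```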

Data sheet on missing values

SAMPLING
It is a scientific method of data collection. The main principle behind sampling is that we seek knowledge about the total units (called the population) by observing a few units (called the sample) and extend our inference about the sample to the entire population.

DIFFERENCE BETWEEN SAMPLING AND CENSUS
IN CENSUS METHODOLOGY, EACH AND EVERY ELEMENT OF THE UNIVERSE IS CONTACTED, WHEREAS IN SAMPLING METHODOLOGY A FEW ELEMENTS ARE SELECTED FROM THE UNIVERSE FOR THE RESEARCH.

NEED OF SAMPLING
1. THE POPULATION IN MANY CASES MAY BE SO LARGE AND SCATTERED THAT COMPLETE COVERAGE MAY NOT BE POSSIBLE.
2. IT OFFERS A HIGH DEGREE OF ACCURACY BECAUSE IT DEALS WITH A SMALL NUMBER OF PERSONS.
3. IN A SHORT PERIOD OF TIME, VALID AND COMPARABLE RESULTS CAN BE OBTAINED.
4. SAMPLING IS LESS DEMANDING IN TERMS OF REQUIREMENTS OF INVESTIGATORS.
5. IT IS ECONOMICAL SINCE IT COVERS FEWER PEOPLE.

Role of Surveys in disease control

For the purpose of disease control, improving the health and productivity of terrestrial or aquatic animals and thereby the well-being of people, information is needed to:
1. Identify what diseases exist in the country
2. Determine the level and location of diseases
3. Determine the importance of different diseases
4. Set priorities for the use of resources for disease control activities
5. Plan, implement, and monitor disease control programs
6. Respond to disease outbreaks
7. Meet reporting requirements of international organisations (OIE, WHO etc.)
8. Demonstrate disease status for trading partners

Population, Sample and Sampling Method

Who is the target group for the study? This is called the study population.
Who in the target group should be surveyed? This is called the sample.
How many people should be surveyed? This is called the sample size.
How should the people to be surveyed be selected? This is called the sampling method.

PRINCIPLES OF SAMPLING
1. SAMPLE UNITS MUST BE CHOSEN IN A SYSTEMATIC AND OBJECTIVE MANNER.
2. SAMPLE UNITS MUST BE CLEARLY DEFINED AND EASILY IDENTIFIABLE.
3. SAMPLE UNITS MUST BE INDEPENDENT OF EACH OTHER.
4. THE SAME UNITS OF SAMPLE SHOULD BE USED THROUGHOUT THE STUDY.
5. THE SELECTION PROCESS SHOULD BE BASED ON SOUND CRITERIA AND SHOULD AVOID ERRORS, BIAS AND DISTORTIONS.

Convenience Sampling
The researcher selects units to be included based on ease of obtaining them or simple availability.

Volunteer Sampling
The researcher uses only people who volunteer ("I'll do it!") to participate in the research.

Network/Snowball Sampling
The researcher selects a few participants, who then suggest others who may be willing to participate.

Simple Random Sampling
Each member of the population is listed in some fashion (e.g., numerically), and then a sample is drawn by randomly selecting members of the population (e.g., members 2, 6, 7, 12, 18).
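A minimal sketch of drawing such a sample with Python's standard library (the population of 20 and sample of 5 follow the slide's example):

```python
import random

population = list(range(1, 21))        # 20 numbered members
random.seed(7)                         # fixed seed for a reproducible draw
sample = random.sample(population, 5)  # 5 members, drawn without replacement
print(sorted(sample))
```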

Systematic/Sequential Random Sampling
Desired sample size: 5; population size: 20. A random start in the sequence is selected, and the sample is drawn by selecting cases sequentially in the list at a fixed increment (20/5 = 4) to produce the desired sample size.
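The random-start, fixed-increment procedure can be sketched as follows (function name illustrative):

```python
import random

def systematic_sample(population, n):
    """Random start within the first interval, then every k-th member,
    where k = len(population) // n is the sampling interval."""
    k = len(population) // n           # e.g. 20 // 5 = 4
    start = random.randrange(k)        # random start in the first interval
    return population[start::k][:n]

random.seed(1)
print(systematic_sample(list(range(1, 21)), 5))
```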

Cluster Sampling

Probability Sampling
1. Random Sampling

Simple random sampling is the most basic type of probability sampling: every member of the population has an equal chance of being selected.

2. Stratified Sampling Method


Stratified sampling is a
probability sampling
technique wherein the
researcher divides the
entire population into
different subgroups or
strata, then randomly
selects the final subjects
proportionally from the
different strata.

3. Systematic Sampling
Systematic sampling is a random sampling technique which is
frequently chosen by researchers for its simplicity and its
periodic quality.

Starting number: the researcher selects an integer that must be less than the total number of individuals in the population. This integer will correspond to the first subject.
Interval: the researcher picks another integer which will serve as the constant difference between any two consecutive selected numbers in the list.

4. Cluster Sampling
In cluster sampling, instead of selecting all the subjects from
the entire population right off, the researcher takes several
steps in gathering his sample population.

5. Area Probability Sample


An area probability sample is one in which geographic areas are sampled with known probability. While an area probability sample design could conceivably provide for selecting areas that are themselves the units being studied, in survey research an area probability sample is usually one in which areas are selected as part of a clustered or multi-stage design. In such designs, households, individuals, businesses, or other organizations are studied, and they are sampled within the geographical areas selected for the sample.
Area sampling is basically multistage sampling in which maps, rather than lists or registers, serve as the sampling frame. This is the main method of sampling in developing countries, where adequate population lists are rare. The area to be covered is divided into a number of smaller sub-areas from which a sample is selected at random; within these areas, either a complete enumeration is taken or a further sub-sample is drawn.

6. Multi-stage Sampling
A multi-stage sample is one in which sampling is done
sequentially across two or more hierarchical levels, such as
first at the county level, second at the census track level, third
at the block level, fourth at the household level, and ultimately
at the within-household level.
Single-stage samples include simple random sampling, systematic
random sampling, and stratified random sampling. In single-stage
samples, the elements in the target population are assembled into a
sampling frame; one of these techniques is used to directly select a
sample of elements.
In contrast, in multi-stage sampling, the sample is selected in stages, often taking into account the hierarchical (nested) structure of the population. The target population of elements is divided into first-stage units, often referred to as primary sampling units (PSUs), which are the ones sampled first. The selected first-stage units are then subdivided into secondary units for the next stage of sampling.
IN THIS METHOD, SAMPLING IS SELECTED IN VARIOUS STAGES.

7. MULTI-PHASE SAMPLING
IN THIS TYPE OF SAMPLING THE PROCESS IS SAME AS IN
MULTI-STAGE SAMPLING . IN THIS EACH SAMPLE IS
ADEQUATELY STUDIED BEFORE ANOTHER SAMPLE IS DRAWN
FROM IT.
IN MULTISTAGE SAMPLING ONLY THE FINAL SAMPLE IS STUDIED
WHEREAS
IN MULTI-PHASE SAMPLING ALL SAMPLES ARE RESEARCHED.

Summing up: Hypothesis Tests
Statisticians follow a formal process to determine whether to reject a null hypothesis, based on sample data. This process, called hypothesis testing, consists of four steps.
1. State the hypotheses. This involves stating the null and alternative hypotheses. The hypotheses are stated in such a way that they are mutually exclusive: if one is true, the other must be false.
2. Formulate an analysis plan. The analysis plan describes how to use sample data to evaluate the null hypothesis. The evaluation often focuses on a single test statistic.
3. Analyze sample data. Find the value of the test statistic (mean score, proportion, t-score, z-score, etc.) described in the analysis plan.
4. Interpret the results. Apply the decision rule described in the analysis plan; if the value of the test statistic is unlikely under the null hypothesis, reject it.

Non-Probability Sampling
1. Convenience Sampling
Convenience sampling is a non-probability sampling technique where subjects are selected because of their convenient accessibility and proximity to the researcher.

2. Sequential Sampling Method


Sequential sampling is a non-probability sampling
technique wherein the researcher picks a single or a
group of subjects in a given time interval, conducts his
study, analyzes the results then picks another group of
subjects if needed and so on.

3. Quota Sampling
Quota sampling is a non-probability sampling technique wherein
the assembled sample has the same proportions of individuals
as the entire population with respect to known characteristics,
traits or focused phenomenon.

4. Judgmental Sampling
Judgmental sampling is a non-probability sampling technique where the
researcher selects units to be sampled based on their knowledge and
professional judgment.

5. Snowball sampling
Snowball sampling is a non-probability sampling technique that is used
by researchers to identify potential subjects in studies where subjects
are hard to locate.

Types of Snowball Sampling

Decision Errors
Two types of errors can result from a hypothesis test.
Type I error: a Type I error occurs when the researcher rejects a null hypothesis when it is true. The probability of committing a Type I error is called the significance level; this probability is also called alpha and is often denoted by α.
Type II error: a Type II error occurs when the researcher fails to reject a null hypothesis that is false. The probability of committing a Type II error is called beta and is often denoted by β. The probability of not committing a Type II error is called the power of the test.
Decision Rules
The analysis plan includes decision rules for rejecting the null hypothesis.
P-value: the strength of evidence in support of a null hypothesis is measured by the P-value. Suppose the test statistic is equal to S. The P-value is the probability of observing a test statistic as extreme as S, assuming the null hypothesis is true. If the P-value is less than the significance level, we reject the null hypothesis.
Region of acceptance: the region of acceptance is a range of values. If the test statistic falls within the region of acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that the chance of making a Type I error is equal to the significance level.
These decision rules apply whether the test is one-tailed or two-tailed.
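The P-value decision rule can be illustrated with a one-sample z statistic; the statistic value of 2.1 is an illustrative number, not from the slides.

```python
import math

def z_test_p_value(z, two_tailed=True):
    """P(|Z| >= |z|) under H0, via the standard normal survival function
    computed with math.erfc."""
    p_one = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return 2 * p_one if two_tailed else p_one

alpha = 0.05
z = 2.1                      # illustrative test statistic
p = z_test_p_value(z)        # two-tailed P-value, about 0.036
print(p, "reject H0" if p < alpha else "fail to reject H0")
```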

Car driver: You cheated me. You sold me a useless radio.
Shopkeeper: No, I sold you a good radio.
Car driver: The radio label shows "Made in Japan", but the radio says "This is All India Radio."

Preparation of master chart or data sheet

CONSTRUCTION OF SCALES

Construction of
questionnaire or scale
What a person knows (knowledge of
information)
What a person likes and dislikes (values and
preferences)
What a person thinks (opinions, attitudes,
beliefs, perceptions)
What experiences have taken place
(biography)
What is occurring at present (facts)

Likert scaling (1932)

Likert scales were developed in 1932 as the familiar five-point bipolar response format most people know today. These scales ask people to indicate how much they agree or disagree, approve or disapprove, or believe something to be true or false. Common Likert response formats include agreement, frequency, importance, and quantity.

Writing questions: key points

Anonymity and confidentiality: an anonymous study is one in which nobody (not even the study directors) can identify who provided data on completed questionnaires.
The length of a questionnaire: as a general rule, long questionnaires get less response than short questionnaires (Brown, 1965; Leslie, 1970).
Color of the paper: Berdie, Anderson and Niebuhr (1986) suggest that color might make the survey more appealing.
Incentives: many researchers have examined the effect of providing a variety of non-monetary incentives to subjects. These include token gifts such as small packages of coffee, ball-point pens, postage stamps, or key rings.
The "don't know", "undecided", and "neutral" response options: response categories are developed for questions in order to facilitate the process of coding and analysis. Respondents are more likely to choose the "undecided" category when it is off to the side of the scale.
Question wording

Steps to construct a scale


Phase I: Item generation
Face Validity
Content Validity
Phase II: Scale Development (pilot
study)
Construct validity
Criterion related validity
Reliability
Internal Consistency
Phase III: Scale Evaluation (Large scale
data collection)
Reliability
Internal Consistency
Discriminant validity

Table 1: Statements selected from various sources for face validity

Previous literature
  Thesis: 40 (8.0%)
  Peer-reviewed papers: 40 (8.0%)
  Bulletins/Manuals/Annual reports: 30 (6.0%)
Experts consulted
  Professors: 75 (15.0%)
  Student entrepreneurs: 25 (5.0%)
  Entrepreneurs: 50 (10.0%)
Internet sources
  Entrepreneurship websites: 100 (20.0%)
  Online library: 15 (3.0%)
  Online journal: 20 (4.0%)
  Discussion forum: 5 (1.0%)
Others: 100 (20.0%)
Total: 500 (100.0%)

Table 2: Content validity by five experts for developing the entrepreneurial skills questionnaire for graduate students

Number of statements screened at face validity phase: 250 (100.0%)
Number of statements evaluated by experts: 250 (100.0%)
Number of statements satisfying Aiken's Index > 0.70: 110 (44.0%)
Number of statements not satisfying Aiken's Index: 140 (56.0%)
Number of statements considered for pilot study: 110 (44.0%)

Table 4: Criterion-related validity

Entrepreneurial skills: present study scale; standard scale; correlation; significance
General: 111.74 ± 42.85; 33.20 ± 5.99; 0.528; <0.001**
Managerial: 108.68 ± 41.25; 33.20 ± 5.99; 0.591; <0.001**
Manufacturing: 93.68 ± 31.56; 33.20 ± 5.99; 0.522; <0.001**
Marketing: 85.99 ± 33.02; 33.20 ± 5.99; 0.599; <0.001**
Total: 399.72 ± 132.38; 33.20 ± 5.99; 0.604; <0.001**

Table 3: Content and construct validity by item-total correlation
(Values per statement: Aiken's Index; item-total correlation; loadings on Factors 1-4)

1. I am a person who is ready to take responsibility: 0.850; 0.847; 0.848, 0.291, 0.265, 0.243
2. I want to be economically independent: 0.850; 0.858; 0.821, 0.298, 0.291, 0.263
3. I feel responsible for my mistakes and take corrections: 0.850; 0.856; 0.833, 0.312, 0.263, 0.256
4. I persevere till I can achieve my dream: 0.750; 0.850; 0.832, 0.278, 0.293, 0.251
5. I have a supportive network of friends, family and advisers: 0.750; 0.863; 0.827, 0.317, 0.261, 0.277
6. I'm flexible and able to take advice: 0.750; 0.858; 0.828, 0.297, 0.283, 0.264
7. I am a person who makes decisions within a reasonable time frame: 0.750; 0.848; 0.824, 0.290, 0.283, 0.255
8. I set goals & articulate a vision: 0.750; 0.855; 0.802, 0.312, 0.276, 0.280
9. It is important to me to make a mark in this life: 0.850; 0.857; 0.823, 0.312, 0.261, 0.273
10. I have self-confidence and self-esteem: 0.850; 0.850; 0.844, 0.291, 0.253, 0.266
11. I have a strong need to work independently: 0.900; 0.852; 0.831, 0.274, 0.266, 0.291
12. I don't start something without a clear vision and plan of action: 0.900; 0.846; 0.823, 0.305, 0.262, 0.256
13. Once I start a project I pursue it in spite of challenges: 0.850; 0.839; 0.843, 0.277, 0.263, 0.248

Table 6: Exploratory factor analysis: extraction and rotation sums of squared loadings

            Initial Eigenvalues          Extraction Sums of           Rotation Sums of Squared
                                         Squared Loadings             Loadings (Varimax)
Component   Total   % Var   Cum. %       Total   % Var   Cum. %       Total   % Var   Cum. %
1           74.83   68.03   68.03        74.83   68.03   68.03        27.49   24.99   24.99
2            7.95    7.23   75.26         7.95    7.23   75.26        23.83   21.66   46.65
3            6.58    5.98   81.24         6.58    5.98   81.24        22.15   20.13   66.78
4            5.27    4.79   86.03         5.27    4.79   86.03        21.18   19.25   86.03
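The "% of variance" columns in Table 6 follow directly from the eigenvalues of the item correlation matrix: each component's share is its eigenvalue divided by the number of items (110 here). A quick check with NumPy, using the eigenvalues from the table, reproduces the tabulated percentages:

```python
import numpy as np

# Eigenvalues of the first four components from the 110-item analysis (Table 6)
eigenvalues = np.array([74.83, 7.95, 6.58, 5.27])
n_items = 110

pct_variance = 100 * eigenvalues / n_items   # % of variance per component
cumulative = np.cumsum(pct_variance)         # cumulative % of variance

for k, (e, p, c) in enumerate(zip(eigenvalues, pct_variance, cumulative), 1):
    print(f"Component {k}: eigenvalue={e:.2f}  %var={p:.2f}  cum%={c:.2f}")
```

The four retained components together account for about 86% of the total variance, matching the table.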

Table 7: Test-retest reliability (stability) and Cronbach alpha (consistency)
coefficients based on the pilot study

Entrepreneurial  Number    Max    Cronbach  Test-retest  Reliability  P value    Remark
skills           of items  score  alpha     correlation  index
General          30        150    0.996     0.7381       0.8493       <0.001**   Highly reliable
Managerial       30        150    0.994     0.7610       0.8643       <0.001**   Highly reliable
Manufacturing    25        125    0.993     0.6652       0.7990       <0.001**   Very reliable
Marketing        25        125    0.990     0.7149       0.8337       <0.001**   Highly reliable
Total            110       550    0.996     0.7707       0.8705       <0.001**   Highly reliable

VALIDITY AND RELIABILITY

Validity refers to the accuracy or truthfulness of a measurement: are we measuring what we
think we are? Validity itself is a simple concept, but determining the validity of a measure is
elusive.
- Face validity is based solely on the judgment of the researcher.
- Content validity: expert opinions, literature searches, and pretests with open-ended
  questions help to establish content validity.
- Criterion-related validity can be either predictive or concurrent. Predictive validity refers
  to the ability of an independent variable (or group of variables) to predict a future value of
  the dependent variable; concurrent validity concerns the relationship between two or more
  variables at the same point in time.
- Construct validity looks at the underlying theories or constructs that explain a phenomenon.

Reliability is synonymous with repeatability: a measurement that yields consistent results
over time is said to be reliable.
- Test-retest reliability: administer the same instrument to the same group of people at two
  different points in time.
- Equivalent-form technique: the researcher creates two different instruments designed to
  measure identical constructs.
- Split-half reliability: a measure of internal consistency.
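Internal consistency is most often summarized with Cronbach's alpha, alpha = k/(k−1) · (1 − Σ item variances / total-score variance). A minimal sketch using NumPy; the respondent data here are hypothetical, chosen only to illustrate the formula:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_subjects x n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the total score
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 5 respondents x 4 Likert items:
scores = [[4, 5, 4, 4],
          [3, 3, 2, 3],
          [5, 5, 5, 4],
          [2, 2, 3, 2],
          [4, 4, 4, 5]]
print(round(cronbach_alpha(scores), 3))   # 0.936
```

Values above about 0.70 are conventionally taken as acceptable internal consistency, which is why the alphas in Table 7 (all > 0.99) indicate a highly consistent scale.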

STATISTICAL METHODS

Outcome   Comparison                 Parametric                Non-parametric
Mean/SD   One group                  One-sample Z test         Run test
No. (%)   One group                  —                         Exact tests: Chi-square test,
                                                               Fisher exact test
Mean/SD   Two groups                 Student t test            Mann-Whitney U test
Mean/SD   Two groups (pre-post)      Student t test (paired)   Wilcoxon signed-rank test
No. (%)   Two groups                 —                         Chi-square test,
                                                               Fisher exact test
Mean/SD   Three or more groups       ANOVA                     Kruskal-Wallis test
No. (%)   Three or more groups       —                         Chi-square test,
                                                               Fisher exact test
—         Relationship               Pearson correlation,      Spearman correlation
                                     regression

Some important scales

The Beck Depression Inventory. The 1970 version of the Beck Depression Inventory is a
21-item self-report inventory. Each item consists of four alternative statements that represent
gradations of a given symptom rated in severity from 0 to 3. The scale is scored by summing
the item ratings; total scores can range from 0 to 63. The instrument was either
self-administered by the patients or read aloud by one of the research assistants.

The Hopelessness Scale. The Hopelessness Scale consists of 20 true-false statements that
assess the extent of pessimism. Each of the 20 items is scored 1 or 0; the total score is the
sum of the individual item scores. The possible range of scores is from 0 to 20. The method
of administration was similar to the procedure used for the Beck scale.

The Scale for Suicide Ideation. This scale quantifies the severity of current suicidal ideas and
wishes. It was developed on the basis of systematic clinical observations and interviews with
suicidal patients. The scale includes 19 items; each is composed of three choices that range
from 0 (least severe) to 2 (most severe) of the given construct. The total score is computed by
summing the item ratings; scores can range from 0 to 38. The items quantify the frequency
and duration of suicidal thoughts as well as the patients' attitudes toward them.

Japanese Prime Minister:


Give me Bihar for 3 years, we
will turn it into Japan.
Laloo: Give me Japan for 3
months, I will turn it into Bihar.

Fundamental technique of life table or survival analysis

- Analyses "time to event": time to death, relapse, recovery, pregnancy, receiving an organ
  transplant, or treatment failure.
- Uses each subject's time of entry into the study.
- Answers the question: what is the chance of survival after being diagnosed with a disease
  or after beginning treatment?

Example
A group of 200 subjects is followed for three years; deaths (events) occur throughout the
three years. What is the chance of surviving to the end of the three years?

Interval   lt     dt    qt     pt     Pt
1          200    20    0.10   0.90   1.00
2          180    30    0.17   0.83   0.90
3          150    40    0.27   0.73   0.747

Cumulative survival at the end of year 3 = 0.73 x 0.747 = 0.545

lt: number alive at the beginning of interval t
dt: number of deaths during the interval
qt = dt/lt: probability of dying during the interval
pt = 1 - qt: probability of surviving the interval
Pt: cumulative probability of survival at the beginning of the interval, i.e. at the end of the
previous interval
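The life-table calculation above reduces to multiplying the conditional survival probabilities of the successive intervals. A minimal sketch (the function name is ours, not from the slides):

```python
def life_table(alive, deaths):
    """Actuarial life-table survival from counts per interval.

    alive:  number at risk at the start of each interval (lt)
    deaths: deaths during each interval (dt)
    Returns the cumulative probability of surviving all intervals.
    """
    surv = 1.0
    for l, d in zip(alive, deaths):
        q = d / l            # qt: probability of dying in the interval
        surv *= (1.0 - q)    # multiply by pt = 1 - qt
    return surv

# Slide example: 200 subjects, with 20, 30, and 40 deaths over three years
print(round(life_table([200, 180, 150], [20, 30, 40]), 3))   # 0.55
```

Exact arithmetic gives 0.55 (110 of 200 survive); the slide's 0.545 comes from multiplying the rounded interval probabilities 0.9 × 0.83 × 0.73.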

Meta-analysis

There are about 40,000 science journals, and researchers are filling them at the rate of one
article every 30 seconds. As results accumulate, it becomes increasingly difficult to
understand what they tell us and to find the knowledge in this flood of information.
Meta-analysis is a rapidly expanding area of research that has been relatively underutilized in
animal nutrition and physiology.
Meta-analysis is a form of evidence-based research: an "analysis of analyses", the statistical
analysis of a large collection of results from individual studies for the purpose of integrating
the findings.

Meta-analysis aims to quantitatively combine the results of different studies. The pooled
estimate should be a weighted average of all studies included in the meta-analysis, which
increases statistical power.
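The weighted average referred to above is conventionally computed with inverse-variance weights (each study weighted by 1/SE²), the standard fixed-effect pooling. A sketch with hypothetical study data:

```python
import numpy as np

def pooled_estimate(effects, ses):
    """Fixed-effect inverse-variance pooled estimate.

    effects: per-study effect sizes (e.g., mean differences)
    ses:     their standard errors
    Each study is weighted by 1/SE^2, so more precise studies count more.
    Returns (pooled estimate, standard error of the pooled estimate).
    """
    effects = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(ses, dtype=float) ** 2
    est = np.sum(w * effects) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return est, se

# Three hypothetical studies with their standard errors:
est, se = pooled_estimate([0.40, 0.25, 0.55], [0.10, 0.20, 0.15])
print(f"pooled = {est:.3f}, 95% CI ± {1.96 * se:.3f}")
```

Because the pooled standard error shrinks as studies are added, the combined analysis has more statistical power than any single study.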

Resources

Schoenfeld, Richter. Nomograms for calculating the number of patients needed for a clinical
trial with survival as an endpoint. Biometrics 38(1):163-170, 1982.
Bland JM, Altman DG. One and two sided tests of significance. British Medical Journal 309:
248, 1994.
Pepe, Longton, Anderson, Schummer. Selecting differentially expressed genes from
microarray experiments. Biometrics 59(1):133-142, 2003.
http://faculty.vassar.edu/lowry/VassarStats.html
Statistics guide for research grant applicants, St. George's Hospital Medical School
(http://www.sghms.ac.uk/depts/phs/guide/size.htm)
http://www.physics.csbsju.edu/stats/contingency_NROW_NCOLUMN_form.html
http://www.physics.csbsju.edu/stats/exact_NROW_NCOLUMN_form.html
http://www.randomization.com/
http://www.graphpad.com/quickcalcs/index.cfm

Clinical Study Protocol


Administrative Structure
Sponsor Signature page
Investigator Signature Page
Facilities
1.0 List of abbreviations and definitions of terms
2.0 Introduction
2.1 Background
2.1.1 Non clinical Summary
2.1.2 Clinical Summary
2.2 Benefits and Risks
2.2.1 Benefits and Risks of the study
2.2.2 Benefits and Risks of the study Drug
2.3 Rationale for the study

3.0 Study Objectives


3.1 Primary objectives
3.2 Secondary objectives
3.3 Exploratory Objectives
4.0 Investigational Plan
4.1 Study design
4.2 Blinding
4.3 Randomization/ treatment Groups
4.4 Study endpoints
4.4.1 Pharmacokinetic endpoints
4.4.2 Pharmacodynamic endpoints
4.4.3 Efficacy endpoints
4.4.4 Safety Endpoints
4.4.5 Immunogenicity Endpoints
4.5 Duration of the Study
4.5.1 Part 1 treatment and Evaluation
4.5.2 Part 2 Recovery and follow-up

4.6 Selection of Study population


4.6.1 Inclusion Criteria
4.6.2 Exclusion Criteria
4.6.3 Protocol Required Restrictions
4.6.4 Patient Withdrawal
4.6.5 Screen failure or Rescreening
4.7 Study Assessments
4.7.1 Clinical laboratory Evaluations
4.7.2 Vital Signs
4.7.3 Physical Examination
4.7.4 Electrocardiogram
4.7.5 Chest X ray and Quantiferon -TB Gold Test
4.8 Study Periods
4.8.1 Part 1 Treatment and evaluation
4.8.1.1 Screening
4.8.1.2 Randomization Visit 1(week 0), Day 1
4.8.1.3 Evaluation: Visit 2, Visit 3, Visit 4, ...
4.8.3 End- of-study

5.0 Study treatments


5.1 Treatment administered
5.2 Investigational products
5.3 Packaging and Labelling
5.4 Storage
5.5 Preparation
5.6 Study Medication Accountability
5.7 Prior and concomitant Treatments
5.8 Rescue Therapy
5.8.1 Prohibited Concomitant Medications
5.8.2 Allowed Medications
5.9 Treatment Compliance
6.0 Pharmacokinetic, Pharmacodynamic, efficacy outcome and
immunogenicity assessment
6.1 Pharmacokinetics and Pharmacodynamics
6.1.1 Pharmacokinetics
6.1.2 Pharmacodynamics

6.2 Efficacy Outcomes
6.2.1 Health assessment questionnaire Disability index
6.3 Immunogenicity
6.4 Appropriateness of Measurements
7.0 Safety
7.1 Drug associated adverse events- warnings, precautions and treatment
recommendations
7.2 Adverse Events
7.2.1 Definitions and general guidelines
7.2.2 Clinical laboratory abnormalities and other abnormal assessments
7.2.3 Pregnancy
7.2.4 Safety monitoring

8.0 Statistics
8.1 Determination of sample size
8.2 Statistical Methods
8.2.1 Interim Analysis
8.2.2 Efficacy Analysis
8.2.3 Safety Analysis
8.2.4 Analysis Populations
8.3 Safety
8.3.1 Adverse Events
8.3.2 Clinical Laboratory Evaluations
8.3.3 Vital signs Measurements, physical findings and other safety Evaluations
8.3.4 Immunogenicity
8.4 Pharmacokinetic Analysis
8.5 Pharmacodynamic analysis
9.0 Ethics
9.1 Institutional Review Board or Independent Ethics Committee

10.0 Data Integrity and Quality Assurance


10.1 Monitoring
10.2 Data Management/ Coding
10.3 Quality Assurance Audit
10.4 Ethical Conduct of the study
10.5 Patient Information and informed consent
10.6 Patient Data Protection
11.0 Study Administration
11.1 Administrative Structure
11.2 Study and study Centre closure
11.3 Study discontinuation
11.4 Data handling and Record Keeping
11.5 Direct Access to source Data/Documentation
11.6 Investigator Information

11.6.1 Investigator Obligations


11.6.2 Protocol Signatures
11.6.3 Publication Policy
11.7 Financing and Insurance
11.8 Interpretation of the Protocol/Protocol Amendment(s)
11.9 Protocol Deviations/Violations
12.0 Appendices
12.1 Appendix 1:Schedule of Assessments
( and so on.... list of appendices)
13.0 References
List Of Tables

Contact details: 9341321900/9341364359/9663590140


sureshkp97@gmail.com

The winners in life think constantly in terms of I can, I will, and I am. Losers, on
the other hand, concentrate their waking thoughts on what they should have or
would have done, or what they can't do.

Thank you
