The Delphi approach

Considerations in using the Delphi approach: design, questions and answers
There are many pitfalls in the Delphi approach which can lead researchers to produce invalid results. The type of question asked and the scale of responses offered are two such pitfalls. Laurence Moseley and Donna Mead give some ideas from practical experience which may help to avoid some of these traps.

Introduction
A major characteristic of a Delphi study is the asking of questions. The questioning is iterative and is intended to find a consensus, but it is still the asking of questions. Devising questions is a more complex task than it appears at first sight. There are many pitfalls for the unwary, and there is extensive research addressing these methodological pitfalls: from the very early exploratory days (Thorndike 1920) to more modern large-scale question testing (Schuman and Presser 1996). Much of this research is widely cited: sometimes in textbooks on research methods or questionnaire design (Festinger and Katz 1965, Oppenheim 1966), sometimes in experimental studies or meta-analyses (Kasten and Weintraub 1999, Hoyt and Kerns 1999, Engelhard and Stone 1998, Wilson et al 1993). As asking questions is a major part of any Delphi study, such work is relevant to Delphi research, and in this paper we concentrate on the design of questions and potential answers.
Given what we have known for many years about the vagaries of
language use (Belson 1981 and 1986), question ordering (Schuman and
Presser 1996), question answering (Thorndike 1920, Kahneman 1982,
Slovic and Tversky 1982 inter alia), it is important that detailed consideration is given to such issues, and that this consideration is reported. Question design is
open to many researcher biases and to errors in respondent judgements.


However, the real concern is not so much that the researchers may
consciously or unconsciously mislead the reader, rather it is that they
may, probably unconsciously, mislead themselves, and the reader has
no way of checking whether that has happened or not.
This concern means that a single piece of work may not be replicable
and that, because of this, the body of knowledge about a topic is not
cumulative. The picture we obtain does not become clearer and clearer
with each study undertaken. Instead, we have a series of studies in
which any differences which emerge, rather than being substantive
findings, are open to a wide range of methodological criticisms and
discussion. It is the intention of those who are concerned with both the
broad sweep and the minutiae of methods of data gathering from
human respondents to remove, or at least reduce, the scope for such
varying interpretations.
Instead of relying on data which are gathered, analysed, and
summarised in some unspecified way inside the heads of the
researchers, we wish to make the process open and transparent, so that
readers and other researchers can attempt either to repeat the process, or
to modify it consciously to introduce an element of replicability and the
possibility of cumulation of comparable evidence. This applies as much
to Delphi studies as to any other research.

The statement-generating question (round 1)


When one comes to the actual question to be printed on the round 1
section of the part of the research strictly concerned with Delphi, there
are still other points to consider. Any study that involves asking people
a question and then analysing some amalgamation of their answers is
heavily dependent on the exact wording of the question posed. If the
question is leading, ambiguous, or has other defects, then the
interpretation of the answers is difficult, or at least unsafe. A question such as: 'Which services are important for families who have a member suffering from _______?' looks fairly straightforward.
However, even such a question has limitations. Apart from the obvious
problems of definition (does one include extended families, step-families, different generations of families?) there is a problem of what
one would regard as a satisfactory answer. What would you do if one respondent apparently thought very deeply about it and produced only one suggestion? What would you do if another respondent came up with 27 suggestions? Are these two people answering what is psychologically the same question? Could one ask instead: 'Which five services are important…?'
By limiting the number of suggestions to five or some other small number, it is possible to lose potentially useful information, or at least data. However, one also gains in two ways. Firstly, one starts to focus the respondents' attention on the importance of the different suggested items. Secondly, the problem is limited to a size which the human mind can handle (7±2 items at one time) (Miller 1956, Baddeley 1994). This problem is discussed further in the section on Ranking and comparing later in this article.
Of course, in order to obtain one set of statements, one may ask several sub-questions with a view to amalgamating the answers (Butterworth and Bishop 1995). For example, one might ask: 'What should one…?', 'Have you ever…?', 'What would you recommend…?'. Even though the words are different, the material being sought is the same in each case. In a sense, the wording differences do not matter, because the statements generated will in any case be subjected to one or more rating rounds.

Is the statement-rating question unidimensional? (round 2)


In a standard Delphi study, one asks for ratings in round 2, processes the responses, perhaps sets to one side items on which there is already consensus (to be reported at the end of the study), removes items on which there is consensus that they are low priority, and for the remainder of the items feeds back to each respondent their own original responses together with an indication of what other panel members thought. The intention of this iteration is to try to generate consensus, but to do so on the basis of opinions formed in social isolation, rather than on the basis of personality, bullying, or other power relationships.

Iteration or repetition
Repetition of an identical question in consecutive rounds to attempt to generate consensus is not always the correct thing to do. In a study of childhood cancers, for example, there were clearly two dimensions to the problem. One of these posed the question: 'How many families would benefit if a particular policy was implemented?' This is a clear question, and it is worth asking in its own right. If there is consensus that all or most families would benefit from a certain policy, there is probably merit in pursuing that policy. We called this the breadth question. However, there is a second dimension which needs to be teased out. That is the answer to the question: 'How much would each family benefit if policy X was introduced?' This we called the depth question. Cross-tabulating these two dimensions clarifies the policy decision which the study is supposed to inform. We present below possible sets of results in the form of a 2 x 2 table.
Table 1: Envisaging multiple dimensions in a policy-oriented Delphi study

                                    Degree of benefit
                                    High          Low
Number of beneficiaries    High     A             B
                           Low      C             D

Clearly, if policies fall within cell A (high benefit to large numbers of


beneficiaries), we are likely to recommend those policies. Similarly,
those which fall within cell D are unlikely to be given priority. However,
for the other two cells it is not immediately apparent in which order we
would recommend them. To select one above the other would be to take
a policy decision. If we choose cell C, we are automatically saying that
we prefer great benefit, even if to only a few people. Similarly, if we
choose cell B we are asserting that we wish to benefit as many people as
possible, even though the benefit may be relatively low in some cases, or
even if we could have obtained greater benefit for a smaller number of
people. Setting out the options in the form of a simple table often
clarifies the decisions which we have to make as researchers.
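This cross-tabulation can be made mechanical once the panel's judgements have been summarised into high/low categories on each dimension. The sketch below is illustrative only: the example policies and the summary judgements are invented, and in practice the high/low categories would come from the panel's consensus ratings.

```python
# A sketch of the breadth-by-depth cross-tabulation of Table 1.
# The policies and the high/low judgements are invented.

def classify_policy(beneficiaries: str, benefit: str) -> str:
    """Assign a policy to one of the four cells of Table 1."""
    cells = {
        ("high", "high"): "A",  # high benefit to large numbers: recommend
        ("high", "low"): "B",   # modest benefit, but to many people
        ("low", "high"): "C",   # great benefit, but to few people
        ("low", "low"): "D",    # little benefit to few people: deprioritise
    }
    return cells[(beneficiaries.lower(), benefit.lower())]

# Hypothetical summary judgements for two policies.
judgements = {
    "home-visit support": ("high", "high"),
    "sibling counselling": ("low", "high"),
}
for policy, (breadth, depth) in judgements.items():
    print(policy, "-> cell", classify_policy(breadth, depth))
```

Cells B and C are exactly the ones the code cannot rank for us: choosing between them remains a policy decision, not a computation.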
Other researchers have faced the same problem. In a study of likely
changes in health policy in Europe, the researchers wished to know two things: were potential changes desirable, and were those potential changes likely (Butterworth et al 1991)? How does one tackle such multi-dimensionality in the central research question of the Delphi? In the Butterworth study, the respondents were asked both questions in one round, with two rating scales offered, one above the other: one for desirability, one for probability of occurrence. A similar approach was adopted by Tan (1997). It is not necessarily the best way of tackling multi-dimensionality. Firstly, it increases the psychological burden on the respondents, in requiring them to do two things at once and in causing a confusing interaction between the two ratings. Secondly, it increases the length of the questionnaire. This is likely to lead to attrition, and to pose difficult interaction problems for the respondents.
One consideration which may cast light on the problem of multi-dimensionality is that it is not essential to ask the same question in round 3 as in round 2. In the paediatric cancer study, in round 2 we asked: 'Approximately how many families would benefit from this policy?', while for round 3 we asked: 'How much benefit would the families obtain from this policy?'. It follows from using a different question that the range of possible responses will need to be modified to match the content of the new question. Thus, in round 3 we had responses running from 'A great deal' to 'None at all', rather than the round 2 options, 'All families' to 'No families'. A final decision is
which of the statements should be submitted for judgement in round 3.
Once the analysis of round 2 is complete, we have far more
information than before that round. We know, for example, which of
the policies are agreed to be of use to only a minority of families (or
even to none of them). Do we really need to know how much benefit
each family will get from a policy which would help only a small
number of them? We suggest not, although because we have
articulated that choice, readers can choose to disagree with us, and will
know why they disagree. Similarly, do we need to estimate how much
benefit would come to each family when the panel cannot agree on
how many families would benefit? We suggest not.

In our study, we decided that the statements to be submitted to round 3 would be only those which had met our criteria for importance and consensus in round 2. In other words, for round 3 (the depth rating round), the panel were offered only those statements which had already been judged to be of use to substantial numbers of families. Thus, we consciously chose: breadth first, depth second. The reason for this ordering was that the study was policy oriented, and there was little point in assessing the utility of policies which had not been judged to be widely applicable. It also made the psychological task facing our panel much easier, since at any one point in time they were being asked to make only one type of judgement, and the problems of interaction (the answer to one question unduly influencing the answer to another) were removed.
Thus, we would argue that when one approaches a Delphi study which appears to exhibit multi-dimensionality, a case can be made for splitting the task into more than one dimension, if that is inherent in our purpose, and then doing the following:
- approaching each of the dimensions in different rounds
- asking a different question in each round
- offering a different range of responses in each round
- selecting which statements to retain and submit to each round (a sketch of one such selection rule follows).
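A minimal sketch of such a selection rule follows. We did not specify our numerical criteria above, so the thresholds here (a median of at least 5 on a seven-point scale for importance, an interquartile range of at most 1 for consensus) are purely illustrative assumptions, though of a kind commonly used in Delphi work.

```python
# A sketch of selecting which round-2 statements to carry into round 3.
# The median/IQR thresholds are illustrative assumptions only.
from statistics import median, quantiles

def carry_forward(ratings, min_median=5.0, max_iqr=1.0):
    """Keep a statement judged both important and consensual."""
    q1, _, q3 = quantiles(ratings, n=4)
    return median(ratings) >= min_median and (q3 - q1) <= max_iqr

round2 = {
    "statement 1": [6, 6, 7, 6, 5, 6, 6],  # important and consensual: keep
    "statement 2": [1, 7, 4, 2, 6, 3, 5],  # no consensus: drop
}
print([s for s, r in round2.items() if carry_forward(r)])
```

Whatever rule is adopted, the point made throughout this paper applies: it should be stated explicitly, so that readers can see, and if they wish dispute, why each statement survived.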

The range of responses to be offered in the rating rounds


We have to think of three major considerations when deciding which responses to offer to the panel. The first is whether the responses actually represent answers to the question. It is all too easy to fall into the trap of offering responses such as 'Strongly agree', 'Very important', or the like when the question explicitly or implicitly invited an answer which was a numerical estimate, such as 'A great deal', 'Hardly any', 'All the time'. In the study of services for childhood cancer, the question in round 2 started 'How many…', and the range of answers had to be consonant with that, e.g. 'All', 'Most', 'None'.
The second consideration is what sort of response scale to offer. The
conventional rating format is that of a Likert-type scale, usually with all
the points labelled. That is very widely used and is frequently adequate.

Likert-type scales
In most Delphi studies researchers have used Likert-type scales. Even with such an apparently simple device, one has to take care. There are many common problems in using them. They include: faking good (social desirability) or faking bad (deviation), the hello-goodbye effect, yea-saying, end aversion, positive skew, halo effects, and the framing effect (an interaction between the wording of the question and that of the answers offered). Such effects are covered in texts on scaling (e.g. Streiner and Norman 1991).
Perhaps the most important single question is: 'How many points should there be on the scale?' This usually varies between three and 11. We would argue that 11 is too many, largely because it exceeds human capacity for holding items in short-term memory (Miller 1956, Baddeley 1994). Equally, however, we believe that three or five points are too few. One reason is that such small numbers give little scope for fine judgements. More important, however, is the fact that the use of more points (we normally use seven) permits one at the analysis stage to combine answers into broader categories, while having too few points means that information is lost and one cannot disaggregate judgements post hoc. On a seven-point scale one can aggregate by recoding in at least the following ways:
Table 2: Some potential aggregations of ratings on a seven-point scale

1 2 3 | 4 | 5 6 7        Negative-middling-positive
1 2 | 3 4 5 | 6 7        More discriminating negative-middling-positive
1 | 2 3 4 5 6 | 7        Stresses strong or extreme judgements
1 2 3 4 5 | 6 7          Picks out strong positive opinions
1 2 3 4 | 5 6 | 7        Unenthusiastic-positive-enthusiastic

and probably many more. At the analysis stage, one can then tease out subtle differences in judgement between statements and between individual judges. An often unacknowledged weakness of short scales is that any measurement of changes in judgement (e.g. before and after some event, or between rounds of a Delphi study) leaves very little space
for respondents to move into. Each step between scale points is large, and
more psychological effort is needed to make it. Thus, even the apparently
simple task of choosing the number of points on the scale needs thought.
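As a minimal sketch of this post-hoc aggregation (the ratings below are invented for illustration), seven-point ratings can be recoded into any of the broader groupings of Table 2 after the data are in:

```python
# A sketch of recoding seven-point ratings into broader categories
# after data collection. The ratings are invented.
ratings = [2, 7, 5, 6, 3, 6, 7, 4, 6, 1]

def recode(rating, cuts):
    """Map a 1-7 rating to a category index, given upper cut points."""
    for category, upper in enumerate(cuts):
        if rating <= upper:
            return category
    return len(cuts)

# Negative-middling-positive: 1-3 / 4 / 5-7
print([recode(r, (3, 4)) for r in ratings])
# Picks out strong positive opinions: 1-5 / 6-7
print([recode(r, (5,)) for r in ratings])
```

Had the data been collected on a three-point scale in the first place, neither regrouping would be recoverable.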
Labelling the points is also not without its problems. We usually try to make the end-aversion phenomenon into a virtue. We regularly label the extreme points on the scale (1 and 7) with strong labels such as 'Extremely strongly agree', 'In all cases', 'Never', 'Literally under no circumstances', and the like. The reasoning behind this is that respondents are unlikely to choose such labels, but when they do make such choices, they really mean what they say, and we can thus have more confidence in the extreme judgements made, whether positive or negative.
We also recommend leaving some points unlabelled, on the grounds that respondents may genuinely waver between labelled points, in that they may not be able exactly to agree with our labels; we thus leave room for intermediate judgements. Commonly points two, four and six will be unlabelled, although there is frequently good reason to label point four, because of the variety of ways in which people may make a middling judgement ('I don't know', 'It varies from case to case', 'It varies from setting to setting').
Whatever answering scheme one adopts, it should be tested. There is no point in writing down a scale with labels and hoping that people will understand the words in exactly the same way that you do. Any Likert-type scale should be piloted, and piloted on a population similar to the final panel. This should include ensuring that the words are comprehensible and apparently have the meaning which was intended.
In drawing up statements for Thurstone scales (in which a panel judges
the emotional strength of each statement), we have found that one
obtains an acceptable level of consensus on only about 10 per cent of
the statements. In one study, we had to start with 252 statements to end
up with 22 on which there was adequate agreement on the emotional
tone which they exhibited (Moseley et al 1998).
Equally important is to estimate the response frequency, i.e. to see
how many people check each value. If no one uses points 1 to 4, then
you have effectively only a three-point scale, and all the advantages of
discrimination are lost. This is a particular problem if they cluster at
point seven. As we note elsewhere, if everything is important, then nothing is important. Overall, the choice of scale length and labels is not a trivial task, and certainly not an automatic one. Whatever you choose to do, it must be tested.
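Checking response frequencies is a one-line tally once pilot data are in. The pilot responses in this sketch are invented, and chosen to show the failure mode just described:

```python
# A sketch of a response-frequency check on (invented) pilot data.
# If whole regions of the scale go unused, the effective scale is
# much shorter than its printed length.
from collections import Counter

pilot_responses = [5, 7, 6, 7, 5, 6, 7, 7, 6, 5, 7, 6]
freq = Counter(pilot_responses)
for point in range(1, 8):
    print(point, freq[point])
# Here points 1-4 are never used: this is effectively a three-point
# scale, clustered towards point seven.
```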
One should also consider alternatives to the Likert-type scale. These alternatives include (although this is not an exhaustive list):
- ranking and comparing
- magnitude ratio scaling (MRS)
- visual analogue scales (VAS).

Ranking and comparing


These methods differ in two ways from Likert-type scales. Firstly, the ratings need not form a simple ordinal Likert-type scale with an upper limit. Secondly, one does not have to ask for judgements of statements on their own. Rather, we may ask respondents to judge statements relative to each other ('How much better would A be than B?').
However, it is difficult for the human mind to process many dozens (or
hundreds) of item by item comparisons. Even with only a handful of
statements, the number of comparisons which have to be made rapidly
grows into double figures or more. Not only does each rater have to make a large number of comparisons (which may again well lead to rater fatigue and attrition), but it is also very difficult for raters to remember what sorts of judgements they made in other cases. It is possible to end up
with a situation in which A is more desirable than B, B is more desirable
than C, but A is less desirable than C. With only 20 statements (policies
to be compared and ranked), one has to make 190 comparisons. If this
method had been used for the childhood cancer study referred to above,
to make a complete sweep of one-to-one comparisons of the 57 policies
suggested would have required 1,596 comparisons. As the number of
items to be compared grows, the number of possible comparisons grows
very rapidly. This process is referred to as a combinatorial explosion, and
its analysis is called combinatorial complexity. Indeed, there is a field of
study called combinatorics.
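The arithmetic is simple: n items admit n(n-1)/2 unordered pairs, and a two-line check reproduces the figures quoted above:

```python
# Number of unordered pairwise comparisons among n items: n*(n-1)/2.
from math import comb

print(comb(20, 2))  # 190 comparisons for 20 statements
print(comb(57, 2))  # 1,596 comparisons for the 57 suggested policies
```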
To overcome these burdens, we have tried two methods. Method 1 is
to let the comparisons be implicit rather than explicit. For example,
one might ask people how much time, how much money, how many staff, or whatever, are required for each policy. One then takes these estimates, and the researcher, rather than the respondent, makes the comparisons. In a study of the importance of various aspects of the UKCC Scope of Professional Practice document (UKCC 1992), we used this method.
However, it worked only because we had developed a computer
program which allowed the respondent to input the number of hours
(or whatever) devoted to a given activity, which checked the data (e.g.
that they were not working hundreds of hours per week), let them
modify their input, and then finally displayed the rankings implicit in
their absolute data. Despite the obvious utility of this method, and the
richness of the data which it produces, on training, economic and time
grounds it is probably not practicable for most Delphi studies, and
cannot be anonymous. However, its existence does show that there are
sound ways of conducting a Delphi study which are not conventional,
and that it is important to consider all alternatives.

Magnitude ratio scales (MRS)


Method 2, a second approach to relative comparisons, is called Magnitude Ratio Scaling (MRS). With simple exhaustive ranking, for policies A, B, C, D one would make the comparisons A-B, A-C, A-D, B-C, and so on. This rapidly produces a combinatorial explosion. With the MRS one instead takes a single item (called the anchor point) and then asks the respondent to rate each of the rest of the items, one by one, relative not to each other, but to that anchor point, and only to that anchor point. This reduces the combinatorial complexity of the task. In the childhood cancer study, using the MRS reduced the number of comparisons to be made by each respondent from 1,596 to the much more manageable 56. However, one can go a stage further. In most uses of Likert scales, the data are at best at the ordinal level, and there is an artificial upper limit on the ratings which can be given. If the scale runs from, say, one to seven, then it is impossible for any statement to be rated higher than seven. One may get a clear idea of the order of the respondents' preferences, but it is impossible to get a comprehensive view of the size of those preferences.

To gain an idea of the relative sizes of the preferences, Magnitude Ratio Scaling can be used. This has been used, for example, to try to estimate how much more serious one crime is than another (Sellin and Wolfgang 1964). On a Likert scale, one might have an ascending order of seriousness, say: car theft 1, burglary 3, rape 6, murder 7. However, the views of judges might actually follow a pattern in which those crimes are in the same order, but with very different weights attached. Thus, it might be more useful to have results which read: car theft 500, burglary 600, rape 5,000, murder 20,000. In that way, we get not only the order of ranking of the statements, but also a clearer idea of the size of the gaps between the rankings.
Of course, the analysis of the results has to be undertaken in a
different way. Simple arithmetic means could give misleading results if
some respondents (but not others) chose to give ratings in the hundreds
of thousands or millions range. To overcome this, the MRS is not
calculated using an arithmetic mean, but rather a geometric mean,
which lessens the overwhelming effect of a few extreme ratings, while
allowing such extreme ratings to be made and to have some effect.
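A minimal sketch of this calculation, with invented ratings, shows the effect: one respondent rating in the millions dominates the arithmetic mean but only pulls gently on the geometric mean.

```python
# A sketch of summarising MRS ratings with a geometric mean.
# The ratings are invented; note the single extreme value.
from math import exp, log

ratings = [400, 500, 600, 700, 1_000_000]

arithmetic = sum(ratings) / len(ratings)
geometric = exp(sum(log(r) for r in ratings) / len(ratings))

print(round(arithmetic))  # 200440: swamped by the one extreme rating
print(round(geometric))   # about 2430: the extreme rating still counts,
                          # but no longer overwhelms the rest
```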
This method was used, for example, in Mead's Delphi study which tried to identify the dimensions of primary nursing. It was so effective for that purpose that one could generate weights which were then used for judging the presence or absence of primary nursing on a given ward (Mead 1993). It is so powerful that the weights generated in that study could be used to produce a completely automatic computer-written report on a given ward. When this was evaluated on a sample of wards, the sisters on those wards reported a 93 per cent accuracy rate for the reports (Mead and Moseley 1994, Moseley et al 1997).

Visual analogue scales


Visual Analogue Scales (VAS) consist simply of a line (horizontal or vertical), 100mm long, which is labelled only at the extreme ends. They are used, for example, in studies which measure the subjective experience of pain. In that case, the labels might run from 'No pain at
all' to 'The worst pain that I could imagine'. Note that there are no intermediate labels. In a Delphi study, they could be labelled, say, from 'Of no use at all' up to 'Absolutely essential'. These scales have several advantages:
- they make minimal use of words, thus overcoming problems of linguistic interpretation
- they are simple to complete (one merely puts a pencil mark on the line)
- they can be read and interpreted automatically by a scanning device, thus avoiding the errors of human data input
- they produce data which are at the ratio level of measurement, and which are therefore suitable for a wider range of statistical manipulations than are safely usable with ordinal data
- they give order and distance
- in the pain domain at least, there is considerable evidence that they are reliable and valid (Coll 2000).
Visual analogue scales have potential advantages, and we mention
them here in the hope that other researchers will experiment with them
as part of their own Delphi studies.
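Scoring such scales is mechanical. The sketch below (with invented measurements) simply treats the distance of each respondent's mark from the left end of the 100mm line as that respondent's score:

```python
# A sketch of scoring VAS responses: the distance of each mark from
# the left end of the 100mm line, in millimetres, is the score.
# The measurements are invented.
marks_mm = [12.5, 83.0, 47.0, 91.5, 66.0]

print(sum(marks_mm) / len(marks_mm))  # mean rating on the 0-100 scale
print(max(marks_mm) / min(marks_mm))  # ratio comparisons, per the claim
                                      # above that VAS data are ratio-level
```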

Summary
The Delphi approach has been widely used in a variety of fields. It has
strengths in overcoming many of the social and psychological
problems associated with opinion and attitude measurement. It can be
seen at first sight as a single technique, with a fixed method which all
future researchers could follow. However, even at the simplest level, it
is really an approach rather than a method. It has a general shape (generate statements; formulate a question or questions; undertake round 1; analyse the responses; undertake round 2; and so on). However, what one does at each stage can legitimately vary:
- There is no reason why one must restrict oneself to the panel members as a source of the statements in the first place. Other sources are possible and legitimate, although we would regard these as additional to, rather than in place of, round 1.
- The researcher does not have to have only one question about one
domain. There are choices, although not necessarily all of those choices will eventuate in a Delphi study.
- Within a domain, many apparently simple questions turn out upon analysis to be multi-dimensional. Our view is that those dimensions should be made clear and investigated either by multiple simultaneous questions or, preferably, by serial questioning in multiple rounds. Robust and transparent questionnaire design and analysis are fundamental to the success of the study. These two features are vital to replicability.
- Most researchers would regard as desirable the elicitation not merely of consensus, but also of some ratings of high importance. Few would be satisfied with a panel which said, in effect: 'We have a consensus that we have no idea what to do about the problem which the study is intended to handle.'
- It is important that one considers alternative forms of rating. When various forms of ranking or comparison methods are available, the unthinking use of simple Likert scales may well be sub-optimal.
- Whatever decisions are made, they should be reported fully enough to make replication of each stage possible.
The important point, though, is that one has choices, from deciding whether to undertake a Delphi study at all through to the most abstruse points of statistical analysis. Some of those choices are by no means obvious, and the first step in any Delphi project should be to ensure that those choices have at least been considered. They include the decision whether to follow the conventions or consciously to innovate.
Donna Mead RGN, MSc, PhD is Professor of Nursing and Head of School of Care Sciences, University of Glamorgan
Laurence Moseley MA, Dip Soc Ad, MBCS is Professor of Health Services Research, School of Care Sciences, University of Glamorgan
Acknowledgements
We have drawn on the work of two of our PhD students (Liz Parry and Aldo Picek) in some
of the examples in this paper. We acknowledge their contribution, and thank them.

References
Baddeley A (1994) The Magical Number 7: still magic after all these years. Psychological Review, 101, 2, 353-356.
Belson WA (1981) The Design and Understanding of Survey Questions. Aldershot, Gower.
Belson WA (1986) Validity in Survey Research. London, Gower.
Butterworth T (1991) Nursing in Europe: A Delphi Survey. Manchester, Dept of Nursing, University of Manchester.
Butterworth T, Bishop V (1995) Identifying the characteristics of optimum practice: findings from a survey of practice experts in nursing, midwifery and health visiting. Journal of Advanced Nursing, 22, 24-32.
Coll AM (2000) Quality of Life following Day Surgery. Unpublished PhD thesis. Glamorgan, University of Glamorgan.
Engelhard G, Stone GE (1998) Evaluating the quality of ratings from standard-setting judges. Educational and Psychological Measurement, 58, 2, 176-196.
Festinger L, Katz D (1965) Research Methods in the Behavioral Sciences. New York, Holt, Rinehart and Winston.
Hoyt WT, Kerns MD (1999) Magnitude and moderators of bias in observer ratings: a meta-analysis. Psychological Methods, 4, 4, 403-424.
Kahneman D et al (1982) Judgement under Uncertainty: heuristics and biases. Cambridge, Cambridge University Press, Part IV.
Kasten R, Weintraub Z (1999) Rating errors and rating accuracy: a field experiment. Human Performance, 1, 2, 137-153.
Mead DM (1993) The development of primary nursing in NHS care giving institutions in Wales. Unpublished PhD thesis. University of Wales.
Mead DM, Moseley LG (1994) Automating ward feedback: a tentative first step. Journal of Clinical Nursing, 3, 347-354.
Miller G (1956) The Magical Number Seven, Plus or Minus Two: some limits on our capacity for processing information. Psychological Review, 63, 2, 81-87.
Moseley LG et al (1997) Can feedback be individualised, useful, and economical? International Journal of Nursing Studies, 34, 4, 285-294.
Moseley LG et al (1998) Experience of, Knowledge of, and Opinions about, Computerised Decision Support Systems among Health Care Clinicians in Wales. Report No C/96/1/029 to the Wales Office of Research and Development for Health and Social Care.
Oppenheim AN (1966) Questionnaire Design and Attitude Measurement. London, Heinemann.
Schuman H, Presser S (1996) Questions and Answers in Attitude Surveys: experiments on question form, wording, and context. Thousand Oaks, Sage.
Sellin T, Wolfgang ME (1964) The Measurement of Delinquency. London, Wiley.
Slovic P et al (1982) Judgement under Uncertainty: heuristics and biases. Cambridge, Cambridge University Press.
Streiner DL, Norman GR (1991) Health Measurement Scales: a practical guide to their development and use. Oxford, Oxford University Press.
Tan PSW (1997) The future of nursing education in the United Kingdom by the year 2010: a Delphi survey. Unpublished MSc dissertation (Nursing), No. 0385288. King's College London.
Thorndike EL (1920) A constant error in psychological ratings. Journal of Applied Psychology, 4, 25-29.
UKCC (1992) Scope of Professional Practice. London, UKCC.
Wilson TD et al (1993) Scientists' evaluations of research: the biasing effects of the importance of the topic. Psychological Science, 4, 5, 322-325.
