
Coaley-3941-Ch-01:Coaley-Sample.qxp 30/07/2009 8:05 PM Page 1

1
Introduction: Foundations of Psychological Assessment

Learning Objectives
By the end of this chapter you should be able to:
Understand the basic principles underlying psychological assessment, how they contrast with common perceptions, and distinguish between its different forms.
Identify the key figures in the historical development of assessment methods.
Give an account of the core characteristics and issues relating to different approaches.
Understand their use in the different areas of applied psychology.

What is this Chapter About?


Applied psychologists ply their trade in the real world. So we have to begin by introducing many of the core definitions, characteristics and foundations underlying modern approaches to assessment and psychometrics. It helps also to have an understanding of the historical tradition preceding modern practice, so we will review its development from its historical roots, identifying those explorers who have had a significant and enduring influence. We will also take a look at some key terms and issues, followed by discussion of common types of test and how these can be classified or grouped. The chapter will conclude with brief descriptions of how and why assessments are used in the different fields of applied psychology today.


An Introduction to Psychological Assessment and Psychometrics

What Do We Mean by Psychological Assessment and Psychometrics?


The common thread that unites all of the domains of applied psychology is measurement. Psychometric instruments are designed to measure; in fact, the term psychometrics is an abbreviation for psychological measurement. They form a branch of a wider field referred to as psychological assessment, which seeks to understand the psychology of the individual, whatever the circumstances, whether in clinical, forensic, educational, counselling, health, coaching or occupational settings. The complexity of the mind makes this a difficult task to achieve.

A proliferation of terms used over the years has tended to cause some confusion, and so the word test has been applied as a generic word for absolutely everything linked to assessment. It could mean a questionnaire or an inventory, and is used interchangeably with equivalent terms such as tool, assessment, measure or instrument. But in practice there are distinctions. Let's say, for argument's sake, you feel a bit depressed and go to see a clinical or counselling psychologist. Your psychologist may first go through a detailed interview and make notes, and then ask you to complete a depression inventory. Or you have just undergone hospital treatment and feel a bit anxious about your state of health, so you visit a health psychologist who goes through a similar process using an anxiety inventory. Or you apply for a new job and have to face an assessment centre which includes interviews, tests, questionnaires and work sample exercises.

In all these cases you undertake an assessment which has different components. The whole process constitutes a psychological assessment and is designed to describe, predict, explain, diagnose and make decisions about you. The actions required by social services to care for you, in some instances, may also be included. Therefore measurement, using quantitative inventories, tests or questionnaires, actually forms one or more parts of a broader process called psychological assessment (see Figure 1.1).
A test is a sub-component of measurement, focussed on those tasks or questions (called items) which have right or wrong answers; such instruments are mostly referred to as cognitive, ability or aptitude tests. That means that you cannot really describe a personality questionnaire as a personality test, even though it may make use of measurement and even though many experienced psychologists who have written books like this one mix the two terms. People get worried when they encounter the term personality tests, so I think it is neither an accurate description nor good public relations to use it. Similarly, a questionnaire is also a sub-component of measurement, but its items do not have right or wrong answers. They may, for example, ask people to agree or disagree with a statement or to indicate whether a particular statement is true or false about them. A response saying that a statement is false about me as an individual would, surely, not be a wrong answer. The term inventory is sometimes also used for these instruments. Lastly, the term psychometric, as I said earlier, refers to those things which are based upon a measurement process, including tests and questionnaires which are not tests. An understanding of the statistics underlying tests and questionnaires is essential for good practice in their use. To confuse things further, I prefer to describe some components solely as assessments, for example interviews, simple checklists and observations, to distinguish them from activities which do involve measurement.

[Figure 1.1 A taxonomy of psychological assessment. Psychological assessment divides into measurement and non-measurement. Measurement divides into tests (correct/incorrect item responses) and questionnaires or inventories (not using correct/incorrect responses). Non-measurement covers interviews, observations etc., and other questionnaires/checklists etc.]

So psychological assessments are far more than tests. True assessment is a more complex enterprise involving the integration of information from multiple sources, including personal, social and medical history where relevant, to get a more comprehensive understanding of a person.

Measurement ultimately evolved from the study of individual differences in human psychology, which has aimed to be more objective in its descriptions of people. The concern is to establish what exists rather than whether what exists is good or bad. Key questions are: What are the ways by which people differ, and how can we objectively measure the differences? Over the last 100 years or so the discipline has become increasingly scientific in its approach, and the growth of empirical thinking has had enormous consequences for how we make assessments.

Psychology is concerned to discover not just what characteristics are possessed by a person, but also the way these are organized to make the individual different from others. The aim is to be more precise, enabling the trained professional to make justifiable and verifiable predictions. In other words, we seek to use clearly agreed criteria to define psychological constructs and, where possible, to measure these through the use of scales and statistical techniques. Often scales can be standardized so as to compare a person with others, for example the general population, other people diagnosed as suffering from depression or anxiety, or other managers in an occupational setting. Psychometric instruments are carefully constructed to ensure their measurements are both accurate and replicable.
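One common way of comparing a person with others on a standardized scale is to express a raw score in standard deviation units relative to a comparison group. The following is a minimal sketch of that idea only; the function name `z_score` and the norm-group figures are hypothetical and are not taken from any published instrument.

```python
from statistics import mean, stdev


def z_score(raw_score, norm_scores):
    """Express raw_score in standard deviation units relative to a norm group."""
    norm_mean = mean(norm_scores)
    norm_sd = stdev(norm_scores)  # sample standard deviation of the norm group
    return (raw_score - norm_mean) / norm_sd


# Hypothetical raw scores from a comparison (norm) group.
norm_group = [10, 12, 14, 16, 18]

# A person scoring 18 lies above the group mean of 14.
print(round(z_score(18, norm_group), 2))
```

With these invented figures a raw score of 18 comes out at roughly 1.26 standard deviations above the group mean, while a raw score equal to the mean gives zero.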

The science of psychology operates on the basis of clear criteria and standardized measurement scales. We need to be explicit about what we mean and how we measure, whether in research or practice. Used well, measurement can give us accurate and relevant information which leads to more effective decision-making, providing insights not available through observations and interviews. These latter methods, anyway, are often influenced by personal factors relating to the person doing the assessment. So it is clear that to adopt a scientific approach we need to base our methods upon measurement (see Box 1.1 which discusses what we mean by measurement and its benefits).

Box 1.1

What is Measurement?

Measurement is the assignment of numbers to properties or attributes of people, objects or events using a set of rules, according to Stevens (1946, 1968). From this definition several characteristics of measurement may be derived (Aguinis, Henle and Ostroff, 2001):
1 It focuses on attributes of people, objects or events, not on actual people, objects or events.
2 It uses a set of rules to quantify these. They must be standardized, clear, understandable and easy to apply.
3 It consists of scaling and classification. Scaling deals with assignment of numbers so as to quantify them, i.e. to determine how much of an attribute is present. Classification refers to defining whether people, objects or events fall into the same or different categories.

Aguinis et al. add that Stevens' definition relates to a process of measurement. This means that:
1 Its purpose should be determined, for example, in prediction, classification or decision-making.
2 The attribute should be identified and defined. A definition needs to be agreed before it is measured or different rules may be applied, resulting in varying numbers being assigned. The purpose of measurement should guide this definition.
3 A set of rules, based on the definition, should be determined to quantify the attribute.
4 Lastly, the rules are applied to translate the attribute into numerical terms.

Benefits of Measurement
1 The key benefit is objectivity, which minimises subjective judgement and allows theories to be tested (Aguinis, 1993).
2 Measurement results in quantification. This enables more detail to be gathered than through personal judgements.
3 More subtle effects can be observed and statistical analysis used to make precise statements about patterns of attributes and relationships (Pedhazur and Pedhazur Schmelkin, 1991).
4 Better communication is possible because standardized measures lead to a common language and understanding.
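The distinction between scaling and classification, and the point that different rules produce different numbers, can be sketched in a few lines of code. This is an illustration under stated assumptions only: the response labels, the two rule sets and the cutoff value are all hypothetical.

```python
# Hypothetical rule set: maps each verbal response to a number.
RULES = {"never": 0, "sometimes": 1, "often": 2, "always": 3}

# A second hypothetical rule set, to show that changing the rules changes the numbers.
ALT_RULES = {"never": 1, "sometimes": 2, "often": 3, "always": 4}


def scale(responses, rules=RULES):
    """Scaling: quantify how much of the attribute is present."""
    return sum(rules[r] for r in responses)


def classify(score, cutoff):
    """Classification: decide which category a score falls into."""
    return "at or above cutoff" if score >= cutoff else "below cutoff"


responses = ["sometimes", "often", "always", "never"]
total = scale(responses)               # 1 + 2 + 3 + 0 under RULES
alt_total = scale(responses, ALT_RULES)  # same responses, different rules
category = classify(total, cutoff=5)   # the cutoff is an arbitrary illustrative value
print(total, alt_total, category)
```

The same four responses yield different totals under the two rule sets, which is why the definition and the rules need to be agreed before measuring.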


Surveys of public attitudes towards psychological assessment and measurement are comparatively rare. In the US there has been a growing recognition of the value of psychological assessment among people at large and other health professionals, and this has increased demand. Elsewhere the data are based upon acceptance of test materials and methods in the workplace. In the UK one survey found that most employers, whilst still using traditional methods such as application forms, references and interviews, are increasingly also using ability tests, personality questionnaires and assessment centres (Hodgkinson, Daley and Payne, 1996). There have been studies of the perceptions of graduates and managers about the use of psychometrics, especially for recruitment and selection, and these are generally positive, with some worries about, for example, the need for professionally qualified administrators (as shown in Box 1.2).

Box 1.2

Evaluating Perceptions of Testing

How people think about psychological assessment is important in applied psychology. In clinical settings studies of people's perceptions are focussed mainly on therapeutic methods and outcomes. In the workplace they have often been based upon perceptions of fairness and relevance to jobs. Increased use of unsupervised computer-based testing has been subject to evaluation because of concerns about lack of standardization and a potential for cheating. A study by Hughes and Tate (2007) demonstrates that many applicants feel that such testing is unfair.

Method
Participants completed an online questionnaire requesting their views and experiences regarding computer-based ability testing. The target population was made up of undergraduates and graduates who were considered more likely to have been exposed to this kind of testing.

Results and Discussion
A total of 46 per cent thought computer-based testing to be a fair selection method, 41 per cent felt it was not fair, 6 per cent felt it depended on circumstances and 7 per cent did not express a view. Comments of those who said that it depended on the circumstances of use tended to focus on:
Its use alongside other selection measures
The relevance of the test to the job
The test's quality and provision of practice items and feedback
Whether cheating could be controlled.

The authors say that the high proportion who did not feel the tests were fair demonstrates a need for employers to ensure tests are appropriate and the reasons for using them are explained. Their purpose and the process by which candidates are assessed should be made transparent in pre-test information. In other words, communication is a key issue in managing perceptions.

This is particularly important for selection methods. When there is high unemployment employers might feel they can ignore candidates' reactions, but in a labour shortage unpopular techniques could deter some from applying. One study even suggests that candidates who are uncomfortable with an organization's methods may react by not buying its products (Stinglhamber, Vandenberghe and Brancart, 1999). Some techniques are more popular than others. Interviews, work samples and assessment centres are preferred to peer assessment, personality questionnaires and abstract reasoning tests, because the latter appear less job-related.

The picture is less clear among other European countries, with some national differences (Cook and Cripps, 2005). Assessment and measurement in occupational settings are most popular in Spain, Portugal and the Netherlands, whilst there is some resistance, notably in Germany and Turkey. There is not much information about other areas of the world. Australia and New Zealand appear to have a similar approach to the UK, and in some African countries, such as Nigeria and Ghana, there has been a move towards testing. Some evidence, based upon personal experience, the country's historical background and the introduction of test producers to China, suggests that the Chinese are also using them.

Psychologists and others, such as HR professionals, using assessment instruments will need more than just technical skills to make their way in the world. These skills include knowing how to administer test materials relevant to their area of practice both accurately and ethically. But they also need a sound understanding of the theoretical and conceptual foundations of their science, combined with cultural awareness. And they will need the communication skills to be able to explain what they are doing and why.

Summary
The psychology of individual differences seeks to describe the ways in which people differ, and to understand how and why these arise, and because of this assessment instruments are used widely in applied psychology today. They are founded upon an objective, scientific and empirical approach to making justifiable and verifiable predictions about people, rather than being based on subjective opinion. Psychological assessment refers to the integration of information from multiple sources in order to describe, predict, explain, diagnose and make decisions. Psychometrics are those instruments which measure people's characteristics, having been subjected to standardization using scales which enable scores to be compared. In any form of assessment the tasks or questions are called items. Where an instrument has right/wrong items it is often referred to as a test, whilst others are better referred to as questionnaires or inventories. It is important for those who make use of these instruments to do so in an ethical way and to adhere to codes of practice.

Historical Background
The Chinese invented gunpowder and also psychological assessment, not that the two are connected. They used testing some 4000 years ago for job selection purposes and appeared to be a test-dominated society. A variety of assessments were used for civil service examinations designed to choose Mandarins, and all of the Emperor's officials were examined every third year, including job sample tests to identify proficiency in arithmetic, archery, music, writing and ceremonial skills (Bowman, 1989; Doyle, 1974). Candidates were also assessed for their ability to memorize and understand the Confucian classics, as well as in essay and poem composition. Formal procedures were established, including independent assessments by at least two assessors and the standardization of test conditions, as is often done today.

The Greek philosophers Plato and Aristotle also discussed individual differences in their works. Interest then declined during the Middle Ages until a new recognition of individualism came in the sixteenth-century Renaissance. By the seventeenth century post-Renaissance philosophers began to look at ideas, events and phenomena in more scientific ways, leading to a new way of thinking called empiricism. This said that all factual or true knowledge comes from experience, and it was developed by John Locke into an organized school of thought.

When Charles Darwin provided an account of the mechanisms of evolution between 1858 and 1877, he influenced early psychology. His principal thesis was that members of a species exhibit variability of characteristics and this variability results in some being better suited than others to any particular set of environmental conditions. His term characteristic meant anything which could be attributed to an individual organism, for example agility or height. Those best adapted would reproduce more prolifically, possibly being the only ones to survive to maturity and reproduce.
The significance of individual differences between those belonging to the same species was, therefore, a key factor which influenced early psychologists and statisticians, many of whom contributed to the development of a new science of mental measurement. Experimental psychologists such as Gustav Fechner, Wilhelm Wundt and Hermann Ebbinghaus discovered that psychological phenomena could be described in rational and quantitative ways.

Especially important was the Englishman Francis Galton (1822–1911), whose career was similar to that of his cousin Darwin. You are in good company if you have felt close to a breakdown before exams, because Galton studied maths at Trinity College, Cambridge, and suffered a breakdown before his finals, so he didn't get a very good honours degree. But, like his cousin, Galton adopted the new scientific ideas, which he thought could be proven only by careful enquiry, and used his wealth to pursue this. Among many other interests, he became obsessed with making all kinds of measurements of people in his research laboratory. More than 17,000 people paid for the privilege of providing measurements, such as height, weight, strength, rate of movement and reaction times.

Galton was a prolific writer and a zealous scientist. He was the first to emphasize the importance of individual differences, created the first tests of mental ability and was the first to use questionnaires. He discovered a number of statistical procedures to analyse data, many still in use today. For example, he found that a wide range of measures of human physiology and abilities produce what is still referred to as a normal curve, sometimes called the bell curve or normal distribution. He said this curve could be meaningfully summarized by its mean and standard deviation, and suggested the use of these to describe measures of human attributes. Galton also invented the scatter-plot to illustrate data. His application of exact quantitative methods resulted in the discovery of a numerical factor which he called correlation, specifying the degree of relationship between individuals or any two attributes. He was one of the first to realize the importance of posted questionnaires, which he accompanied with prizes! Outside of psychology, he was the discoverer of finger-printing and weather-reporting (Galton, 1865, 1869, 1874).

The Frenchman Alfred Binet (1857–1911) had a rather different background, being the child of a single mother who took him to Paris at the age of 15. He qualified in law but then switched to medicine, although his interest in psychology was more important. Working at the Sorbonne in 1891, he became assistant director of the laboratory of physiological psychology and in 1905 opened a Paris laboratory for child study and experimental teaching. Influenced by Galton's work, he was appointed to a ministerial commission to study the plight of retarded school children to ensure they would have an adequate education. A mechanism was needed to identify pupils in need of alternative education. So Binet set out to identify the differences that separate the abnormal child from the normal and to measure them. He constructed a series of tests, including short, varied problems about daily life, as well as tests of cognitive processes such as memory. They were made up of a series of tasks thought to be representative of a typical child's abilities at different ages.
Binet ranked the tests in accordance with age levels corresponding to performances by the average child. In doing so he distinguished between the mental age attained on the scale and the chronological age of a child. The outcomes, developed with his assistant Theodore Simon, were received throughout the world with wide acclaim. Binet and Simon published their last revision in 1911 (Binet and Simon, 1911; Binet, 1916; Binet and Simon, 1916). In the United States Lewis Terman (1877–1956) standardized the Binet–Simon scale using sampling methods, resulting in what has since been called the Stanford–Binet Intelligence Test (Terman, 1916, 1917).

Galton's works also influenced Karl Pearson (1857–1936), who was noted for saying: "Have you ever attempted to conceive all there is in the world worth knowing, that not one subject in the universe is unworthy of study?" A thorough polymath (meaning he liked to study many different things), Pearson could lecture in different subjects. As a freethinker, too, he hated authoritarianism, forcing Cambridge University to drop compulsory church attendance. One of Galton's books played a major part in changing his career, and he became interested in finding mathematical ways of studying evolution and heredity. As a result he wrote papers which contributed to the development of regression analysis and the correlation coefficient (think of the Pearson Product Moment Correlation Coefficient), and discovered the chi-square test of statistical significance.

One of the most productive scaling theorists was Louis Thurstone (1887–1955), a mechanical engineer, who made important contributions to psychology. Thurstone spent most of his career at the University of Chicago, where he founded the Psychometric Laboratory. He designed techniques for constructing measurement scales and for the assessment of attitudes, and developed test theory (Thurstone, 1919, 1953). His major contribution was in the creation of new methods of factor analysis to identify the nature and number of potential constructs within a set of observed variables.

Although a mathematician, Georg Rasch (1901–1980) is best known for his contribution to psychometrics through the development of a group of statistical models known as Rasch models (Rasch, 1980). His work has had an influence on later adaptive testing by computers, which have been used for the administration of tailored tests. In these the selection of questions to give a precise estimate of ability is based upon a rigorous model. Where people interact with assessment questions or items in a way which enables comparisons between them, Rasch models have provided a quantitative means of measuring attributes which are on a continuum or scale.

One of the twentieth century's foremost contributors was Raymond Cattell (1905–1998), whose first degree was in chemistry and physics. He had a major influence on the theoretical development of personality as he sought to apply empirical techniques to understand its basic structure (Cattell, 1965). He extended existing methods of factor analysis and explored new approaches to assessment, and has been unrivalled in the creation of a unified theory of individual differences, combining research in intelligence with that of personality.

Anne Anastasi (1908–2001), sometimes referred to as the test guru, was the first person to emphasize that different cultures have alternative concepts of what an intelligent person is and that traditional tests measure only skills valued in academia and work in industrialized societies. She went to college at 15, completed a first degree in psychology at 19 and her doctorate in just two years.
Anastasi undertook major studies of test construction, test misuse, misinterpretation and cultural bias, and was the author of the influential book Psychological Testing (1988), which has been the core text in this field since its first edition in 1954. The seventh edition was published in 1997 (Anastasi and Urbina, 1997).

Lastly, we should include the first professor of psychometrics in the UK, Paul Kline (1937–1999), whose two major interests were psychometrics and Freudian theory. He did much to explain what has become an increasingly complex field and provided evaluations of the most widely used tests. In his last book The New Psychometrics: Science, Psychology and Measurement (1998), he argued that truly scientific forms of measurement could be developed to provide a new psychometrics which would transform psychology from a social to a pure science.

The development of diagnostic assessment in the clinical arena has a history all of its own, and has encountered problems because of its psychiatric background. Arguments have arisen between psychiatrists on the nature of mental illness and its scientific status, as well as through challenges by others. For example, the French thinker Michel Foucault wrote in his book Madness and Civilization that mental illness was a cultural construct rather than a natural fact and that the history of madness properly written would be about questions of freedom and control, knowledge and power (Foucault, 2001).

The main emphasis of psychiatry has been upon the development of a scientific understanding of mental illness and of healing the mentally ill. Jean-Etienne Esquirol (1772–1840) transformed the classification and diagnosis of mental disorder so that diagnosticians could develop clearly defined profiles on the basis of symptoms. Jean-Martin Charcot (1825–1893) extended the classification and played a key role in beginning modern psychiatry. Emil Kraepelin (1856–1926) also contributed significantly to the concepts of mental disease and its classification. Influenced by experimental psychology, Kraepelin also pioneered psychological testing with psychiatric patients.

As a consequence of the work of Sigmund Freud (1856–1939) and others, classification was extended by the 1950s to include the complexes and neuroses of ordinary people, leading eventually to the depression, anxiety, eating and sexual disorders of the late twentieth century. The old rigid distinction between the mad and the sane no longer existed, and many practitioners believed that most disorders were to be found among the community at large rather than in hospitals. Most people were thought to experience some degree of mental ill-health at some time. On the shelf above me there is a postcard propped against the books. It says in large letters: Who is normal? Anyone can experience mental distress. No one needs the stigma to go with it.

All of this has resulted in a continuing commitment to the development of assessment classifications, extending them to include milder and borderline cases and many new conditions such as Post-Traumatic Stress Disorder (PTSD) and Attention-Deficit Hyperactivity Disorder (ADHD). The handbook for this is the Diagnostic and Statistical Manual of Mental Disorders (DSM) of the American Psychiatric Association, first published in 1952, which was based on the mental disorders sections of the International Classification of Diseases (ICD) published by the World Health Organization. The ICD, the latest version of which is the ICD-10, classifies both mental and physical disorders, and is more widely used in Europe (World Health Organization, 2004). There is now a large degree of overlap between the two systems.
A revised edition of the manual, the DSM-III, was published in 1980 and a further edition, DSM-IV, in 1994, including collaboration with those developing the ICD equivalent (American Psychiatric Association, 1994). The contents have grown over the years, reflecting a large increase in the number of identified disorders. The manual has introduced detailed procedures which are widely accepted, although they are subject to the criticism that they are not based upon any theory or quantitative approach and are, therefore, weak. For an enjoyable account and critique of the DSM, see Kutchins and Kirk (1997).

As with all previous psychiatric classifications, the manual is accused of containing clinical observations which are treated as objective and independent of any theory, the classical reference being Szasz (1970). The most recent version mentions traits in its descriptions, and use of this term calls for objective evidence based on the statistical tool of factor analysis. An additional criticism concerns the overlap between diagnostic criteria for categories, which are either identical or very similar in some cases. Indeed, research by Widiger and Costa (1994) found no evidence to support the DSM-IV classifications. There have also been arguments over its unnecessary medicalization of typical characteristics of people, for example the addition of shyness as a psychiatric disorder. Kline is damning: "It would be possible to agree that, whenever a sigh of wind was heard in a chimney, a unicorn had passed overhead. With good training the judgement between wind and unicorn could be perfect" (2000: 377). Whether the unicorn exists is another matter!

However, the manual does state that it is used by a wide range of professionals from medical, psychological and social domains and can be applied across settings, and that the initial impetus for developing a classification was the need to collect statistical information. Many of the criticisms made are discussed in the introduction, which outlines the limitations of the categorical approach and its use in clinical decision-making. Another traditional form of assessment widely used in health settings has involved projection, including the Rorschach inkblot test and the Thematic Apperception Test, which ask people to describe ambiguous visual stimuli. Although popular, these have also been subject to criticism, as we shall see in Chapter 9. The number of alternative clinically oriented assessments which are psychometrically sound has, however, grown in recent years.

Summary
Psychological assessment has had a long history, although the most rapid development was from the mid-nineteenth to the mid-twentieth centuries. A key focus has been upon empirical measurement and individual differences, culminating in modern psychometrics with its emphasis upon the normal distribution, standard deviation, correlation, sampling and standardization, measurement scales, factor analysis, statistical models, and more recently test construction, as well as issues of best-practice and culture. These terms, placed in more of an historical order rather than a conceptual one, are all commonplace today. To practise effectively in any form of applied psychology requires a good understanding of all of these. In addition, the Diagnostic and Statistical Manual of Mental Disorders (DSM) and the International Classification of Diseases (ICD) have worldwide use in assessment of mental disorders.
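Three of the terms listed in this summary, the mean, the standard deviation and correlation, can be illustrated numerically. The sketch below computes the Pearson product moment correlation from its definition; the function name `pearson_r` and the height and weight figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical measurements for five people.
height_cm = [160, 165, 170, 175, 180]
weight_kg = [55, 62, 66, 74, 78]

print(mean(height_cm), round(stdev(height_cm), 2))
print(round(pearson_r(height_cm, weight_kg), 3))
```

A value near +1 indicates that the two attributes rise together, a value near -1 that one falls as the other rises, and a value near 0 that there is little linear relationship.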

Core Characteristics of Assessment


All psychological assessments are made up of a collection of questions or tasks, known as items. In a questionnaire this may involve a multiple-choice response format such as an anxiety questionnaire:
Indicate how much you have been bothered by each symptom during the PAST WEEK, INCLUDING TODAY, by placing an X in the corresponding space in the column next to each symptom.
Columns:
NOT AT ALL
MILDLY (It did not bother me a lot)
MODERATELY (It was very unpleasant but I could cope)
SEVERELY (I could hardly stand it)
Symptoms:
1 Stomach upsets
2 Having dizzy spells
3 Feeling scared

Or a personality questionnaire:
Begin here
1 I would enjoy being an engineer more than being a primary school teacher.
a. True
b. Not sure
c. False
2 When something bothers me, I can often laugh it off.
a. True
b. Not sure
c. False

Or an ability test:
Q1. 1.08, 2.16, 3.24, 4.32, 5.4, 6.48
What number comes next?
1 6.56
2 6.66
3 7.56
4 7.58
5 7.66
6 7.76

For the last two measures you would, of course, have a response sheet to mark your answers on. Only parts of possible ones are shown here for illustration purposes. How would you go about scoring these? For the anxiety questionnaire you might give a number of value 0, 1, 2 or 3 for each of the column headings and then sum the totals for all of the columns, as is actually done with the Beck Anxiety Inventory. With the ability test, you could just determine the number of correct responses by counting them to give a total score. Life gets a bit more complicated with personality questionnaires because they often have more than one scale, sometimes as many as 30 or more. In these, all of the items relating to the scales are jumbled up in the questionnaire; otherwise the respondent might guess at what is being assessed by a particular group of them. They are separated either by scoring keys or software to give a total score for each of the scales. These then form the profile for a person.
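The three scoring approaches just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rating labels, the function names and the personality scoring key (scale names and points per option) are invented for the example and are not taken from any published instrument.

```python
from collections import defaultdict

# Hypothetical rating labels mapped to 0-3, mirroring the column headings.
RATING_VALUES = {"not at all": 0, "mildly": 1, "moderately": 2, "severely": 3}


def score_symptom_inventory(ratings):
    """Sum the 0-3 values of the ratings given to each symptom."""
    return sum(RATING_VALUES[rating] for rating in ratings.values())


def score_ability_test(responses, answer_key):
    """Count the items whose chosen option matches the answer key."""
    return sum(1 for item, correct in answer_key.items()
               if responses.get(item) == correct)


def score_personality_scales(responses, scoring_key):
    """Total the keyed points per scale; items from different scales are jumbled."""
    totals = defaultdict(int)
    for item, (scale, points) in scoring_key.items():
        totals[scale] += points.get(responses.get(item), 0)
    return dict(totals)


anxiety = score_symptom_inventory({
    "Stomach upsets": "mildly",
    "Having dizzy spells": "not at all",
    "Feeling scared": "severely",
})  # 1 + 0 + 3

# The Q1 series rises in steps of 1.08, so 6.48 + 1.08 = 7.56, i.e. option 3.
ability = score_ability_test({"Q1": 3}, {"Q1": 3})

# Invented key: the scale each item feeds and the points for each option.
key = {
    1: ("Scale A", {"a": 2, "b": 1, "c": 0}),
    2: ("Scale B", {"a": 2, "b": 1, "c": 0}),
}
profile = score_personality_scales({1: "c", 2: "a"}, key)

print(anxiety, ability, profile)
```

The dictionary returned by the last function plays the role of the profile: one total per scale, separated out by the key even though the items are interleaved in the questionnaire.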

The Technical Nature of Assessment


But what makes the difference between assessments like these and a questionnaire printed in a popular magazine which aims, say, to tell you how attractive you might be to others? The answer is centred upon technical information about the instruments themselves and often the procedures by which they are administered: Standardized administration is required for many tests so that the administration and instructions are the same for everyone who takes them. Tests and questionnaires often have normative information, i.e. about how different groups have responded as part of a process of standardization. Their results are measured on scales and items are specifically related to measurement on these scales. This information about different groups is usually available within a technical
manual. It helps administrators to identify the difference between high, average and low scores for a group of people. Test publishers also provide information on the accuracy/consistency of scores (known as reliability). They also give evidence of validity, which provides the basis for making valid inferences about people from their scores. The basis of psychometrics lies in these things: standardization, reliability and validity. Put simply, the differences between an acceptable psychological measurement and that set of questions in a magazine lie in:
A scientific rationale for what is being measured
An explanation of construction
Standardized administration procedures in many cases
Use of a large sample to establish norms or a process for comparison with others
Accuracy and error measures
Evidence for validity
Guidance on interpretation
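To make the idea of norms concrete, here is a minimal sketch of how a raw score might be placed as low, average or high relative to a norm group. The norm sample is invented and far too small for real use, and the cut-off of one standard deviation either side of the mean is an assumption chosen for illustration, not a rule stated in this chapter.

```python
from statistics import mean, stdev


def z_score(raw, norm_scores):
    """Express a raw score in standard deviation units from the norm group mean."""
    return (raw - mean(norm_scores)) / stdev(norm_scores)


def band(z, cutoff=1.0):
    """Label a z-score as low, average or high using an assumed cut-off."""
    if z <= -cutoff:
        return "low"
    if z >= cutoff:
        return "high"
    return "average"


# Invented norm sample; a published manual would report far larger groups.
norm_sample = [12, 15, 18, 20, 20, 22, 25, 28]

for raw in (13, 20, 27):
    z = z_score(raw, norm_sample)
    print(raw, round(z, 2), band(z))
```

The same raw score could fall into a different band against a different norm group, which is why the manual's description of its standardization samples matters.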

These sorts of things should be available, either in a test manual or some other format, for any type of assessment provided by a publisher. It is important for purchasers who are unfamiliar with a particular assessment to study the manual carefully before using it. The dangers of not doing so could include:
Purchasing an assessment which is inappropriate for the purpose required
Purchasing one which is of poor quality
Not understanding how to use the assessment properly and, therefore, affecting important factors such as its accuracy
Not administering or scoring the assessment effectively and thus having a detrimental impact upon accuracy and whether you can interpret any scores appropriately
Misusing the test and the interpretation of its outcomes in feedback to individuals.
A second factor relates to the question: 'What do tests and questionnaires really measure?' It might be easier to answer this question when we consider other sciences. In physics, for example, we measure such things as mass or volume, in chemistry we might measure temperature or concentration of a solution, in biology metabolic rate or response level to a stimulus. In engineering we might look at the length and height of materials, the velocity of moving components, rate of electrical flow or voltage, and so on. All of these appear more substantial than factors such as verbal reasoning, spatial reasoning, levels of emotional stability or social confidence, or of depression or psychopathology, or the whole host of things measured by psychologists. We seem to be dealing with concepts which are more abstract. Can we put a hand on a specimen of, say, anxiety or a form of reasoning or of emotional stability, etc? No, of course not. To assess them we need to undertake an inferential process, i.e. we need to make an inference about the level of something based upon observations. That something may be
described as a hypothetical concept, and we are restricted to identifying how we can compare individuals in terms of this. Mind you, the same is true for many things also measured in other sciences and technology. What about forces? We can observe their outcomes but can we see them directly? Some forces are based upon more of an inference than others, for example the nuclear binding force holding together an atomic nucleus. We can't really see electrical current, i.e. the electrons thought to be flowing along a cable, or even voltage. There are many things measured in physical sciences which are also based on inferential processes, just like in psychology. However, some people prefer to cope with things which are easily observable and understandable. They may prefer dealing with the physical world, disliking concepts which are less concrete or visible. But you can't escape them. So psychology focuses upon assessment of concepts which are based on inference, and this lies at the heart of what we mean by validity, which is explored in Chapter 6. To illustrate this process, consider the question: 'Where would you rather go: to a social event with your friends or a quiet evening alone following your own interests?' If you reply that you would rather go to the social event then I might infer that you are more extraverted than introverted; if you choose the solitary evening then I might infer the opposite. Obviously, that is not enough information to make a decision about you; it just illustrates an inferential process. Evidence of validity is, therefore, important because it provides a justification of the inferences you can make from an assessment. Put simply, validity is about what any assessment actually measures. By means of different techniques we ask about a person's responses, behaviour or mental states and use these as indicators of underlying characteristics.
All of this means that competence in using any assessment lies in looking past its superficial characteristics, such as the items and how they are written, to its underlying technical properties. That is why it is important to discourage people from seeking to discuss items in terms of their structure, the way they are phrased or even their punctuation. Reliability and validity are constructed on the basis of all of the items operating together as a unity, although this doesn't mean to say that designers don't look at these factors when they construct them. They do; it's just that they have to make a decision about the format of items and, once having done so, then establish its technical properties. Once items have been constructed we need to be more concerned with the technicalities of the instrument. Competence in using assessment lies not in dealing with what might be called its surface content, but rather with a body of information and statistics. To make assessments of people is, frankly, a dangerous thing. If we do it badly and the assessed person dislikes the outcomes, then we may encounter rejection, hostility and in some instances complaints. There are good forms of assessment and bad ones and there is bad use of good ones. We need to ensure we are using appropriate and relevant methods and that we do so in a way which is fair and acceptable. The important point is that we do not provide qualitative unverifiable judgements, which everyone, whether non-psychologists or psychologists, is capable of making, but should instead aim to provide quantitative and verifiable evidence. This is particularly important when we are dealing with the lives and careers of people.

Stable and Changing Characteristics


Traits are defined as relatively constant, long-lasting tendencies or characteristics of individuals, being predictable and indicating underlying potential (Allport and Odbert, 1936; Allport, 1961). They remain relatively stable throughout the life span, especially after adulthood. Mike Smith and his wife Pam (Smith and Smith, 2005) say a trait is 'a posh name for a characteristic' and quote the definition of a trait as 'a dimension of individual differences in tendencies to show consistent patterns of thoughts, feelings and actions'. They also add that trait theory is based upon two self-evident ideas that:
People's thoughts, feelings and actions differ on a number of dimensions, and
These dimensions can be measured.
Trait measures try to assess people in terms of how they usually are. However, it is important to note that people can change, sometimes dramatically through unusual circumstances or gradually through life experience, hence the use of the word 'relatively'. We can't measure traits directly, and our principal aim is to compare a person's position on a trait scale to that of others. For example, I might demonstrate the trait of aggressiveness, but just how aggressive am I? Am I more or less aggressive than others or am I at a level which is typical for most people? On this basis traits can provide useful descriptions of how people typically behave. Traits can be grouped into three classes: attainments, ability traits and personality traits. Measures of attainment indicate how well a person performs in a particular field following a course of instruction, for example school exams. They tend to be retrospective, looking backwards to knowledge or skills learned, and are influenced by factors such as teaching ability and resources. Ability traits relate to a person's level of cognitive performance in some area, referring to thinking skills which can predict future potential, rather than just knowledge. Personality traits indicate an individual's style of behaviour.
Many theorists have attempted to develop a descriptive classification of people in terms of trait characteristics, such as being introverted, emotionally stable, dominant, impulsive and shy, and which relate to objectively observable behaviours. Psychometric evidence has led many psychologists to view individual differences in terms of such things. Many personality measures, such as the 16PF, the 15FQ and the Occupational Personality Questionnaire (OPQ), are therefore trait measures. Despite situational influences at the time of assessment, personality traits may be a useful tool in predicting how individuals are likely to behave most of the time. Traits should be distinguished from states, which are transient or temporary aspects of the person, such as moods, happiness, anger, fear, displeasure and even surprise, and which tend to be shown physiologically. They can result from the effects of situational circumstances or feelings, for example through fatigue, anger, boredom or just having a hangover, lasting hopefully for quite short durations. To complicate things, consider a possible exception: motivation. You may not be motivated now because you don't like the author of this book although you have to read it, but tomorrow will be doing something you love and will be strongly motivated by it
(suggesting motivation is a state characteristic). However, there are people who seem to go through life being always motivated whatever they do: they are always doing their best and putting in a lot of energy (suggesting motivation is a trait). Another exception concerns anxiety, which can be split into trait and state anxiety. Trait anxiety is the general level of anxiety each person has, assuming nothing has happened recently to increase it. State anxiety, however, reflects that caused by some thought or event, and tends to be situational. In general, mood states can influence behaviour regardless of traits, as when sadness impairs the interpersonal skills of someone who is normally well-liked. Assessment of states is more common in therapeutic settings through the use of measures of depression, anxiety, helplessness and suicidal ideation. It has also been suggested that moods should be distinguished from motivational forces which direct behaviour temporarily, for example the basic biological drives of food, sex, aggression or social contact (Cattell, 1957). These, too, are states because they decline after having been met. Traits help us to understand long-term behaviour, although states are important if we are trying to predict how a person will behave at a certain time. A few measures are made up of assessments of both, for example the Spielberger State-Trait Anxiety Inventory.

Summary
Competence in psychological assessment and measurement relies on the understanding of technical information so that quantitative and verifiable evidence is gained. The basis of psychometrics lies in standardization, reliability and validity. Standardization provides information about how groups have responded to assessment and enables users to identify high, average and low scores. Reliability provides information on the accuracy of scores and validity about what an instrument measures. A publisher's manual is often provided to give information about these. Assessment materials mostly measure abstract concepts and interpretation involves a process of inference. Both trait and state-based assessment instruments are available today. Traits represent relatively constant and stable, enduring characteristics of individuals, whilst states are defined as being made up of more transient characteristics.

Types of Measurement
There appear to be many ways in which tests can be classified or categorized, and this doesn't help the newcomer. First, they may be classified in terms of the method of measurement they use. The broadest of these approaches distinguishes between how people perform in seeking to do their best and how they react to items. They can then be grouped into two areas:
Measures of maximum performance, and
Measures of typical performance

Maximum Performance Measures


Measures of maximum performance include tests of ability, aptitude and attainment. As suggested, attainment measures indicate how well a person performs in a particular field following instruction or teaching. They are retrospective and are influenced by external factors. They are, therefore, outside the scope of psychological measurement, although the distinction between attainment and aptitude is not necessarily always clear-cut. Ability tests, aptitude tests and other objective tests are maximum performance measures because they are about how well people do things, how well they have learned skills or how great their potential is. They aim to identify what we can do when we try our hardest. They range from abstract concepts, for example:
Abstract reasoning
Spatial orientation or relations
Numerical reasoning
Inductive reasoning
Ideational fluency
Musical sensitivity

to the rather practical, for example:
Clerical speed and aptitude
Programming aptitude
Spelling and grammar
Manual dexterity
Hand tool dexterity

In this case there are right or wrong, good or bad answers, and the tests are usually timed so that response speed is involved. They provide raw scores, each being the total number of correct answers, and these are then converted to more usable scores such as percentiles. Aptitude scores may sometimes be influenced by attainment, for example a certain level of reading ability may be needed to understand items. Tests made up of relatively easy items with a strict time limit are called speed tests. They have items of similar difficulty and measure how many can be completed accurately within a set time. True speed tests consist of items which, if given without the time limit, would be correctly answered by almost everyone and are mostly useful in assessing aptitudes such as clerical skill or perceptual speed tasks. In one instance a speed test was devised for the selection of traders and dealers working for an international bank, and was designed to check on their ability to accurately work out currency conversions whilst under high pressure. If the score depends solely on the ability to answer questions, rather than speed (although this remains a factor), then we have a power test which measures the ability to do something. Having a time limit ensures a maximum score is set. Power tests tend to get harder as a candidate progresses through items; the time limit enables norms to be provided for comparison of someone's score with others and sets the top level of ability achieved.
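The conversion of a raw score to a percentile can be sketched as follows. The function uses the mid-rank convention (the percentage of the norm group scoring below the raw score, plus half of those scoring exactly the same), which is one common definition and an assumption here; published tests supply their own conversion tables. The norm group scores are invented for illustration.

```python
def percentile_rank(raw, norm_scores):
    """Mid-rank percentile: % scoring below, plus half of those scoring the same."""
    below = sum(1 for score in norm_scores if score < raw)
    equal = sum(1 for score in norm_scores if score == raw)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Invented raw scores (number of correct answers) from a small norm group.
norm_group = [5, 8, 10, 12, 12, 14, 15, 17, 19, 20]

candidate_raw = 14
candidate_percentile = percentile_rank(candidate_raw, norm_group)
print(candidate_raw, candidate_percentile)
```

Because the result depends entirely on who is in the norm group, the same raw score can correspond to quite different percentiles for, say, school leavers and graduate managers.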

Typical Performance Measures


Measures of typical performance include assessments of personality, belief, values and interests, i.e. what we typically are, what we would normally do, and so are more user friendly. Personality dispositions are preferred or typical ways of thinking and behaving, being referred to as underlying characteristics or traits. They are often assessed by self-report measures having multiple scales, including scales for such things as assertiveness, anxiety or ambition. There is no right or wrong in terms of the responses given (which is why I prefer to call them questionnaires or inventories rather than tests) and there is usually no set time limit. They will encourage individuals to be as honest as possible in their responses. I can hear you saying that because they are self-report instruments they can be faked. As we shall see in Chapter 8, their designers try to identify any level of this or other forms of sabotage. Examples of personality questionnaires include:
The 16 Personality Factor questionnaire (16PF)
The Personality Assessment Inventory (PAI)
The Occupational Personality Profile (OPP)
The 15 Factor Questionnaire (15FQ)
The California Psychological Inventory (CPI)
The Myers-Briggs Type Indicator (MBTI)
The Minnesota Multiphasic Personality Inventory (MMPI)
The Jung Type Indicator (JTI)
The Millon Adolescent Personality Inventory
The Occupational Personality Questionnaire (OPQ)
The Criterion Attribution Library (CAL)

An alternative way of classifying assessment lies in terms of a distinction between standardized and non-standardized techniques. A standardized instrument has been administered to a representative sample of people from a group or population, whose converted scale scores, or norms, serve as a basis for interpreting the scores of others. These contrast with non-standardized measures, for example learning tests used informally by teachers or questionnaires to identify your preferred team role. Lacking standardization means that you cannot compare the scores of individuals with typical scores. Another way of classifying measures is on the basis of group or individual administration. Many of those used in health, forensic or educational settings are individually administered, including the Wechsler Adult Intelligence Scale (WAIS) or the Wechsler Intelligence Scale for Children (WISC). Others, for example Raven's Progressive Matrices, the 15FQ and the Critical Reasoning Test Battery, can be administered to a group and because of this are useful as part of job selection or development programmes. Group assessments mostly use pencil-and-paper measures, with booklets and answer forms. They can also be distinguished from apparatus tests which are often linked to sensory-motor abilities or sensory acuity. An example is the Movement Assessment Battery for Children, which includes equipment for manual dexterity and ball skills. Similarly, some tests contain only verbal materials, compared to those needing
the manipulation of objects like the soldering of components, which are called performance tests. Yet another approach to classification is based upon the method of scoring responses. Objective tests use precise scoring procedures, for example through counting correct answers. In contrast, elicitation questionnaires, like essays, need a more subjective approach to marking and are seen as non-objective. A broader view again might be to see a distinction between assessments in terms of cognitive versus affective methods. Those which are cognitive tests aim to quantify a form of mental activity, for example reasoning ability or an aptitude of some kind, whilst affective measures may assess aspects of personality, as well as interests, values, motives and attitudes. And lastly, yet another approach to classification concerns the level of qualification possessed by people who wish to buy and use them, which we will consider in Chapter 10.

Quality and Measurement


In general terms, what might be the quality criteria when we come to consider any form of psychological assessment? The following is not an exhaustive list, but provides us with something to think about if we are preparing to buy or construct a measure:
The scope: including the range of attributes covered, of norm groups or of people who can potentially be assessed (its breadth).
Reliability or accuracy of the test. See Chapter 5.
Validity of the test. See Chapter 6.
Acceptability: can its purposes be explained and feedback offered?
Practicality: including the cost, equipment and facilities needed for its use.
Fairness, in terms of any legal issues involved, for example where this might relate to discrimination relating to sex, race, disability or age. Where tests are used to compare people, they are designed to discriminate between them, although in a fair and ethical way. This is discussed in Chapter 10.
Utility: the costs and benefits in any applied domain of using an assessment and the alternatives available.

So What Are They Used For?


To conclude this first chapter, it might be helpful to set the scene for what is to come by considering briefly some of the uses of assessment methods and tests in different fields of applied psychology. They are used throughout psychology, whether research-based or applied, and in allied disciplines. You just can't get away from them. There are now hundreds of assessment materials being produced and distributed commercially. It's helpful if you can see how they are being used in different domains, especially those you might be considering for a future career. Assessment tools are often used in clinical psychology as a means of diagnosing mental health problems, for assessing change in a patient's mental state in response to therapy, for conducting audits of treatment outcomes, and for distinguishing between clinical groups.
For example, a psychologist might want to track change in the mental state of a patient by regularly administering a depression inventory to see if there has been improvement. In working with children the psychologist might want to know whether a young person has behaviours which are, say, autistic in nature or indicate a learning disability. Those working with older people may be concerned to identify whether someone is suffering from depression using a geriatric depression scale. These are just a few illustrative examples. Similar measures will also be used by psychologists specializing in counselling psychology. This kind of programme can also be used in forensic psychology in working therapeutically with offenders, as well as in conducting assessments requested by courts of law to help in decision-making. For example, a court may want to know the level of intellectual functioning of an offender, the person's suggestibility and compliance before sentencing, or competency to stand trial. It may want to know more about an offender's mental state, including such things as high levels of depression or anxiety, or psychosis, Attention-Deficit Hyperactivity Disorder or Post-Traumatic Stress Disorder. In clinical neuropsychology practitioners use many assessment tools in diagnosing brain damage resulting from accidents, strokes or dementia and in helping people suffering epilepsy. The consequences of an accident or stroke may result in poorer attention span, weaker memory and poorer use of language, as shown in Box 1.3. A neuropsychologist may want to assess these using specific tests, as well as the effect of events on a person's visual perception, bodily senses and motor functions. Neuropsychological tests can identify the localization in the brain of damage, its nature and effect upon bodily or social functioning and emotional state, and how best to conduct rehabilitation.

Box 1.3

Understanding Brain Injury

Mrs Smith could remember travelling along in the car and the moment when it was in collision with a lorry. Her next memory was of waking in hospital four days later. Life up until then had seemed normal. Her children had grown up; she was happily married and still working. She had many interests. But after treatment, things were no longer the same. She would have sudden angry outbursts, which were out of character. She couldn't do the cooking any more. Her memory was poor and she couldn't concentrate.
In the UK some 50 per cent of serious head injuries are caused by road accidents. Most of these are closed head injuries involving major primary brain damage. This might be centred in one area or in a number of areas or even be spread throughout a large part. It can occur in areas different from the location of the original impact. It is not surprising that many accident victims experience impairments which make daily functioning more difficult.
Mrs Smith (not her real name) was referred to a clinical neuropsychologist because of dizziness, poor memory and an inability to concentrate. Assessment began with a structured interview. Despite appearing alert, Mrs Smith had experienced post-traumatic amnesia over a four-day period, suggesting she may have sustained a moderately severe head injury. This was followed by administration of a number of tests:

The National Adult Reading Test version 2 (NART-2) provided an estimate of pre-morbid intellectual ability, i.e. of ability before any injury or trauma.
The Wechsler Adult Intelligence Scale (WAIS III) measured aspects of intellectual functioning.
The Wechsler Memory Scale (WMS-III) was used and the Controlled Oral Word Association Test.
The Rey-Osterrieth Complex Figure Test assessed visual-spatial ability and visual memory.
The Trail-Making Test (TMT) measured visual conceptual and visuomotor tracking/attentional switching.
The Hayling Test measured basic task initiation speed.
The Tower of Hanoi Puzzle assessed planning, response inhibition, information-processing speed and working memory.
Analysis showed Mrs Smith had sustained a moderately severe head injury, suffering impairments in general and working memory, learning, retrieval of new information and attention, as well as slower cognitive processing and impairment in higher-level functioning. A plan was drawn up to help her, including attendance at a head injury group providing education sessions, advice on memory aids and strategies, occupational therapy to help with household activities and vocational rehabilitation.

Where a child has problems in learning at school the practice of educational psychology enables the identification of potential learning difficulties and how these might best be remedied. Assessment materials are available today to look at overall achievement or specific areas of potential difficulty such as reading comprehension, speed and accuracy of reading, auditory processing of language, memory skills, general reasoning and writing skills. Tests can be used to identify problems like dyslexia. The outcomes will help a psychologist to decide what intervention will best support the child and what advice to give teachers and parents. In health psychology practitioners may help people to cope with a wide range of problems, possibly being based in a hospital or community service. The psychologist might identify how best to support someone who has experienced a major heart operation or a diagnosis of cancer and provide guidance to carers and families. Where an individual is suffering high levels of depression or anxiety, assessment materials can aid diagnosis. There are instruments designed to identify health problems, to assess opinions and beliefs about health, to measure pain perception and control, and to assess stress and ways of coping with it. Ability, aptitude and personality assessments are used widely in occupational psychology. They can be used for selection, for promotion, coaching, development and training purposes and in career counselling by occupational psychologists and other professionals. An employer may be interested in finding the best person available for a senior managerial position. This could involve design of an assessment centre including work samples, structured interviews, ability tests and personality questionnaires. Outputs are then combined to give an overall view of individual strengths.

Summary
In this section we have considered a number of ways of classifying psychological measures. The main approach is to divide them into those which distinguish between how people perform in trying to do their best (maximum performance measures) and those which distinguish in terms of how they react to items (typical performance measures). Among other classifications discussed is the level of qualification which might be needed to use them effectively. We have also looked at issues concerning quality criteria in evaluating assessment tools, and briefly at how they might be used in different fields of applied psychology. This chapter was designed to provide an introduction to psychological assessment, which involves the integration of information from multiple sources in order to understand people. We have seen that measurement techniques form a major part of assessment throughout psychology. Lack of regard for these techniques will mean that assessments do not have an objective and scientific basis, and any critical evaluation needs to be focussed on identifying measurement issues. We have learned:
About the nature of psychological assessment, the need for measurement, standardization and for codes of practice and ethics.
To distinguish between different forms of assessment and how they can be categorized.
The key figures in historical development, including Galton, Binet, Cattell, Anastasi and Kline.
About core characteristics and issues relating to different approaches, including reliability, validity and the differences between states and traits.
About some of the ways in which applied psychologists make use of measures.
