
STRUCTURE OF MOTOR SYMPTOMS OF PARKINSON'S DISEASE

A Dissertation Submitted to the Faculty of Physical Education and Sport Charles University

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy Kinanthropology

by Jan Štochl

Prague, Czech Republic September 2005

ACKNOWLEDGEMENTS

This thesis has been completed with the help and effort of many people. First, I would like to thank Professor Petr Blahuš for his patience and support during my doctoral study. His unreserved approachability, useful lectures and encouragement were crucial for the final realization of the thesis. I am also very grateful to the University of Groningen for hosting me and for making my stay comfortable and unforgettable. I would especially like to recognize and thank Dr. Anne Boomsma and Dr. Marijtje A. J. van Duijn for their great support, constructive criticism and valuable lectures. Special thanks to I. Kohoutová for her helpful comments on the manuscript. Sincere appreciation is due to Prof. Růžička for his patient assistance in obtaining the data and correcting the manuscripts. Individual thanks to Prof. Leenders for his effort and help with data gathering. Many thanks also to doctors Jan Roth, Petr Mečíř, Robert Jech, Tereza Serranová, and Olga Ulmanová, who performed some of the UPDRS testing. Last but not least, I am grateful to my family and friends, especially to Ondra, Cipis, Zmijk, Ross, Mra and Elika for their constant support, encouragement and friendship. Individually, I would like to express my deepest gratitude to Eva Tomešová for her unlimited support and patience.

CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
STRUCTURE OF MOTOR SYMPTOMS OF PARKINSON'S DISEASE
    ON MEASUREMENT OF THEORETICAL CONCEPTS
    PARKINSON'S DISEASE
        What Is Parkinson's Disease
        Etiology of Parkinson's Disease
        Clinical Symptoms of Parkinson's Disease
        Terms Used to Describe Motor Symptoms of Parkinson's Disease
        Parkinson's Disease Progression and Medication
        Unified Parkinson's Disease Rating Scale
        Motor Section of UPDRS (MS UPDRS) and Its Dimensionality
    RESEARCH QUESTION
    HYPOTHESES
    METHODS
        Structural Equation Modeling (SEM)
            Introduction
            Types of SEM Models
            Statistical Assumptions of SEM
            Types of Parameters Used in SEM Models
            Methods for Parameters Estimation
            Note on Using Ordinal Variables in SEM
            Identification Problem in SEM
            Model Testing and Fit Evaluation
            Chi-square statistic
            Alternative Fit Indices
            Conventional Practice and Recommendations for Model Evaluation
        Mokken's Scale Analysis
            Introduction
            IRT Versus Nonparametric IRT (NIRT)
            Assumptions Underlying NIRT Models
            Extension of NIRT to Polytomous Items
            Mokken's Monotone Homogeneity Model for Polytomous Items
            Mokken's Double Monotonicity Model for Polytomous Items
            Scaling Procedure
            Limitations and Issues of Mokken's Scale Analysis
    EMPIRICAL RESEARCH
        Introduction
        Sample Description
        Results
            Initial Computations
            Exploratory Mokken's Scale Analysis of MS UPDRS
            Confirmatory Mokken's Scale Analysis of MS UPDRS
            Summary of Results of Mokken's Scale Analyses
            Building Structural Equation Models of Parts of MS UPDRS
            Building Structural Equation Model of Entire MS UPDRS
            Differences Between Models for Patients in "on" and "off" States
            Summary of Results of Structural Equation Modeling
        Discussion
CONCLUSION
REFERENCES
APPENDICES
    Motor Section of Unified Parkinson's Disease Rating Scale
    MS UPDRS Data Sheet
    Exploratory Mokken's Scale Analysis for Non-trichotomized Data and Cutoff Criterion of Hi > 0.3
    Exploratory Mokken's Scale Analysis for Non-trichotomized Data and Cutoff Criterion of Hi > 0.4
    Parameter Estimates, Standard Errors and T-values of Selected Models

LIST OF TABLES

1. Basic statistical properties of the data
2. Exploratory results for cutoff criterion of Hi > 0.3
3. Exploratory results for cutoff criterion of Hi > 0.4
4. Values of Hi coefficients of confirmatory Mokken's scale analysis
5. Values of Hi coefficients of confirmatory Mokken's scale analysis
6. Values of Hi coefficients of confirmatory Mokken's scale analysis
7. Matrix of polychoric correlations of items related to tremor
8. Fitted residuals
9. Fitted residuals
10. Fitted residuals
11. Fitted residuals
12. Matrix of polychoric correlations of items related to rigidity and bradykinesia
13. Fitted residuals
14. Fitted residuals
15. Fitted residuals
16. Fitted residuals
17. Matrix of polychoric correlations of items related to axial/gait bradykinesia
18. Fitted residuals
19. Matrix of polychoric correlations of MS UPDRS
20. Fitted residuals
21. Differences between models for patients in "on" and "off" states

LIST OF FIGURES

1. Latent variable modeling of theoretical concepts
2. Example of path analytic model
3. Logistic function
4. IRFs of various difficulty parameters
5. IRFs of various difficulty and discrimination parameters
6. Logistic IRF (solid curve), nonparametric IRF (dashed curve), and ordered latent class model IRF (step function)
7. Path diagram of one-factor model of tremor
8. Path diagram of two-factor model of tremor
9. Path diagram of two-factor model of tremor
10. Path diagram of hierarchical model of tremor
11. Path diagram of two-factor model of rigidity and bradykinesia
12. Path diagram of hierarchical model of rigidity and bradykinesia
13. Path diagram of three-factor model of rigidity and bradykinesia
14. Path diagram of four-factor model of rigidity and bradykinesia
15. Path diagram of one-factor model of axial/gait bradykinesia
16. Path diagram of three-factor model of the MS UPDRS
17. Path diagram of five-factor model of the MS UPDRS
18. Path diagram of hierarchical model of the MS UPDRS
19. Path diagram of seven-factor model of the MS UPDRS

ABSTRACT

The aim of this study is to investigate the number and the structure of the motor symptoms of Parkinson's disease (PD) as measured by the Motor Section of the Unified Parkinson's Disease Rating Scale (UPDRS). These are inferred through statistical analysis of the Motor Section of the UPDRS. First, the etiology and the clinical symptoms of Parkinson's disease are outlined. Then, the UPDRS is introduced with a focus on the statistical features of the Motor Section of this scale. The next two chapters deal with Mokken's scale analysis and structural equation modeling. Finally, the dimensionality and reliability of the Motor Section of the UPDRS are studied with nonparametric Mokken's scale analysis and structural equation modeling.

The UPDRS measures were obtained from 405 patients with PD: 237 men (39 off; 170 on; 28 unknown) and 168 women (21 off; 140 on; 7 unknown). The analysis showed high skewness of the data in most of the items, substantiating the use of a nonparametric scaling method. Mokken's scale analysis allowed the Motor Section of the UPDRS to be separated into five dimensions. The first dimension consisted of axial/gait bradykinesia and left-sided items of rigidity and bradykinesia of the extremities, suggesting their co-occurrence. Right-sided items of rigidity and bradykinesia of the extremities generated the second dimension. The internal consistency of these two dimensions, assessed by Cronbach's alpha, was high (0.92 and 0.87, respectively). The third and the fourth dimensions consisted of tremor-right and tremor-left items (both resting and postural), respectively. Cronbach's alpha, however, was less satisfactory for these dimensions (0.62 and 0.65). The items Speech and Facial expression generated a stand-alone but statistically limited dimension (alpha = 0.76). Structural equation modeling showed that the Motor Section of the UPDRS incorporates seven factors: rigidity, tremor, bradykinesia of the extremities, axial/gait bradykinesia, speech/hypomimia, and two additional factors for laterality. Finally, the structure of motor symptoms of Parkinson's disease seems to be stable across "on" and "off" states.

Keywords: Kinanthropology, Mokken's scale analysis, structural equation modeling, dimensionality, reliability, Motor Section of the UPDRS

STRUCTURE OF MOTOR SYMPTOMS OF PARKINSON'S DISEASE

ON MEASUREMENT OF THEORETICAL CONCEPTS

In contrast to physical quantities such as time, length, and weight, some variables in psychology, sociology and other sciences cannot be measured directly. Self-concept, IQ, attitudes, and motor abilities (e.g. motor coordination or motor endurance) are examples of such variables. They are called latent traits, theoretical concepts or, especially in psychology, hypothetical constructs. They characterize the general features of behavior and serve as theoretical explanations of performance, referring to the abstract and generic attributes of human activity. The problem is how theoretical concepts can be introduced by a test battery, since they are not observable by nature and can be measured only indirectly, by tests which are (usually) scaled on the basis of physical variables such as time or length. Moreover, the items in a test battery (usually) have different empirical contents, are scored by different experimental operations, and their score values may be expressed in different measurement units. The problem thus concerns the mutual relationship of the empirical and theoretical levels of scientific knowledge. It can be solved through the so-called rules of correspondence (Blahuš, 1996a). These rules enable a process of induction in which the more general and generic theoretical concepts (e.g. abilities) are constructed from the empirical attributes (e.g. tests) that work as their partial and more specific empirical indicators. In other words, since theoretical concepts are not measured directly, the researcher must operationally define the latent variable of interest in terms of the behavior¹ believed to represent it. Assessment of the behavior then constitutes the weak associative measurement of an underlying concept (Byrne, 2001). These measured scores (measurements) are termed observed or manifest variables; they serve as indicators of the underlying theoretical concept that they are presumed to represent. Latent variables are mathematically constructed variables intended to model theoretical concepts (Raykov & Marcoulides, 2000).

¹ The term behavior is used here in the broadest sense to include, for instance, in vivo observation of some physical task or activity, coded responses to interview questions, self-report responses to an attitudinal scale, etc.

Latent variables can be further divided into the so-called explanatory (or exogenous) latent variables and response (or endogenous) latent variables. Explanatory latent variables are synonymous with independent variables (although the term independent variable is mostly meant in the sense of multiple regression); they cause fluctuations in the values of other latent variables in the model. Changes in the values of explanatory variables are not explained by the model; rather, they are considered to be influenced by factors external to the model (e.g. age or gender). Response latent variables are synonymous with dependent variables and, as such, are influenced by the explanatory variables in the model, either directly or indirectly. Fluctuations in the values of response variables are said to be explained by the model, since all latent variables that influence them are included in the model specification.

Fig. 1. Latent variable modeling of theoretical concepts (Blahuš, 1996a)

Models with latent variables can be defined as a class of statistical models that describe observable reality through its relationship with unobservable, mathematically constructed characteristics (Blahuš, 1980). They offer tools for solving the problem of correspondence by representing the semantic level in terms of the syntactic (model) level, as shown in Figure 1. Moreover, such modeling is a relatively universal approach and can therefore be used in many fields.
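As a hedged illustration of how such a model links the observed and the latent level, the linear case is often written in a LISREL-style notation. The symbols below are an assumed convention and are not taken from this chapter: x and y are vectors of observed indicators, ξ and η are explanatory (exogenous) and response (endogenous) latent variables, Λ, B and Γ are coefficient matrices, and δ, ε and ζ are error terms.

```latex
% Measurement part: observed indicators expressed through latent variables
\mathbf{x} = \boldsymbol{\Lambda}_x \boldsymbol{\xi} + \boldsymbol{\delta},
\qquad
\mathbf{y} = \boldsymbol{\Lambda}_y \boldsymbol{\eta} + \boldsymbol{\varepsilon}

% Structural part: response latent variables explained by other latent variables
\boldsymbol{\eta} = \mathbf{B}\,\boldsymbol{\eta} + \boldsymbol{\Gamma}\,\boldsymbol{\xi} + \boldsymbol{\zeta}
```

In this sketch, the measurement equations play the role of the syntactic (model) counterpart of the rules of correspondence, while the structural equation expresses which latent variables are explained within the model.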

There are two basic conditions that enable the use of latent variable models as formalized rules of correspondence. They were introduced by McDonald (1979) and described more precisely by Blahuš (1991):
1) Concept formation by weak associative measurement. This is represented by a set of model equations, typically regression functions. Associative measurement in physics can then be understood as a special case of this weak form.
2) The requirement of completeness of the explanation that the concept yields. In the simplest case, this is the axiom of local independence (introduced later).
If these two principles are fulfilled, concept formation can be understood as a weaker case of associative measurement carried out through statistical modeling with latent variables. Since the purpose of models with latent variables is to specify and confirm regularities, these methods contribute to the two basic tasks of science: explanation and prediction (Blahuš, 1996b). From a practical point of view they can be used for (Blahuš, 1985):
a) estimation of the level of a latent variable; in Anthropomotoricity, for example, estimation of the individual level of motor abilities
b) optimal test reduction, usually the exclusion of items with low validity or reliability
c) classification of variables, especially the development of new and more general terms (e.g. the terms static strength, dynamic strength and explosive strength can be covered by the term strength)
d) development or confirmation of structural hypotheses
e) transformation of variables into variables with better predictive properties
Čepička (2003) divides models with latent variables into three groups: linear factor analysis, nonlinear factor analysis, and latent structure models. Rabe-Hesketh and Skrondal (2005) propose another classification: generalized linear mixed (or multilevel) models, measurement models (factor, item response, or latent class), and structural equation models.
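The two conditions listed earlier in this passage can be sketched in symbols. The notation is an assumption made for illustration only: X_1, ..., X_k are the observed variables, θ is the latent variable, and f_i is the regression function of the i-th observed variable on θ.

```latex
% Condition 1: weak associative measurement via regression functions
E(X_i \mid \theta) = f_i(\theta), \qquad i = 1, \dots, k

% Condition 2 (simplest case): local independence of the observed
% variables once the latent variable is held fixed
P(X_1 = x_1, \dots, X_k = x_k \mid \theta) = \prod_{i=1}^{k} P(X_i = x_i \mid \theta)
```

Read this way, local independence states that the latent variable accounts for all association among the observed variables, which is one way to express the completeness of the explanation.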
For the purpose of this study we suggest the following classification:

Item response theory (IRT) models, which include the 1-, 2- and 3-parameter logistic models and various nonparametric IRT models, including Mokken's models. IRT is focused on evaluating the degree of precision and breadth of scales that are used to measure theoretical concepts, or (in IRT terms) underlying latent traits. IRT consists of a class of statistical procedures that are used to model, in probabilistic terms, the association between an individual's responses to survey questions/items and an underlying latent trait that is measured by the items.

Structural equation modeling (SEM). It is a method for determining the extent to which data on a set of variables are consistent with hypotheses about associations among the variables. Usually, it is based on analysis of the covariance or correlation matrix of the observed variables. SEM includes factor analysis, multiple indicator-multiple cause (MIMIC) models, non-recursive (reciprocal effects) path models, growth curve models, ANOVA, ANCOVA, MANOVA, etc.

Latent class analysis (LCA). It is a statistical method for finding subtypes of related cases (latent classes) from multivariate categorical data. It is usually based on analysis of the frequency table of observed response patterns. For example, it can be used to find distinct diagnostic categories given the presence/absence of several symptoms, or types of attitude structures from survey responses. The results of LCA can also be used to classify cases into their most likely latent class. Latent profile analysis is a variant of LCA for continuous variables.

A detailed overview of special cases of the general model with latent variables is presented in Blahuš (1985). In sport sciences, the first application of latent variable modeling can be found in the work of Burt (1925). Nowadays it is still widely used, even though some new types of modeling, such as multilevel modeling or the so-called social networks, are being developed. In Czech kinanthropology, factor analysis has been applied in many studies (e.g. Blahuš, Čelikovský, & Kovář, 1973).
More recent applications of IRT or factor analysis can be found in Čepička (2000), Štochl (2002), Tomešová (2003), etc. The problem of motor diagnostic methods (motor testing) in Kinanthropology is essentially related to the standardization of these methods (Měkota & Blahuš, 1983). As pointed out by Blahuš (2004), the main and minimum standardization requirements for diagnostic quality are:

- construct validity towards the theoretical concepts that cover the diagnosed domain, as assessed through content validity by experts, the validity usually being modeled by correlations with the latent variable
- verified dimensionality of the diagnosed domain, so that the whole extent of the diagnosed domain and its structure are appropriately covered
- reliability of the diagnostic methods, preferably verified in two or more ways, for example by stability over replications as well as by internal consistency
- practical validity with respect to an external criterion, for example other diagnostic methods, sometimes also in the form of predictive validity

The assessment of the standardization criteria above necessarily includes the application of statistical models with latent variables (Čepička, Růžička, Štochl, & Blahuš, 2003). Scientific diagnostics in general, not only in terms of testing, has to be clearly distinguished from clinical diagnostic methods (Blahuš, 2004). This does not mean that diagnostic methods that are routinely used in educational or even medical practice cannot be truly scientific. On the contrary, it is highly desirable that such diagnostic methods be standardized with the same level of rigor as the scientific ones (Dvořáková, 2002; Štochl, 2002). Quite often, however, clinical diagnostic methods of human movement are not evaluated objectively from the points of view mentioned above but are built on a rather intuitive background. Neurological syndromes, such as the combination of hypokinesia, rigidity, resting tremor and postural abnormalities in Parkinson's disease (PD), represent theoretical concepts whose individual features can be statistically modeled as latent variables. Identifying the dimensionality of such syndromes is important because knowledge about the co-occurrence of symptoms may help to define disease phenotypes and provide clues for differential diagnosis. This study discusses the dimensionality, the structure of latent variables, and the assessment of validity and reliability of the Motor Section of the Unified Parkinson's Disease Rating Scale (MS UPDRS) within the framework of the two statistical approaches mentioned above: nonparametric item response theory and structural equation modeling.
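To make the parametric IRT models named in the classification above more concrete, the following minimal sketch evaluates a logistic item response function. It assumes the common textbook parametrization with discrimination a, difficulty b and lower asymptote c (without the optional scaling constant sometimes used); the function name and the example item are hypothetical and do not come from this study.

```python
import math


def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a positive response at trait level theta (3PL form).

    a: discrimination, b: difficulty, c: lower asymptote.
    With c = 0 this is the 2-parameter logistic form; with c = 0 and
    a = 1 it is the 1-parameter logistic form.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical item: discrimination 1.5, difficulty 0.5, no lower asymptote.
    for theta in (-2.0, 0.5, 3.0):
        print(theta, round(irf_3pl(theta, a=1.5, b=0.5), 3))
```

The function is monotonically increasing in theta for a > 0, which is the property that nonparametric IRT models such as Mokken's retain while dropping the specific logistic shape.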

The reader is encouraged to view this study within the framework of Kinanthropology and its synonym Kinesiology, since Kinanthropology is to be understood as a comprehensive term for a scientific field dealing with basic and applied research, with potential practical applications in monitoring various quantitative as well as qualitative indicators of human motor activities (Blahuš et al., 1993). In addition, the content of Kinanthropology was formed from Anthropomotoricity, and it therefore focuses on motor abilities and skills with an emphasis on the diagnostic quality of motor tests (validity, reliability, etc.).

PARKINSON'S DISEASE

What Is Parkinson's Disease

Parkinson's disease (PD) is a progressive neurodegenerative disease caused by the loss (dysfunction and death) of neurons in the pars compacta of the substantia nigra, which leads to a decrease in the dopamine content of the striatum (Nevšímalová, Růžička, & Tichý, 2002). The disease was first described in 1817 by James Parkinson (2002). In the early 1900s, pathologists in Europe and the United States recognized a specific abnormality in the brains of individuals who had had Parkinson's disease in life. In a region of the brain called the midbrain, certain neurons contain a dark pigment known as neuromelanin; this cluster of neurons is known as the "substantia nigra", meaning black substance. These pigmented neurons produce dopamine, a chemical messenger (neurotransmitter) that transmits signals between the substantia nigra and several clusters of neurons that together comprise the "basal ganglia", and that is vital for normal movement. There is an abundant reserve of dopamine in this region, but when the level drops below 20%, symptoms of Parkinson's disease begin to emerge. The loss of dopamine causes the nerve cells of the basal ganglia to fire out of control, leaving patients unable to direct or control their movements in a normal manner.

Etiology of Parkinson's Disease

There are currently four theories on the cause of Parkinson's disease (Ebbitt, 2005), and many scientists believe that the cause is probably a combination of two or more of the following factors.

In the field of genetics, some researchers believe that Parkinson's disease has a genetic cause and is therefore hereditary (Foltynie, Sawcer, Brayne, & Barker, 2002; Gasser, 1998). In many families, Parkinson's has occurred in several generations. Affected relatives tend to cluster on one side of the family, either the mother's or the father's. Studies show that first-degree relatives of Parkinson's patients are two times more likely to develop the disease than relatives in families with no history of Parkinson's (Rybicki, Johnson, Peterson, Kortsha, & Gorell, 1999). By studying families in which Parkinson's has passed through generations, researchers have identified an abnormal gene that is the cause of some cases of Parkinson's disease. However, in other Parkinson's patients there is no genetic link and the illness does not run in the family. For this reason, genetic studies on Parkinson's remain controversial (The Parkinsons web, 2005). Most scientists believe Parkinson's is caused by a combination of environmental and genetic influences.

Parkinson's may be the result of environmental factors such as drinking well water (Gorell, Johnson, Rybicki, Peterson, & Richardson, 1998), living in rural communities (Ferraz, Andrade, Tumas, Calia, & Borges, 1996), and exposure to heavy metals. Carbon monoxide poisoning, carbon disulphide, potassium cyanide, and methyl alcohol may also contribute to an environmental cause of Parkinson's. Some scientists have implicated manganese ore dust as a possible cause as well (Olanow, 2004). Recent research and testing have determined that no single environmental agent could be the only cause of Parkinson's disease (Parkinson's disease: etiology and genetics, 2005). MPTP (a by-product of illicit synthetic heroin production) is known to destroy neurons of the substantia nigra (Muramatsu et al., 2003). This often results in many of the symptoms of Parkinson's disease.

Infectious disease has also been suggested as a possible cause of Parkinson's (Kristensson, 1992). In the early 1900s, there was an epidemic of an illness that caused people to fall into a stupor or suffer severe insomnia. This disease was called sleeping sickness or encephalitis lethargica. Many of those affected developed a certain form of Parkinson's disease.

Still other scientists believe that Parkinson's disease may be a direct result of a process of accelerated aging, which occurs for currently unknown reasons. Through this aging, the brain's ability to produce dopamine decreases, which results in many of the symptoms of Parkinson's disease.

The single factor that has been most consistently associated with a reduced risk of PD is cigarette smoking, which has been demonstrated in numerous studies (e.g. Allam, Del Castillo, & Navajas, 2002; Hernan et al., 2001). Caffeine consumption is also associated with a reduced incidence of PD (Deleu, 2001; James, 2003).

Clinical Symptoms of Parkinson's Disease

Parkinson's disease causes motor (movement) and nonmotor symptoms. Prior to the diagnosis of PD, a person may begin to feel a drop in energy or a loss of coordination (Okun, McDonald, & DeLong, 2002). Several symptoms such as impaired handwriting, reduced arm swing, a "limp" or tremor may begin to emerge on one side of the body (Poewe & Wenning, 1998). Other early symptoms may include internal shakiness, difficulty in getting out of a chair, a soft voice and/or depression. These symptoms evolve gradually and may even be imperceptible to the patient or family members until a physically or emotionally stressful event occurs, triggering an exacerbation of these symptoms (Parkinsons disease, 2005).

When the disease is fully expressed, the major clinical features include bradykinesia (slow movement), tremor (typically at rest and extinguished with movement), rigidity (a clinical finding of resistance to movement, often associated with a jerky sensation called cogwheeling) and impaired postural reflexes (poor balance) (Worldwide education and awareness for movement disorders, 2005). Many patients also suffer from secondary symptoms. These include depression, sleep disturbances, dementia, forced eyelid closure, speech problems, drooling, difficulty in swallowing, weight loss, constipation, breathing problems, difficulty in voiding, dizziness, stooped posture, swelling of the feet, and sexual problems (The Parkinsons web, 2005). A variety of other symptoms can be associated with Parkinson's disease, including fatigue, weakness, joint pain, internal tremor, anxiety, impaired recent memory, oily face or scalp, constipation, bladder urgency, soft hoarse speech, sleep disturbances, restless legs, etc.

The average Parkinson's disease patient experiences 2-3 hours of "off" state each day. Generally, the "off" state is referred to as a state of impaired motor function (Roth, Sekyrová, & Růžička, 1999). Patients in the "off" state experience handwriting problems, overall slowness, loss of olfaction, loss of energy, stiffness of muscles, walking problems, sleep disturbances, balance difficulties, difficulty getting up from a chair, and many other motor and non-motor symptoms. For the purposes of clinical diagnostics, the "on" state and the "off" state are defined more rigorously (Langston et al., 1992):

Defined "on" state: the state after a standard dose of medication (L-DOPA or a dopamine agonist).

Defined "off" state: the state of a patient with PD after a 12-hour withdrawal of antiparkinsonian medication (i.e. about 12 hours after the last dose), at least 1 hour after awakening, to eliminate a possible "sleep benefit".

Terms Used to Describe Motor Symptoms of Parkinson's Disease

Bradykinesia: literally slowed movement.
Dystonia: involuntary contraction of a muscle or a group of muscles.
Dyskinesias: abnormal involuntary movements that can be characterized as writhing movements and can include dystonic movements. These movements can be seen in a variety of disorders such as Huntington's chorea, the dystonias and Tourette syndrome. They are commonly caused by levodopa and other antiparkinsonian medication and are often seen as a delayed reaction to antipsychotic medication.
Rigidity: stiffness, increased resistance to passive movement. It is present when limbs are still, but increases as they move. It is related to over elasticity of specific nerve cells in the spinal cord that control muscle tone.
Tremor: 5-6 Hz alternating activity of antagonist muscles controlling a joint, leading to alternating joint movements (Latash, 1998). Tremors are often worse on one side of the body than on the other.
Resting tremor: a tremor of a limb that increases when the limb is at rest.
Action/postural tremor: a tremor that increases when the hand/muscle is moving voluntarily.


Parkinson's Disease Progression and Medication

The progression of Parkinson's disease is highly variable, although the progression may be relatively slower in patients whose initial symptoms include tremor (Jankovic & Tolosa, 2002). The likelihood of developing PD increases with age. PD typically begins in a person's 50s or 60s, and slowly progresses with increasing age. The average age of onset is 62.4 years. Onset before age 30 is rare, but up to 10% of cases begin by age 40 (Worldwide education and awareness for movement disorders, 2005).

A principal aim of PD therapy is to replace the brain's supply of dopamine with the drug levodopa, which the brain converts into dopamine. Levodopa was introduced as a PD therapy in the 1960s, and remains the most effective therapy for motor symptoms (Hoehn, 1992). It lessens and helps to control all the major motor symptoms of PD, including bradykinesia. Nausea and vomiting are the most common side effects, and are due to accumulation of dopamine in the bloodstream (Hunter, Shaw, Laurence, & Stern, 1973; Markham, Diamond, & Treciokas, 1974). Orthostatic hypotension (low blood pressure upon standing) also occurs (Lang, 2001). The risk of hallucinations and paranoia increases over time (Klawans, 1988). Compulsive behavior, including gambling and hypersexuality, is another risk (Proctor & McGinness, 1970). Drowsiness is a common adverse effect of levodopa and other dopaminergic therapies, and sudden sleep onset is possible (Tracik & Ebersbach, 2001). Patients may not experience any warning signs of sudden sleep onset.

The most troubling adverse effect of long-term levodopa use is dyskinesias (Friedman, 1985). Dyskinesias result from the combination of long-term levodopa use and continued neurodegeneration. They typically begin to develop in milder forms after 3 to 5 years of treatment, but are more severe after 5 to 10 years of treatment.

Unified Parkinson's Disease Rating Scale

The Unified Parkinson's Disease Rating Scale (UPDRS) is one of the most widely used rating scales for assessing patients with Parkinson's disease (PD). The UPDRS was designed to provide a measure of signs and symptoms of Parkinson's disease in clinical practice and research. It was developed in an effort to incorporate elements from existing scales, and to provide a comprehensive but efficient and flexible way to monitor PD-related disability and impairment. Prior to its development, multiple scales, including the Webster, Columbia, King's College, Northwestern University Disability, New York University Parkinson's Disease Scale, and UCLA Rating Scales, were used in different centers, making comparative assessments difficult. The development of the UPDRS involved multiple trial versions, and the final published scale is officially known as UPDRS version 3.0.1. The scale itself has four components: Part I, Mentation, Behavior and Mood; Part II, Activities of Daily Living; Part III, Motor Section; Part IV, Complications of Therapy. The original concept of the scale was to provide a core assessment tool that could be accompanied by additional measures to focus on global impairment. For example, the UPDRS is often accompanied by and reported with such scales as the Schwab and England and Hoehn and Yahr scales.

Of all available clinical scales for the assessment of Parkinsonian motor impairment and disability, the UPDRS is currently the most commonly used. Sixty-nine percent of 1994-1998 articles using a PD rating scale relied on the UPDRS as the standard tool (Goetz, 2003). This trend is an international one and the UPDRS predominates as the primary scale in published studies from both the US and other geographical regions.

Utilization of UPDRS

One of the core advantages of the UPDRS is that it was developed as a compound scale to capture multiple aspects of PD (Goetz, 2003). It assesses both motor disability (Part II: Activities of Daily Living; contains 13 items) and motor impairment (Part III: Motor Section; contains 27 items).
In addition, Part I (4 items) addresses mental dysfunction and mood, and Part IV (11 items) assesses treatment-related motor and nonmotor complications. Another unique feature of the UPDRS is the availability of a teaching videotape standardizing the practical application of the scale and thereby serving as an important asset to enhance inter-rater reliability (Goetz, Stebbins, & Chmura, 1995). This feature is particularly relevant to the training of new raters and to the conduct of multicenter therapeutic trials in PD. Despite its multidimensional approach with four different parts, the UPDRS has proven an easy-to-use instrument in clinical practice with an average time requirement for administration of the full scale between 10 and 20 minutes. This time can be further shortened by self-administration of the Mentation and ADL parts by patients in the waiting room.

The UPDRS is increasingly used as a gold standard reference scale. The UPDRS is also the common reference scale in studies of instrument development for rating specific aspects of PD (Martinez-Martin et al., 1997). The UPDRS has also been used to define the placebo response in PD (Goetz, Leurgans, & Raman, 2002). Almost all recent trials of surgical interventions for PD, both related to intracerebral transplantation and deep brain surgery, have employed the UPDRS. It is a key component of the Core Assessment Programs for Intracerebral Transplantation and Surgical Interventional Therapies for PD (CAPIT/CAPSIT) (Goetz, LeWitt, & Weidenman, 2003). Although specifically developed to assess PD, the UPDRS has also been utilized to rate Parkinsonian features of other conditions, including normal aging, progressive supranuclear palsy, and Lewy body dementia.

The UPDRS has been used in studies of early, mild PD, moderate but stable PD, and severe disease and motor fluctuations. Prior studies have demonstrated that the scale favors the assessment of moderate and severe impairments, and may not be ideally configured to assess very mild disease-related signs and symptoms (Vieregge, Stolze, Klein, & Heberlein, 1997).
Several longitudinal studies of PD have demonstrated that the UPDRS score increases over time and scores are higher at key clinical decision-making points like the need to introduce symptomatic therapy (Poewe & Wenning, 1998). Numerous studies indicate that the UPDRS is responsive to therapeutic interventions. Published reports using the UPDRS, however, have focused almost exclusively on Caucasians (white men), and the UPDRS characteristics have not been extensively investigated in different racial or ethnic minorities (Tanner, 1999). Insufficient information is available on the ability of the UPDRS to discriminate between disease categories of clinical pertinence. To date, operational definitions of minimal, mild, moderate and severe stages of PD have not been explicitly established (Goetz, 2003). UPDRS scores, however, correlate with the Hoehn and Yahr scale and with the Schwab and England scale (Martinez-Martin et al., 1994). Furthermore, within the UPDRS, the objective, physician-derived Motor Section (Part III) correlates well with the subjective, patient-derived Activities of Daily Living section (Part II).

Clinimetric Issues

Of all available PD rating scales, the UPDRS has the additional advantage that it is the most thoroughly tested instrument from a clinimetric point of view. Almost one-third of all studies assessing clinimetric properties of impairment and disability scales for PD identified in a recent systematic review were targeted on the UPDRS (Ramaker, Marinus, Stiggelbout, & Van Hilten, 2002). Clinimetric scale evaluation usually assesses a scale's reliability and validity. The UPDRS has shown excellent internal consistency across multiple studies (Martinez-Martin et al., 1994). This high degree of internal consistency may be artificially inflated due to redundancy in the large number of items in Parts II and III of the UPDRS. Assessments of rater consistency included both inter-rater reliability and intra-rater reliability. Inter-rater reliability appears adequate for the total UPDRS (Martinez-Martin et al., 1994) as well as for the Activities of Daily Living and the Motor Section. Several studies report unacceptably low inter-rater reliability for selected items assessing speech and facial expression on the Motor Section of the UPDRS (Camicioli, Grossmann, Hudnell, & Anger, 2001). Other studies, however, reported acceptable inter-rater reliability estimates for these items (Martinez-Martin et al., 1994). There are also some published reports examining intra-rater reliability (Camicioli et al., 2001). This latter study shows low to medium intra-rater reliability.
Among 400 early-stage PD subjects, examined on two occasions, separated by approximately 2 weeks, the intraclass correlation coefficients were very high: total score, 0.92; Mentation, 0.74; Activities of Daily Living, 0.85; Motor Section, 0.90 (Siderowf et al., 2002).

The UPDRS has adequate face validity and samples important and typical domains associated with PD. In addition, its construction was guided by experts in the field and based on previous scales. Criterion validity has not been established because there is no absolute gold standard that can be used for this assessment. The majority of validation studies have assessed the construct validity of the UPDRS. These studies have generally found satisfactory results regarding convergent validity with other instruments assessing PD, such as the Hoehn and Yahr or Schwab and England scales or timed motor tests (Stebbins & Goetz, 1998; Stebbins, Goetz, Lang, & Cubo, 1999). Divergent validity, or the degree to which the scale does not measure domains unrelated to PD, has not been well established.

Multiple studies have examined construct validity of the UPDRS through factor analysis. These studies have found between three and six factors that account for a significant proportion of the total scale variance (Martinez-Martin et al., 1994; Stebbins & Goetz, 1998; Stebbins et al., 1999). The resultant factors form rational groupings of the items, and suggest that the scale has a valid multidimensional assessment format. So far, one factor structure, composed of six factors (axial/gait bradykinesia, right bradykinesia, left bradykinesia, rigidity, rest tremor, and action/postural tremor), has been shown to be stable across on and off states (Stebbins & Goetz, 1998; Stebbins et al., 1999). Additional validity studies have been conducted to assess the ability of the UPDRS to detect changes in function in either untreated or treated states. In general, these studies have demonstrated that the UPDRS is sensitive to changes in clinical status.

Ambiguities of UPDRS

Despite the marked strengths and wide usage of the UPDRS, a number of limitations nonetheless exist. First, as a composite scale, the UPDRS is uneven in the type of information it gathers. For example, Part I is conceptually different from Part II and Part III, and as a screening assessment for the presence of depression, dementia or psychosis, it cannot be used as an adequate severity measure of any of these behaviors (Goetz, 2003).
Part IV is constructed differently than the rest of the UPDRS with a mixture of 5-point categories and dichotomous (yes/no) ratings that are difficult to analyze together.


Some items of the Motor Section have relatively poor inter-rater reliability, including Speech, Facial expression, Posture, Body bradykinesia, and all items of action/postural tremor and rigidity (Martinez-Martin et al., 1994). A specific example of a key testing problem is the assessment of postural stability in Part III. Because the response of the patient and the assigned rating depend directly on the force of the postural threat, standardized instructions and application of the test are essential for consistent ratings. These instructions are not part of the UPDRS.

Additionally, there is some redundancy of items in both the ADL and Motor Section. While duplication of material enhances the internal consistency of the scale, some critics consider such enhancement a spurious inflation (Martinez-Martin et al., 1994). Redundancy also increases the time required to administer the scale. Efforts to reduce redundancy have led to the Short Parkinson's Evaluation Scale (SPES), based directly on the UPDRS, but with fewer items and rating categories reduced to 0-3 (Goetz, 2003). The SPES is a disease-specific scale, omitting the UPDRS items that are considered redundant or of minor clinical significance. It contains four parts: mental state (3 items), ADL (8 items), motor examination (8 items), and complications of therapy (5 items). Furthermore, the Hoehn and Yahr (HY) scale and a scoring of motor fluctuations are included. The SPES adopts a four-point ordinal scale for each item (Martignoni, Franchignoni, Pasetti, Ferriero, & Picco, 2003).

The allocation of items to specific parts of the UPDRS is not altogether consistent, leading to potential ambiguities of interpretation. Part II, titled Activities of Daily Living, includes a mixture of items that are directly related to daily activities (e.g. dressing, eating) and items that examine patient perceptions of primary disease manifestations (e.g. tremor, salivation).
Items that overlap these two categories include the gait items that assess primary Parkinsonian features (freezing, falls), and impact on walking as an activity of daily living. The UPDRS Part II is culturally biased, and the anchoring descriptions for some item ratings are not applicable to all ethnic environments. For example, Dressing (Item 10) describes difficulty with buttons, even though many traditional cultures do not use them; Cutting food/handling utensils (Item 9) presumes that food is regularly cut for eating and that utensils are used, although some cultures serve food in bite-size portions and some do not use eating utensils. Although the scale was considered applicable to most international urban settings, the UPDRS may be limited by ambiguities when applied in epidemiological research efforts that involve field work in rural and geographically isolated cultures (Goetz, 2003).

Comorbidities and the UPDRS

PD is more prevalent in subjects over 50 years of age; therefore the co-existence of other diseases like diabetes, stroke, and arthritis can confound the evaluation of PD-related impairment and disability. Furthermore, common co-existent disorders like depression can potentially affect the speed of a patient's movement, alter motivation, and enhance perceptions of disability even when PD itself is stable (Goetz, 2003).

Important Elements Not Covered

Several key elements of PD are not covered by the UPDRS. When the scale was formulated in the mid-1980s, the developers were well aware of this limitation, but they chose to omit questions on some Parkinsonian impairments, mainly to create a scale that was reasonably simple and short. Several areas of concern exist (Goetz, 2003). Items not covered by the Unified Parkinson's Disease Rating Scale include: anhedonia, bradyphrenia, anxiety, hypersexuality, sleep disorders (insomnia, excessive daytime sleepiness), fatigue, dysautonomia (urinary dysfunction, constipation, impotence, sweating), dysregulation, and health-related quality of life.

Motor Section of UPDRS and Its Dimensionality

As mentioned earlier, the UPDRS has several parts: Part I (Mentation, Behavior and Mood); Part II (Activities of Daily Living (ADL)); Part III (Motor Section); Part IV (Complications of Therapy). This study focuses on the Motor Section of the UPDRS (MS UPDRS), which consists of the following 27 items:


1. Speech
2. Facial expression
3-7. Tremor at rest (Face/lips/chin (FLC); Right upper extremity (RUE); Left upper extremity (LUE); Right lower extremity (RLE); Left lower extremity (LLE))
8-9. Action/postural tremor of hands (Right; Left)
10-14. Rigidity (Head/neck (H/N); RUE; LUE; RLE; LLE)
15-16. Finger taps (Right; Left)
17-18. Hand movements (Right; Left)
19-20. Rapid alternating movements of hands (Right; Left)
21-22. Leg agility (Right; Left)
23. Arising from chair
24. Posture
25. Gait
26. Postural stability
27. Body bradykinesia and hypokinesia

Each item of the Motor Section is scored in one of five response categories. The wording of the response categories is formulated differently for each item; however, the ordering of categories is invariant across the items. Categories (scores) are numbered from zero to four in increasing order. This means that the higher the category, the higher the value on the corresponding latent trait. In this context the term latent trait expresses the hidden quality of symptoms such as rigidity, bradykinesia, tremor, et cetera, which are, in principle, measurable for any person: the level of any latent trait equals zero when the corresponding symptom is absent.

Within the MS UPDRS, the main motor symptoms of PD (tremor, rigidity and bradykinesia) and axial symptoms, such as speech, posture, postural stability and gait, define symptom groups that are evaluated according to their respective severity. These symptom groups are typically derived by using (statistical) scaling techniques. Previous research assessing the dimensionality of the MS UPDRS (Cubo et al., 2000; Martinez-Martin et al., 1994; Stebbins & Goetz, 1998; Stebbins et al., 1999) found between three and six factors that accounted for 59% to 78% of the total scale variance. However, all these studies used exploratory factor analysis (EFA), a scaling procedure which is explorative and relies on strong assumptions concerning the distribution of single variables, the number of observations, or the level of statistical measurement (Dunteman, 1989; Eliason, 1993). As will be discussed later, however, given the statistical properties of the indicators in the MS UPDRS, neither EFA nor some of the confirmatory factor analysis (CFA) estimators are the most appropriate scaling techniques, because the assumptions of the underlying statistical model may easily be violated.

Instead of EFA, we used methods conforming to nonparametric item response theory (NIRT) and structural equation modeling (SEM). NIRT represents a family of statistical measurement models based on a minimal set of assumptions necessary to obtain useful measurements with the aim of ordering items or persons with respect to their latent trait value (Sijtsma & Molenaar, 2002). NIRT does not parametrically define the function describing the relation between the probability of a response in an item response category and the value on the latent trait. Since NIRT models are designed for ordinal measurement, they are well suited for the purposes of the MS UPDRS. Compared with NIRT, SEM additionally allows the structure of the symptoms underlying the Motor Section to be evaluated, so that conclusions about the co-occurrence of the symptoms can be drawn.
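The item layout and 0-4 scoring described above can be illustrated with a minimal Python sketch. The grouping in `ITEM_GROUPS` and the function name `group_sums` are hypothetical and follow only the item labels listed above. They are not the factor structure estimated in this study, and simple sums treat the ordinal categories as numbers, which is a descriptive convenience rather than a recommended scaling method.

```python
# Hypothetical grouping of the 27 Motor Section items (1-based item numbers).
# This grouping is an illustrative assumption based on the item labels only.
ITEM_GROUPS = {
    "speech_and_facial_expression": [1, 2],
    "rest_tremor": [3, 4, 5, 6, 7],
    "action_postural_tremor": [8, 9],
    "rigidity": [10, 11, 12, 13, 14],
    "limb_bradykinesia": list(range(15, 23)),  # items 15-22
    "axial_items": [23, 24, 25, 26, 27],
}


def group_sums(ratings):
    """Return the sum of item ratings (each 0-4) within every item group."""
    if len(ratings) != 27:
        raise ValueError("expected 27 item ratings")
    if any(not isinstance(r, int) or not 0 <= r <= 4 for r in ratings):
        raise ValueError("each rating must be an integer from 0 to 4")
    return {
        name: sum(ratings[i - 1] for i in items)
        for name, items in ITEM_GROUPS.items()
    }


# Example: rest tremor of the right upper extremity (item 4 rated 3) and
# mild head/neck rigidity (item 10 rated 1); all other items rated 0.
example = [0] * 27
example[4 - 1] = 3
example[10 - 1] = 1
sums = group_sums(example)
```

In this sketch a score of zero in a group corresponds to the absence of the symptom, in line with the description of the latent traits above.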


RESEARCH QUESTION

Because previous studies assessed the dimensionality of the Motor Section of the Unified Parkinson's Disease Rating Scale (MS UPDRS) using inappropriate statistical methods, their results concerning the number of concepts underlying the MS UPDRS cannot be considered trustworthy. Further, there is a need to determine the diagnostic quality of motor tests (in the sense of validity and reliability) employed in the clinical practice of diagnosing Parkinson's disease. In addition, the investigation of the relationships among the motor symptoms of Parkinson's disease is essential. Therefore the following scientific question is addressed: What kind of theoretical concepts and relationships among these concepts underlie clinical motor tests diagnosing Parkinson's disease?

HYPOTHESES

H1: Because the motor impairment in Parkinson's syndrome is a complex system of difficulties, it is assumed that the Motor Section of the Unified Parkinson's Disease Rating Scale will be multidimensional.
H2: It is assumed that the generic reliabilities of all dimensions of the Motor Section of the Unified Parkinson's Disease Rating Scale will be lower than the standard requirement for this type of motor test, i.e. lower than 0.9.
H3: It is assumed that the items' factor loadings do not differ between patients in the on and off states.
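To make the 0.9 threshold in H2 concrete, the following Python sketch computes one common reliability coefficient for a single dimension from standardized factor loadings (often called composite reliability). This is an assumption for illustration: the generic reliability referred to in H2 may be defined differently, and the function name and loadings below are hypothetical values, not results of this study.

```python
def composite_reliability(loadings):
    """Composite reliability of one factor from standardized loadings.

    Assumes unit factor variance and uncorrelated unique terms, so the
    unique variance of each item equals 1 - loading**2.
    """
    if not loadings:
        raise ValueError("at least one loading is required")
    if any(abs(l) > 1 for l in loadings):
        raise ValueError("standardized loadings must lie in [-1, 1]")
    true_part = sum(loadings) ** 2
    error_part = sum(1 - l * l for l in loadings)
    return true_part / (true_part + error_part)


# Hypothetical loadings for a four-item dimension.
rho = composite_reliability([0.8, 0.7, 0.6, 0.9])
meets_requirement = rho >= 0.9  # False here: rho is roughly 0.84
```

Under these assumptions, even fairly high loadings on a short dimension can leave the coefficient below 0.9, which is the situation H2 anticipates.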


METHODS

Structural Equation Modeling

Introduction

Structural equation modeling (SEM) is a statistical methodology that takes a confirmatory (i.e. hypothesis-testing) approach to the analysis of a structural theory bearing on some phenomenon (Byrne, 2001). Kaplan (2000) defines SEM as a class of methodologies that seeks to represent hypotheses about the means, variances, and covariances of observed data in terms of a smaller number of structural parameters defined by a hypothesized underlying model. SEM is a parametric statistical methodology and its goal is to draw inferences to a large, but (usually) finite, population based on estimates from a sample obtained from that population. SEM is widely used by biologists, economists, educational researchers, marketing researchers, medical researchers and a variety of social and behavioral scientists. It provides researchers with a comprehensive method for the quantification and testing of theories. Another major characteristic of SEM is that it explicitly takes into account the measurement error that is ubiquitous in most disciplines and that it can deal with latent variables.

The term structural equation modeling conveys two important aspects (Byrne, 2001):
a) the processes under study are represented by a series of structural (i.e. regression) equations;
b) these structural relations can be visualized graphically to enable a clearer conceptualization of the theory under study.

The hypothesized model can be tested statistically in a simultaneous analysis of the entire system of equations to determine the extent to which it is consistent with the data. If the so-called goodness of fit is adequate, the plausibility of postulated relations among variables is enhanced; if it is inadequate, the tenability of such relations is rejected (Byrne, 2001).

Boomsma (2004) enumerates what SEM can and cannot provide:
1) Allow multiple indicators of the same concept
2) Estimate the nature of measurement error in observed variables
3) Estimate relationships between concepts corrected for measurement error
4) Allow for correlated disturbances or measurement errors
5) Test the ability of a hypothesized model to account for the variances and covariances among the observed variables
6) Compare the fit of different hypothesized models to the data
7) Cannot prove causation
8) Cannot establish that a model is true

Byrne (2001) formulated several aspects of SEM that set it apart from the older generation of multivariate procedures. First, it takes a confirmatory rather than an exploratory approach to the data analysis. Furthermore, by demanding that the pattern of intervariable relations be specified a priori, SEM lends itself well to the analysis of data for inferential purposes. By contrast, most other multivariate procedures are essentially descriptive by nature (e.g. exploratory factor analysis), so that testing of hypotheses is difficult, if not impossible. Second, although traditional multivariate procedures are incapable of either assessing or correcting for measurement error, SEM provides explicit estimates of these error variance parameters. Indeed, alternative methods (e.g. the General Linear Model) assume that errors in the explanatory (independent) variables vanish. Thus, applying those methods when there is error in the explanatory variables is tantamount to ignoring error, which may lead to serious inaccuracies, especially when the errors are sizeable. Third, although data analyses using the former methods are based on observed measurements only, those using SEM procedures can incorporate both unobserved (i.e. latent) and observed variables. Finally, there are no widely and easily applied alternative methods for modeling multivariate relations, or for estimating point and/or interval indirect effects.

From the historical point of view, SEM represents the hybrid of two separate statistical traditions.
The first tradition is factor analysis, developed in the disciplines of psychology and psychometrics. The origins of factor analysis can be traced to the work of Galton and Pearson on the problem of inheritance of genetic traits. It is the work of Spearman (1904), however, on the underlying structure of mental abilities that can be credited with the development of the common factor model. Spearman's theoretical position was that the intercorrelations among tests of mental ability could be accounted for by a general ability factor common to all of the tests and specific ability factors associated with each of the separate tests. In the 1930s attention shifted to the work of Thurstone and his colleagues at the University of Chicago. According to Kaplan (2000), Thurstone argued that there did not exist one underlying general factor of ability accompanied by specific ability factors as postulated by Spearman, but rather that there existed major group factors referred to as primary mental abilities (Thurstone, 1935). By the 1950s and 1960s factor analysis had gained tremendous popularity, owing much to the development and refinement of statistical computing capacity. Indeed, Mulaik (1972) characterized this era as a time of agnostic and blind factor analysis. However, during this era, developments in statistical factor analysis were also occurring. Specifically, work by Jöreskog (1967) and Lawley (1940) led to the development of a maximum likelihood based approach to factor analysis. A generalized least squares approach was developed later by Jöreskog and Goldberger (1972). Developments by researchers like Anderson and Rubin (1956) led to the methodology of confirmatory factor analysis that allowed for testing hypotheses regarding the number of factors and the pattern of loadings.

The second tradition is simultaneous equation modeling, developed mainly in econometrics but having an early history in the field of genetics. The genetic origin of SEM had its beginnings with the biometric work of Sewall Wright (1918; 1921). Wright's major contribution was in showing how the correlations among variables could be related to the parameters of a model as represented by a path diagram, a pictorial device that Wright was credited with inventing.
A second line of development occurred in the field of econometrics. The form of econometric modeling of relevance to SEM should be credited to the work of Haavelmo (1943), who was interested in modeling the interdependence among economic variables. This approach is known as simultaneous equation modeling.


The combination of these two types of methodologies into a coherent analytic framework was based on the work of Jöreskog (1970), Keesling (1972) and Wiley (1973), which, among others, led to the development of the LISREL model used in this study.

Types of SEM Models

Structural equation modeling (SEM), sometimes also labeled as causal modeling, latent variable modeling, covariance structure analysis, LISREL models, etc. (Boomsma, 2004), can be considered an umbrella term for other more specific statistical methods including simultaneous equations (path analysis), multivariate regression, confirmatory factor analysis (CFA), etc. The number of members of the SEM family, however, may vary from study to study. In accordance with Kelloway (1998), Raykov and Marcoulides (2000) as well as with Kaplan (2000), the following classification is recommended:
Observed variable path analysis (or simply path analysis)
Factor analysis (in SEM terminology, the measurement model)
General structural equation models

Path Analysis Model

These models are usually conceived only in terms of observed variables. For this reason, some researchers do not consider path analysis models to be typical SEM models. Nonetheless, path analysis is an important part of the historical development of SEM and uses the same underlying idea of model fitting and testing, and therefore should be included in the family of SEM. An example of a path analysis model is presented in Figure 2.

Path analysis was derived to partition direct and indirect relationships among variables. It deals with dependency relationships among variables and uses multiple regression as a method for estimating model parameters. Path models are presumed to represent causal hypotheses. However, a significant path model does not imply causality (rather one can use the model to test for causality using experimental data). Let $p$ be the number of response variables and $q$ the number of explanatory variables. The system of structural equations representing the model in Figure 2 can be written as

$$y = \alpha + By + \Gamma x + \zeta,$$

where $y$ is a $p \times 1$ vector of observed response variables, $x$ is a $q \times 1$ vector of observed explanatory variables, $\alpha$ is a $p \times 1$ vector of structural intercepts, $B$ is a $p \times p$ coefficient matrix that relates response variables to each other, $\Gamma$ is a $p \times q$ coefficient matrix that relates response to explanatory variables, and $\zeta$ is a $p \times 1$ vector of disturbance terms, where $\operatorname{cov}(\zeta) = \Psi$ is the $p \times p$ covariance matrix of the disturbance terms. Finally, let $\operatorname{cov}(x) = \Phi$ be the $q \times q$ covariance matrix of the explanatory variables.
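As a rough illustration of how this system determines the response variables, the sketch below solves the structural equation, written in the usual notation as y = alpha + B y + Gamma x + zeta, for y. Rearranging gives (I - B) y = alpha + Gamma x + zeta, which has a unique solution whenever I - B is invertible. The function names and all numerical values are hypothetical, and only the Python standard library is used.

```python
def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [[float(a) for a in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("I - B is singular; no unique reduced form")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def reduced_form(alpha, B, Gamma, x, zeta):
    """Return y satisfying y = alpha + B y + Gamma x + zeta."""
    p = len(alpha)
    rhs = [a + g + z for a, g, z in zip(alpha, mat_vec(Gamma, x), zeta)]
    i_minus_b = [[(1.0 if i == j else 0.0) - B[i][j] for j in range(p)]
                 for i in range(p)]
    return solve(i_minus_b, rhs)


# Hypothetical example with p = 2 responses and q = 2 explanatory variables:
# y1 depends on x1 only; y2 depends on y1 and x2.
alpha = [1.0, 0.0]
B = [[0.0, 0.0], [0.5, 0.0]]
Gamma = [[1.0, 0.0], [0.0, 2.0]]
y = reduced_form(alpha, B, Gamma, x=[1.0, 1.0], zeta=[0.0, 0.0])
```

With the disturbances set to zero, the first response equals 1 + 1 = 2 and the second equals 0.5 times 2 plus 2, that is 3, so the structural equation holds for the returned vector.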

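[Figure 2: path diagram with observed explanatory variables X1, X2, X3, response variables Y1, Y2, Y3, and disturbance terms e1, e2, e3.]

Fig. 2. Example of path analytic model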

Two general types of path analytic models can be distinguished: a) recursive and b) nonrecursive. A characteristic feature of recursive systems is that the nonzero elements of the coefficient matrix $B$, which relates the response variables to each other, are contained in its lower triangular part. In addition, for recursive models, the disturbance covariance matrix $\Psi$ is a diagonal matrix whose elements are the variances of the disturbances. In nonrecursive models a feedback loop between two response variables is specified. In other words, $B$ is not lower triangular. Furthermore, it is typically the case that a covariance term is specified between the disturbances of the response variables in the feedback loop. This means that $\Psi$ is specified to be a symmetric matrix with nonzero off-diagonal elements. Nonrecursive models are referred to in the field of econometrics as simultaneous equation models and have been widely used in economics to study problems such as supply and demand for certain commodities. The presence of feedback loops also implies an underlying dynamic specification of the structural model, insofar as some period of time is required for the feedback to take place (Kaplan, 2000). Assumptions of path analysis models are as follows (Path analysis and structured linear equations, 2004):
a) Relationships are linear and additive. In other words, path analysis excludes curvilinear and multiplicative models.
b) Error terms are supposed to be uncorrelated with one another (except in nonrecursive models).
c) Recursive models contain only one-way causal flows.
d) Observed variables are measured without error.
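The distinction between recursive and nonrecursive systems can be illustrated with a minimal sketch. It assumes that the response variables are already listed in their presumed causal order, so that a recursive model has zeros on and above the diagonal of the coefficient matrix. The function name and the coefficient values are hypothetical and are not taken from this study.

```python
def is_recursive(B, tol=1e-12):
    """Return True if every entry on or above the diagonal of B is zero.

    B is a square list of rows; B[i][j] is the effect of response
    variable j on response variable i. The check assumes the variables
    are listed in their presumed causal order.
    """
    p = len(B)
    return all(abs(B[i][j]) <= tol for i in range(p) for j in range(i, p))


# Hypothetical coefficients, chosen only for illustration.
B_recursive = [
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],  # y2 depends on y1
    [0.2, 0.3, 0.0],  # y3 depends on y1 and y2
]
B_feedback = [
    [0.0, 0.4],  # y1 depends on y2
    [0.5, 0.0],  # y2 depends on y1, which closes a feedback loop
]

print(is_recursive(B_recursive))
print(is_recursive(B_feedback))
```

A matrix that fails this check may still describe a recursive model if its rows and columns can be reordered into lower triangular form; the sketch does not attempt such a reordering.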

Confirmatory Factor Analysis (CFA) Model

The model used to relate observed measures to factors is the linear factor analysis model, which can be written as

$x = \Lambda_x \xi + \delta$,

where $x$ is a $q \times 1$ vector of observed responses on q questions that are assumed to measure the respective latent variables, $\Lambda_x$ is a $q \times k$ matrix of factor regression weights (usually called loadings), $\xi$ is a $k \times 1$ vector of k latent variables, and $\delta$ is a $q \times 1$ vector of unique variables that contain both measurement error and specific error, to be described below. It is convenient to invoke the assumptions that

$E(\xi) = 0$, $E(\delta) = 0$,

and

$\mathrm{cov}(\xi, \delta) = 0$.

Under these assumptions, the covariance matrix of the observed data can be written in the form of the fundamental factor analytic equation,

$\Sigma = \mathrm{cov}(x) = \Lambda_x E(\xi\xi')\Lambda_x' + E(\delta\delta') = \Lambda_x \Phi \Lambda_x' + \Theta_\delta$,

where $\Sigma$ is a $q \times q$ population covariance matrix, $\Phi$ is a $k \times k$ matrix of factor variances and covariances, and $\Theta_\delta$ is a $q \times q$ diagonal matrix of unique variances. Because the CFA model focuses solely on the link between factors and their measured variables, within the framework of SEM it represents what has been termed a measurement model (Byrne, 2001).

Structural Equation Models and LISREL Definition

There are several general structural equation models, for example Covariance Structure Analysis (COSAN) developed by McDonald (1978; 1980), the Reticular Action Model (RAM) credited to McArdle (1980) and McArdle and McDonald (1984), Linear Equations (LINEQS) developed by Bentler and Weeks (1980), and Linear Structural Relationships (LISREL), first published by Jöreskog (1973) but additionally credited to Wiley (1973). All of them are very general and, following from this generality, McDonald (1991) showed that RAM is a special case of COSAN and that COSAN can be considered a special case of RAM as well. Finally, COSAN is a special case of LISREL. The LISREL approach, which is used in this study, defines the structural equation model as

$\eta = B\eta + \Gamma\xi + \zeta$,

where $\eta$ is an $m \times 1$ vector of response latent variables, $\xi$ is a $k \times 1$ vector of explanatory latent variables, $B$ is an $m \times m$ matrix of regression coefficients relating the latent response variables to each other, $\Gamma$ is an $m \times k$ matrix of regression coefficients relating response variables to explanatory variables, and $\zeta$ is an $m \times 1$ vector of disturbance terms. The latent variables are linked to observable variables via measurement equations for the explanatory and response variables. These equations are defined as

$x = \Lambda_x \xi + \delta$ and $y = \Lambda_y \eta + \varepsilon$,

respectively, where $\Lambda_x$ and $\Lambda_y$ are $q \times k$ and $p \times m$ matrices of factor loadings, and $\delta$ and $\varepsilon$ are $q \times 1$ and $p \times 1$ vectors of uniquenesses.
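The fundamental factor analytic equation of the CFA model above, in which the implied covariance matrix equals the loadings times the factor covariance matrix times the transposed loadings plus the unique variances, can be sketched numerically. The example assumes a single factor with three indicators; the loadings and unique variances are hypothetical values chosen so that each indicator has unit variance, and the helper names are illustrative only.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def implied_covariance(Lambda, Phi, Theta):
    """Compute Sigma = Lambda * Phi * Lambda' + Theta."""
    common = matmul(matmul(Lambda, Phi), transpose(Lambda))
    q = len(Lambda)
    return [[common[i][j] + Theta[i][j] for j in range(q)] for i in range(q)]


# Hypothetical one-factor model with three indicators.
Lambda = [[0.8], [0.7], [0.6]]          # q x k loadings (q = 3, k = 1)
Phi = [[1.0]]                           # k x k factor variance
Theta = [[0.36, 0.0, 0.0],              # q x q diagonal unique variances
         [0.0, 0.51, 0.0],
         [0.0, 0.0, 0.64]]

Sigma = implied_covariance(Lambda, Phi, Theta)
for row in Sigma:
    print([round(value, 2) for value in row])
```

In this sketch the off-diagonal elements of the implied matrix are simply products of loadings, which is why a one-factor model reproduces all covariances among its indicators through the loadings alone.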

Statistical Assumptions of SEM

This chapter considers the major assumptions associated with structural equation modeling. These include multivariate normality, linearity of relationships, sufficiently large sample size, and absence of multicollinearity.

a) Multivariate normality
A substantive assumption underlying the standard use of SEM is that the observations of k random variables are drawn from a multivariate normal population. Each indicator should be normally distributed for each value of every other indicator. The use of ordinal or dichotomous measurement is an example of a violation of multivariate normality. This assumption is particularly important for Maximum Likelihood (ML) estimation, since this estimator is derived directly from the expression for the multivariate normal distribution. Generally, the higher the degree of non-normality, the higher the bias of this estimator (Hoogland, 1999).

The effects of non-normality on parameter estimates, standard errors, and tests of model fit are well known. In general, simulation studies (e.g. Kline, 1998) suggest that under conditions of severe non-normality of data, SEM parameter estimates are still fairly accurate, but the corresponding significance tests reject too often, because standard errors appear to be underestimated relative to the empirical standard deviation of the estimates. Lack of multivariate normality usually inflates the chi-square statistic, so that the overall chi-square fit statistic for the model as a whole is substantially overestimated, and this overestimation appears to be related to the number of degrees of freedom of the model (Boomsma, 1983). The Satorra-Bentler adjusted chi-square is used for inference about exact structural fit when there is reason to suspect a lack of multivariate normality. There are also estimation methods other than ML, such as Weighted Least Squares (WLS) or Diagonally Weighted Least Squares (DWLS), which do not require the assumption of multivariate normality (see Bollen, 1989).

b) Linearity
SEM assumes linear relationships between indicators and latent variables, and between latent variables. However, as with regression, it is possible to add exponential, logarithmic, or other nonlinear transformations of the original variable to the model. These transforms are added alone to model power effects, or along with the original variable to model a quadratic effect, with an unanalyzed correlation (curved double-headed arrow) connecting them in the diagrammatic model. It is also possible (although not without difficulties) to model quadratic and nonlinear effects of latent variables (Kline, 1998).

c) Sufficient sample size
Sample size should not be small, since SEM rests on asymptotic theory, which implies that the behaviour of parameter estimates and test statistics is known only for large sample sizes. One rule of thumb is to have at least 15 cases per measured variable or indicator (Loehlin, 1992). Another rule of thumb found in the literature is that the sample size should be at least 50 more than 8 times the number of variables in the model (Kaplan, 2000). Bentler and Chou (1987) recommend at least 5 cases per parameter estimate (including error terms as well as path coefficients). The researcher should go beyond these minimum sample size recommendations, particularly when data are non-normal (skewed, kurtotic) or incomplete.
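The three rules of thumb quoted above can be compared side by side with a small sketch. The function name and the model dimensions in the example are hypothetical; they do not describe the models fitted in this study.

```python
def minimum_sample_sizes(n_indicators, n_variables, n_free_parameters):
    """Return the minimum N suggested by each rule of thumb quoted in the text."""
    return {
        "15 cases per indicator (Loehlin, 1992)": 15 * n_indicators,
        "50 + 8 x number of variables (Kaplan, 2000)": 50 + 8 * n_variables,
        "5 cases per parameter (Bentler & Chou, 1987)": 5 * n_free_parameters,
    }


# Hypothetical model: 10 indicators, 10 variables, 25 free parameters.
rules = minimum_sample_sizes(n_indicators=10, n_variables=10, n_free_parameters=25)
for rule, minimum in rules.items():
    print(f"{rule}: N >= {minimum}")
print("Most demanding rule requires N >=", max(rules.values()))
```

Taking the largest of the three values is only one conservative way to combine the rules; as noted above, non-normal or incomplete data call for samples beyond any of these minima.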

d) Absence of multicollinearity
Complete multicollinearity is assumed to be absent, but correlation among the independent variables may be modeled explicitly in SEM (Structural equation modeling, 2005). Complete multicollinearity will result in singular covariance matrices, on which certain calculations (e.g. matrix inversion) cannot be performed because division by zero would occur. Also, when the correlation coefficient (r) is high, say r ≥ .85, multicollinearity is considered high and empirical underidentification may be a problem (Rindskopf, 1984). Even when a solution is possible, high multicollinearity decreases the reliability of SEM estimates. High multicollinearity might be suggested by values of standardized regression weights greater than +1 or less than -1 (Jöreskog, 1999). Likewise, when there are two nearly identical latent variables, and these two are used as causes of a third latent variable, the difficulty in computing separate regression weights may well be reflected in much larger standard errors for these paths than for other paths in the model (Structural equation modeling, 2005). The same difficulty may be reflected in covariances of the parameter estimates for these paths that are much higher than the covariances of parameter estimates for other paths in the model. Another effect of multicollinearity may be negative error variance estimates. There are several strategies for dealing with covariance matrices that are not positive definite (Structural equation modeling, 2005). One is to allow the LISREL program (Jöreskog & Sörbom, 2004a) to add a ridge constant automatically, which is a weight added to the diagonal of the covariance matrix (the ridge). This strategy can result in markedly different chi-square fit indexes, however. Other strategies include removing one or more highly correlated items to reduce multicollinearity; using different starting values; using different reference items for the metrics of latent variables; or replacing tetrachoric correlations with Pearson correlations in the input correlation matrix.
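A simple screening step that follows from the r ≥ .85 guideline above is to list the variable pairs whose correlation reaches that cutoff. The sketch below assumes a correlation matrix stored as a list of rows; the function name, variable names, and correlations are hypothetical.

```python
def high_collinearity_pairs(R, names, cutoff=0.85):
    """List variable pairs whose absolute correlation is at least `cutoff`."""
    pairs = []
    for i in range(len(R)):
        for j in range(i):  # lower triangle only, so each pair appears once
            if abs(R[i][j]) >= cutoff:
                pairs.append((names[j], names[i], R[i][j]))
    return pairs


# Hypothetical correlation matrix for three items.
names = ["item_a", "item_b", "item_c"]
R = [
    [1.00, 0.91, 0.40],
    [0.91, 1.00, 0.35],
    [0.40, 0.35, 1.00],
]

for first, second, r in high_collinearity_pairs(R, names):
    print(f"{first} and {second}: r = {r:.2f}")
```

Pairs flagged in this way are only candidates for the strategies listed above, such as removing one of the highly correlated items.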

Types of Parameters Used in SEM Models

There are three types of model parameters that are important in conducting SEM analyses. Parameters that are to be estimated by the program are commonly referred to as free parameters. Parameters whose values are set to a given constant are called fixed parameters, since they do not change value when the model is fitted to the observed data. Fixing parameters (usually to zero) is the way in which a model is specified. The third type comprises constrained parameters (also referred to as restricted or restrained). Models that include constrained parameters have parameters that are postulated to be equal to one another, but whose value is not specified in advance as is that of fixed parameters. Nonlinear constraints can also be specified. Constrained parameters are typically included in a model if their restriction is derived from existing theory or represents a substantively interesting hypothesis tested in a proposed model (Kaplan, 2000).

Methods for Parameter Estimation

Generally, the aim of SEM is to reach as close a fit as possible between the estimated covariance matrix and the observed covariance matrix (Urbánek, 2000). Thus, the substantive hypothesis of SEM can be expressed as

$S = \Sigma(\theta)$,

where $S$ is the observed covariance matrix and $\Sigma(\theta)$ is the population covariance matrix expressed as a function of the model parameters $\theta$. This equation, however, holds exactly only if the model is just-identified. For overidentified models (which are the concern of SEM), the solution that minimizes the so-called loss function or discrepancy function

$E = S - \Sigma(\theta)$

is desired. There are several estimation methods and types of discrepancy functions that are used in SEM programs such as AMOS (Arbuckle, 2003), COSAN (McDonald & Fraser, 1990), EQS (Bentler, 1995a), etc. The application of each estimation method is based on the minimization of a corresponding discrepancy function. The current version of LISREL, the most popular of these programs (Jöreskog & Sörbom, 2005), provides the following estimation methods:

a) Unweighted Least Squares (ULS)
This is the simplest of all commonly used discrepancy functions. It can be expressed as

$F_{ULS} = \frac{1}{2}\,\mathrm{tr}\left[(S - \Sigma(\theta))^2\right]$.

b) Generalized Least Squares (GLS)
This function is given by the following expression:

$F_{GLS} = \frac{1}{2}\,\mathrm{tr}\left[(I - S^{-1}\Sigma(\theta))^2\right]$,

where $I$ is the identity matrix.

c) Maximum Likelihood (ML)
This function is well known as a part of the method of confirmatory factor analysis (Urbánek, 2000):

$F_{ML} = \ln|\Sigma(\theta)| + \mathrm{tr}\left(S\Sigma(\theta)^{-1}\right) - \ln|S| - (p + q)$,

where $p + q$ is the total number of observed variables.

d) Weighted Least Squares (WLS)
In both continuous and categorical cases, the approach to estimation under non-normality utilizes a class of discrepancy functions referred to generally as Weighted Least Squares (WLS). The WLS discrepancy function can be written as

$F_{WLS} = (s - \sigma)' W^{-1} (s - \sigma)$,

where $s = \mathrm{vech}(S)$ and $\sigma = \mathrm{vech}[\Sigma(\theta)]$ are vectorized elements of $S$ and $\Sigma(\theta)$, respectively¹. The matrix $W$ is a consistent estimate of the asymptotic covariance matrix of $s$ and must be positive definite.

e) Diagonally Weighted Least Squares (DWLS)
Let $w_{gh}$ be an estimate of the asymptotic variance of $s_{gh}$. These estimates may be used with a discrepancy function of the form

$F_{DWLS} = \sum_{g}\sum_{h \le g} (1/w_{gh})\,(s_{gh} - \sigma_{gh})^2$.
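The ULS, GLS, and ML discrepancy functions listed above can be evaluated directly for a small example. The following sketch is restricted to 2 x 2 matrices so that the determinant and inverse can be written out by hand; the observed matrix S and the model-implied matrix (here an independence model with zero covariance) are hypothetical, and the function names are illustrative only. It evaluates the functions at fixed values and does not perform the minimization itself.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def inv2(A):
    """Inverse of a 2 x 2 matrix (assumes a nonzero determinant)."""
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]


def f_uls(S, Sigma):
    """F_ULS = 1/2 tr[(S - Sigma)^2]."""
    D = [[S[i][j] - Sigma[i][j] for j in range(2)] for i in range(2)]
    return 0.5 * trace(matmul(D, D))


def f_gls(S, Sigma):
    """F_GLS = 1/2 tr[(I - S^-1 Sigma)^2]."""
    M = matmul(inv2(S), Sigma)
    D = [[(1.0 if i == j else 0.0) - M[i][j] for j in range(2)] for i in range(2)]
    return 0.5 * trace(matmul(D, D))


def f_ml(S, Sigma):
    """F_ML = ln|Sigma| + tr(S Sigma^-1) - ln|S| - k, with k observed variables."""
    k = len(S)
    return math.log(det2(Sigma)) + trace(matmul(S, inv2(Sigma))) - math.log(det2(S)) - k


# Hypothetical observed and model-implied covariance matrices.
S = [[2.0, 0.6], [0.6, 1.0]]
Sigma_independence = [[2.0, 0.0], [0.0, 1.0]]

print("F_ULS:", round(f_uls(S, Sigma_independence), 4))
print("F_GLS:", round(f_gls(S, Sigma_independence), 4))
print("F_ML: ", round(f_ml(S, Sigma_independence), 4))
```

All three functions equal zero when the implied matrix reproduces S exactly and grow as the two matrices diverge, which is the property the estimation methods exploit.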

Recently, the opinion of leading authors on SEM has shifted toward using DWLS instead of WLS for ordinal or categorical data, since WLS requires a very large sample size and has often led to problems with parameter estimates (negative error variances, etc.). Which method of estimation should a researcher use? Some guidance is given by Jöreskog and Sörbom (2004b).

¹ The vech(·) operator takes the $k(k + 1)/2$ nonredundant elements of a symmetric $k \times k$ matrix and strings them into a vector of dimension $[k(k + 1)/2] \times 1$.


1. If the data are continuous and approximately follow a multivariate normal distribution, then the method of Maximum Likelihood is recommended.
2. If the data are continuous, do not approximately follow a multivariate normal distribution, and the sample size is not large, then the Robust Maximum Likelihood method is recommended. For larger sample sizes, the method of Weighted Least Squares is recommended. Both of these methods require an estimate of the asymptotic covariance matrix of the sample variances and covariances.
3. If the data are ordinal, categorical or mixed, then the Diagonally Weighted Least Squares (DWLS) method for polychoric correlation matrices is recommended. This method requires an estimate of the asymptotic covariance matrix of the sample correlations.
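The three recommendations above amount to a small decision rule, which can be written down as a sketch. The function name and argument names are hypothetical, and what counts as a "large" sample is left to the caller because the recommendations do not give a numerical threshold.

```python
def recommend_estimator(data_type, multivariate_normal=True, large_sample=False):
    """Return the estimation method suggested by the three recommendations.

    data_type: "continuous", "ordinal", "categorical", or "mixed".
    multivariate_normal and large_sample matter only for continuous data.
    """
    if data_type in ("ordinal", "categorical", "mixed"):
        return "DWLS (with polychoric correlations)"
    if data_type != "continuous":
        raise ValueError(f"unknown data type: {data_type!r}")
    if multivariate_normal:
        return "ML"
    return "WLS" if large_sample else "Robust ML"


print(recommend_estimator("continuous"))
print(recommend_estimator("continuous", multivariate_normal=False))
print(recommend_estimator("continuous", multivariate_normal=False, large_sample=True))
print(recommend_estimator("ordinal"))
```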

Note on Using Ordinal Variables in SEM


Observations on an ordinal variable represent responses to a set of ordered categories. It is only assumed that a person who selected a specific category has more of the characteristic than if he/she had chosen a lower category, but it is unknown how much more. Ordinal variables do not have origins or units of measurement. Means, variances, and covariances have no meaning (Jöreskog & Sörbom, 2004a). That is why other techniques are employed for using ordinal variables in structural equation modeling. For an ordinal variable $z$ it is assumed that there is an underlying continuous variable $z^*$, which represents the attitude underlying the ordered responses to $z$ and is assumed to range from $-\infty$ to $+\infty$. This continuous variable $z^*$ can be used in SEM (instead of the observed $z$), since it assigns a metric to the ordinal variable $z$. Polychoric correlations reflect the relationships among ordinal variables assuming the existence of $z^*$. The case for using polychoric correlations for ordinal data is based on the work of Jöreskog and Sörbom (1988). In this simulation study the Phi, Spearman rank, and Kendall tau-b correlations performed poorly, whereas the polychoric correlations with ordinal data produced robust parameter estimates and better fitting models. Furthermore, Ethington (1987) determined that the Pearson correlation coefficient underestimates the factor loadings of the ordinal variables and overestimates the chi-square values. One problem associated with the use of polychoric correlations is that polychoric correlation matrices are not guaranteed to be positive definite. This could be caused by sampling, outliers, or variable collinearity. One approach to correcting this problem is to smooth the matrix using a ridge constant (Wothke, 1992), which is implemented in the LISREL program. Another problem is that polychoric correlation matrices generally provide inflated chi-square values and underestimated standard errors of estimates due to larger variability (Schumacker & Beyerlein, 2000). The WLS or DWLS estimators are recommended for parameter estimation in models where ordinal data are used (compare Jöreskog and Sörbom (1993) and (2004b)). Numerous simulation studies have focused on the robustness properties of the WLS estimator. It was found that WLS produces biased estimates for sample sizes of less than 400 (Hoogland, 1999). Further, Muthén and Kaplan (1992) found that the WLS chi-square was markedly sensitive to sample size and that this sensitivity increased as the size of the model increased. In addition, standard errors produced by WLS were noticeably downward biased, becoming worse as the model size increased (Kaplan, 2000).

Identification Problem in SEM


The general problem of identification is whether unique estimates of the parameters of the full model can be determined from the elements of the covariance matrix of the observable variables. In this sense, there are three types of SEM models: underidentified, just-identified, and overidentified. There are several rules for the identification of structural models. Here, only the t rule is introduced; interested readers are referred to Bollen (1989). The t rule says that one cannot estimate more parameters than there are unique elements in the covariance matrix. This means that, given the $k \times k$ covariance matrix (where k is the number of observed variables), more than $k(k + 1)/2$ parameters cannot be computed. Computing exactly $k(k + 1)/2$ parameters results in a just-identified model, sometimes also referred to as a saturated model (Medsker, Williams, & Holahan, 1994). Such a model always provides a unique solution that is able to reproduce the covariance matrix perfectly. Saturated models cannot be tested statistically and are therefore of little interest in SEM. When the number of unknowns exceeds the number of equations, the model is said to be underidentified. This is a problem since the model parameters cannot be uniquely estimated; in fact, there are an infinite number of solutions. Finally, and most importantly, when the number of equations exceeds the number of unknowns, the model is referred to as overidentified. When a model is overidentified, the parameters can be determined from the covariance elements in more than one way, and the task in most applications of structural equation modeling techniques is to find the solution that provides the best fit to the data. Thus, apart from empirical identification problems, the t rule makes identification a matter of counting the estimated parameters (Bollen, 1989). Note, however, that the t rule is a necessary but not a sufficient condition for model identification.
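The counting involved in the t rule can be sketched as follows. The function name and the example values are hypothetical. Because the t rule is only a necessary condition, a model that passes the check is not thereby shown to be identified.

```python
def t_rule(k, t):
    """Apply the t rule for k observed variables and t free parameters.

    Returns the number of unique covariance elements, the degrees of
    freedom, and a label. Passing the rule is necessary, not sufficient,
    for identification.
    """
    unique_elements = k * (k + 1) // 2
    df = unique_elements - t
    if df < 0:
        label = "underidentified (t rule violated)"
    elif df == 0:
        label = "just-identified (saturated), if identified at all"
    else:
        label = "overidentified, if identified at all"
    return unique_elements, df, label


# Hypothetical example: 6 observed variables and 13 free parameters.
print(t_rule(k=6, t=13))
print(t_rule(k=6, t=21))
print(t_rule(k=6, t=25))
```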

Model Testing and Fit Evaluation


Structural equation models are used to test a theory about relationships between theoretical concepts. A major aspect of model-fit evaluation involves the substantive considerations of the model. Specifically, all models considered in research should be conceptualized according to the latest knowledge about the phenomenon under study (Raykov & Marcoulides, 2000). Whereas classical methodology is typically interested in rejecting null hypotheses, SEM is most concerned with finding a model that does not contradict the data. In other words, when using SEM methodology one is usually interested in not rejecting the null hypothesis. However, not rejecting a null hypothesis does not mean that it is true. Similarly, because model testing in SEM involves testing the null hypothesis that the model is capable of perfectly reproducing the analyzed matrix of observed variables, not rejecting a fitted model does not imply that it is the true model. A fitted model may fail to be rejected even though it is incorrectly specified, or merely because of sampling error. In addition, just because a model fits the data well does not mean that it is the only model that fits the data well. As noted by Raykov and Marcoulides (2000), there are usually a number of models that fit the data as well as the model under consideration or interpretation. Which one of these models is better and which one is wrong can only be decided on the basis of a sound body of knowledge about the studied phenomenon. One can also evaluate the validity of a proposed model by conducting replication studies (i.e. cross-validation). The value of a proposed model is greatly enhanced if the same model can be replicated in new samples from the same population (Raykov & Marcoulides, 2000).

Chi-square statistic
Model discrepancy is often expressed as the asymptotic chi-square test statistic. The value of the chi-square statistic should not be significant if there is a good model fit, since a significant chi-square indicates lack of satisfactory model fit. That is, the chi-square statistic is a "badness-of-fit" measure in that a finding of significance means the given model's covariance structure is significantly different from the observed covariance matrix. If the corresponding p value is less than .05, the researcher's model is rejected. There are three ways, listed below, in which the value of the chi-square test statistic may be misleading:
a) The more complex the model, the more the chi-square test statistic tends to indicate a good fit, which can mislead a researcher. In other words, the chi-square test statistic tests the difference between the researcher's model and a just-identified version of it, so the closer the researcher's model is to being just-identified, the more likely it is that good fit will be found. In a just-identified model, there will always be a perfect fit regardless of the quality of the model.
b) The larger the sample size, the more likely the rejection of the model and the more likely a Type I error (rejecting a true hypothesis). In very large samples, even tiny differences between the observed covariance matrix and the covariance matrix implied by the model may be found significant.
c) The chi-square test statistic is also very sensitive to violations of the assumption of multivariate normality. When this assumption is known to be violated, the researcher may prefer the Satorra-Bentler scaled chi-square (an adjustment that penalizes chi-square for the amount of kurtosis in the data) or the mean and variance adjusted chi-square.

For these reasons, many researchers who use SEM believe that with a reasonable sample size (say, larger than 200) and good approximate fit as indicated by other fit indexes (e.g. CFI, RMSEA), the significance of the chi-square test may be discounted, and that a significant chi-square is not by itself a reason to modify the model.

Alternative Fit Indices


In this section, the basic ideas behind several kinds of fit indexes are introduced. Fit indexes used in our research are described in more detail; the less important ones (from our point of view) are only listed. The development of this class of indices has been partly motivated by the known sensitivity of the likelihood ratio chi-square statistic to large sample sizes. The basic idea behind the first kind of indices is that the fit of the model is compared to the fit of some baseline model that usually specifies complete independence among the observed variables. Such a model of complete independence is the most restrictive model possible, and hence the chi-square of the baseline model will usually be quite large. The question is whether the model of interest is an improvement on the baseline model. The usual rule of thumb for these indices is that a value of 0.95 is indicative of good fit relative to the baseline model. The first indices of this kind were developed by Tucker and Lewis (1973) (TLI), Bentler and Bonett (1980) (NNFI, NFI) and Mulaik et al. (1989) (PNFI). Other variations of these have been proposed and discussed by Bollen (1986; 1989) (RFI, IFI) and Bentler (1990) (CFI). The quintessential example of a comparative fit index is the normed fit index (NFI). This index can be written as

$NFI = \frac{\chi_b^2 - \chi_t^2}{\chi_b^2}$,

where $\chi_b^2$ is the chi-square for the baseline model and $\chi_t^2$ is the chi-square for the model of interest. The baseline model will typically be associated with a very large chi-square value, since the null hypothesis tested by $\chi_b^2$ states that the covariances among the variables are zero. The NFI assumes a true null hypothesis (and therefore a central chi-square distribution of the test statistic). However, the null hypothesis is never exactly true, and the distribution of the test statistic can be better approximated by a noncentral chi-square distribution with some noncentrality parameter. Thus, the relative noncentrality index (RNI) was developed by McDonald and Marsh (1990). Values of the RNI can lie outside the range [0, 1]. The adjusted version of this index (with values truncated to the range [0, 1]) was introduced by Bentler (1990) and is referred to as the comparative fit index (CFI). It is defined as

$CFI = 1 - \frac{\chi_t^2 - df_t}{\chi_b^2 - df_b}$,

where $df_t$ and $df_b$ are the corresponding degrees of freedom. Another kind of fit index focuses on the matrix of residuals, i.e. on the difference between the observed and the fitted covariance matrix. The index of this kind that is preferred nowadays (see e.g. Hu and Bentler (1999)) is the standardized version of Jöreskog and Sörbom's (1981) root mean squared residual (SRMR, Bentler, 1995b).
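The NFI and CFI can be computed from the chi-square values and degrees of freedom of the baseline and target models, as in the sketch below. The CFI is written here in the truncated form commonly attributed to Bentler (1990), which uses maxima to keep the index within [0, 1]; this is an assumption about the exact variant, and all numerical values are hypothetical.

```python
def nfi(chi_baseline, chi_target):
    """Normed fit index: relative reduction in chi-square versus the baseline."""
    return (chi_baseline - chi_target) / chi_baseline


def cfi(chi_baseline, df_baseline, chi_target, df_target):
    """Comparative fit index, truncated so that it stays within [0, 1]."""
    numerator = max(chi_target - df_target, 0.0)
    denominator = max(chi_target - df_target, chi_baseline - df_baseline, 0.0)
    if denominator == 0.0:
        return 1.0  # neither model shows any noncentrality
    return 1.0 - numerator / denominator


# Hypothetical values: baseline chi-square 500 (df 45), target 60 (df 35).
print("NFI:", round(nfi(500.0, 60.0), 3))
print("CFI:", round(cfi(500.0, 45, 60.0, 35), 3))
```

When the target chi-square does not exceed its degrees of freedom, the numerator is set to zero and the truncated CFI equals 1, whereas the untruncated ratio would exceed 1.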


Another kind of index is sometimes referred to as a measure based on errors of approximation (Kaplan, 2000). The use of chi-square as a central $\chi^2$-statistic is based on the assumption that the model holds exactly in the population. However, this may be an unreasonable assumption in most empirical research (Kaplan, 2000). The consequence of this assumption is that models which hold approximately in the population will be rejected in large samples (Jöreskog & Sörbom, 1993). A detailed discussion of what is meant by "approximately" can be found in Kaplan (2000). Thus, Steiger (1990) defines a root mean square error of approximation (RMSEA) as

$RMSEA = \sqrt{F_0 / df}$, where $F_0$ is estimated by $(\chi^2 - df)/n$,

with point estimate

$\widehat{RMSEA} = \sqrt{\max\{\hat{F} - n^{-1}df,\ 0\} / df}$,

where $n = N - 1$. $F_0$ is the population discrepancy value that would have been obtained if the model had been fitted to the population covariance matrix, and $\hat{F}$ is the corresponding sample discrepancy value obtained when the model is fitted to the sample covariance matrix. In utilizing the RMSEA for assessing approximate fit, a formal hypothesis testing framework can also be employed. Browne and Mels (1990) defined a close fit as an RMSEA value less than or equal to 0.05. Thus, the formal null hypothesis to be tested is $H_0: RMSEA \le 0.05$. Practical suggestions for the RMSEA are the following (Browne & Cudeck, 1993; McDonald & Ho, 2002): values of RMSEA less than 0.05 indicate a close fit, values between 0.05 and 0.08 are indicative of fair fit, values between 0.08 and 0.10 are indicative of mediocre fit, and values higher than 0.10 indicate poor fit. Values of RMSEA less than 0.06 in combination with values of SRMR less than 0.09 have been


found to have the lowest combined Type I and Type II error rates and thus are preferable for model evaluation (Hu & Bentler, 1999). Developments in statistical theory based on the work of Akaike (1973; 1987) allow one to gauge the extent to which a model will cross-validate in a future sample. To begin, one must adopt the idea that the goal of statistics is the realization of appropriate prediction. Then, attention shifts from the estimation of parameters to the estimation of distributions of future observations. The question is how to estimate a distribution. Here, Akaike (1985) shows how the concept of entropy can be related to notions of statistical information. The estimated distribution of future observations is referred to as a predictive distribution. The adequacy of this predictive distribution can be measured, according to Akaike, as the deviation of the predictive distribution from some true distribution. This observation leads to a measure of the badness-of-fit of the model, referred to as Akaike's information criterion (AIC) and written as

$AIC(H_0) = -2 \max \ln L(H_0) + 2t_0$,

where, in the context of SEM, $t_0$ is the number of unknown parameters under the null hypothesis $H_0: S = \Sigma(\theta)$. When the goal is to use the AIC for model comparison, then $AIC(H_0)$ can be defined as

$AIC(H_0) = \chi^2 - 2(df)$.
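The RMSEA point estimate with its practical cutoffs, and the model-comparison form of the AIC, can be sketched as follows. The sketch assumes that the sample discrepancy value equals the chi-square divided by n = N - 1, which is how the chi-square statistic is usually obtained from the minimized discrepancy function; the function names, model labels, and fit values are hypothetical.

```python
import math


def rmsea(chi_square, df, sample_size):
    """RMSEA point estimate, assuming F-hat = chi-square / n with n = N - 1."""
    n = sample_size - 1
    f_hat = chi_square / n
    return math.sqrt(max(f_hat - df / n, 0.0) / df)


def describe_rmsea(value):
    """Verbal label following the cutoffs quoted in the text."""
    if value < 0.05:
        return "close fit"
    if value <= 0.08:
        return "fair fit"
    if value <= 0.10:
        return "mediocre fit"
    return "poor fit"


def aic_for_comparison(chi_square, df):
    """Model-comparison form of the AIC: chi-square minus twice the df."""
    return chi_square - 2 * df


def select_by_aic(models):
    """Return the name of the model with the lowest AIC.

    `models` maps a model name to a (chi_square, df) pair.
    """
    return min(models, key=lambda name: aic_for_comparison(*models[name]))


# Hypothetical competing models fitted to the same sample of N = 201.
models = {"model_A": (60.0, 35), "model_B": (48.0, 33)}
for name, (chi_square, df) in models.items():
    value = rmsea(chi_square, df, sample_size=201)
    print(name, round(value, 3), describe_rmsea(value), aic_for_comparison(chi_square, df))
print("Preferred by AIC:", select_by_aic(models))
```

The maximum in the RMSEA estimate sets the index to zero whenever the chi-square does not exceed its degrees of freedom, so the estimate is never based on a negative discrepancy.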

The use of the AIC requires fitting several competing models. The model with the lowest value of the AIC among them is then deemed to fit the data best from a predictive point of view. Another issue in evaluating a structural model is whether the model is capable of cross-validating well in a future sample of the same size, from the same population, and sampled in the same fashion. Thus, Browne and Cudeck (1993) defined a cross-validation index (CVI), which measures the extent to which the model fitted to the calibration sample also fits the validation sample. The same authors also developed a single sample

cross-validation index, the so-called expected cross-validation index (ECVI) (Browne & Cudeck, 1989), which can be estimated as

$ECVI = E[F(\Sigma_v, \Sigma_c)] = F(\Sigma_v, \Sigma_c) + n^{-1}(k^* + t)$,

where $k^* = k(k + 1)/2$ represents the number of non-redundant elements of $S$ and $t$ represents the number of free parameters in the model. The ECVI is to be used in the same way as the AIC for selecting among competing models. That is, the model with the smallest ECVI is selected as the model that will cross-validate best.

Conventional Practice and Recommendations for Model Evaluation

Regardless of whether a researcher proposes a model that supports or contradicts past knowledge, the advantages of SEM methodology can only be realized with variables that have been assessed for validity and reliability. If the analyzed data are poor, in the sense of reflecting substantial unreliability and poor construct validity, the results will be poor, regardless of the quality of the model. An intriguing question is: why do methodological presentations and substantive applications of structural equation modeling focus so heavily on model fit? From a statistical point of view, a good reason to be concerned with model fit is that strong evidence of misfit would call into question the accuracy of parameter estimates and their standard errors. Indeed, numerous studies (e.g. Kaplan, 1989a; 1989b) have shown that parameter estimates, standard errors, and Type II error rates are profoundly affected by model specification error as indicated by lack of model fit. However, the fit of the model should not be judged blindly by fit indices only. Researchers should always investigate and report whether the residuals are close to zero (Boomsma, 2000). In fact, there is often at least one goodness-of-fit index which indicates good fit even if the model is poor (McDonald, 2003). In cases where the variables have low correlations, the structural (path) coefficients will also be low. Researchers should report not only fit indices but also the structural coefficients, so that the strength of paths in the model can be assessed (Boomsma, 2000). Readers should not be left with the impression that a model is strong simply because the "fit" is high. When correlations are low, path coefficients may be so low as not to be significant, even when fit indexes show "good fit." The Comparative Fit Index (CFI) and other indices of fit compare model-implied covariances with observed covariances, measuring the improvement in fit compared to the difference between a null model with covariances of 0 on the one hand and the observed covariances on the other. As the observed covariances approach 0 there is no "lack of fit" to explain (that is, the null model approaches the observed covariance matrix). More generally, with such indices "good fit" will be harder to demonstrate when the variables in the SEM model have low correlations with each other. At the same time, low observed correlations will often bias the model chi-square and other fit indexes toward indicating good fit (Structural equation modeling, 2005). Likewise, one can have good fit in a misspecified model. One indicator of this is the presence of high modification indices in spite of good fit. High modification indexes indicate multicollinearity in the model and/or correlated error (Structural equation modeling, 2005). A good fit does not mean that each particular part of the model fits well. Also, a good fit does not mean that the explanatory variables are causing the response variables. For instance, one may get a good fit precisely because one's model accurately reflects that most of the explanatory variables have little to do with the response variables. Conversely, one may get a bad fit not because the structural model is in error, but because of a poor measurement model.


Mokken's Scale Analysis

Introduction

Nonparametric item response theory (NIRT) is a family of statistical measurement models based on a minimal set of assumptions necessary to obtain useful measurements of persons and items. Generally, this theory can be viewed as a nonparametric approach to item response theory (IRT), because it does not parametrically define the function describing the relation between the probability of a response in a response category and the latent trait. It follows that NIRT models are generalized IRT models: well-known IRT models such as the 1-, 2-, or 3-parameter logistic models and the parametric models for polytomous items are special cases of NIRT. Because NIRT models allow for ordinal measurement, they are well suited for traditional tests and questionnaires that are presented to each respondent. Older approaches, such as factor analysis and classical test theory, focus on the outcome of the test as a whole. One major advantage of IRT and NIRT is that they model specific response probabilities for each specific person-item combination (Sijtsma & Molenaar, 2002).

Within the framework of NIRT modeling the term latent trait will be used instead of latent variable or factor. Generally, latent traits can be measured only by a set of tasks or items on which observable responses are recorded. It is assumed that a person has a position on the latent trait. This position is called the latent trait value or the person parameter (Molenaar & Sijtsma, 2000). It is impossible to determine the latent trait value of a person from just one question or one observation. Indirectly, however, people's positions on a latent trait can be inferred by combining their observable answers to a skillfully chosen set of questions. It is also assumed that this latent trait can be represented on a continuum and that every person has at least a low but finite level of each trait.

In 1971 Mokken suggested a theory and a procedure of scale analysis for dichotomous items that has become known as Mokken's scale analysis. This theory was developed for building unidimensional scales as well as for ordering items in such scales. Mokken's models are based on several assumptions but they remain very general. This generality is one of the biggest advantages of Mokken's models; it allows them to fit many data sets, which makes Mokken's scale analysis widely applicable. Although Mokken's scale analysis was developed for scales with dichotomous items (e.g. answers such as yes and no, or apply and not apply), it can easily be extended to polytomous items (respondents can choose from more than two answers, for example on Likert scales).

IRT Versus Nonparametric IRT

In this section the traditional and well-known IRT models, such as the 1-parameter logistic model (1PLM; a special case of the 1PLM is also known as the Rasch model) and the 2-parameter logistic model (2PLM), and their extension will be outlined. This leads to the introduction of a new group of models, the NIRT models, and their special cases, the so-called Mokken's models. The discussion is restricted to dichotomous items only; the extension to polytomous items is introduced in a later section.

To appreciate the difference between IRT and NIRT, it is necessary to define the so-called item response function (IRF), which takes a specific form in each IRT model. Let $X_i$ denote the item score, which equals 0 if the answer is incorrect and 1 if the answer is correct, and let $T$ denote the latent trait value. Then the IRF is defined as

$$P(X_i = 1 \mid T) = P_i(T),$$

which means that the IRF is the probability of a correct answer conditional upon the value of $T$. From a practical point of view, it is convenient to express this probability as the logistic function

$$P_i(T) = \frac{e^{T}}{1 + e^{T}}.$$

This function increases monotonically, as shown in the figure below:

Fig. 3. Logistic function

Obviously, $\lim_{T \to \infty} P_i(T) = 1$ and $\lim_{T \to -\infty} P_i(T) = 0$. This function depends on $T$ only, so it assumes that all items have the same difficulty. In other words, fitting this model to the test battery leads to keeping items with the same difficulty only. Frequently, however, one wants to incorporate items with various difficulties. The function which takes the difficulty into account is the following:

$$P_i(T) = \frac{e^{(T - d_i)}}{1 + e^{(T - d_i)}},$$

where $d_i$ is the difficulty parameter of item $i$. This equation is well known as the Rasch model. Figure 4 shows several IRFs with $d = [-3, -1, 0, 1, 4]$.


Fig. 4. IRFs of various difficulty parameters

For all IRFs the slopes are the same. This means that the 1PLM assumes that the relationship between the item score and the latent trait is the same for all items. This is a very strong assumption. The 2-parameter logistic model (2PLM) is free from this assumption since it adds another parameter, denoted here as $a$. The equation for the 2PLM is the following:

$$P_i(T) = \frac{e^{a_i (T - d_i)}}{1 + e^{a_i (T - d_i)}}.$$

Parameter a is interpreted as the discrimination power. The higher a, the steeper the slope. The steeper the slope, the better the item distinguishes (within the limited range of T) between people with low and high level of the latent trait value. Several IRFs with various ai and di are depicted in Figure 5.


Fig. 5. IRFs of various difficulty and discrimination parameters
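The logistic IRFs discussed above are easy to evaluate numerically. The following is a minimal sketch, not code from the thesis: the function name `irf`, the argument names and the example item parameters are assumptions, and `theta` stands for the latent trait value T.

```python
import math


def irf(theta, difficulty=0.0, discrimination=1.0):
    """P_i(T) for the 2PLM; discrimination=1 reduces it to the Rasch form."""
    z = discrimination * (theta - difficulty)
    return 1.0 / (1.0 + math.exp(-z))  # same value as e^z / (1 + e^z)


# Hypothetical items given as (difficulty d_i, discrimination a_i).
items = [(-1.0, 1.0), (0.0, 1.0), (0.0, 2.5)]
for theta in (-2.0, 0.0, 2.0):
    probs = [round(irf(theta, d, a), 3) for d, a in items]
    print(theta, probs)
```

Comparing the second and third item shows the role of the discrimination parameter: both have probability 0.5 at T = 0, but the item with the larger a separates respondents just below and just above that point more sharply.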

Compared to NIRT models, all these parametric IRT models tend to restrict the test data, and through the kinds of parameters they provide they may lead to other kinds of applications (Michielsen, Vries, Heck, Vijver, & Sijtsma, 2004). NIRT models provide an ordinal scale for measuring persons with respect to the latent trait of interest, such as an ability or a personality trait. NIRT models are based on just enough assumptions to maintain the ordinal measurement of persons and items, and for this purpose an order relation between the item score and the latent trait suffices. This means that these models assume that the relationship between the conditional probability of a correct answer $P_i(T)$ and the latent trait value $T$ is governed by order restrictions. Formally, these models assume that for any pair of arbitrarily chosen values $T_a$ and $T_b$ with $T_a < T_b$, it holds that

$$P_i(T_a) \le P_i(T_b).$$

In other words, the IRF is any nondecreasing function of T. It is not necessarily a logistic function. One may argue that in practice many IRFs are logistic or nearly so. But Sijtsma and Molenaar (2002) point out that using the Rasch model means rejecting, for example, all items having IRFs with slopes different from 1, and that using the 2PLM means rejecting, for example, all items having IRFs with lower asymptotes greater than 0 (typical of the multiple-choice item format). Using the 3-parameter logistic model, which extends the 2PLM with a lower asymptote or guessing parameter, means rejecting all items with IRFs having upper asymptotes less than 1 as well as all IRFs with an irregular shape, one or more sharp bends, etc. Figure 6 shows some examples of IRFs that are nondecreasing and thus conform to NIRT.

Fig. 6. A logistic IRF (solid curve), a nonparametric IRF (dashed curve); and ordered latent class model IRF (step function)

Some of the NIRT models also provide an ordinal scale for items with respect to their difficulty. There are also NIRT models for measuring preferences, as for certain consumer products, which assume that the relationship between item score and latent trait is bell-shaped. Such models are interesting but have only few applications (Michielsen et al., 2004).

Assumptions Underlying NIRT Models


NIRT models are free from some strong assumptions that make other models more or less limited. However, there are four assumptions relevant to NIRT models for dichotomous items. Extension of these assumptions for polytomously scored items is discussed in another section.


A) Assumption of unidimensionality

This assumption means that all items from the same item set share (measure) the same latent trait apart from unique characteristics and measurement error. Recall the distinction between the latent variable as a formal mathematical representation of the latent trait, which is in turn a semantic interpretation of an empirical attribute, the so-called indicator (Blahu, 1996a). Sijtsma and Molenaar (2002) formulate some interpretations of this assumption:
a) The psychological interpretation is that all items measure one ability, for example the level of stress or well-being of respondents.
b) The mathematical interpretation says that only one latent variable is necessary to account for the data structure.

From a practical point of view, it is convenient to have unidimensional scales. Obviously, it is much easier to interpret results that are based on the sum score from such scales. It is also convenient to combine several unidimensional scales into a single battery that has high predictive validity.

B) Assumption of local independence

This important and strong assumption underlies not only NIRT but also various other statistical models (see e.g. Blahu (1985)). It was first defined by Anderson (1959). Within the NIRT framework, local independence states that an individual's response to item i is not influenced by his or her responses to the other items in the same test (Sijtsma & Molenaar, 2002). Because NIRT is a probabilistic model, this assumption can also be expressed in terms of probabilities. Note that the following is valid for dichotomous items only. Let $X = (X_1, X_2, \ldots, X_k)$ be the vector that contains the item score variables and $x = (x_1, x_2, \ldots, x_k)$ a realization of X. Then local independence means

$$P(X = x \mid T) = \prod_{i=1}^{k} P(X_i = x_i \mid T),$$

where $P(X_i = x_i \mid T)$ is the conditional probability of having score $x_i$ on item i, given a latent trait level T.

The interpretation of this equation is outlined in Blahu (1976): once the respondent's latent trait value is fixed, the performance of this respondent can vary only randomly. Local independence implies that for a fixed value of T, the covariance between any pair of items equals 0. However, if $\mathrm{cov}(X_i, X_j \mid T) = 0$ for any i and j, it does not necessarily mean that local independence holds. Mathematically, zero conditional covariance is a necessary, but not a sufficient, condition for local independence. In addition, although unidimensionality and local independence are related, they do not imply one another. Local independence can be violated, for example, by learning through practice. This means that during testing the latent trait value T may change: it may increase through practice, or decrease, for example, because the testee becomes tired. For test constructors it is important not to violate this assumption by writing items that are functionally dependent. Here is a simple example:

Item 1: Forward roll
Item 2: Forward roll, rotation, backward roll

It is obvious that somebody who is not able to perform a forward roll will fail both items, even if he or she can perform a backward roll. Although detecting violations of local independence is difficult, some statistical methods have been developed by Douglas et al. (1998), Ip (2000; 2001) and Hoskens and De Boeck (1997).

C) Assumption of monotonicity of IRFs

The third assumption is that for each item the probability of a correct answer $P_i(T)$ is a monotonically nondecreasing function of the latent trait T. Investigation of this assumption is an important part of the software for evaluating NIRT models.

D) Assumption of nonintersecting IRFs

The previous assumptions (A-C) are sufficient for many applications of NIRT. However, a more restrictive model that will be introduced later requires the additional assumption that the IRFs do not intersect across T. IRFs may touch locally and may even coincide completely in the extreme case (Molenaar & Sijtsma, 2000). This assumption implies the following: if it is known that the probability of a correct answer for item k is lower than for item l for one value of T, and the assumption of nonintersection of IRFs holds, then

$$P_k(T) < P_l(T)$$

for all values of T. For dichotomously scored items, the expected conditional item score equals the IRF value, because

$$E(X_i \mid T) = 0 \cdot P(X_i = 0 \mid T) + 1 \cdot P(X_i = 1 \mid T) = P_i(T).$$

Now the IRF values can be replaced by expected conditional item scores,

$$E(X_k \mid T) < E(X_l \mid T)$$

for all values of T. In other words, nonintersection of IRFs allows a stochastic ordering of items. It can be generalized to n items. This ordering is often called an invariant item ordering (IIO) (Hemker, Sijtsma, Molenaar, & Junker, 1996). Such an ordering is useful for many applications that are introduced in Sijtsma and Molenaar (2002).
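The assumptions of local independence and nonintersection can be illustrated numerically for dichotomous items. The sketch below is only an illustration under stated assumptions: it uses Rasch-type IRFs as one convenient example of nondecreasing, nonintersecting IRFs, and the difficulties and the latent trait value (called `theta` in the code) are hypothetical.

```python
import math
from itertools import product


def irf(theta, difficulty):
    """Rasch-type IRF, used here only as an example of a nondecreasing IRF."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def pattern_probability(pattern, theta, difficulties):
    """P(X = x | T): product of the item-level conditional probabilities."""
    prob = 1.0
    for x, d in zip(pattern, difficulties):
        p = irf(theta, d)
        prob *= p if x == 1 else 1.0 - p
    return prob


difficulties = [-1.0, 0.0, 1.5]  # hypothetical item difficulties
theta = 0.3                      # hypothetical latent trait value
total = sum(pattern_probability(x, theta, difficulties)
            for x in product((0, 1), repeat=len(difficulties)))
print(round(total, 10))  # the 2^k pattern probabilities sum to 1

# Nonintersection: the harder item has the lower probability at every T tried.
grid = [-3.0, -1.0, 0.0, 1.0, 3.0]
same_order = all(irf(t, 1.5) < irf(t, -1.0) for t in grid)
print(same_order)
```

Because the item ordering by success probability is the same at every grid point, these three hypothetical items have an invariant item ordering in the sense described above.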

Extension of NIRT to Polytomous Items

So far, only NIRT models for dichotomous items have been discussed. Usually, however, respondents are offered items with several ordered answer categories. The well-known Likert scale is an example of such ordered answer categories. In fact, NIRT models can be used for any scales where the order of answer categories is meaningful.


Here it is assumed that all items used in a test battery or a questionnaire have the same number of answer categories, for the reasons given by Sijtsma and Molenaar (2002). They state that although it is possible to model items with different numbers of categories (each of these items measuring the same T), there are two reasons to ignore these cases:
a) In practice, items in a test battery measuring the same latent trait usually have the same number of answer categories.
b) Often there is no theoretical justification for using different numbers of answer categories, which would result in the assignment of different weights to the items simply because with one item more points could be earned than with another item.

For the purpose of the extension to polytomous items, it is necessary to define a test score (or sum score), denoted as $X_+$. Let us assume that answer categories are numbered such that $X_i = 0, 1, \ldots, m$ for all $i = 1, \ldots, k$. Then the test score is defined as

$$X_+ = \sum_{i=1}^{k} X_i.$$

Values of $X_+$ range from 0 to mk. For example, in Likert scaling where the item score starts at 1, $X_+ = k, k + 1, \ldots, mk$. Although both scorings lead to the same results, scoring that starts with 0 is preferred. Let k denote the number of items with m + 1 answer categories, and let h denote the polytomous item score. The so-called item step is an imaginary threshold between two adjacent answer categories. Let us define the item step score, denoted as $Y_{ih}$, as

$$Y_{ih} = 0 \ \text{if} \ X_i < h \quad \text{and} \quad Y_{ih} = 1 \ \text{if} \ X_i \ge h.$$

The item score can now be written as

$$X_i = \sum_{h=1}^{m} Y_{ih},$$

and the test score as

$$X_+ = \sum_{i=1}^{k} X_i = \sum_{i=1}^{k} \sum_{h=1}^{m} Y_{ih},$$

where $X_+$ again ranges from 0 to mk. In other words, a polytomous item with m+1 answer categories can be divided into m dichotomous item step scores.
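The decomposition of item scores into item step scores can be checked with a few lines of code. This is a minimal sketch with hypothetical responses; the function name `item_step_scores` is an assumption made for illustration.

```python
def item_step_scores(x, m):
    """Return [Y_i1, ..., Y_im] for an item score x in 0..m."""
    if not 0 <= x <= m:
        raise ValueError("item score must lie between 0 and m")
    return [1 if x >= h else 0 for h in range(1, m + 1)]


m = 4                      # five answer categories, scored 0..4
responses = [2, 0, 4, 1]   # hypothetical scores of one respondent on k = 4 items
steps = [item_step_scores(x, m) for x in responses]
test_score = sum(sum(row) for row in steps)
print(steps[0], test_score)
```

Summing the item step scores of each item recovers the item score, and summing over all items recovers the test score, as the two sums above state.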


The IRF can also be extended to polytomous items as the so-called item step response function (ISRF). The ISRF is defined as

$$P_{ih}(T) = P(X_i \ge h \mid T),$$

where $i = 1, \ldots, k$ and $h = 0, \ldots, m$. Obviously, for each item there are just as many ISRFs as answer categories, but for h = 0 this probability equals 1, which is not informative about the item functioning. In other words, an item with m+1 answer categories has m meaningful ISRFs. The NIRT models for polytomous items assume that all ISRFs are monotonically nondecreasing functions of T. Moreover, within one item the ISRFs do not intersect by definition, because the difference between $P_{ih}(T)$ and $P_{i,h+1}(T)$ is

$$P_{ih}(T) - P_{i,h+1}(T) = \sum_{g=h}^{m} P(X_i = g \mid T) - \sum_{g=h+1}^{m} P(X_i = g \mid T) = P(X_i = h \mid T).$$

Because a probability is always nonnegative, ISRFs of the same item may tie but not intersect. However, this does not necessarily hold for the ISRFs of different items.

For polytomous items, the assumptions of unidimensionality and local independence remain the same as for dichotomous items. The assumptions of monotonicity and nonintersection are applied to the ISRFs. Note that nonintersection for polytomous items is much stronger than for dichotomous items, because for each pair of items there are 2m ISRFs which are not allowed to intersect.
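The relation between category probabilities and ISRF values at one fixed value of T can be verified directly. The sketch below uses hypothetical category probabilities; `isrf_values` is an illustrative helper name, not part of any software mentioned in the text.

```python
def isrf_values(category_probs):
    """P(X_i >= h | T) for h = 0..m from P(X_i = g | T), g = 0..m, at one T."""
    values = []
    running = 0.0
    for p in reversed(category_probs):
        running += p
        values.append(running)
    return values[::-1]


probs = [0.1, 0.2, 0.3, 0.4]   # hypothetical P(X_i = g | T) for g = 0..3
isrf = isrf_values(probs)
# Differences of adjacent ISRF values give back the category probabilities.
diffs = [isrf[h] - isrf[h + 1] for h in range(len(probs) - 1)]
print([round(v, 3) for v in isrf], [round(d, 3) for d in diffs])
```

The value for h = 0 is 1 by construction, and each difference between adjacent values is a nonnegative category probability, which is why ISRFs of the same item cannot cross.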

Mokken's Monotone Homogeneity Model for Polytomous Items


Mokken's monotone homogeneity (MH) model for polytomous items is based on the assumptions of unidimensionality, local independence, and monotonicity. The MH model describes item response data that were generated by a set of homogeneous items (unidimensionality) having ISRFs that are monotonically (monotonicity) related to the latent trait (Sijtsma & Molenaar, 2002). The MH model for polytomous items implies the following. Let $x_+$ be an arbitrarily fixed value of $X_+$. Then for two fixed values of T, denoted $T_a$ and $T_b$, such that $0 < T_a < T_b < \infty$, the MH model implies

$$P(X_+ \ge x_+ \mid T = T_a) \le P(X_+ \ge x_+ \mid T = T_b).$$

This model allows stochastic ordering of respondents on $X_+$ by means of T. This means that if the MH model fits, persons can be ranked on the test score $X_+$ using their ordering on the latent trait T. In other words, a higher test score can be expected for a respondent with a higher value of T. However, the latent trait value T is obviously not known. It is more useful to ask whether the roles of T and $X_+$ can be reversed, that is, whether this model allows ordering persons on T by means of $X_+$. For dichotomous items, $X_+$ can be used safely for ordering persons on T (Sijtsma & Van der Ark, 2005). For polytomous items, however, Hemker, Sijtsma, Molenaar, and Junker (1997) showed that from a theoretical point of view this does not necessarily hold. Nevertheless, some studies (Van der Ark, 2000) have shown that using $X_+$ for ordering persons on T is efficient for many tests and distributions. Moreover, Sijtsma and Van der Ark (2005) and Sijtsma and Molenaar (2002) state that distortions occur only in rare cases and almost always involve respondents that are located close together on the $X_+$ scale, say, one or two points apart. The same authors recommend using $X_+$ as a proxy for ordering respondents on T without hesitation.

Mokken's Double Monotonicity Model for Polytomous Items


The double monotonicity (DM) model for polytomous items is the second Mokken model. It is an NIRT model based on the same assumptions as the MH model (unidimensionality, local independence, monotonicity), plus a fourth assumption, the nonintersection of ISRFs. As mentioned earlier, the assumption of nonintersection allows ordering of dichotomously scored items. However, as pointed out by Sijtsma and Molenaar (2002), this does not hold for polytomous items: the assumption orders only the ISRFs, not the items themselves. Nevertheless, Sijtsma and Hemker (1998) developed the so-called strong DM model, which provides an invariant item ordering. A full introduction of the strong DM model would go beyond the scope of this thesis; interested readers are referred to Sijtsma and Hemker (1998).

Scaling Procedure
Several researchers (Molenaar, 1991; Sijtsma, Debets, & Molenaar, 1990) discussed methods for determining the fit of the MH model and the DM model for polytomous item scores. These methods were implemented in the computer program MSPWin, currently in version 5 (Boer, 2001). They include (Sijtsma & Van der Ark, 2005):

Scalability coefficients

The scalability coefficients were first introduced by Loevinger (1947) for the purpose of evaluating the homogeneity of a set of items. The item-pair scalability coefficient $H_{ij}$ is defined as the ratio of the covariance of items i and j to the maximum covariance given the marginals of the bivariate cross-classification table of scores on items i and j, that is,

$$H_{ij} = \frac{\mathrm{Cov}(X_i, X_j)}{\mathrm{Cov}_{\max}(X_i, X_j)}.$$

The item scalability coefficient $H_i$ is defined as

$$H_i = \frac{\sum_{j \ne i} \mathrm{Cov}(X_i, X_j)}{\sum_{j \ne i} \mathrm{Cov}_{\max}(X_i, X_j)},$$

and the scalability coefficient H for k items is defined as

$$H = \frac{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \mathrm{Cov}(X_i, X_j)}{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \mathrm{Cov}_{\max}(X_i, X_j)}.$$

It has been shown by Hemker, Sijtsma, and Molenaar (1995) that given the MH model, all $H_{ij}$, $H_i$ and H coefficients range from 0 to 1. Generally, if H = 1, there is no disordering or inversion of the item responses. If H = 0, there is no linear relation among the test items (no correlation). Under the MH model, an H value of 0 means that for k - 1 items the ISRFs are constant functions of T (Hemker et al., 1995). Generally, scales with H < 0.3 are not considered unidimensional. Scales with H coefficients higher than 0.3 and lower than 0.4 are considered weak, scales with 0.4 < H < 0.5 are of medium strength, and scales with H > 0.5 are seen as strong (Sijtsma & Molenaar, 2002). Higher H values mean that the slopes of the ISRFs tend to be steeper, which implies that the items discriminate better among values of T (Sijtsma & Van der Ark, 2005).
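The three scalability coefficients can be computed from raw item scores in a few lines. The sketch below is not the MSPWin implementation; it assumes that the maximum covariance given the marginals is obtained by pairing the sorted scores of both items, and all data and function names are hypothetical. Items with zero variance would lead to a division by zero and are not handled.

```python
def covariance(x, y):
    """Population covariance of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def max_covariance(x, y):
    """Largest covariance attainable with the observed marginals.

    Sorting both lists pairs low scores with low and high with high,
    which maximizes the covariance without changing either marginal.
    """
    return covariance(sorted(x), sorted(y))


def scalability(items):
    """Return (H_ij per pair, H_i per item, H) for a list of k score lists."""
    k = len(items)
    pairs = [(i, j) for i in range(k - 1) for j in range(i + 1, k)]
    cov = {p: covariance(items[p[0]], items[p[1]]) for p in pairs}
    cmax = {p: max_covariance(items[p[0]], items[p[1]]) for p in pairs}
    h_pair = {p: cov[p] / cmax[p] for p in pairs}
    h_item = [
        sum(cov[p] for p in pairs if i in p) / sum(cmax[p] for p in pairs if i in p)
        for i in range(k)
    ]
    h_total = sum(cov.values()) / sum(cmax.values())
    return h_pair, h_item, h_total


# Hypothetical trichotomized scores (0, 1, 2) of five persons on three items.
items = [
    [0, 1, 2, 1, 2],
    [0, 0, 1, 2, 2],
    [0, 0, 1, 1, 2],
]
h_pair, h_item, h_total = scalability(items)
print({p: round(v, 3) for p, v in h_pair.items()}, round(h_total, 3))
```

In this toy data set the second and third items are ordered identically across persons, so their pair coefficient equals 1, whereas the first and second items show one inversion and a lower coefficient.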


To assess whether an item is coherent enough to be included in the scale, the corresponding item coefficient Hi is used. All Hi values in a unidimensional scale should be larger than 0.3. During the analysis of the scale, the cutoff value of Hi must be specified by the researcher, whose decision depends on the need for homogeneity of scales: the higher the cutoff value of Hi, the higher the level of homogeneity (unidimensionality) of the resulting scales.

A method for estimating the reliability of the total score X+

Sijtsma and Molenaar (1987) proposed a method to estimate the reliability of the total score X+. This method is based on methods proposed by Mokken (1971) and assumes that the ISRFs do not intersect. Thus, before interpreting the reliability estimate it should be checked that the DM model assumptions are not violated. The reliability estimated by this method is denoted as the Rho coefficient. Sijtsma and Molenaar (1987) showed that in a number of cases the Rho coefficient is almost unbiased, whereas Cronbach's (1951) alpha coefficient always underestimates the reliability of the total score.
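The Rho estimate requires the machinery of the DM model and is not reproduced here. For comparison, Cronbach's alpha, which is reported alongside Rho later in the text, can be sketched as follows. The sketch assumes the standard variance-based formula for alpha; the data and the function name are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of k equally long item score lists."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_variance_sum = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1.0 - item_variance_sum / pvariance(totals))


# Hypothetical scores of five persons on three items.
items = [
    [0, 1, 2, 1, 2],
    [0, 0, 1, 2, 2],
    [0, 0, 1, 1, 2],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

The function compares the sum of the item variances with the variance of the total score, so alpha grows as the items covary more strongly.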

Limitations and Issues of Mokken's Scale Analysis

Apart from the ordering of persons on the latent trait by means of the sum score (mentioned earlier), there are some other theoretical problems and limitations of Mokken's models. Three of them will be discussed here.

First, regarding the invariant item ordering (IIO), MSPWin does not contain methods that explicitly investigate whether or not a set of items has an IIO. Because neither the MH nor the DM model for polytomous items implies an IIO, and because researchers often like to know whether the difficulty orderings are the same for different relevant subgroups, a method for investigating the item ordering in relevant subgroups would be an important new feature of MSPWin.

Second, the estimation of ISRFs is based on $P(X_i \ge x \mid R)$, where $R = X_+ - X_i$ is the so-called rest score. These estimates are used for checking whether the ISRFs are monotonically nondecreasing and whether they are nonintersecting. However, Junker and Sijtsma (2000) showed that under the MH model $P(X_i \ge x \mid R)$ is in general not nondecreasing in R. Alternatively, Junker (1996) suggested conditioning on a rest score D based on the k - 1 item scores all dichotomized at the same x. However, this method has not been implemented in MSPWin yet. As emphasized by Sijtsma and Van der Ark (2005), future research has to show whether for practical use these methods give valid information about the ISRFs in the sense that errors are negligible, and to indicate whether the methods presently implemented should be replaced by theoretically sounder methods.

Third, compared to structural equation modeling, a limitation of Mokken's scale analysis is that it does not provide estimates of relationships among latent traits. Moreover, so-called bi-factorial structures cannot be investigated.
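The rest-score estimate of an ISRF mentioned in the second limitation can be sketched as follows. This is a simplified illustration, not the MSPWin procedure: every distinct rest score forms its own group, whereas practical software typically pools sparse groups, and the data and function names are hypothetical.

```python
from collections import defaultdict


def rest_score_proportions(items, i, x):
    """Estimate P(X_i >= x | R = r), where R is the sum of all other items."""
    counts = defaultdict(lambda: [0, 0])  # r -> [persons, persons with X_i >= x]
    for person in zip(*items):
        r = sum(person) - person[i]
        counts[r][0] += 1
        counts[r][1] += int(person[i] >= x)
    return {r: hits / n for r, (n, hits) in sorted(counts.items())}


def count_decreases(proportions):
    """Number of adjacent rest-score groups where the proportion drops."""
    values = list(proportions.values())
    return sum(1 for low, high in zip(values, values[1:]) if high < low)


# Hypothetical scores (0, 1, 2) of six persons on three items.
items = [
    [0, 0, 1, 1, 2, 2],
    [0, 1, 1, 2, 2, 2],
    [0, 0, 0, 1, 1, 2],
]
props = rest_score_proportions(items, i=0, x=1)
print(props, count_decreases(props))
```

A count of zero means that the estimated proportions never decrease as the rest score increases, which is the pattern expected when the monotonicity check is passed.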


EMPIRICAL RESEARCH

Introduction

The aim of the study is to investigate the number and the structure of motor symptoms of Parkinson's disease. This is inferred through statistical analysis of the Motor Section of the Unified Parkinson's Disease Rating Scale (MS UPDRS). Two statistical approaches were employed - Mokken's scale analysis (MSA) and structural equation modeling (SEM). Results of this research are presented in this section. This section consists of several subsections. The sample of Parkinson's disease patients is described first, followed by the results of the research. Results are organized into six main areas:
a) Initial computations. The basic statistical properties of the data are investigated.
b) Assessment of dimensionality by exploratory MSA for cutoff values of Hi coefficients equal to 0.3 and 0.4.
c) Confirmatory MSA. Various meaningful structures of the parts of the MS UPDRS are investigated.
d) Building the structural equation models of the parts of the MS UPDRS.
e) Building the structural equation model of the entire MS UPDRS.
f) Discussion of the differences between SEM models for patients in on state and off state.


Sample Description

Four hundred and five consecutive patients (237 men (39 off; 170 on; 28 unknown) and 168 women (21 off; 140 on; 7 unknown); mean age 61, range 35-80 years) with Parkinson's disease (PD) diagnosed according to the current clinical criteria (Hughes, Daniel, Kilford, & Lees, 1992) were included in the research. Each patient was evaluated by one member of a group of certified neurologists specializing in movement disorders who were routinely using the UPDRS. Sixty patients were examined in a defined off state and 310 patients in a defined on state. For 35 patients, the motor state during evaluation was not specified. The sample consists of two subsamples. The first subsample of N=147 (96 men (38 off; 30 on; 28 unknown), 51 women (15 off; 29 on; 7 unknown)) was obtained at the Movement Disorder Centre, Charles University, Prague, Czech Republic. The second subsample of N=258 (141 men (1 off; 140 on), 117 women (6 off; 111 on)) was acquired at the University Medical Centre Groningen, Netherlands.


Results

Initial Computations

To get an overview of the characteristics of the data set, the basic statistical properties of each item were analyzed. For this purpose the NCSS (Hintze, 1996) and PRELIS (Jöreskog & Sörbom, 2002) programs were used. Various distributional tests were employed to analyze the statistical distributions. The expected non-normal distribution of most items' responses was confirmed. Generally the sample distributions of the item responses are skewed to the right, that is, positively skewed (skewness ranging from 0.2 to 4.43; mean 1.11; see Table 1). Items concerning tremor and the item Arising from chair deviate from the skewness of the normal distribution more than the others. The items Tremor at rest - face, lips, chin, Tremor at rest - right lower extremity and Tremor at rest - left lower extremity have very high values of kurtosis. The sum scores of the Motor Section ranged from 2 to 74 (mean 26.14; standard deviation 13.27).

Given the high skewness of the items, it was necessary from a statistical point of view to reduce the number of categories to obtain accurate results. Therefore response categories were trichotomized for all Mokken's scale analyses as follows: Category I = Score 0; Category II = Score 1; Category III = Scores 2, 3, and 4 in the Motor Section of the UPDRS. For comparative purposes, results for non-trichotomized response categories can be found in the appendix.

The variance of responses to the item Tremor at rest - face, lips, chin is very low, which means that the item carries little information. In fact, of the total of 405 patients, only 20 patients reached category 1, 8 patients reached category 2, and only 2 patients reached category 3 in this item. In practice, this item cannot distinguish various levels of Parkinson's disease well.
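The trichotomization rule and the distributional summaries can be sketched in code. The text does not state which estimators NCSS and PRELIS use, so the sketch assumes simple moment-based skewness and non-excess kurtosis (a normal distribution gives 0 and 3); the scores and function names are hypothetical.

```python
def trichotomize(score):
    """Map UPDRS item scores 0..4 to categories 0, 1, 2 (scores 2-4 merged)."""
    return min(score, 2)


def skewness_and_kurtosis(values):
    """Moment-based skewness and non-excess kurtosis of a list of scores."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical item with a floor effect: most patients score 0.
scores = [0, 0, 0, 0, 0, 0, 1, 1, 2, 4]
skew, kurt = skewness_and_kurtosis(scores)
recoded = [trichotomize(s) for s in scores]
print(round(skew, 2), round(kurt, 2), recoded)
```

An item on which most patients score 0 has a long upper tail and therefore positive skewness, which is the pattern reported for the tremor items.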

Table 1
Basic statistical properties of the data

Item                            Mean   Range   St. deviation   Skewness   Kurtosis
Speech                          1.13   0-4     0.89            0.63        3.15
Facial expression               1.45   0-4     0.84            0.42        3.25
Tremor at rest FLC              0.10   0-3     0.40            4.43       24.17
Tremor at rest RUE              0.59   0-4     0.94            1.51        4.30
Tremor at rest LUE              0.47   0-4     0.83            1.83        5.74
Tremor at rest RLE              0.17   0-3     0.51            3.56       16.56
Tremor at rest LLE              0.15   0-3     0.49            3.74       17.78
Action/postural tremor Right    0.45   0-3     0.68            1.57        5.43
Action/postural tremor Left     0.44   0-3     0.65            1.39        4.66
Rigidity H/N                    0.84   0-3     0.84            0.70        2.71
Rigidity RUE                    1.25   0-3     0.84            0.20        2.43
Rigidity LUE                    1.11   0-4     0.90            0.44        2.67
Rigidity RLE                    0.84   0-4     0.81            0.67        2.92
Rigidity LLE                    0.83   0-4     0.84            0.74        2.92
Finger taps Right               1.51   0-4     0.96            0.30        2.58
Finger taps Left                1.54   0-4     1.02            0.21        2.29
Hand movements Right            1.26   0-4     0.88            0.43        2.92
Hand movements Left             1.31   0-4     0.96            0.35        2.48
Rapid altern. mov. Right        1.21   0-4     0.97            0.51        2.73
Rapid altern. mov. Left         1.29   0-4     0.96            0.51        3.01
Leg agility Right               1.19   0-4     0.96            0.65        3.03
Leg agility Left                1.26   0-4     1.03            0.61        2.82
Arise from chair                0.69   0-4     1.06            1.63        5.03
Posture                         1.25   0-4     0.93            0.77        3.58
Gait                            1.16   0-4     0.94            0.87        3.64
Postural stability              1.13   0-4     1.01            0.70        2.91
Body bradykinesia               1.54   0-4     0.98            0.48        2.95

RUE = Right Upper Extremity; LUE = Left Upper Extremity; RLE = Right Lower Extremity; LLE = Left Lower Extremity; H/N = Head, Neck; FLC = Face, Lips, Chin; Right = Right Extremity; Left = Left Extremity


Exploratory Mokken's Scale Analysis of MS UPDRS

This section deals with the exploratory approach of Mokken's scale analysis. Two analyses with different cutoff criteria for Loevinger's scalability coefficient Hi were performed to determine the number of unidimensional scales (dimensions) of the MS UPDRS. For analyzing the MS UPDRS data the MSPWin 5.0 program (Boer, 2001) was used. First, the search option of the program was employed to search for the most coherent unidimensional subscales that could be created from the set of all items and that satisfy the assumptions of Mokken's model. To evaluate the homogeneity of the Mokken scales, Loevinger's scalability coefficients H and Hi were used. For comparison with previous studies (e.g. Stebbins et al. (1999), Ramaker et al. (2002)), Cronbach's alphas were computed for each dimension. For the first analysis, the recommended initial cutoff value of 0.3 for Loevinger's scalability coefficient Hi was chosen; results are shown in Table 2. This cutoff criterion was subsequently increased to 0.4 (Table 3) as suggested by Sijtsma and Molenaar (2002), based on the study by Hemker et al. (1995), to obtain definitely unidimensional scales. All subscales obtained from these analyses are unidimensional and also satisfy the monotonicity assumption for the ISRFs. Moreover, subscales two and three in Table 2 and subscales three and four in Table 3 also satisfy the assumption of double monotonicity. The assumption of local independence is not empirically verifiable, but there is no reason to assume that it does not hold for these subscales.

Three subscales satisfy the cutoff criterion of Hi > 0.3 (Table 2). The first subscale consists of all items related to rigidity, bradykinesia of the extremities and axial/gait bradykinesia and also includes the items Speech and Facial expression; it is of medium strength (scale H equals 0.43). The item Rigidity - RUE has the lowest Hi coefficient (0.32); the other Hi coefficients in this subscale range from 0.37 to 0.51. The lower boundary of reliability estimated by Cronbach's alpha is high (0.94). This scale violated the assumption of nonintersection of ISRFs and thus the Rho coefficient is not computed.

Table 2
Exploratory results for cutoff criterion of Hi > 0.3

Item                            Hi of subscale 1   Hi of subscale 2   Hi of subscale 3
Speech                          0.38
Facial expression               0.37
Tremor at rest FLC
Tremor at rest RUE                                 0.42*
Tremor at rest LUE                                                    0.48*
Tremor at rest RLE                                 0.49
Tremor at rest LLE                                                    0.53
Action/postural tremor Right                       0.52
Action/postural tremor Left                                           0.57
Rigidity H/N                    0.41
Rigidity RUE                    0.32*
Rigidity LUE                    0.40
Rigidity RLE                    0.41
Rigidity LLE                    0.44
Finger taps Right               0.42
Finger taps Left                0.46
Hand movements Right            0.43
Hand movements Left             0.46
Rapid altern. mov. Right        0.42
Rapid altern. mov. Left         0.47
Leg agility Right               0.43
Leg agility Left                0.45
Arise from chair                0.51
Posture                         0.47
Gait                            0.44
Postural stability              0.44
Body bradykinesia               0.50
Scale H                         0.43               0.48               0.53
Reliability (Rho)                                  0.66               0.70
Cronbach's alpha                0.94               0.62               0.65

RUE = Right Upper Extremity; LUE = Left Upper Extremity; RLE = Right Lower Extremity; LLE = Left Lower Extremity; H/N = Head, Neck; FLC = Face, Lips, Chin; Right = Right Extremity; Left = Left Extremity
The item with the lowest value of Hi in each subscale is marked by an asterisk.


The second and the third subscale correspond to tremor of the right side of the body (Tremor at rest RUE, Tremor at rest RLE and Action/postural tremor right hand) and tremor of the left side of the body (Tremor at rest LUE, Tremor at rest LLE and Action/postural tremor left hand), respectively. Both have relatively high scale H values (0.48 and 0.53, respectively). For these short subscales (each consists of three items only), there were no violations of either monotonicity or non-intersection of the ISRFs. This allows the lower bound of reliability to be estimated by the Rho coefficient (0.66 and 0.70, respectively). Cronbach's alphas for these subscales equal 0.62 and 0.65. The item Tremor at rest face, lips, chin, whose high skewness and kurtosis were noted earlier, was excluded by the program because of an excessively low Hi value. In other words, this item does not fit any of these subscales.

After the cutoff criterion for Loevinger's scalability coefficient Hi was increased to 0.4, subscale one from the previous analysis split into three subscales (Table 3). The level of homogeneity (unidimensionality) of these three new subscales increased markedly (scale H > 0.5). Subscale one now includes the items related to axial/gait bradykinesia (i.e., Arise from chair, Posture, Gait, Postural stability and Body bradykinesia) and all left-sided items measuring rigidity and bradykinesia of the extremities (Rigidity LUE, Rigidity LLE, Finger taps left hand, Hand movements left hand, Rapid altern. movements left hand and Leg agility left leg). The Hi coefficients range from 0.46 to 0.59. The second subscale contains the right-sided items measuring rigidity and bradykinesia of the extremities (Rigidity RUE, Rigidity RLE, Finger taps right hand, Hand movements right hand, Rapid altern. movements right hand and Leg agility right leg), with Hi values ranging from 0.49 to 0.65. These two subscales violated the double-monotonicity assumption; thus the reliability could not be estimated using the Rho coefficient. However, Cronbach's alphas are satisfactory for both subscales (0.92 and 0.87, respectively).
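Cronbach's alpha, reported for every subscale, can be computed from item scores as in the following minimal sketch. It uses the usual variance-based formula; the function names and the three hypothetical item-score lists are assumptions made for illustration and are not the study data.

```python
def variance(values):
    """Sample variance (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(items):
    """Cronbach's alpha for a list of equally long item-score lists."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # sum score per patient
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical 0-4 ratings of eight patients on a three-item subscale.
toy_subscale = [
    [0, 1, 1, 2, 3, 0, 2, 4],
    [0, 0, 1, 2, 2, 1, 1, 3],
    [1, 1, 0, 2, 3, 0, 2, 3],
]
alpha = cronbach_alpha(toy_subscale)
print(round(alpha, 2))
```

Because alpha grows with the number of items, three-item subscales such as the two tremor subscales tend to show lower values than the longer subscales even when their items are strongly related.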

Table 3 Exploratory results for cutoff criterion of Hi>0.4 Item Speech Facial expression Tremor at rest FLC RUE LUE RLE LLE Act./post. tremor Right Left Rigidity H/N RUE LUE RLE LLE Finger taps Right Left Hand movements Right Left Rapid altern. mov. Leg agility Arise from chair Posture Gait Postural stability Body bradykinesia Scale H Reliability (Rho) Cronbach's alpha Right Left Right Left 0.54 0.58 0.54 0.50 0.52 0.56 0.54 0.92 0.56 0.87 0.48 0.66 0.62 0.53 0.70 0.65 0.74 0.76 0.59 0.52 0.58 0.60 0.58 0.61 0.51 0.65 0.46* 0.51 0.49* 0.42* 0.48* 0.49 0.53 0.52 0.57 Hi of subscale Hi of subscale Hi of subscale 1 2 3 Hi of subscale 4 Hi of subscale 5 0.74* 0.74

RUE Right Upper Extremity; LUE Left Upper Extremity; RLE Right Lower Extremity; LLE Left Lower Extremity; H/N Head, Neck; FLC Face, Lips, Chin; Right Right Extremity; Left Left Extremity; Act./post. Action/postural The item with the lowest value of Hi in each subscale is marked by an asterisk


The items Tremor at rest face, lips, chin and Rigidity head, neck do not fit any of the subscales and were therefore excluded automatically by the program. The items Speech and Facial expression formed another dimension, which is, however, very limited (it consists of two items only). This dimension is denoted as speech/hypomimia.

Taken as a whole, the results indicate that rigidity, bradykinesia of the extremities, axial/gait bradykinesia and speech/hypomimia are closely related concepts; in other words, these symptoms co-occur. Furthermore, rigidity and bradykinesia of the extremities are side-sensitive, and axial/gait bradykinesia is related to the left-sided items of bradykinesia and rigidity of the extremities. Rest tremor and action/postural tremor are relatively independent of the other symptoms of Parkinson's disease, and both are side-sensitive.

Confirmatory Mokken's Scale Analysis of MS UPDRS

This section presents the confirmatory approach of Mokken's scale analysis, employed to test various meaningful structures of the MS UPDRS. The goal is to investigate whether statistically stronger scales can be found than those found by the exploratory Mokken's scale analysis. The same data are used as in the exploratory analysis. For clarity, the MS UPDRS was divided into three parts and each part was analyzed separately. The first part consisted of the items related to rest tremor and action/postural tremor, the second of the items related to rigidity and bradykinesia of the extremities, and the third of the items related to axial/gait bradykinesia and speech/hypomimia.

Mokken's scale analysis of items related to tremor

The following items of the MS UPDRS are analyzed:

Tremor at rest:
1) Face, lips, chin (FLC)
2) Right upper extremity (RUE)
3) Left upper extremity (LUE)
4) Right lower extremity (RLE)
5) Left lower extremity (LLE)
Action/postural tremor:
6) Right hand (Right)
7) Left hand (Left)

The test option of MSPWin5 was used for this analysis. With this option, the program computes the values of the Hi and H coefficients for structures specified by the researcher; in other words, no cutoff criterion for the Hi coefficient is applied. According to Molenaar and Sijtsma (2000), however, researchers should not consider a subscale with Hi values smaller than 0.3 to be unidimensional even if the scale H exceeds 0.3. The Hi values for the specified subscales (the specification is based on theoretical meaningfulness) are shown in Table 4.
Table 4 Values of Hi coefficients of the confirmatory Mokken's scale analysis Item Tremor at rest FLC RUE LUE RLE LLE Action/postural tremor Right Left Scale H Reliability (Rho) Analysis 1 0.28* 0.36 0.42 0.34 0.32 0.31 0.30 0.34 0.35 0.41 0.35 0.34 0.34 0.32* 0.35 0.48 0.66 0.42* 0.48* 0.53 0.70 0.49 0.53 0.52 0.57 Analysis 2 Analysis 3 Analysis 4

Cronbach's alpha 0.69 0.68 0.62 0.65 RUE Right Upper Extremity; LUE Left Upper Extremity; RLE Right Lower Extremity; LLE Left Lower Extremity; H/N Head, Neck; FLC Face, Lips, Chin; Right Right Extremity; Left Left Extremity The item with the lowest value of Hi in each subscale is marked by an asterisk
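The decision rule applied to confirmatory results can be sketched as follows. The strength labels follow the customary Mokken rule of thumb (below 0.3 unscalable, 0.3 to 0.4 weak, 0.4 to 0.5 medium, 0.5 and above strong). The function names are hypothetical, and the assignment of the analysis 1 values to items is an assumption based on reading the flattened Table 4 in item order; only the FLC value below 0.3 and the scale H of 0.34 are confirmed by the text.

```python
def scale_strength(h):
    """Label a scale H with the customary Mokken rule of thumb."""
    if h < 0.3:
        return "unscalable"
    if h < 0.4:
        return "weak"
    if h < 0.5:
        return "medium"
    return "strong"


def items_below_cutoff(item_h, cutoff=0.3):
    """Return the names of items whose H_i is smaller than the cutoff."""
    return [name for name, h in item_h.items() if h < cutoff]


# Analysis 1 of Table 4; the item-to-value assignment is assumed.
analysis_1 = {
    "Tremor at rest FLC": 0.28,
    "Tremor at rest RUE": 0.36,
    "Tremor at rest LUE": 0.42,
    "Tremor at rest RLE": 0.34,
    "Tremor at rest LLE": 0.32,
    "Action/postural tremor Right": 0.31,
    "Action/postural tremor Left": 0.30,
}
print(items_below_cutoff(analysis_1), scale_strength(0.34))
```

Checking the item coefficients separately from the scale coefficient reflects the recommendation cited above: a scale H above 0.3 does not by itself justify treating all items as one dimension.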


As can be seen from Table 4 (analysis 1), unidimensionality of all seven items cannot be accepted, since the item Tremor at rest face, lips, chin has an Hi value smaller than 0.3. Even after this item is excluded (analysis 2), the Hi values are not very high. If the results of analysis 2 are compared with those of analyses 3 and 4, the Hi coefficients increase considerably, which indicates that tremor at rest and action/postural tremor are side-sensitive. In addition, the weakest items in both analyses are those related to action/postural tremor, which is reasonable: although action/postural tremor and rest tremor are very similar concepts, they are not the same (see definitions on page 11).

Scale H is the coefficient evaluating the level of unidimensionality (homogeneity) of the whole scale. When the model including all items is compared with the model of two separate scales, the scale H coefficient increases from 0.34 to 0.48 and 0.53, respectively. This indicates that the model of two separate scales fits the data much better. The Rho coefficients obtained from analyses 3 and 4 equal 0.66 and 0.70, respectively, which indicates that the reliability of these scales may be low.

Mokken's scale analysis of items related to rigidity and bradykinesia of extremities

Using the MSPWin program, another part of the MS UPDRS is analyzed, which contains the following items on rigidity and bradykinesia of the extremities:

Rigidity:
1) Head, neck (H/N)
2) Right upper extremity (RUE)
3) Left upper extremity (LUE)
4) Right lower extremity (RLE)
5) Left lower extremity (LLE)
Bradykinesia of the extremities:
6) Finger taps right hand (Right)
7) Finger taps left hand (Left)
8) Hand movements right hand (Right)
9) Hand movements left hand (Left)
10) Rapid alternating movements right hand (Right)
11) Rapid alternating movements left hand (Left)
12) Leg agility right leg (Right)
13) Leg agility left leg (Left)

First, following several studies, the unidimensionality of the items related to rigidity and bradykinesia is tested. The results are shown in Table 5 (analysis 1). Although unidimensionality can be accepted for the cutoff value of Hi = 0.3, this scale is not favored when compared with the results of analyses 2 and 3. In analysis 2, the unidimensionality of the items related to rigidity is tested. The weakest item in analysis 2 is Rigidity head, neck, with Hi = 0.52. The reason may lie in the fact that this item is not formulated as side-dependent. Its Hi value is nevertheless sufficiently high for the item to be included in the scale. In analysis 3, the items related to bradykinesia are tested for unidimensionality. All Hi values of this scale are higher than 0.5 and the scale H equals 0.53; the scale can therefore be considered unidimensional.

Except for the item Rigidity head, neck, all items are formulated as side-dependent. Therefore, the unidimensionality of two scales that reflect the laterality of rigidity and bradykinesia of the extremities is tested in analyses 4 and 5; the scales include the right-sided and the left-sided items, respectively. The item Rigidity head, neck is excluded. Both scales are very homogeneous; the lowest Hi value equals 0.49. This can be interpreted to mean that, for this part of the MS UPDRS, the relationships among items are explained more by the side of the body to which an item refers than by whether the item measures bradykinesia or rigidity.

Table 5 Values of Hi coefficients of the confirmatory Mokken's scale analysis Item Rigidity H/N RUE LUE RLE LLE Finger taps Right Left Hand movements Right Left Rapid altern. mov. Leg agility Right Left Right Left Scale H Reliability (Rho) Cronbach's alpha Analysis 1 0.42 0.35* 0.43 0.45 0.47 0.46 0.49 0.45 0.48 0.44 0.48 0.43 0.44 0.45 0.91 0.56 0.85 Analysis 2 0.52* 0.54 0.57 0.58 0.58 0.51 0.55 0.52 0.55 0.51 0.54 0.53 0.51* 0.53 0.91 0.56 0.87 0.52 0.58 0.64 0.89 0.60 0.67 0.61 0.68 0.65 0.71 0.51 0.62 0.49* 0.56* Analysis 3 Analysis 4 Analysis 5

RUE Right Upper Extremity; LUE Left Upper Extremity; RLE Right Lower Extremity; LLE Left Lower Extremity; H/N Head, Neck; Right Right Extremity; Left Left Extremity. The item with the lowest value of Hi in each subscale is marked by an asterisk

Mokken's scale analysis of the remaining items of MS UPDRS

The following items have not yet been analyzed by the MSPWin program:

Speech/hypomimia:
1) Speech
2) Facial expression
Axial/gait bradykinesia:
3) Arising from chair
4) Posture
5) Gait
6) Postural stability
7) Body bradykinesia and hypokinesia


One may ask why the items Speech and Facial expression are tested for unidimensionality together with the items belonging to the axial/gait bradykinesia concept even though they are not supposed to measure it. The answer is that several studies (Cubo et al., 2000; Stebbins & Goetz, 1998; Stebbins et al., 1999) found Speech, Facial expression and axial/gait bradykinesia to form a single factor. In analysis 1, the unidimensionality of all items listed above is investigated. Except for the item Facial expression, with an Hi value of 0.48, all Hi values are higher than or equal to 0.5.
Table 6 Values of Hi coefficients of the confirmatory Mokken's scale analysis Item Speech Facial expression Arise from chair Posture Gait Postural stability Body bradykinesia Scale H Reliability (Rho) Analysis 1 0.50 0.48* 0.68 0.63 0.57 0.58 0.59 0.58 Analysis 2 0.75 0.70 0.67 0.67 0.66* 0.69 0.91 0.88

Cronbach's alpha 0.90 The item with the lowest value of Hi in each subscale is marked by an asterisk

For analysis 2, the items Speech and Facial expression have been excluded. There are three reasons for excluding them:
1) It is in accordance with the theory. The items Speech and Facial expression are not supposed to measure axial/gait bradykinesia.
2) It is consistent with the findings of the exploratory Mokken's scale analysis. These two items formed a stand-alone dimension for the cutoff criterion of Hi > 0.4 (see Table 3).
3) They were the weakest items in the previous analyses.

Clearly, the scale tested in analysis 2 is the more homogeneous one. Moreover, there was no violation of the non-intersection of the ISRFs, and thus the reliability can be estimated by the Rho coefficient; it equals 0.91.

Summary of Results of Mokken's Scale Analyses

The previous analyses represent the nonparametric item response theory approach to assessing the dimensionality of the Motor Section of the Unified Parkinson's Disease Rating Scale. The exploratory Mokken's scale analysis found three dimensions for the cutoff criterion of Hi > 0.3 and five dimensions for the cutoff criterion of Hi > 0.4. Symptoms of rigidity, bradykinesia of the extremities, axial/gait bradykinesia and speech/hypomimia co-occur. In contrast, tremor seems to be a relatively independent symptom of Parkinson's disease. Further, axial/gait bradykinesia is highly related to the left-sided items of rigidity and bradykinesia of the extremities. Finally, the results of the confirmatory Mokken's scale analysis suggest that tremor, rigidity, and bradykinesia of the extremities are side-sensitive.


Building Structural Equation Models of Parts of MS UPDRS

The aim of this section is to assess the dimensionality, and the structure of the dimensions, within each motor symptom of PD. For this purpose the MS UPDRS is divided into several parts, which are analyzed separately. The results are presented in separate subsections.

Previous studies using exploratory factor analysis found between three and six factors (dimensions) in the UPDRS (Cubo et al., 2000; Martignoni et al., 2003; Martinez-Martin et al., 1994; Stebbins & Goetz, 1998; Stebbins et al., 1999). All of these studies found that the MS UPDRS includes the concepts of rigidity, bradykinesia and tremor; concerning the other factors, the studies vary.

Since the level of measurement is ordinal and the sample size is relatively small, Jöreskog and Sörbom (1993) recommend analyzing the matrix of polychoric correlations of the data and using the Diagonally Weighted Least Squares (DWLS) method for parameter estimation. Thus, DWLS is used for all following analyses in this section. As recommended by Boomsma (2000), correlation matrices, path diagrams, substantive goodness of fit indices, summaries of the standard errors of the estimates, and residual matrices are reported. For selected models, a full report of the parameter estimates, the standard errors of the estimates and the t-values (a t-value is the ratio of a parameter estimate to its standard error) can be found in the appendix.
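The t-value defined above is a simple ratio, sketched below. The loading and standard error are hypothetical numbers chosen for illustration, not estimates from this study, and the comparison with 1.96 is the conventional large-sample two-sided 5% criterion rather than a rule stated in the text.

```python
def t_value(estimate, standard_error):
    """Ratio of a parameter estimate to its standard error."""
    return estimate / standard_error


# Hypothetical factor loading and standard error (not from the appendix).
loading = 0.80
loading_se = 0.04
t = t_value(loading, loading_se)
print(round(t, 1), abs(t) > 1.96)
```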

Building SEM model of items related to tremor

During the development of the MS UPDRS, the following items were designed to evaluate the level of tremor:

Tremor at rest:
1) Face, lips, chin (TrFLC)
2) Right upper extremity (TrRUE)
3) Left upper extremity (TrLUE)
4) Right lower extremity (TrRLE)
5) Left lower extremity (TrLLE)
Action/postural tremor:
6) Right hand (ATrRhand)
7) Left hand (ATrLhand)

In the following section, four theoretically meaningful models are presented, based on the analysis of the correlation matrix shown in Table 7.
Table 7 Matrix of polychoric correlations of items related to tremor (N=405)

              TrFLC    TrRUE    TrLUE    TrRLE    TrLLE ATrRhand ATrLhand
TrFLC          1.00
TrRUE          0.43     1.00
TrLUE          0.57     0.52     1.00
TrRLE          0.46     0.64     0.42     1.00
TrLLE          0.37     0.18     0.69     0.68     1.00
ATrRhand       0.12     0.55     0.25     0.37     0.04     1.00
ATrLhand       0.26     0.13     0.60    -0.02     0.45     0.60     1.00
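The side-sensitivity suggested by the Mokken analyses can be checked informally against Table 7 by comparing the average correlation between items on the same body side with the average between items on opposite sides. This is only a descriptive sketch: the values are transcribed from Table 7 under the assumption that the flattened table lists each column of the lower triangle in turn, TrFLC is left out because it has no side, and the names `CORRELATIONS` and `SIDE` are hypothetical.

```python
from statistics import mean

# Body side of each lateralized tremor item.
SIDE = {
    "TrRUE": "R", "TrLUE": "L", "TrRLE": "R",
    "TrLLE": "L", "ATrRhand": "R", "ATrLhand": "L",
}

# Polychoric correlations read from Table 7 (assumed transcription).
CORRELATIONS = {
    ("TrRUE", "TrLUE"): 0.52, ("TrRUE", "TrRLE"): 0.64,
    ("TrRUE", "TrLLE"): 0.18, ("TrRUE", "ATrRhand"): 0.55,
    ("TrRUE", "ATrLhand"): 0.13, ("TrLUE", "TrRLE"): 0.42,
    ("TrLUE", "TrLLE"): 0.69, ("TrLUE", "ATrRhand"): 0.25,
    ("TrLUE", "ATrLhand"): 0.60, ("TrRLE", "TrLLE"): 0.68,
    ("TrRLE", "ATrRhand"): 0.37, ("TrRLE", "ATrLhand"): -0.02,
    ("TrLLE", "ATrRhand"): 0.04, ("TrLLE", "ATrLhand"): 0.45,
    ("ATrRhand", "ATrLhand"): 0.60,
}

same_side = mean(r for (a, b), r in CORRELATIONS.items() if SIDE[a] == SIDE[b])
cross_side = mean(r for (a, b), r in CORRELATIONS.items() if SIDE[a] != SIDE[b])
print(round(same_side, 2), round(cross_side, 2))
```

The same-side average is higher, but some opposite-side pairs (for example the two lower extremities and the two hands in action/postural tremor) also correlate strongly, so laterality alone is unlikely to account for all of the covariation.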

The unidimensionality of all seven items is tested first. The factor is labeled Tremor. Although this structure has not been reported before, it could be meaningful, since all items are related to tremor.
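The degrees of freedom of such a model follow from counting: the number of distinct variances and covariances of the observed items minus the number of free parameters. The sketch below assumes that each factor is identified by fixing its variance to 1, so that a one-factor model has one loading and one error variance per item; this identification choice and the function name are assumptions, although the resulting counts agree with the degrees of freedom reported for the one-factor and the first two-factor tremor model (14 and 13).

```python
def degrees_of_freedom(n_items, n_free_parameters):
    """Distinct variances and covariances minus free parameters."""
    n_moments = n_items * (n_items + 1) // 2
    return n_moments - n_free_parameters


n_items = 7

# One factor: 7 loadings + 7 error variances.
one_factor_df = degrees_of_freedom(n_items, 2 * n_items)

# Two correlated factors: the same 14 parameters + 1 factor correlation.
two_factor_df = degrees_of_freedom(n_items, 2 * n_items + 1)

print(one_factor_df, two_factor_df)
```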

Fig. 7. Path diagram of the one-factor model of tremor

Goodness of Fit Indices and Standard Errors Summary:

Sample size = 405
Degrees of Freedom = 14
Satorra-Bentler Scaled Chi-Square = 140.53 (P = 0.0)
Root Mean Square Error of Approximation (RMSEA) = 0.15
90 Percent Confidence Interval for RMSEA = (0.13; 0.17)
Model AIC = 168.53
Expected Cross-Validation Index (ECVI) = 0.42
Normed Fit Index (NFI) = 0.92
Comparative Fit Index (CFI) = 0.92
Standardized Root Mean Square Residual (SRMR) = 0.17
Goodness of Fit Index (GFI) = 0.95
Fitted Residuals: Range = <-0.47; 0.25>; Median = 0.00
St. Errors: Range = <0.04; 0.13>; Median = 0.08; St. Deviation = 0.03

Table 8 Fitted residuals

              TrFLC    TrRUE    TrLUE    TrRLE    TrLLE ATrRhand ATrLhand
TrFLC          0.00
TrRUE          0.04     0.00
TrLUE          0.09    -0.02     0.00
TrRLE          0.02     0.16    -0.18     0.00
TrLLE         -0.06    -0.31     0.09     0.15     0.00
ATrRhand      -0.21     0.18    -0.22    -0.05    -0.38     0.00
ATrLhand      -0.10    -0.28     0.10    -0.47     0.00     0.25     0.00
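The summary line for the fitted residuals can be reproduced from the entries of Table 8. The sketch below lists the lower triangle including the diagonal, assuming the flattened table gives each column in turn; the 0.10 threshold used to count large residuals is an arbitrary illustrative choice, not a criterion from the text.

```python
from statistics import median

# Lower triangle of Table 8, diagonal included, column by column.
residuals = [
    0.00, 0.04, 0.09, 0.02, -0.06, -0.21, -0.10,  # TrFLC
    0.00, -0.02, 0.16, -0.31, 0.18, -0.28,        # TrRUE
    0.00, -0.18, 0.09, -0.22, 0.10,               # TrLUE
    0.00, 0.15, -0.05, -0.47,                     # TrRLE
    0.00, -0.38, 0.00,                            # TrLLE
    0.00, 0.25,                                   # ATrRhand
    0.00,                                         # ATrLhand
]

smallest = min(residuals)
largest = max(residuals)
middle = median(residuals)
n_large = sum(1 for r in residuals if abs(r) > 0.10)
print(smallest, largest, middle, n_large)
```

The minimum, maximum and median agree with the reported range <-0.47; 0.25> and median 0.00, which supports the column-wise reading of the table.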

Consistent with previous studies, this part of the MS UPDRS cannot be considered unidimensional. According to the rules of thumb for goodness of fit indices, almost all of them suggest rejecting this model. In addition, the values in the residual matrix are very high, especially for the items ATrRhand and ATrLhand. This motivates the analysis of another structure, described below.

In accordance with other studies (Cubo et al., 2000; Stebbins & Goetz, 1998; Stebbins et al., 1999), a two-factor structure is tested. One factor is used for all items related to the tremor at rest concept (again denoted as Tremor) and a second factor for the items related to the action/postural tremor concept (denoted as ATremor). The two factors are allowed to correlate because both are related to the tremor symptom.
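The RMSEA reported for the one-factor model can be recovered from the chi-square, the degrees of freedom and the sample size. The sketch assumes the common point-estimate formula with N - 1 in the denominator; that this is the exact formula used by the software is an assumption, although it reproduces the reported value of 0.15.

```python
from math import sqrt


def rmsea(chi_square, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# One-factor tremor model: chi-square 140.53, df 14, N 405.
one_factor_rmsea = rmsea(140.53, 14, 405)
print(round(one_factor_rmsea, 2))
```

Because the excess of the chi-square over its degrees of freedom is divided by the degrees of freedom, a model with fewer degrees of freedom can show a larger RMSEA even when its chi-square is of similar size.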


Fig. 8. Path diagram of the two-factor model of tremor

Goodness of Fit Indices and Standard Errors Summary:


Sample size = 405
Degrees of Freedom = 13
Satorra-Bentler Scaled Chi-Square = 145.46 (P = 0.0)
Root Mean Square Error of Approximation (RMSEA) = 0.16
90 Percent Confidence Interval for RMSEA = (0.14; 0.18)
Model AIC = 175.46
Expected Cross-Validation Index (ECVI) = 0.43
Normed Fit Index (NFI) = 0.91
Comparative Fit Index (CFI) = 0.92
Standardized Root Mean Square Residual (SRMR) = 0.15
Goodness of Fit Index (GFI) = 0.96
Fitted Residuals: Range = <-0.39; 0.25>; Median = 0.00
St. Errors: Range = <0.04; 0.14>; Median = 0.09; St. Deviation = 0.03

Table 9 Fitted residuals

              TrFLC    TrRUE    TrLUE    TrRLE    TrLLE ATrRhand ATrLhand
TrFLC          0.00
TrRUE          0.02     0.00
TrLUE          0.06    -0.06     0.00
TrRLE          0.00     0.14    -0.22     0.00
TrLLE         -0.08    -0.33     0.06     0.13     0.00
ATrRhand      -0.15     0.25    -0.13     0.03    -0.30     0.00
ATrLhand      -0.03    -0.21     0.18    -0.39     0.08     0.00     0.00


Compared with the previous model, GFI, SRMR and the fitted residuals improved slightly. However, RMSEA and the Satorra-Bentler chi-square are worse, and the fit of the model remains unacceptable.

Because the tremor items have been formulated as extremity-dependent, another correlated two-factor structure is tested. One factor is used for the right-sided items (denoted as TremorR) and one for the left-sided items (TremorL; see Figure 9). The item Tremor at rest face, lips, chin has been excluded since it does not depend on laterality. This structure was the final accepted one in the Mokken's scale analysis.

Fig. 9. Path diagram of the two-factor model of tremor

Goodness of Fit Indices and Standard Errors Summary:


Sample size = 405
Degrees of Freedom = 8
Satorra-Bentler Scaled Chi-Square = 150.44 (P = 0.0)
Root Mean Square Error of Approximation (RMSEA) = 0.21
90 Percent Confidence Interval for RMSEA = (0.18; 0.24)
Model AIC = 176.44
Expected Cross-Validation Index (ECVI) = 0.44
Normed Fit Index (NFI) = 0.89
Comparative Fit Index (CFI) = 0.89
Standardized Root Mean Square Residual (SRMR) = 0.16
Goodness of Fit Index (GFI) = 0.96
Fitted Residuals: Range = <-0.38; 0.30>; Median = 0.00
St. Errors: Range = <0.04; 0.15>; Median = 0.10; St. Deviation = 0.04

Table 10 Fitted residuals

              TrRUE    TrLUE    TrRLE    TrLLE ATrRhand ATrLhand
TrRUE          0.00
TrLUE          0.08     0.00
TrRLE          0.03    -0.04     0.00
TrLLE         -0.21     0.01     0.27     0.00
ATrRhand       0.04    -0.14    -0.17    -0.31     0.00
ATrLhand      -0.21     0.01    -0.38    -0.07     0.30     0.00

Surprisingly, the fit of this model is worse than that of the previous two models, and the model cannot be accepted.

Another structure is shown in Figure 10. As mentioned earlier, except for the item Tremor at rest face, lips, chin (TrFLC), all items are formulated as extremity-dependent. Therefore a hierarchical structure that contains sub-dimensions for the right-sided items (TremorR) and the left-sided items (TremorL) of tremor is meaningful. The item Tremor at rest face, lips, chin is related directly to the general tremor concept. Note that up to this point factor analysis, rather than full structural modeling, has been employed. The next analysis, however, tests a full structural model.

Fig. 10. Path diagram of the hierarchical model of tremor

Goodness of Fit Indices and Standard Errors Summary:

Sample size = 405
Degrees of Freedom = 12
Satorra-Bentler Scaled Chi-Square = 127.86 (P = 0.0)
Root Mean Square Error of Approximation (RMSEA) = 0.15
90 Percent Confidence Interval for RMSEA = (0.13; 0.18)
Model AIC = 159.86
Expected Cross-Validation Index (ECVI) = 0.40
Normed Fit Index (NFI) = 0.93
Comparative Fit Index (CFI) = 0.93
Standardized Root Mean Square Residual (SRMR) = 0.15
Goodness of Fit Index (GFI) = 0.96
Fitted Residuals: Range = <-0.37; 0.32>; Median = 0.00
St. Errors: Range = <0.07; 0.25>; Median = 0.14; St. Deviation = 0.05

Table 11 Fitted residuals

              TrFLC    TrRUE    TrLUE    TrRLE    TrLLE ATrRhand ATrLhand
TrFLC          0.00
TrRUE          0.06     0.00
TrLUE          0.05     0.06     0.00
TrRLE          0.06     0.01    -0.06     0.00
TrLLE         -0.06    -0.20     0.00     0.28     0.00
ATrRhand      -0.19     0.06    -0.13    -0.15    -0.28     0.00
ATrLhand      -0.11    -0.20     0.01    -0.37    -0.04     0.32     0.00

This structure matches the theory best. As can be seen, however, the fit remains poor, even though CFI and the chi-square improved slightly compared with the structures analyzed earlier in this section.

In general, an acceptable structure for this part of the MS UPDRS has not been found. This could be an effect of the relatively small sample size and/or of the highly kurtotic items. Nevertheless, it does not mean that such structures cannot be used to build the SEM model of the whole MS UPDRS, since only the correlations among the tremor items were included in the previous analyses. Correlations between tremor items and items belonging to other symptoms of PD may still be well explained by these structures. Consequently, the fit of the model of the whole MS UPDRS can be good enough even if the residuals among the tremor items remain relatively high.


Building SEM model of items related to rigidity and bradykinesia of extremities

In terms of content validity, the following items are related to the concepts of rigidity and bradykinesia of the extremities:

Rigidity:
1) Head, neck (Neck)
2) Right upper extremity (RUE)
3) Left upper extremity (LUE)
4) Right lower extremity (RLE)
5) Left lower extremity (LLE)
Bradykinesia of the extremities:
6) Finger taps - right hand (FRhand)
7) Finger taps - left hand (FLhand)
8) Hand movements - right hand (HRhand)
9) Hand movements - left hand (HLhand)
10) Rapid alternating movements - right hand (RRhand)
11) Rapid alternating movements - left hand (RLhand)
12) Leg agility - right leg (LRleg)
13) Leg agility - left leg (LLleg)
Table 12 Matrix of polychoric correlations of items related to rigidity and bradykinesia (N=405)

            Neck     RUE     LUE     RLE     LLE  FRhand  FLhand
Neck        1.00
RUE         0.53    1.00
LUE         0.59    0.53    1.00
RLE         0.61    0.75    0.51    1.00
LLE         0.60    0.45    0.82    0.73    1.00
FRhand      0.46    0.51    0.27    0.48    0.26    1.00
FLhand      0.41    0.22    0.58    0.30    0.56    0.59    1.00
HRhand      0.38    0.49    0.27    0.44    0.25    0.85    0.47
HLhand      0.37    0.18    0.56    0.28    0.53    0.45    0.86
RRhand      0.40    0.51    0.28    0.48    0.27    0.77    0.46
RLhand      0.40    0.20    0.58    0.29    0.55    0.40    0.82
LRleg       0.37    0.37    0.24    0.44    0.32    0.63    0.45
LLleg       0.39    0.18    0.46    0.34    0.53    0.39    0.72

Table 12 (continued)

          HRhand  HLhand  RRhand  RLhand   LRleg   LLleg
HRhand      1.00
HLhand      0.59    1.00
RRhand      0.80    0.48    1.00
RLhand      0.45    0.86    0.58    1.00
LRleg       0.65    0.49    0.71    0.50    1.00
LLleg       0.39    0.70    0.46    0.74    0.77    1.00

There are at least two substantive reasons to analyze these two parts of the MS UPDRS together:
1) Rigidity and bradykinesia are related concepts; it is reasonable to assume that people who suffer from rigidity of muscles will probably have slower movements (bradykinesia).
2) Only five items measure rigidity. From the identification point of view, this can be a problem when one wants to fit more complicated structures within the rigidity items.

A two-factor model for these items was presented in the study of Cubo et al. (2000): one factor for rigidity (labeled Rig) and one for bradykinesia of the extremities (labeled Brad). This structure was therefore put to an empirical test. The results are shown in Figure 11 and the corresponding residual matrix in Table 13. For the reasons mentioned above, the factors are allowed to correlate.


Fig. 11. Path diagram of the two-factor model of rigidity and bradykinesia
Goodness of Fit Indices and Standard Errors Summary:
Sample size = 405
Degrees of Freedom = 64
Satorra-Bentler Scaled Chi-Square = 1994.43 (P = 0.0)
Root Mean Square Error of Approximation (RMSEA) = 0.27
90 Percent Confidence Interval for RMSEA = (0.26; 0.28)
Model AIC = 2048.43
Expected Cross-Validation Index (ECVI) = 5.07
Normed Fit Index (NFI) = 0.77
Comparative Fit Index (CFI) = 0.77
Standardized Root Mean Square Residual (SRMR) = 0.13
Goodness of Fit Index (GFI) = 0.96
Fitted Residuals: Range = <-0.30; 0.19>; Median = 0.00
St. Errors: Range = <0.02; 0.11>; Median = 0.04; St. Deviation = 0.04

Table 13 Fitted residuals

            Neck     RUE     LUE     RLE     LLE  FRhand  FLhand
Neck        0.00
RUE         0.00    0.00
LUE        -0.04   -0.08    0.00
RLE         0.01    0.16   -0.18    0.00
LLE        -0.05   -0.18    0.07    0.01    0.00
FRhand      0.09    0.16   -0.15    0.08   -0.18    0.00
FLhand      0.02   -0.16    0.14   -0.13    0.10   -0.11    0.00
HRhand      0.01    0.13   -0.15    0.03   -0.19    0.19   -0.24
HLhand     -0.03   -0.21    0.10   -0.16    0.05   -0.27    0.09
RRhand      0.04    0.15   -0.14    0.07   -0.17    0.12   -0.25
RLhand      0.01   -0.19    0.13   -0.15    0.08   -0.30    0.06
LRleg       0.02    0.03   -0.16    0.05   -0.09    0.02   -0.21
LLleg       0.03   -0.17    0.05   -0.06    0.10   -0.24    0.04

Table 13 (continued)

          HRhand  HLhand  RRhand  RLhand   LRleg   LLleg
HRhand      0.00
HLhand     -0.14    0.00
RRhand      0.14   -0.24    0.00
RLhand     -0.27    0.08   -0.13    0.00
LRleg       0.02   -0.19    0.09   -0.17    0.00
LLleg      -0.26    0.00   -0.18    0.05    0.16    0.00

This model is not acceptable, since the fit is poor.

Except for the item Rigidity head, neck, all other 12 items are again formulated as extremity-dependent. Therefore a hierarchical structure with two sub-dimensions accounting for laterality is tested. The factor for the right-sided items is denoted as RigBradR and the factor for the left-sided items as RigBradL. The item Rigidity head, neck belongs to the general factor, which is denoted as RigBrad.

Fig. 12. Path diagram of the hierarchical model of rigidity and bradykinesia

Goodness of Fit Indices and Standard Errors Summary:


Sample size = 405
Degrees of Freedom = 63
Satorra-Bentler Scaled Chi-Square = 1445.49 (P = 0.0)
Root Mean Square Error of Approximation (RMSEA) = 0.23
90 Percent Confidence Interval for RMSEA = (0.22; 0.24)
Model AIC = 1501.49
Expected Cross-Validation Index (ECVI) = 3.72
Normed Fit Index (NFI) = 0.83
Comparative Fit Index (CFI) = 0.84
Standardized Root Mean Square Residual (SRMR) = 0.12
Goodness of Fit Index (GFI) = 0.96
Fitted Residuals: Range = <-0.21; 0.34>; Median = -0.05
St. Errors: Range = <0.04; 0.42>; Median = 0.14; St. Deviation = 0.12

Table 14 Fitted residuals

            Neck     RUE     LUE     RLE     LLE  FRhand  FLhand
Neck        0.00
RUE         0.20    0.00
LUE         0.25    0.14    0.00
RLE         0.11    0.21    0.34    0.00
LLE        -0.05   -0.16   -0.16   -0.18    0.00
FRhand     -0.16   -0.09   -0.13   -0.14    0.10    0.00
FLhand     -0.07   -0.16   -0.21   -0.20    0.12   -0.03    0.00
HRhand     -0.21   -0.13   -0.16   -0.19   -0.05    0.06    0.09
HLhand     -0.05   -0.14   -0.16   -0.17    0.05   -0.03    0.07
RRhand     -0.18   -0.09   -0.15   -0.15   -0.09    0.04   -0.05
RLhand     -0.17   -0.17   -0.18   -0.10   -0.06   -0.02   -0.05
LRleg      -0.17   -0.17   -0.07   -0.13   -0.07   -0.01   -0.08
LLleg       0.14    0.16    0.18    0.15   -0.04   -0.09   -0.12

Table 14 (continued)

          HRhand  HLhand  RRhand  RLhand   LRleg   LLleg
HRhand      0.00
HLhand     -0.01    0.00
RRhand      0.06    0.10    0.00
RLhand      0.01    0.02    0.03    0.00
LRleg      -0.04    0.01    0.02    0.33    0.00
LLleg      -0.14   -0.09   -0.09   -0.11   -0.08    0.00

The fit has improved slightly, but the model still has to be rejected.

A number of studies (Stebbins & Goetz, 1998; Stebbins et al., 1999) found a three-factor structure: one factor for rigidity (Rig), and a second and a third factor for left (LBrad) and right (RBrad) bradykinesia, respectively. This structure is presented in Figure 13 and in Table 15. All three factors are correlated.


Fig. 13. Path diagram of the three-factor model of rigidity and bradykinesia
Goodness of Fit Indices and Standard Errors Summary:
Sample size = 405
Degrees of Freedom = 62
Satorra-Bentler Scaled Chi-Square = 815.89 (P = 0.0)
Root Mean Square Error of Approximation (RMSEA) = 0.17
90 Percent Confidence Interval for RMSEA = (0.16; 0.18)
Model AIC = 873.89
Expected Cross-Validation Index (ECVI) = 2.16
Normed Fit Index (NFI) = 0.91
Comparative Fit Index (CFI) = 0.91
Standardized Root Mean Square Residual (SRMR) = 0.098
Goodness of Fit Index (GFI) = 0.98
Fitted Residuals: Range = <-0.21; 0.28>; Median = 0.00
St. Errors: Range = <0.02; 0.11>; Median = 0.10; St. Deviation = 0.04

Table 15 Fitted residuals

            Neck     RUE     LUE     RLE     LLE  FRhand  FLhand
Neck        0.00
RUE         0.00    0.00
LUE        -0.04   -0.07    0.00
RLE         0.01    0.16   -0.18    0.00
LLE        -0.05   -0.18    0.07    0.01    0.00
FRhand      0.10    0.17   -0.14    0.09   -0.17    0.00
FLhand      0.01   -0.17    0.13   -0.14    0.09    0.05    0.00
HRhand      0.02    0.14   -0.14    0.04   -0.18    0.08   -0.07
HLhand     -0.04   -0.21    0.10   -0.17    0.05   -0.10    0.03
RRhand      0.05    0.16   -0.13    0.09   -0.15    0.01   -0.08
RLhand      0.00   -0.19    0.13   -0.15    0.08   -0.13    0.00
LRleg       0.02    0.03   -0.16    0.06   -0.09   -0.11   -0.07
LLleg       0.02   -0.18    0.03   -0.07    0.08   -0.11   -0.05

Table 15 (continued)

          HRhand  HLhand  RRhand  RLhand   LRleg   LLleg
HRhand      0.00
HLhand      0.04    0.00
RRhand      0.03   -0.06    0.00
RLhand     -0.10    0.03    0.05    0.00
LRleg      -0.10   -0.04   -0.03   -0.02    0.00
LLleg      -0.12   -0.07   -0.04   -0.02    0.28    0.00

Although the fit has improved and this structure was presented in a number of studies, our analyses do not support it. The reason is that high residuals cluster among the rigidity items, which indicates that something remains unexplained for these items. Therefore it is reasonable to add a new factor of bradykinesia (labeled Brad; see the path diagram below), which is also in line with the theory. This factor is correlated with the factor of rigidity.

Fig. 14. Path diagram of the four-factor model of rigidity and bradykinesia

Goodness of Fit Indices and Standard Errors Summary:


Sample size = 405
Degrees of Freedom = 52
Satorra-Bentler Scaled Chi-Square = 430.69 (P = 0.0)
Root Mean Square Error of Approximation (RMSEA) = 0.13
90 Percent Confidence Interval for RMSEA = (0.12; 0.15)
Model AIC = 508.69
Expected Cross-Validation Index (ECVI) = 1.26
Normed Fit Index (NFI) = 0.95
Comparative Fit Index (CFI) = 0.96
Standardized Root Mean Square Residual (SRMR) = 0.054
Goodness of Fit Index (GFI) = 0.99
Fitted Residuals: Range = <-0.14; 0.14>; Median = 0.00
St. Errors: Range = <0.03; 0.12>; Median = 0.06; St. Deviation = 0.03

Table 16 Fitted residuals

            Neck     RUE     LUE     RLE     LLE  FRhand  FLhand
Neck        0.00
RUE        -0.02    0.00
LUE        -0.01    0.07    0.00
RLE        -0.09    0.05   -0.07    0.00
LLE        -0.07   -0.06    0.02    0.07    0.00
FRhand      0.14   -0.01    0.01   -0.02   -0.04    0.00
FLhand      0.10   -0.02    0.00   -0.01    0.00    0.12    0.00
HRhand      0.06   -0.04    0.01   -0.06   -0.04    0.02    0.00
HLhand      0.06   -0.06   -0.03   -0.02   -0.05   -0.02    0.01
RRhand      0.07    0.02    0.00    0.00   -0.04   -0.02   -0.04
RLhand      0.09   -0.04    0.00   -0.01   -0.01   -0.06   -0.02
LRleg      -0.02    0.01   -0.09    0.02   -0.04   -0.02   -0.13
LLleg       0.04   -0.09   -0.01   -0.01    0.05   -0.13   -0.01

Table 16 (continued)

          HRhand  HLhand  RRhand  RLhand   LRleg   LLleg
HRhand      0.00
HLhand      0.13    0.00
RRhand      0.01   -0.01    0.00
RLhand     -0.02    0.01    0.09    0.00
LRleg      -0.01   -0.09    0.02   -0.08    0.00
LLleg      -0.14   -0.03   -0.10    0.02    0.11    0.00

Although the value of RMSEA is larger than the cutoff value of 0.05, the other fit indices and the residuals suggest that this model explains the intercorrelations sufficiently. In addition, this measurement model matches the theory best. In other words, this model can be accepted from both a statistical and a substantive point of view. Generally, the results suggest that rigidity and bradykinesia are stand-alone concepts even though they correlate. In addition, except for the item Rigidity head, neck (Neck), the items relating to rigidity and bradykinesia are side-sensitive. In other words, symptoms of rigidity and bradykinesia co-occur and show more on one side of the body than on the other.

Some other structures that might suit the theory were evaluated, but identification problems were found in all of them and therefore it was not possible to obtain any estimates.
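As an aid to reading the fit summaries, the following minimal sketch recomputes the RMSEA point estimate from the reported chi-square, degrees of freedom and sample size. It assumes the common formula sqrt(max(chi-square - df, 0) / (df (N - 1))); the function name is hypothetical, and the thesis does not state which exact variant LISREL applied, although this formula agrees with the rounded values reported.

```python
import math


def rmsea(chi_square, df, n):
    """RMSEA point estimate under the assumed formula
    sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


if __name__ == "__main__":
    # Four-factor model of rigidity and bradykinesia:
    # Satorra-Bentler chi-square = 430.69, df = 52, N = 405
    print(round(rmsea(430.69, 52, 405), 2))
```

Because the numerator grows with the excess of the chi-square over its degrees of freedom, a model with few degrees of freedom can show a large RMSEA even when its residuals are small, which is one reason to read this index together with the residual matrix.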

Building SEM model of items related to axial/gait bradykinesia

An attempt was made to find the factor structure of the following items of the MS UPDRS related to axial/gait bradykinesia:

1) Arising from chair (Arising)
2) Posture (Posture)
3) Gait (Gait)
4) Postural stability (Stabil)
5) Body bradykinesia and hypokinesia (BodyBra)

Table 17 Matrix of polychoric correlations of items related to axial/gait bradykinesia (N=405)

          Arising  Posture     Gait   Stabil  BodyBra
Arising      1.00     0.78     0.79     0.83     0.70
Posture               1.00     0.75     0.75     0.73
Gait                           1.00     0.72     0.69
Stabil                                  1.00     0.68
BodyBra                                          1.00

According to the theory, all items should measure the so-called axial/gait bradykinesia (the factor is henceforth denoted BBrad). Thus the one-factor structure is evaluated (see Figure 15).


Fig. 15. Path diagram of the one-factor model of axial/gait bradykinesia

Goodness of Fit Indices and Standard Errors Summary:

Sample size = 405
Degrees of Freedom = 5
Satorra-Bentler Scaled Chi-Square = 9.64 (P = 0.086)
Root Mean Square Error of Approximation (RMSEA) = 0.048
90 Percent Confidence Interval for RMSEA = (0.0; 0.093)
Normed Fit Index (NFI) = 1.00
Comparative Fit Index (CFI) = 1.00
Standardized Root Mean Square Residual (SRMR) = 0.017
Goodness of Fit Index (GFI) = 1.00
Fitted Residuals: Range = <-0.03; 0.03>; Median = 0.00
St. Errors: Range = <0.02; 0.11>; Median = 0.06; St. Deviation = 0.04

Table 18 Fitted residuals

          Arising  Posture     Gait   Stabil  BodyBra
Arising      0.00    -0.02     0.00     0.03    -0.03
Posture               0.00     0.01    -0.01     0.03
Gait                           0.00    -0.03     0.01
Stabil                                  0.00    -0.01
BodyBra                                          0.00

This unidimensional model provides acceptable values of all fit indices, and the values in the residual matrix are close to zero. The fit of this unidimensional model is very good, and thus the model is rejected neither from the statistical nor from the theoretical point of view. The AIC and ECVI fit indices are not reported since there is no meaningful alternative model for comparison.
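To illustrate what a one-factor structure implies for the correlations in Table 17, the sketch below estimates each loading from averaged "triads" (r_ij r_ik / r_jk), rebuilds the implied correlations as products of loadings, and inspects the residuals. This is a simple textbook heuristic chosen for transparency, not the DWLS estimator used by LISREL, so its loadings and residuals are only an approximation of the reported solution; all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

ITEMS = ["Arising", "Posture", "Gait", "Stabil", "BodyBra"]

# Polychoric correlations from Table 17 (upper triangle)
R = {
    ("Arising", "Posture"): 0.78, ("Arising", "Gait"): 0.79,
    ("Arising", "Stabil"): 0.83, ("Arising", "BodyBra"): 0.70,
    ("Posture", "Gait"): 0.75, ("Posture", "Stabil"): 0.75,
    ("Posture", "BodyBra"): 0.73, ("Gait", "Stabil"): 0.72,
    ("Gait", "BodyBra"): 0.69, ("Stabil", "BodyBra"): 0.68,
}


def corr(a, b):
    """Look up a correlation regardless of the order of the pair."""
    return R[(a, b)] if (a, b) in R else R[(b, a)]


def triad_loadings(items):
    """Estimate one-factor loadings: under the model r_ij = l_i * l_j,
    so r_ij * r_ik / r_jk estimates l_i squared; average over all triads."""
    loadings = {}
    for i in items:
        others = [x for x in items if x != i]
        triads = [corr(i, j) * corr(i, k) / corr(j, k)
                  for j, k in combinations(others, 2)]
        loadings[i] = sqrt(sum(triads) / len(triads))
    return loadings


def residuals(items, loadings):
    """Observed minus model-implied correlation for every item pair."""
    return {(a, b): corr(a, b) - loadings[a] * loadings[b]
            for a, b in combinations(items, 2)}


if __name__ == "__main__":
    lam = triad_loadings(ITEMS)
    for name in ITEMS:
        print(f"{name:8s} loading = {lam[name]:.2f}")
    for pair, value in residuals(ITEMS, lam).items():
        print(pair, f"{value:+.2f}")
```

Under this heuristic all residuals stay small, which is in line with the conclusion that a single factor accounts for the five axial/gait items.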

Building Structural Equation Model of Entire MS UPDRS

So far, only parts of the MS UPDRS have been examined. In this section all 27 items are analyzed in an effort to build a structural equation model of the motor symptoms of Parkinson's disease and thereby to discover their structure. First, an attempt was made to confirm the two structures suggested by Mokken's scale analysis (see Tables 2 and 3). The three-dimensional one is shown in Figure 16, the five-dimensional one in Figure 17. Note that the three-dimensional model does not include the item Tremor at rest face, lips, chin, and the five-dimensional one does not include the items Tremor at rest face, lips, chin and Rigidity head, neck. These items were excluded by the MSPWin program since they did not fit any of the subscales (dimensions) (see the results of Mokken's scale analysis). Since MSA only determines the number of dimensions (factors) but not the relationships (correlations) among them, correlations among factors are added in the following two analyses so that they suit the theory about the symptoms of PD. No residual matrices are reported for the following analyses since they would lengthen the thesis disproportionately.


Table 19 Matrix of polychoric correlations of the Motor Section of the UPDRS (N=405)

                1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27
 1 Speech    1.00  0.70  0.04 -0.05 -0.05 -0.19 -0.27 -0.03 -0.02  0.35  0.23  0.25  0.36  0.32  0.38  0.38  0.37  0.37  0.42  0.42  0.48  0.46  0.54  0.55  0.45  0.49  0.51
 2 Facial          1.00  0.19 -0.06  0.07 -0.01 -0.08  0.02  0.09  0.42  0.36  0.38  0.36  0.37  0.28  0.35  0.26  0.32  0.33  0.38  0.35  0.42  0.49  0.48  0.38  0.41  0.52
 3 TrFLC                 1.00  0.43  0.57  0.46  0.37  0.12  0.26  0.20  0.00  0.07  0.02 -0.03  0.08  0.21  0.10  0.15  0.15  0.23  0.08  0.01  0.06  0.22  0.15  0.09  0.25
 4 TrRUE                       1.00  0.52  0.64  0.18  0.55  0.13  0.06  0.32 -0.01  0.18 -0.09  0.20 -0.07  0.19 -0.07  0.20 -0.08  0.12 -0.13 -0.05 -0.07  0.01 -0.06  0.03
 5 TrLUE                             1.00  0.42  0.69  0.25  0.60  0.13  0.00  0.32  0.05  0.23 -0.03  0.23 -0.04  0.20 -0.02  0.26  0.04  0.17  0.12  0.02  0.11  0.06  0.11
 6 TrRLE                                   1.00  0.68  0.37 -0.02  0.09  0.17  0.00  0.18  0.03 -0.01 -0.04 -0.05 -0.18  0.03 -0.12  0.00 -0.06 -0.03 -0.11 -0.05 -0.02 -0.01
 7 TrLLE                                         1.00  0.04  0.45  0.04 -0.12  0.22 -0.07  0.26 -0.11  0.20 -0.20  0.09 -0.11  0.18  0.01  0.16 -0.01 -0.01  0.03 -0.03  0.03
 8 ATrRhand                                            1.00  0.60  0.15  0.35  0.09  0.19  0.00  0.16 -0.04  0.19  0.03  0.19 -0.04  0.10 -0.06  0.07  0.00  0.01  0.04  0.06
 9 ATrLhand                                                  1.00  0.19 -0.02  0.34  0.01  0.23 -0.05  0.21  0.01  0.26  0.01  0.25  0.00  0.15  0.13  0.15  0.14  0.11  0.16
10 Neck                                                            1.00  0.53  0.59  0.61  0.60  0.46  0.41  0.38  0.37  0.40  0.40  0.37  0.39  0.40  0.42  0.40  0.38  0.46
11 RUE                                                                   1.00  0.53  0.75  0.45  0.51  0.22  0.49  0.18  0.51  0.20  0.37  0.18  0.25  0.30  0.31  0.27  0.35
12 LUE                                                                         1.00  0.51  0.82  0.27  0.58  0.27  0.56  0.28  0.58  0.24  0.46  0.40  0.40  0.38  0.36  0.46
13 RLE                                                                               1.00  0.73  0.48  0.30  0.44  0.28  0.48  0.29  0.44  0.34  0.36  0.37  0.37  0.31  0.43
14 LLE                                                                                     1.00  0.26  0.56  0.25  0.53  0.27  0.55  0.32  0.53  0.40  0.44  0.43  0.37  0.46
15 FRhand                                                                                        1.00  0.59  0.85  0.45  0.77  0.40  0.63  0.39  0.45  0.43  0.45  0.47  0.54
16 FLhand                                                                                              1.00  0.47  0.86  0.46  0.82  0.45  0.72  0.54  0.48  0.43  0.51  0.60
17 HRhand                                                                                                    1.00  0.59  0.80  0.45  0.65  0.39  0.50  0.47  0.47  0.45  0.56
18 HLhand                                                                                                          1.00  0.48  0.86  0.49  0.70  0.57  0.54  0.51  0.52  0.61
19 RRhand                                                                                                                1.00  0.58  0.71  0.46  0.54  0.48  0.50  0.47  0.57
20 RLhand                                                                                                                      1.00  0.50  0.74  0.63  0.55  0.50  0.54  0.60
21 LRleg                                                                                                                             1.00  0.77  0.65  0.52  0.50  0.57  0.55
22 LLleg                                                                                                                                   1.00  0.69  0.55  0.51  0.61  0.58
23 Arising                                                                                                                                       1.00  0.78  0.79  0.83  0.70
24 Posture                                                                                                                                             1.00  0.75  0.75  0.73
25 Gait                                                                                                                                                      1.00  0.72  0.69
26 Stabil                                                                                                                                                          1.00  0.68
27 BodyBra                                                                                                                                                               1.00

Column numbers correspond to the row numbers of the items.


Fig. 16. Path diagram of the three-factor model of the MS UPDRS


Fig. 17. Path diagram of the five-factor model of the MS UPDRS

Goodness of Fit Indices and Standard Errors Summary for model in Figure 16:

Sample size = 405
Degrees of Freedom = 298
Satorra-Bentler Scaled Chi-Square = 2877.91 (P = 0.0)
Root Mean Square Error of Approximation (RMSEA) = 0.15
90 Percent Confidence Interval for RMSEA = (0.14; 0.15)
Model AIC = 2983.91
Expected Cross-Validation Index (ECVI) = 7.39
Normed Fit Index (NFI) = 0.87
Comparative Fit Index (CFI) = 0.88
Standardized Root Mean Square Residual (SRMR) = 0.13
Goodness of Fit Index (GFI) = 0.93
Fitted Residuals: Range = <-0.38; 0.40>; Median = -0.02
St. Errors: Range = <0.04; 0.26>; Median = 0.10; St. Deviation = 0.04

Goodness of Fit Indices and Standard Errors Summary for model in Figure 17:

Sample size = 405
Degrees of Freedom = 271
Satorra-Bentler Scaled Chi-Square = 2165.32 (P = 0.0)
Root Mean Square Error of Approximation (RMSEA) = 0.13
90 Percent Confidence Interval for RMSEA = (0.13; 0.14)
Model AIC = 2273.32
Expected Cross-Validation Index (ECVI) = 5.63
Normed Fit Index (NFI) = 0.89
Comparative Fit Index (CFI) = 0.90
Standardized Root Mean Square Residual (SRMR) = 0.12
Goodness of Fit Index (GFI) = 0.95
Fitted Residuals: Range = <-0.38; 0.37>; Median = 0.00
St. Errors: Range = <0.02; 0.18>; Median = 0.08; St. Deviation = 0.04

Interestingly, neither of these models fits the data well. Although NIRT is based on different assumptions and computational procedures than SEM, the underlying idea is the same, and therefore we expected similar results. The differences between the results of NIRT and SEM can be explained by the different robustness properties of the two approaches, or as a consequence of the slight changes between the models found by MSA and those tested by the LISREL program (correlations among factors).

Next, the hierarchical structure was tested. This structure emphasizes the laterality of PD. Hence the concepts of laterality (RigBradR, RigBradL) were extended to the tremor items as well and renamed Left and Right. A general factor was added and denoted MDPD (an abbreviation for movement disorder of Parkinson's disease).


Fig. 18. Path diagram of the hierarchical model of the MS UPDRS

Goodness of Fit Indices and Standard Errors Summary:

Sample size = 405
Degrees of Freedom = 321
Satorra-Bentler Scaled Chi-Square = 2438.86 (P = 0.0)
Root Mean Square Error of Approximation (RMSEA) = 0.13
90 Percent Confidence Interval for RMSEA = (0.12; 0.13)
Model AIC = 2552.86
Expected Cross-Validation Index (ECVI) = 6.32
Normed Fit Index (NFI) = 0.89
Comparative Fit Index (CFI) = 0.90
Standardized Root Mean Square Residual (SRMR) = 0.14
Goodness of Fit Index (GFI) = 0.94
Fitted Residuals: Range = <-0.34; 0.68>; Median = -0.02
St. Errors: Range = <0.03; 1197.2>; Median = 0.37; St. Deviation = 349.1

This model of the MS UPDRS cannot be accepted since the fit indices are not good enough. The next model (Figure 19) shows how the model fits when all SEM models of the parts of the MS UPDRS developed in the previous sections are combined. However, there are some extensions and changes, since for some parts of the MS UPDRS (especially for tremor) no acceptable model was found. First, the concepts of laterality found for the tremor, rigidity and bradykinesia symptoms are merged and denoted left and right. Second, the strong relationship between the items Speech and Facial expression (see Table 3) is expressed as a new factor denoted Face. These items were found by Mokken's scaling procedure to form a stand-alone dimension of speech/hypomimia. Further, they were found to have strong relationships to the concepts of rigidity (Rig), bradykinesia (Brad) and axial/gait bradykinesia (BBrad) (see Table 2). Thus all these concepts are assumed to be correlated. Third, the best model for the tremor items (depicted in Figure 10) was replaced by the second best one (Figure 7), since the hierarchical structure led to nonconvergence of the estimation procedure. However, the fit of the one-factor structure was only slightly worse; moreover, the hierarchical structure is partially substituted here by the extended concepts of laterality (left, right).


Fig. 19. Path diagram of the seven-factor model of the MS UPDRS

Goodness of Fit Indices and Standard Errors Summary:

Sample size = 405
Degrees of Freedom = 300
Satorra-Bentler Scaled Chi-Square = 899.33 (P = 0.0)
Root Mean Square Error of Approximation (RMSEA) = 0.070
90 Percent Confidence Interval for RMSEA = (0.065; 0.076)
Model AIC = 1055.33
Expected Cross-Validation Index (ECVI) = 2.61
Normed Fit Index (NFI) = 0.96
Comparative Fit Index (CFI) = 0.97
Standardized Root Mean Square Residual (SRMR) = 0.077
Goodness of Fit Index (GFI) = 0.99
Fitted Residuals: Range = <-0.40; 0.34>; Median = 0.00
St. Errors: Range = <0.02; 0.17>; Median = 0.07; St. Deviation = 0.04

Even though some residual correlations remained high (see Table 20), the fit indices and the values in the residual matrix suggest that this model need not be rejected. The values of the intercorrelations among the factors of rigidity, bradykinesia, speech/hypomimia and axial/gait bradykinesia indicate that there is one general factor underlying them, which is consistent with the results of Mokken's scaling procedure. Such a structure, however, led to nonconvergence of the estimation procedure, and therefore the results are not presented.
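The full-scale models above are compared mainly through Model AIC and ECVI. The following minimal sketch shows how these two indices relate to the chi-square, assuming AIC = chi-square + 2t with t free parameters, t = p(p + 1)/2 - df for p analyzed items, and ECVI = AIC / (N - 1). These formulas are an assumption on our part; they agree with the rounded values in the summaries, but the thesis does not spell them out. Function names are hypothetical.

```python
def free_parameters(n_items, df):
    """Number of free parameters t = p(p + 1)/2 - df for p analyzed items."""
    return n_items * (n_items + 1) // 2 - df


def model_aic(chi_square, t):
    """Model AIC under the assumed definition chi2 + 2t."""
    return chi_square + 2 * t


def ecvi(aic, n):
    """Expected cross-validation index under the assumed definition AIC/(N-1)."""
    return aic / (n - 1)


if __name__ == "__main__":
    # (label, items analyzed, df, Satorra-Bentler chi-square) from the summaries
    models = [
        ("three-factor (Fig. 16)", 26, 298, 2877.91),
        ("five-factor (Fig. 17)", 25, 271, 2165.32),
        ("hierarchical (Fig. 18)", 27, 321, 2438.86),
        ("seven-factor (Fig. 19)", 27, 300, 899.33),
    ]
    for label, p, df, chi2 in models:
        t = free_parameters(p, df)
        aic = model_aic(chi2, t)
        print(f"{label:24s} t = {t:2d}  AIC = {aic:8.2f}  ECVI = {ecvi(aic, 405):.2f}")
```

Note that the three- and five-factor models analyze fewer items than the other two, so their AIC and ECVI values are not strictly comparable with those of the 27-item models.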

Table 20 Fitted residuals

                1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23    24    25    26    27
 1 Speech    0.00  0.00  0.04 -0.05 -0.05 -0.19 -0.27 -0.03 -0.02 -0.05 -0.06 -0.09 -0.01 -0.06  0.03 -0.01  0.01 -0.03  0.04  0.02  0.06  0.02  0.00  0.05 -0.03  0.00  0.00
 2 Facial          0.00  0.19 -0.06  0.07 -0.01 -0.08  0.02  0.09  0.06  0.10  0.07  0.02  0.02 -0.04  0.00 -0.07 -0.05 -0.02  0.00 -0.04  0.02 -0.01  0.02 -0.06 -0.05  0.05
 3 TrFLC                 0.00  0.02  0.11 -0.04 -0.04 -0.21 -0.05  0.20  0.00  0.07  0.02 -0.03  0.08  0.21  0.10  0.15  0.15  0.23  0.08  0.01  0.06  0.22  0.15  0.09  0.25
 4 TrRUE                       0.00  0.04  0.10 -0.24  0.04 -0.19  0.06  0.07 -0.01  0.01 -0.09 -0.06 -0.07 -0.07 -0.07 -0.02 -0.08  0.02 -0.13 -0.05 -0.07  0.01 -0.06  0.03
 5 TrLUE                             0.00 -0.14  0.04 -0.13  0.00  0.13  0.00  0.05  0.05  0.02 -0.03 -0.04 -0.04 -0.07 -0.02  0.00  0.04  0.04  0.12  0.02  0.11  0.06  0.11
 6 TrRLE                                   0.00  0.19 -0.08 -0.40  0.09  0.11  0.00  0.14  0.03 -0.07 -0.04 -0.12 -0.18 -0.03 -0.12 -0.02 -0.06 -0.03 -0.11 -0.05 -0.02 -0.01
 7 TrLLE                                         0.00 -0.29 -0.05  0.04 -0.12  0.02 -0.07  0.10 -0.11 -0.01 -0.20 -0.12 -0.11 -0.02  0.01  0.06 -0.01 -0.01  0.03 -0.03  0.03
 8 ATrRhand                                            0.00  0.34  0.15  0.11  0.09  0.03  0.00 -0.10 -0.04 -0.07  0.03 -0.03 -0.04  0.01 -0.06  0.07  0.00  0.01  0.04  0.06
 9 ATrLhand                                                  0.00  0.19 -0.02  0.08  0.01  0.02 -0.05 -0.06  0.01 -0.01  0.01 -0.01  0.00  0.02  0.13  0.15  0.14  0.11  0.16
10 Neck                                                            0.00  0.01 -0.03 -0.05 -0.09  0.16  0.08  0.08  0.03  0.09  0.06  0.01  0.02 -0.06 -0.01 -0.01 -0.04  0.02
11 RUE                                                                   0.00  0.09  0.05 -0.04 -0.04 -0.01 -0.07 -0.06 -0.02 -0.05 -0.02 -0.08 -0.08 -0.01  0.01 -0.03  0.04
12 LUE                                                                         0.00 -0.06 -0.01  0.01  0.00  0.01 -0.03  0.01  0.00 -0.07  0.00  0.00  0.04  0.03  0.00  0.08
13 RLE                                                                               0.00  0.09 -0.03  0.00 -0.07 -0.03 -0.02 -0.03  0.02  0.00 -0.07 -0.02  0.00 -0.08  0.02
14 LLE                                                                                     0.00 -0.03  0.01 -0.04 -0.03 -0.03  0.00 -0.02  0.06 -0.03  0.03  0.04 -0.03  0.04
15 FRhand                                                                                        0.00  0.11  0.05 -0.04 -0.01 -0.09 -0.02 -0.14 -0.06 -0.04 -0.01  0.00  0.05
16 FLhand                                                                                              0.00 -0.01  0.02 -0.05 -0.01 -0.12 -0.01 -0.03 -0.04 -0.07 -0.01  0.06
17 HRhand                                                                                                    0.00  0.10  0.02 -0.05 -0.01 -0.15 -0.02 -0.01  0.00 -0.03  0.06
18 HLhand                                                                                                          0.00 -0.03  0.01 -0.10 -0.04 -0.01  0.01 -0.01 -0.01  0.06
19 RRhand                                                                                                                0.00  0.06  0.03 -0.10 -0.01 -0.03  0.01 -0.03  0.04
20 RLhand                                                                                                                      0.00 -0.09  0.00  0.04  0.01 -0.02  0.00  0.04
21 LRleg                                                                                                                             0.00  0.13  0.04 -0.05 -0.05  0.00 -0.04
22 LLleg                                                                                                                                   0.00  0.06 -0.03 -0.05  0.03 -0.03
23 Arising                                                                                                                                       0.00 -0.01  0.03  0.05 -0.11
24 Posture                                                                                                                                             0.00  0.05  0.02 -0.03
25 Gait                                                                                                                                                      0.00  0.02 -0.03
26 Stabil                                                                                                                                                          0.00 -0.07
27 BodyBra                                                                                                                                                               0.00

Column numbers correspond to the row numbers of the items.

Differences Between Models for Patients in on and off States

In this section we discuss the differences between the structure of motor symptoms for PD patients in the on state and in the off state. During the UPDRS testing all patients were evaluated in defined motor states (60 patients were examined in the defined off state and 310 patients in the defined on state). The defined on and defined off states are described in the section Clinical Symptoms of Parkinson's Disease. For the purposes of the following analyses, the sample of PD patients was divided into two subsamples according to the patient's motor state, and these subsamples were subsequently compared using the LISREL program. Unfortunately, the DWLS estimator in the LISREL program did not work for the multigroup analysis, probably because of a bug in the program. Thus, the Robust Maximum Likelihood (RML) estimator was employed instead.

First, the hypothesis that the correlation matrices for patients in the on and off states are not significantly different was investigated, to avoid meaningless results when testing the level of the model's invariance across motor states. Then various hypotheses about the equality of the factor loadings, error variances and factor correlations were tested. The results are presented in Table 21.

Table 21 Differences between models for patients in on and off states

Equal corr.   Equal     Equal factor  Equal error  Satorra-Bentler  Degrees of         90 Percent CI
matrices      loadings  correlations  variance     chi-square       freedom     RMSEA  for RMSEA       Model AIC  NFI   CFI   RFI
Yes                                                451.88           378         0.031  (0.018; 0.042)  879.47     0.93  0.99  0.87
              Yes       Yes           Yes          1348.44          678         0.070  (0.065; 0.076)  505.33     0.77  0.87  0.76
              Yes       Yes           No
              Yes       No            Yes          1343.56          657         0.072  (0.067; 0.078)  541.23     0.77  0.87  0.75
              No        Yes           Yes
              Yes       No            No           1347.27          630         0.075  (0.070; 0.081)  592.14     0.77  0.86  0.74
              No        Yes           No
              No        No            Yes          1268.71          612         0.073  (0.067; 0.079)  603.36     0.78  0.87  0.75
              No        No            No           1278.02          585         0.077  (0.071; 0.082)  651.29     0.78  0.87  0.74

For models with empty cells the iteration procedure was not performed, since the fitted covariance matrix produced by the Robust Maximum Likelihood estimator was not positive definite.

The hypothesis of equal correlation matrices for patients in the on and off states need not be rejected, which allows us to interpret the remaining analyses. The fit indices are very similar for all hypotheses. This might be due to the small sample of patients in the off state, which leaves the statistical procedures unable to reject any hypothesis. However, the hypothesis that all parameters (i.e. factor loadings, factor correlations and error variances) are the same in the populations of patients in the on and off states seems to be slightly favored (compare RMSEA, AIC, RFI). In other words, the structure of motor symptoms seems to be stable across motor states, which is consistent with other studies (Stebbins & Goetz, 1998; Stebbins et al., 1999). However, since the results are rather inconclusive and the RML estimator failed for several structures, a cross-validation study with a considerably larger sample of patients in the off state is necessary.
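For readers who wish to check the RMSEA entries of Table 21, the sketch below applies a multi-group version of the RMSEA in which the single-group value is multiplied by the square root of the number of groups. Both this correction and the total sample size of 405 (taken from the other analyses) are assumptions; the thesis does not state how LISREL computed the multigroup values, although under these assumptions the sketch agrees with the tabulated entries to three decimals. The function name is hypothetical.

```python
import math


def multigroup_rmsea(chi_square, df, n_total, n_groups):
    """RMSEA with an assumed sqrt(G) multi-group correction:
    sqrt(G) * sqrt(max(chi2 - df, 0) / (df * (N - 1)))."""
    single = math.sqrt(max(chi_square - df, 0.0) / (df * (n_total - 1)))
    return math.sqrt(n_groups) * single


if __name__ == "__main__":
    # (description, Satorra-Bentler chi-square, df) from Table 21
    rows = [
        ("equal correlation matrices", 451.88, 378),
        ("all parameters equal", 1348.44, 678),
        ("no parameters equal", 1278.02, 585),
    ]
    for label, chi2, df in rows:
        print(f"{label:28s} RMSEA = {multigroup_rmsea(chi2, df, 405, 2):.3f}")
```

The values differ only in the third decimal across the invariance hypotheses, which illustrates why the comparison in Table 21 is described as inconclusive.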

Summary of Results of Structural Equation Modeling

The previous analyses represent the SEM approach to investigating the dimensionality of the Motor Section of the Unified Parkinson's Disease Rating Scale. A number of theoretically meaningful models were tested, resulting in the conclusion that the MS UPDRS is bi-factorial and consists of seven dimensions: tremor, rigidity, bradykinesia of the extremities, axial/gait bradykinesia, speech/hypomimia and two dimensions of laterality. In accordance with the results of Mokken's scale analysis, rigidity, bradykinesia of the extremities, speech/hypomimia and axial/gait bradykinesia are correlated. The high values of these correlations suggest adding a general factor. Such a hierarchical model, however, led to nonconvergence of the estimation procedure, and therefore estimation of the model's parameters failed.

Differences between models for patients in the on and off states were investigated. The hypothesis that all parameters in the model (i.e. factor loadings, factor correlations and error variances) are the same in the populations of patients in the on and off states seems to be slightly favored. This suggests that the structure of motor symptoms is the same for patients in the on and off states. Nevertheless, other less restrictive models were also acceptable, and therefore cross-validation is necessary.

Discussion

In the present study, the structure of motor symptoms of Parkinson's disease was investigated. For this purpose, structural equation modeling (SEM) and Mokken's scale analysis (MSA) of the Motor Section of the Unified Parkinson's Disease Rating Scale were performed. The suitability of these methods for this kind of scale follows from the sample size, the sample distributions of item responses, the ordinal measurement level of the items, and the assumptions of Mokken's model (Sijtsma & Molenaar, 2002) as well as of the estimators of SEM.

After studying the distributional properties of the items, we found that the item distributions were positively skewed and had high values of kurtosis. It can be argued that this was influenced by the choice of PD patients with a relatively low level of impairment. However, our sample was taken in an outpatient clinic representing a current PD patient population in which routine UPDRS testing is most often performed. Another reason for the high skewness and kurtosis of the data could lie in the wording of the response categories itself, which may be more appropriate for testing severe impairment in a late stage of PD. The tendency toward lower response categories was confirmed by comparing different samples (Stebbins, 2004).

The item Tremor at rest face, lips, chin had an extremely small variance in our sample. From an analytic point of view, the low variance means that almost no information can be obtained from this item, implying that it hardly discriminates between subjects. This leads to the automatic exclusion of such an item from the analysis by MSPWin (Boer, 2001).

Several studies (Cubo et al., 2000; Martignoni et al., 2003; Martinez-Martin et al., 1994; Stebbins & Goetz, 1998; Stebbins et al., 1999) assessed the construct validity and the dimensionality of the Motor Section of the UPDRS through exploratory factor analysis (EFA).
These studies found between three and six factors that accounted for 59% to 78% of the total scale variance (without reporting how these proportions were computed). However, as only EFA was performed, the conclusions about the dimensionality may not be trustworthy, because using factor analysis models and the corresponding estimation methods (Principal Component Analysis, Maximum Likelihood) requires several assumptions to be fulfilled: (1) Principal Component Analysis (PCA) requires a continuous measurement level (Higuchi & Eguchi, 2004); (2) Maximum Likelihood (ML) estimation requires a continuous measurement level and either normally distributed item responses or a large number of observations, which may compensate for small degrees of nonnormality (Boomsma & Hoogland, 2001). Previous studies of the UPDRS did not report the item distributions and, moreover, used small samples of n < 300 to make inferences about dimensionality (Cubo et al., 2000; Martignoni et al., 2003; Stebbins & Goetz, 1998; Stebbins et al., 1999). In addition, the measurement level of the UPDRS is obviously ordinal rather than continuous, which may also pose problems when using the ML estimator or PCA (Higuchi & Eguchi, 2004).

On the other hand, Mokken's scale analysis is suitable for examining ordinal data and may be applied in small samples (Sijtsma & Molenaar, 2002). In the present analysis, for the cutoff criterion of Loevinger's scalability coefficient Hi = 0.3, three different theoretically and clinically meaningful unidimensional subscales were found. In other words, three dimensions were established. The first subscale consists of items related to rigidity, bradykinesia of the extremities, axial/gait bradykinesia, and the items Speech and Facial expression. Therefore this scale can be an indicator of the co-occurrence of these symptoms of PD. So far, studies have reported that these three concepts should be separated (Cubo et al., 2000; Stebbins & Goetz, 1998; Stebbins et al., 1999). However, for most patients in common PD populations the main symptoms co-occur, whereas isolated tremor or bradykinesia and rigidity of an extremity may be solely present in very early stages of PD.

Interestingly enough, after increasing the cutoff criterion to Hi = 0.4, items measuring rigidity and bradykinesia of the extremities of the right side of the body generated another subscale, which indicates that they are side-sensitive. Indeed, in a clinical cohort it has been shown that initial PD symptoms start more frequently on the right-sided extremities than on the left (Poewe & Wenning, 1998). This might account for the more independent behavior of right-sided items in group comparisons. Using EFA methods, side-sensitivity of bradykinesia of the extremities was mentioned before (Stebbins & Goetz, 1998; Stebbins et al., 1999), as was that of action/postural tremor (Cubo et al., 2000). Side-sensitivity of rigidity and rest tremor, however, has not been reported so far.

The third and the fourth subscales consist of items designed to measure right tremor and left tremor respectively, both resting and action/postural. Moreover, these two subscales consisted of the same items across all analyses. This means that there is statistical evidence that items related to tremor are side-sensitive. Previous studies reported tremor as a differently structured two-dimensional concept: the first dimension for resting tremor, the second for action/postural tremor (Cubo et al., 2000; Stebbins & Goetz, 1998; Stebbins et al., 1999). Further, the relative independence of tremor from rigidity and bradykinesia can be viewed as an indicator of the lack of a significant relationship between tremor and PD disability, which is consistent with other reports (Henderson, Kennard, & Crawford, 1991; Reynolds & Montgomery, 1987).

The fifth subscale (speech/hypomimia) consists of two items only: Speech and Facial expression. From a statistical point of view, however, such a subscale is very limited. Previous studies reported that these items measure axial/gait bradykinesia (Cubo et al., 2000; Stebbins & Goetz, 1998; Stebbins et al., 1999). However, the scale measuring axial/gait bradykinesia has statistically better properties if Speech and Facial expression are not included. Speech may be harder to evaluate based on the item-category wording, which is also reflected in studies showing an unacceptably low interrater reliability of these items (Camicioli et al., 2001; Richards, Marder, Cote, & Mayeux, 1994). From a clinical point of view, the impairment of speech often does not correspond to the patient's general motor status and may be worse (or, less frequently, better) than the rest of the examination.
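The item scalability coefficient that serves as the cutoff criterion above can be sketched as follows. The sketch assumes the covariance form of Loevinger's coefficient for ordinal items, in which the summed covariances of an item with the remaining items are divided by the summed maximum covariances attainable given the marginal distributions (obtained by sorting both score vectors). The data are hypothetical and the function names are ours; MSPWin's actual implementation is not documented here.

```python
def covariance(x, y):
    """Population covariance of two equally long score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def max_covariance(x, y):
    """Largest covariance attainable with the observed marginals:
    pair the scores after sorting both vectors."""
    return covariance(sorted(x), sorted(y))


def item_scalability(data, i):
    """Loevinger's H for item i, where data is a list of item score vectors."""
    others = [j for j in range(len(data)) if j != i]
    numerator = sum(covariance(data[i], data[j]) for j in others)
    denominator = sum(max_covariance(data[i], data[j]) for j in others)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical scores (0-4) of eight patients on three items
    data = [
        [0, 0, 1, 1, 2, 2, 3, 4],
        [0, 1, 1, 1, 2, 3, 3, 4],
        [1, 0, 0, 2, 1, 2, 4, 3],
    ]
    cutoff = 0.4  # the cutoff preferred in the text
    for i in range(len(data)):
        h = item_scalability(data, i)
        verdict = "kept" if h >= cutoff else "excluded"
        print(f"item {i + 1}: H = {h:.2f} ({verdict})")
```

A value of 1 corresponds to a perfectly ordered (Guttman-type) response pattern and a value of 0 to unrelated items, so raising the cutoff from 0.3 to 0.4 demands more homogeneous subscales.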
Although the rule of thumb for cutoff values of Hi coefficients has been discussed in some studies (Hemker et al., 1995; Molenaar & Sijtsma, 2000; Sijtsma & Molenaar, 2002), no final conclusion about the best cutoff value has been reached. To be sure that the level of homogeneity of the final subscales is high enough for them to be unidimensional, we prefer a cutoff Hi value of 0.4. Therefore, from a statistical point of view, we favor the five-dimensional MS UPDRS structure over the three-dimensional one.

Mokken's scaling procedure does not provide any investigation of the relationships among the dimensions. In addition, within the MSPWin program each item can be associated with one dimension only. Therefore confirmatory factor analysis (CFA) and SEM using the Diagonally Weighted Least Squares (DWLS) estimator were employed (Jöreskog & Sörbom, 2004b). They allow the evaluation of more complicated structures and of relationships among dimensions. The limitation of this method is that it requires estimation of the asymptotic variances, which is why a large sample size is needed.

The SEM showed that five dimensions are not enough to explain the item intercorrelations of the MS UPDRS. A number of theoretically meaningful models were tested, resulting in the conclusion that the MS UPDRS is bi-factorial and consists of seven dimensions: tremor, rigidity, bradykinesia of the extremities, axial/gait bradykinesia, speech/hypomimia and two dimensions of laterality. Following the results of Mokken's scaling procedure, rigidity, bradykinesia of the extremities, speech/hypomimia and axial/gait bradykinesia are correlated. The high values of these correlations call for adding a general factor. Such a model, however, led to nonconvergence of the estimation procedure, and therefore estimation of the model's parameters failed.

The level of discrepancy between the results of Mokken's scale analysis and SEM is, however, surprising, since the underlying ideas behind NIRT and SEM are very similar (e.g. Takane and de Leeuw (1987) showed that the normal ogive model for graded scores, one of the IRT models, is formally equivalent to the factor analysis of ordinal data proposed by Muthén (1984)). This difference could be due to the different robustness properties of the DWLS estimator and Mokken's scale analysis against small sample size and nonnormality, or due to the impossibility of investigating bi-factorial structures in MSA.

Another problem with previous research on the dimensionality of the UPDRS Motor Section is that many studies assessed its overall reliability by Cronbach's alpha coefficient (Cubo et al., 2000; Martignoni et al., 2003; Stebbins & Goetz, 1998).
These studies found the overall Cronbach's alpha to be very high (0.9 or higher). This coefficient is, however, more suitable for unidimensional scales (Kamata, Turhan, & Darandari, 2003), and it can even seriously mislead a researcher when used for multidimensional scales (Schmitt, 1996). Thus, in the present study Cronbach's alpha is computed for each subscale separately. For comparative purposes we also chose the Rho coefficient. To estimate the overall reliability of composite scales such as the Motor Section of the UPDRS, coefficients such as the multidimensional extension of McDonald's omega (McDonald, 1970) should be used (Kamata et al., 2003). Estimation of the overall reliability was not the main focus of this research and will be presented in a future study.

The lower bounds of the reliabilities of tremor-right and tremor-left estimated by Cronbach's alpha are not very satisfactory: alpha equals 0.62 for tremor-right and 0.65 for tremor-left. The reason is obvious: these subscales consist of only three items each. Other studies (Stebbins & Goetz, 1998; Stebbins et al., 1999) found adequate reliability of either tremor at rest (ranging from 0.85 to 0.91) or action/postural tremor (ranging from 0.80 to 0.85). These results call for a follow-up study assessing the influence of this difference on the overall UPDRS score reliability. Cronbach's alpha for the limited dimension of speech/hypomimia equals 0.76. The Cronbach's alphas of all other subscales are acceptable and comparable with previous reports (Stebbins & Goetz, 1998; Stebbins et al., 1999). In addition, these subscales are very homogeneous and consist of a sufficient number of items. From a statistical and substantive point of view, these subscales are satisfactory. The lack of reliability of the tremor-right and tremor-left subscales was confirmed by computing the Rho coefficient, which equals 0.66 for tremor-right and 0.70 for tremor-left. Reliabilities of all other subscales estimated by the Rho coefficient would lead to unreliable conclusions due to the many intersections of ISRFs.

The state of the patient (on or off) can, in principle, influence the dimensionality. Thus the differences between models for patients in the on and off states were investigated. The results suggested that there are no differences between the models for the on and off states. In other words, factor loadings, error variances and factor intercorrelations are invariant across motor states.
This suggests that the structure of motor symptoms is the same for patients in the on and off states, which is consistent with other studies (Stebbins & Goetz, 1998; Stebbins et al., 1999). Nevertheless, other less restrictive models were also acceptable. It was argued that this is due to the small sample of patients in the off state, which leaves the statistical procedures unable to reject any hypothesis. Moreover, for several models the estimation of parameters failed. That is why cross-validation is necessary.
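The dependence of Cronbach's alpha on the number of items, noted above for the three-item tremor subscales, can be illustrated with the standardized form of the coefficient, alpha = k r / (1 + (k - 1) r), where k is the number of items and r the mean inter-item correlation. The sketch uses the polychoric correlations of the five axial/gait items from Table 17; since the thesis does not report how its alphas were computed, the result is an illustration rather than a reproduction of the reported reliabilities. The function name is hypothetical.

```python
def standardized_alpha(mean_r, n_items):
    """Standardized Cronbach's alpha from the mean inter-item correlation."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


# Off-diagonal polychoric correlations of the axial/gait items (Table 17)
AXIAL = [0.78, 0.79, 0.83, 0.70, 0.75, 0.75, 0.73, 0.72, 0.69, 0.68]
MEAN_R = sum(AXIAL) / len(AXIAL)

if __name__ == "__main__":
    print(f"mean inter-item correlation = {MEAN_R:.3f}")
    # Same mean correlation, different scale lengths
    for k in (2, 3, 5):
        print(f"k = {k}: alpha = {standardized_alpha(MEAN_R, k):.3f}")
```

With the mean correlation held fixed, alpha drops as items are removed, which is the mechanism behind the lower values obtained for the short tremor and speech/hypomimia subscales.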


In conclusion, the findings of the present study should be considered in the context of an exploratory Mokken's scale analysis and structural equation modeling. Such analyses require follow-up cross-validation studies confirming the (factor) structure of the MS UPDRS. Given the skewness and kurtosis of the item responses, which have been observed across various samples of Parkinson's disease patients, considerably larger sample sizes turned out to be necessary to make the estimates of the model's parameters more reliable. It is believed that this study can widen the present knowledge regarding the number and structure of the motor symptoms of Parkinson's disease. In addition, the diagnostic quality of the MS UPDRS in the sense of reliability was examined, which can contribute to future improvements of PD rating scales. In future studies, the structure of Parkinsonian signs should be cross-validated in different samples and states of patients, and the overall reliability of the Motor Section of the UPDRS should be estimated.


CONCLUSION

The Motor Section of the Unified Parkinson's Disease Rating Scale (MS UPDRS) is multidimensional, which is in accordance with our hypothesis. Nevertheless, drawing a conclusion about the number of dimensions is difficult. Mokken's scaling procedure demonstrated that five dimensions underlie the MS UPDRS (right tremor, left tremor, speech/hypomimia, right-sided items of rigidity and bradykinesia of the extremities, and axial/gait bradykinesia together with left-sided items of rigidity and bradykinesia of the extremities). From the SEM point of view, however, the MS UPDRS is seven-dimensional with a bi-factorial structure. The concepts of rigidity, tremor, bradykinesia of the extremities, axial/gait bradykinesia and speech/hypomimia are substantial. They are accompanied by two factors, denoted as left and right, accounting for the laterality of tremor, rigidity, and bradykinesia of the extremities. As confirmed using Mokken's model, the factors of rigidity, bradykinesia of the extremities, axial/gait bradykinesia and speech/hypomimia are statistically highly related, and one can view them as a single factor. Both Mokken's scaling procedure and SEM showed that tremor is a relatively independent symptom with strong laterality.

The generic reliability of the compound concept (rigidity, speech/hypomimia, bradykinesia of the extremities, axial/gait bradykinesia) is higher than 0.9 and thus satisfactory. Generic reliabilities higher than 0.9 were also found for the stand-alone concepts of bradykinesia of the extremities and axial/gait bradykinesia. The reliability of rigidity equals 0.85. The generic reliability of tremor was found to be poor. Generally, the hypothesis about the poor reliabilities of the dimensions has to be rejected.

Analyses suggested that there are likely no differences in the structure of motor symptoms between patients in the on and off states. In other words, factor loadings, factor correlations, and error variances are the same for patients in both motor states. Thus, the hypothesis concerning the correspondence of the items' factor loadings for the on and off states can be accepted. However, cross-validation is necessary.

The item "Tremor at rest (face, lips, chin)" was found to have very low variance. In other words, this item hardly contributes to the information about the level of a patient's tremor. From the statistical point of view, this item is redundant in the MS UPDRS.
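The redundancy argument for a low-variance item can be made concrete with a simple variance screen. The sketch below is illustrative only: the ratings are invented, the item labels and the threshold of 0.1 are assumptions chosen for the example, and no such cut-off was used in the analyses reported in this study. An item that is rated 0 for almost every patient has a variance near zero and therefore can hardly discriminate between patients.

```python
from statistics import pvariance


def low_variance_items(ratings, threshold=0.1):
    """Return the names of items whose rating variance falls below threshold.

    ratings maps a (hypothetical) item name to a list of 0-4 ratings,
    one rating per patient.
    """
    return [name for name, values in ratings.items()
            if pvariance(values) < threshold]


# Invented ratings for ten patients; these are not data from this study.
example_ratings = {
    "tremor_rest_face": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "gait": [0, 1, 2, 3, 1, 2, 0, 4, 2, 1],
}

flagged = low_variance_items(example_ratings)
```

In this invented example only the face tremor item is flagged, because nine of its ten ratings are 0.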


REFERENCES

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. Paper presented at the Second International Symposium on Information Theory, Budapest.
Akaike, H. (1985). Prediction and entropy. In A. C. Atkinson & S. E. Fienberg (Eds.), A celebration of statistics (pp. 1-24). New York: Springer-Verlag.
Akaike, H. (1987). Estimation of linear models with incomplete data. Psychometrika, 52, 317-332.
Allam, M. F., Del Castillo, A. S., & Navajas, R. F. (2002). Smoking and Parkinson's disease. European Journal of Neurology, 9(3), 315-316.
Anderson, T. W. (1959). Some scaling methods and estimation procedures in the latent class model. In U. Grenander (Ed.), Probability and statistics (pp. 9-38). New York: Wiley.
Anderson, T. W., & Rubin, H. (1956). Statistical inference in factor analysis. Paper presented at the Third Berkeley Symposium on Mathematical Statistics and Probability, Berkeley.
Arbuckle, J. L. (2003). AMOS (Version 5.0.1). Spring House: Amos Development Corporation.
Bentler, P. M., & Chou, C. P. (1987). Practical issues in structural modeling. Sociological Methods and Research, 16(1), 78-117.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107, 238-246.
Bentler, P. M. (1995a). EQS (Version 5.7). Encino: Multivariate Software.
Bentler, P. M. (1995b). EQS structural equations program manual. Encino: Multivariate Software.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588-606.
Bentler, P. M., & Weeks, D. G. (1980). Linear structural equations with latent variables. Psychometrika, 45, 289-308.
Blahuš, P. (1976). K teorii testování pohybových schopností (1. ed.). Prague: Charles University.
Blahuš, P. (1980). Základy modelů latentních proměnných včetně faktorové analýzy. Prague: Charles University.
Blahuš, P. (1985). Faktorová analýza a její zobecnění. Praha: SNTL.
Blahuš, P. (1991). Latent variable modeling of theoretical concepts in behavioral sciences as a case of measurement. Acta Universitatis Carolinae, 27(2), 5-11.
Blahuš, P. (1996a). Concept formation via latent variables modeling of motor abilities. Kinesiology, 28(2), 12-21.
Blahuš, P. (1996b). K systémovému pojetí statistických metod v metodologii empirického výzkumu chování (1. ed.). Prague: Karolinum.
Blahuš, P. (2004). On methodological aspects of building human movement science: psychomotricity and kinanthropology. Acta Universitatis Carolinae, 40(2), 5-18.
Blahuš, P., Čelikovský, S., & Kovář, R. (1973). K nelineární faktorové analýze motorických schopností. Acta Universitatis Carolinae, 9(1), 31-44.


Blahuš, P., Dobrý, L., Hohler, V., Hošek, V., Svatoň, V., & Svoboda, B. (1993). Kinanthropology - a new recognized scientific discipline. Acta Universitatis Carolinae, 29(2), 61-78.
Boer, P. (2001). MSPWin (Version 5.0). Groningen: ProGAMMA.
Bollen, K. A. (1986). Sample size and Bentler & Bonett's nonnormed fit index. Psychometrika, 51, 375-377.
Bollen, K. A. (1989). Structural equations with latent variables. New York: John Wiley & Sons.
Boomsma, A. (1983). On the robustness of LISREL (maximum likelihood estimation) against small sample size and non-normality. Groningen: ICS.
Boomsma, A. (2000). Reporting analyses of covariance structures. Structural Equation Modeling: A Multidisciplinary Journal, 7(3), 461-483.
Boomsma, A. (2004). Structural equation modeling. Unpublished manuscript, University of Groningen.
Boomsma, A., & Hoogland, J. J. (2001). The robustness of LISREL modeling revisited. In R. Cudeck, S. Toit & D. Sörbom (Eds.), Structural equation modeling: Present and future. A festschrift in honour of Karl Jöreskog (pp. 139-168). Chicago, IL: Scientific Software International.
Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research, 24, 445-455.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & S. J. Long (Eds.), Testing structural equation models (pp. 136-162). Newbury Park: Sage.
Browne, M. W., & Mels, G. (1990). RAMONA user's guide. Columbus: Department of Psychology, Ohio State University.
Burt, C. L. (1925). The young delinquent. London: University of London Press.
Byrne, B. (2001). Structural equation modeling with AMOS: Basic concepts, applications, and programming. Mahwah, New Jersey: Lawrence Erlbaum Associates, Publishers.
Camicioli, R., Grossmann, S. J., Hudnell, K., & Anger, K. W. (2001). Discriminating mild parkinsonism: Methods for epidemiological research. Movement Disorders, 16(1), 33-40.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
Cubo, E., Stebbins, G. T., Golbe, L. I., Nieves, A., Leurgans, S., Goetz, C. G., et al. (2000). Application of the Unified Parkinson's Disease Rating Scale in progressive supranuclear palsy: Factor analysis of the motor scale. Movement Disorders, 15(2), 276-279.
Čepička, L. (2000). Kvantitativní příspěvek ke kvalitativní analýze vrchního hodu jednoruč. Tělesná Výchova a Sport Mládeže, 66(8), 42-47.
Čepička, L. (2003). Modely teorie položkových odpovědí v diagnostice motoriky člověka. Plzeň: Západočeská Univerzita v Plzni.
Čepička, L., Růžička, E., Štochl, J., & Blahuš, P. (2003). On statistical methodology to study brain-behavioral relationships: Preliminary study. Paper presented at the Sixth World IBRO Congress of Neurosciences, Prague, Czech Republic.


Deleu, D. (2001). Smoking, alcohol, and coffee consumption preceding Parkinson's disease. Neurology, 56(7), 984-985.
Douglas, J., Kim, H., Habing, B., & Gao, F. (1998). Investigating local dependence with conditional covariance functions. Journal of Educational and Behavioral Statistics, 23, 129-151.
Dunteman, G. H. (1989). Principal component analysis. Newbury Park: Sage.
Dvořáková, H. (2002). Preschool children's body posture: The diagnostic quality of field methods and internal validity of their partial items. Paper presented at the Second International Science and Expert motometry to the qualitative motodiagnostics, Kranjska Gora, Slovenia, Ljubljana.
Ebbitt, A. (2005). Parkinson's disease. Retrieved 1.4., 2005, from http://serendip.brynmawr.edu/biology/b103/f97/projects97/Ebbitt.html#8
Eliason, S. R. (1993). Maximum likelihood estimation: Logic and practice. Iowa: Sage Publications.
Ethington, C. A. (1987). The robustness of LISREL estimates in structural equation models with categorical variables. Journal of Experimental Education, 55(2), 80-88.
Ferraz, H. B., Andrade, L. A., Tumas, V., Calia, L. C., & Borges, V. (1996). Rural or urban living and Parkinson's disease. Arquivos de Neuro-Psiquiatria, 54(1), 37-41.
Foltynie, T., Sawcer, S., Brayne, C., & Barker, R. A. (2002). The genetic basis of Parkinson's disease. Journal of Neurology, Neurosurgery & Psychiatry, 73(4), 363-370.
Friedman, A. (1985). [Dyskinesia as a complication of the treatment of Parkinson's disease with L-dopa--clinical observations]. Neurologia i Neurochirurgia Polska, 19(4), 291-294.
Gasser, T. (1998). Genetics of Parkinson's disease. Annals of Neurology, 44(3 Suppl 1), S53-57.
Goetz, C. G. (2003). The Unified Parkinson's Disease Rating Scale (UPDRS): Status and recommendations. Movement Disorders, 18(7), 738-750.
Goetz, C. G., Leurgans, S., & Raman, R. (2002). Placebo-associated improvements in motor function: comparison of subjective and objective sections of the UPDRS in early Parkinson's disease. Movement Disorders, 17(2), 283-288.
Goetz, C. G., LeWitt, P. A., & Weidenman, M. (2003). Standardized training tools for the UPDRS activities of daily living scale: newly available teaching program. Movement Disorders, 18(12), 1455-1458.
Goetz, C. G., Stebbins, G. T., & Chmura, T. A. (1995). Teaching tape for the motor section of the Unified Parkinson's Disease Rating Scale. Movement Disorders, 10, 263-266.
Gorell, J. M., Johnson, C. C., Rybicki, B. A., Peterson, E. L., & Richardson, R. J. (1998). The risk of Parkinson's disease with exposure to pesticides, farming, well water, and rural living. Neurology, 50(5), 1346-1350.
Haavelmo, T. (1943). The statistical implications of a system of simultaneous equations. Econometrica, 11, 1-12.


Hemker, B. T., Sijtsma, K., & Molenaar, I. W. (1995). Selection of unidimensional scales from a multidimensional item bank in the polytomous Mokken's IRT model. Applied Psychological Measurement, 19(4), 337-352.
Hemker, B. T., Sijtsma, K., Molenaar, I. W., & Junker, B. W. (1996). Polytomous IRT models and monotone likelihood ratio of the total score. Psychometrika, 61(4), 679-693.
Hemker, B. T., Sijtsma, K., Molenaar, I. W., & Junker, B. W. (1997). Stochastic ordering using the latent trait and the sum score in polytomous IRT models. Psychometrika, 62(3), 331-347.
Henderson, L., Kennard, C., & Crawford, T. (1991). Scales for rating motor impairment in Parkinson's disease: studies of reliability and convergent validity. Journal of Neurology, Neurosurgery & Psychiatry, 54(1), 18-24.
Hernan, M. A., Zhang, S. M., Rueda-deCastro, A. M., Colditz, G. A., Speizer, F. E., & Ascherio, A. (2001). Cigarette smoking and the incidence of Parkinson's disease in two prospective studies. Annals of Neurology, 50(6), 780-786.
Higuchi, I., & Eguchi, S. (2004). Robust principal component analysis with adaptive selection for tuning parameters. Journal of Machine Learning Research, 5, 453-471.
Hintze, J. (1996). NCSS (Version 6.0.21). Kaysville, Utah.
Hoehn, M. M. (1992). The natural history of Parkinson's disease in the pre-levodopa and post-levodopa eras. Neurologic Clinics, 10(2), 331-339.
Hoogland, J. J. (1999). The robustness of estimation methods for covariance structure analysis. Groningen: University of Groningen.
Hoskens, M., & De Boeck, P. (1997). A parametric model for local item dependencies among test items. Psychological Methods, 2, 261-277.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1-55.
Hughes, A. J., Daniel, S. E., Kilford, L., & Lees, A. J. (1992). Accuracy of clinical diagnosis of idiopathic Parkinson's disease: a clinico-pathological study of 100 cases. Journal of Neurology, Neurosurgery & Psychiatry, 55(3), 181-184.
Hunter, K. R., Shaw, K. M., Laurence, D. R., & Stern, G. M. (1973). Sustained levodopa therapy in parkinsonism. Lancet, 2(7835), 929-931.
Ip, E. H. (2000). Adjusting for information inflation due to local dependency in moderately large item clusters. Psychometrika, 65, 73-91.
Ip, E. H. (2001). Testing for local dependency in dichotomous and polytomous item response models. Psychometrika, 66, 109-132.
James, W. H. (2003). Coffee drinking, cigarette smoking, and Parkinson's disease. Annals of Neurology, 53(4), 546; author reply 546.
Jankovic, J., & Tolosa, E. (Eds.). (2002). Parkinson's disease and movement disorders. Philadelphia: Lippincott Williams & Wilkins.
Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika, 32, 443-482.
Jöreskog, K. G. (1970). A general method for analysis of covariance structures. Biometrika, 57, 239-251.


Jöreskog, K. G. (1973). A general method for estimating a linear structural equation system. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the social sciences (pp. 85-112). New York: Academic Press.
Jöreskog, K. G. (1999). How large can a standardized coefficient be? Retrieved 20.8., 2005, from http://www.ssicentral.com/lisrel/techdocs/HowLargeCanaStandardizedCoefficientbe.pdf
Jöreskog, K. G., & Goldberger, A. S. (1972). Factor analysis by generalized least squares. Psychometrika, 37, 243-260.
Jöreskog, K. G., & Sörbom, D. (1981). LISREL V: Analysis of linear structural relationships by the method of maximum likelihood. Chicago: National Educational Resources.
Jöreskog, K. G., & Sörbom, D. (1988). PRELIS. A program for multivariate data screening and data summarization. User's guide (2. ed.). Chicago: Scientific Software International.
Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8: Structural equation modeling with the SIMPLIS command language. Chicago, IL: Scientific Software International, Inc.
Jöreskog, K. G., & Sörbom, D. (2002). PRELIS (Version 2.54). Lincolnwood, Illinois: Scientific Software International, Inc.
Jöreskog, K. G., & Sörbom, D. (2004a). LISREL (Version 8.7). Lincolnwood, Illinois: Scientific Software International, Inc.
Jöreskog, K. G., & Sörbom, D. (2004b). Online help file for LISREL 8.72 for Windows. Lincolnwood, Illinois: Scientific Software International, Inc.
Jöreskog, K. G., & Sörbom, D. (2005). LISREL (Version 8.72). Lincolnwood, Illinois: Scientific Software International, Inc.
Junker, B. W. (1996). Exploring monotonicity in polytomous item response data. Paper presented at the Annual Meeting of the National Council on Measurement in Education, New York.
Junker, B. W., & Sijtsma, K. (2000). Latent and manifest monotonicity in item response models. Applied Psychological Measurement, 24, 65-81.
Kamata, A., Turhan, A., & Darandari, E. (2003). Estimating reliability for multidimensional composite scale scores. Paper presented at the Annual Meeting of American Educational Research Association, Chicago.
Kaplan, D. (1989a). The problem of error rate inflation in covariance structure models. Educational and Psychological Measurement, 49, 333-337.
Kaplan, D. (1989b). A study of the sampling variability and z-values of parameter estimates from misspecified structural equation models. Multivariate Behavioral Research, 2, 197-204.
Kaplan, D. (2000). Structural equation modeling: Foundations and extensions. Thousand Oaks: Sage Publications.
Keesling, J. W. (1972). Maximum likelihood approaches to causal analysis. Unpublished manuscript, University of Chicago.
Kelloway, K. E. (1998). Using LISREL for structural equation modeling: A researcher's guide. Thousand Oaks: Sage Publications.
Klawans, H. L. (1988). Psychiatric side effects during the treatment of Parkinson's disease. Journal of Neural Transmission, 27, 117-122.

Kline, R. B. (1998). Principles and practice of structural equation modeling. New York: Guilford Press.
Kristensson, K. (1992). Potential role of viruses in neurodegeneration. Molecular and Chemical Neuropathology, 16(1-2), 45-58.
Lang, A. E. (2001). Acute orthostatic hypotension when starting dopamine agonist therapy in Parkinson disease: the role of domperidone therapy. Archives of Neurology, 58(5), 835.
Langston, J. W., Widner, H., Goetz, C. G., Brooks, D., Fahn, S., Freeman, T., et al. (1992). Core assessment program for intracerebral transplantations (CAPIT). Movement Disorders, 7(1), 2-13.
Latash, M. L. (1998). Neurophysiological basis of movement. Champaign, IL: Human Kinetics.
Lawley, D. N. (1940). The estimation of the factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 60, 64-82.
Loehlin, J. C. (1992). Latent variable models: An introduction to factor, path, and structural analysis (2. ed.). Hillsdale, NJ: Lawrence Erlbaum.
Loevinger, J. (1947). A systematic approach to the construction and evaluation of tests of ability. Psychological Monographs, 61(4).
Markham, C., Diamond, S. G., & Treciokas, L. J. (1974). Carbidopa in Parkinson's disease and in nausea and vomiting of levodopa. Archives of Neurology, 31(2), 128-133.
Martignoni, E., Franchignoni, F., Pasetti, C., Ferriero, G., & Picco, D. (2003). Psychometric properties of the Unified Parkinson's Disease Rating Scale and of the Short Parkinson's Evaluation Scale. Neurological Sciences, 24(3), 190-191.
Martinez-Martin, P., Garcia Urra, D., del Ser Quijano, T., Balseiro Gomez, J., Gomez Utrero, E., Pineiro, R., et al. (1997). A new clinical tool for gait evaluation in Parkinson's disease. Clinical Neuropharmacology, 20(3), 183-194.
Martinez-Martin, P., Gil-Nagel, A., Gracia, L. M., Gomez, J. B., Martinez-Sarries, J., & Bermejo, F. (1994). Unified Parkinson's Disease Rating Scale characteristics and structure. The cooperative multicentric group. Movement Disorders, 9(1), 76-83.
McArdle, J. J. (1980). Causal modeling applied to psychonomic systems simulation. Behavior Research Methods & Instrumentation, 12, 193-207.
McArdle, J. J., & McDonald, R. P. (1984). Some algebraic properties of the reticular action model for moment structures. The British Journal of Mathematical and Statistical Psychology, 37, 234-251.
McDonald, R. P. (1970). Theoretical foundations of principal factor analysis and alpha factor analysis. British Journal of Mathematical and Statistical Psychology, 23, 1-21.
McDonald, R. P. (1978). A simple comprehensive model for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 31, 59-72.
McDonald, R. P. (1979). The structural analysis of multivariate data: A sketch of a general theory. Multivariate Behavioral Research, 14, 21-38.
McDonald, R. P. (1980). A simple comprehensive model for the analysis of covariance structures: Some remarks on applications. British Journal of Mathematical and Statistical Psychology, 33, 161-183.


McDonald, R. P. (1991). Faktorová analýza a příbuzné metody v psychologii (P. Blahuš, Trans. 1. ed.). Praha: Academia.
McDonald, R. P. (2003). Notes on goodness-of-fit indices. Unpublished manuscript.
McDonald, R. P., & Fraser, C. (1990). COSAN. Charlottesville: McArdle, J.J.
McDonald, R. P., & Ho, R. M. (2002). Principles and practice in reporting structural equation analyses. Psychological Methods, 7(1), 64-82.
McDonald, R. P., & Marsh, H. W. (1990). Choosing a multivariate model: Noncentrality and goodness-of-fit. Psychological Bulletin, 107, 247-255.
Medsker, G. J., Williams, L. J., & Holahan, P. J. (1994). A review of current practices for evaluating causal models in organizational behavior and human resources management. Journal of Management, 20, 439-464.
Měkota, K., & Blahuš, P. (1983). Motorické testy v tělesné výchově a sportu (1. ed.). Praha: SPN.
Michielsen, H. J., Vries, J. d., Heck, G. L. v., Vijver, F. J. R. v. d., & Sijtsma, K. (2004). Examination of the dimensionality of fatigue: The construction of the Fatigue Assessment Scale (FAS). European Journal of Psychological Assessment, 20, 39-48.
Mokken, R. J. (1971). A theory and procedure of scale analysis. The Hague: Mouton.
Molenaar, I. W. (1991). A weighted Loevinger H-coefficient extending Mokken's scaling to multicategory items. Kwantitatieve Metoden, 37(12), 97-117.
Molenaar, I. W., & Sijtsma, K. (2000). MSP5 for Windows. Groningen: iec ProGAMMA.
Mulaik, S. A. (1972). The foundations of factor analysis. New York: McGraw-Hill.
Mulaik, S. A., James, L. R., Van Alstine, J., Bennett, N., Lind, S., & Stillwell, C. D. (1989). An evaluation of goodness-of-fit indices for structural equation models. Psychological Bulletin, 105, 430-445.
Muramatsu, Y., Kurosaki, R., Watanabe, H., Michimata, M., Matsubara, M., Imai, Y., et al. (2003). Cerebral alterations in a MPTP-mouse model of Parkinson's disease - an immunocytochemical study. Journal of Neural Transmission, 110(10), 1129-1144.
Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 46, 115-132.
Muthén, B., & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19-30.
Nevšímalová, S., Růžička, E., & Tichý, J. (2002). Neurologie. Prague: Galén & Karolinum.
Okun, M. S., McDonald, W. M., & DeLong, M. R. (2002). Refractory nonmotor symptoms in male patients with Parkinson's disease due to testosterone deficiency: a common unrecognized comorbidity. Archives of Neurology, 59(5), 807-811.
Olanow, C. W. (2004). Manganese-induced parkinsonism and Parkinson's disease. Annals of the New York Academy of Sciences, 1012, 209-223.
Parkinson, J. (2002). An essay on the shaking palsy. 1817. Journal of Neuropsychiatry and Clinical Neurosciences, 14(2), 223-236; discussion 222.


Parkinson's disease. (2005). Retrieved 4.4., 2005, from http://www.parkinsonsinstitute.org/movement_disorders/parkinsons.html
The Parkinson's web. (2005). Retrieved 3.3., 2005, from http://pdweb.mgh.harvard.edu/
Parkinson's disease: etiology and genetics. (2005). Retrieved 1.2., 2005, from http://www.healthguide.com
Path analysis and structured linear equations. (2004). Retrieved 12.2., 2005, from http://www.polsci.wvu.edu/duval/ps401/Notes/PS401Notes_Part1.ppt#257,2,Introduction
Poewe, W. H., & Wenning, G. K. (1998). The natural history of Parkinson's disease. Annals of Neurology, 44(3 Suppl 1), S1-9.
Proctor, P., & McGinness, J. E. (1970). Levodopa side-effects and the Lesch-Nyhan syndrome. Lancet, 2(7687), 1367.
Rabe-Hesketh, S., & Skrondal, A. (2005). Tutorial: Generalized latent variable modeling. Retrieved 23.8., 2005, from http://www.crm.umontreal.ca/Latent05/pdf/rabe_tutorial.pdf
Ramaker, C., Marinus, J., Stiggelbout, A. M., & Van Hilten, B. J. (2002). Systematic evaluation of rating scales for impairment and disability in Parkinson's disease. Movement Disorders, 17(5), 867-876.
Raykov, T., & Marcoulides, G. A. (2000). A first course in structural equation modeling. London: Lawrence Erlbaum Associates.
Reynolds, N., & Montgomery, G. (1987). Factor analysis of Parkinson's impairment. An evaluation of the final common pathway. Archives of Neurology, 44(10), 1013-1016.
Richards, M., Marder, K., Cote, L., & Mayeux, R. (1994). Interrater reliability of the Unified Parkinson's Disease Rating Scale motor examination. Movement Disorders, 9(1), 89-91.
Rindskopf, D. (1984). Structural equation models: Empirical identification, Heywood cases and related problems. Sociological Methods and Research, 13, 109-119.
Roth, J., Sekyrová, M., & Růžička, E. (1999). Parkinsonova nemoc (2. ed.). Prague: Maxdorf.
Rybicki, B. A., Johnson, C. C., Peterson, E. L., Kortsha, G. X., & Gorell, J. M. (1999). A family history of Parkinson's disease and its effect on other PD risk factors. Neuroepidemiology, 18(5), 270-278.
Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 4, 350-353.
Schumacker, R. E., & Beyerlein, S. T. (2000). Confirmatory factor analysis with different correlation types and estimation methods. Structural Equation Modeling: A Multidisciplinary Journal, 7(4), 629-636.
Siderowf, A., McDermott, M., Kieburtz, K., Blindauer, K., Plumb, S., & Shoulson, I. (2002). Test-retest reliability of the Unified Parkinson's Disease Rating Scale in patients with early Parkinson's disease: results from a multicenter clinical trial. Movement Disorders, 17(4), 758-763.
Sijtsma, K., Debets, P., & Molenaar, I. W. (1990). Mokken scale analysis for polychotomous items: Theory, a computer program and an application. Quality & Quantity, 24, 173-188.


Sijtsma, K., & Hemker, B. T. (1998). Nonparametric polytomous IRT models for invariant item ordering, with results for parametric models. Psychometrika, 63, 183-200.
Sijtsma, K., & Molenaar, I. W. (1987). Reliability of test scores in nonparametric item response theory. Psychometrika, 52(1), 79-97.
Sijtsma, K., & Molenaar, I. W. (2002). Introduction to nonparametric item response theory (Vol. 5). London: Sage Publications.
Sijtsma, K., & Van der Ark, L. A. (2005). Progress in NIRT analysis of polytomous item scores: Dilemmas and practical solutions. Retrieved 12.5., 2005, from http://spitswww.uvt.nl/~avdrark/research/book2001.pdf
Spearman, C. (1904). General intelligence, objectively determined and measured. American Journal of Psychology, 15, 189-226.
Stebbins, G. T. (2004). Personal communication.
Stebbins, G. T., & Goetz, C. G. (1998). Factor structure of the Unified Parkinson's Disease Rating Scale: Motor examination section. Movement Disorders, 13(4), 633-636.
Stebbins, G. T., Goetz, C. G., Lang, A. E., & Cubo, E. (1999). Factor analysis of the motor section of the Unified Parkinson's Disease Rating Scale during the off-state. Movement Disorders, 14(4), 585-589.
Steiger, J. H. (1990). Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research, 25, 173-180.
Structural equation modeling. (2005). Retrieved 9.3., 2005, from http://www2.chass.ncsu.edu/garson/pa765/structur.htm
Takane, Y., & Leeuw, J. (1987). On the relationships between item response theory and factor analysis of discretized variables. Psychometrika, 52(3), 393-408.
Tanner, C. M. (1999). Parkinson study group. Pramipexole in ethnic minority with Parkinson's disease. Movement Disorders, 14, Suppl. 5.
Thurstone, L. L. (1935). The vectors of the mind. Chicago: University of Chicago Press.
Štochl, J. (2002). Škály pro hodnocení plavecké úrovně předškolních dětí. Unpublished diploma thesis, Charles University, Prague.
Tomešová, E. (2003). Physical self-perception profile: Factorial validity of the Czech version. Paper presented at the 11th European Congress of Sport Psychology, Copenhagen.
Tracik, F., & Ebersbach, G. (2001). Sudden daytime sleep onset in Parkinson's disease: polysomnographic recordings. Movement Disorders, 16(3), 500-506.
Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10.
Urbánek, T. (2000). Strukturální modelování v psychologii. Brno: Nakladatelství Pavel Křepela.
Van der Ark, L. A. (2000). On stochastic ordering in polytomous IRT models; A simulation study. Paper presented at the Fifth International Conference on Logic and Methodology, Amsterdam.
Vieregge, P., Stolze, H., Klein, C., & Heberlein, I. (1997). Gait quantification in Parkinson's disease: locomotor disability and correlation to clinical rating scales. Journal of Neural Transmission, 104, 237-248.


Wiley, D. E. (1973). The identification problem in structural equation models with unmeasured variables. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the social sciences. New York: Academic Press.
Worldwide education and awareness for movement disorders. (2005). Retrieved 15.5., 2005, from http://www.wemove.org/par/par.html
Wothke, W. (1992). Nonpositive definite matrices in structural modeling. In Sociological methods and research. Beverly Hills: Sage.
Wright, S. (1918). On the nature of size factors. Genetics, 3, 367-384.
Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20, 557-585.


APPENDICES

Motor Section of Unified Parkinson's Disease Rating Scale


1. Speech
0 = Normal.
1 = Slight loss of expression, diction and/or volume.
2 = Monotone, slurred but understandable; moderately impaired.
3 = Marked impairment, difficult to understand.
4 = Unintelligible.

2. Facial Expression
0 = Normal.
1 = Minimal hypomimia, could be normal "Poker Face".
2 = Slight but definitely abnormal diminution of facial expression.
3 = Moderate hypomimia; lips parted some of the time.
4 = Masked or fixed facies with severe or complete loss of facial expression; lips parted 1/4 inch or more.

3. - 7. Tremor at rest (head, upper and lower extremities)
0 = Absent.
1 = Slight and infrequently present.
2 = Mild in amplitude and persistent. Or moderate in amplitude, but only intermittently present.
3 = Moderate in amplitude and present most of the time.
4 = Marked in amplitude and present most of the time.

8. - 9. Action or Postural Tremor of hands
0 = Absent.
1 = Slight; present with action.
2 = Moderate in amplitude, present with action.
3 = Moderate in amplitude with posture holding as well as action.
4 = Marked in amplitude; interferes with feeding.

10. - 14. Rigidity (Judged on passive movement of major joints with patient relaxed in sitting position. Cogwheeling to be ignored.)
0 = Absent.
1 = Slight or detectable only when activated by mirror or other movements.
2 = Mild to moderate.
3 = Marked, but full range of motion easily achieved.
4 = Severe, range of motion achieved with difficulty.

15. - 16. Finger Taps (Patient taps thumb with index finger in rapid succession.)
0 = Normal.
1 = Mild slowing and/or reduction in amplitude.
2 = Moderately impaired. Definite and early fatiguing. May have occasional arrests in movement.
3 = Severely impaired. Frequent hesitation in initiating movements or arrests in ongoing movement.
4 = Can barely perform the task.

17. - 18. Hand Movements (Patient opens and closes hands in rapid succession.)
0 = Normal.
1 = Mild slowing and/or reduction in amplitude.
2 = Moderately impaired. Definite and early fatiguing. May have occasional arrests in movement.
3 = Severely impaired. Frequent hesitation in initiating movements or arrests in ongoing movement.
4 = Can barely perform the task.

19.-20. Rapid Alternating Movements of Hands (Pronation-supination movements of hands, vertically and horizontally, with as large an amplitude as possible, both hands simultaneously.)
0 = Normal.
1 = Mild slowing and/or reduction in amplitude.
2 = Moderately impaired. Definite and early fatiguing. May have occasional arrests in movement.
3 = Severely impaired. Frequent hesitation in initiating movements or arrests in ongoing movement.
4 = Can barely perform the task.

21.-22. Leg Agility (Patient taps heel on the ground in rapid succession picking up entire leg. Amplitude should be at least 3 inches.)
0 = Normal.
1 = Mild slowing and/or reduction in amplitude.
2 = Moderately impaired. Definite and early fatiguing. May have occasional arrests in movement.
3 = Severely impaired. Frequent hesitation in initiating movements or arrests in ongoing movement.
4 = Can barely perform the task.

23. Arising from Chair (Patient attempts to rise from a straight-backed chair, with arms folded across chest.)
0 = Normal.
1 = Slow; or may need more than one attempt.
2 = Pushes self up from arms of seat.
3 = Tends to fall back and may have to try more than one time, but can get up without help.
4 = Unable to arise without help.

24. Posture
0 = Normal erect.
1 = Not quite erect, slightly stooped posture; could be normal for older person.
2 = Moderately stooped posture, definitely abnormal; can be slightly leaning to one side.
3 = Severely stooped posture with kyphosis; can be moderately leaning to one side.
4 = Marked flexion with extreme abnormality of posture.

25. Gait
0 = Normal.
1 = Walks slowly, may shuffle with short steps, but no festination (hastening steps) or propulsion.
2 = Walks with difficulty, but requires little or no assistance; may have some festination, short steps, or propulsion.
3 = Severe disturbance of gait, requiring assistance.
4 = Cannot walk at all, even with assistance.

26. Postural Stability (Response to sudden, strong posterior displacement produced by pull on shoulders while patient erect with eyes open and feet slightly apart. Patient is prepared.)
0 = Normal.
1 = Retropulsion, but recovers unaided.
2 = Absence of postural response; would fall if not caught by examiner.
3 = Very unstable, tends to lose balance spontaneously.
4 = Unable to stand without assistance.

27. Body Bradykinesia and Hypokinesia (Combining slowness, hesitancy, decreased armswing, small amplitude, and poverty of movement in general.)
0 = None.
1 = Minimal slowness, giving movement a deliberate character; could be normal for some persons. Possibly reduced amplitude.
2 = Mild degree of slowness and poverty of movement which is definitely abnormal. Alternatively, some reduced amplitude.
3 = Moderate slowness, poverty or small amplitude of movement.
4 = Marked slowness, poverty or small amplitude of movement.
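The item list above implies a simple record layout: 27 item scores, each an integer from 0 to 4. The following Python sketch shows one way such a record could be validated and summed. The group names, the dictionary layout, and the use of a plain sum as a total are illustrative assumptions and are not taken from the thesis.

```python
# Hypothetical layout: number of scored items per MS UPDRS item group,
# following the item numbering above (1, 2, 3-7, 8-9, 10-14, ...).
ITEM_GROUPS = {
    "speech": 1,
    "facial_expression": 1,
    "tremor_at_rest": 5,
    "action_postural_tremor": 2,
    "rigidity": 5,
    "finger_taps": 2,
    "hand_movements": 2,
    "rapid_alternating_movements": 2,
    "leg_agility": 2,
    "arising_from_chair": 1,
    "posture": 1,
    "gait": 1,
    "postural_stability": 1,
    "body_bradykinesia": 1,
}

MIN_SCORE, MAX_SCORE = 0, 4


def validate_scores(scores):
    """Raise ValueError unless every group has the expected number of 0-4 integer scores."""
    for group, n_items in ITEM_GROUPS.items():
        values = scores.get(group)
        if values is None or len(values) != n_items:
            raise ValueError(f"{group}: expected {n_items} score(s)")
        for value in values:
            if not isinstance(value, int) or not MIN_SCORE <= value <= MAX_SCORE:
                raise ValueError(f"{group}: score {value!r} is outside 0-4")


def total_score(scores):
    """Return the unweighted sum of all item scores (an assumed summary, not the thesis model)."""
    validate_scores(scores)
    return sum(sum(values) for values in scores.values())


if __name__ == "__main__":
    # A made-up record in which every item is rated 1.
    example = {group: [1] * n for group, n in ITEM_GROUPS.items()}
    print(total_score(example))
```

A plain sum treats all items as interchangeable indicators of one dimension, which is exactly the assumption the dimensionality analyses in this thesis examine, so the total is shown only as a bookkeeping check.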


MS UPDRS Data Sheet


Exploratory Mokken's Scale Analysis for Non-trichotomized Data and Cutoff Criterion of Hi > 0.3
Item Speech Facial expression Tremor at rest FLC RUE LUE RLE LLE Action/postural tremor Rigidity Right Left H/N RUE LUE RLE LLE Finger taps Hand movements Rapid altern. Mov. Leg agility Arise from chair Posture Gait Postural stability Body bradykinesia Scale H Reliability (Rho) Right Left Right Left Right Left Right Left 0.44 0.35* 0.42 0.42 0.45 0.48 0.52 0.48 0.51 0.50 0.52 0.50 0.52 0.56 0.52 0.51 0.52 0.57 0.48 0.46 0.67 0.50 0.70 0.41* 0.48* 0.46 0.49 0.50 0.53 Hi of subscale 1 0.41 0.40 Hi of subscale 2 Hi of subscale 3

Cronbach's alpha 0.94 0.61 0.64

RUE Right Upper Extremity; LUE Left Upper Extremity; RLE Right Lower Extremity; LLE Left Lower Extremity; H/N Head, Neck; FLC Face, Lips, Chin; Right Right Extremity; Left Left Extremity
The item with the lowest value of Hi in each subscale is marked by an asterisk.


Exploratory Mokken's Scale Analysis for Non-trichotomized Data and Cutoff Criterion of Hi > 0.4
Item Speech Facial expression Tremor at rest FLC RUE LUE RLE LLE Act./post. tremor Rigidity Right Left H/N RUE LUE RLE LLE Finger taps Right Left Hand movements Right Left Rapid altern. Mov. Leg agility Arise from chair Posture Gait Postural stability Body bradykinesia Scale H Reliability (Rho) Right Left Right Left 0.54* 0.58 0.54 0.58 0.55 0.58 0.57 0.58 0.64 0.57 0.58 0.59 0.63 0.58 0.59 0.72 0.57 0.65 0.55 0.67 0.55* 0.57 0.61 0.61 0.62 Hi of subscale Hi of subscale Hi of subscale Hi of subscale Hi of subscale 1 2 3 4 5 0.72* 0.72 0.57* 0.55* 0.57 0.55

Cronbach's alpha 0.94 0.85 0.76 0.53 0.56

RUE Right Upper Extremity; LUE Left Upper Extremity; RLE Right Lower Extremity; LLE Left Lower Extremity; H/N Head, Neck; FLC Face, Lips, Chin; Right Right Extremity; Left Left Extremity; Act./post. Action/postural
The item with the lowest value of Hi in each subscale is marked by an asterisk.


Parameter Estimates, Standard Errors and T-values of Selected Models

a) The one-factor model of tremor

Item        Tremor        Unique var
TrFLC       .59 (.06)     .65 (.12)
            9.72          5.36
TrRUE       .66 (.05)     .56 (.12)
            14.15         4.79
TrLUE       .82 (.04)     .33 (.12)
            21.86         2.82
TrRLE       .73 (.05)     .46 (.12)
            14.29         3.71
TrLLE       .73 (.05)     .46 (.13)
            13.77         3.62
ATrRhand    .57 (.05)     .68 (.11)
            10.42         6.07
ATrLhand    .61 (.06)     .63 (.12)
            10.95         5.23

Upper values: Parameter estimate (standard error of the estimate)
Lower value: t value
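One way to read the one-factor tremor table is to check, for each item, that the squared loading plus the unique variance is close to 1, as it would be in a completely standardized solution. The Python sketch below performs that check on the values printed above. Treating the solution as completely standardized is an assumption inferred from the numbers, and the variable names and the 0.02 tolerance are illustrative.

```python
# (loading, unique variance) pairs copied from the one-factor tremor table.
TREMOR_ONE_FACTOR = {
    "TrFLC": (0.59, 0.65),
    "TrRUE": (0.66, 0.56),
    "TrLUE": (0.82, 0.33),
    "TrRLE": (0.73, 0.46),
    "TrLLE": (0.73, 0.46),
    "ATrRhand": (0.57, 0.68),
    "ATrLhand": (0.61, 0.63),
}

TOLERANCE = 0.02  # assumed allowance for two-decimal rounding in the table


def implied_variance(loading, unique_var):
    """Item variance implied by a one-factor model with unit factor variance."""
    return loading ** 2 + unique_var


def check_standardized(table, tolerance=TOLERANCE):
    """Return {item: implied variance} for items whose implied variance is not near 1."""
    return {
        item: implied_variance(loading, unique_var)
        for item, (loading, unique_var) in table.items()
        if abs(implied_variance(loading, unique_var) - 1.0) > tolerance
    }


if __name__ == "__main__":
    for item, (loading, unique_var) in TREMOR_ONE_FACTOR.items():
        share = loading ** 2  # proportion of item variance attributed to the factor
        print(f"{item:9s} explained={share:.2f} unique={unique_var:.2f}")
    print("items failing the check:", check_standardized(TREMOR_ONE_FACTOR))
```

Under this reading, the squared loading is the share of an item's variance attributed to the tremor factor, so TrLUE (.82) is the best-explained item and ATrRhand (.57) the least.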

b) The hierarchical model of tremor


TremorR TrFLC TrRUE TrLUE TrRLE TrLLE AtrRhand AtrLhand TremorR TremorL Tremor Upper values: Parameter estimate (standard error of the estimate) Lower value: t value .64 (.19) 3.35 .65 (.12) 5.37 .74 (.07) 10.05 .88 (.09) 10.01 1.00 .81 (.25) 3.31 .75 (.13) 5.77 .78 .91 TremorL Tremor .65 (.07) 8.73 Unique var .58 (.14) 4.13 .40 (.20) 2.01 .17 (.17) .98 .34 (.16) 2.15 .43 (.15) 2.94 .59 (.14) 4.34 .58 (.14) 4.22


c) The four-factor model of rigidity and bradykinesia


Rig 0.85 (0.04) 19.62 0.65 (0.04) 15.31 0.70 (0.04) 19.41 0.83 (0.03) 25.43 0.79 (0.03) 22.72 Brad RigBradR RigBradL Unique var .28 (.12) 2.25 .34 (.11) 3.23 .22 (.10) 2.20 .20 (.10) 2.00 .17 (.10) 1.71 .18 (.12) 1.46 .17 (.11) 1.57 .16 (.11) 1.45 .14 (.10) 1.35 .23 (.11) 2.05 .17 (.11) 1.51 .26 (.12) 2.08 .31 (.11) 2.69

Neck RUE LUE RLE LLE Frhand Flhand Hrhand Hlhand Rrhand Rlhand Lrleg Llleg Rig Brad RigBradR RigBradL

.48 (.05) 9.48 .53 (.05) 10.75 .32 (.05) 5.98 .46 (.05) 9.62 .69 (.05) 13.84 .68 (.05) 14.49 .69 (.06) 11.54 .68 (.05) 12.79 .73 (.05) 14.86 .67(.05) 12.64 .85 (.05) 17.10 .76 (.05) 16.62 .59 (.06) 10.03 .60 (.05) 11.49 .61 (.07) 8.78 .63 (.06) 10.76 .48 (.07) 6.97 .62 (.06) 10.92 .12 (.11) 1.13 .34 (.07) 4.78

1.00 .54 (.05) 10.48 1.00 1.00 1.00

Upper values: Parameter estimate (standard error of the estimate) Lower value: t value


d) The one-factor model of axial/gait bradykinesia

Item        BBrad         Unique var
Arising     .92 (.02)     .15 (.11)
            43.48         1.44
Posture     .87 (.02)     .25 (.11)
            40.96         2.34
Gait        .85 (.03)     .27 (.11)
            33.36         2.52
Stabil      .87 (.02)     .25 (.10)
            46.51         2.35
BodyBra     .80 (.03)     .36 (.11)
            28.16         3.33

Upper values: Parameter estimate (standard error of the estimate)
Lower value: t value
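The t values in these tables are the ratio of an estimate to its standard error. The Python sketch below recomputes that ratio for the unique variances of the axial/gait bradykinesia model and compares each printed t value with 1.96, the conventional two-sided 5% cutoff for a normal reference distribution. The cutoff is an assumption of this sketch and is not stated in the table. Because the table rounds estimates and standard errors to two decimals, the recomputed ratios only approximate the printed t values.

```python
# (estimate, standard error, printed t value) for the unique variances
# of the one-factor axial/gait bradykinesia model.
UNIQUE_VARIANCES = {
    "Arising": (0.15, 0.11, 1.44),
    "Posture": (0.25, 0.11, 2.34),
    "Gait": (0.27, 0.11, 2.52),
    "Stabil": (0.25, 0.10, 2.35),
    "BodyBra": (0.36, 0.11, 3.33),
}

CRITICAL_T = 1.96  # assumed two-sided 5% cutoff (normal approximation)


def recomputed_t(estimate, std_error):
    """t value as estimate divided by standard error, from the rounded table entries."""
    return estimate / std_error


def not_significant(table, critical=CRITICAL_T):
    """Return items whose printed |t| falls below the assumed cutoff."""
    return [item for item, (_, _, t) in table.items() if abs(t) < critical]


if __name__ == "__main__":
    for item, (estimate, std_error, printed_t) in UNIQUE_VARIANCES.items():
        ratio = recomputed_t(estimate, std_error)
        print(f"{item:8s} printed t={printed_t:.2f} recomputed={ratio:.2f}")
    print("below cutoff:", not_significant(UNIQUE_VARIANCES))
```

With this cutoff, only the unique variance of Arising (t = 1.44) falls below the threshold, which fits its very high loading of .92 on the factor.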

e) The seven-factor model of the MS UPDRS


Face .87 (.03) 25.34 .80 (.04) 19.14 Tremor Rig Brad BBrad Right Left Unique var .24 (.12) 1.99 .36 (.13) 2.75 .59 (.15) 3.94 .39 (.12) 3.28 .23 (.12) 1.95 .40 (.12) 3.35 .45 (.13) 3.50 .55 (.13) 4.36 .52 (.12) 4.23 .28 (.12) 2.42 .31 (.11) 2.80 .17 (.10) 1.66 .23 (.11) 2.17 .15 (.10) 1.47 .20 (.10) 2.02

Speech Facial TrFLC TrRUE TrLUE TrRLE TrLLE ATrRhand ATrLhand Neck RUE LUE RLE LLE FRhand

.64 (.08) 7.83 .65 (.06) 10.87 .73 (.06) 12.16 .77 (.07) 11.47 .64 (.06) 10.08 .52 (.07) 7.42 .49 (.07) 7.09 .85 (.04) 21.83 .61 (.04) 15.99 .73 (.03) 21.51 .79 (.03) 22.90 .81 (.03) 25.35 .66 (.04) 16.33

.43 (.13) 3.17 .49 (.10) 4.67 .11 (.17) 0.66 .38 (.16) 2.41 .42 (.13) 3.19 .48 (.12) 4.07

.57 (.06) 8.75 .55 (.07) 8.43 .39 (.06) 6.40 .44 (.07) 6.40 .60 (.07) 9.17

(continued) Face FLhand HRhand HLhand RRhand RLhand LRleg LLleg Arising Posture Gait Stabil BodyBra Face Right Left Tremor Rig 1.00 1.00 1.00 1.00 Tremor Rig Brad .72 (.03) 20.83 .67 (.04) 16.20 .74 (.04) 20.98 .70 (.04) 19.35 .75 (.03) 21.41 .79 (.03) 27.17 .80 (.03) 30.08 BBrad Right Left .55 (.05) 11.76 Unique var .18 (.10) 1.72 .18 (.10) 1.88 .15 (.10) 1.51 .24 (.11) 2.19 .16 (.10) 1.55 .33 (.11) 2.98 .28 (.11) 2.53 .15 (.11) 1.38 .27 (.11) 2.47 .33 (.11) 2.95 .29 (.11) 2.66 .22 (.11) 2.02

.61 (.06) 10.33 .55 (.05) 11.43 .52 (.05) 9.47 .53 (.05) 10.42 .23 (.07) 3.19 .26 (.06) 4.68 .92 (.02) 41.16 .86 (.02) 35.96 .82 (.03) 30.88 .85 (.02) 37.56 .88 (.02) 35.64

.54 (.06) 1.00 8.60 .62 (.07) .53 (.05) Brad 8.41 9.94 .67 (.03) .59 (.06) BBrad 20.05 10.11 Upper values: Parameter estimate (standard error of the estimate) Lower value: t - value

1.00 .85 (.03) 27.36 1.00
