Case Selection

You can select cases by entering a case-selection statement in the box
below. The case-selection or "where" statement has the following
form:

where <variable expression> <relational operator> <condition>

Variable Expressions
You can specify a single variable or an expression involving several
variables.

To insert a variable name, select the variable in the list box at the right
and then double-click.

All of the usual arithmetic operators [ + - / * ( ) ] are available for use in
variable expressions.

Relational Operators
The following relational and logical operators are available:

=    equals
!=   not equal to
<    less than
>    greater than
<=   less than or equal to
>=   greater than or equal to
&    and
|    or
,    or (comma, used in a series)
!    not

The modulus operator is also available:

%    the remainder after division by the following operand

Examples
Examples of selection conditions given by "where" expressions are:

where sex = 1 & age < 50
where (income + benefits)/famsize < 4500
where income1 >= 20000 | income2 >= 20000
where income1 >= 20000 & income2 >= 20000
where dept = "auto loan"
where 'estimated price/earnings' > 50

Note that strings used in case-selection expressions need not be
enclosed in double quotes unless they contain embedded blanks or
logical or arithmetic operators. Field names that contain blanks or
other illegal characters must be enclosed in single quotes.

Wildcards ( * or ? ) are available to select subgroups of string
variables:

where account = ????22
where id = 3*

(The first statement will select any accounts that have 2's as the 5th and
6th characters in the string, while the second statement will select
strings of any length that begin with 3.)
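
As an illustration of the wildcard behavior described above, here is a minimal Python sketch. It is not part of the case-selection language; it assumes that ? matches exactly one character and * matches any run of characters (including none), and it uses Python's standard fnmatch module as a stand-in. The sample account and id values are hypothetical.

```python
from fnmatch import fnmatchcase

def matches(value, pattern):
    """Return True if value matches the pattern; ? is one character, * is any run.

    Note: fnmatch also treats [ ] specially, which the help text does not mention.
    """
    return fnmatchcase(value, pattern)

# Hypothetical data, for illustration only
accounts = ["100022", "ABCD22", "22", "1000221"]
ids = ["3", "31", "3999", "13"]

# Analogue of: where account = ????22
selected_accounts = [a for a in accounts if matches(a, "????22")]

# Analogue of: where id = 3*
selected_ids = [i for i in ids if matches(i, "3*")]

print(selected_accounts)
print(selected_ids)
```

In this sketch only six-character accounts ending in 22 pass the first pattern, because each ? must be filled by exactly one character.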

The comma operator ',' is used to list different values of the same
variable name that will be used as selection criteria. It allows you to
bypass lengthy "or" expressions when giving lists of conditional
values.

where state = CA,WA,OR,AZ,NV
where caseid != 22*,30??,4?00
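
The comma list can be thought of as shorthand for a chain of "or" tests. The Python sketch below illustrates that reading; it is not the program's implementation. The helper name in_list and the sample values are hypothetical, and the treatment of != (keep a case only when it matches none of the listed patterns) is an assumed interpretation.

```python
from fnmatch import fnmatchcase

def in_list(value, patterns):
    """True if value matches any pattern in a comma-separated list."""
    return any(fnmatchcase(value, p) for p in patterns.split(","))

# Hypothetical data, for illustration only
states = ["CA", "TX", "OR", "NY"]
caseids = ["2210", "3011", "4500", "5000", "301"]

# Analogue of: where state = CA,WA,OR,AZ,NV
selected_states = [s for s in states if in_list(s, "CA,WA,OR,AZ,NV")]

# Analogue of: where caseid != 22*,30??,4?00
# Assumption: != with a list keeps cases that match none of the patterns.
kept_caseids = [c for c in caseids if not in_list(c, "22*,30??,4?00")]

print(selected_states)
print(kept_caseids)
```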

Missing Values
You can test whether a variable is missing by comparing it to the
special internal variable '_missing'. For example, the following
statement selects the cases in which income is not missing:

where income != _missing

Selecting Cases
You can select specific rows by using the special internal
variable '_rownum'. For example, you can select the first 50
cases by using the following expression:

where _rownum < 51

or you can select every fifth case by using the modulus
operator:

where _rownum % 5 = 0
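
The two row-number selections can be mimicked in Python as a rough sketch. It assumes that '_rownum' counts cases starting from 1, which is what the "first 50 cases" example implies; the list of cases is hypothetical.

```python
# 100 hypothetical cases, for illustration only
rows = [f"case{n}" for n in range(1, 101)]

# enumerate(..., start=1) plays the role of _rownum (assumed to be 1-based)

# Analogue of: where _rownum < 51
first_fifty = [r for rownum, r in enumerate(rows, start=1) if rownum < 51]

# Analogue of: where _rownum % 5 = 0
every_fifth = [r for rownum, r in enumerate(rows, start=1) if rownum % 5 == 0]

print(len(first_fifty), first_fifty[0], first_fifty[-1])
print(every_fifth[:3])
```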

Sampling Functions
Three functions are available for sampling.

The first

samp_rand(prop)

allows for simple random sampling. Each case is selected with a
probability equal to 'prop'.
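
A minimal Python sketch of this kind of sampling follows. It is an illustration only, not the program's implementation: it assumes each case is decided independently with one uniform random draw, and the rng argument and the list of cases are hypothetical additions.

```python
import random

def samp_rand(prop, rng):
    """Return True with probability prop (one independent draw per case)."""
    return rng.random() < prop

rng = random.Random(0)        # fixed seed so the sketch is repeatable
cases = list(range(10000))    # hypothetical case identifiers

sample = [c for c in cases if samp_rand(0.5, rng)]
print(len(sample))
```

Because every case is decided independently, the size of the sample varies from run to run around prop times the number of cases.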

The second

samp_fixed(sample_size,total_observations)

selects a random sample of fixed size. The first case is drawn with a
probability of 'sample_size/total_observations', and the succeeding i'th
case is drawn with a probability of '(sample_size - hits) /
(total_observations - i)', where 'hits' is the number of cases already
selected.
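
The selection rule above can be sketched in Python as follows. This is a sketch of the stated probabilities, not the program's own code. It assumes that 'hits' is the number of cases selected so far, that i counts the cases already examined (so the first case has i = 0), and that sample_size does not exceed total_observations. The rng argument is a hypothetical addition.

```python
import random

def samp_fixed(sample_size, total_observations, rng):
    """Return the 0-based positions of the selected cases."""
    hits = 0
    selected = []
    for i in range(total_observations):   # i = cases already examined
        # Chance of taking this case: cases still needed / cases still remaining
        p = (sample_size - hits) / (total_observations - i)
        if rng.random() < p:
            selected.append(i)
            hits += 1
    return selected

picked = samp_fixed(10, 100, random.Random(1))
print(len(picked))
```

Under these assumptions the sample always has exactly sample_size cases: the probability falls to 0 once enough cases are selected and rises to 1 when every remaining case is needed.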

Finally, a third function

samp_syst(n)

performs a systematic sample of every n'th case after a random start.
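
A rough Python sketch of systematic sampling is shown below. How the program chooses its random start is not documented here, so the sketch assumes the start is drawn uniformly from the first n cases; the total and rng arguments are hypothetical additions.

```python
import random

def samp_syst(n, total, rng):
    """Return 0-based positions: a random start, then every n'th case after it."""
    start = rng.randrange(n)   # assumed: start is uniform over the first n cases
    return list(range(start, total, n))

positions = samp_syst(5, 100, random.Random(2))
print(positions[:4])
```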

Expressions are evaluated from left to right. You can thus sample from
a subset of your cases by subsetting them first and then sampling. For
instance, to take a random half of high school graduates, use:

where schooling >= 12 & samp_rand(.5)
