
Programmability in SPSS 16 and 17

Jon K Peck
Technical Advisor and Principal Software Engineer
Athens, May 2008

Agenda
Review of programmability
The Extension mechanism and the PROPOR procedure
User-Defined dialog boxes
The Dataset class and comparing datasets
Examples: custom sorting, pattern matching
Building applications that embed SPSS
Integrating R into SPSS
Q and A
Wrap up

Programmability extends the standard SPSS capabilities


Makes it easy to build jobs that respond to data, output, and the environment
Allows greater generality, more automation
Makes jobs more flexible and robust
Allows extending the capabilities of SPSS
Allows the use of existing or new statistical modules written in R or Python
Enables simpler and more maintainable code
Increases your productivity
Puts you in control
More fun

Programmability embeds Python or R inside SPSS


SPSS syntax
BEGIN PROGRAM PYTHON or R.
Python or R code
END PROGRAM.
SPSS syntax

A program in the SPSS input stream can communicate with SPSS and control it and use the language's facilities and modules
A Python or .NET application can embed SPSS inside itself
Resources and forums are at SPSS Developer Central: www.spss.com/devcentral
Programmability plugins are an optional install

Example: Automate the job of finding unlabelled values of a variable


No label may indicate an error.

BEGIN PROGRAM.
import spss, spssaux, spssdata
def findUnlabelledValues(name):
    d = spssaux.VariableDict()
    labels = set(d[name].ValueLabelsTyped)
    data = spssdata.Spssdata(indexes=[name])
    values = set()
    for case in data:
        values.add(case[0])
    data.close()
    values.discard(None)
    print "\nUnlabeled Values:\n",sorted(values.difference(labels))
findUnlabelledValues("origin")
END PROGRAM.

Unlabeled Values: [4.0, 7.0, 11.0]
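The core of the example above is a set difference between the observed values and the labelled values. A minimal sketch of that step in plain Python, runnable outside SPSS, might look like this. The function name `find_unlabelled_values`, the label texts, and the sample data are hypothetical stand-ins, not part of the SPSS modules.

```python
def find_unlabelled_values(values, value_labels):
    """Return the sorted data values that have no value label.

    values       -- iterable of case values (None marks a missing value)
    value_labels -- dict mapping labelled values to their label text
    """
    observed = set(values)
    observed.discard(None)          # ignore missing values, as the slide does
    return sorted(observed.difference(value_labels))


# Hypothetical stand-ins for the variable's labels and data.
labels = {1.0: "American", 2.0: "European", 3.0: "Japanese"}
data = [1.0, 2.0, 4.0, 3.0, None, 7.0, 11.0, 1.0, 4.0]

print(find_unlabelled_values(data, labels))
```

Using sets means each distinct value is checked once, however many cases contain it.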

Python and R are open source software

SPSS is not the owner or licensor of the Python or R software. Any user of Python or R must agree to the terms of the license agreement located on the Python or R web site. SPSS is not making any statement about the quality of the Python or R programs. SPSS fully disclaims all liability associated with your use of the Python or R programs.

SPSS is divided into two parts


The SPSS Processor: invisible
  Syntax processing
  Computation
  Data handling
  Procedures
  May be remote with SPSS Server

The SPSS Front End: what you see
  Menus and dialog boxes
  Output Viewer
  Data Editor
  Syntax window

SPSS 16 added new programmability and scripting features


SPSS 15
  SPSS Processor: SPSS syntax, Python programs, .NET programs
  SPSS Front End: SaxBasic scripts, COM support

SPSS 16
  SPSS Processor: SPSS syntax, Python programs, .NET programs, R programs, Extensions
  SPSS Front End: Basic scripts (Windows), COM support (Windows), Python scripts

Scripting is useful for working with Viewer contents


Scripts can be written in Python or, on Windows, in Basic
Python APIs have a structure similar to familiar SaxBasic scripting
  Import the spssClient module
IDEs are provided for Python and Basic
SPSS 17 will allow programs to use the spssClient module
Autoscripts are triggered by specified types of output events
  E.g., creating a table of regression coefficients
Autoscripts have been generalized in SPSS 16

The EXTENSION mechanism turns Python or R programs into user-defined SPSS syntax
Python and R add great functionality to SPSS
Many users know only SPSS syntax

MEANS TABLES = accel BY origin
  /CELLS MEAN COUNT STDDEV MEDIAN
  /STATISTICS LINEARITY.

Extensions define SPSS syntax for programs via XML
Definitions are loaded automatically on SPSS startup
Parsed syntax is passed to the Python or R module
User never needs to know about the programs
Author never needs to parse SPSS syntax
PLS module in SPSS 16 is an extension

Extensions simplify the author's job


[Diagram. Elements: user's SPSS syntax; SPSS parser; extension XML; extension module (Template, parsecmd); author code (module Run); output]

The author supplies only the gold parts
The user just enters the command syntax
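To make the division of labour concrete, here is a toy sketch in plain Python of what "the author never parses syntax" can mean: the author declares the keywords and writes the implementation, and a generic routine converts already-parsed values and calls it. This does not use the real extension module. The `KEYWORDS` table, the `run` function, `propor_impl`, and the layout of the parsed dictionary are all invented for illustration.

```python
# Toy illustration only: the real extension mechanism and its helper module
# are not used here, and the layout of the parsed input is invented.

def propor_impl(num, denom, alpha=0.05):
    """Stand-in for the author's code: return the raw proportions."""
    return [n / float(d) for n, d in zip(num, denom)]


# Hypothetical keyword declaration: name -> (converter, is_list)
KEYWORDS = {
    "NUM": (float, True),
    "DENOM": (float, True),
    "ALPHA": (float, False),
}


def run(parsed):
    """Convert already-parsed keyword values and call the implementation."""
    kwargs = {}
    for keyword, raw in parsed.items():
        if keyword not in KEYWORDS:
            raise ValueError("Unknown keyword: " + keyword)
        convert, is_list = KEYWORDS[keyword]
        values = [convert(v) for v in raw]
        kwargs[keyword.lower()] = values if is_list else values[0]
    return propor_impl(**kwargs)


# What a parser might hand over for: PROPOR NUM=55 DENOM=100.
print(run({"NUM": ["55"], "DENOM": ["100"]}))
```

The point of the design is that validation and type conversion live in one generic place, so the implementation function receives ordinary Python arguments.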

PROPOR is a new extension procedure


Calculates confidence intervals for proportions
Produces pivot table output
Developer Central

PROPOR /HELP.

Confidence Intervals for Proportions and Differences in Proportions.
PROPOR /HELP displays this help and does nothing else.
Syntax:
PROPOR NUM=list DENOM=list [ID=varname]
  [/DATASET NAME=dsname]
  [/LEVEL ALPHA=value]
  [/HELP]
Example:
PROPOR NUM= 55 DENOM=100.

PROPOR produces a pivot table of confidence intervals
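The slides do not say which interval formula PROPOR uses. As a rough illustration of the kind of calculation involved, the sketch below computes a Wilson score interval for the example NUM=55, DENOM=100. The choice of the Wilson method and the function name `wilson_interval` are assumptions made here, so the numbers may differ from PROPOR's own output.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(num, denom, alpha=0.05):
    """Two-sided Wilson score interval for the proportion num/denom."""
    z = NormalDist().inv_cdf(1 - alpha / 2.0)   # about 1.96 for alpha=0.05
    p = num / float(denom)
    scale = 1 + z * z / denom
    centre = (p + z * z / (2 * denom)) / scale
    half = z * sqrt(p * (1 - p) / denom + z * z / (4 * denom ** 2)) / scale
    return centre - half, centre + half


low, high = wilson_interval(55, 100)
print("%.4f to %.4f" % (low, high))
```

A smaller ALPHA widens the interval, which is the role the /LEVEL ALPHA=value keyword appears to play in the syntax shown above.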

What about user interfaces?


SPSS 17

User-defined dialog boxes look like SPSS-defined dialogs

Which is the real one?

SPSS 17

Programmability can enhance procedures: A program to customize sorting in CTABLES


CTABLES /TABLE occupation[COUNT]
  /CATEGORIES VARIABLES=occupation ORDER=D KEY=COUNT TOTAL=YES.

This table is sorted in descending order, but category Other should be at the bottom.

A Program To Customize Sorting in CTABLES


import spss, spssaux2
spssaux2.genCategoryList("occupation", specialvalues=[4], macroname="other")
spss.Submit("""CTABLES /TABLE occupation[COUNT]
  /CATEGORIES VARIABLES=occupation [!other] TOTAL=YES.""")
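The idea behind the program is to replace an automatic sort with an explicit category list in which the special value comes last. The slides do not spell out how genCategoryList orders the remaining categories, so the following plain-Python sketch only illustrates the general technique. The function `category_order` and the frequencies are hypothetical.

```python
def category_order(counts, special_values):
    """Order categories by descending count, with special values last.

    counts         -- dict mapping category value to its frequency
    special_values -- values to move to the end, in the order given
    """
    regular = [v for v in counts if v not in special_values]
    regular.sort(key=lambda v: counts[v], reverse=True)
    return regular + [v for v in special_values if v in counts]


# Hypothetical frequencies; 4 plays the role of the "Other" category.
counts = {1: 120, 2: 340, 3: 95, 4: 410}
order = category_order(counts, special_values=[4])

# Text of an explicit category list that could be substituted into syntax.
print("[" + ", ".join(str(v) for v in order) + "]")
```

Here the value 4 has the largest count but still lands at the end, which is the behaviour the ORDER=D KEY=COUNT specification alone cannot give.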

Python regular expressions greatly simplify tasks involving patterns in strings


A regular expression defines a pattern that can be searched for or used in a replace
Example: a dataset contains three variables, firstname, lastname, and narrative. The names need to be replaced in the narratives so that they are anonymous
Sample data:

Using regular expressions to work with patterns: Making a narrative anonymous


begin program.
import spss, spssaux, spssdata, re
vard = spssaux.VariableDict()
curs = spssdata.Spssdata(indexes='firstname lastname narrative', accessType='w')
curs.append(spssdata.vdef("anonnarrative", vtype=vard['narrative'].VariableType + 100))
curs.commitdict()
wbound = r"\b"
for case in curs:
    fnregex = re.compile(wbound + case.firstname.strip() + wbound, flags=re.IGNORECASE)
    lnregex = re.compile(wbound + case.lastname.strip() + wbound, flags=re.IGNORECASE)   # e.g. \bSmith\b
    newnarr = fnregex.sub("-firstname-", case.narrative)
    newnarr = lnregex.sub("-lastname-", newnarr)
    curs.casevalues([newnarr])
curs.CClose()
end program.
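The regular expression step can be tried on its own, without a dataset. The sketch below applies the same word-boundary, case-insensitive replacement to a single string. The function `anonymize` and the sample sentence are made up for illustration, and the use of `re.escape` is an addition that is not on the slide.

```python
import re


def anonymize(narrative, firstname, lastname):
    """Replace whole-word, case-insensitive matches of the names."""
    for name, token in ((firstname, "-firstname-"), (lastname, "-lastname-")):
        name = name.strip()
        if not name:
            continue                      # nothing to replace for a blank name
        # re.escape guards against names containing regex metacharacters;
        # this is an addition not present on the slide.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        narrative = pattern.sub(token, narrative)
    return narrative


print(anonymize("John Smith met smithson; JOHN left.", "John ", "Smith"))
```

The `\b` anchors are what keep "smithson" intact while "Smith" is replaced.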

Before and After

The Dataset class delivers new functionality for data management


Available for Python and .NET
Retrieve, add, delete and change variables, properties, and values
Process multiple datasets at the same time
Access any case by case number
Included in the spss module in the plug-in
SPSS 16

ds = spss.Dataset()
ds.varlist['accel'].label = "acceleration"   # change label
print len(ds.cases)
ds.cases[10,2] = [100]   # change a value

comparedatasets uses the Dataset class to compare cases and variables in two datasets
BEGIN PROGRAM.
import spss, comparedatasets
c = comparedatasets.CompareDatasets("first", "second", idvar="id", diffcount="differences", reportroot="compare")
c.cases()
c.dictionaries()
c.close()
END PROGRAM.

As an extension:

COMPDS DS1=first, DS2=second
  /DATA ID=id DIFFCOUNT=differences ROOTNAME=compare.

Developer Central

Comparedatasets: The output dataset reports case differences

comparedatasets: A summary is written to the SPSS Viewer

SPSS 17 will have a built-in procedure

You can do selection, summary statistics, and charts on the outcome variables for further information.
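The case comparison amounts to matching cases on an ID variable and counting the variables whose values differ. A minimal plain-Python sketch of that logic, not the comparedatasets module itself, is given below. The function `compare_cases` and the sample data are hypothetical.

```python
def compare_cases(first, second, idvar):
    """Count, per ID, the variables whose values differ between two datasets.

    first, second -- lists of dicts (one dict per case)
    Returns a dict mapping each ID found in both datasets to a difference count.
    """
    second_by_id = dict((case[idvar], case) for case in second)
    differences = {}
    for case in first:
        other = second_by_id.get(case[idvar])
        if other is None:
            continue                       # ID missing from the second dataset
        shared = set(case).intersection(other) - set([idvar])
        differences[case[idvar]] = sum(1 for v in shared if case[v] != other[v])
    return differences


# Hypothetical data for illustration.
first = [{"id": 1, "mpg": 18, "accel": 12.0}, {"id": 2, "mpg": 15, "accel": 11.5}]
second = [{"id": 1, "mpg": 18, "accel": 12.0}, {"id": 2, "mpg": 16, "accel": 10.0}]
print(compare_cases(first, second, "id"))
```

The per-case counts play the same role as the DIFFCOUNT variable: they can be filtered or summarized afterwards.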

The Dataset class makes it easy to use the functions in the extendedTransforms module
data list fixed /dt(a21).
begin data.
2/22/2008 11:47:45 AM
2/22/2008 11:47:45 PM
end data.
begin program.
import spss, extendedTransforms
spss.StartDataStep()
ds = spss.Dataset()
ds.varlist.append("newdt", 0)
ds.varlist[-1].format = (22,22,0)   # DATETIME22.0 format
for i, case in enumerate(ds.cases):
    ds.cases[i, -1] = extendedTransforms.strtodatetime(case[0], "%m/%d/%Y %I:%M:%S %p")
spss.EndDataStep()
end program.

strtodatetime and datetimetostr allow patterns to be used for dates and times
14 functions in extendedTransforms
Developer Central
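The pattern in the example looks like a standard Python strptime format, so the conversion can be approximated with the standard library alone. The sketch below parses the two sample strings and converts them to seconds since October 14, 1582, which is how SPSS stores datetime values. The function `str_to_spss_datetime` is a hypothetical stand-in, and whether strtodatetime works exactly this way internally is an assumption.

```python
from datetime import datetime

# SPSS stores date/time values as seconds since midnight, October 14, 1582.
SPSS_EPOCH = datetime(1582, 10, 14)


def str_to_spss_datetime(text, pattern):
    """Parse text with a strptime pattern and return SPSS-style seconds."""
    delta = datetime.strptime(text.strip(), pattern) - SPSS_EPOCH
    return delta.days * 86400.0 + delta.seconds


pattern = "%m/%d/%Y %I:%M:%S %p"
am = str_to_spss_datetime("2/22/2008 11:47:45 AM", pattern)
pm = str_to_spss_datetime("2/22/2008 11:47:45 PM", pattern)
print(pm - am)   # the two sample rows are 12 hours (43200 seconds) apart
```

The `%I` and `%p` codes together handle the 12-hour clock, which is the part that is awkward to do with fixed-format SPSS input.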

You can write applications where SPSS is hidden using external drives mode

Application built by SPSS Services

A Reporting Application

Real names have been scrambled

The application was built with Python, SPSS, and standard Python packages
Written entirely in Python
Uses SPSS invisibly for calculation and charting
Output is captured with the Output Management System (OMS)
Uses free packages to supplement SPSS
  wxPython for user interface
  ReportLab for PDF production
Similar things could be done with .NET

R programs can be run inside SPSS


SPSS datasets and output can be processed by R
New SPSS datasets can be created from R
R can communicate with SPSS via 30 APIs

BEGIN PROGRAM R.
cases <- spssdata.GetDataFromSPSS(c("mpg", "accel"), 5)
spsspivottable.Display(cases, collabels=c("mpg", "accel"))
END PROGRAM.

Output appears in the SPSS Viewer
spsspivottable.Display produces pivot tables
print() produces plain text
SPSS 17 will include graphical output

R brings many statistical methods into SPSS

52 packages starting with "a"

Example: Estimate Rents Using the R Package kknn: K Nearest Neighbors


BEGIN PROGRAM R.
dict <- spssdictionary.GetDictionaryFromSPSS()
data <- spssdata.GetDataFromSPSS()
library(kknn)
kl <- c("rectangular", "triangular", "epanechnikov", "gaussian", "rank")
t.con <- train.kknn(nmqm ~ wfl + bjkat + zh, data=data, kmax=25, kernel=kl)
print(t.con)
newv <- spssdictionary.CreateSPSSDictionary(c("predictedRent", "Predicted Rent", 0, "F8.2", "scale"))
spssdictionary.SetDictionaryToSPSS("newrents", data.frame(dict, newv))
best <- (charmatch(t.con$best.parameters$kernel, kl)-1) * 25 + t.con$best.parameters$k
spssdata.SetDataToSPSS("newrents", data.frame(c(t.con$fitted.values[[best]]), data))
spssdictionary.EndDataStep()
END PROGRAM.

(Adapted from an example in the kknn package)
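For readers unfamiliar with the method, the following plain-Python sketch shows the basic idea of nearest-neighbour regression with k chosen by leave-one-out error. It is a simplification made for this note, not the kknn algorithm: it uses unweighted averaging only (roughly the "rectangular" kernel case), a single predictor, and invented data. The functions `knn_predict` and `best_k` are hypothetical.

```python
def knn_predict(train, target, query, k):
    """Predict by averaging the targets of the k nearest training rows."""
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(row, query))
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i]))[:k]
    return sum(target[i] for i in nearest) / float(k)


def best_k(train, target, kmax):
    """Choose k (1..kmax) with the smallest leave-one-out squared error."""
    errors = {}
    for k in range(1, kmax + 1):
        total = 0.0
        for i in range(len(train)):
            rest = train[:i] + train[i + 1:]
            rest_y = target[:i] + target[i + 1:]
            total += (knn_predict(rest, rest_y, train[i], k) - target[i]) ** 2
        errors[k] = total
    return min(errors, key=errors.get)


# Hypothetical data: floor area against rent.
area = [[30.0], [45.0], [60.0], [75.0], [90.0], [105.0]]
rent = [300.0, 420.0, 540.0, 650.0, 800.0, 900.0]
k = best_k(area, rent, kmax=3)
print(k, knn_predict(area, rent, [70.0], k))
```

In the R example, train.kknn additionally searches over several weighting kernels, which is why the index of the best fitted values depends on both the kernel and k.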

R output appears in the Viewer. The output data appear in the Data Editor

Where We Have Been Today


Programmability adds flexibility and power to SPSS
The extension mechanism integrates programs better into SPSS syntax
The new Dataset class adds data management power
The new scripting capabilities provide more ways to work with output
R integration opens a large collection of statistical techniques to SPSS users

Questions and Answers


In Conclusion
Programmability capabilities continue to grow
Opening up SPSS puts you in control through plugging in your own code
More tasks can be automated
You can easily tap large R and Python libraries
New capabilities extend data management
The Extension mechanism integrates capabilities with a consistent syntax

Tell us about your programmability experiences

Jon Peck, Ph.D.
SPSS Inc
233 S Wacker Drive
Chicago, IL 60606
peck@spss.com
