
Chapter 2

Getting Your Data into SAS

Section 1

METHODS TO GET YOUR DATA INTO SAS

Methods for Getting Data In


Normally, our data resides outside the SAS environment, so we need to bring it in.
There is always a way to get the data into SAS no matter where it resides.
There are four general categories of methods for getting your data into SAS:
1. Entering data directly into SAS data sets (VIEWTABLE).
2. Creating SAS data sets from raw data files (using DATA steps or IMPORT).
3. Converting other software's data files into SAS data sets (IMPORT).
4. Reading other software's data files directly (SAS/ACCESS, outside this course's scope).
Of course, the method will depend on where your data is located and what tools are
available to you.

Reading Data into SAS

There are four options for reading data into SAS:

1. The INFILE statement (external raw data) in a DATA step, using
   a. List input
   b. Column input
   c. Input with informats
2. CARDS or DATALINES statements (internal raw data)
3. Importing the data set via the SAS interface (not available in SAS Studio)
4. Importing the data set via PROC IMPORT (not covered in this course)

Entering data with DATA Step


When to use DATA steps?
1. To get your data into SAS
2. To make the data set SAS compatible (although this is not necessary for this course's data sets)
3. To extract a subset of the variables or observations for further analysis
4. To transform a data set into the format a particular procedure requires, since different procedures may need the same data in different formats

We will use this method extensively in this chapter and the next couple of chapters.

INFILE Statement

INFILE statement
Format:
DATA <dataset_name>;
INFILE '<path_of_the_file>';
INPUT <names_of_variables>;
RUN;

The input type can be list, column, or informats (formatted input).

UNIX: INFILE '/home/mydir/president.dat'; /* for SAS Studio */

Windows: INFILE 'c:\MyDir\President.dat'; /* for Base SAS */

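The SAS log gives very valuable information: after each DATA step, check it for the number of records read from the file and for any notes, warnings, or errors.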

Data Separated with Spaces

It is very easy, but easy comes with some limitations:
Values must be separated by at least one space.
Missing values must be marked with a period.
Character values are at most 8 characters long and cannot contain embedded spaces.
All values must be read in order; you cannot skip over any data.
Avoid dates if possible; dates need special care.
Despite these limitations, it is a very popular method for reading raw data.

INPUT Name $ Age Height;

This style is also called list input, because the variable names are simply listed in the INPUT statement in the order in which the values appear in the data file. A $ after a variable name marks it as a character variable; all other variables are numeric.

Example 2.1.1 Data Separated with Spaces


Your hometown has been overrun with toads this year. A local resident, having heard of frog
jumping in California, had the idea of organizing a toad jump to cap off the annual town fair.
For each contestant you have the toad's name, weight, and the jump distance from
three separate attempts. If the toad is disqualified for any jump, then a period is used to
indicate missing data. Here is what the data file ToadJump.dat looks like:

Lucky 2.3 1.9 . 3.0
Spot 4.6 2.5 3.1 .5
Tubs 7.1 . . 3.8
Hop 4.5 3.2 1.9 2.6
Noisy 3.8 1.3 1.8 1.5
Winner 5.7 . . .
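A minimal sketch of a DATA step for Example 2.1.1, using list input. The data set name TOADS, the variable names, and the path are our own assumptions; point the INFILE statement at wherever ToadJump.dat is saved on your system.

```sas
/* List input: values separated by spaces, read in order.   */
DATA toads;
   INFILE '/home/mydir/ToadJump.dat';   /* hypothetical path */
   /* $ makes ToadName character; the others are numeric.    */
   /* A period in the file is read as a missing value.       */
   INPUT ToadName $ Weight Jump1 Jump2 Jump3;
RUN;

PROC PRINT DATA = toads;
   TITLE 'SAS Data Set TOADS';
RUN;
```

All toad names in the file are 8 characters or fewer, so the default character length of list input is enough here.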

Data Arranged in Columns

Also called column input.
There are no delimiters. Instead, each variable's values always occupy the same columns in every record.
Values must be character or standard numeric. Numeric values cannot contain thousands separators or special date formats.
Advantages over list input:
No need for spaces between values
Missing values can be left blank
Character data can have embedded spaces; embedded spaces are a strong sign that column input is the right choice.
You can skip unwanted variables.

INPUT Name $ 1-10 Age 11-13 Height 14-18;

Example 2.1.2 Data Arranged in Columns


The local minor league baseball team, the Walla Walla Sweets, is keeping records about
concession sales. A ballpark favorite is the sweet onion rings, which are sold at the
concession stands and also by vendors in the bleachers. The ballpark owners have a
feeling that in games with lots of hits and runs more onion rings are sold in the bleachers
than at the concession stands. They think they should send more vendors out into the
bleachers when the game heats up, but need more evidence to back up their feelings.
For each home game they have the following information: name of opposing team,
number of onion ring sales at the concession stands and in the bleachers, the
number of hits for each team, and the final score for each team. The following is a
sample of the data file named OnionRing.dat.
For your reference, a column ruler showing the column numbers has been placed above
the data:
/*----+----1----+----2----+----3----+----4
Columbia Peaches 35 67 1 10 2 1
Plains Peanuts 210 2 5 0 2
Gilroy Garlics 151035 12 11 7 6
Sacramento Tomatoes 124 85 15 4 9 1
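A possible column-input DATA step for Example 2.1.2. The column ranges are assumptions: they presume the team name fills columns 1-20 and the six numbers occupy the fixed ranges shown, ending at column 40 of the ruler. The spacing of the sample above may not be preserved exactly, so check the ranges against your own copy of OnionRing.dat. Under these assumed ranges, the blank bleacher sales for Plains Peanuts would be read as missing, and 151035 would split into 15 concession sales and 1035 bleacher sales.

```sas
/* Column input: each value is read from fixed columns.      */
/* The ranges below are assumed; verify them with the ruler. */
DATA sales;
   INFILE '/home/mydir/OnionRing.dat';   /* hypothetical path */
   INPUT VisitingTeam $ 1-20 ConcessionSales 21-24 BleacherSales 25-28
         OurHits 29-31 TheirHits 32-34 OurRuns 35-37 TheirRuns 38-40;
RUN;

PROC PRINT DATA = sales;
   TITLE 'SAS Data Set SALES';
RUN;
```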

Reading Data into SAS (non-standard formats)

Informats are useful anytime you have non-standard data.


There are three general types of informats: character, numeric, and date.

Character: $informatw.
Numeric: informatw.d
Date: informatw.

The $ indicates character informats,
w is the total width, and
d is the number of decimal places (numeric informats only).

INPUT Name $10. Age 3. Height 5.1 BirthDate MMDDYY10.;

The period at the end of each informat is required (it is how SAS tells an informat apart from a variable name) and is often overlooked.

A list of useful informats

Example 2.1.3

This example illustrates the use of informats for reading data. The following data file,
Pumpkin.dat, represents the results from a local pumpkin-carving contest. Each line includes the
contestant's name, age, type (carved or decorated), the date the pumpkin was entered, and the
scores from each of five judges.
Alicia Grossman 13 c 10-28-2008 7.8 6.5 7.2 8.0 7.9
Matthew Lee 9 D 10-30-2008 6.5 5.9 6.8 6.0 8.1
Elizabeth Garcia 10 C 10-29-2008 8.9 7.9 8.5 9.0 8.8
Lori Newcombe 6 D 10-30-2008 6.7 5.6 4.9 5.2 6.1
Jose Martinez 7 d 10-31-2008 8.9 9.510.0 9.7 9.0
Brian Williams 11 C 10-29-2008 7.8 8.4 8.5 7.9 8.0
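A sketch of formatted input for Example 2.1.3. It assumes Pumpkin.dat is laid out in fixed-width fields (name padded to 16 columns, age in 3 columns, then the type, the date, and five 4-column scores); the spacing in the listing above may not reflect that exactly. The path and variable names are hypothetical.

```sas
/* Formatted input: informats give each field's width.       */
DATA contest;
   INFILE '/home/mydir/Pumpkin.dat';   /* hypothetical path */
   /* $16. = 16-character name; 3. = 3-column number;        */
   /* +1 moves the pointer one column to the right;          */
   /* MMDDYY10. reads dates such as 10-28-2008;              */
   /* (4.1) applies the same informat to all five scores.    */
   INPUT Name $16. Age 3. +1 Type $1. +1 Date MMDDYY10.
         (Score1 Score2 Score3 Score4 Score5) (4.1);
RUN;

PROC PRINT DATA = contest;
   FORMAT Date MMDDYY10.;   /* display the date, not the raw number */
   TITLE 'Pumpkin Carving Contest';
RUN;
```

Date is stored as a SAS date value (the number of days since January 1, 1960); the FORMAT statement in PROC PRINT only changes how it is displayed.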

Column Pointer @

When you mix input styles, there is one possible complication:
When SAS reads a line of raw data, it uses a pointer to mark its place, but each style of input uses the pointer differently.
With list input, SAS automatically scans to the next non-blank field and starts reading.
With column input, SAS starts reading in the exact column you specify.
With formatted input, SAS simply starts reading wherever the pointer currently is.
@n moves the pointer to the nth column.
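To see these pointer rules in action, here is a sketch that reads the same Pumpkin.dat records while mixing all three styles: column input for the name, list input for age and type, and formatted input for the date and scores. The columns used (name in 1-16, date starting in column 23) are assumptions about the file's fixed layout. The @23 places the pointer at the date's first column explicitly, so the formatted input does not depend on where list input happened to leave the pointer.

```sas
DATA contest2;
   INFILE '/home/mydir/Pumpkin.dat';   /* hypothetical path */
   INPUT Name $ 1-16             /* column input: exact columns     */
         Age Type $              /* list input: scans to next value */
         @23 Date MMDDYY10.      /* @23 moves the pointer to column */
                                 /* 23; formatted input reads there */
         (Score1-Score5) (4.1);  /* formatted input continues right */
                                 /* after the date                  */
RUN;
```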

Example 2.1.4 Challenge

Try to read the NatPark.dat file into SAS by writing your own code. (OPTIONAL, but a good exercise)
Try to use all the methods we have learned so far: list input, column input, informats, and the column pointer, all in the same INPUT statement.
There is no single correct solution to the problem. Use your imagination. The output should
look something like this:

The solution will be provided in the Husky CT forums.

INFILE Statement Options

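FIRSTOBS = n: start reading at line n of the file (for example, to skip a header row)
OBS = n: stop reading after line n
MISSOVER: if a line runs out of values, set the remaining variables to missing instead of moving to the next line
TRUNCOVER: like MISSOVER, but also keeps a partial value when a line ends before a column range or informat width is filled
DLM = 'x': use x as the delimiter instead of a space (for example, DLM = ',')
DSD: treat two consecutive delimiters as a missing value, remove quotation marks around values, and use a comma as the default delimiter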
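A sketch combining several of these options in one INFILE statement, for a hypothetical comma-separated file mydata.csv whose first line holds column headers. The file name, path, and variables are illustrative assumptions, not one of the course files; MISSOVER could be used in place of TRUNCOVER for list input like this.

```sas
DATA example_csv;
   INFILE '/home/mydir/mydata.csv'   /* hypothetical file          */
          FIRSTOBS = 2    /* skip line 1 (the header row)          */
          OBS = 101       /* stop after line 101 (100 records)     */
          DLM = ','       /* values are separated by commas        */
          DSD             /* ,, means missing; quotes are removed  */
          TRUNCOVER;      /* short lines: leftover vars = missing  */
   INPUT Name $ Age Height;
RUN;
```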

Examples 2.1.5 through 2.1.8

Reading Data into SAS (cntd.)

CARDS or DATALINES statements


Format:
DATA <dataset_name>;
INPUT <names_of_variables>;
DATALINES; (or CARDS;)
<list of data observations>
;



Example:
Data animals;
input Zooname $ Tigers Lions Monkeys;
Cards;
San_Diego 7 4 23
New_York 11 4 37
Orlando 2 8 41
;

************ OR **********
Data animals;
input Zooname $ Tigers Lions Monkeys;
Datalines;
San_Diego 7 4 23
New_York 11 4 37
Orlando 2 8 41
;

Section 2

WORKING WITH SAS DATA SETS

Temporary vs. Permanent

Temporary data set


Available only for the current session
Immediately erased when the session is finished
Permanent data set
Remains when the job or session is finished
If you use a data set more than once, it is more efficient to save it as a
permanent SAS data set than to create a new temporary SAS data set every
time you want to use the data.

SAS Data Set Names

Two-level names: WORK.MYSALES
WORK is the libref (library reference); MYSALES is the member name.
Both parts follow standard SAS naming conventions.

Is my data set permanent or temporary?

There is no explicit setting that makes a data set temporary or permanent.

It depends only on the library where you store the data set: if it is in the WORK library, it is temporary; otherwise, it is permanent.

This also means that if you don't specify a libref with your data set name, the data set is temporary, because WORK is the default library.

Example:
DATA statement        Libref   Member name   Type
DATA ironman;         WORK     ironman       temporary
DATA WORK.ironman;    WORK     ironman       temporary
DATA Bikes.ironman;   Bikes    ironman       permanent

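Temporary (stored in the WORK library):
DATA distance;
Miles = 26.22;
Kilometers = 1.61 * Miles;
RUN;
PROC PRINT DATA = distance;
RUN;

Permanent (stored in the Bikes library):
/* The libref Bikes must already be defined with a LIBNAME statement */
DATA Bikes.distance;
Miles = 26.22;
Kilometers = 1.61 * Miles;
RUN;
PROC PRINT DATA = Bikes.distance;
RUN;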

LIBNAME Statement

A libref is a nickname that corresponds to the location of a SAS data library.

Use the LIBNAME statement to create a libref. A libref can be at most 8 characters long.

Format: LIBNAME libref '<path_to_your_data_library>';
Example: LIBNAME mySASlib 'c:\SAS\myrawdata';

You can also define a libref using the New Library window.

Example 2.2.1

This program sets up a libref named PLANTS pointing to the BaseData directory.
Then it reads the raw data from a file called Mag.dat, creating a permanent SAS data
set named MAGNOLIA which is stored in the PLANTS library.

M. grandiflora Southern Magnolia 80 15 E white
M. campbellii 80 20 D rose
M. liliiflora Lily Magnolia 12 4 D purple
M. soulangiana Saucer Magnolia 25 3 D pink
M. stellata Star Magnolia 10 3 D white
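A sketch of the program described in Example 2.2.1. The library path is the one used in Example 2.2.2; storing Mag.dat in that same folder is an assumption, as are the variable names and the column ranges. Column input is used for the two names because the common name can contain spaces and is blank for M. campbellii; list input handles the remaining fields.

```sas
/* Libref PLANTS points to the BaseData folder.               */
LIBNAME plants '/folders/myfolders/basedata/';

/* The two-level name plants.magnolia makes it permanent.     */
DATA plants.magnolia;
   INFILE '/folders/myfolders/basedata/Mag.dat';  /* assumed location */
   /* Assumed columns: scientific name 1-14, common name 16-32 */
   INPUT ScientificName $ 1-14 CommonName $ 16-32 MaximumHeight
         AgeBloom Type $ Color $;
RUN;
```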

Example 2.2.2

LIBNAME example "/folders/myfolders/basedata/";


PROC PRINT DATA = example.magnolia;
TITLE 'Magnolias';
RUN;
Note that the librefs in this example (EXAMPLE) and the previous example (PLANTS) are different; however, both refer to the same location, so this code works.

Example 2.2.3 and Example 2.2.4

You can also create and read permanent SAS data sets without a libref by referencing them directly, giving the data set's full path in quotes.
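A sketch of direct referencing, reusing the Magnolia data: the quoted path takes the place of a libref.member name, so no LIBNAME statement is needed. The folder, the location of Mag.dat, and the column ranges are assumptions carried over from the magnolia example.

```sas
/* The quoted path names the data set file directly;          */
/* SAS stores it as magnolia.sas7bdat in that folder.         */
DATA '/folders/myfolders/basedata/magnolia';
   INFILE '/folders/myfolders/basedata/Mag.dat';  /* assumed location */
   INPUT ScientificName $ 1-14 CommonName $ 16-32 MaximumHeight
         AgeBloom Type $ Color $;
RUN;

/* Read it back the same way. */
PROC PRINT DATA = '/folders/myfolders/basedata/magnolia';
   TITLE 'Magnolias';
RUN;
```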

Listing the contents of a data set with PROC CONTENTS


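Format:
PROC CONTENTS DATA = <mydata>;
RUN;
PROC CONTENTS is a simple procedure that shows the contents of a data set. It outputs the data set's metadata, such as the number of observations and the name, type, and length of each variable, rather than the data values themselves.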
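A short sketch that lists the metadata of the MAGNOLIA data set, assuming it was saved in the folder used in Example 2.2.2. The VARNUM option lists the variables in the order they were created instead of alphabetically.

```sas
LIBNAME example '/folders/myfolders/basedata/';

/* Show metadata; VARNUM keeps variables in creation order. */
PROC CONTENTS DATA = example.magnolia VARNUM;
RUN;
```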

PROC Contents Output

Further Reading

Optional: Read The Little SAS Book, Sections 2.12 through 2.18, for more advanced data-parsing methods.
