Discover millions of ebooks, audiobooks, and so much more with a free trial

Only $11.99/month after trial. Cancel anytime.

Biostatistics Using JMP: A Practical Guide
Biostatistics Using JMP: A Practical Guide
Biostatistics Using JMP: A Practical Guide
Ebook630 pages4 hours

Biostatistics Using JMP: A Practical Guide

Rating: 0 out of 5 stars

()

Read preview

About this ebook

Analyze your biostatistics data with JMP!

Trevor Bihl's Biostatistics Using JMP: A Practical Guide provides a practical introduction on using JMP, the interactive statistical discovery software, to solve biostatistical problems. Providing extensive breadth, from summary statistics to neural networks, this essential volume offers a comprehensive, step-by-step guide to using JMP to handle your data.

The first biostatistical book to focus on software, Biostatistics Using JMP discusses such topics as data visualization, data wrangling, data cleaning, histograms, box plots, Pareto plots, scatter plots, hypothesis tests, confidence intervals, analysis of variance, regression, curve fitting, clustering, classification, discriminant analysis, neural networks, decision trees, logistic regression, survival analysis, control charts, and metaanalysis.

Written for university students, professors, those who perform biological/biomedical experiments, laboratory managers, and research scientists, Biostatistics Using JMP provides a practical approach to using JMP to solve your biostatistical problems.

LanguageEnglish
PublisherSAS Institute
Release dateOct 3, 2017
ISBN9781635262414
Biostatistics Using JMP: A Practical Guide
Author

Trevor Bihl

Trevor Bihl is both a research scientist/engineer and an educator who teaches biostatistics, engineering statistics, and programming courses. He has been a SAS/JMP user since 2009 and provides various biostatistics and data mining consulting services. His background includes multivariate statistics, signal processing, data mining, and analytics. His educational background includes a BS and MS from Ohio University and a PhD from the Air Force Institute of Technology. He is the author of multiple journal and conference papers, book chapters, and technical reports.

Related to Biostatistics Using JMP

Related ebooks

Enterprise Applications For You

View More

Related articles

Reviews for Biostatistics Using JMP

Rating: 0 out of 5 stars
0 ratings

0 ratings0 reviews

What did you think?

Tap to rate

Review must be at least 10 words

    Book preview

    Biostatistics Using JMP - Trevor Bihl

    Chapter 1: Introduction

    1.1 Background and Overview

    1.2 Getting Started with JMP

    1.3 General Outline

    1.4 How to Use This Book

    1.5 Reference

    1.1 Background and Overview

    This book evolved from personal experiences in both teaching and consulting in biostatistics. Although many biostatistics textbooks show computer outputs and results, they rarely show how to generate the results. Biostatistics instruction is also commonly theoretical and based on solving simple problems by hand. However, real-world data is usually more complicated than the simple examples, and I found that frequent collaborators–PhD-educated researchers who performed and managed experiments–needed more understanding of software to analyze their data. This is difficult because such researchers often spend a majority of their time performing experiments and use statistical software sparingly. They also often do not have access to a dedicated biostatistician in their office or are competing for the time of their office’s single biostatistician. Although such researchers might know the mechanics of a statistical method, they might not how to generate meaningful results using software.

    Therefore, a practical, how-to guide to biostatistics was needed. There are many software applications available for statistics and biostatistics, so why JMP? As an educator, I found JMP an advantage to teaching. I could spend more time on theory and interpretation because JMP does not require scripts and syntax. As a collaborator and consultant, I found my colleagues would readily gravitate toward JMP and its results because of the graphical user interface (GUI) format and its ease of use. And finally, unless you want to code algorithms themselves, as a researcher, you will find JMP to be more user-friendly, correct, and developed when compared to many other competing packages. Incidentally, if you want to code, SAS programming abilities do exist in JMP. Thus, you can fully use JMP for analysis ranging from simple to complex and customized.

    This book presents and solves problems germane to biostatistics with easy-to-reproduce examples. The book is also a general biostatistics reference that leverages the topics found in leading biostatistics books. This chapter introduces JMP, presents a general outline of the book contents, and provides a brief guide to using this book.

    1.2 Getting Started with JMP

    When you first run JMP, you will be greeted with a Tip of the Day (Figure 1.1). There are 62 tips of the day, and they show up whenever you start JMP. These tips can be useful to new JMP users in gaining familiarity with the software. However, if you don’t want to see these tips further, you can do the following:

    1.   Clear Show tips at start-up.

    2.   Click Close.

    Figure 1.1 Initial Tip of the Day

    After you close the Tip of the Day, you are greeted with the primary JMP interface seen in Figure 1.2. Here, you can load data, create a new data table, or look for recently used files. If this is the first time you have opened JMP, there will be no recent files to consider. Thus, you must load or create a new data table.

    To load a file:

    ●   Click File ► Open.

    or

    ●   Click on the third icon on the taskbar.

    To create a blank data table:

    ●   Click File ► New.

    or

    ●   Click on the first icon on the taskbar.

    Alternatively, if you want to load a built in JMP example data file, you can do so. A variety of files are available.

    To load example data files:

    1.   Click Help ► Sample Data.

    2.   Select a data file under the method of interest.

    Also, you can select individual or multiple data tables in the Window List and then close all of these files. This is advantageous if you inadvertently opened many files, such as in a mistakenly setup Internet open.

    To close many open data tables:

    1.   Select the windows of interest.

    2.   Right-click and select Close.

    3.   You will then be prompted to save these files.

    Figure 1.2 JMP Primary Interface

    If you create a new data table, you will be presented with Figure 1.3. Here, you see that there is a spreadsheet-like table, with a Column 1 ready for you to start considering. Also, when you have loaded and analyzed data, you can save these results to the JMP data table and instantly reload at a later date, as will be discussed in Section 14.2.

    Figure 1.3 New Data Table

    1.3 General Outline

    With this basic usability knowledge from Section 1.2, you are now ready to consider bio-statistical data analysis. Biostatistics covers a wide variety of topics ranging from simple hypothesis tests to complex nonlinear algorithms. This book aims to cover the range of methods with varying levels of detail. To do so, this book is organized sequentially as outlined in Table 1.1.

    Table 1.1 General Outline of Biostatistics Using JMP: A Practical Guide

    1.4 How to Use This Book

    In Chapters 2 and 3, this book moves to data-wrangling issues, such as data collection and cleaning. Since upward of 80% of your time can be spent in making messy data usable (Lohr, 2014), learning the tools JMP has for assembling and cleaning data is key and is covered in Chapters 2 and 3.

    In Chapters 4 and 5, you can learn the basics of descriptive statistics and data visualizations in JMP. Following this, the primary focus is on modeling, which involves creating a mathematical representation (a model) of data or a system in order to make inferences about it.

    After this, you have a few different paths available:

    Chapter 6 discusses epidemiological and geographical interpretations.

    Chapter 4 also discusses developing custom equations.

    Chapter 7 discusses the various hypothesis test and confidence interval methods.

    Chapters 8 to 10 discuss linear models such as analysis of variance (ANOVA), regression, and model validation.

    Because of the interrelation of the underlying methods of regression (Chapter 9) and ANOVA (Chapter 8), diagnostic and remedial measures for these methods are discussed in Chapter 10.

    Chapter 11 discusses classification methods, such as logistic regression, and clustering methods, such as k-means.

    Chapters 7 to 10 largely deal with a continuous dependent (e.g., Y, variable (prediction)). Methods to analyze a discrete dependent variable are presented in Chapter 11.

    Chapter 12 presents advanced modeling methods (e.g., factor analysis, neural networks, and control charts).

    Chapter 13 introduces the vast array of survival analysis methods in JMP.

    Chapter 14 presents methods that facilitate collaborating in addition to sources of additional functionality.

    If you are using a previously created data set, then it is advantageous to start with data-wrangling methods and then look at the various analytical tools this book discusses. However, if you are starting a new experiment and will be collecting data, then you should start looking at Section 8.4.1, which discusses experimental design considerations and how to develop and select factor levels for an experiment.

    1.5 Reference

    Lohr, S. (2014, Aug. 18). For big-data scientists, ‘janitor work’ is key hurdle to insights. New York Times, p. B4.

    Chapter 2: Data Wrangling: Data Collection

    2.1 Introduction

    2.2 Collecting Data from Files

    2.2.1 JMP Native Files

    2.2.2 SAS Format Files

    2.2.3 Excel Spreadsheets

    2.2.4 Text and CSV Format

    2.3 Extracting Data from Internet Locations

    2.3.1 Opening as Data

    2.3.2 Opening as a Webpage

    2.4 Data Modeling Types

    2.4.1 Incorporating Expression and Contextual Data

    2.5 References

    2.1 Introduction

    Many examples in textbooks and manuals such as this book are very orderly and clean. However, real-world data is rarely orderly and clean (authors actually spend a lot of time searching for and fabricating good example data) and up to 80% of your time can be lost in wrangling to make messy data usable (Lohr, 2016). Medical and biological data is also becoming increasingly big in nature (Bihl, Young II, and Weckman, 2016), and thus wrangling the data can be of interest (Marx, 2013). To analyze messy real-world data, data wrangling comes into play.

    Data wrangling, conceptualized in Figure 2.1, involves taking raw data, extracting and cleaning it, and developing data features for analysis. The boundaries are not always clear when data wrangling ends and when statistical analysis begins. Thus, some overlap exists between data wrangling and statistical analysis when you begin to select/extract/analyze data features (e.g., the columns of a JMP data table).

    Figure 2.1 Data-Wrangling Overview, adapted from (Boehmke, 2016)

    Data wrangling is not frequently discussed in conjunction with analysis because books, including this one, focus on developing knowledge in using the presented methods. However, it is important to mention data-wrangling issues since the real world is not full of clean, textbook-style data. Thus, this book considers two data-wrangling tasks with Chapter 2 focusing on data collection in JMP and Chapter 3 focusing on data cleaning using JMP tools. Discussion of data analysis methods then makes up the majority Chapters 4 to 13 of this book. This chapter will present data collection methods built into JMP. JMP can load both JMP and SAS format files, in addition to loading Excel and CSV files. Moreover, data can be imported from text files and Internet webpages using the sophisticated data importation tools in JMP.

    2.2 Collecting Data from Files

    JMP supports opening a wide variety of data file formats, as detailed in Table 2.1 with annotation on, provided that JMP can load or save in that format. By supporting a wide variety of data types, JMP facilitates work on older data sets, with users who do not use JMP, and with a wide audience. To aid in usability, this book will consider examples from some of these data types (*.txt, *.sas7bdat, *.csv).

    Table 2.1 Data Files Supported by JMP

    2.2.1 JMP Native Files

    JMP uses a propriety data format native to JMP software. As with any file format, this format describes how to handle the pieces of the file. However, not all software applications have the keys to unlock all data types. Thus, JMP knows how to assemble a data table from a *.jmp file, but other software applications likely do not. However, various advantages exist to the JMP native format. If you perform an analysis in JMP, you can save this analysis in the JMP file and reload it on a different computer exactly as you left it. Although you could save to a *.csv (or other formats, as shown in Table 2.1), the ability to reload an analysis would be lost by saving outside the *.jmp format.

    2.2.1.1 JMP Sample Data

    A wide variety of sample data sets are available in JMP to illustrate the application of specific methods and to provide data for analysis and demonstration. In JMP 13, 543 sample data files are included and available for use. Their applications range from simple to complex, and this enables you to become familiar with both JMP data and methods through exploring them.

    To access the folder containing all sample data available in your copy of JMP:

    1.   Select Help ► Sample Data Library.

    2.   In the Explorer window that opens to the data directory, double-click on the file of interest to load it.

    To access the sample data listed by method or domain of interest:

    1.   Select Help ► Sample Data.

    2.   Select the method or domain of interest (e.g., click Analysis of Variance).

    Note that you can further see the type of method these files best correspond with.

    3.   Select an example file of interest (e.g., Blood Pressure).

    4.   A data table for this file is then opened.

    2.2.2 SAS Format Files

    SAS files are natively supported by JMP. However, a wide variety of SAS file extensions are in use (e.g., *.sas, *.ss2, *.sc2, etc.). Thus, some care and control might be of interest when loading a file from SAS into JMP. For an example, you can load a SAS data file to JMP:

    1.   Select File ► Open.

    2.   Find the SAS file of interest and click once on it.

    3.   After you select a SAS file, JMP enables you to specify how to open the file. (See Figure 2.2.)

    4.   Click Open when you are satisfied with your selection.

    5.   You will then be greeted with a JMP data table.

    As hinted at in Figure 2.2, JMP knows what to do with a SAS file and thus JMP is asking you whether you want your JMP data table to have column names from the SAS variables names or the SAS variable labels. After making this selection and opening the file, you will be greeted with a JMP data table.

    Figure 2.2 Opening a SAS File

    2.2.3 Excel Spreadsheets

    Microsoft Excel is a spreadsheet software with worldwide use, and thus JMP readily handles the importation of data from Excel spreadsheets. Since Excel can contain data in multiple sheets of the same file, some care and user interaction might be necessary to import data from Excel. An example will be considered using a generic data file ExcelExample.xlsx, which contains nothing more than two columns of user-generated numbers.

    For an example, you can load an Excel data file to JMP:

    1.   Select File ► Open.

    2.   Find the Excel file of interest and click once on it.

    3.   After you select an Excel file, JMP enables you to specify how to open the file. (See Figure 2.3.)

    a.   The key consideration at this point is whether Row 1 values are column/variable names or are numbers.

    4.   Click Open when you are satisfied with your selection.

    5.   You will then be greeted with the Excel Import Wizard. (See Figure 2.4.)

    a.   At the top left, you can view the Excel data.

    b.   At the top right, select the Excel sheet of interest. (This example has only one sheet with numbers.)

    c.   Specify needed considerations using the options in the middle (e.g., where the data is located).

    6.   Click Import when you are satisfied that the view in the top left is what you want the data table to look like.

    Figure 2.3 Opening an Excel File

    Figure 2.4 Excel Import Wizard Dialog Box

    2.2.4 Text and CSV Format

    JMP can load text files as well. However, some care must be taken in this process since text files can be highly unstructured. To facilitate this reality, JMP incorporates a variety of options when opening a text file. For this example, a small file, TextExample.txt, was created with two tab-separated columns of data. Loading a *.csv file will be similar.

    To load a text file:

    1.   Select File ► Open.

    2.   Find the text file of interest and click once on it.

    3.   After a text file is selected, JMP enables you to specify how to open the file. (See Figure 2.5.)

    A variety of options exist, as shown in Figure 2.5. Selecting Data, using Text Import preferences lets JMP use whatever preferences have been selected. Selecting Data, using best guess lets JMP examine the data and use what it believes to be the best approach to open the data. Selecting Plain text into Script window will not load the file as a data table, but as a script text file. Here you can select Data with Preview, which will allow you more control of the text loading and show additional functions that can assist in loading text files.

    Figure 2.5 Opening a Text File

    After you select Data with Preview and click OK, you are greeted with the Import dialog box, as shown in Figure 2.6. Here, you see the text file that you have loaded. For this text file, you see a blue line separating the Day and Meas columns, indicating which parts of the text file JMP will assign to data table columns. The text file under consideration has columns separated by tabs. JMP was able to detect this and its best guess is that the file is tab delimited. However, you could select a wide variety of delimited or fixed-width fields. In addition, you could have selected a subset of this file or a compatibility standard, as shown in the expanded sections at the bottom of Figure 2.6.

    To continue loading:

    1.   Ensure that the lines for data columns at the top of the Import dialog box are as desired.

    2.   Click Next.

    3.   Change the names of the columns, if needed.

    4.   Click Import.

    We are now greeted with the resultant data table as shown in Figure 2.7.

    Figure 2.6 The Text File Import Dialog Box with All Fields Displayed

    Figure 2.7 Data Table from Text Imported File

    2.3 Extracting Data from Internet Locations

    Data is not always conveniently in a spreadsheet or file. Frequently, data can be found on webpages, and thus you need to bring the data to the software. Various options exist to accomplish this task, including copying and pasting the data into a spreadsheet. However, this can be inefficient. To improve the process, you can directly load a webpage and extract its data using JMP. To collect data from the Internet in JMP, begin by selecting File ► Internet Open.

    The dialog box shown in Figure 2.8 will appear. Type or paste the URL into the field. For this example, consider the following URL:

    https://en.wikipedia.org/wiki/Reported_Road_Casualties_Great_Britain

    This is a Wikipedia article that has data from road casualties in Great Britain from 1926 to 2015.

    Figure 2.8 Internet Open

    After you type the URL into the field, you must specify how this will be opened via the Open As drop-down menu. JMP supports three options, listed in Table 2.2. For this example, examine opening as data or text.

    Table 2.2 Internet Open Data Options

    2.3.1 Opening as Data

    If you select the default option, JMP will open the webpage as data. An dialog box like that in Figure 2.9 will appear. This dialog box shows the various items JMP sees as tables:

    1.   Select the desired table or tables.

    2.   Click OK.

    This example shows only the first available table selected. The resultant data tables will appear, as shown in Figure 2.10.

    Figure 2.9 Importing a Table When Opening as Data

    Figure 2.10 Imported Data Table

    2.3.2 Opening as a Webpage

    After JMP opens the URL as a webpage, you can use the page, Figure 2.11, as you could any browser. If you investigate this webpage in the JMP browser, you will find that there are two tables of possible interest on this webpage.

    Figure 2.11 Wikipedia Page for Reported Road Casualties in the JMP Browser

    After you become familiar with the webpage, perform the following steps to create a data table:

    1.   Select File ► Import Table as Data Table.

    2.   An dialog box like that in Figure 12.2 will appear.

    3.   Select the desired table or tables.

    4.   Click OK.

    If you select both options, JMP will instantiate two data tables: the table seen in Figure 2.10, and the table seen in Figure 2.12, which has additional information. However, you should not begin to investigate the data in Figure 2.12 yet. First, you should review the URL; if you examine the webpage as shown in Figure 2.11, you will find that the data in Figure 2.12 is for the year 2008 only. Second, the Ref. and Note columns of both Figure 2.10 and Figure 2.12 could be either very troublesome or very useful, depending on what you plan to do next.

    Figure 2.12 Types of Casualties

    2.4 Data Modeling Types

    JMP data tables can contain columns with continuous values, categorical (nominal or ordinal) values, and additional types (such as pictures). How these columns are modeled within JMP relates to the data type and modeling type. A variety of available data modeling types and their associated symbols are presented in Figure 2.13 with Figure 2.13a showing modeling types for numeric data and Figure 2.13b showing additional modeling types. Basic data types include numeric, character (for text), row state (as for a column that shows the states of individual rows), and expression (such as pictures).

    Figure 2.13 Data Modeling Types in JMP13

    The numeric data type should be used for continuous random variables, ordinal and nominal should be used for discrete values (both numeric and text), and none should be used for columns that fit none of these categories. Characters are frequently seen in categorical groups (e.g., a nominal column with gender) and with unstructured text (e.g., raw observational data from a lab technician). Expressions can be useful for contextual data (e.g., you want to include a picture from a stained slide to link the graphical data to the numeric results that you will analyze). Although the focus of this book will be on numeric data and nominal character data, an example will be seen in Section 2.4.1

    Enjoying the preview?
    Page 1 of 1