
Project 2: Formatting and processing data into usable formats.

Many jobs require that we have our dataset in a table or matrix format for subsequent
analysis, and so most of us will, at some point, use a spreadsheet (Excel or LibreOffice
Calc, for example) to view and process our data.
This exercise requires you to take the example datasets we have provided and,
following the rules listed below, prepare these as suitable datasets for use in further
analysis. The examples we have given are typical, containing many of the problems that
commonly need to be fixed.

You need to ensure your finished data:


1. Is all in one table, in one sheet
2. Has at least one field in each row that can be used to uniquely identify that row
3. Has column names that do not contain spaces or special characters
4. Has consistent terms to describe the data throughout each column
5. Doesn't mix text and numerical values for the same data, and has no missing values
6. Does not use formatting (colours etc.) to indicate information
7. Is exported/saved in a plain text format, either csv or tsv (choose sensibly and
watch out for tabs or commas in data fields!)
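
Several of the rules above can be checked automatically once the data is in a single table. The following is a minimal Python sketch, not part of the exercise itself: the function names (check_table, choose_delimiter, to_text) and the example column names are hypothetical, and it assumes the table has already been read into a header list plus a rectangular list of string rows. It covers rules 2, 3, 5 and 7 only; consistent terminology (rule 4) and formatting used as information (rule 6) still need to be checked by eye.

```python
import csv
import io
import re

# Hypothetical rule for "no spaces or special characters": a letter first,
# then letters, digits or underscores only.
NAME_OK = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_table(header, rows, id_column):
    """Return a list of human-readable problems found in the table."""
    problems = []

    # Rule 3: column names without spaces or special characters.
    for name in header:
        if not NAME_OK.match(name):
            problems.append(f"bad column name: {name!r}")

    # Rule 2: the chosen identifier column must be unique across rows.
    idx = header.index(id_column)
    ids = [row[idx] for row in rows]
    if len(set(ids)) != len(ids):
        problems.append(f"duplicate values in id column {id_column!r}")

    for col, name in enumerate(header):
        values = [row[col].strip() for row in rows]
        # Rule 5: no missing values ...
        if any(v == "" for v in values):
            problems.append(f"missing values in column {name!r}")
        # ... and no mixing of text and numbers within one column.
        filled = [v for v in values if v != ""]
        numeric = [is_number(v) for v in filled]
        if any(numeric) and not all(numeric):
            problems.append(f"mixed text and numbers in column {name!r}")
    return problems


def choose_delimiter(rows):
    """Rule 7: prefer a delimiter that does not occur inside any data field."""
    fields = [field for row in rows for field in row]
    if not any("," in f for f in fields):
        return ","
    if not any("\t" in f for f in fields):
        return "\t"
    return ","  # both occur; csv quoting will protect embedded commas


def to_text(header, rows):
    """Serialise the table as csv or tsv text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=choose_delimiter(rows), lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()
```

For example, with header = ["sample_id", "tissue", "expression"] and rows = [["S1", "liver", "2.5"], ["S2", "kidney", "3.1"]], check_table(header, rows, "sample_id") returns an empty list, while a header containing "sample id" or a row with an empty cell would each add an entry to the returned list.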

You have three datasets to prepare, each involving slightly different challenges.
Please attempt all three.

1. Dataset 1: Array experiment sample data
2. Dataset 2: Tissue expression data
3. Dataset 3: Supplementary materials
