
Seven dirty secrets of data visualisation | Feature | .net magazine

2/26/13 5:51 PM

Seven dirty secrets of data visualisation


Data visualisation - and in particular, web-based data visualisation - is having its moment. JavaScript libraries like D3.js, Raphaël, and Paper.js, building on modern browser support for Canvas and SVG, have made it easier than ever to produce complex visualisations that, until recently, were the province of computer scientists and a handful of specialist designers. Visualisation is the new 'must-have' element in project proposals and personal portfolios, and startups like Platfora, Datameer, and our own employers ClearStory Data and Chartio are raising millions for analytics platforms with browser-based visualisation interfaces. To some extent, the buzz is justified. Data visualisation is a wonderful way of exploring data, finding new insights, and telling a compelling story. But what are the real challenges visualisation developers face - and what don't they want you to know about their work? We'll lead you through some of the dirty secrets of the information visualisation (infovis) profession, offering an inside look at the process of visualisation development, along with practical tools and approaches for dealing with its inevitable challenges and frustrations.

Secret #1: Real data is ugly


Most data visualisation tutorials start with a pleasant fantasy: a pristine data set. Whether you're learning to build a basic bar chart or a force-directed network graph, you're presented with clean, normalised, well-formatted base data. This perfect JSON or CSV file is the digital analogue of the neatly prepped mise en place in a televised cooking show: the refined result of tedious, painstaking work presented as raw ingredients. In practice, when dealing with most real-world data sets, expect to spend up to 80 per cent of your time finding, acquiring, loading, cleaning and transforming your data. Some of this process can be done with automated tools, but almost any data cleaning involving two or more data sets will require some level of manual work. A wide variety of tools can convert XLS to XML or timestamps to other date formats, but nothing can automagically map one company's internal sales categories to those of its competitors, or deal reliably with data entry typos, incompatible character encodings, or (shudder) poor OCR.

Tools and strategies


- Budget significant time in any visualisation project for data cleanup. Increase your estimate (in some cases exponentially) for multiple data sources, manually entered or OCR data, divergent categorisation schemes, and non-standard formats
- Google Refine is a great data cleanup workhorse, though it has limitations, particularly for non-tabular data. Other cleanup-specific tools include Data Wrangler and Mr. Data Converter. However, many tasks still require basic proficiency in a scripting language like Python or manual work in Excel. Save your scripts - you'll use them again
- Eat your own dog food if you can: visualisation is a great tool for identifying data problems. Use scatter plots and histograms to find and fix suspicious outliers
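As a numeric companion to eyeballing a histogram, a short script can flag values that sit far from the bulk of the data. The sketch below is only an illustration: the function name, the sample figures and the choice of Tukey's 1.5 x IQR rule are assumptions, not something the tools above prescribe.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical monthly sales figures; 9800 looks like a data-entry slip.
sales = [120, 135, 128, 140, 9800, 131, 125, 138]
suspicious = flag_outliers(sales)
```

A flagged value is a prompt to go back to the source, not proof of an error: some outliers are real.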

Secret #2: A bar chart is usually better


Compared to bar charts, bubble charts support more data points in less space, doughnut charts clearly indicate part-whole relationships, and treemaps support hierarchical categories - but none match simple bars for fine-grained comparison

One of the first questions to ask when considering a potential visualisation design is 'Why is this better than a bar chart?' If you're visualising a single quantitative measure over a single categorical dimension, there is rarely a better option. Likewise, time-based data is usually best displayed on a line chart, and scatterplots are often best for exploring correlations between two linear measures. At the risk of sounding regressive, there are good reasons these charts have been in continuous use since the 18th century. Bar charts are one of the best tools available for facilitating visual comparisons, leveraging our innate ability to precisely compare side-by-side lengths.


The corollary to bar chart superiority, and perhaps the dirtiest secret in this article, is that the coolest-looking visualisations are often the least useful. The novelty and aesthetic appeal of custom visualisations come at a cost: the clarity of the data. Most bar chart alternatives ask the viewer to compare differences we have a harder time discerning: areas, angles, hues, or opacities. At best, such visualisations make comparison difficult; at worst, they distort the data entirely, leading viewers to false conclusions.

Tools and strategies


- Don't dismiss traditional visualisation choices if they represent the best option for your data. Start with bar and line charts, and look further only when the data requires it
- Have a good rationale for choosing other options. Compared to bar charts, bubble charts support more data points with a wider range of values; pies and doughnuts clearly indicate part-whole relationships; treemaps support hierarchical categories
- Bar charts have the added bonus of being one of the easiest visualisations to make - you can hand-code an effective bar chart in HTML using nothing but CSS and minimal JavaScript, or make one in Excel with a single function
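To show how little is needed, here is a minimal sketch that emits an HTML bar chart styled with inline CSS only. It is written in Python rather than JavaScript so the markup can be generated from a script; the function name, colours and pixel sizes are illustrative assumptions, and it assumes non-negative values with at least one value above zero.

```python
from html import escape

def bar_chart_html(data, max_width_px=300):
    """Render {label: value} as zero-based horizontal bars using inline CSS."""
    largest = max(data.values())
    rows = []
    for label, value in data.items():
        # Width is proportional to the value, so every bar starts at zero.
        width = round(value / largest * max_width_px)
        rows.append(
            f'<div style="margin:2px 0">'
            f'<span style="display:inline-block;width:8em">{escape(label)}</span>'
            f'<span style="display:inline-block;height:1em;'
            f'width:{width}px;background:#468"></span> {value}</div>'
        )
    return "\n".join(rows)

# Hypothetical regional totals.
chart = bar_chart_html({"North": 40, "South": 100, "East": 75})
```

Writing `chart` into the body of an HTML file is enough to view it in a browser.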

Secret #3: There's no substitute for real data


Cleaning and formatting a single data set is hard enough, but what if you're building a live visualisation that will run with many different datasets? Maybe you have to build a visualisation for use in multiple departments within one company, where every department has its own database, and you don't have time to manually clean each dataset. Your first instinct may be to grab some demo data and use that to build your visualisation; your visualisation library may even come with standard sample data. Unfortunately, there is no substitute for real data. Demo data tends to have a normal distribution and a manageable number of records; it's designed to show visualisations in their best light. A bar chart built on it doesn't just have the requisite bars, it looks like an ideal bar chart. Demo data doesn't help you plan for data discrepancies, null values, outliers, or other real-world problems. If you rely too much on it, you may find when you plug in real data that your visualisation isn't well suited to that data in the first place.

Tools and strategies


- Ideally, use several random samples of real data if you cannot access an entire dataset
- Invalid and missing data is a guarantee. If your data won't be cleaned before being graphed, do not clean your sample data
- Real data may be so large as to overwhelm your visualisation or the system generating it. If you work from a sample, be sure to scale the sample size up (or down) appropriately before creating a final visualisation
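Drawing several samples without cleaning them can be sketched in a few lines. The record layout, function name and seed below are hypothetical; the point is only that nulls and malformed values are left in place so the visualisation has to cope with them.

```python
import random

def draw_samples(records, sample_size, n_samples=3, seed=0):
    """Draw several independent random samples, leaving bad records untouched."""
    rng = random.Random(seed)  # fixed seed so the samples are reproducible
    size = min(sample_size, len(records))
    return [rng.sample(records, size) for _ in range(n_samples)]

# Hypothetical records, deliberately including a null and a malformed value.
records = [
    {"dept": "A", "sales": 10},
    {"dept": "B", "sales": None},
    {"dept": "C", "sales": "n/a"},
    {"dept": "D", "sales": 42},
    {"dept": "E", "sales": 7},
]
samples = draw_samples(records, sample_size=3)
```

Running the same visualisation against each sample gives an early view of how it behaves when the data shifts.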

Secret #4: The devil is in the details

Laying out labels horizontally can quickly lead to crowding and illegible text (top). Rotating labels 90 degrees improves legibility, but takes away significant space from the visualisation. Finding a truncated or abbreviated label format is one approach, but won't work for every data set

Designing the labels, legends and axes for your visualisation is often an afterthought to the initial visualisation. But these elements are crucially important to the visualisation, and can be difficult and time-consuming to get right, especially when you can't predict the data ahead of time. When laying out your visualisation, leave significant rendering space for any additional marks you may need, often including relatively wide margins around the graphical part of your visualisation. Axis labels should be spaced such that they do not occlude each other and are easily readable. Rotate or reposition labels if necessary for legibility. If a particular area is overcrowded with labels, but you need them for clarity, consider moving the labels farther from the elements they reference and connecting them with an indicating line. Another technique is to group crowded labels together in a single tooltip-like group. Consider the space you've allowed and the length of the longer labels. If the labels won't fit, you might need to shorten them with ellipses, or simply truncate the text at a fixed length.

Similarly, legends require advance planning to render well. One easy option is to reserve some space for the legend to one side of the graphic. Unfortunately, this means that you'll need to reduce the size of the graphical portion of your visualisation. To preserve some space, you may be able to place the legend in an empty part of the graphic, or make the legend draggable so the viewer can access any graphics underneath.

Tools and strategies


- Plan space around your graphic for labels, axes and legends
- Designate a maximum character length for labels, truncating if needed to prevent crowding. Group nearby labels together, revealing them in response to user actions
- Consider scrolling or accordion-style expansion for long legends
- Whatever you do, don't leave these elements out. Labels may seem like a secondary concern when you're focused on the graphic elements, but they are incredibly important to your viewers
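Enforcing a maximum label length is simple to script. This is a minimal sketch: the 12-character limit and the function name are arbitrary assumptions, and a real project would pick the limit from the space actually available.

```python
def truncate_label(text, max_chars=12):
    """Shorten a label to at most max_chars, marking the cut with an ellipsis."""
    if len(text) <= max_chars:
        return text
    # Reserve one character for the ellipsis and drop trailing whitespace.
    return text[: max_chars - 1].rstrip() + "…"

# Hypothetical category labels of varying length.
labels = ["Sales", "Research and Development", "Human Resources"]
short = [truncate_label(label) for label in labels]
```

Keeping the full text available elsewhere, for example in a tooltip, means truncation does not cost the viewer any information.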

Secret #5: Animate only when appropriate



As a visualisation author, it's often tempting to add animations to your final product. Animations are a powerful way of connecting data to changes in state and trends. However, animations can also lead to confusing or misleading interpretations of your data. Plan carefully for how animation will affect your entire output, rather than simply adding it at the end of your work. Animations work best when they can reveal data relationships - showing how data groups together between different states, how the data changes over time, or how data points are directly related. In general, make your animations simple, predictable and re-playable. Allow users to view the animation multiple times so they can track where objects start and end. Avoid occluding objects in a transition with other objects, which makes tracking more difficult, and do not transition objects along unpredictable paths. With complex animations, research suggests that viewers' comprehension improves when the animation is broken into simple 'staged' transitions. A stage pauses the animation with the objects in a transitioning state and gives the viewer a moment to reflect on the state of each object.

Tools and strategies


- Strive to make your animations as simple as possible
- Consider staged animations when an animation is either complex or has many transitioning objects
- Flashy animations are often entertaining at first, but quickly become frustrating to the viewer. Do not add animation just because you can

Secret #6: Visualisation is not analysis


It's a central tenet of the field that data visualisation can yield meaningful insight. While there's a great deal of truth to this, it's important to remember that visualisation is a tool to aid analysis, not a substitute for analytical skill. It's also not a substitute for statistics: your chart may highlight differences or correlations between data points, but reliably drawing conclusions from these insights often requires a more rigorous statistical approach. (The reverse can also be true - as Anscombe's Quartet demonstrates, visualisations can reveal differences statistics hide.) Really understanding your data generally requires a combination of analytical skills, domain expertise, and effort. Don't expect your visualisations to do this work for you, and make sure you manage the expectations of your clients and your CEO when creating or commissioning visualisations.
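The Anscombe's Quartet point can be checked numerically. The sketch below uses the first two of the four published data sets and a hand-written Pearson correlation; the helper name and the choice to round to two decimals are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# First two data sets of Anscombe's quartet (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

# (rounded mean of y, rounded correlation with x) for each data set
summary = {
    name: (round(mean(ys), 2), round(pearson(x, ys), 2))
    for name, ys in (("I", y1), ("II", y2))
}
```

The two entries in `summary` come out identical at this precision, yet a scatter plot shows one set is roughly linear and the other a smooth curve, which is exactly the difference the summary statistics hide.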

Tools and strategies


- Unless you're a data analyst, be very careful about promising real insight. Consider working with a statistician or a domain expert if you need to offer reliable conclusions
- Small design decisions - the colour palette you use, or how you represent a particular variable - can skew the conclusions a visualisation suggests. If you're using visualisations for analysis, try a variety of options, rather than relying on a single view
- Stephen Few's Now You See It offers a good practical introduction to using visualisation for business analysis, including suggestions for developers on how to design analytically valid visualisation tools

Secret #7: Data visualisation takes more than code


The range of libraries and tutorials now available makes it easier than ever to produce production-quality web-based visualisations without specialised expertise. But creating visualisations that offer real insight or tell a compelling story still requires a particularly wide range of real skills in addition to coding, including graphic design, data analysis, and an understanding of interaction design and human perception. No library or technology can substitute for knowing what you're doing. But the flip side of this secret is that you don't need to know that much - especially if you use well-established visualisations and interaction principles. Learn enough about the field to avoid newbie mistakes (always zero-base your bar charts and never set a circle radius with a linear scale), keep things simple (no 3D, limited animation, no drop shadows), base your work on solid examples and you can create great visualisations.
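The circle-radius rule deserves a worked example. Viewers read a circle by its area, and area grows with the square of the radius, so a linearly scaled radius exaggerates large values. A minimal sketch of the usual fix, a square-root scale, follows; the function name and the 40-pixel maximum are illustrative assumptions.

```python
from math import sqrt

def bubble_radius(value, max_value, max_radius=40.0):
    """Scale a radius so that circle AREA, not radius, is proportional to value."""
    return max_radius * sqrt(value / max_value)

# A value four times larger gets twice the radius, hence four times the area.
small = bubble_radius(25, 100)
large = bubble_radius(100, 100)
```

With a linear radius scale, the larger circle here would cover sixteen times the area of the smaller one while representing only four times the value.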

http://www.netmagazine.com/features/seven-dirty-secrets-data-visualisation
