Big Data Visualization
About this ebook
- This unique guide teaches you how to visualize your cluttered, huge amounts of big data with ease
- It is rich with ample options and solid use cases for big data visualization, and is a must-have book for your shelf
- Improve your decision-making by visualizing your big data the right way
This book is for data analysts or those with a basic knowledge of big data analysis who want to learn big data visualization in order to make their analysis more useful. You need sufficient knowledge of big data platform tools such as Hadoop and also some experience with programming languages such as R. This book will be great for those who are familiar with conventional data visualizations and now want to widen their horizon by exploring big data visualizations.
James D. Miller
James Miller has enjoyed many accomplishments in his life. He has served as Youth Pastor, Pastor, Police Officer, Police Chaplain, and coach of several different sports and teams. But his greatest joy has been the joy of being married to Judith, and together raising 4 boys and being grandparents to 16 wonderful grand kids.
Big Data Visualization - James D. Miller
Table of Contents
Big Data Visualization
Credits
About the Author
About the Reviewer
www.PacktPub.com
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Introduction to Big Data Visualization
An explanation of data visualization
Conventional data visualization concepts
Training options
Challenges of big data visualization
Big data
Using Excel to gauge your data
Pushing big data higher
The 3Vs
Volume
Velocity
Variety
Categorization
Such are the 3Vs
Data quality
Dealing with outliers
Meaningful displays
Adding a fourth V
Visualization philosophies
More on variety
Velocity
Volume
All is not lost
Approaches to big data visualization
Access, speed, and storage
Entering Hadoop
Context
Quality
Displaying results
Not a new concept
Instant gratifications
Data-driven documents
Dashboards
Outliers
Investigation and adjudication
Operational intelligence
Summary
2. Access, Speed, and Storage with Hadoop
About Hadoop
What else but Hadoop?
IBM too!
Log files and Excel
An R scripting example
Points to consider
Hadoop and big data
Entering Hadoop
AWS for Hadoop projects
Example 1
Defining the environment
Getting started
Uploading the data
Manipulating the data
A specific example
Conclusion
Example 2
Sorting
Parsing the IP
Summary
3. Understanding Your Data Using R
Definitions and explanations
Comparisons
Contrasts
Tendencies
Dispersion
Adding context
About R
R and big data
Example 1
Digging in with R
Example 2
Definitions and explanations
No looping
Comparisons
Contrasts
Tendencies
Dispersion
Summary
4. Addressing Big Data Quality
Data quality categorized
DataManager
DataManager and big data
Some examples
Some reformatting
A little setup
Selecting nodes
Connecting the nodes
The work node
Adding the script code
Executing the scene
Other data quality exercises
What else is missing?
Status and relevance
Naming your nodes
More examples
Consistency
Reliability
Appropriateness
Accessibility
Other Output nodes
Summary
5. Displaying Results Using D3
About D3
D3 and big data
Some basic examples
Getting started with D3
A little down time
Visual transitions
Multiple donuts
More examples
Another twist on bar chart visualizations
One more example
Adopting the sample
Summary
6. Dashboards for Big Data - Tableau
About Tableau
Tableau and big data
Example 1 - Sales transactions
Adding more context
Wrangling the data
Moving on
A Tableau dashboard
Saving the workbook
Presenting our work
More tools
Example 2
What's the goal? - purpose and audience
Sales and spend
Sales v Spend and Spend as % of Sales Trend
Tables and indicators
All together now
Summary
7. Dealing with Outliers Using Python
About Python
Python and big data
Outliers
Options for outliers
Delete
Transform
Outliers identified
Some basic examples
Testing slot machines for profitability
Into the outliers
Handling excessive values
Establishing the value
Big data note
Setting outliers
Removing Specific Records
Redundancy and risk
Another point
If Type
Reused
Changing specific values
Setting the Age
Another note
Dropping fields entirely
More to drop
More examples
A themed population
A focused philosophy
Summary
8. Big Data Operational Intelligence with Splunk
About Splunk
Splunk and big data
Splunk visualization - real-time log analysis
IBM Cognos
Pointing Splunk
Setting rows and columns
Finishing with errors
Splunk and processing errors
Splunk visualization - deeper into the logs
New fields
Editing the dashboard
More about dashboards
Summary
Big Data Visualization
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: February 2017
Production reference: 1230217
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78528-194-5
www.packtpub.com
Credits
About the Author
James D. Miller is an IBM certified expert, creative innovator, and accomplished Director, Sr. Project Leader, and Application/System Architect with more than 35 years of extensive applications, system design, and development experience across multiple platforms and technologies.
His experience and specialties include introducing customers to new and sometimes disruptive technologies and platforms, integrating with IBM Watson Analytics, cloud migrations, Cognos BI, TM1 and web architecture design, systems analysis, GUI design and testing, and data and database modeling. They also cover the analysis, design, and development of OLAP, client/server, web, and mainframe applications and systems utilizing IBM Watson Analytics, IBM Cognos BI and TM1 (TM1 rules, TI, TM1Web and Planning Manager), Cognos Framework Manager, dynaSight/ArcPlan, ASP, DHTML, XML, IIS, MS Visual Basic and VBA, Visual Studio, Perl, Splunk, WebSuite, MS SQL Server, Oracle, Sybase Server, and more.
His responsibilities have also included all aspects of Windows and SQL solution development and design, including analysis; GUI (and Web site) design; data modeling; table, screen/form and script development; SQL (and remote stored procedures and triggers) development/testing; test preparation; and the management and training of programming staff. His other experience includes the development of ETL infrastructure such as data transfer automation between mainframe (DB2, Lawson, Great Plains, and so on) systems and client/server SQL server and web-based applications and integration of enterprise applications and data sources.
Mr. James D. Miller has acted as an Internet Applications Development manager responsible for the design, development, QA, and delivery of multiple websites, including online trading applications, warehouse process control, scheduling systems, and administrative and control applications. He was also responsible for the design, development, and administration of a web-based financial reporting system for a 450-million-dollar organization, reporting directly to the CFO and his executive team.
Mr. Miller has also been responsible for managing and directing multiple resources in various management roles, including project and team leader, lead developer, and applications development director.
Mr. Miller has authored Cognos TM1 Developers Certification Guide, Mastering Splunk, Learning IBM Watson Analytics, and a number of whitepapers on best practices such as Establishing a Center of Excellence, and continues to post blogs on a number of relevant topics based upon personal experiences and industry best practices. Jim is a perpetual learner who continues to pursue experiences and certifications, and currently holds the following technical certifications:
IBM Certified Business Analyst - Cognos TM1
IBM Cognos TM1 Master 385 Certification (perfect score of 100% on exam)
IBM Certified Advanced Solution Expert - Cognos TM1
IBM Cognos TM1 10.1 Administrator Certification C2020-703 (perfect score of 100% on exam)
IBM OpenPages Developer Fundamentals C2020-001-ENU (98% on exam)
IBM Cognos 10 BI Administrator C2020-622 (98% on exam)
IBM Cognos 10 BI Professional C2020-180
His specialties include the evaluation and introduction of innovative and disruptive technologies, cloud migration, big data, IBM Watson Analytics, Cognos BI and TM1 application Design and Development, OLAP, Visual Basic, SQL Server, Forecasting and Planning, International Application Development, Business Intelligence, Project Development and Delivery, and process improvement.
About the Reviewer
Dave Wentzel is a Data Solutions Architect for Microsoft. He helps customers with their Azure Digital Transformation focused on data science, big data, and SQL Server. After working with customers, he provides feedback and learnings to the product groups at Microsoft to make better solutions. Dave has been working with SQL Server for many years, and with MDX and SSAS since they were in their infancy. Dave shares his experiences at http://davewentzel.com. He’s always looking for new customers. Would you like to engage?
www.PacktPub.com
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Customer Feedback
Thank you for purchasing this Packt book. We take our commitment to improving our content and products to meet your needs seriously—that's why your feedback is so valuable. Whatever your feelings about your purchase, please consider leaving a review on this book's Amazon page. Not only will this help us, more importantly, it will also help others in the community to make an informed decision about the resources that they invest in to learn.
You can also review for us on a regular basis by joining our reviewers' club. If you're interested in joining, or would like to learn more about the benefits we offer, please contact us: customerreviews@packtpub.com.
Preface
The concepts and models necessary to efficiently and effectively visualize big data can be daunting but are not unobtainable. Unfortunately, when it comes to big data, many of the available data visualization tools, with their rudimentary functions and features, are somewhat ineffective.
Using basic analytical concepts (reviewed in this book), you’ll learn to use some of the most popular open source tools (and others) to meet these challenges and approach the task of big data visualization to support better decision making.
What this book covers
Chapter 1, Introduction to Big Data Visualization, starts out by providing a simple explanation of just what data visualization is and then provides a quick overview of various generally accepted data visualization concepts.
Chapter 2, Access, Speed, and Storage with Hadoop, targets the challenge of storing and accessing large volumes and varieties (structured or unstructured) of data, offering working examples that demonstrate solutions for effectively addressing these issues.
Chapter 3, Understanding Your Data Using R, explores the idea of adding context to the big data you are working on with R.
Chapter 4, Addressing Big Data Quality, discusses the categories of data quality and the challenges big data brings to each. It also covers examples demonstrating concepts for effectively addressing these areas.
Chapter 5, Displaying Results Using D3, explores the process of visualizing data using a web browser and Data-Driven Documents (D3) to present results from your big data analysis projects.
Chapter 6, Dashboards for Big Data - Tableau, introduces Tableau as a data visualization tool that can be used to construct dashboards and provides working examples demonstrating solutions for effectively presenting results from your big data analysis in a real-time dashboard format.
Chapter 7, Dealing with Outliers Using Python, focuses on the topic of dealing with outliers and other anomalies as they relate to big data visualization, and introduces the Python language with working examples of effectively dealing with data.
Chapter 8, Big Data Operational Intelligence with Splunk, offers working examples demonstrating solutions for valuing big data by gaining operational intelligence (using Splunk).
What you need for this book
Most of the tools and technologies used in this book are open source and available for no charge. All of the others offer free trials for evaluation. With this book, and some basic exposure to data analysis (or basic programming concepts), the reader will be able to gain valuable insights into the world of big data visualization!
Who this book is for
The target audience of this book is data analysts and those with at least a basic knowledge of big data analysis who now want to learn interesting approaches to big data visualization in order to make their analysis more valuable. Readers who possess adequate knowledge of big data platform tools such as Hadoop or have exposure to programming languages such as R can use this book to learn additional approaches (using various technologies) for addressing the inherent challenges of visualizing big data.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: The next lines of code read the link and assign it to the BeautifulSoup function.
A block of code is set as follows:
for row in reader:
    if (row['Denomination']) == 'Penny':
        if int(row['Coin-in'])<2000:
            x += int(row['Coin-in'])
            row_count += 1
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
row_count = 0
aver_coin_in = 0.0
x = 0.0
y = 999
z = 0.0
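The two fragments above are excerpts and do not run on their own. As a minimal sketch of how they might fit together, the block below assumes that the goal is an average Coin-in value for penny machines (inferred only from the variable name aver_coin_in), that the data arrives as CSV with Denomination and Coin-in columns, and that the sample rows and the function name average_penny_coin_in are hypothetical, made up purely for illustration.

```python
import csv
import io

# Hypothetical sample data: the column names come from the fragment above,
# but the rows themselves are invented for illustration.
SAMPLE_CSV = """Denomination,Coin-in
Penny,1500
Penny,2500
Nickel,1000
Penny,500
"""


def average_penny_coin_in(csv_text, limit=2000):
    """Average Coin-in over Penny rows whose Coin-in is below `limit`."""
    row_count = 0
    x = 0.0
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        if row['Denomination'] == 'Penny':
            if int(row['Coin-in']) < limit:
                x += int(row['Coin-in'])
                row_count += 1
    # Guard against division by zero when no row qualifies.
    aver_coin_in = x / row_count if row_count else 0.0
    return aver_coin_in, row_count


aver_coin_in, row_count = average_penny_coin_in(SAMPLE_CSV)
print(aver_coin_in, row_count)
```

Reading from an in-memory string keeps the sketch self-contained; with a real file, the same csv.DictReader loop would be wrapped in an open() call instead.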
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: In order to download new modules, we will go to Files | Settings | Project Name | Project Interpreter.
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply e-mail feedback@packtpub.com, and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
1. Log in or register on our website using your e-mail address and password.
2. Hover the mouse pointer over the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you're looking to download the code files.
6. Choose where you purchased this book from the drop-down menu.
7. Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Big-Data-Visualization. We also have other code bundles from