Discover millions of ebooks, audiobooks, and so much more with a free trial

Only $11.99/month after trial. Cancel anytime.

Big Data Visualization
Big Data Visualization
Big Data Visualization
Ebook507 pages3 hours

Big Data Visualization

Rating: 0 out of 5 stars

()

Read preview

About this ebook

About This Book
  • This unique guide teaches you how to visualize your cluttered, huge amounts of big data with ease
  • It is rich with ample options and solid use cases for big data visualization, and is a must-have book for your shelf
  • Improve your decision-making by visualizing your big data the right way
Who This Book Is For

This book is for data analysts or those with a basic knowledge of big data analysis who want to learn big data visualization in order to make their analysis more useful. You need sufficient knowledge of big data platform tools such as Hadoop and also some experience with programming languages such as R. This book will be great for those who are familiar with conventional data visualizations and now want to widen their horizon by exploring big data visualizations.

LanguageEnglish
Release dateFeb 28, 2017
ISBN9781785284168
Big Data Visualization
Author

James D. Miller

James Miller has enjoyed many accomplishments in his life. He has served as Youth Pastor, Pastor, Police Officer, Police Chaplain, and coach of several different sports and teams. But his greatest joy has been the joy of being married to Judith, and together raising 4 boys and being grandparents to 16 wonderful grand kids.

Read more from James D. Miller

Related to Big Data Visualization

Related ebooks

Enterprise Applications For You

View More

Related articles

Reviews for Big Data Visualization

Rating: 0 out of 5 stars
0 ratings

0 ratings0 reviews

What did you think?

Tap to rate

Review must be at least 10 words

    Book preview

    Big Data Visualization - James D. Miller

    Table of Contents

    Big Data Visualization

    Credits

    About the Author

    About the Reviewer

    www.PacktPub.com

    Why subscribe?

    Customer Feedback

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Downloading the color images of this book

    Errata

    Piracy

    Questions

    1. Introduction to Big Data Visualization

    An explanation of data visualization

    Conventional data visualization concepts

    Training options

    Challenges of big data visualization

    Big data

    Using Excel to gauge your data

    Pushing big data higher

    The 3Vs

    Volume

    Velocity

    Variety

    Categorization

    Such are the 3Vs

    Data quality

    Dealing with outliers

    Meaningful displays

    Adding a fourth V

    Visualization philosophies

    More on variety

    Velocity

    Volume

    All is not lost

    Approaches to big data visualization

    Access, speed, and storage

    Entering Hadoop

    Context

    Quality

    Displaying results

    Not a new concept

    Instant gratifications

    Data-driven documents

    Dashboards

    Outliers

    Investigation and adjudication

    Operational intelligence

    Summary

    2. Access, Speed, and Storage with Hadoop

    About Hadoop

    What else but Hadoop?

    IBM too!

    Log files and Excel

    An R scripting example

    Points to consider

    Hadoop and big data

    Entering Hadoop

    AWS for Hadoop projects

    Example 1

    Defining the environment

    Getting started

    Uploading the data

    Manipulating the data

    A specific example

    Conclusion

    Example 2

    Sorting

    Parsing the IP

    Summary

    3. Understanding Your Data Using R

    Definitions and explanations

    Comparisons

    Contrasts

    Tendencies

    Dispersion

    Adding context

    About R

    R and big data

    Example 1

    Digging in with R

    Example 2

    Definitions and explanations

    No looping

    Comparisons

    Contrasts

    Tendencies

    Dispersion

    Summary

    4. Addressing Big Data Quality

    Data quality categorized

    DataManager

    DataManager and big data

    Some examples

    Some reformatting

    A little setup

    Selecting nodes

    Connecting the nodes

    The work node

    Adding the script code

    Executing the scene

    Other data quality exercises

    What else is missing?

    Status and relevance

    Naming your nodes

    More examples

    Consistency

    Reliability

    Appropriateness

    Accessibility

    Other Output nodes

    Summary

    5. Displaying Results Using D3

    About D3

    D3 and big data

    Some basic examples

    Getting started with D3

    A little down time

    Visual transitions

    Multiple donuts

    More examples

    Another twist on bar chart visualizations

    One more example

    Adopting the sample

    Summary

    6. Dashboards for Big Data - Tableau

    About Tableau

    Tableau and big data

    Example 1 - Sales transactions

    Adding more context

    Wrangling the data

    Moving on

    A Tableau dashboard

    Saving the workbook

    Presenting our work

    More tools

    Example 2

    What's the goal? - purpose and audience

    Sales and spend

    Sales v Spend and Spend as % of Sales Trend

    Tables and indicators

    All together now

    Summary

    7. Dealing with Outliers Using Python

    About Python

    Python and big data

    Outliers

    Options for outliers

    Delete

    Transform

    Outliers identified

    Some basic examples

    Testing slot machines for profitability

    Into the outliers

    Handling excessive values

    Establishing the value

    Big data note

    Setting outliers

    Removing Specific Records

    Redundancy and risk

    Another point

    If Type

    Reused

    Changing specific values

    Setting the Age

    Another note

    Dropping fields entirely

    More to drop

    More examples

    A themed population

    A focused philosophy

    Summary

    8. Big Data Operational Intelligence with Splunk

    About Splunk

    Splunk and big data

    Splunk visualization -  real-time log analysis

    IBM Cognos

    Pointing Splunk

    Setting rows and columns

    Finishing with errors

    Splunk and processing errors

    Splunk visualization - deeper into the logs

    New fields

    Editing the dashboard

    More about dashboards

    Summary

    Big Data Visualization


    Big Data Visualization

    Copyright © 2017 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: February 2017

    Production reference: 1230217

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham 

    B3 2PB, UK.

    ISBN 978-1-78528-194-5

    www.packtpub.com

    Credits

    About the Author

    James D. Miller is an IBM certified expert, creative innovator, and accomplished Director, Sr. Project Leader, and Application/System Architect with more than 35 years of extensive applications, system design, and development experience across multiple platforms and technologies.

    His experiences and specialties include introducing customers to new and sometimes disruptive technologies and platforms, integrating with IBM Watson Analytics, cloud migrations, Cognos BI, TM1 and web architecture design, systems analysis, GUI design and testing, data and database modeling and systems analysis, design, and the development of OLAP, Client/Server, Web and Mainframe applications and systems utilizing IBM Watson Analytics, IBM Cognos BI and TM1 (TM1 rules, TI, TM1Web and Planning Manager), Cognos Framework Manager, dynaSight/ArcPlan, ASP, DHTML, XML, IIS, MS Visual Basic and VBA, Visual Studio, Perl, Splunk, WebSuite, MS SQL server, ORACLE, SYBASE Server, and more.

    His responsibilities have also included all aspects of Windows and SQL solution development and design, including analysis; GUI (and Web site) design; data modeling; table, screen/form and script development; SQL (and remote stored procedures and triggers) development/testing; test preparation; and the management and training of programming staff. His other experience includes the development of ETL infrastructure such as data transfer automation between mainframe (DB2, Lawson, Great Plains, and so on) systems and client/server SQL server and web-based applications and integration of enterprise applications and data sources.

    Mr. James D. Miller has acted as an Internet Applications Development manager responsible for the design, development, QA, and delivery of multiple websites, including online trading applications, warehouse process control, scheduling systems, and administrative and control applications. He was also responsible for the design, development, and administration of a web-based financial reporting system for a 450-million-dollar organization, reporting directly to the CFO and his executive team.

    Mr. Miller has also been responsible for managing and directing multiple resources in various management roles, including project and team leader, lead developer, and applications development director.

    Mr. Miller has authored Cognos TM1 Developers Certification Guide, Mastering Splunk, Learning IBM Watson Analytics, and a number of whitepapers on best practices such as Establishing a Center of Excellence, and continues to post blogs on a number of relevant topics based upon personal experiences and industry best practices. Jim is a perpetual learner who continues to pursue experiences and certifications, and currently holds the following current technical certifications:

    IBM Certified Business Analyst - Cognos TM1

    IBM Cognos TM1 Master 385 Certification (perfect score of 100% on exam)

    IBM Certified Advanced Solution Expert - Cognos TM1

    IBM Cognos TM1 10.1 Administrator Certification C2020-703 (perfect score of 100% on exam)

    IBM OpenPages Developer Fundamentals C2020-001-ENU (98% on exam)

    IBM Cognos 10 BI Administrator C2020-622 (98% on exam)

    IBM Cognos 10 BI Professional C2020-180

    His specialties include the evaluation and introduction of innovative and disruptive technologies, cloud migration, big data, IBM Watson Analytics, Cognos BI and TM1 application Design and Development, OLAP, Visual Basic, SQL Server, Forecasting and Planning, International Application Development, Business Intelligence, Project Development and Delivery, and process improvement.

    About the Reviewer

    Dave Wentzel is a Data Solutions Architect for Microsoft. He helps customers with their Azure Digital Transformation focused on data science, big data, and SQL Server. After working with customers, he provides feedback and learnings to the product groups at Microsoft to make better solutions. Dave has been working with SQL Server for many years, and with MDX and SSAS since they were in their infancy. Dave shares his experiences at http://davewentzel.com. He’s always looking for new customers. Would you like to engage?

    www.PacktPub.com

    For support files and downloads related to your book, please visit www.PacktPub.com.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    https://www.packtpub.com/mapt

    Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Customer Feedback

    Thank you for purchasing this Packt book. We take our commitment to improving our content and products to meet your needs seriously—that's why your feedback is so valuable. Whatever your feelings about your purchase, please consider leaving a review on this book's Amazon page. Not only will this help us, more importantly, it will also help others in the community to make an informed decision about the resources that they invest in to learn.

    You can also review for us on a regular basis by joining our reviewers' club. If you're interested in joining, or would like to learn more about the benefits we offer, please contact us: customerreviews@packtpub.com.

    Preface

    The concepts and models necessary to efficiently and effectively visualize big data can be daunting but are not unobtainable. Unfortunately, when it comes to big data, many of the available data visualization tools, with their rudimentary functions and features, are somewhat ineffective.

    Using basic analytical concepts (reviewed in this book), you’ll learn to use some of the most popular open source tools (and others) to meet these challenges and approach the task of big data visualization to support better decision making.

    What this book covers

    Chapter 1, Introduction to Big Data Visualization, –  starts out by providing a simple explanation of just what data visualization is and then provides a quick overview of various generally accepted data visualization concepts.

    Chapter 2, Access, Speed, and Storage with Hadoop, aims to target the challenge of storing and accessing large volumes and varieties (structured or unstructured) of data offering working examples demonstrating solutions for effectively addressing these issues.

    Chapter 3, Understanding Your Data Using R, explores the idea of adding context to the big data you are working on with R.

    Chapter 4, Addressing Big Data Quality, talks about categorized data quality and the challenges big data brings to them. In addition, examples demonstrating concepts for effectively addressing these areas are covered.

    Chapter 5, Displaying Results Using D3, explores the process of visualizing data using a web browser and Data-Driven Documents (D3) to present results from your big data analysis projects.

    Chapter 6, Dashboards for Big Data - Tableau, introduces Tableau as a data visualization tool that can be used to construct dashboards and provides working examples demonstrating solutions for effectively presenting results from your big data analysis in a real-time dashboard format.

    Chapter 7, Dealing with Outliers Using Python, focuses on the topic of dealing with outliers and other anomalies as they relate to big data visualization, and introduces the Python language with working examples of effectively dealing with data.

    Chapter 8, Big Data Operational Intelligence with Splunk, offers working examples demonstrating solutions for valuing big data by gaining operational intelligence (using Splunk).

    What you need for this book

    Most of the tools and technologies used in this book are open source and available for no charge. All of the others offer free trials for evaluation. With this book, and some basic exposure to data analysis (or basic programming concepts) the reader will be able to gain valuable insights to the world of big data visualization!

    Who this book is for

    The target audience of this book are data analysts and those with at least a basic knowledge of big data analysis who now want to learn interesting approaches to big data visualization in order to make their analysis more valuable. Readers who possess adequate knowledge of big data platform tools such as Hadoop or have exposure to programming languages such as R can use this book to learn additional approaches (using various technologies) for addressing the inherent challenges of visualizing big data.

    Conventions

    In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

    Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: The next lines of code reads the link and assigns it to the to the BeautifulSoup function.

    A block of code is set as follows:

    for row in reader:

            if (row['Denomination']) == 'Penny':

                if int(row['Coin-in'])<2000:

                    x += int(row['Coin-in'])

                row_count += 1

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    row_count = 0

       

    aver_coin_in = 0.0

        x = 0.0

        y = 999

        z = 0.0

    New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: In order to download new modules, we will go to Files | Settings | Project Name | Project Interpreter.

    Note

    Warnings or important notes appear in a box like this.

    Tip

    Tips and tricks appear like this.

    Reader feedback

    Feedback from our readers is always welcome. Let us know what you think about this book-what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply e-mail feedback@packtpub.com, and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

    Customer support

    Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

    Downloading the example code

    You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

    You can download the code files by following these steps:

    Log in or register to our website using your e-mail address and password.

    Hover the mouse pointer on the SUPPORT tab at the top.

    Click on Code Downloads & Errata.

    Enter the name of the book in the Search box.

    Select the book for which you're looking to download the code files.

    Choose from the drop-down menu where you purchased this book from.

    Click on Code Download.

    Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

    WinRAR / 7-Zip for Windows

    Zipeg / iZip / UnRarX for Mac

    7-Zip / PeaZip for Linux

    The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Big-Data-Visualization. We also have other code bundles from

    Enjoying the preview?
    Page 1 of 1