Learning Hunk
By Dmitry Anoshin and Sergey Sheypak
About This Book
- Explore your data in Hadoop and NoSQL data stores
- Create and optimize your reporting experience with advanced data visualizations and data analytics
- A comprehensive developer's guide that helps you create outstanding analytical solutions efficiently
Who This Book Is For
If you are a Hadoop developer who wants to build efficient real-time operational intelligence solutions based on Hadoop deployments or various NoSQL data stores using Hunk, this book is for you. Some familiarity with Splunk is assumed.
What You Will Learn
- Deploy and configure Hunk on top of Cloudera Hadoop
- Create and configure Virtual Indexes for datasets
- Make your data presentable using the wide variety of data visualization components and knowledge objects
- Design a data model using Hunk best practices
- Add more flexibility to your analytics solution via extended SDK and custom visualizations
- Discover data using MongoDB as a data source
- Integrate Hunk with AWS Elastic MapReduce to improve scalability
In Detail
Hunk is the big data analytics platform that lets you rapidly explore, analyse, and visualize data in Hadoop and NoSQL data stores. It provides a single, fluid user experience, designed to show you insights from your big data without the need for specialized skills, fixed schemas, or months of development. Hunk goes beyond typical data analysis methods and gives you the power to rapidly detect patterns and find anomalies across petabytes of raw data.
This book focuses on exploring, analysing, and visualizing big data in Hadoop and NoSQL data stores with this powerful full-featured big data analytics platform.
You will begin by learning about the Hunk architecture and Hunk virtual indexes before moving on to how to easily analyze and visualize data using Splunk's Search Processing Language (SPL). Next, you will meet Hunk apps, which integrate easily with NoSQL data stores such as MongoDB or Sqrrl. You will also discover Hunk knowledge objects, build a semantic layer on top of Hadoop, and explore data using the user-friendly interface of Hunk Pivot. You will connect MongoDB and explore data in the data store. Finally, you will go through report acceleration techniques and analyze data in the AWS Cloud.
Style and approach
A step-by-step guide starting right from the basics and deep diving into the more advanced and technical aspects of Hunk.
Table of Contents
Learning Hunk
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Meet Hunk
Big data analytics
The big problem
The elegant solution
Supporting SPL
Intermediate results
Getting to know Hunk
Splunk versus Hunk
Hunk architecture
Connecting to Hadoop
Advanced Hunk deployment
Native versus virtual indexes
Native indexes
Virtual index
External result provider
Computation models
Data streaming
Data reporting
Mixed mode
Hunk security
One Hunk user to one Hadoop user
Many Hunk users to one Hadoop user
Hunk user(s) to the same Hadoop user with different queues
Setting up Hadoop
Starting and using a virtual machine with CDH5
SSH user
MySQL
Starting the VM and cluster in VirtualBox
Big data use case
Importing data from RDBMS to Hadoop using Sqoop
Telecommunications – SMS, Call, and Internet dataset from dandelion.eu
Milano grid map
CDR aggregated data import process
Periodical data import from MySQL using Sqoop and Oozie
Problems to solve
Summary
2. Explore Hadoop Data with Hunk
Setting up Hunk
Extracting Hunk to a VM
Setting up Hunk variables and configuration files
Running Hunk for the first time
Setting up a data provider and virtual index for CDR data
Setting up a connection to Hadoop
Setting up a virtual index for data stored in Hadoop
Accessing data through a virtual index
Exploring data
Creating reports
The top five browsers report
Top referrers
Site errors report
Creating alerts
Creating a dashboard
Controlling security with Hunk
The default Hadoop security
One Hunk user to one Hadoop user
Summary
3. Meeting Hunk Features
Knowledge objects
Field aliases
Calculated fields
Field extractions
Tags
Event type
Workflow actions
Macros
Data model
Add auto-extracting fields
Adding GeoIP attributes
Other ways to add attributes
Introducing Pivot
Summary
4. Adding Speed to Reports
Big data performance issues
Hunk report acceleration
Creating a virtual index
Streaming mode
Creating an acceleration search
What's going on in Hadoop?
Report acceleration summaries
Reviewing summary details
Managing report accelerations
Hunk acceleration limits
Summary
5. Customizing Hunk
What we are going to do with the Splunk SDK
Supported languages
Solving problems
REST API
The implementation plan
The conclusion
Dashboard customization using Splunk Web Framework
Functionality
A description of time-series aggregated CDR data
Source data
Creating a virtual index for Milano CDR
Creating a virtual index for the Milano grid
Creating a virtual index using sample data
Implementation
Querying the visualization
Downloading the application
Custom Google Maps
Page layout
Linear gradients and bins for the activity value
Custom map components
Other components
The final result
Summary
6. Discovering Hunk Integration Apps
What is Mongo?
Installation
Installing the Mongo app
Mongo provider
Creating a virtual index
Inputting data from the recommendation engine backend
Data schemas
Data mechanics
Counting by shop in a single collection
Counting events in all collections
Counting events in shops for observed days
Summary
7. Exploring Data in the Cloud
An introduction to Amazon EMR and S3
Amazon EMR
Setting up an Amazon EMR cluster
Amazon S3
S3 as a data provider for Hunk
The advantages of EMR and S3
Integrating Hunk with EMR and S3
Method 1: BYOL
Setting up the Hunk AMI
Adding a license
Configuring the data provider
Configuring a virtual index
Setting up a provider and virtual index in the configuration file
Exploring data
Method 2: Hunk–hourly pricing
Provisioning a Hunk instance using the CloudFormation template
Provisioning a Hunk instance using the EC2 Console
Converting Hunk from an hourly rate to a license
Summary
Index
Learning Hunk
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: December 2015
Production reference: 1181215
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78217-482-0
www.packtpub.com
Credits
Authors
Dmitry Anoshin
Sergey Sheypak
Reviewers
Jigar Bhatt
Neil Mehta
Acquisition Editors
Hemal Desai
Reshma Raman
Content Development Editor
Anish Sukumaran
Technical Editor
Shivani Kiran Mistry
Copy Editor
Stephen Copestake
Project Coordinator
Izzat Contractor
Proofreader
Safis Editing
Indexer
Hemangini Bari
Graphics
Jason Monteiro
Production Coordinator
Nilesh Mohite
Cover Work
Nilesh Mohite
About the Authors
Dmitry Anoshin is a data-centric technologist and a recognized expert in building and implementing big data and analytics solutions. He has a successful track record when it comes to implementing business and digital intelligence projects in numerous industries, including retail, finance, marketing, and e-commerce.
Dmitry possesses in-depth knowledge of digital/business intelligence, ETL, data warehousing, and big data technologies. He has extensive experience in the data integration process and is proficient in using various data warehousing methodologies. Dmitry has consistently exceeded project expectations in his work for the financial, machine tool, and retail industries.
He has completed a number of multinational full BI/DI solution life cycle implementation projects. With expertise in data modeling, Dmitry also has a background and business experience in multiple relational databases, OLAP systems, and NoSQL databases.
In addition, he has reviewed SAP BusinessObjects Reporting Cookbook, Creating Universes with SAP BusinessObjects, and Learning SAP BusinessObjects Dashboards, all by Packt Publishing, and he is the author of SAP Lumira Essentials, also by Packt Publishing.
I would like to tell my wife Sveta how much I love her. I dedicate this book to my wife and children, Vasily and Anna. Thank you for your never-ending support that keeps me going.
Sergey Sheypak started his so-called big data practice in 2010 as a Teradata PS consultant. He led the Teradata Master Data Management deployment at Sberbank, Russia (which has 110 million customers). Later, Sergey switched to AsterData and Hadoop practices. In 2012, Sergey joined the Research and Development team at MegaFon (one of the top three telecom companies in Russia, with 70 million customers). While leading the Hadoop team at MegaFon, Sergey built ETL processes from the existing Oracle DWH to HDFS. Automated end-to-end tests and acceptance tests were introduced as a mandatory part of the Hadoop development process. Scoring and geospatial analysis systems based on telecom-specific data were developed and launched. Sergey now works as an independent consultant in Sweden.
About the Reviewer
Jigar Bhatt is a computer engineering undergraduate from the National Institute of Technology, Surat. He specializes in big data technologies and has a deep interest in data science and machine learning. He has also engineered several cloud-based Android applications. He is currently working as a full-time software developer at a renowned start-up, focusing on building and optimizing cloud platforms and ensuring profitable business intelligence round the clock.
Apart from academics, he finds adventurous sports