Learning Hunk
Ebook, 295 pages, 1 hour

About this ebook

Visualize and analyze your Hadoop data using Hunk

About This Book

- Explore your data in Hadoop and NoSQL data stores
- Create and optimize your reporting experience with advanced data visualizations and data analytics
- A comprehensive developer's guide that helps you create outstanding analytical solutions efficiently

Who This Book Is For

If you are a Hadoop developer who wants to build efficient real-time Operational Intelligence solutions based on Hadoop deployments or various NoSQL data stores using Hunk, this book is for you. Some familiarity with Splunk is assumed.

What You Will Learn

- Deploy and configure Hunk on top of Cloudera Hadoop
- Create and configure Virtual Indexes for datasets
- Make your data presentable using the wide variety of data visualization components and knowledge objects
- Design a data model using Hunk best practices
- Add more flexibility to your analytics solution via extended SDK and custom visualizations
- Discover data using MongoDB as a data source
- Integrate Hunk with AWS Elastic MapReduce to improve scalability

In Detail

Hunk is a big data analytics platform that lets you rapidly explore, analyze, and visualize data in Hadoop and NoSQL data stores. It provides a single, fluid user experience designed to surface insights from your big data without the need for specialized skills, fixed schemas, or months of development. Hunk goes beyond typical data analysis methods and lets you rapidly detect patterns and find anomalies across petabytes of raw data.
This book focuses on exploring, analyzing, and visualizing big data in Hadoop and NoSQL data stores with this full-featured analytics platform.
You will begin by learning the Hunk architecture and the Hunk virtual index before moving on to analyzing and visualizing data using the Splunk Search Processing Language (SPL). Next, you will meet Hunk apps, which can easily integrate with NoSQL data stores such as MongoDB or Sqrrl. You will also discover Hunk knowledge objects, build a semantic layer on top of Hadoop, and explore data using the friendly user interface of Hunk Pivot. You will connect MongoDB and explore data in the data store. Finally, you will go through report acceleration techniques and analyze data in the AWS Cloud.

Style and approach

A step-by-step guide starting right from the basics and deep diving into the more advanced and technical aspects of Hunk.
Language: English
Release date: Dec 31, 2015
ISBN: 9781785283024

    Book preview

    Learning Hunk - Dmitry Anoshin

    Table of Contents

    Learning Hunk

    Credits

    About the Authors

    About the Reviewer

    www.PacktPub.com

    Support files, eBooks, discount offers, and more

    Why subscribe?

    Free access for Packt account holders

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Downloading the color images of this book

    Errata

    Piracy

    Questions

    1. Meet Hunk

    Big data analytics

    The big problem

    The elegant solution

    Supporting SPL

    Intermediate results

    Getting to know Hunk

    Splunk versus Hunk

    Hunk architecture

    Connecting to Hadoop

    Advanced Hunk deployment

    Native versus virtual indexes

    Native indexes

    Virtual index

    External result provider

    Computation models

    Data streaming

    Data reporting

    Mixed mode

    Hunk security

    One Hunk user to one Hadoop user

    Many Hunk users to one Hadoop user

    Hunk user(s) to the same Hadoop user with different queues

    Setting up Hadoop

    Starting and using a virtual machine with CDH5

    SSH user

    MySQL

    Starting the VM and cluster in VirtualBox

    Big data use case

    Importing data from RDBMS to Hadoop using Sqoop

    Telecommunications – SMS, Call, and Internet dataset from dandelion.eu

    Milano grid map

    CDR aggregated data import process

    Periodical data import from MySQL using Sqoop and Oozie

    Problems to solve

    Summary

    2. Explore Hadoop Data with Hunk

    Setting up Hunk

    Extracting Hunk to a VM

    Setting up Hunk variables and configuration files

    Running Hunk for the first time

    Setting up a data provider and virtual index for CDR data

    Setting up a connection to Hadoop

    Setting up a virtual index for data stored in Hadoop

    Accessing data through a virtual index

    Exploring data

    Creating reports

    The top five browsers report

    Top referrers

    Site errors report

    Creating alerts

    Creating a dashboard

    Controlling security with Hunk

    The default Hadoop security

    One Hunk user to one Hadoop user

    Summary

    3. Meeting Hunk Features

    Knowledge objects

    Field aliases

    Calculated fields

    Field extractions

    Tags

    Event type

    Workflow actions

    Macros

    Data model

    Add auto-extracting fields

    Adding GeoIP attributes

    Other ways to add attributes

    Introducing Pivot

    Summary

    4. Adding Speed to Reports

    Big data performance issues

    Hunk report acceleration

    Creating a virtual index

    Streaming mode

    Creating an acceleration search

    What's going on in Hadoop?

    Report acceleration summaries

    Reviewing summary details

    Managing report accelerations

    Hunk accelerations limits

    Summary

    5. Customizing Hunk

    What we are going to do with the Splunk SDK

    Supported languages

    Solving problems

    REST API

    The implementation plan

    The conclusion

    Dashboard customization using Splunk Web Framework

    Functionality

    A description of time-series aggregated CDR data

    Source data

    Creating a virtual index for Milano CDR

    Creating a virtual index for the Milano grid

    Creating a virtual index using sample data

    Implementation

    Querying the visualization

    Downloading the application

    Custom Google Maps

    Page layout

    Linear gradients and bins for the activity value

    Custom map components

    Other components

    The final result

    Summary

    6. Discovering Hunk Integration Apps

    What is Mongo?

    Installation

    Installing the Mongo app

    Mongo provider

    Creating a virtual index

    Inputting data from the recommendation engine backend

    Data schemas

    Data mechanics

    Counting by shop in a single collection

    Counting events in all collections

    Counting events in shops for observed days

    Summary

    7. Exploring Data in the Cloud

    An introduction to Amazon EMR and S3

    Amazon EMR

    Setting up an Amazon EMR cluster

    Amazon S3

    S3 as a data provider for Hunk

    The advantages of EMR and S3

    Integrating Hunk with EMR and S3

    Method 1: BYOL

    Setting up the Hunk AMI

    Adding a license

    Configuring the data provider

    Configuring a virtual index

    Setting up a provider and virtual index in the configuration file

    Exploring data

    Method 2: Hunk–hourly pricing

    Provisioning a Hunk instance using the Cloud formation template

    Provisioning a Hunk instance using the EC2 Console

    Converting Hunk from an hourly rate to a license

    Summary

    Index

    Learning Hunk

    Copyright © 2015 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: December 2015

    Production reference: 1181215

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78217-482-0

    www.packtpub.com

    Credits

    Authors

    Dmitry Anoshin

    Sergey Sheypak

    Reviewers

    Jigar Bhatt

    Neil Mehta

    Acquisition Editors

    Hemal Desai

    Reshma Raman

    Content Development Editor

    Anish Sukumaran

    Technical Editor

    Shivani Kiran Mistry

    Copy Editor

    Stephen Copestake

    Project Coordinator

    Izzat Contractor

    Proofreader

    Safis Editing

    Indexer

    Hemangini Bari

    Graphics

    Jason Monteiro

    Production Coordinator

    Nilesh Mohite

    Cover Work

    Nilesh Mohite

    About the Authors

    Dmitry Anoshin is a data-centric technologist and a recognized expert in building and implementing big data and analytics solutions. He has a successful track record when it comes to implementing business and digital intelligence projects in numerous industries, including retail, finance, marketing, and e-commerce.

    Dmitry possesses in-depth knowledge of digital/business intelligence, ETL, data warehousing, and big data technologies. He has extensive experience in the data integration process and is proficient in using various data warehousing methodologies. Dmitry has consistently exceeded project expectations while working in the financial, machine tool, and retail industries.

    He has completed a number of multinational full BI/DI solution life cycle implementation projects. With expertise in data modeling, Dmitry also has a background and business experience in multiple relational databases, OLAP systems, and NoSQL databases.

    In addition, he has reviewed SAP BusinessObjects Reporting Cookbook, Creating Universes with SAP BusinessObjects, and Learning SAP BusinessObjects Dashboards, all from Packt Publishing, and is the author of SAP Lumira Essentials, also from Packt Publishing.

    I would like to tell my wife Sveta how much I love her. I dedicate this book to my wife and children, Vasily and Anna. Thank you for your never-ending support that keeps me going.

    Sergey Sheypak started his so-called big data practice in 2010 as a Teradata PS consultant. He led the Teradata Master Data Management deployment at Sberbank, Russia (which has 110 million customers). Later, Sergey switched to AsterData and Hadoop practices. Sergey joined the Research and Development team at MegaFon (one of the top three telecom companies in Russia, with 70 million customers) in 2012. While leading the Hadoop team at MegaFon, Sergey built ETL processes from the existing Oracle DWH to HDFS. Automated end-to-end tests and acceptance tests were introduced as a mandatory part of the Hadoop development process. Scoring and geospatial analysis systems based on specific telecom data were developed and launched. Sergey now works as an independent consultant in Sweden.

    About the Reviewer

    Jigar Bhatt is a computer engineering undergraduate from the National Institute of Technology, Surat. He specializes in big data technologies and has a deep interest in data science and machine learning. He has also engineered several cloud-based Android applications. He is currently working as a full-time software developer at a renowned start-up, focusing on building and optimizing cloud platforms and ensuring profitable business intelligence round the clock.

    Apart from academics, he finds adventurous sports
