Learning Elasticsearch
Ebook, 466 pages, 4 hours

Rating: 4 out of 5 stars

About this ebook

About This Book
  • Get to grips with the basics of Elasticsearch concepts and its APIs, and use them to create efficient applications
  • Create large-scale Elasticsearch clusters and perform analytics using aggregation
  • This comprehensive guide will get you up and running with Elasticsearch 5.x in no time
Who This Book Is For

If you want to build efficient search and analytics applications using Elasticsearch, this book is for you. It will also benefit developers who have worked with Lucene or Solr before and now want to work with Elasticsearch. No previous knowledge of Elasticsearch is expected.

Language: English
Release date: Jun 30, 2017
ISBN: 9781787129917

Reviews for Learning Elasticsearch

1 rating, 1 review

  • Rating: 4 out of 5 stars
    I recommend this book. As a beginner, I had no trouble understanding it, especially since every concept is explained with examples.

Book preview

Learning Elasticsearch - Abhishek Andhavarapu

Learning Elasticsearch

Distributed real-time search and analytics with Elasticsearch 5.x

Abhishek Andhavarapu

BIRMINGHAM - MUMBAI

Learning Elasticsearch

Copyright © 2017 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, nor its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: June 2017

Production reference: 1290617

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-78712-845-3

www.packtpub.com

Credits

About the Author

Abhishek Andhavarapu is a software engineer at eBay who enjoys working on highly scalable distributed systems. He has a master's degree in Distributed Computing and has worked on multiple enterprise Elasticsearch applications, which are currently serving hundreds of millions of requests per day.

He began his journey with Elasticsearch in 2012 to build an analytics engine to power dashboards and quickly realized that Elasticsearch is like nothing out there for search and analytics. He has been a strong advocate since then and wrote this book to share the practical knowledge he gained along the way.

Writing a book is a humongous task. I want to thank my wife Ashwini for her patience and support during the nights and weekends I spent writing this book. I am thankful to my parents Govinda Rajulu and Jaya Lakshmi, my brother Sarat Kiran, and my in-laws Satya Rao and Suguna for their constant motivation and encouragement throughout the writing of this book. I'm grateful to all my friends and colleagues, whom I couldn't mention by name, for their valuable feedback and input.

I would also like to thank my publisher and editors at Packt for their continuous support.

About the Reviewers

Dan Noble is a software engineer with a passion for writing secure, clean, and articulate code. He enjoys working with a variety of programming languages and software frameworks, particularly Python, Elasticsearch, and various JavaScript frontend technologies. Dan currently works on geospatial web applications and data processing systems.

Dan has been a user and advocate of Elasticsearch since 2011. He has given several talks about Elasticsearch, is the author of the book Monitoring Elasticsearch, and was a technical reviewer for the book The Elasticsearch Cookbook, Second Edition, by Alberto Paro. Dan is also the author of the Python Elasticsearch client rawes.

Marcelo Ochoa works at the system laboratory of Facultad de Ciencias Exactas of the Universidad Nacional del Centro de la Provincia de Buenos Aires and is the CTO at Scotas.com, a company that specializes in near real-time search solutions using Apache Solr and Oracle. He divides his time between university jobs and external projects related to Oracle and big data technologies. He has worked on several Oracle-related projects, such as the translation of Oracle manuals and multimedia CBTs. His background is in database, network, web, and Java technologies. In the XML world, he is known as the developer of the DB Generator for the Apache Cocoon project. He has worked on the open source projects DBPrism and DBPrism CMS, the Lucene-Oracle integration using the Oracle JVM Directory implementation, and the Restlet.org project, where he worked on the Oracle XDB Restlet Adapter, which is an alternative to writing native REST web services inside a database-resident JVM.

Since 2006, he has been part of the Oracle ACE program, and he was recently incorporated into the Docker Mentor program.

He has coauthored Oracle Database Programming using Java and Web Services by Digital Press and Professional XML Databases by Wrox Press, and has been a technical reviewer for several PacktPub books, such as Mastering Elastic Stack, Mastering Elasticsearch 5.x - Third Edition, Elasticsearch 5.x Cookbook - Third Edition, and others.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1787128458.

If you'd like to join our team of regular reviewers, you can e-mail us at customerreviews@packtpub.com. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Table of Contents

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

Introduction to Elasticsearch

Basic concepts of Elasticsearch

Document

Index

Type

Cluster and node

Shard

Interacting with Elasticsearch

Creating a document

Retrieving an existing document

Updating an existing document

Updating a partial document

Deleting an existing document

How does search work?

Importance of information retrieval

Simple search query

Inverted index

Stemming

Synonyms

Phrase search

Apache Lucene

Scalability and availability

Relation between node, index, and shard

Three shards with zero replicas

Six shards with zero replicas

Six shards with one replica

Distributed search

Failure handling

Strengths and limitations of Elasticsearch

Summary

Setting Up Elasticsearch and Kibana

Installing Elasticsearch

Installing Java

Windows

Starting and stopping Elasticsearch

Mac OS X

Starting and stopping Elasticsearch

DEB and RPM packages

Debian package

RPM package

Starting and stopping Elasticsearch

Sample configuration files

Verifying Elasticsearch is running

Installing Kibana

Mac OS X

Starting and stopping Kibana

Windows

Starting and stopping Kibana

Query format used in this book (Kibana Console)

Using cURL or Postman

Health of the cluster

Summary

Modeling Your Data and Document Relations

Mapping

Dynamic mapping

Create index with mapping

Adding a new type/field

Getting the existing mapping

Mapping conflicts

Data type

Metafields

How to handle null values

Storing the original document

Searching all the fields in the document

Difference between full-text search and exact match

Core data types

Text

Keyword

Date

Numeric

Boolean

Binary

Complex data types

Array

Object

Nested

Geo data type

Geo-point data type

Specialized data type

IP

Mapping the same field with different mappings

Handling relations between different document types

Parent-child document relation

How are parent-child documents stored internally?

Nested

Routing

Summary

Indexing and Updating Your Data

Indexing your data

Indexing errors

Node/shards errors

Serialization/mapping errors

Thread pool rejection error

Managing an index

What happens when you index a document?

Updating your data

Update using an entire document

Partial updates

Scripted updates

Upsert

NOOP

What happens when you update a document?

Merging segments

Using Kibana to discover

Using Elasticsearch in your application

Java

Transport client

Dependencies

Initializing the client

Sniffing

Node client

REST client

Third party clients

Indexing using Java client

Concurrency

Translog

Async versus sync

CRUD from translog

Primary and Replica shards

Primary preference

More replicas for query throughput

Increasing/decreasing the number of replicas

Summary

Organizing Your Data and Bulk Data Ingestion

Bulk operations

Bulk API

Multi Get API

Update by query

Delete by query

Reindex API

Change mappings/settings

Combining documents from one or more indices

Copying only missing documents

Copying a subset of documents into a new index

Copying top N documents

Copying the subset of fields into new index

Ingest Node

Organizing your data

Index alias

Index templates

Managing time-based indices

Shrink API

Summary

All About Search

Different types of queries

Sample data

Querying Elasticsearch

Basic query (finding the exact value)

Pagination

Sorting based on existing fields

Selecting the fields in the response

Querying based on range

Handling dates

Analyzed versus non-analyzed fields

Term versus Match query

Match phrase query

Prefix and match phrase prefix query

Wildcard and Regular expression query

Exists and missing queries

Using more than one query

Routing

Debugging search query

Relevance

Queries versus Filters

How to boost relevance based on a single field

How to boost score based on queries

How to boost relevance using decay functions

Rescoring

Debugging relevance score

Searching for same value across multiple fields

Best matching fields

Most matching fields

Cross-matching fields

Caching

Node Query cache

Shard request cache

Summary

More Than a Search Engine (Geofilters, Autocomplete, and More)

Sample data

Correcting typos and spelling mistakes

Fuzzy query

Making suggestions based on the user input

Implementing did you mean feature

Term suggester

Phrase suggester

Implementing the autocomplete feature

Highlighting

Handling document relations using parent-child

The has_parent query

The has_child query

Inner hits for parent-child

How parent-child works internally

Handling document relations using nested

Inner hits for nested documents

Scripting

Script Query

Post Filter

Reverse search using the percolate query

Geo and Spatial Filtering

Geo Distance

Using Geolocation to rank the search results

Geo Bounding Box

Sorting

Multi search

Search templates

Querying Elasticsearch from Java application

Summary

How to Slice and Dice Your Data Using Aggregations

Aggregation basics

Sample data

Query structure

Multilevel aggregations

Types of aggregations

Terms aggregations (group by)

Size and error

Order

Minimum document count

Missing values

Aggregations based on filters

Aggregations on dates (range, histogram)

Aggregations on numeric values (range, histogram)

Aggregations on geolocation (distance, bounds)

Geo distance

Geo bounds

Aggregations on child documents

Aggregations on nested documents

Reverse nested aggregation

Post filter

Using Kibana to visualize aggregations

Caching

Doc values

Field data

Summary

Production and Beyond

Configuring Elasticsearch

The directory structure

zip/tar.gz

DEB/RPM

Configuration file

Cluster and node name

Network configuration

Memory configuration

Configuring file descriptors

Types of nodes

Multinode cluster

Inspecting the logs

How nodes discover each other

Node failures

X-Pack

Windows

Mac OS X

Debian/RPM

Authentication

X-Pack basic license

Monitoring

Monitoring Elasticsearch clusters

Monitoring indices

Monitoring nodes

Thread pools

Elasticsearch server logs

Slow logs

Summary

Exploring Elastic Stack (Elastic Cloud, Security, Graph, and Alerting)

Elastic Cloud

High availability

Data reliability

Security

Authentication and roles

Securing communications using SSL

Graph

Graph UI

Alerting

Summary

Preface

Welcome to Learning Elasticsearch. We will start by describing the basic concepts of Elasticsearch. You will see how to install Elasticsearch and Kibana and learn how to index and update your data. We will use an e-commerce site as an example to explain how a search engine works and how to query your data. The real power of Elasticsearch lies in aggregations, and you will see how to perform aggregation-based analytics with ease. You will also see how to use Kibana to explore and visualize your data. Finally, we will discuss how to use Graph to discover relations in your data and how to use alerting to set up alerts and notifications on different trends in your data.

To better explain various concepts, many examples are used throughout the book. Detailed instructions for installing Elasticsearch and Kibana, and for executing the examples, are included in Chapter 2, Setting Up Elasticsearch and Kibana.

What this book covers

Chapter 1, Introduction to Elasticsearch, describes the building blocks of Elasticsearch and what makes Elasticsearch scalable and distributed. In this chapter, we also discuss the strengths and limitations of Elasticsearch.

Chapter 2, Setting Up Elasticsearch and Kibana, covers the installation of Elasticsearch and Kibana.

Chapter 3, Modeling Your Data and Document Relations, focuses on modeling your data. To support text search, Elasticsearch preprocesses the data before indexing it. This chapter describes why preprocessing is necessary and the various analyzers Elasticsearch supports. In addition, we discuss how to handle relationships between different document types.
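To make the idea of preprocessing concrete, here is a deliberately simplified Python sketch. It is not Elasticsearch code: the helper names `analyze`, `build_inverted_index`, and `search` are hypothetical, and the toy analyzer only lowercases text and splits it on non-alphanumeric characters, which roughly approximates (but does not reproduce) Elasticsearch's standard analyzer. Because the query goes through the same analyzer as the documents, a search for `RUNNING shoes` still matches `Red running shoes`.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    """Toy analyzer: lowercase the text, then split it on runs of characters
    that are not ASCII letters or digits. Empty tokens are dropped."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each analyzed term to the set of IDs of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return IDs of documents that contain every query term (AND semantics).
    The query is analyzed with the same analyzer used at index time."""
    terms = analyze(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    # Hypothetical product titles for an e-commerce example.
    products = {1: "Red running shoes", 2: "Running socks, pack of 3", 3: "Blue Shoes"}
    idx = build_inverted_index(products)
    print(sorted(search(idx, "RUNNING shoes")))  # [1]
    print(sorted(search(idx, "shoes")))  # [1, 3]
```

With these sample titles, a query for `shoe` finds nothing, because only the exact term `shoes` was indexed; closing that gap is the job of stemming, which Chapter 1 introduces.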

Chapter 4, Indexing and Updating Your Data, covers how to index and update your data and what happens internally when you do. Data indexed in Elasticsearch becomes available for search only after a small delay; we discuss the reason for the delay and how to control it.

Chapter 5, Organizing Your Data and Bulk Data Ingestion, describes how to organize and manage indices in Elasticsearch using aliases, templates, and more. This chapter also covers the various bulk APIs Elasticsearch supports and how to rebuild your existing indices using the Reindex and Shrink APIs.
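For readers who have not seen the Bulk API before, its request body is newline-delimited JSON (NDJSON): each operation is an action/metadata line followed, for an `index` action, by the document source on the next line, and the body must end with a newline. The Python sketch below only builds such a body; it does not send it. The index name `products`, the type `product`, and the helper `build_bulk_body` are hypothetical. The `_type` field is included because Elasticsearch 5.x, the version used in this book, still uses mapping types.

```python
import json


def build_bulk_body(index: str, doc_type: str, docs: dict[str, dict]) -> str:
    """Build an NDJSON body for POST /_bulk that indexes every document in docs.

    Each document contributes two lines: the action/metadata line and the
    document source. The Bulk API requires a trailing newline.
    """
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        # json.dumps without indent keeps each object on a single line,
        # which NDJSON requires.
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body(
        "products",
        "product",
        {"1": {"name": "Red running shoes"}, "2": {"name": "Blue Shoes"}},
    )
    print(body, end="")
```

The resulting string can be sent with any HTTP client as the body of a `POST /_bulk` request.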

Chapter 6, All About Search, covers how to search, sort, and paginate your data. The concept of relevance is introduced, and we discuss how to tune the relevance score so that the most relevant search results appear at the top.

Chapter 7, More Than a Search Engine (Geofilters, Autocomplete, and More), covers how to filter based on geolocation, how to use Elasticsearch suggesters for autocomplete, how to correct user typos, and a lot more.

Chapter 8, How to Slice and Dice Your Data Using Aggregations, covers different kinds of aggregations Elasticsearch supports and how to visualize the data using Kibana.

Chapter 9, Production and Beyond, covers important settings to configure and monitor in production.

Chapter 10, Exploring Elastic Stack (Elastic Cloud, Security, Graph, and Alerting), covers Elastic Cloud, a managed cloud hosting service, and other products that are part of X-Pack.

What you need for this book

The book was written using Elasticsearch 5.1.2, and all the examples used in the book should work with it. The request format used in this book is based on the Kibana Console, and you’ll need the Kibana Console or the Sense Chrome plugin to execute the examples. Please refer to the Query format used in this book section of Chapter 2, Setting Up Elasticsearch and Kibana, for more details. If using Kibana or Sense is not an option, you can use other HTTP clients such as cURL or Postman. The request format is slightly different and is explained in the Using cURL or Postman section of Chapter 2, Setting Up Elasticsearch and Kibana.
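To illustrate how the Kibana Console format relates to a plain HTTP request, here is a small Python sketch that uses only the standard library. It assumes a Console request consists of an HTTP method and path on the first line, optionally followed by a JSON body, and that Elasticsearch is listening on its default local address, `http://localhost:9200`. The helper `console_to_request` is hypothetical and handles a single request only; Chapter 2 remains the reference for the exact format differences.

```python
import json
import urllib.request

# Assumption: a local node listening on the default HTTP port.
ES_URL = "http://localhost:9200"


def console_to_request(console_text: str, base_url: str = ES_URL) -> urllib.request.Request:
    """Turn one Kibana Console-style request into a urllib Request.

    The first line holds the HTTP method and path, e.g. "GET /_cluster/health";
    any remaining lines are treated as a JSON request body.
    """
    first_line, _, body_text = console_text.strip().partition("\n")
    method, path = first_line.split(maxsplit=1)
    path = path.strip()
    if not path.startswith("/"):
        path = "/" + path  # Console also accepts paths such as "GET _search"
    data = None
    headers = {}
    if body_text.strip():
        # Parsing and re-serializing validates the JSON and compacts it.
        data = json.dumps(json.loads(body_text)).encode("utf-8")
        # Harmless on 5.x and required by newer Elasticsearch versions.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url + path, data=data, headers=headers, method=method.upper()
    )


if __name__ == "__main__":
    req = console_to_request("GET /_cluster/health")
    # Actually sending the request requires a running node:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["status"])
    print(req.get_method(), req.full_url)
```

The sketch re-serializes the body instead of forwarding it verbatim so that malformed JSON fails early, before anything is sent to the cluster.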

Who this book is for

This book is for software developers who are planning to build a search and analytics engine or are trying to learn Elasticsearch.

Some familiarity with web technologies (JavaScript, REST, JSON) would be helpful.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, path names, dummy URLs, user input, and Twitter handles are shown as follows: We can include other contexts through the use of the include directive.

A block of code is set as follows:

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: Clicking the Next button moves you to the next screen.

Warnings or important notes appear in a box like this.

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail feedback@packtpub.com, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

Log in or register on our website using your e-mail address and password.

Hover the mouse pointer over the SUPPORT tab at the top.

Click on Code Downloads & Errata.

Enter the name of the book in the Search box.

Select the book for which you're looking to download the code files.

From the drop-down menu, choose where you purchased this book.

Click on Code Download.

You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learning-Elasticsearch. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at copyright@packtpub.com with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at questions@packtpub.com, and we will do our best to address the problem.

Introduction to Elasticsearch

In this chapter, we will focus on the basic concepts of Elasticsearch. We will start by explaining the building blocks and then discuss how to create, modify, and query data in Elasticsearch. Getting started with Elasticsearch is very easy; most operations come with default settings, which can be overridden when you need more advanced features.

I first started using Elasticsearch in 2012 as a backend search engine to power our analytics dashboards. It has been more than five years, and I have never looked for any other technology for our search needs. Elasticsearch is much more than just a search engine; it supports complex aggregations, geo filters, and the list goes on. Best of all, you can run all your queries at a speed you have never seen before. To understand how this magic happens, we will briefly discuss how Elasticsearch works internally and then discuss how to talk to Elasticsearch. Knowing how it works internally will help you understand its strengths and limitations. Elasticsearch, like many other open source technologies, is evolving rapidly, but the core fundamentals that power it don't change. By the end of this chapter, we will have covered the following:

Basic concepts of Elasticsearch
