Learning Elasticsearch
About this ebook
- Get to grips with the basics of Elasticsearch concepts and its APIs, and use them to create efficient applications
- Create large-scale Elasticsearch clusters and perform analytics using aggregation
- This comprehensive guide will get you up and running with Elasticsearch 5.x in no time
If you want to build efficient search and analytics applications using Elasticsearch, this book is for you. It will also benefit developers who have worked with Lucene or Solr before and now want to work with Elasticsearch. No previous knowledge of Elasticsearch is expected.
Reviews for Learning Elasticsearch
1 rating, 1 review
- Rating: 4 out of 5 stars. I recommend this book. As a beginner, I had no trouble understanding it, especially since every concept is explained with examples.
Book preview
Learning Elasticsearch - Abhishek Andhavarapu
Learning Elasticsearch
Distributed real-time search and analytics with Elasticsearch 5.x
Abhishek Andhavarapu
BIRMINGHAM - MUMBAI
Learning Elasticsearch
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: June 2017
Production reference: 1290617
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78712-845-3
www.packtpub.com
Credits
About the Author
Abhishek Andhavarapu is a software engineer at eBay who enjoys working on highly scalable distributed systems. He has a master's degree in Distributed Computing and has worked on multiple enterprise Elasticsearch applications, which are currently serving hundreds of millions of requests per day.
He began his journey with Elasticsearch in 2012 to build an analytics engine to power dashboards and quickly realized that Elasticsearch is like nothing out there for search and analytics. He has been a strong advocate since then and wrote this book to share the practical knowledge he gained along the way.
Writing a book is a humongous task, and I want to thank my wife Ashwini for her patience and support during the nights and weekends I spent writing this book. I am thankful to my parents Govinda Rajulu and Jaya Lakshmi, my brother Sarat Kiran, and my in-laws Satya Rao and Suguna for their constant motivation and encouragement throughout the writing of this book. I'm grateful to all my friends and colleagues, whom I couldn't mention by name, for their valuable feedback and input.
I also would like to thank my publisher and editors at Packt for the continuous support.
About the Reviewers
Dan Noble is a software engineer with a passion for writing secure, clean, and articulate code. He enjoys working with a variety of programming languages and software frameworks, particularly Python, Elasticsearch, and various JavaScript frontend technologies. Dan currently works on geospatial web applications and data processing systems.
Dan has been a user and advocate of Elasticsearch since 2011. He has given several talks about Elasticsearch, is the author of the book Monitoring Elasticsearch, and was a technical reviewer for the book The Elasticsearch Cookbook, Second Edition, by Alberto Paro. Dan is also the author of the Python Elasticsearch client rawes.
Marcelo Ochoa works at the system laboratory of Facultad de Ciencias Exactas of the Universidad Nacional del Centro de la Provincia de Buenos Aires and is the CTO at Scotas.com, a company that specializes in near real-time search solutions using Apache Solr and Oracle. He divides his time between university jobs and external projects related to Oracle and big data technologies. He has worked on several Oracle-related projects, such as the translation of Oracle manuals and multimedia CBTs. His background is in database, network, web, and Java technologies. In the XML world, he is known as the developer of the DB Generator for the Apache Cocoon project. He has worked on the open source projects DBPrism and DBPrism CMS, the Lucene-Oracle integration using the Oracle JVM Directory implementation, and the Restlet.org project, where he worked on the Oracle XDB Restlet Adapter, which is an alternative to writing native REST web services inside a database resident JVM.
Since 2006, he has been part of the Oracle ACE program, and he was recently incorporated into the Docker Mentor program.
He has coauthored Oracle Database Programming using Java and Web Services by Digital Press and Professional XML Databases by Wrox Press and has been a technical reviewer for several PacktPub books, such as Mastering Elastic Stack, Mastering Elasticsearch 5.x - Third Edition, Elasticsearch 5.x Cookbook - Third Edition, and others.
www.PacktPub.com
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Customer Feedback
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1787128458.
If you'd like to join our team of regular reviewers, you can e-mail us at customerreviews@packtpub.com. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Table of Contents
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
Introduction to Elasticsearch
Basic concepts of Elasticsearch
Document
Index
Type
Cluster and node
Shard
Interacting with Elasticsearch
Creating a document
Retrieving an existing document
Updating an existing document
Updating a partial document
Deleting an existing document
How does search work?
Importance of information retrieval
Simple search query
Inverted index
Stemming
Synonyms
Phrase search
Apache Lucene
Scalability and availability
Relation between node, index, and shard
Three shards with zero replicas
Six shards with zero replicas
Six shards with one replica
Distributed search
Failure handling
Strengths and limitations of Elasticsearch
Summary
Setting Up Elasticsearch and Kibana
Installing Elasticsearch
Installing Java
Windows
Starting and stopping Elasticsearch
Mac OS X
Starting and stopping Elasticsearch
DEB and RPM packages
Debian package
RPM package
Starting and stopping Elasticsearch
Sample configuration files
Verifying Elasticsearch is running
Installing Kibana
Mac OS X
Starting and stopping Kibana
Windows
Starting and stopping Kibana
Query format used in this book (Kibana Console)
Using cURL or Postman
Health of the cluster
Summary
Modeling Your Data and Document Relations
Mapping
Dynamic mapping
Create index with mapping
Adding a new type/field
Getting the existing mapping
Mapping conflicts
Data type
Metafields
How to handle null values
Storing the original document
Searching all the fields in the document
Difference between full-text search and exact match
Core data types
Text
Keyword
Date
Numeric
Boolean
Binary
Complex data types
Array
Object
Nested
Geo data type
Geo-point data type
Specialized data type
IP
Mapping the same field with different mappings
Handling relations between different document types
Parent-child document relation
How are parent-child documents stored internally?
Nested
Routing
Summary
Indexing and Updating Your Data
Indexing your data
Indexing errors
Node/shards errors
Serialization/mapping errors
Thread pool rejection error
Managing an index
What happens when you index a document?
Updating your data
Update using an entire document
Partial updates
Scripted updates
Upsert
NOOP
What happens when you update a document?
Merging segments
Using Kibana to discover
Using Elasticsearch in your application
Java
Transport client
Dependencies
Initializing the client
Sniffing
Node client
REST client
Third party clients
Indexing using Java client
Concurrency
Translog
Async versus sync
CRUD from translog
Primary and Replica shards
Primary preference
More replicas for query throughput
Increasing/decreasing the number of replicas
Summary
Organizing Your Data and Bulk Data Ingestion
Bulk operations
Bulk API
Multi Get API
Update by query
Delete by query
Reindex API
Change mappings/settings
Combining documents from one or more indices
Copying only missing documents
Copying a subset of documents into a new index
Copying top N documents
Copying the subset of fields into new index
Ingest Node
Organizing your data
Index alias
Index templates
Managing time-based indices
Shrink API
Summary
All About Search
Different types of queries
Sample data
Querying Elasticsearch
Basic query (finding the exact value)
Pagination
Sorting based on existing fields
Selecting the fields in the response
Querying based on range
Handling dates
Analyzed versus non-analyzed fields
Term versus Match query
Match phrase query
Prefix and match phrase prefix query
Wildcard and Regular expression query
Exists and missing queries
Using more than one query
Routing
Debugging search query
Relevance
Queries versus Filters
How to boost relevance based on a single field
How to boost score based on queries
How to boost relevance using decay functions
Rescoring
Debugging relevance score
Searching for same value across multiple fields
Best matching fields
Most matching fields
Cross-matching fields
Caching
Node Query cache
Shard request cache
Summary
More Than a Search Engine (Geofilters, Autocomplete, and More)
Sample data
Correcting typos and spelling mistakes
Fuzzy query
Making suggestions based on the user input
Implementing the "did you mean" feature
Term suggester
Phrase suggester
Implementing the autocomplete feature
Highlighting
Handling document relations using parent-child
The has_parent query
The has_child query
Inner hits for parent-child
How parent-child works internally
Handling document relations using nested
Inner hits for nested documents
Scripting
Script Query
Post Filter
Reverse search using the percolate query
Geo and Spatial Filtering
Geo Distance
Using Geolocation to rank the search results
Geo Bounding Box
Sorting
Multi search
Search templates
Querying Elasticsearch from Java application
Summary
How to Slice and Dice Your Data Using Aggregations
Aggregation basics
Sample data
Query structure
Multilevel aggregations
Types of aggregations
Terms aggregations (group by)
Size and error
Order
Minimum document count
Missing values
Aggregations based on filters
Aggregations on dates (range, histogram)
Aggregations on numeric values (range, histogram)
Aggregations on geolocation (distance, bounds)
Geo distance
Geo bounds
Aggregations on child documents
Aggregations on nested documents
Reverse nested aggregation
Post filter
Using Kibana to visualize aggregations
Caching
Doc values
Field data
Summary
Production and Beyond
Configuring Elasticsearch
The directory structure
zip/tar.gz
DEB/RPM
Configuration file
Cluster and node name
Network configuration
Memory configuration
Configuring file descriptors
Types of nodes
Multinode cluster
Inspecting the logs
How nodes discover each other
Node failures
X-Pack
Windows
Mac OS X
Debian/RPM
Authentication
X-Pack basic license
Monitoring
Monitoring Elasticsearch clusters
Monitoring indices
Monitoring nodes
Thread pools
Elasticsearch server logs
Slow logs
Summary
Exploring Elastic Stack (Elastic Cloud, Security, Graph, and Alerting)
Elastic Cloud
High availability
Data reliability
Security
Authentication and roles
Securing communications using SSL
Graph
Graph UI
Alerting
Summary
Preface
Welcome to Learning Elasticsearch. We will start by describing the basic concepts of Elasticsearch. You will see how to install Elasticsearch and Kibana and learn how to index and update your data. We will use an e-commerce site as an example to explain how a search engine works and how to query your data. The real power of Elasticsearch is aggregations. You will see how to perform aggregation-based analytics with ease. You will also see how to use Kibana to explore and visualize your data. Finally, we will discuss how to use Graph to discover relations in your data and how to use alerting to set up alerts and notifications on different trends in your data.
To better explain the various concepts, many examples are used throughout the book. Detailed instructions on how to install Elasticsearch and Kibana and how to execute the examples are included in Chapter 2, Setting Up Elasticsearch and Kibana.
What this book covers
Chapter 1, Introduction to Elasticsearch, describes the building blocks of Elasticsearch and what makes Elasticsearch scalable and distributed. In this chapter, we also discuss the strengths and limitations of Elasticsearch.
Chapter 2, Setting Up Elasticsearch and Kibana, covers the installation of Elasticsearch and Kibana.
Chapter 3, Modeling Your Data and Document Relations, focuses on modeling your data. To support text search, Elasticsearch preprocesses the data before indexing. This chapter describes why preprocessing is necessary and the various analyzers Elasticsearch supports. In addition, we discuss how to handle relationships between different document types.
Chapter 4, Indexing and Updating Your Data, covers how to index and update your data and what happens internally when you index and update. The data indexed in Elasticsearch is only available after a small delay; we discuss the reason for the delay and how to control it.
Chapter 5, Organizing Your Data and Bulk Data Ingestion, describes how to organize and manage indices in Elasticsearch using aliases, templates, and more. This chapter also covers the various bulk APIs Elasticsearch supports and how to rebuild your existing indices using the Reindex and Shrink APIs.
Chapter 6, All About Search, covers how to search, sort, and paginate your data. The concept of relevance is introduced, and we discuss how to tune the relevance score to get the most relevant search results at the top.
Chapter 7, More Than a Search Engine (Geofilters, Autocomplete, and More), covers how to filter based on geolocation, how to use Elasticsearch suggesters for autocomplete, how to correct user typos, and a lot more.
Chapter 8, How to Slice and Dice Your Data Using Aggregations, covers different kinds of aggregations Elasticsearch supports and how to visualize the data using Kibana.
Chapter 9, Production and Beyond, covers important settings to configure and monitor in production.
Chapter 10, Exploring Elastic Stack (Elastic Cloud, Security, Graph, and Alerting), covers Elastic Cloud, which is managed cloud hosting, and other products that are part of X-Pack.
What you need for this book
The book was written using Elasticsearch 5.1.2, and all the examples used in the book should work with it. The request format used in this book is based on the Kibana Console, and you'll need the Kibana Console or the Sense Chrome plugin to execute the examples. Please refer to the Query format used in this book section of Chapter 2, Setting Up Elasticsearch and Kibana, for more details. If using Kibana or Sense is not an option, you can use other HTTP clients such as cURL or Postman. The request format is slightly different and is explained in the Using cURL or Postman section of Chapter 2, Setting Up Elasticsearch and Kibana.
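To give a rough idea of how the two request formats relate, here is a minimal Python sketch that turns a Console-style request (a method and path on the first line, an optional JSON body below it) into a cURL command. It is an illustration only, not the format defined in Chapter 2: the helper name console_to_curl and the index name example_index are hypothetical, the host assumes a local node on the default HTTP port 9200, and the Content-Type header is an assumption that newer Elasticsearch versions require but that may not be needed for the version used in this book.

```python
import json
import shlex


def console_to_curl(request, host="http://localhost:9200"):
    """Translate a Console-style request into an equivalent curl command.

    Hypothetical helper for illustration; the host default assumes a
    local node listening on the default HTTP port.
    """
    # The first line is "<METHOD> <path>"; any remaining lines are the JSON body.
    first_line, _, body = request.strip().partition("\n")
    method, path = first_line.split(None, 1)
    url = host.rstrip("/") + "/" + path.strip().lstrip("/")
    parts = ["curl", "-X", method.upper(), shlex.quote(url)]

    body = body.strip()
    if body:
        # Parsing and re-serializing validates the JSON and puts it on one line.
        compact = json.dumps(json.loads(body))
        # Assumption: the header is sent explicitly; it may be optional in 5.x.
        parts += ["-H", shlex.quote("Content-Type: application/json")]
        parts += ["-d", shlex.quote(compact)]
    return " ".join(parts)


if __name__ == "__main__":
    # A request without a body.
    print(console_to_curl("GET /_cluster/health"))
    # A request with a JSON body against a hypothetical index.
    print(console_to_curl('POST /example_index/_search\n{\n  "query": {"match_all": {}}\n}'))
```

The only differences the sketch handles are the ones the paragraph above alludes to: cURL needs the full URL including host and port, and the JSON body is passed with the -d flag rather than written on the lines below the request.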
Who this book is for
This book is for software developers who are planning to build a search and analytics engine or are trying to learn Elasticsearch.
Some familiarity with web technologies (JavaScript, REST, JSON) would be helpful.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, path names, dummy URLs, user input, and Twitter handles are shown as follows: We can include other contexts through the use of the include directive.
A block of code is set as follows:
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: Clicking the Next button moves you to the next screen.
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail feedback@packtpub.com, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
Log in or register to our website using your e-mail address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book.
Click on Code Download.
You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learning-Elasticsearch. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at copyright@packtpub.com with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
Questions
If you have a problem with any aspect of this book, you can contact us at questions@packtpub.com, and we will do our best to address the problem.
Introduction to Elasticsearch
In this chapter, we will focus on the basic concepts of Elasticsearch. We will start by explaining the building blocks and then discuss how to create, modify, and query documents in Elasticsearch. Getting started with Elasticsearch is very easy; most operations come with default settings. The default settings can be overridden when you need more advanced features.
I first started using Elasticsearch in 2012 as a backend search engine to power our analytics dashboards. It has been more than five years, and I have never looked for any other technology for our search needs. Elasticsearch is much more than just a search engine; it supports complex aggregations, geo filters, and the list goes on. Best of all, you can run all your queries at a speed you have never seen before. To understand how this magic happens, we will briefly discuss how Elasticsearch works internally and then discuss how to talk to Elasticsearch. Knowing how it works internally will help you understand its strengths and limitations. Elasticsearch, like any other open source technology, is evolving very rapidly, but the core fundamentals that power Elasticsearch don't change. By the end of this chapter, we will have covered the following:
Basic concepts of