Elasticsearch has become a favorite technology among administrators, engineers, and developers alike. Whether you are using it with the rest of the Elastic Stack or on its own, Elasticsearch has proven itself to be a powerful and easy-to-use search and analytics engine. Log aggregation, operational analytics, application performance monitoring, NoSQL databases, site search, and ad-hoc data analysis are just a few of the many use cases this product has become synonymous with. Now that Elasticsearch has become a mainstream technology, it is helpful to know a little something about it.
Who is this Deep Dive Course For?
The hands-on Elasticsearch Deep Dive course is great for those new to Elasticsearch or those who want to expand what they already know. However, you should have some prerequisite knowledge before considering this course, as we will be getting our hands dirty with the Linux command line, Java Virtual Machines (JVMs), Public Key Infrastructure (PKI), using REST APIs with JSON, modifying configurations with YAML, and configuring network interfaces to form distributed clusters of Elasticsearch nodes. If you're a bit rusty or inexperienced in any of these areas, then it may behoove you to freshen up a bit before jumping into this course.
Elasticsearch Fundamentals and Production
The Elasticsearch Deep Dive course has two main parts: Elasticsearch Fundamentals and Elasticsearch in Production. In the fundamentals section, we will first demonstrate how Elasticsearch works by showcasing the various clustering concepts like data storage, replication, recovery, rebalancing, and defining the various node types with their respective roles. Once we have a good understanding of how Elasticsearch functions under the hood, we then showcase its fundamental concepts by creating indices, ingesting data into documents, and then searching, filtering, and aggregating over said data.
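To give a flavor of what searching, filtering, and aggregating look like in practice, here is a minimal sketch of a search request body built in Python. It only constructs the JSON and does not contact a cluster. The field names (`message`, `status`, `host.keyword`) and the helper `build_search_body` are hypothetical and chosen purely for illustration; they are not taken from the course material.

```python
import json


def build_search_body(text, status_code, bucket_field):
    """Build a search body: full-text match, exact-value filter, terms aggregation.

    All field names used here are illustrative assumptions.
    """
    return {
        "query": {
            "bool": {
                # Scored full-text search on a hypothetical `message` field.
                "must": [{"match": {"message": text}}],
                # Unscored exact-value filter on a hypothetical `status` field.
                "filter": [{"term": {"status": status_code}}],
            }
        },
        "aggs": {
            # Count matching documents per distinct value of `bucket_field`.
            "per_bucket": {"terms": {"field": bucket_field}}
        },
    }


body = build_search_body("timeout", 500, "host.keyword")
print(json.dumps(body, indent=2))
```

A body like this would typically be sent as JSON to an index's `_search` endpoint over the REST API; the index it runs against is left out here because it depends entirely on your own data.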
You may have seen news stories about massive data breaches suffered by companies that lack sufficient security for their back-end datastores like Elasticsearch. By default, Elasticsearch is a wide open door. There is no network-level encryption or user-access control pre-enabled with new installations of Elasticsearch. This is why, in the Elasticsearch in Production section of this course, we demonstrate how to secure Elasticsearch using X-Pack. By using the Security plugin provided by X-Pack, we will encrypt Elasticsearch’s cluster network communications, enable user authentication, and configure roles to define granular access control permissions for specific users and data.
Lastly, we will enable audit logging in order to keep a record of every action performed in an Elasticsearch cluster to facilitate low-level visibility into Elasticsearch usage and also to help ensure companies meet their regulatory requirements for any sensitive data they may be storing.
In the Elasticsearch in Production portion of this course, we also demonstrate how to perform various tasks that you will encounter when administering an Elasticsearch cluster in production, such as rolling restarts, automated data retention using Curator, and upgrading Elasticsearch. While Elasticsearch is often used as an Application Performance Monitoring (APM) platform, it also provides an APM of sorts to monitor itself using another X-Pack plugin: Monitoring.
We will demonstrate how to enable X-Pack Monitoring so that Elasticsearch can monitor itself or be monitored remotely from a dedicated monitoring cluster. Using Kibana, we will explore the Monitoring UI to showcase the granular performance and usage statistics the Monitoring plugin provides.
Last but not least, we will address many of the common questions people encounter when designing an Elasticsearch deployment:
- How much heap do I need in my cluster?
- Is there a limit to how big I can make the heap on a single node?
- How do you avoid swapping Elasticsearch heap to disk?
- How many data nodes do I need?
- How many shards should I have?
- Should I spec my hardware for more CPU cores or faster clock speeds?
- What RAID configuration should I use?
- Is NAS storage OK?
- Can I span datacenters without performance degradation?
- What heap-to-storage ratio should I follow?
Due to the vastness of potential use cases for Elasticsearch, there are no concrete answers to any of these questions. But there are some general guidelines and best practices you can follow to address each of these questions in order to deploy the optimum Elasticsearch cluster for your specific use case.
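As one example of such a guideline, a commonly cited rule of thumb for heap sizing is to give Elasticsearch no more than about half of a node's RAM, while staying below the point (a little under 32 GB) where the JVM stops using compressed object pointers. The sketch below encodes that rule. The function name and the 31 GB ceiling are assumptions for illustration; the exact threshold varies by JVM and platform, so treat the result as a starting point rather than an answer.

```python
def suggested_heap_gb(node_ram_gb, compressed_oops_ceiling_gb=31.0):
    """Return a starting-point heap size in GB for a single node.

    Rule of thumb only: half of RAM, capped at an assumed ceiling that
    keeps the JVM under the compressed object pointer threshold.
    """
    if node_ram_gb <= 0:
        raise ValueError("node_ram_gb must be positive")
    # Leave the other half of RAM for the operating system's filesystem cache.
    half_of_ram = node_ram_gb / 2
    return min(half_of_ram, compressed_oops_ceiling_gb)


for ram in (16, 64, 128):
    print(f"{ram} GB RAM -> {suggested_heap_gb(ram)} GB heap")
```

On a 16 GB node this suggests an 8 GB heap, while a 128 GB node is capped at the assumed 31 GB ceiling rather than being given 64 GB.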
So what are you waiting for? Go take the course already! As always, feel free to follow me on Twitter or LinkedIn and don’t hesitate to ask me questions about the course or Elastic Stack products in general. You can also join the Linux Academy Slack to interface with Linux Academy Training Architects and your fellow community members. And of course, you can always use the Linux Academy community page to showcase your accomplishments, provide feedback, or ask questions.