How to install Elasticsearch 5.x and the ELK/Elastic Stack

This article is a step-by-step guide to using Elasticsearch in combination with the rest of the ELK stack (now called Elastic Stack) to ship, parse, store, and analyze logs. Throughout this guide, we're going to use:

Filebeat for collecting logs (note: the information presented here may also apply to metrics, SQL transactions, and other sources for which a Beat exists).

Logstash for enriching logs and metrics, and processing them into structured events that will be indexed in Elasticsearch.

Kibana for visualizing the structured data stored in Elasticsearch.

Scope

In this guide, we'll use Apache access logs as the example data source, but all of the knowledge you'll gain from this article is applicable to working with other types of logs as well. Our OS platform is Ubuntu, specifically Ubuntu 16.04 LTS. Finally, we'll look at the challenges of scaling out such a system, either on your own or by using a hosted ELK/Elastic Stack option. The infrastructure used in this environment is shown in figure 1 below.

Figure 1: Infrastructure used in this guide.

Installing and configuring Elasticsearch, Kibana and Filebeat

Elasticsearch

Elasticsearch is a Java application, so you’ll need to install a recent version of either Oracle’s JDK or OpenJDK. In Ubuntu 16.04, you’ll get OpenJDK 8 with:

sudo apt-get install openjdk-8-jdk

For Elasticsearch itself, I'd recommend the official packages. Here, we'll use the APT repository to get the 5.x version:

wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-5.x.list

sudo apt-get update && sudo apt-get install elasticsearch

Before starting Elasticsearch (sudo service elasticsearch start), you’ll need to decide how much memory you’re willing to give it. This is specified by changing -Xms and -Xmx in /etc/elasticsearch/jvm.options. The rule of thumb is to start with half of the total available memory of the server as the value for both options; this will leave the other half available to the operating system for caching. Later on, you can monitor Elasticsearch’s heap usage, as well as the IO throughput needs, and adjust the balance.
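For example, on a server with 4 GB of RAM you might start with a 2 GB heap (the numbers below are purely illustrative; adjust them to your hardware):

# in /etc/elasticsearch/jvm.options (illustrative values for a 4 GB server)
-Xms2g
-Xmx2g

Once the service is started, a quick request against the default HTTP port confirms that Elasticsearch is up:

curl localhost:9200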

Kibana

Kibana can be installed from the same repository you added for Elasticsearch:

sudo apt-get install kibana

sudo service kibana start
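The defaults in /etc/kibana/kibana.yml work out of the box when Elasticsearch runs on the same machine; if it lives elsewhere, you'd point Kibana at it. A minimal sketch (the URL below is an assumption for a single-host setup):

# in /etc/kibana/kibana.yml
server.port: 5601
elasticsearch.url: "http://localhost:9200"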

Filebeat

Once again, we’ll use the APT repository:

sudo apt-get install filebeat

Filebeat 5.x will, by default, push a template to Elasticsearch that will configure indices matching the filebeat* pattern in a way that works for most use-cases. For example, most string fields are indexed as keywords, which works well for analysis (Kibana’s visualizations). The message field is indexed as text, meaning it can be used for full-text search (Kibana’s Discover tab).

For tailing the Apache log file, you'll also need to point Filebeat to it via the paths variable under filebeat.prospectors in /etc/filebeat/filebeat.yml (the first prospector is defined there with input_type: log). You'll define the path as follows:

paths:
  - /var/log/apache.log
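For context, the relevant part of /etc/filebeat/filebeat.yml would then look roughly like this (the path is the one used in this guide; adjust it to wherever your Apache log actually lives):

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/apache.log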

Then start Filebeat and your logs should start pouring in:

sudo service filebeat start
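A quick way to confirm events are arriving is to list the Filebeat indices via the _cat API (assuming Elasticsearch is listening on localhost):

curl 'localhost:9200/_cat/indices/filebeat*?v'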

Visualizing logs in Kibana

At this point, you can open Kibana in your browser (on port 5601 by default) to explore the logs from Elasticsearch. In our environment we don't need to specify the port, because we run Kibana behind an Nginx proxy: Nginx is configured to forward traffic on to Kibana, and we have set up the DNS name kibana.example.com as the URL.
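For reference, a minimal sketch of such an Nginx proxy is shown below; the server name matches the DNS entry mentioned above, and anything like TLS or authentication is left out:

server {
    listen 80;
    server_name kibana.example.com;

    location / {
        # forward all traffic to the local Kibana instance
        proxy_pass http://localhost:5601;
        proxy_set_header Host $host;
    }
}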

Moving on, to use the data Filebeat pushed directly, you first need to tell Kibana which indices to search by defining an index pattern. Set it to filebeat-*:


Then, you can go to the Discover pane and search your logs. You can search the contents of logs much like you would with grep, but this comes with the same disadvantages: if you search for 200, for instance, you can't tell whether it matched a 200 OK response or a transfer of 200 bytes.

When it comes to visualizing, you can go to the Visualize tab and do a hostname breakdown, but again you can’t use information from the message field because different parts of it, such as the response code, aren’t yet parsed into their own fields:

Parsing data using Logstash

To fully exploit data from your logs, you need a tool that can parse them, and Logstash is such a tool. As of 5.0, you can use an Elasticsearch “ingest node” to do some of this processing, which can give you a significant performance boost. However, ingest pipelines don’t offer Logstash’s flexibility yet (for example, conditionals), so we’ll concentrate on Logstash in this post.
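For completeness, a minimal ingest pipeline doing the same kind of Apache parsing could be created as shown below (the pipeline name is just an example, and we won't use it further in this guide):

curl -XPUT 'localhost:9200/_ingest/pipeline/apache-access' -H 'Content-Type: application/json' -d '
{
  "description": "parse Apache access logs with grok",
  "processors": [
    { "grok": { "field": "message", "patterns": ["%{COMMONAPACHELOG}"] } }
  ]
}'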

To build your Filebeat -> Logstash -> Elasticsearch -> Kibana pipeline, you need to install and configure Logstash, and then update the Filebeat configuration to point to Logstash, instead of Elasticsearch. Let’s take it step by step:

First, install Logstash from the same repository:

sudo apt-get install logstash

Then, create a file in /etc/logstash/conf.d/ (say, filebeat.conf), where you’d specify a beats input for Filebeat to connect to. Next, you can parse your data with the grok filter before pushing it to Elasticsearch:

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => [ "message", "%{COMMONAPACHELOG}" ]
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
  }
}
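Before starting Logstash, you can ask it to validate this configuration; a quick sanity check, assuming the default paths of the deb package:

sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash --config.test_and_exit -f /etc/logstash/conf.d/filebeat.conf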

Moving on to Filebeat, you'd comment out the whole output.elasticsearch section and configure hosts under output.logstash instead:

### Logstash as output
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]

To trigger Filebeat to re-send your logs via Logstash, you’ll need to do a few things:

Stop Filebeat: sudo service filebeat stop

Remove the registry file, where Filebeat remembers where it left off tailing the Apache log. This will make Filebeat start over again: sudo rm /var/lib/filebeat/registry

Delete the data already indexed in Elasticsearch: curl -XDELETE 'localhost:9200/filebeat*'

Start Logstash (on the server): sudo service logstash start

Start Filebeat (on the client): sudo service filebeat start

Now your logs should be indexed again in Elasticsearch, only this time they're structured and, by default, go to the logstash-* indices. Logstash pushes a template for its own indices similar to Filebeat's, so you don't have to worry about settings for now.
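As before, you can confirm the new indices via the _cat API (assuming Elasticsearch is listening on localhost):

curl 'localhost:9200/_cat/indices/logstash-*?v'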

Going back to Kibana, you'd need to point it to the logstash-* indices (instead of filebeat-*) by adding a new index pattern under the Settings tab. You'll probably also want to set it as the default:

Because the Apache logs are now structured, you can search in specific fields, like the response field:
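For example, with the fields produced by the %{COMMONAPACHELOG} pattern, searches in the Discover bar could look like these (Lucene query syntax; the field names come from the grok pattern):

response:404
verb:GET AND response:200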

You can also build visualizations that allow you to see the breakdown of responses over time:

As a final note, I would recommend monitoring the load of the Elastic Stack environment, as growing indices require more resources. When they do, you can add more servers to the cluster, which also improves fault tolerance. Adding new servers requires some minor changes to the elasticsearch.yml file so that the nodes know about each other and can form a cluster.

You'll need at least three nodes in production, so that if one is unreachable, the others can still form a quorum. To define what the quorum size should be, set discovery.zen.minimum_master_nodes to NUMBER_OF_NODES/2+1. Finally, you can make Elasticsearch listen on an outside interface (instead of the default localhost) via network.host, allowing you to run Filebeat, Logstash, Elasticsearch, Kibana, and your browser on separate machines.
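Putting these settings together, a minimal sketch of /etc/elasticsearch/elasticsearch.yml for a three-node cluster might look like this (cluster name, node names, and host names are placeholders):

cluster.name: logging-cluster
node.name: es-node-1
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["es-node-1", "es-node-2", "es-node-3"]
# quorum for 3 nodes: 3/2 + 1 = 2
discovery.zen.minimum_master_nodes: 2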
