How to Implement the ELK Stack for Logging in Microservices?
In a microservices architecture, logging is crucial for maintaining observability and ensuring that systems operate smoothly. As the number of services grows, the volume of log data can become overwhelming. Centralized logging solutions help aggregate, store, and analyze logs from multiple services, providing a unified view of system behavior. In this tutorial, we'll focus on the ELK Stack and walk through setting up a centralized logging solution; alternatives such as Fluentd are mentioned briefly in the FAQs.
Why Centralized Logging is Important
Centralized logging addresses several challenges associated with microservices:
- Simplifies Troubleshooting: By collecting logs from various services in one location, it becomes easier to troubleshoot issues and trace requests across services.
- Improves Monitoring: Centralized logging enables real-time monitoring of service performance, making it easier to identify bottlenecks and failures.
- Enhances Security and Compliance: Keeping a comprehensive log of events helps in auditing and ensuring compliance with regulatory requirements.
- Enables Data Analysis: With aggregated logs, you can perform analytics to gain insights into application behavior, usage patterns, and potential vulnerabilities.
The ELK Stack
The ELK Stack is a powerful suite of open-source tools used for centralized logging and data analysis. Comprising Elasticsearch, Logstash, and Kibana, the ELK Stack allows you to aggregate logs from multiple sources, analyze them in real time, and visualize the results through user-friendly dashboards. In the sections below, we will look at each component, see how they work together, and set up a centralized logging solution step by step.
1. Elasticsearch
Elasticsearch is a distributed search and analytics engine that stores, indexes, and retrieves log data. It is designed for scalability and speed, making it ideal for searching large volumes of data quickly.
Key Features:
- Full-text Search: Supports powerful search capabilities on large datasets.
- RESTful API: Allows easy interaction with data via HTTP requests.
- Distributed Nature: Can handle large datasets by distributing data across multiple nodes.
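Because Elasticsearch exposes everything over HTTP, you can interact with it directly from the command line. The snippet below is a minimal sketch, assuming Elasticsearch is reachable on localhost:9200 and using a hypothetical index named logs-demo:

```bash
# Index a sample log document (logs-demo is a hypothetical index name for illustration)
curl -X POST "http://localhost:9200/logs-demo/_doc" \
  -H "Content-Type: application/json" \
  -d '{"service": "order-service", "level": "error", "message": "Payment gateway timeout"}'

# Run a full-text search for documents whose message mentions "timeout"
curl -X GET "http://localhost:9200/logs-demo/_search" \
  -H "Content-Type: application/json" \
  -d '{"query": {"match": {"message": "timeout"}}}'
```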
2. Logstash
Logstash is a powerful data processing pipeline designed to ingest logs from various sources, process them, and then send them to a destination, such as Elasticsearch. It plays a crucial role in the ELK stack by ensuring that logs are not only collected but also transformed into a structured format suitable for analysis.
Key Features:
- Data Ingestion: Collects logs from diverse sources, including files, databases, and message queues.
- Data Processing: Allows transformations and filtering to structure logs for better readability and analysis.
- Output Plugins: Sends processed logs to various destinations, primarily Elasticsearch, but can also send to other systems.
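To give a feel for how ingestion, processing, and output fit together, here is a minimal pipeline sketch. It assumes plain-text log lines shaped like "2024-01-01T12:00:00Z INFO Order created"; the Elasticsearch host and index name are placeholders:

```conf
input {
  # Receive events from Beats shippers such as Filebeat
  beats {
    port => 5044
  }
}

filter {
  # Parse an assumed "<timestamp> <level> <message>" line into structured fields
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:log_message}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]   # placeholder Elasticsearch address
    index => "logs-%{+YYYY.MM.dd}"           # one index per day
  }
}
```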
Beats: Lightweight Data Shippers
Beats are lightweight data shippers that are installed on your servers to send different types of operational data to Logstash or Elasticsearch. Each Beat is designed for a specific type of data, allowing you to collect logs, metrics, and network data efficiently.
Types of Beats:
- Filebeat:
- Purpose: Collects and ships log files from various sources.
- Use Case: Ideal for gathering logs from applications, servers, and services. It monitors the specified files and sends updates to Logstash or Elasticsearch.
- Example: In an e-commerce application, Filebeat can be used to collect logs from web servers, providing insights into user activities, error rates, and other critical metrics.
- Metricbeat:
- Purpose: Collects metrics from your systems and services.
- Use Case: Gathers metrics from the operating system (CPU usage, memory, etc.) and from applications (HTTP request stats, MySQL stats, etc.).
- Example: Metricbeat can monitor the performance of microservices within the e-commerce application, helping identify resource bottlenecks or performance degradation (a minimal configuration sketch follows this list).
- Packetbeat:
- Purpose: Monitors network traffic and analyzes protocols.
- Use Case: Provides real-time insights into the network activity and performance.
- Example: In an e-commerce context, Packetbeat can analyze network requests between services, helping detect latency issues or failed requests.
- Winlogbeat:
- Purpose: Collects Windows Event logs.
- Use Case: Ideal for gathering logs from Windows servers, including security logs, system logs, and application logs.
- Example: Useful in an e-commerce environment for monitoring Windows-based applications and servers.
- Auditbeat:
- Purpose: Monitors the integrity of files and processes.
- Use Case: Tracks changes in file systems and process executions, which can be essential for security and compliance.
- Example: In an e-commerce system, Auditbeat can help monitor changes to sensitive files, such as configurations and transaction logs.
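To show what configuring a Beat typically looks like, here is a minimal Metricbeat sketch. It enables the system module and ships metrics to Logstash; the metricsets, collection period, and Logstash address are assumptions for illustration:

```yaml
# metricbeat.yml (minimal sketch; values are illustrative)
metricbeat.modules:
  - module: system
    metricsets: ["cpu", "memory", "network"]
    period: 10s

output.logstash:
  hosts: ["localhost:5044"]
```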
Integration with Logstash
When Beats are used with Logstash, they provide an efficient mechanism for log collection. Here's how Beats work with Logstash in an e-commerce application:
- Data Collection: Beats agents (like Filebeat, Metricbeat, etc.) collect relevant data from various sources.
- Data Shipping: The collected data is then shipped to Logstash for processing.
- Data Processing: Logstash receives the data, applies filters and transformations as needed, and then routes it to Elasticsearch.
- Data Analysis: Finally, the processed data can be visualized in Kibana, allowing for actionable insights and analytics.
Example Flow
In an e-commerce application, the flow might look like this:
- Filebeat collects web server logs.
- Metricbeat gathers system metrics and application performance data.
- Packetbeat analyzes the network traffic between services.
- All Beats send their data to Logstash for processing.
- Logstash filters, structures, and sends the processed logs to Elasticsearch.
- Kibana visualizes the data for monitoring user activities, system performance, and security incidents.
This integration of Beats and Logstash within the ELK stack enables robust logging and monitoring capabilities in a microservices architecture, facilitating better operational visibility and troubleshooting.
3. Kibana
Kibana is a visualization tool that provides a web interface for querying and visualizing data stored in Elasticsearch. It enables users to create dashboards, generate reports, and gain insights from log data.
Key Features:
- Interactive Dashboards: Create visualizations to analyze log data.
- Search Interface: Easily query and explore your data in Elasticsearch.
- Visualization Options: Offers various visualization types, including charts, tables, and maps.
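In the Discover view, Kibana lets you filter logs with the Kibana Query Language (KQL). For example, assuming your services emit structured fields such as service and level (as in the logging examples later in this tutorial), a query for error logs from a single service might look like this:

```
service : "order-service" and level : "error"
```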
Architecture
- Clients: Represents various clients interacting with the e-commerce microservices. This includes a web client, a mobile app, and an admin panel.
- Services: The core microservices of the e-commerce application, including:
- Order Service: Handles orders placed by clients.
- Payment Service: Manages payment transactions.
- Inventory Service: Keeps track of product inventory.
- User Service: Manages user profiles and authentication.
- Log Collection:
- Logstash: The data processing pipeline that ingests logs from the services and processes them.
- Beats: Lightweight data shippers installed on the service instances to send logs and metrics to Logstash.
- Storage:
- Elasticsearch: Stores the structured log data and enables search capabilities.
- Event Store: Captures and stores events for further analysis.
- Visualization:
- Kibana: A visualization tool that provides a web interface for querying and visualizing the data stored in Elasticsearch.
- Flow of Data:
- Clients interact with microservices and generate logs.
- Each microservice sends logs to Logstash.
- Beats collect logs and metrics from the services and forward them to Logstash.
- Logstash processes the logs and sends them to Elasticsearch and Event Store.
- Kibana allows users to visualize and analyze logs and metrics, generating reports and alerts as necessary.
This flow shows how logging is managed in an e-commerce microservices architecture using the ELK stack and Beats for data collection.
Why Use the ELK Stack?
- Centralized Logging: Aggregates logs from multiple sources, making it easier to manage and analyze them.
- Real-Time Analysis: Provides real-time insights into application performance and issues.
- Scalability: Can handle large volumes of data and scale horizontally as needed.
- Enhanced Troubleshooting: Simplifies troubleshooting by allowing you to correlate logs across services.
Setting Up the ELK Stack
Prerequisites
Before you begin, ensure that you have:
- A working Kubernetes cluster or Docker installed on your machine.
- Basic knowledge of Kubernetes and Docker commands.
- Access to a terminal with appropriate permissions.
1. Deploying Elasticsearch
First, we will deploy Elasticsearch. Create a file named elasticsearch-deployment.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: elasticsearch:7.10.1
          ports:
            - containerPort: 9200
          env:
            - name: discovery.type
              value: single-node
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
spec:
  ports:
    - port: 9200
      targetPort: 9200
  selector:
    app: elasticsearch
```
Deploy Elasticsearch to your Kubernetes cluster:

```bash
kubectl apply -f elasticsearch-deployment.yaml
```
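Before moving on, it can be worth confirming that the pod is running and the cluster responds. A quick check, using the service and label names from the manifest above, might look like this:

```bash
# Check that the Elasticsearch pod is running
kubectl get pods -l app=elasticsearch

# Forward the service port locally and query the cluster health endpoint
kubectl port-forward svc/elasticsearch 9200:9200 &
curl "http://localhost:9200/_cluster/health?pretty"
```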
2. Deploying Logstash
Next, create a file named logstash-deployment.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logstash
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash
  template:
    metadata:
      labels:
        app: logstash
    spec:
      containers:
        - name: logstash
          image: logstash:7.10.1
          ports:
            - containerPort: 5044
          volumeMounts:
            - name: log-volume
              mountPath: /usr/share/logstash/pipeline/
          env:
            - name: ELASTICSEARCH_HOST
              value: elasticsearch
      volumes:
        - name: log-volume
          configMap:
            name: logstash-config
---
apiVersion: v1
kind: Service
metadata:
  name: logstash
spec:
  ports:
    - port: 5044
      targetPort: 5044
  selector:
    app: logstash
```
Next, create a ConfigMap for the Logstash pipeline configuration in a file named logstash-config.yaml:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
data:
  logstash.conf: |
    input {
      beats {
        port => 5044
      }
    }
    output {
      elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "logs-%{+YYYY.MM.dd}"
      }
    }
```
Deploy the ConfigMap and Logstash to your Kubernetes cluster (apply the ConfigMap first so the Deployment can mount it):

```bash
kubectl apply -f logstash-config.yaml
kubectl apply -f logstash-deployment.yaml
```
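If you want to confirm that Logstash picked up the pipeline from the ConfigMap, a quick look at the pod status and logs (using the label and deployment names from the manifests above) is usually enough:

```bash
# Check that the Logstash pod is running
kubectl get pods -l app=logstash

# Look for pipeline start-up messages in the Logstash logs
kubectl logs deployment/logstash | grep -i pipeline
```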
3. Deploying Kibana
Finally, create a file named kibana-deployment.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
        - name: kibana
          image: kibana:7.10.1
          ports:
            - containerPort: 5601
          env:
            - name: ELASTICSEARCH_HOSTS
              value: "http://elasticsearch:9200"
---
apiVersion: v1
kind: Service
metadata:
  name: kibana
spec:
  ports:
    - port: 5601
      targetPort: 5601
  selector:
    app: kibana
```
Deploy Kibana to your Kubernetes cluster:

```bash
kubectl apply -f kibana-deployment.yaml
```
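The Kibana Service above is only reachable inside the cluster, so for a quick look you can port-forward it and open the UI in a browser:

```bash
# Forward Kibana's port, then open http://localhost:5601 in a browser
kubectl port-forward svc/kibana 5601:5601
```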
4. Sending Logs from Microservices
To effectively monitor and troubleshoot your microservices, it's essential to configure each service to send logs to your centralized logging system. This section will guide you through the steps to achieve this using popular logging libraries such as Winston and Bunyan, and how to utilize Beats for streamlined log transmission.
Step-by-Step Configuration
- Choose a Logging Library:
- Winston and Bunyan are popular logging libraries for Node.js that provide flexible logging capabilities.
- Winston: A versatile logging library that supports multiple transports (e.g., console, file, HTTP).
- Bunyan: A simple and fast JSON logging library, ideal for structured logging (a short Bunyan example appears after the Winston explanation below).
- Install the Logging Library:
For both libraries, you can install them via npm:

```bash
npm install winston
# or
npm install bunyan
```
- Configure Logging in Your Microservice:
Below is an example of how to set up Winston to send logs to Logstash. Note that the winston-logstash transport ships log entries over a plain TCP connection, so your Logstash pipeline needs a matching tcp input; alternatively, log to a file and let Filebeat (covered below) ship it through the beats input shown earlier.
Winston Example:
- Add Service Context: You can include additional fields in your log entries that provide context, such as the service name, version, environment, and other relevant details.
- Use Winston's defaultMeta property: this property allows you to specify default metadata that will be included with every log entry.
```js
const winston = require('winston');
const { LogstashTransport } = require('winston-logstash');

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.json(),
  defaultMeta: {
    service: 'order-service',   // Name of the service
    version: '1.0.0',           // Version of the service
    environment: 'production',  // Environment (e.g., development, staging, production)
  },
  transports: [
    new LogstashTransport({
      host: 'localhost', // Logstash host
      port: 5044,        // Logstash input port (must match an input in your Logstash pipeline)
    })
  ]
});

// Logging an event with additional context
logger.info('Order created successfully', { orderId: 12345, userId: 67890 });
```
Explanation:
- defaultMeta: The defaultMeta property contains metadata that is automatically included in every log entry, making it easier to filter and identify logs by service, version, or environment.
- Logging context: You can also add more specific contextual information when logging events, such as orderId and userId, which provides additional details about the logged event.
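For comparison, here is a minimal Bunyan sketch. Bunyan emits structured JSON, so a common pattern is to write logs to a file and let Filebeat (configured in the next step) ship them to Logstash; the log file path and field values below are assumptions for illustration:

```js
const bunyan = require('bunyan');

// Create a logger that writes JSON lines to a file monitored by Filebeat
// (the path /var/log/myapp/order-service.log is an assumed example)
const log = bunyan.createLogger({
  name: 'order-service',
  version: '1.0.0',          // extra fields are attached to every log record
  environment: 'production',
  streams: [
    { level: 'info', path: '/var/log/myapp/order-service.log' }
  ]
});

// Structured fields go in the first argument, the message in the second
log.info({ orderId: 12345, userId: 67890 }, 'Order created successfully');
```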
- Use Beats for Log Shipping:
Beats are lightweight data shippers that can send data from your services to Logstash. Two common Beats used for logging are:
- Filebeat: Monitors log files and forwards them to Logstash.
- Metricbeat: Collects metrics and stats from your services and sends them to Logstash.
Configuring Filebeat: Install Filebeat and configure it to monitor your log files. Below is an example configuration for Filebeat:

```yaml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/myapp/*.log

output.logstash:
  hosts: ["localhost:5044"]
```

This configuration tells Filebeat to monitor all log files in the /var/log/myapp/ directory and send them to Logstash running on localhost at port 5044.
- Logstash Configuration:
Ensure Logstash is set up to receive logs from Beats. Below is a simple Logstash configuration:
```conf
input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "microservices-%{+YYYY.MM.dd}"
  }
}
```

This configuration tells Logstash to listen for incoming logs on port 5044 and forward them to Elasticsearch.
- Testing and Monitoring:
- After configuring logging in your microservices and setting up Beats, test to ensure that logs are being sent and received correctly (a quick end-to-end check is sketched below).
- Use Kibana to visualize and query your logs, ensuring you have proper logging set up to monitor your microservices effectively.
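One simple end-to-end check is to query Elasticsearch directly for the indices Logstash creates; the index name patterns below match the ones configured earlier in this tutorial (logs-* and microservices-*), and the endpoint assumes a local port-forward to Elasticsearch:

```bash
# List the indices created by Logstash
curl "http://localhost:9200/_cat/indices/logs-*,microservices-*?v"

# Fetch a few recent documents to confirm log entries are arriving
curl "http://localhost:9200/logs-*/_search?size=5&pretty"
```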
Conclusion
Centralized logging is an essential aspect of managing microservices, providing visibility and control over distributed systems. By implementing the ELK Stack, you can efficiently collect, store, and analyze log data, helping to ensure that your services run smoothly.
FAQs
Q1: What is centralized logging?
Centralized logging aggregates logs from multiple sources into a single location for easier analysis and monitoring.
Q2: Why is the ELK Stack popular?
The ELK Stack is widely used because it provides powerful search, visualization, and analysis capabilities for log data.
Q3: How can I implement centralized logging in a microservices architecture?
You can implement centralized logging by deploying the ELK Stack and configuring each microservice to send logs to the Logstash endpoint.
Q4: What are the benefits of using structured logging?
Structured logging makes it easier to query and analyze logs by providing a consistent format, improving observability.
Q5: Can I use other tools for centralized logging?
Yes, there are many other tools available, such as Fluentd, Splunk, and Graylog, that can also be used for centralized logging.