Logstash & Index lifecycle management (ILM)

Grok syntax

Grok identifies data in a log line through pattern matching. The grok plugin can be thought of as an enhanced form of regular expressions that comes with a library of named, reusable patterns. Logstash ships with about 120 patterns by default. If none of them fits the log you need to parse, you can write a regular expression directly.

Official pattern definitions:
https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns

The syntax of a Grok pattern is: %{SYNTAX:SEMANTIC}
SYNTAX is the name of the Grok pattern, and SEMANTIC is the field name given to the text that the pattern matches. For example:

%{NUMBER:duration} %{IP:client}
NUMBER matches a number and stores it in the field duration; IP matches an IP address and stores it in the field client.

By default, Grok stores every matched value as a string. To convert a value to a numeric type (currently only int and float are supported), append the type to the pattern: %{NUMBER:duration:int} %{IP:client}
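
To make the %{SYNTAX:SEMANTIC:TYPE} form concrete, the following Python sketch translates a Grok expression into a regular expression with named groups and then applies the optional type conversion. It is only an illustration and not how Logstash implements grok: the two regular expressions in PATTERNS are simplified stand-ins for the real NUMBER and IP definitions, and the helper names compile_grok and grok_match are made up for this example.

```python
import re

# Simplified stand-ins for two Grok patterns (assumption: the real NUMBER and
# IP definitions in logstash-patterns-core are considerably more thorough).
PATTERNS = {
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
}
CASTS = {"int": int, "float": float}

# Matches %{SYNTAX}, %{SYNTAX:SEMANTIC} and %{SYNTAX:SEMANTIC:TYPE}.
GROK_TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")


def compile_grok(expression):
    """Translate a Grok expression into a compiled regex plus per-field casts."""
    casts = {}

    def to_regex(token):
        syntax, semantic, type_name = token.groups()
        body = PATTERNS[syntax]
        if semantic is None:
            return f"(?:{body})"          # unnamed pattern: match but do not capture
        if type_name:
            casts[semantic] = CASTS[type_name]
        return f"(?P<{semantic}>{body})"  # named group carries the field name

    return re.compile(GROK_TOKEN.sub(to_regex, expression)), casts


def grok_match(expression, line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    regex, casts = compile_grok(expression)
    match = regex.search(line)
    if match is None:
        return None
    fields = match.groupdict()            # every value is a string at this point
    for name, cast in casts.items():
        fields[name] = cast(fields[name])
    return fields


print(grok_match("%{NUMBER:duration:int} %{IP:client}", "15 192.168.1.10"))
# {'duration': 15, 'client': '192.168.1.10'}
```

Without the :int suffix the duration field would stay the string "15", which mirrors the default behaviour described above.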

The following are common Grok patterns:

Index lifecycle management (ILM)

Elasticsearch index lifecycle management covers the whole life of an index: creating it, opening it, closing it, and deleting it.
Large-scale Elasticsearch deployments generally store data in multiple indexes that are split by time and by index size and that scale horizontally. As the amount of data grows, the underlying index architecture does not need to change.

  • Index lifecycle management (ILM) was first introduced in Elasticsearch 6.6 and became generally available in version 6.7
  • ILM is part of Elasticsearch and is mainly used to help manage indexes
  • With ILM, Elasticsearch can implement a hot, warm and cold architecture

Hot, warm and cold architecture

  • The hot, warm and cold architecture is often used for time series data such as logs or metrics
  • For example, suppose you are using Elasticsearch to aggregate log files from multiple systems
    • Today's logs are being indexed heavily, and this week's logs are searched the most (hot)
    • Last week's logs may still be searched fairly often, but less than this week's logs (warm)
    • Last month's logs may or may not be searched often, but it is best to keep them for a while just in case (cold)

In the figure above, there are 19 nodes in the cluster: 10 hot nodes, 6 warm nodes and 3 cold nodes. Cold nodes are optional. In Elasticsearch, you can define which nodes are hot nodes, warm nodes, or cold nodes.

  • ILM allows you to define when an index moves from one phase to the next and what happens to the index when it enters a phase
  • There is no one-size-fits-all setting for the hot, warm and cold architecture. Generally speaking, hot nodes need more CPU resources and faster IO. Warm and cold nodes usually need more disk space per node, but can get by with less CPU and slower IO devices

Configure shard allocation awareness

The hot, warm and cold architecture relies on shard allocation awareness. Therefore, first tag which nodes are hot nodes, warm nodes and (optionally) cold nodes.
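
As a rough sketch of what tagging a node looks like, the Python snippet below prints the elasticsearch.yml line for each node. It assumes the custom node attribute is named data, which is the attribute the allocate actions later in this article filter on ("require": {"data": "warm"}). The node names are hypothetical and used for illustration only.

```python
# Assumption: the custom node attribute is called "data", matching the
# "require": {"data": ...} filters used by the ILM policy in this article.
ATTRIBUTE = "data"
TIERS = ("hot", "warm", "cold")


def node_attr_setting(tier):
    """Return the elasticsearch.yml line that tags a node with the given tier."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return f"node.attr.{ATTRIBUTE}: {tier}"


# Hypothetical node names, for illustration only.
plan = {"node-1": "hot", "node-2": "warm", "node-3": "cold"}
for node, tier in plan.items():
    print(f"{node}: {node_attr_setting(tier)}")
```

Each printed setting belongs in the elasticsearch.yml of the corresponding node, and a node has to be restarted for a changed attribute to take effect.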
Cluster planning:

You can use the following command to stop all Elasticsearch processes on a node in one step (kill -9 terminates them forcibly):

jps | grep Elasticsearch | cut -f1 -d" " | xargs kill -9

Configuring ILM policies

  • To use index lifecycle management, you configure ILM policies, which can be applied to any index you choose
  • An ILM policy is divided into four main phases: hot, warm, cold and delete
  • You do not have to define every phase in a policy; ILM always executes the phases in that order and skips any phase that is not defined
  • Within a policy, you define when an index enters each phase and how the index is managed in that phase
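
The ordering rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not ILM's actual implementation, and the function name phases_to_run is made up for the example.

```python
# The four phases named in this article, in the fixed order ILM applies them.
PHASE_ORDER = ("hot", "warm", "cold", "delete")


def phases_to_run(policy):
    """Return the phases a policy defines, in execution order."""
    defined = policy["policy"]["phases"]
    # The order in which the phases appear in the policy body does not matter.
    return [name for name in PHASE_ORDER if name in defined]


policy = {
    "policy": {
        "phases": {
            "delete": {"min_age": "60d", "actions": {"delete": {}}},
            "hot": {"actions": {"rollover": {"max_size": "50gb", "max_age": "30d"}}},
        }
    }
}
print(phases_to_run(policy))  # ['hot', 'delete']
```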

The following request creates a basic ILM policy:

PUT /_ilm/policy/my_policy
{
    "policy":{
        "phases":{
            "hot":{
                "actions":{
                    "rollover":{
                        "max_size":"50gb",
                        "max_age":"30d"
                    }
                }
            }
        }
    }
}

This policy states that when the index reaches an age of 30 days or a size of 50GB (measured on the primary shards), the index is rolled over and new data is written to a new index.
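
The two conditions are combined with OR: whichever is met first triggers the rollover. A minimal Python sketch of that check follows, assuming that gb means 1024^3 bytes and that the size is the total of the primary shards. The function name should_roll_over is hypothetical.

```python
from datetime import timedelta

GB = 1024 ** 3  # assumption: "gb" is interpreted as 1024^3 bytes


def should_roll_over(index_age, primary_bytes,
                     max_age=timedelta(days=30), max_size=50 * GB):
    """Return True as soon as either rollover condition is met."""
    return index_age >= max_age or primary_bytes >= max_size


print(should_roll_over(timedelta(days=31), 1 * GB))   # old enough
print(should_roll_over(timedelta(days=1), 50 * GB))   # large enough
print(should_roll_over(timedelta(days=1), 10 * GB))   # neither condition met
```

Note that ILM evaluates the conditions periodically, so an index can grow somewhat past max_size before it is actually rolled over.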

ILM and index templates

When indexes share the same type and configuration, you can use an index template. Otherwise, you have to specify many index parameters every time you create an index, for example the refresh interval, the number of primary shards, the number of replicas, and translog settings.

Create a template named my_template and associate it with the ILM policy:

PUT _template/my_template
{
    "index_patterns": ["test-*"],
    "settings": {
        "index.lifecycle.name": "my_policy",
        "index.lifecycle.rollover_alias": "test-alias"
    }
}

For policies that include a rollover action, you must bootstrap the first index with the write alias after the index template has been created:

PUT test-000001
{
    "aliases": {
        "test-alias":{
            "is_write_index": true
        }
    }
}

With this in place, any new index matching test-* is rolled over automatically after 30 days or when it reaches 50GB. Using rollover with max_size to manage indexes can greatly reduce the number of index shards and therefore the overhead.
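
Rollover derives the name of the new index from the old one: when the name ends with a dash and a number, the number is incremented and zero-padded to six digits. The Python sketch below mimics that naming rule for illustration. The function name next_rollover_name is made up, and date-math index names are not handled.

```python
import re


def next_rollover_name(index_name):
    """Mimic rollover naming: increment the trailing number, pad to 6 digits."""
    match = re.fullmatch(r"(.*-)(\d+)", index_name)
    if match is None:
        raise ValueError("index name must end with a dash followed by a number")
    prefix, counter = match.groups()
    return f"{prefix}{int(counter) + 1:06d}"


print(next_rollover_name("test-000001"))  # test-000002
```

This is why the bootstrap index above is named test-000001: the rolled-over indexes then continue as test-000002, test-000003 and so on, and all of them match the test-* template.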

Configure ILM policies for ingestion

  • Beats and Logstash both support ILM and, once it is enabled, set up a default policy similar to the one shown in the example above
  • When ILM is enabled for Beats and Logstash, index size will probably be the main factor in deciding when a new index is created, unless the daily indexing volume is large (more than 50GB per day)
  • Starting with 7.0.0, ILM with rollover is the default configuration for Beats and Logstash
  • Since there is no one-size-fits-all setting for the hot, warm and cold architecture, Beats and Logstash do not configure a hot, warm and cold policy automatically. You can define a new policy suited to hot, warm and cold and make some optimizations along the way.

Optimize the ILM policy for hot, warm and cold

The following request creates an ILM policy optimized for the hot, warm and cold architecture.

PUT _ilm/policy/hot-warm-cold-delete-60days
{
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        "max_size": "50gb",
                        "max_age": "30d"
                    },
                    "set_priority": {
                        "priority": 50
                    }
                }
            },
            "warm": {
                "min_age": "7d",
                "actions": {
                    "forcemerge": {
                        "max_num_segments": 1
                    },
                    "shrink": {
                        "number_of_shards": 1
                    },
                    "allocate": {
                        "require": {
                            "data": "warm"
                        }
                    },
                    "set_priority": {
                        "priority": 25
                    }
                }
            },
            "cold": {
                "min_age": "30d",
                "actions": {
                    "set_priority": {
                        "priority": 0
                    },
                    "freeze": {},
                    "allocate": {
                        "require": {
                            "data": "cold"
                        }
                    }
                }
            },
            "delete": {
                "min_age": "60d",
                "actions": {
                    "delete": {}
                }
            }
        }
    }
}

"hot": {
    "actions": {
        "rollover": {
            "max_size": "50gb",
            "max_age": "30d"
        },
        "set_priority": {
            "priority": 50
        }
    }
}

  • This ILM policy first sets the index priority to a high value so that hot indexes are recovered before other indexes
  • After 30 days or once the index reaches 50GB (whichever comes first), the index is rolled over and the system creates a new index
  • The new index starts the policy from the beginning, while the index that was just rolled over waits 7 days after the rollover before entering the warm phase

"warm": {
    "min_age": "7d", # Enter the warm phase 7 days after rollover
    "actions": {
        "forcemerge": {
            "max_num_segments": 1 # Force merge down to 1 segment
        },
        "shrink": {
            "number_of_shards": 1 # Shrink to 1 shard
        },
        "allocate": { 
            "require": {
                "data": "warm" # Move to warm nodes
            }
        },
        "set_priority": {
            "priority": 25 # Lower priority than in the hot phase
        }
    }
}

After the index enters the warm phase, ILM shrinks the index to one shard, force merges it into one segment, sets the index priority to a value lower than in the hot phase (but higher than in the cold phase), and moves the index to the warm nodes through the allocate action. The index then waits until 30 days have passed since the rollover before it enters the cold phase.

"cold": {
    "min_age": "30d", # Enter the cold phase 30 days after rollover
    "actions": {
        "set_priority": {
            "priority": 0 # Lowest priority
        },
        "freeze": {},
        "allocate": {
            "require": {
                "data": "cold" # Move the index to cold nodes
            }
        }
    }
}

After the index enters the cold phase, ILM lowers the index priority again so that hot and warm indexes are recovered first. ILM then freezes the index and moves it to the cold nodes. The index then waits until 60 days have passed since the rollover before it enters the delete phase.
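
The effect of the three priority values (50, 25 and 0) can be illustrated with a small Python sketch that orders indexes by priority. The index names are hypothetical, and the sketch compares priority only and ignores any tie-breaking rules.

```python
def recovery_order(priorities):
    """Order index names from highest to lowest priority."""
    return sorted(priorities, key=lambda name: priorities[name], reverse=True)


# Hypothetical indexes in the hot, warm and cold phases of the policy above.
priorities = {"test-000001": 0, "test-000002": 25, "test-000003": 50}
print(recovery_order(priorities))
# ['test-000003', 'test-000002', 'test-000001']
```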

"delete": {
    "min_age": "60d",
    "actions": {
        "delete": {}
    }
}

The delete phase contains a delete action that removes the index. Always give the delete phase a min_age condition so that the index stays in the hot, warm or cold phase for a given period of time before it is deleted.
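
Putting the min_age values of this policy together (7d, 30d and 60d, all counted from the rollover), the phase of an index can be sketched as a function of the days elapsed since it was rolled over. This is a simplified model: it ignores the time that actions such as shrink or forcemerge take and the fact that ILM checks conditions periodically. The function name phase_for is hypothetical.

```python
# min_age values from the policy above, in days since rollover.
MIN_AGE_DAYS = {"warm": 7, "cold": 30, "delete": 60}


def phase_for(days_since_rollover):
    """Return the phase an index is in, given the days since it was rolled over."""
    phase = "hot"
    for name in ("warm", "cold", "delete"):
        if days_since_rollover >= MIN_AGE_DAYS[name]:
            phase = name
    return phase


for days in (0, 7, 29, 30, 60):
    print(days, phase_for(days))
```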

Graphical summary

Posted by celsoendo on Sat, 07 May 2022 05:56:18 +0300