Curator for Elasticsearch
Garbage collection for Elasticsearch
Sometimes it would be handy to have a janitor-like tool that takes care of your Elasticsearch database. If you use Elasticsearch as the backend for a syslog server, you will notice that the disk fills up quite quickly. Of course you want to be able to search your recent syslog messages, but the older ones become less relevant over time. To limit the storage capacity you need, you have to delete old indices from time to time.
Curator takes care of the mess
In fact, there is a janitor-like tool for Elasticsearch, called Curator. Installation and configuration are quite easy:
apt-get install elasticsearch-curator
After the installation, create a configuration directory in /etc:
mkdir /etc/curator
In this directory, create two files: the base configuration for Curator and a so-called action file. A base configuration (/etc/curator/curator.yml) could look like this:
client:
  hosts:
    - 127.0.0.1
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile: /var/log/curator.log
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
Be aware that this is a YAML file, so indentation really matters: it is part of the syntax!
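One indentation mistake that is easy to make and hard to spot is a tab character, which YAML does not allow for indentation. The following is a minimal sketch of a check for that one problem only; the function name find_tab_indentation is made up for this example, and the check does not detect wrong nesting levels.

```python
def find_tab_indentation(yaml_text):
    """Return the 1-based numbers of lines whose indentation contains a tab."""
    bad_lines = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        # Leading whitespace is everything before the first non-blank character.
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


# Hypothetical snippet with a tab in front of the host entry (line 3).
sample = "client:\n  hosts:\n\t- 127.0.0.1\n  port: 9200\n"
print(find_tab_indentation(sample))
```

To check the real file, the same function could be given the contents of /etc/curator/curator.yml.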
Next, you create an action file (/etc/curator/action_space.yml), which could look like this:
actions:
  1:
    action: delete_indices
    description: This action file cleans up the elasticsearch index
    options:
      ignore_empty_list: True
      continue_if_exception: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: syslog-ng_
    - filtertype: space
      disk_space: 9
Again, keep an eye on the indentation!
This action file deletes Elasticsearch indices whose names start with syslog-ng_ once their accumulated size exceeds 9 GB. Curator adds up the index sizes starting with the newest index and working backwards in time, so only the oldest indices, those beyond the 9 GB limit, are deleted.
Unfortunately, it is only this easy if you are running a single-node Elasticsearch cluster.
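To illustrate how the space filter picks its victims, here is a small simulation of the "newest first, accumulate, delete the rest" rule described above. It is only a sketch of that idea, not Curator's actual code: the function select_for_deletion, the index names, and the sizes are all made up, and it assumes that index names end in a date so that reverse alphabetical order means newest first.

```python
def select_for_deletion(index_sizes_gb, limit_gb):
    """Return the indices a cumulative-size rule would delete.

    index_sizes_gb maps an index name to its size in GB (hypothetical data).
    """
    total = 0.0
    to_delete = []
    # Reverse alphabetical order puts the newest date-suffixed names first.
    for name in sorted(index_sizes_gb, reverse=True):
        total += index_sizes_gb[name]
        # Everything that pushes the running total past the limit is deleted.
        if total > limit_gb:
            to_delete.append(name)
    return to_delete


# Hypothetical daily syslog indices and their sizes in GB.
sizes = {
    "syslog-ng_2017.03.01": 3.0,
    "syslog-ng_2017.03.02": 3.0,
    "syslog-ng_2017.03.03": 2.0,
    "syslog-ng_2017.03.04": 4.0,
}
print(select_for_deletion(sizes, 9))  # prints ['syslog-ng_2017.03.01']
```

With these numbers the three newest indices add up to exactly 9 GB and are kept, while the oldest one exceeds the limit and is selected for deletion.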
Testing and periodic execution
Of course, you should test your action file before running it against your production database. To do so, use the --dry-run option:
curator --dry-run --config /etc/curator/curator.yml /etc/curator/action_space.yml
Because curator.yml configures logging to /var/log/curator.log, you can check that file to see what Curator would do.
To finally run Curator periodically, simply add a cron job that executes it:
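If the log file is already long, it can help to pull out only the dry-run entries. The sketch below assumes that Curator marks those entries with the string DRY-RUN; that marker, like the helper name dry_run_lines, is an assumption you should verify against your own log.

```python
from pathlib import Path


def dry_run_lines(log_path="/var/log/curator.log", marker="DRY-RUN"):
    """Return all lines of the log file that contain the marker string."""
    path = Path(log_path)
    if not path.exists():
        # Nothing has been logged yet (or the path is wrong).
        return []
    with path.open(encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if marker in line]


if __name__ == "__main__":
    for entry in dry_run_lines():
        print(entry)
```

Run it right after the dry run; if it prints nothing, either no index matched the filters or the marker assumption does not hold for your Curator version.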
curator --config /etc/curator/curator.yml /etc/curator/action_space.yml