Curator for Elasticsearch

Garbage collection for Elasticsearch

Sometimes it would be handy to have a janitor-like tool that takes care of your Elasticsearch database. If you're using Elasticsearch as the backend for a syslog server, you'll notice that the disk fills up quite quickly. Of course you want to search your recent syslog messages, but older ones become less relevant over time. To limit the storage needed, you have to delete old indices from time to time.

Curator takes care of the mess

In fact, there is a janitor-like tool for Elasticsearch called Curator. Installation and configuration are quite easy:

apt-get install elasticsearch-curator

After the installation, create a configuration directory in /etc:

mkdir /etc/curator

In this directory, you create two files: the base configuration for Curator and a so-called action file. A base configuration (/etc/curator/curator.yml) could look like this:

client:
  port: 9200
  use_ssl: False
  ssl_no_validate: False
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile: /var/log/curator.log
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

Please be aware that this is a YAML file, so indentation really matters: it is part of the syntax!

Next, you create an action file (/etc/curator/action_space.yml), which could look like this:

actions:
  1:
    action: delete_indices
    description: This action file cleans up the elasticsearch index
    options:
      ignore_empty_list: True
      continue_if_exception: False
    filters:
      - filtertype: pattern
        kind: prefix
        value: syslog-ng_
      - filtertype: space
        disk_space: 9

Again, keep an eye on the indentation!

This action file deletes Elasticsearch indices whose names start with syslog-ng_ once their accumulated size exceeds 9 GB. Because Curator adds up the index sizes starting with the newest index and working towards the past, only the oldest indices are deleted.

Unfortunately, it is only this easy if you're running a single-node Elasticsearch cluster.
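
To make the size-based selection described above more concrete, here is a minimal Python sketch of the idea. It is a simplified model and not Curator's actual code: the function name select_for_deletion, the index names and the sizes are all made up for illustration, and it assumes index names end in a sortable date so that reverse alphabetical order means newest first.

```python
from typing import List, Tuple


def select_for_deletion(indices: List[Tuple[str, float]],
                        disk_space_gb: float) -> List[str]:
    """Return the names of indices a size-based cleanup would delete.

    indices: (name, size_in_gb) pairs; hypothetical data, not read from a cluster.
    disk_space_gb: the limit, comparable to disk_space in the action file.
    """
    # Assumption: names carry a sortable date suffix, so sorting in
    # reverse alphabetical order puts the newest index first.
    newest_first = sorted(indices, key=lambda item: item[0], reverse=True)

    running_total = 0.0
    to_delete = []
    for name, size_gb in newest_first:
        running_total += size_gb
        # As soon as the running total exceeds the limit, this index and
        # every older one end up on the delete list.
        if running_total > disk_space_gb:
            to_delete.append(name)
    return to_delete


# Hypothetical example: three daily indices of 4 GB each, limit of 9 GB.
example = [
    ("syslog-ng_2024.01.01", 4.0),
    ("syslog-ng_2024.01.02", 4.0),
    ("syslog-ng_2024.01.03", 4.0),
]
print(select_for_deletion(example, disk_space_gb=9))
```

With these made-up numbers, the two newest indices add up to 8 GB and stay, while the oldest one pushes the total to 12 GB and is selected for deletion.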

Testing and periodic execution

Of course you should test your action file before running it on your production database. To do so, you can use the --dry-run option:

curator --dry-run --config /etc/curator/curator.yml /etc/curator/action_space.yml

Because curator.yml tells Curator to write its log to /var/log/curator.log, you can check there what Curator would do.

Finally, to run Curator periodically, simply add a cron job that executes it:

curator --config /etc/curator/curator.yml /etc/curator/action_space.yml

That's it.
