Automatically restart struggling Heroku dynos using LogEntries

September 14, 2015 | Ruby on Rails, Thoughts from the team

We have a Rails app hosted on Heroku which periodically develops a memory leak, pushing it well over Heroku’s per-dyno memory quota and slowing everything down as it hits swap. The issue is intermittent and only happens every few days, but it’s easy enough to deal with: just restart the dynos. However, it has a habit of happening at night or on weekends (the site is used entirely in the US), which makes it difficult to deal with out of hours.

While we are making efforts to find the cause of the leak, our primary concern is to make sure the site remains usable. To that end, I’ve put together a little something to restart the web dynos automatically, even when it’s the middle of the night for us.

We use the LogEntries service, available as a free plugin for Heroku apps, to monitor our applications. LogEntries tails the logs and triggers alerts based on configurable conditions. It can detect all the Heroku platform errors, such as the one we are interested in, “R14 Memory quota exceeded”, and then send an email or Slack notification, or poke a webhook. It seemed logical to use LogEntries to restart the dynos when they got into trouble.
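To give a feel for what such an alert is matching, here is a minimal sketch in plain Ruby. It is not how LogEntries is implemented; the `r14_dyno` helper and the sample timestamp are made up for illustration, and the log line follows the shape Heroku uses for R14 errors.

```ruby
# Hypothetical helper (not part of LogEntries or Heroku): returns the dyno
# name from an R14 log line, or nil when the line is something else.
R14_PATTERN = /heroku\[(?<dyno>[\w-]+\.\d+)\]: Error R14 \(Memory quota exceeded\)/

def r14_dyno(line)
  match = R14_PATTERN.match(line)
  match && match[:dyno]
end

# Made-up sample line in the format Heroku writes for this error.
sample = '2015-09-14T02:13:07+00:00 heroku[web.1]: Error R14 (Memory quota exceeded)'
puts r14_dyno(sample)
```

Any line that does not contain the R14 message simply returns nil, which is roughly the "no alert" case.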

Restarting the Dynos

To restart our web dynos we create an ActiveJob task, which uses the Heroku Platform API (Ruby gem) to fetch the list of dynos, filter them down to just the running web instances (we’ve never had a problem with the workers), and restart each one in turn.

First install the Heroku CLI OAuth Plugin

    heroku plugins:install

Then create an OAuth token with write privileges (I suggest you create the token using a Heroku account that can only access this app) and set it as an environment variable

    heroku authorizations:create -s write
    heroku config:add RESTART_API_KEY=<API KEY>

Now create an ActiveJob task, which we’ve called RestartAppJob.

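require 'platform-api'

# Restarts every running web dyno via the Heroku Platform API.
class RestartAppJob < ActiveJob::Base
  queue_as :restarts

  # Thin wrapper around the dyno info hashes returned by the API.
  class Dyno
    attr_accessor :type
    attr_accessor :name
    attr_accessor :state

    # Shared API client; nil when no token is configured.
    def self.connection
      if ENV['RESTART_API_KEY']
        @@connection ||= PlatformAPI.connect_oauth(ENV['RESTART_API_KEY'])
      end
    end

    # Every dyno of the app, wrapped in a Dyno object.
    def self.dynos
      connection.dyno.list(ENV['APP_NAME']).map do |dyno_info|
        new(dyno_info)
      end
    end

    def self.running_web_dynos
      dynos.select { |dyno| dyno.web? && dyno.up? }
    end

    def web?
      type == 'web'
    end

    def up?
      state == 'up'
    end

    def connection
      self.class.connection
    end

    def restart!
      connection.dyno.restart(ENV['APP_NAME'], name)
    end

    def initialize(info)
      self.type = info['type']
      self.name = info['name']
      self.state = info['state']
    end
  end

  # Restart each running web dyno in turn.
  def perform(*args)
    if Dyno.connection
      Dyno.running_web_dynos.each do |dyno|
        dyno.restart!
      end
    end
  end
end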

As you can see, most of the work is done in the Dyno class. Calling

RestartAppJob.perform_later

…will queue up a job to restart your web dynos.
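To make the filtering step concrete without touching the real API, here is a small sketch that applies the same "web and up" test to plain hashes. The sample data is invented and only includes the three keys the Dyno class reads; `running_web_dyno_names` is a hypothetical helper, not part of the job.

```ruby
# Invented sample data, shaped like the dyno info hashes used in the job
# (only the 'type', 'name' and 'state' keys are shown).
SAMPLE_DYNOS = [
  { 'type' => 'web',    'name' => 'web.1',    'state' => 'up' },
  { 'type' => 'web',    'name' => 'web.2',    'state' => 'starting' },
  { 'type' => 'worker', 'name' => 'worker.1', 'state' => 'up' }
].freeze

# Hypothetical helper: names of the dynos the job would restart.
def running_web_dyno_names(dyno_infos)
  dyno_infos
    .select { |info| info['type'] == 'web' && info['state'] == 'up' }
    .map { |info| info['name'] }
end

p running_web_dyno_names(SAMPLE_DYNOS)
```

With this data only web.1 qualifies: web.2 is still starting and worker.1 is not a web dyno.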

Triggering the Job

To trigger the job we have a controller action that looks like this…

def restart_web_dynos
  if params[:key] == ENV['RESTART_WEBHOOK_KEY']
    RestartAppJob.perform_later
    render text: 'Restart triggered'
  else
    render text: 'You are not allowed to restart the dynos'
  end
end

You can put this in any controller you think is appropriate, and set up the routes however you like. It expects a parameter of ‘key’ that matches whatever you set the environment variable RESTART_WEBHOOK_KEY to (I suggest generating a GUID using the SecureRandom library).
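Generating and checking the key could look something like the sketch below. Both helper names are hypothetical. The check is a stricter variant of a plain `==` comparison, not what the action above does: a bare `==` would also pass when the environment variable is unset and the request carries no key, since both sides are then nil, so this version rejects blank values and compares digests without short-circuiting.

```ruby
require 'securerandom'
require 'digest'

# Generate a value suitable for RESTART_WEBHOOK_KEY.
def generate_webhook_key
  SecureRandom.uuid
end

# Hypothetical helper: true only when both keys are present and identical.
def valid_webhook_key?(given, expected)
  given = given.to_s
  expected = expected.to_s
  return false if given.empty? || expected.empty?

  # Compare fixed-length digests byte by byte, without an early exit.
  a = Digest::SHA256.digest(given).bytes
  b = Digest::SHA256.digest(expected).bytes
  a.zip(b).reduce(0) { |diff, (x, y)| diff | (x ^ y) }.zero?
end

key = generate_webhook_key
puts valid_webhook_key?(key, key)
puts valid_webhook_key?(nil, nil)
```

The generated value can then be stored with heroku config:add in the same way as the API key.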

With the controller action in place you can set the webhook action in LogEntries to point to the URL you routed to that action, with the key passed as the ‘key’ parameter.
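The webhook URL is just the route to the action plus the key as a query parameter. A sketch of building one is shown here; the host, path and key are placeholders that depend on your own app and routes, and `webhook_url` is a hypothetical helper.

```ruby
require 'uri'

# Hypothetical helper: builds the URL to paste into the webhook settings.
def webhook_url(host, path, key)
  URI::HTTPS.build(
    host: host,
    path: path,
    query: URI.encode_www_form(key: key)
  ).to_s
end

# Placeholder values; substitute your app's host, route and secret key.
puts webhook_url('your-app.example.com', '/restart_web_dynos', 'your-secret-key')
```

Using HTTPS matters here, since the key travels in the URL.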

Now, whenever LogEntries detects the memory quota issue it will call the webhook, which will schedule the job, which restarts the dynos. You could extend this to other events or monitoring services easily enough.


Obviously this relies on at least one dyno still being functional. We tend to find that while the app slows down when it hits the quota it doesn’t actually stop, so this approach is OK. However, if you have dynos that stop responding entirely, you will need to host this code separately.
