MRI maintainers have put a tremendous amount of work into improving the garbage collector in Ruby 2.0 through 2.2. The engine has thus gained a lot more horsepower. However, it’s still not trivial to get the most out of it. In this post we’re going to gain a better understanding of how and what to tune for.
Koichi Sasada (_ko1, Ruby MRI maintainer) famously mentioned in a presentation (slide 89):
Try GC parameters
- There is no silver bullet
- No one answer for all applications
- You should not believe other applications settings easily
- Try and try and try!
This is true in theory but a whole lot harder to pull off in practice due to three primary problems:
- Interpreter GC semantics and configuration change over time.
- One GC config isn’t optimal for all app runtime contexts: tests, requests, background jobs, rake tasks, etc.
- During the lifetime and development cycles of a project, it’s very likely that existing GC settings are invalidated quickly.
An evolving Garbage Collector
The garbage collector has changed frequently in recent MRI releases. The changes have also broken many existing assumptions and environment variables that tune the GC. Compare the output of GC.stat on Ruby 2.1 with the output on Ruby 2.2.
In Ruby 2.2 we can see a lot more to introspect and tune, but this also comes with a steep learning curve which is (and should be) out of scope for most developers.
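To see what your own interpreter exposes, a minimal sketch like the one below dumps every GC.stat counter. The set of keys differs between Ruby versions, so the sketch hard-codes none of them and assumes no particular output.

```ruby
# Print every counter GC.stat exposes on the running interpreter.
# The set of keys varies between Ruby versions, so none are hard-coded.
stats = GC.stat

puts "Ruby #{RUBY_VERSION} exposes #{stats.size} GC.stat keys:"
stats.sort_by { |key, _| key.to_s }.each do |key, value|
  puts format("  %-45s %s", key, value)
end
```

Running the same script under two interpreters and diffing the output is a quick way to spot renamed or newly added counters.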
One codebase, different roles
A modern Rails application is typically used day to day in different contexts:
- Running tests
- rake tasks
- database migrations
- background jobs
They all start pretty much the same way with the VM compiling code to instruction sequences. Different roles affect the Ruby heap and the garbage collector in very different ways, however.
Consider three examples from the same project:
- A background job that typically runs for 13 minutes, triggers 133 GC cycles and allocates a metric ton of objects. Allocations are very bursty and come in batches.
- A controller action that allocates 24 555 objects. Allocator throughput isn’t very variable.
- A test case that contributes 175 objects to the heap. Test cases are generally very variable and bursty in their allocation patterns.
The default GC behavior isn’t optimal for all of these execution paths within the same project, and neither is throwing a single set of RUBY_GC_* environment variables at it.
We’d like to refer to processing in these different contexts as “units of work”.
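Allocation and GC cycle counts for a single unit of work can be gathered with a few lines of Ruby. The helper below is a hypothetical sketch, not part of any library. It assumes Ruby 2.2 or later, where the counter is named total_allocated_objects (Ruby 2.1 used total_allocated_object).

```ruby
# Hypothetical helper (not part of any library): run one unit of work and
# report how many objects it allocated and how many GC cycles it triggered.
def measure_unit_of_work(label)
  before_allocated = GC.stat(:total_allocated_objects)
  before_cycles    = GC.count

  yield

  {
    label: label,
    allocated_objects: GC.stat(:total_allocated_objects) - before_allocated,
    gc_cycles: GC.count - before_cycles
  }
end

# Example unit of work: build 10,000 short strings.
report = measure_unit_of_work("string building") do
  10_000.times.map { |i| "item-#{i}" }
end
puts report.inspect
```

The numbers include whatever the helper itself allocates, so treat them as approximate.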
Fast development cycles
During the lifetime and development cycle of a project, it’s very likely that garbage collector settings that were valid yesterday aren’t optimal anymore after the next two sprints. Changes to your Gemfile, rolling out new features, and bumping the Ruby interpreter all affect the garbage collector.
Process lifecycle events
Let’s have a look at a few events that are important during the lifetime of a process. They help the tuner to gain valuable insights into how well the garbage collector is working and how to further optimize it. They all hint at how the heap changes and what triggered a GC cycle.
For example, how many mutations happened
- while processing a request,
- between booting the app and processing a request,
- during the lifetime of the application?
When it booted
When the application is ready to start doing work. For a Rails application, this is typically when the app has been fully loaded in production, ready to serve requests, ready to accept background work, etc. All source files have been loaded and most resources acquired.
When processing started
At the start of a unit of work. Typically the start of an HTTP request, when a background job has been popped off a queue, the start of a test case or any other type of processing that is the primary purpose of running the process.
When processing ended
At the end of a unit of work. Typically the end of an HTTP request, when a background job has finished processing, the end of a test case or any other type of processing that is the primary purpose of running the process.
When it terminated
Triggered when the application terminates.
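The four events above can be sketched as a small tracker that snapshots GC.stat at each one. The class name, method names and event symbols are hypothetical and chosen for illustration only. A real integration would hook into the framework (Rails, a job processor, a test runner) rather than being called by hand.

```ruby
# Hypothetical tracker: the class and event names are illustrative only.
class LifecycleTracker
  EVENTS = %i[booted processing_started processing_ended terminated].freeze

  attr_reader :snapshots

  def initialize
    @snapshots = []
  end

  # Record a GC.stat snapshot tagged with the lifecycle event.
  def record(event)
    raise ArgumentError, "unknown event: #{event}" unless EVENTS.include?(event)
    @snapshots << { event: event, at: Time.now, stat: GC.stat }
  end

  # Wrap one unit of work (request, job, test case) in start/end events.
  def unit_of_work
    record(:processing_started)
    yield
  ensure
    record(:processing_ended)
  end
end

tracker = LifecycleTracker.new
tracker.record(:booted)                                   # app fully loaded
tracker.unit_of_work { Array.new(1_000) { |i| i.to_s } }  # one unit of work
at_exit { tracker.record(:terminated) }                   # process shutting down

puts tracker.snapshots.map { |s| s[:event] }.inspect
```

Comparing the stat hashes of two consecutive snapshots shows how the heap changed between those events.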
Knowing when and why GC happens
Tracking GC cycles interleaved with the aforementioned application events yields insights into why a particular GC cycle happens. The progression from BOOTED to TERMINATED and everything in between is important because mutations that happen during the fifth HTTP request of a new Rails process also contribute to a GC cycle during request number eight.
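As a rough illustration of asking the interpreter why a cycle ran, the sketch below counts GC cycles around a block and reads GC.latest_gc_info, which MRI has provided since Ruby 2.1. The helper name is hypothetical, and only the most recent cycle is described, so cycles earlier in the block are counted but not explained.

```ruby
# Hypothetical helper: run a block and describe the GC activity it caused.
def gc_activity_during
  cycles_before = GC.count
  yield
  cycles = GC.count - cycles_before
  info = GC.latest_gc_info # details about the most recent GC cycle

  {
    cycles: cycles,
    # Only meaningful when at least one cycle ran inside the block.
    last_triggered_by: cycles > 0 ? info[:gc_by] : nil,
    last_major_by: cycles > 0 ? info[:major_by] : nil
  }
end

activity = gc_activity_during do
  GC.start # force a cycle so the example always has something to report
end
puts activity.inspect
```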
On tuning
Primarily the garbage collector exposes tuning variables in these three categories:
- Heap slot values: where Ruby objects live
- Malloc limits: off heap storage for large strings, arrays and other structures
- Growth factors: by how much to grow slots, malloc limits etc.
Tuning GC parameters is generally a tradeoff between tuning for speed (and thus using more memory) and tuning for low memory usage at the cost of speed. We think it’s possible to infer, from observing the application at runtime, a reasonable set of defaults that is conservative with memory yet maintains reasonable throughput.
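For orientation, the sketch below groups the RUBY_GC_* environment variables documented for MRI 2.1 and 2.2 into the three categories and prints whatever is currently set. The grouping is an assumption made for illustration, and later Ruby versions have renamed or dropped some of these variables, so check the list against the documentation for your interpreter. No values are recommended here.

```ruby
# RUBY_GC_* variables documented for MRI 2.1/2.2, grouped into the three
# categories above. The grouping is illustrative, and names changed in later
# Ruby versions; verify them against the docs for your interpreter.
GC_TUNING_VARIABLES = {
  heap_slots: %w[
    RUBY_GC_HEAP_INIT_SLOTS
    RUBY_GC_HEAP_FREE_SLOTS
    RUBY_GC_HEAP_GROWTH_MAX_SLOTS
  ],
  malloc_limits: %w[
    RUBY_GC_MALLOC_LIMIT
    RUBY_GC_MALLOC_LIMIT_MAX
    RUBY_GC_OLDMALLOC_LIMIT
    RUBY_GC_OLDMALLOC_LIMIT_MAX
  ],
  growth_factors: %w[
    RUBY_GC_HEAP_GROWTH_FACTOR
    RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR
    RUBY_GC_MALLOC_LIMIT_GROWTH_FACTOR
    RUBY_GC_OLDMALLOC_LIMIT_GROWTH_FACTOR
  ]
}.freeze

# Show which variables are set in the current environment.
GC_TUNING_VARIABLES.each do |category, names|
  puts category
  names.each do |name|
    puts format("  %-40s %s", name, ENV.fetch(name, "(interpreter default)"))
  end
end
```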
A solution
We’ve been working for a few weeks on a product, TuneMyGC, that attempts to do just that. Our goals and objectives are:
- A repeatable and systematic tuning process that respects fast development cycles
- It should have awareness of runtime profiles being different for HTTP requests, background job processing etc.
- It should support current mainline Ruby versions without developers having to keep up to date with changes
- Deliver reasonable memory footprints with better runtime performance
- Provide better insights into GC characteristics both for app owners and possibly also ruby-core
Here’s an example of Discourse being automatically tuned for better 99th percentile throughput. Response times in milliseconds, 200 requests:
Controller | GC defaults (ms) | Tuned GC (ms) |
---|---|---|
categories | 227 | 160 |
home | 163 | 113 |
topic | 55 | 40 |
user | 92 | 76 |
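To put the table in relative terms, this short calculation derives the percentage reduction in response time per controller. The figures are copied from the table, and the variable names are illustrative.

```ruby
# Response times in milliseconds, copied from the table above.
timings = {
  "categories" => { default: 227, tuned: 160 },
  "home"       => { default: 163, tuned: 113 },
  "topic"      => { default: 55,  tuned: 40 },
  "user"       => { default: 92,  tuned: 76 }
}

# Percentage reduction in response time, rounded to one decimal place.
reduction = timings.transform_values do |t|
  ((t[:default] - t[:tuned]) * 100.0 / t[:default]).round(1)
end

reduction.each do |controller, pct|
  puts format("%-10s %.1f%% lower response time", controller, pct)
end
```

That works out to reductions of 29.5%, 30.7%, 27.3% and 17.4% for categories, home, topic and user respectively.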
GC defaults:
(listing not preserved)
Raw GC stats from Discourse’s bench.rb script:
(listing not preserved)
TuneMyGC recommendations:
(listing not preserved)
Raw GC stats from Discourse’s bench.rb script:
(listing not preserved)
We can see a couple of interesting points here:
- There is much less GC activity – only 44 rounds instead of 106.
- Slot buffers are still decent for high throughput. There are 718247 free slots (heap_free_slots) of 1179182 available slots (heap_available_slots), so roughly 61% of the heap is free and the live objects (heap_live_slots) amount to about 64% of the free slot count. This value is slightly skewed, however, because the Discourse benchmark script forces a major GC before dumping these stats: there are about as many swept slots (heap_swept_slots) as free slots.
- Malloc limits (malloc_increase_bytes_limit and oldmalloc_increase_bytes_limit) and growth factors (old_objects_limit and remembered_wb_unprotected_objects_limit) are in line with actual app usage. The TuneMyGC service considers when limits and growth factors are bumped during the app lifecycle and attempts to raise limits via environment variables slightly higher to prevent excessive GC activity.
Now it’s your turn.
Feel free to take your Rails app for a spin too!
1. Add the TuneMyGC gem to your Gemfile.
2. Register your Rails application.
3. Boot your app. When the process ends, we recommend an optimal GC configuration.
Related articles
This article is a part of a series about Rails performance optimization and GC tuning. Other articles in the series: