You often hear about how fast languages like Rust and Go are. People port all kinds of things to Rust to make them faster. It's common to hear about a company porting a Ruby microservice to Go or writing native extensions for a dynamic language in Rust for extra performance.
Crystal also compiles your apps into blazing-fast native code, so today I decided to try comparing Rust and Crystal side by side when talking to a Redis database.
The Benchmark
I wanted something realistic, and most benchmarks I could find were things like Mandelbrot and digits of π. They're CPU-intensive, absolutely, but they're nothing like the workload a typical web app has.
The benchmark I went with was to connect to a Redis database and run a bunch of pipelined commands. Pipelining means we're sending all of the commands before reading any of them. Because we're not waiting for the result after sending each command, this drastically reduces the impact that latency has on the benchmark. For example, instead of this sequence:
- Send command
- Read result
- Send command
- Read result
- Send command
- Read result
What we do instead is this:
- Send command
- Send command
- Send command
- Read result
- Read result
- Read result
This way we pay the latency cost once between the last send and the first read instead of 3 times.
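To make the difference concrete, here's a minimal sketch of both approaches using the `redis-rs` crate (the same crate used in the Rust benchmark below). It assumes a Redis server on localhost, and the key names are just for illustration.

```rust
use redis::Commands;

fn main() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://127.0.0.1:6379")?;
    let mut con = client.get_connection()?;

    // Without pipelining: each call pays a full network round trip
    // before the next command is sent.
    let _: () = con.set("foo", "bar")?;
    let _: Option<String> = con.get("foo")?;
    let _: i64 = con.incr("counter", 1)?;

    // With pipelining: all three commands are written to the socket first,
    // then the three replies are read back in order, so the latency cost
    // is paid roughly once.
    let (value, count): (Option<String>, i64) = redis::pipe()
        .set("foo", "bar").ignore()
        .get("foo")
        .incr("counter", 1)
        .query(&mut con)?;

    println!("{value:?} {count}");
    Ok(())
}
```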
For our benchmark, we're going to run a mix of common Redis operations:
- Set a key
- Get a key that exists
- Get a key that does not exist
- Increment the value for a key
We do each of these 100k times. The more work we do in this pipeline, the less effect latency has and the more effective the benchmark is. The reason we run a mix of commands isn't so much about what Redis does with them (we're not benchmarking Redis), but what Redis returns for them. The `SET` and `GET` commands in Redis return strings, which require heap allocations. `INCR` returns an integer, which is usually allocated on the stack (no `malloc`/`free` needed) and doesn't necessarily require a heap allocation (though the implementation might parse the integer from an intermediate string, which could involve an allocation).
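Here's a rough, server-free sketch of why the integer reply is cheaper: a GET reply arrives as a bulk string whose payload has to be copied into an owned, heap-allocated String, while an INCR reply arrives as a protocol integer that can be parsed straight into a stack value. The parsing below is hand-rolled purely for illustration; it is not how any particular client library is implemented.

```rust
fn main() {
    // What a GET reply looks like on the wire: a RESP bulk string.
    let get_reply = b"$3\r\nbar\r\n";
    // The 3 payload bytes after the "$3\r\n" header are copied into an
    // owned String: a heap allocation per reply.
    let value = String::from_utf8(get_reply[4..7].to_vec()).unwrap();

    // What an INCR reply looks like on the wire: a RESP integer.
    let incr_reply = b":42\r\n";
    // The digits between ':' and "\r\n" parse directly into an i64 on the stack.
    let count: i64 = std::str::from_utf8(&incr_reply[1..3]).unwrap().parse().unwrap();

    println!("GET => {value:?}, INCR => {count}");
}
```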
First we'll look at the code in each language, then the results.
Rust
We're using the `redis-rs` Rust crate for this app. We construct a Redis pipeline with `redis::pipe()`, fill it with data, and then send that data to the connection.
```rust
use redis::{self};
use std::time::{Instant};

fn main() {
    const ITERATIONS: usize = 100_000;
    let client = redis::Client::open("redis://127.0.0.1:6379").unwrap();
    let mut con = client.get_connection().unwrap();
    let start = Instant::now();
    let mut pipe = redis::pipe();

    pipe.del("foo").ignore();
    for _i in 0..ITERATIONS { pipe.set("foo", "bar").ignore(); }
    for _i in 0..ITERATIONS { pipe.get("foo").ignore(); }
    pipe.del("foo").ignore();
    for _i in 0..ITERATIONS { pipe.incr("foo", 1).ignore(); }
    pipe.del("foo").ignore();
    for _i in 0..ITERATIONS { pipe.get("foo").ignore(); }

    let () = pipe.query(&mut con).unwrap();

    println!("{}", start.elapsed().as_millis());
}
```
Crystal
require "../src/redis"
redis = Redis::Connection.new
start = Time.monotonic
iterations = 100_000
redis.pipeline do |redis|
redis.del "foo"
iterations.times { redis.set "foo", "bar" }
iterations.times { redis.get "foo" }
redis.del "foo"
iterations.times { redis.incr "foo" }
redis.del "foo"
iterations.times { redis.get "foo" }
end
pp Time.monotonic - start
Note that this isn't the more common Crystal Redis shard. This is a Redis client I wrote that is significantly tuned to reduce heap allocations and remain light while supporting as much of Redis as I needed. You can find the code on GitHub.
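To give an idea of what "reducing heap allocations" can mean for a Redis client, here is a hypothetical sketch (in Rust, for consistency with the rest of this post, and not taken from that client) of serializing a command straight into one reusable buffer in the RESP wire format instead of building intermediate strings for each argument.

```rust
use std::io::Write;

/// Encode a Redis command into `buf` using the RESP protocol, reusing the
/// buffer's existing allocation across calls instead of building new Strings.
fn encode_command(buf: &mut Vec<u8>, args: &[&str]) {
    buf.clear(); // keep the capacity, drop the old contents
    write!(buf, "*{}\r\n", args.len()).unwrap();
    for arg in args {
        write!(buf, "${}\r\n", arg.len()).unwrap();
        buf.extend_from_slice(arg.as_bytes());
        buf.extend_from_slice(b"\r\n");
    }
}

fn main() {
    let mut buf = Vec::with_capacity(64);
    encode_command(&mut buf, &["SET", "foo", "bar"]);
    // The encoded command: "*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n"
    println!("{:?}", String::from_utf8_lossy(&buf));
}
```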
The Results
```
$ cargo run --release --example redis_app
Finished release [optimized] target(s) in 0.30s
Running `target/release/examples/redis_app`
568
```
It took our Rust app 568 milliseconds to send 400k commands to Redis and receive all of their results.
```
$ crystal run --release bench/bench_redis.cr
00:00:00.328368151
```
Our Crystal app took just 328 milliseconds to run the same commands. That means the Rust app took 73% more time (568 / 328 ≈ 1.73) to perform the exact same work as the Crystal app.
The Caveat
The hard part about benchmarking anything that connects to a server is that the server may actually be your bottleneck. With databases especially, it's easy to get stuck waiting on I/O. In our example apps, the Redis server was indeed capping out at 100% CPU but neither app was, which is why we stop at 400k commands — going beyond that wasn't actually providing any useful information.
So how can we find just the time our app spent on the CPU and ignore all the time we spent waiting on the server? It turns out the UNIX `time` command tells us exactly this. Instead of `cargo run` and `crystal run`, we'll compile our programs and run them directly through `time`:
```
$ cargo build --release --example redis_app
Finished release [optimized] target(s) in 0.26s
$ time target/release/examples/redis_app
563
target/release/examples/redis_app 0.28s user 0.04s system 48% cpu 0.656 total
```
Our Rust app used the CPU for 320ms (280ms in userland and 40ms in system calls).
```
$ crystal build --release bench/bench_redis.cr -o bin/bench_redis
$ time bin/bench_redis
00:00:00.327064055
bin/bench_redis 0.12s user 0.02s system 41% cpu 0.341 total
```
Our Crystal app used the CPU for 140ms (120ms in userland and 20ms in system calls). That means our Crystal app was 2.29x as fast on the CPU!
Also, it was interesting to see that both of these programs spent over half of their runtime waiting on Redis! As someone who has worked mostly in Ruby for 16 years, being able to saturate a Redis server with a single client is hilarious to me.
The End
The purpose of this post was not to say that Rust is slow. Rust is very fast. The idea was to see whether Rust was really the performance trailblazer we all thought it was, and it turns out Crystal's performance is just as good, if not way better, for cases like this.
One thing that strikes me is that you never hear people talk about using Rust and Go for how nice they are to read and write the way you hear people talk about Ruby. It's always about the performance. But somehow we don't hear people talking as much about Crystal for the same reasons. I wonder if it's because it resembles Ruby that people don't take it seriously. Rust and Go have curly braces everywhere, so they're fast, right? 😄
Anyway, if you use Ruby or Python for their expressiveness and Rust or Go for their performance, it might be worth writing a part of your app in Crystal to get both.
Maybe a long-running process would show a decrease in Crystal performance due to the garbage collector.