Most people think of Redis as an in-memory datastore, and that is true. However, there is a common misconception about the “in-memory” part: that if your Redis server crashes, you lose all of your data. That part is most definitely not true. Redis persists your data to disk and gives you all of the knobs you need to fine-tune how often that happens, while still eking out the performance you’d like to get out of Redis.
The two storage options
A picture is worth a thousand words, so here goes:
Option 1: Binary .rdb file
-----
| R |
| E |    Option 1     ----------------------
| D | --------------> | Binary File (.rdb) |
| I |                 ----------------------
| S |
-----
Option 2: A text file popularly known as an Append-Only File (AOF)
-----
| R |
| E |    Option 2     --------------------
| D | --------------> | Text File (.aof) |
| I |                 --------------------
| S |
-----
With either option, you can create a brand-new Redis server with the data from your old one by simply copying the .rdb or .aof file and pointing the new server at the copy.
What’s the difference?
Besides the obvious fact that one is binary and the other is text, you may ask what the difference between them is. The Append Only File (AOF) is basically a log of the write commands your Redis server has run: every operation that modifies your data gets written to the AOF. So if you did:
INCR USER_COUNT
INCR USER_COUNT
INCR USER_COUNT
In your AOF you will see all three increment operations. On the other hand, the binary .rdb file is basically a snapshot of all the keys and values in your redis server.
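To make the difference concrete, here is a toy Python model. It is only an illustration, not how Redis stores anything internally: the AOF is treated as a list of write commands, and the snapshot as the dictionary you get by replaying them. The helper names apply and replay are made up for this sketch, and only SET and INCR are modeled.

```python
# Toy model, not Redis internals: the AOF is a list of write commands and
# the RDB-style snapshot is the key/value state produced by replaying them.
from typing import Dict, List, Tuple

Command = Tuple[str, ...]


def apply(db: Dict[str, str], cmd: Command) -> None:
    """Apply one write command to a dict standing in for the Redis keyspace."""
    name, key, *args = cmd
    if name == "SET":
        db[key] = args[0]
    elif name == "INCR":
        # Redis keeps integers as strings; a missing key counts as 0.
        db[key] = str(int(db.get(key, "0")) + 1)
    else:
        raise ValueError(f"unsupported command in this sketch: {name}")


def replay(aof: List[Command]) -> Dict[str, str]:
    """Rebuild the dataset by replaying the log from the start."""
    db: Dict[str, str] = {}
    for cmd in aof:
        apply(db, cmd)
    return db


aof: List[Command] = [("INCR", "USER_COUNT")] * 3  # the AOF keeps all three commands
snapshot = replay(aof)                              # a snapshot keeps only the end result
print(len(aof), snapshot)                           # 3 {'USER_COUNT': '3'}
```

Both views describe the same data; the AOF simply also keeps the history that produced it.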
As you may have guessed, the AOF can grow really large since it logs every write operation, which is why Redis has a command called BGREWRITEAOF that, as the name suggests, rewrites the AOF in the background. The rewrite produces a much smaller file containing just the commands needed to rebuild the current dataset. Taking the example from above, let’s say that by the time you run BGREWRITEAOF, USER_COUNT has reached 10. The rewritten AOF will contain something along the lines of:
SET USER_COUNT 10
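A rewrite can be sketched the same way. The toy below uses hypothetical helpers replay and rewrite, handles strings only, and is not Redis’s actual rewrite code: it replays the log and emits one SET per key, which is the essence of why the rewritten file is so much smaller. Real Redis writes the result in its own protocol format and uses type-appropriate commands for lists, sets, and so on.

```python
# Toy sketch of what an AOF rewrite achieves, not Redis's implementation:
# replay the log, then emit one SET per key that recreates the final state.
from typing import Dict, List, Tuple

Command = Tuple[str, ...]


def replay(aof: List[Command]) -> Dict[str, str]:
    """Replay SET/INCR commands (the only ones modeled here) into a dict."""
    db: Dict[str, str] = {}
    for name, key, *args in aof:
        if name == "SET":
            db[key] = args[0]
        elif name == "INCR":
            db[key] = str(int(db.get(key, "0")) + 1)
        else:
            raise ValueError(f"unsupported command in this sketch: {name}")
    return db


def rewrite(aof: List[Command]) -> List[Command]:
    """Return a shorter log that rebuilds the same dataset when replayed."""
    return [("SET", key, value) for key, value in sorted(replay(aof).items())]


log: List[Command] = [("INCR", "USER_COUNT")] * 10  # ten logged increments
compact = rewrite(log)
print(compact)  # [('SET', 'USER_COUNT', '10')]
```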
Why even have two formats?
I mean, isn’t it going to be a pain to remember to keep running BGREWRITEAOF, or else risk running out of disk space? The AOF lets you do something special that the binary .rdb file simply can’t. Again, a picture will explain this way more clearly:
R1 --> /var/db/redis/file_one.aof
R2 --> /var/db/redis/file_two.aof
So you have two Redis servers, each writing to its own file. You can now do:
cat /var/db/redis/file_one.aof /var/db/redis/file_two.aof >> /var/db/redis/file_three.aof
And now bring up a new Redis server pointing at file_three.aof:
R3 --> /var/db/redis/file_three.aof
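And this new Redis server will have all of the data from servers R1 and R2. How cool is that?! This works because replaying the combined file simply replays R1’s commands followed by R2’s; if the same key exists on both servers, R2’s commands are applied last, so for a plain string key R2’s value wins. It does assume both files are plain command logs: if aof-use-rdb-preamble is enabled, a rewritten AOF begins with binary RDB data, and simple concatenation won’t work.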
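Why does a plain cat work? On disk, each command in a plain AOF is stored in the Redis serialization protocol (RESP): a *<count> header followed by each argument as a $<length>-prefixed string, all CRLF-terminated. Because every command is self-delimiting, two files glued end to end are still one valid stream. The sketch below uses hypothetical helpers encode, decode and replay; only SET is replayed, and anything else (such as SELECT) is ignored in this toy.

```python
# Sketch: why concatenating two plain AOFs works. Each command is a
# self-delimiting RESP array, so two files glued end to end stay one valid stream.
from typing import Dict, List


def encode(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def decode(stream: bytes) -> List[List[str]]:
    """Split a RESP byte stream back into a list of commands."""
    commands: List[List[str]] = []
    pos = 0
    while pos < len(stream):
        end = stream.index(b"\r\n", pos)
        argc = int(stream[pos + 1:end])        # skip the leading '*'
        pos = end + 2
        args: List[str] = []
        for _ in range(argc):
            end = stream.index(b"\r\n", pos)
            length = int(stream[pos + 1:end])  # skip the leading '$'
            start = end + 2
            args.append(stream[start:start + length].decode())
            pos = start + length + 2           # skip the payload and its CRLF
        commands.append(args)
    return commands


def replay(commands: List[List[str]]) -> Dict[str, str]:
    """Replay SET commands in order; anything else (e.g. SELECT) is ignored here."""
    db: Dict[str, str] = {}
    for cmd in commands:
        if cmd[0].upper() == "SET":
            db[cmd[1]] = cmd[2]
    return db


file_one = encode("SET", "user:1", "alice") + encode("SET", "shared", "from-R1")
file_two = encode("SET", "user:2", "bob") + encode("SET", "shared", "from-R2")
merged = file_one + file_two   # what `cat file_one.aof file_two.aof` produces
print(replay(decode(merged)))  # 'shared' ends up as 'from-R2'
```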
Merging data across all of your redis servers
Going off on a tangent for a bit here to rehash the trick discussed above. Let’s say you have multiple Redis servers running, and they are storing their data in the binary format. You would now like to merge all of their data together while the servers keep running, with zero downtime. Here’s one way to go about it:
redis-cli -p <first-server-port> CONFIG SET appendonly yes
redis-cli -p <first-server-port> BGREWRITEAOF
redis-cli -p <first-server-port> CONFIG SET appendonly no
redis-cli -p <second-server-port> CONFIG SET appendonly yes
redis-cli -p <second-server-port> BGREWRITEAOF
redis-cli -p <second-server-port> CONFIG SET appendonly no
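All this does is flip the AOF switch on, have each Redis server write out a fresh AOF, and then flip the switch back off so the servers return to RDB-only persistence. Turning appendonly on already kicks off a rewrite, so BGREWRITEAOF may simply reply that one is in progress. Either way, wait for the rewrite to finish (INFO persistence shows aof_rewrite_in_progress:0) before setting appendonly back to no, because disabling AOF kills a rewrite that is still running. You can now merge the two AOF files by cat'ing them into a third file as shown before, and voila: all of the data from your two Redis servers is in one AOF file that you can use to bring up a new Redis server (or keep as a backup).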
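If you want to make sure each rewrite has finished before turning appendonly back off, one option is to poll INFO persistence until both aof_rewrite_in_progress and aof_rewrite_scheduled report 0. This is a minimal sketch that assumes redis-cli is on your PATH; parse_info, rewrite_done and wait_for_aof_rewrite are hypothetical helper names, not part of any Redis client library.

```python
# Sketch, assuming redis-cli is on PATH: poll INFO persistence until no AOF
# rewrite is running or scheduled, so turning appendonly off won't cut it short.
import subprocess
import time
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Turn INFO output ('# Section' headers, 'field:value' lines) into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            fields[key] = value
    return fields


def rewrite_done(info_text: str) -> bool:
    """True once no AOF rewrite is in progress or queued."""
    info = parse_info(info_text)
    return (info.get("aof_rewrite_in_progress") == "0"
            and info.get("aof_rewrite_scheduled") == "0")


def wait_for_aof_rewrite(port: int, poll_seconds: float = 1.0) -> None:
    """Block until the server on `port` reports the rewrite has finished."""
    while True:
        result = subprocess.run(
            ["redis-cli", "-p", str(port), "INFO", "persistence"],
            capture_output=True, text=True, check=True,
        )
        if rewrite_done(result.stdout):
            return
        time.sleep(poll_seconds)
```

For example, you might call wait_for_aof_rewrite(6379) after the BGREWRITEAOF for a server on port 6379, and only then run its CONFIG SET appendonly no. Shelling out to redis-cli keeps the sketch dependency-free; a Redis client library would avoid parsing text output.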
Which one is better?
For most situations, using the binary format is better. When in doubt, avoid the AOF: you run the risk of running out of disk space if you forget to run BGREWRITEAOF (Redis can also trigger rewrites automatically; see the auto-aof-rewrite-percentage and auto-aof-rewrite-min-size settings). And if you do end up having to merge data across multiple Redis servers, you can always use the trick shown above. The binary format was also a little faster in a couple of redis-benchmark runs I did.
Fine tuning binary format save strategy
Redis lets you specify how often you’d like to take a snapshot of your Redis DB and persist it to disk. A typical redis.conf file (here’s an example of one) has entries in the SNAPSHOTTING section that look like:
save 900 1
save 300 10
save 60 10000
Read the first line as “Snapshot the in-memory DB to disk if 900 seconds have passed since the last save and at least 1 key has changed”. The second line as “Snapshot if 300 seconds have passed and at least 10 keys have changed”. And the last line as (you guessed it) “Snapshot if 60 seconds have passed and at least 10000 keys have changed”. A snapshot is taken as soon as any one of these rules is satisfied. Now you can play with these parameters and tweak them as you see fit. For instance, you could have something like:
save 5 1
Here you’ll be taking a snapshot of your DB every 5 seconds as long as at least a single key has changed. Obviously this comes with a performance cost (each snapshot writes your entire dataset to disk, and disk writes are time-consuming operations), but Redis does support persisting your data even at this paranoid level! Finally, you can force a snapshot at any time with the BGSAVE command.
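To check your reading of a set of save rules, here is a simplified Python model. The name snapshot_due is hypothetical, and Redis’s exact boundary handling (for example, whether exactly 900 seconds counts) may differ: a snapshot is due when any single rule has both its time threshold and its change threshold met.

```python
# Simplified model of "save <seconds> <changes>" rules, not Redis's code:
# a snapshot is due if ANY rule has both enough elapsed time and enough changes.
from typing import List, Tuple

DEFAULT_RULES: List[Tuple[int, int]] = [(900, 1), (300, 10), (60, 10000)]


def snapshot_due(seconds_since_save: int, changes: int,
                 rules: List[Tuple[int, int]] = DEFAULT_RULES) -> bool:
    """Return True if at least one (seconds, changes) rule is satisfied."""
    return any(seconds_since_save >= secs and changes >= min_changes
               for secs, min_changes in rules)


print(snapshot_due(120, 5))          # False: no rule has both thresholds met
print(snapshot_due(400, 50))         # True: the "save 300 10" rule is met
print(snapshot_due(61, 20000))       # True: the "save 60 10000" rule is met
print(snapshot_due(6, 1, [(5, 1)]))  # True: the paranoid "save 5 1" setup
```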
Fine tuning AOF save strategy
In AOF mode, Redis keeps logging the write operations it performs, first into an in-memory buffer. You can tell Redis how often you’d like that data flushed (fsync’ed) to disk. In the APPEND ONLY MODE section of a redis.conf file you will see an entry for appendfsync. Its three possible values are:
appendfsync always
appendfsync everysec
appendfsync no
The always and everysec options are pretty self-explanatory: fsync after every write, or fsync once per second. The no option is kind of a misnomer. With no set, Redis assumes no responsibility for deciding when to flush the AOF data to disk; it lets the operating system decide when that needs to happen. Of the three, no is generally the most performant while also being the most risky, and always is the safest but the slowest. The everysec option, which is also the default in redis.conf, seems the most popular, giving you the best of both worlds: good performance, with roughly a second of writes at risk if the machine goes down.
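The practical difference between the three policies is when fsync gets called after data is written. The toy writer below (a hypothetical ToyAOF class that writes one plain-text command per line, not Redis’s real file format) counts fsync calls under each policy. It is deliberately simplified: real Redis batches writes per event-loop iteration and, with everysec, performs the fsync in a background thread.

```python
# Toy AOF writer showing when each appendfsync policy calls fsync.
# Simplified: no write batching, no background fsync thread, plain-text lines.
import os
import tempfile
import time


class ToyAOF:
    def __init__(self, path: str, policy: str = "everysec") -> None:
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown appendfsync policy: {policy}")
        self.policy = policy
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.last_fsync = time.monotonic()
        self.fsyncs = 0

    def append(self, command: str) -> None:
        # write() hands the bytes to the OS page cache; they are not yet durable.
        os.write(self.fd, (command + "\n").encode())
        if self.policy == "always":
            self._fsync()  # durable before append() returns
        elif self.policy == "everysec" and time.monotonic() - self.last_fsync >= 1.0:
            self._fsync()  # at most about a second of writes is ever unsynced
        # "no": never fsync here; the OS flushes the page cache on its own schedule.

    def _fsync(self) -> None:
        os.fsync(self.fd)
        self.last_fsync = time.monotonic()
        self.fsyncs += 1

    def close(self) -> None:
        os.close(self.fd)


def demo(policy: str, n: int = 3) -> int:
    """Append n commands under a policy and return how many fsyncs happened."""
    with tempfile.TemporaryDirectory() as tmp:
        aof = ToyAOF(os.path.join(tmp, "appendonly.aof"), policy)
        for _ in range(n):
            aof.append("INCR USER_COUNT")
        aof.close()
        return aof.fsyncs


print(demo("always"), demo("no"))  # 3 0
```

With always, every append is followed by an fsync; with no, the sketch never calls fsync and leaves flushing entirely to the OS.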
Redis does store your data to disk and has numerous options that let you control how often you’d like it to persist your data. So if this is what is holding you back from using Redis, don’t let it anymore.