Deploy a Highly Available Redis Cache Cluster

In this blog we will detail a relatively easy way to get a functioning Redis Cluster that caches session data and tokens generated by the Gluu Server. The lessons here apply to any other application that uses Redis for caching, as long as its client library supports the Redis Cluster protocol. See Redis Clients. Gluu Server uses Jedis.

Before starting, you should already have Gluu Server installed, either standalone or clustered. Each node on which you’d like to deploy Redis Cluster workers should have redis-server installed. We tested this with Redis 4.0.9.

Configure and Deploy Redis-Cluster

Create three configuration files, a.conf, b.conf, and c.conf, on each node you want the Redis-cluster to be on. They should be configured as follows:

a.conf

protected-mode no
port 6379
pidfile /var/run/redis_6379.pid
cluster-enabled yes
cluster-config-file nodes-6379.conf
cluster-node-timeout 2000
cluster-slave-validity-factor 0
save ""
appendonly no
auto-aof-rewrite-percentage 0
tcp-backlog 511
loglevel notice
logfile /var/log/redis_6379.log

b.conf

protected-mode no
port 6380
pidfile /var/run/redis_6380.pid
cluster-enabled yes
cluster-config-file nodes-6380.conf
cluster-node-timeout 2000
cluster-slave-validity-factor 0
save ""
appendonly no
auto-aof-rewrite-percentage 0
tcp-backlog 511
loglevel notice
logfile /var/log/redis_6380.log

c.conf

protected-mode no
port 6381
pidfile /var/run/redis_6381.pid
cluster-enabled yes
cluster-config-file nodes-6381.conf
cluster-node-timeout 2000
cluster-slave-validity-factor 0
save ""
appendonly no
auto-aof-rewrite-percentage 0
tcp-backlog 511
loglevel notice
logfile /var/log/redis_6381.log
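
Since the three files differ only in their port number, they can be generated from a single template. The following is a minimal sketch under a few assumptions: the output directory ./redis-cluster-conf and the helper name write_conf are illustrative and not part of Redis or Gluu, and the settings are copied from the listings above.

```shell
#!/bin/sh
# Sketch: generate a.conf, b.conf and c.conf from one template.
# CONF_DIR and write_conf are illustrative names, not part of Redis or Gluu.
CONF_DIR="${CONF_DIR:-./redis-cluster-conf}"

write_conf() {
    # $1 = file name, $2 = client port
    cat > "$CONF_DIR/$1" <<EOF
protected-mode no
port $2
pidfile /var/run/redis_$2.pid
cluster-enabled yes
cluster-config-file nodes-$2.conf
cluster-node-timeout 2000
cluster-slave-validity-factor 0
save ""
appendonly no
auto-aof-rewrite-percentage 0
tcp-backlog 511
loglevel notice
logfile /var/log/redis_$2.log
EOF
}

mkdir -p "$CONF_DIR"
write_conf a.conf 6379
write_conf b.conf 6380
write_conf c.conf 6381
```

Run it once per node, then point redis-server at the generated files.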

Ideally, you should have 3 or more Redis-cluster nodes. If a node fails, a majority of the masters must still be reachable to form a quorum and promote a slave to master; with three masters, that means at least 2 must remain active.
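
The quorum arithmetic can be illustrated with a short sketch. It only demonstrates the majority rule; majority is a hypothetical helper name, not a Redis command.

```shell
#!/bin/sh
# Sketch: how many masters must stay reachable to authorise a failover.
# majority is an illustrative helper, not part of Redis.
majority() {
    echo $(( $1 / 2 + 1 ))
}

for masters in 3 5 7; do
    need=$(majority "$masters")
    echo "$masters masters: $need must agree, $(( masters - need )) may fail"
done
```

With the three masters used in this post, the cluster can lose one master and still elect a replacement.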

Launch a.conf, b.conf and c.conf on every node:

redis-server /path/to/a.conf & redis-server /path/to/b.conf & redis-server /path/to/c.conf

They should be listening on ports 6379, 16379, 6380, 16380, 6381, and 16381 on all servers. Note that the 163* ports (each client port plus 10000) are used for Redis-to-Redis gossip communication, so they must be reachable between the servers.
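
As a quick illustration of the port scheme, the sketch below derives each gossip port from its client port. The helper name bus_port is an assumption made for this example.

```shell
#!/bin/sh
# Sketch: the cluster gossip (bus) port is the client port plus 10000.
# bus_port is an illustrative helper, not part of Redis.
bus_port() {
    echo $(( $1 + 10000 ))
}

for port in 6379 6380 6381; do
    echo "client port $port -> gossip port $(bus_port "$port")"
done
```

Any firewall between the servers needs to allow both ports of each pair.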

Next, install Ruby and RubyGems, then the redis gem that the Redis-cluster Ruby script depends on.

gem install redis

After that, locate redis-trib.rb (in the Redis source distribution it lives under src/) and use it to create the cluster:

/path/to/redis-trib.rb create --replicas 2 ${NODE_1}:6379 ${NODE_1}:6380 ${NODE_1}:6381 \
${NODE_2}:6380 ${NODE_2}:6379 ${NODE_2}:6381 \
${NODE_3}:6380 ${NODE_3}:6379 ${NODE_3}:6381

The script prints the master and slave layout it proposes and asks you to confirm it. Type yes.

Note that there’s also a Python Redis cluster script that can be seen here. I have only tested redis-trib.py.

Configure Gluu Server

In either the oxTrust/Identity GUI (/identity/configuration/update) or LDAP directly, set cacheProviderType to REDIS.

In redisConfiguration:

  • Change redisProviderType to CLUSTER
  • Add the following to oxCacheConfig for Redis servers.
    • ${NODE_1}:6379,${NODE_1}:6380,${NODE_1}:6381,${NODE_2}:6379,${NODE_2}:6380,${NODE_2}:6381,${NODE_3}:6379,${NODE_3}:6380,${NODE_3}:6381
    • In the above, ${NODE_N} refers to the servers you have Redis-cluster workers deployed on.
  • Set config to CLUSTER instead of STANDALONE.
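
Typing the server list by hand is error-prone, so it can be assembled with a small loop. This is a sketch only: the IP addresses are placeholders for your own nodes, and server_list is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: build the comma-separated host:port list for the Gluu cache settings.
# The addresses are placeholders; server_list is an illustrative helper.
NODES="1.1.1.1 2.2.2.2 3.3.3.3"
PORTS="6379 6380 6381"

server_list() {
    list=""
    for node in $NODES; do
        for port in $PORTS; do
            # Prefix a comma only when the list already has an entry.
            list="${list:+$list,}$node:$port"
        done
    done
    echo "$list"
}

server_list
```

Paste the printed line into the Redis servers field.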

Things to be Aware Of

When a node or a worker fails, the Redis-cluster quorum promotes a slave of the failed master: the slave presumed to be most up to date for the missing master’s keyring (its range of hash slots) becomes the new master. This can lead to a couple of problems going forward.

In the example provided above, the cluster is configured as follows:

Node 1:

Keyring a [master]
Keyring b [slave]
Keyring c [slave]

Node 2:

Keyring a [slave]
Keyring b [master]
Keyring c [slave]

Node 3:

Keyring a [slave]
Keyring b [slave]
Keyring c [master]

After a failover, there won’t be a master on every single node. When a downed node comes back up and rejoins the cluster, its former masters return as slaves, so you have to manually redistribute masters and slaves around the cluster. Start by listing the current layout:

redis-cli -p <$REDIS_PORT> CLUSTER NODES

You’ll get an output like the following:

49115bec337bb5194f67595a46ab9c1304f1a5f3 3.3.3.3:6380@16380 master - 0 1528484579070 7 connected 10923-16383
17692c7c50ec70d680c8f7751bd9ee8adbaa2326 2.2.2.2:6379@16379 slave f3f9350bb32fb3df0bf5c22c68a899e871286d37 0 1528484579070 5 connected
fce995e6a9808b0aee5b8f560705ba7d04fa2d0b 1.1.1.1:6380@16380 slave 49115bec337bb5194f67595a46ab9c1304f1a5f3 0 1528484579070 7 connected
afb84fa160435525e47f6cf3eeeb3a339b8971ac 3.3.3.3:6379@16379 slave f3f9350bb32fb3df0bf5c22c68a899e871286d37 0 1528484579773 8 connected
f73cb422e3464faf46067574e182cbd88eab901f 1.1.1.1:6381@16381 slave f63d2d73bd223bde1ff69762f6f2684038985435 0 1528484579070 4 connected
f63d2d73bd223bde1ff69762f6f2684038985435 2.2.2.2:6380@16380 master - 0 1528484579070 4 connected 5461-10922
895513958a17c0ceb3a95512d2bc3611b0c38ad5 2.2.2.2:6381@16381 slave 49115bec337bb5194f67595a46ab9c1304f1a5f3 0 1528484579000 7 connected
f3f9350bb32fb3df0bf5c22c68a899e871286d37 1.1.1.1:6379@16379 myself,master - 0 1528484579000 1 connected 0-5460
25b63783b167414e28034a21d24ba554ede8d4eb 3.3.3.3:6381@16381 slave f63d2d73bd223bde1ff69762f6f2684038985435 0 1528484579070 9 connected

A simple breakdown:

49115bec337bb5194f67595a46ab9c1304f1a5f3 is a node_id. It identifies the Redis server listening on port 6380 on the server with the IP address 3.3.3.3, whose gossip port is 16380. It is a master, so the field that would name the master it replicates is -, meaning none. The final field, 10923-16383, is the range of hash slots it serves.

The slaves of this master are the entries that list its node_id in that field: fce995e6a9808b0aee5b8f560705ba7d04fa2d0b (on port 6380 of 1.1.1.1) and 895513958a17c0ceb3a95512d2bc3611b0c38ad5 (on port 6381 of 2.2.2.2).

Use this information to determine which servers lack a master, or the appropriate slaves, so that you can restore the cluster to its original redundant state. Each line shows which server a Redis worker runs on and, if it’s a slave, which master it is replicating. The worker you are connected to is flagged with the myself marker.
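
Reading this output by eye gets tedious with nine workers, so here is a minimal sketch that groups slaves under their master. It assumes the field layout shown in the sample output above; summarise_nodes is a hypothetical helper name, not a Redis command. The demo input reuses three lines from that sample.

```shell
#!/bin/sh
# Sketch: summarise CLUSTER NODES output as "master slots <range> <- slaves".
# summarise_nodes is an illustrative helper, not a Redis command.
summarise_nodes() {
    awk '
        {
            split($2, addr, "@")              # drop the @gossip-port suffix
            if ($3 ~ /master/) {
                master[$1] = addr[1]          # node_id -> ip:port
                slots[$1] = $9                # first slot range served
            } else if ($3 ~ /slave/) {
                slaves[$4] = slaves[$4] " " addr[1]
            }
        }
        END {
            for (id in master)
                print master[id], "slots", slots[id], "<-" slaves[id]
        }
    ' | sort
}

# Demo with three lines taken from the sample output above.
summarise_nodes <<'EOF'
49115bec337bb5194f67595a46ab9c1304f1a5f3 3.3.3.3:6380@16380 master - 0 1528484579070 7 connected 10923-16383
fce995e6a9808b0aee5b8f560705ba7d04fa2d0b 1.1.1.1:6380@16380 slave 49115bec337bb5194f67595a46ab9c1304f1a5f3 0 1528484579070 7 connected
895513958a17c0ceb3a95512d2bc3611b0c38ad5 2.2.2.2:6381@16381 slave 49115bec337bb5194f67595a46ab9c1304f1a5f3 0 1528484579000 7 connected
EOF
# prints: 3.3.3.3:6380 slots 10923-16383 <- 1.1.1.1:6380 2.2.2.2:6381
```

Against a live worker, pipe the real output into the helper, for example redis-cli -p 6379 CLUSTER NODES | summarise_nodes.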

The idea is to make sure that every master has one slave replicating it on each of the other servers. To promote a slave to master, run the following command against that slave’s port:

redis-cli -p <$REDIS_PORT> CLUSTER FAILOVER

This promotes the Redis worker on that port to master, and the previous master becomes its slave. The same approach can rescue a cluster that is failing because too many nodes went down for the remaining masters to reach a quorum on promoting a slave. In that case, intervene manually with:

redis-cli -p <$REDIS_PORT> CLUSTER FAILOVER TAKEOVER

This bypasses the quorum requirement and forces the Redis worker on that port to become a master. Additional documentation can be found here.

Now, use the CLUSTER NODES output again to check whether each node’s slaves are in the same redundant configuration as before. If a master is missing a slave on one of the other nodes, run the following command against the slave you want to reassign:

redis-cli -p <$REDIS_PORT> CLUSTER REPLICATE <$REDIS_MASTER_NODE-ID>

Repeat this until you have a master on every node and, on each node, a slave of every master hosted elsewhere.
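
To check the first half of that goal, the sketch below counts masters per server from CLUSTER NODES output. It assumes IPv4 addresses and the field layout shown earlier; masters_per_host is a hypothetical helper name, not a Redis command. The demo input reuses two lines from the sample output.

```shell
#!/bin/sh
# Sketch: count masters per server from CLUSTER NODES output (IPv4 only).
# masters_per_host is an illustrative helper, not a Redis command.
masters_per_host() {
    awk '
        {
            split($2, addr, ":")                       # addr[1] is the server IP
            if (!(addr[1] in count)) count[addr[1]] = 0
            if ($3 ~ /master/) count[addr[1]]++
        }
        END { for (host in count) print host, count[host] }
    ' | sort
}

# Demo with two lines taken from the sample output above.
masters_per_host <<'EOF'
f3f9350bb32fb3df0bf5c22c68a899e871286d37 1.1.1.1:6379@16379 myself,master - 0 1528484579000 1 connected 0-5460
17692c7c50ec70d680c8f7751bd9ee8adbaa2326 2.2.2.2:6379@16379 slave f3f9350bb32fb3df0bf5c22c68a899e871286d37 0 1528484579070 5 connected
EOF
```

A server that reports 0 still needs one of its slaves promoted with CLUSTER FAILOVER, and a server that reports 2 or more holds a master that belongs elsewhere.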

Further reading

Redis Cluster tutorial

Redis Cluster CLI Commands

Redis Cluster Spec

Redis.trib Cluster Cheat Sheet 
