Version: Next

How to Enable Khepri

As of RabbitMQ 4.0, Mnesia is still the default metadata store backend. Khepri has to be explicitly enabled using the khepri_db feature flag.

This page demonstrates how to enable Khepri in various situations and what the user should be aware of.

important

While Khepri is fully supported in RabbitMQ 4.0.x, it does not have the 17 years of extensive use that Mnesia has. We encourage all RabbitMQ users to test Khepri thoroughly before adopting it in production.

It will be possible to upgrade from 4.0.x to future releases with Khepri enabled.

Terminology

The feature flags subsystem uses the words stable and experimental to qualify the maturity of feature flags.

An experimental feature flag is used in two situations:

To introduce changes early during development in order to get feedback. These changes could be reverted, upgrading a RabbitMQ node with such a feature flag enabled may not be possible, and support may not be provided.
For features that the RabbitMQ team has committed to and supports, until they are ready to be enabled by default, possibly replacing an older system.

Khepri in RabbitMQ 3.13.x belonged to the first group. Khepri in RabbitMQ 4.0 and later belongs to the second group and is therefore fully supported.

On a Brand New RabbitMQ Node

Using the CLI

Start the new RabbitMQ node using a method of your choice. The example below executes the rabbitmq-server(8) command directly:
bash:

rabbitmq-server

PowerShell:

rabbitmq-server.bat
At that point, the node is using Mnesia as the metadata store backend.

Enable the khepri_db feature flag:

bash:

# Opt-in to enable Khepri.
rabbitmqctl enable_feature_flag --experimental khepri_db

PowerShell:

# Opt-in to enable Khepri.
rabbitmqctl.bat enable_feature_flag --experimental khepri_db
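After the command returns, the state of the flag can be confirmed with `rabbitmqctl list_feature_flags`. The helper below is a minimal sketch: the function name `flag_is_enabled` is hypothetical, and it assumes that `rabbitmqctl --silent list_feature_flags name state` prints one whitespace-separated name/state pair per line.

```bash
#!/usr/bin/env bash
# Succeeds (exit status 0) if the feature flag named in $1 is "enabled".
# ASSUMPTION: stdin holds the output of
#   rabbitmqctl --silent list_feature_flags name state
# i.e. one "<name> <state>" pair per line, without table headers.
flag_is_enabled() {
  local flag=$1
  # Exit status 0 only if a line for this flag reports the "enabled" state.
  awk -v f="$flag" '$1 == f && $2 == "enabled" { found = 1 } END { exit !found }'
}
```

For example: `rabbitmqctl --silent list_feature_flags name state | flag_is_enabled khepri_db && echo "Khepri is enabled"`.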

See the next page to learn more about what happens when nodes with Mnesia and nodes with Khepri are clustered together.

Using the Management UI

Start the new RabbitMQ node using a method of your choice. See the example above.

At that point, the node is using Mnesia as the metadata store backend.

Enable the management plugin:

bash:

rabbitmq-plugins enable rabbitmq_management

PowerShell:

rabbitmq-plugins.bat enable rabbitmq_management

Open and log in to the management UI.
Navigate to "Admin > Feature Flags".
Tick "I understand the risk" and click the "Enable" button:

The experimental feature flags section in the management UI

Using an Environment Variable

Use the $RABBITMQ_FEATURE_FLAGS environment variable to set the list of feature flags to enable at boot time on a new node. The variable must be set to the exhaustive list of feature flags to enable on this node. It is only considered on the very first boot and is ignored afterwards.

warning

The use of this variable requires caution: because the variable takes an exhaustive list, all feature flags that must be enabled in a given cluster must be listed.

Start the new RabbitMQ node using a method of your choice, setting the $RABBITMQ_FEATURE_FLAGS variable in the process. The example below executes the rabbitmq-server(8) command directly:

bash:

env RABBITMQ_FEATURE_FLAGS="khepri_db,..." rabbitmq-server

PowerShell:

$Env:RABBITMQ_FEATURE_FLAGS = 'khepri_db,...'
rabbitmq-server.bat

To keep it short, this example does not list the other feature flags: you need to complete the list yourself.
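One way to fill in the list is to start from the feature flags already enabled in the cluster the new node is meant to join, and append khepri_db. The sketch below is an illustration under assumptions: the function name `feature_flags_with_khepri` is hypothetical, and its input is assumed to be the output of `rabbitmqctl --silent list_feature_flags name state` run against an existing cluster node, one whitespace-separated name/state pair per line.

```bash
#!/usr/bin/env bash
# Prints a comma-separated value for RABBITMQ_FEATURE_FLAGS: every flag
# reported as "enabled" on stdin, followed by khepri_db.
# ASSUMPTION: stdin holds "<name> <state>" pairs, one per line, as printed by
#   rabbitmqctl --silent list_feature_flags name state
feature_flags_with_khepri() {
  {
    # Keep the flags already enabled in the cluster; khepri_db is added once below.
    awk '$2 == "enabled" && $1 != "khepri_db" { print $1 }'
    echo khepri_db
  } | paste -sd, -
}
```

For example, `export RABBITMQ_FEATURE_FLAGS="$(rabbitmqctl -n rabbit@existing-node --silent list_feature_flags name state | feature_flags_with_khepri)"` followed by `rabbitmq-server`, where `rabbit@existing-node` is a placeholder for a running node of that cluster.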

The RabbitMQ node will use Khepri right from the beginning.

On an Existing Standalone Node or Cluster

Like any other feature flag, Khepri can be enabled when all cluster nodes are online and the cluster is healthy. Khepri cannot be enabled while a node or the entire cluster is stopped.

To enable Khepri, use either the CLI command or the management UI method described above.

The migration of the existing data from Mnesia to Khepri runs in parallel with the regular activities of RabbitMQ. However, the migration consumes resources and pauses other activities for a short period of time near the end of the process. Therefore, perform it away from peak load.
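Before enabling the flag on an existing cluster, the precondition above can be partly scripted. The sketch below only checks that every listed node is running (with `rabbitmq-diagnostics check_running`) before calling `rabbitmqctl enable_feature_flag`; it does not verify overall cluster health, and the list of cluster nodes must be supplied by the operator. The function name `enable_khepri_on_cluster` and the `RMQ_DIAGNOSTICS_CMD`/`RMQ_CTL_CMD` override variables are hypothetical; the variables only exist so the real tools can be replaced by stubs.

```bash
#!/usr/bin/env bash
# Enables khepri_db only if every node passed as an argument is running.
# HYPOTHETICAL: RMQ_DIAGNOSTICS_CMD and RMQ_CTL_CMD let tests swap in stubs;
# they default to the real CLI tools.
enable_khepri_on_cluster() {
  local diag=${RMQ_DIAGNOSTICS_CMD:-rabbitmq-diagnostics}
  local ctl=${RMQ_CTL_CMD:-rabbitmqctl}
  local node
  if (( $# == 0 )); then
    echo "usage: enable_khepri_on_cluster NODE..." >&2
    return 2
  fi
  for node in "$@"; do
    # The feature flag cannot be enabled while any cluster node is stopped.
    if ! "$diag" -q -n "$node" check_running; then
      echo "$node is not running; not enabling khepri_db" >&2
      return 1
    fi
  done
  # Feature flags are cluster-wide, so enabling it through one node is enough.
  "$ctl" -n "$1" enable_feature_flag --experimental khepri_db
}
```

For example: `enable_khepri_on_cluster rabbit@node1 rabbit@node2 rabbit@node3`, where the node names are placeholders.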

What Happens When Khepri is Enabled?

The migration from Mnesia to Khepri is the responsibility of the khepri_mnesia_migration library.

This library performs the migration in two phases:

It synchronizes the cluster membership from Mnesia to Khepri.
It copies records from Mnesia tables to the Khepri store.

Step 1: Cluster Membership Synchronization

The common situation is that Khepri is enabled in a Mnesia-based cluster and thus all nodes involved are single isolated nodes from Khepri’s point of view.

To be extra safe and avoid losing data in case some nodes were already clustered at the Khepri level too, khepri_mnesia_migration applies several conditions to make sure the resulting Khepri cluster is deterministic. It goes through the following steps:

It queries the list of members of the Mnesia cluster. This is the baseline list of nodes to cluster in Khepri too.
It queries each node for the members of its Khepri cluster. Usually, Khepri is not clustered yet, so each node only returns itself.
It sorts the list of Khepri "clusters" according to the following criteria:
1. the cluster size (i.e. the number of members)
2. the number of records in the Khepri store
3. the node uptime
4. the node name
Therefore, if some nodes were already clustered at the Khepri level, the largest Khepri cluster (set of nodes) is sorted first.

Usually, however, the nodes are unclustered and are thus sorted by node uptime and name.
It selects the first Khepri "cluster" in that sorted list and adds all other nodes to it.
If some nodes were clustered at the Khepri level but are not members of the Mnesia cluster, they are removed from Khepri.
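The selection and reconciliation steps above can be sketched as a toy bash model. This is not how khepri_mnesia_migration, an Erlang library, implements them. The one-cluster-per-line input format, the function names `pick_khepri_cluster` and `plan_membership`, and the sort direction (more records and longer uptime win, then node names in ascending order) are assumptions.

```bash
#!/usr/bin/env bash
# Toy model of the Khepri cluster selection and membership reconciliation.
# HYPOTHETICAL input for pick_khepri_cluster, one candidate cluster per line:
#   <size> <records> <uptime_seconds> <node_name>
# ASSUMPTION: bigger size, more records and longer uptime sort first;
# node names break the remaining ties in ascending (C locale) order.
pick_khepri_cluster() {
  LC_ALL=C sort -k1,1nr -k2,2nr -k3,3nr -k4,4 | awk 'NR == 1 { print $4 }'
}

# $1: Mnesia cluster members, $2: members of the selected Khepri cluster
# (files with one node name per line). Prints one action per line.
plan_membership() {
  local mnesia khepri
  mnesia=$(LC_ALL=C sort -u "$1")
  khepri=$(LC_ALL=C sort -u "$2")
  # Mnesia members missing from the selected Khepri cluster are added to it.
  LC_ALL=C comm -23 <(echo "$mnesia") <(echo "$khepri") | sed -n 's/^./add &/p'
  # Khepri members that are not in the Mnesia cluster are removed.
  LC_ALL=C comm -13 <(echo "$mnesia") <(echo "$khepri") | sed -n 's/^./remove &/p'
}
```

With unclustered nodes of equal store size, the sort falls through to uptime and then to the node name, matching the usual case described above.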

Step 2: Schema Records Copy

Once the cluster membership view is the same between Mnesia and Khepri, khepri_mnesia_migration can proceed with the actual migration of the data. It performs the copy while permitting writes in Mnesia until the very last moment.

The copy relies on callback modules provided by RabbitMQ. These callback modules are responsible for telling khepri_mnesia_migration that record $record from table $table goes into Khepri path $path, after possibly doing some record conversion.

Here are the steps of the data copying algorithm:

1. khepri_mnesia_migration records in Khepri that a migration is in progress.
2. It subscribes to all Mnesia updates.
3. It does the first copy from Mnesia to Khepri using the Mnesia Backup & Restore API. This copy is based on a point-in-time checkpoint in Mnesia, so the view is consistent.
4. It marks all Mnesia tables as read-only. This is where activities in RabbitMQ are paused; client operations may time out as a consequence.
5. All updates received through the Mnesia subscription set up in step 2 are now consumed and written to Khepri. Because the tables are read-only, the stream of updates is guaranteed to end.
6. It marks the migration as complete. RabbitMQ resumes activities and uses Khepri from now on.
7. It proceeds with the cleanup: the Mnesia tables are deleted.
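The copy algorithm above, together with the rollback described below, can be illustrated with a toy, single-process bash model. It only shows why consuming the queued subscription updates after making the tables read-only loses no write. It is not how khepri_mnesia_migration (an Erlang library) is implemented, and the data structures and function names (`mnesia_write`, `migrate`, `rollback`) are hypothetical.

```bash
#!/usr/bin/env bash
# Toy, single-process model of the copy algorithm (steps 1 to 7 above).
# HYPOTHETICAL: each store is an associative array of key -> value, only
# writes (no deletes) are modelled, and every name here is illustrative.
declare -gA mnesia=() khepri=()
pending=()       # "key=value" updates delivered by the Mnesia subscription
subscribed=0
read_only=0
migration=""     # "", in_progress, complete or rolled_back

# A regular RabbitMQ write while Mnesia is still the active store.
mnesia_write() {
  if (( read_only )); then
    echo "rejected $1: Mnesia tables are read-only" >&2
    return 1
  fi
  mnesia[$1]=$2
  if (( subscribed )); then pending+=("$1=$2"); fi
}

# Undo everything and keep using Mnesia.
rollback() {
  khepri=()
  pending=()
  subscribed=0
  read_only=0
  migration=rolled_back
}

# $1 (optional): a command run after the checkpoint is taken, standing in for
# the writes RabbitMQ keeps doing during the bulk copy. Failure = copy error.
migrate() {
  local during_copy=${1:-true} key entry
  local -A checkpoint=()
  migration=in_progress                 # 1. mark the migration in progress
  subscribed=1                          # 2. subscribe to Mnesia updates
  for key in "${!mnesia[@]}"; do        # 3. consistent point-in-time view...
    checkpoint[$key]=${mnesia[$key]}
  done
  if ! "$during_copy"; then rollback; return 1; fi
  for key in "${!checkpoint[@]}"; do    #    ...copied to Khepri
    khepri[$key]=${checkpoint[$key]}
  done
  read_only=1                           # 4. pause writes to Mnesia
  for entry in "${pending[@]}"; do      # 5. drain the subscription: the queue
    khepri[${entry%%=*}]=${entry#*=}    #    is finite since writes are paused
  done
  pending=()
  subscribed=0
  migration=complete                    # 6. RabbitMQ now uses Khepri
  read_only=0
  mnesia=()                             # 7. cleanup
}
```

A real migration runs these steps across a cluster while RabbitMQ keeps serving clients, which a sequential script cannot reproduce; the `during_copy` hook is only a stand-in for that concurrency.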

Rollback In Case of an Error

If there is an error during this process, everything is rolled back and RabbitMQ will resume activities using Mnesia as before.

Terminology​

On a brand new RabbitMQ node​

Using the CLI​

Using the Management UI​

Using an Environment Variable​

On an Existing Standalone Node or Cluster​

What Happens When Khepri is Enabled?​

Step 1: Cluster Membership Synchronization​

Step 2: Schema Records Copy​

Rollback In Case of an Error​