Galera Vs Primary Replica Replication

Introduction

First up: neither Galera nor a Primary-Replica setup is high availability (HA) on its own. Galera is a multi-master replication system allowing two, preferably three or more, nodes to replicate data amongst themselves, providing some HA and load balancing. Although Galera can run with as few as two nodes, it is not advised due to split-brain cases.

Galera can keep operating with only one node left alive and will not start failing just because of the reduced node count; however, once you are down to two remaining nodes you are in danger of split-brain issues.

The closest you can get to Galera functionality, and it is not really that close at all, is a Primary-Primary replication, or as it was called earlier, Master-Master replication. A Primary-Replica, or Master-Slave, replication is not comparable to either of the others.

There are as of now a few limitations with Galera:

  • It only accepts the ROW binlog format for now.
  • Only MariaDB Galera accepts MyISAM tables, and only in experimental mode.

Note There should be no reason to use MyISAM tables, it's the year 2022+ after all. It should be safe to convert those tables to InnoDB.
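A conversion can be sketched like this; the schema/table names in the ALTER are placeholders, and you should always back up before converting:

```sql
-- Find remaining MyISAM tables (information_schema is standard in MySQL/MariaDB):
SELECT table_schema, table_name
FROM information_schema.tables
WHERE engine = 'MyISAM'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');

-- Convert one of them (mydb.mytable is a placeholder name):
ALTER TABLE mydb.mytable ENGINE=InnoDB;
```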

Difference between MySQL and Galera replication

Replication Concepts

Any modification to the MySQL database generates an event in binary format. This is transported to the other nodes depending on the replication method chosen - MySQL replication or Galera replication.

MySQL Replication

Here is a diagram showing the flow of a successful transaction using MySQL.

The event (transaction) is written to the binary log on the Primary. The Replica pulls the event from the Primary's binary log into its own relay log asynchronously and then executes it. Because this is asynchronous replication, the Replica is not guaranteed to have the data at the moment the Primary commits the transaction; the event first has to be pulled from the Primary and executed.

In a perfect world, the Replica should be configured as a read-only server using either:

  • read_only=ON
  • super_read_only=ON

Note super_read_only was added in MySQL 5.7.8, so it will not work on earlier versions. And since the closest thing to Galera is a Primary-Primary configuration, you cannot set read_only or super_read_only=ON there; it would make absolutely no sense.

Important When using a Primary-Replica setup, the Replica should always be in read-only mode.
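A minimal my.cnf sketch for the read-only settings above (assuming a stock MySQL [mysqld] section; remember super_read_only requires MySQL 5.7.8+):

```ini
[mysqld]
# Reject writes from ordinary users on the Replica
read_only = ON
# Also reject writes from users holding the SUPER privilege (MySQL 5.7.8+)
super_read_only = ON
```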

Galera Replication

Here is a diagram showing the flow of a successful transaction using Galera.

The event (transaction) is transformed into a writeset on the node where it originated (this can be any node) and broadcast to the other nodes in the cluster. The writeset undergoes inspection and certification on every Galera node; if it is accepted, each node applies the writeset, i.e. the transaction, asynchronously. This means the other nodes will eventually be in sync, i.e. become consistent, but only after agreement among all participating nodes in the cluster. Replication is logically synchronous, but the actual writing and committing to the tablespace happens independently, and thus asynchronously, on each node, with a guarantee that the change propagates to all nodes.

Avoiding Primary Key Collision

In order to deploy MySQL replication in a Primary-Primary setup, one has to adjust the auto-increment settings to avoid primary key collisions on INSERT between two or more replicating masters. You can read more on Primary-Primary key collisions [Here]

Galera Cluster controls the auto-increment value and offset automatically via the wsrep_auto_increment_control variable. If set to 1, which is the default, it automatically adjusts the auto_increment_increment and auto_increment_offset variables according to the number of nodes in the cluster; this also happens whenever the number of nodes in the cluster changes. This prevents replication conflicts regarding auto_increment.

The consequence is that the auto-increment values will not be in sequential order. See this example with three nodes:

Node     auto_increment_increment   auto_increment_offset   Auto increment values
Node 1   3                          1                       1, 4, 7, 10, 13, 16
Node 2   3                          2                       2, 5, 8, 11, 14, 17
Node 3   3                          3                       3, 6, 9, 12, 15, 18
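The interleaving in the table above can be sketched in a few lines of Python (a toy model of how the two variables combine, not Galera code):

```python
def auto_increment_values(offset, increment, count):
    """Values a node generates given auto_increment_offset and auto_increment_increment."""
    return [offset + i * increment for i in range(count)]

# Three-node cluster: increment = number of nodes, offset = node index.
for node in (1, 2, 3):
    print(f"Node {node}: {auto_increment_values(node, 3, 6)}")
# Node 1: [1, 4, 7, 10, 13, 16]
# Node 2: [2, 5, 8, 11, 14, 17]
# Node 3: [3, 6, 9, 12, 15, 18]
```

Because each node starts at a different offset and steps by the node count, no two nodes can ever generate the same auto-increment value.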

This means when using Primary - Primary or Galera replication the applications must be able to cope with non-sequential auto-increment values in its dataset.

Data Consistency

Galera Cluster comes with its own flow-control mechanism: each node in the cluster must keep up when replicating, or all other nodes will slow down to let the slowest node catch up. This helps significantly with replica lag, which on the other hand is very common when using Primary-Primary replication.

By default, Galera allows a node to fall up to about 16 transactions (writesets) behind before flow control kicks in. If you have critical reads, i.e. a SELECT statement that must return the most up-to-date information, you can add or adjust the session variable wsrep_sync_wait.
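A critical read could be wrapped roughly like this (wsrep_sync_wait bit value 1 enables causality checks for read statements; the table and column names are placeholders):

```sql
-- Enforce cluster-wide causality checks for reads in this session
SET SESSION wsrep_sync_wait = 1;

SELECT balance FROM accounts WHERE id = 42;  -- placeholder query

-- Switch it off again to avoid the extra latency on non-critical reads
SET SESSION wsrep_sync_wait = 0;
```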

Galera comes with a safeguard against data inconsistency: a node will be evicted from the cluster if it fails to apply any writeset, for whatever reason. So if a node fails to apply a writeset (transaction) due to an error from the storage engine, the node will pull itself out of the cluster with the following error:

280104 13:09:27 [ERROR] WSREP: Failed to apply trx 1 4 times
280104 13:09:27 [ERROR] WSREP: Node consistency compromized, aborting..

To fix the data inconsistency, the offending node has to be re-synced before it is allowed to rejoin the cluster. This can be done manually or by wiping out the data directory to trigger a state snapshot transfer (a full sync from a donor).

MySQL Primary-Replica replication has no data consistency protection. This means a Replica can diverge, replicate only a subset of the data, or lag behind, making it inconsistent with the Primary. It works by replicating data in one direction, from the Primary down to the Replicas. Consistency checks have to be done manually or via external tools.

Node Consensus And Failover

Galera uses Group Communication System to check node consensus and availability between cluster members. If a node is unhealthy, it will be automatically evicted from the cluster.

When using MySQL Replication, the Primary does not care about, and has more or less no knowledge of, its Replicas. So if a Primary goes down, replication is broken until the Primary comes back up healthy. If the Primary is beyond saving, you have some serious work ahead of you.

The health checks involved when using MySQL replication include at least the following:

  • Seconds_Behind_Master
  • Slave_IO_Running
  • Slave_SQL_Running
  • read_only variable
  • super_read_only variable (MySQL 5.7.8 and later)
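These can be inspected on the Replica roughly like this (SHOW SLAVE STATUS is the classic statement; newer MySQL versions also accept SHOW REPLICA STATUS):

```sql
-- On the Replica: the output includes Seconds_Behind_Master,
-- Slave_IO_Running and Slave_SQL_Running
SHOW SLAVE STATUS\G

-- Check the read-only settings
SHOW VARIABLES LIKE '%read_only%';
```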

This is not the case with Galera.

Also keep in mind that since MySQL replication is asynchronous, you can end up with data inconsistency, meaning you cannot be sure that all changes done on the Primary have propagated to the Replicas.

Provisioning

MySQL Replication

For MySQL replication, provisioning a new node is a manual process consisting of the following tasks:

  • Configure the new node to be able to communicate with the Primary.
  • Manually make a backup of the database(s) you want to replicate.
  • Manually import the backed-up database(s) into the Replica.
  • Manually adjust the binlog file and binlog position on the new node before replication can start.
  • If the Primary's binlogs have rotated past the backup's position, you will need to re-provision all databases from the Primary.

In short, this can end up being very time-consuming, depending also on the actual size of the database(s) you have.
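The binlog-adjustment step can be sketched as follows; the host, user, password and binlog coordinates are placeholders that would come from your environment and from the backup:

```sql
-- On the new Replica, after importing the backup:
CHANGE MASTER TO
  MASTER_HOST='primary.example.com',    -- placeholder host
  MASTER_USER='repl',                   -- placeholder replication user
  MASTER_PASSWORD='secret',             -- placeholder password
  MASTER_LOG_FILE='mysql-bin.000042',   -- binlog file recorded when the backup was taken
  MASTER_LOG_POS=154;                   -- binlog position recorded when the backup was taken

START SLAVE;
```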

Galera Replication

With Galera you have two ways of doing this:

  • Incremental state transfer (IST).
  • State snapshot transfer (SST).

IST is the preferred method, where only the missing transactions are transferred from a donor's cache. SST can be compared to importing a full backup from the donor and is usually very resource-intensive.

Galera automatically determines which syncing process to trigger based on the joiner's state. In most cases, if a node fails to join the cluster, simply wipe out the MySQL datadir of the problematic node and start the MySQL service. This makes provisioning much simpler and is also handy when scaling out your cluster or re-introducing a problematic node back into the cluster.

Under The Hood

MySQL replication works very well even across slower connections and with connections that are not continuous. Hardware specs are also not much of a concern, and it works with InnoDB, MyISAM, ARCHIVE and other storage engines, making it less restrictive.

Galera, on the other hand, is very strict: performance is only as fast as the slowest node, since it uses a flow-control mechanism to keep the replication flow in step among members and eliminate replica lag. Replication is uniformly fast or uniformly slow on every node and is adjusted automatically. Because of this, it is recommended to use the same hardware specs for all nodes, thinking of CPU, RAM, disk and network speed, and to try to minimize latency between the Galera nodes.

docs/galera_vs_primary_replica_setup.txt · Last modified: 27/04/2022 14:26 by 127.0.0.1