Tuesday, July 1, 2025

HeatWave MySQL Disaster Recovery with Bi-Directional Replication

HeatWave MySQL is a fully managed MySQL database service that is part of Oracle Cloud Infrastructure (OCI), allowing you to run MySQL workloads in the cloud.


MySQL replication is asynchronous in nature and uni-directional. HeatWave provides a Channel to define inbound replication from another "MySQL" or "HeatWave" instance. If we have 2 HeatWave DB Systems in different Regions, a Replication Channel is a good way to define replication between the Regions.

To set up a HeatWave Channel between Regions, please refer to the following URL for more information

Ref : https://docs.oracle.com/en/learn/oci-mysql-dr-copy/index.html


Although replication is uni-directional, you may want to define bi-directional replication between the TWO DB Systems. This is done with TWO Replication Channels, where each DB System defines a Channel pointing at the other.
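With two channels, each DB System is simultaneously a source and a replica. A minimal sanity check on each side is to confirm that the inbound channel is connected; this sketch uses standard performance_schema tables and assumes the channels were created from the OCI Console:

```sql
-- Run on each DB System: the inbound channel defined from the other Region
-- should show SERVICE_STATE = 'ON'.
SELECT s.CHANNEL_NAME, c.HOST, c.PORT, s.SERVICE_STATE AS receiver_state
  FROM performance_schema.replication_connection_status s
  JOIN performance_schema.replication_connection_configuration c
    USING (CHANNEL_NAME);
```

Because the channels are GTID-based, each server recognizes transactions that originated from itself (they carry its own server UUID in gtid_executed) and does not re-apply them, which prevents the two channels from looping updates back and forth.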


Wednesday, June 4, 2025

Troubleshooting HeatWave Replication Channel

HeatWave is a MySQL Database Service running on the cloud with a powerful in-memory data warehouse service that provides fast and scalable analytics capabilities. One of its key features is the ability to replicate data from a MySQL source. A replicated deployment can help scale horizontally by providing multiple MySQL Servers with the data available for query, and the replica can also serve as a Disaster Recovery database running in another region. However, like any complex system, HeatWave replication can sometimes encounter issues. In this article, we will provide a step-by-step guide on how to troubleshoot a HeatWave replication channel.

Understanding HeatWave Replication

Before we dive into troubleshooting, it's essential to understand how HeatWave replication works. HeatWave replication is based on MySQL replication, where data is replicated from a MySQL source database via binlog Replication. The replication process involves creating a replication channel, which is a connection between the MySQL source database and the HeatWave database.
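Since the channel is built on binlog replication with GTIDs, a quick sanity check on the MySQL source is to confirm that the prerequisites are enabled:

```sql
-- On the MySQL source: inbound replication to HeatWave relies on
-- binary logging and GTID-based positioning being switched on.
SHOW VARIABLES WHERE Variable_name IN
  ('log_bin', 'gtid_mode', 'enforce_gtid_consistency', 'server_id');
```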

Common Issues with HeatWave Replication

Some common issues that can occur with HeatWave replication include: 

  • Connectivity issue between Source and Target: The replication channel may not be able to establish a connection between the MySQL source database and the HeatWave database due to network connectivity issues or firewall rules. 
  • GTIDs not matching with source: The GTIDs (Global Transaction Identifiers) on the HeatWave database may not match the GTIDs on the MySQL source database, causing replication issues. 
  • Data loading issues: Data may not be loading correctly into the HeatWave database due to issues with the data loading process. 
  • MySQL IO Thread and SQL Applier Thread Issues: The MySQL IO thread and SQL applier thread are responsible for replicating data from the MySQL source database to the HeatWave database. Issues with these threads can cause replication problems. 
  • EVENT Issues with DR Setup: EVENTs, which are scheduled jobs in MySQL, can cause issues with replication if not properly configured. 
  • Primary Key issues where HeatWave is configured with HA but production is running Standalone: the HA target requires primary keys that the standalone source's tables may lack.

Troubleshooting Steps

To troubleshoot HeatWave replication channel, follow these steps: 

  • Check the replication channel status: Use the SHOW REPLICA STATUS command to check the status of the replication channel. This provides information on the current state of the channel, including any errors that may have occurred. On the OCI HeatWave Console, the channel dashboard shows the status if there is any issue, and the graphs show the replication channel health; an alarm can be set up to send an email alert upon any failure of the channel. 
  • Verify the connectivity between Source and Target: Check that there are no network connectivity issues between the MySQL source database and the HeatWave database. 
  • Check the GTIDs on the HeatWave database: 
    • Use the SELECT * FROM mysql.gtid_executed command to check the GTIDs on the HeatWave database.
    • Use SHOW VARIABLES LIKE '%gtid%' to identify differences between source and target 
  • Verify if there is any EVENT running on both Source and Target: Use the following SQL to list the events per database. 
    • SELECT event_schema, event_name, definer, interval_value, interval_field, status FROM information_schema.EVENTS where event_schema not in ('mysql_task_management', 'mysql', 'sys'); 
  • Retrieve the applier error from the performance_schema table: Use the SELECT * FROM performance_schema.replication_applier_status_by_worker command to retrieve the applier error from the performance_schema table. 
  • Check the MySQL IO Thread and SQL Applier Thread status: Use the SHOW REPLICA STATUS command to check the status of the MySQL IO thread and SQL applier thread.

EVENT Issues with DR Setup

EVENTs can cause issues with replication if not properly configured. For example, if an EVENT is scheduled to run on the source database and creates a new record, the replication process may encounter a duplicate key error if the same record is created on the target database.
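One hedge against this is to make sure a given Event fires on only one side. A sketch, assuming an event named mydb.nightly_rollup (an illustrative name, not from the original setup):

```sql
-- Check whether the scheduler is running and which events are enabled.
SHOW VARIABLES LIKE 'event_scheduler';
SELECT event_schema, event_name, status FROM information_schema.EVENTS;

-- Disable the event on the target so its rows arrive only once,
-- through the replication channel (event name is illustrative).
ALTER EVENT mydb.nightly_rollup DISABLE;
```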

Best Practices for Using MySQL Shell to Dump and Load Data to HeatWave

MySQL Shell is a powerful tool for dumping and loading database content, particularly for external databases from on-premise or other environments. When using MySQL Shell to dump and load data to HeatWave, it's essential to follow best practices to ensure a smooth and successful process.

  • It is always recommended to use the latest version of MySQL Shell whenever possible, no matter whether the target Database version is 5.7, 8.0, 8.4 or 9.x.
  • Performance depends on the network connectivity and the CPU of the COMPUTE where MySQL Shell is installed. It is highly recommended to use 4 or more CPUs so that parallel threads can make full use of the network.
  • Whether to store the dump content on a local file system or in Object Storage depends on the size of the backup and the performance requirement.


To dump data from a MySQL source database, you can use the util.dumpInstance utility in MySQL Shell. This utility provides a convenient way to dump the entire database instance, including all databases, tables, and other database objects. util.dumpInstance has many options, and the application(s) using the database impose constraints on how it has to be configured. The following example illustrates one particular scenario and may not work for all cases.

When using util.dumpInstance, it's essential to consider the following best practices: 

  • Check for tables without primary keys: MySQL Shell's util.dumpInstance utility checks for tables without primary keys by default. It is recommended to fix any table without a primary key before dumping the data. This ensures that the data can be properly loaded into HeatWave when a High Availability deployment is configured.  

Solutions to primary key issue with the Source

    • Fix it manually in the source database.  
    • Workaround 1: Create invisible primary keys in the target. The create_invisible_pks option can be specified with util.dumpInstance(...). However, if a table has an AUTO_INCREMENT column, this will fail for that table; the AUTO_INCREMENT column can instead be defined as the Primary Key manually.    
    • Workaround 2: Apply the ignore_missing_pks option to util.dumpInstance(...) and loadDump to the Target HeatWave. By looking up the tables without a primary key, an invisible primary key can be manually applied to those tables. Also consider replication from a source where those tables do not have primary keys: the replication channel must be defined with the "Generate Primary Key" option, and the HeatWave Configuration should set the variable sql_generate_invisible_primary_key to ON. Please refer to the following link for more info: https://blogs.oracle.com/mysql/post/introducing-mysql-heatwave-generated-invisible-primary-keys 
      • The following query lists the tables without a primary key:
        SELECT t.table_schema, t.table_name
        FROM information_schema.tables t
        LEFT JOIN (SELECT table_schema, table_name
                   FROM information_schema.statistics
                   WHERE index_name = 'PRIMARY'
                   GROUP BY table_schema, table_name, index_name) pks
          ON t.table_schema = pks.table_schema AND t.table_name = pks.table_name
        WHERE pks.table_name IS NULL
          AND t.table_type = 'BASE TABLE'
          AND t.table_schema NOT IN ('mysql', 'sys', 'performance_schema', 'information_schema');
  • Exclude events: Events can cause issues with replication if not properly configured. To avoid these issues, you can exclude events from the dump using the events: false option. However, without the Events loaded to the Target, any modification of an Event on the Source database may result in an error on the replication channel. In a DR setup, Events missing from the DR database may result in improper handling, so the developer must seriously consider what to do upon DR Activation, e.g. create the Events upon DR activation. Here's an example of how to use util.dumpInstance with events excluded:

util.dumpInstance("mydump", { threads: 4, consistent: true, ocimds: true, events: false, compatibility: ["strip_definers", "strip_invalid_grants", "strip_restricted_grants", "skip_invalid_accounts"], targetVersion:"8.0.39" })

Once you have dumped the data from the MySQL source database, you can load it into the HeatWave database using the util.loadDump utility. Example :

util.loadDump("mydump", { updateGtidsSet: "replace", loadUsers: true, progressFile: "load_progress.txt", resetProgress: true, ignoreVersion: true })

  • The updateGtidsSet: "replace" option is used for any new DB where the GTID set is empty. loadUsers is false by default; for DR, it is important to set loadUsers: true so that the user accounts are carried over.
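After the load completes, you can confirm the GTID set was replaced as expected before creating the replication channel:

```sql
-- The target's executed set should now reflect the source set recorded
-- in the dump metadata; gtid_purged shows what can never be fetched.
SELECT @@global.gtid_executed;
SELECT @@global.gtid_purged;
```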

When loading data into HeatWave, it's essential to consider the following best practices: 

  • Create primary keys: If the source database has tables without primary keys, you can use the createInvisiblePKs option with util.loadDump to create invisible primary keys on the target database. 
  • Define replication channel carefully: When defining the replication channel, you must carefully consider the configuration to ensure that it is properly set up. This includes configuring the replication channel to handle tables without primary keys on the source database.

Additional Considerations

When using MySQL Shell to dump and load data to HeatWave, there are additional considerations to keep in mind: 

  • EVENT exclusion: If you exclude events from the dump, any modification to existing events on the source database may result in an error on the target database, which does not have the event registered. 
  • GTID maintenance: When loading data into HeatWave, it's essential to maintain the GTIDs (Global Transaction Identifiers) to ensure that the replication process is properly configured. 
  • Version: The latest MySQL Shell 9.x can be used to dump / load across MySQL versions. Please check compatibility in the documentation. To dump and load data into HeatWave, use util.dumpInstance / util.loadDump with the version-specific options. Here is an example: 
    • util.dumpInstance("mydump", {targetVersion:"8.0.39".....})
    • util.loadDump("mydump", {...., ignoreVersion: true }) 
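For the GTID maintenance point above, GTID_SUBTRACT shows which transactions the target still misses; the placeholder below is illustrative and must be replaced by the source's actual gtid_executed value:

```sql
-- Step 1, on the source:
SELECT @@global.gtid_executed;

-- Step 2, on the target, pasting the value from step 1.
-- An empty result means the target has caught up.
SELECT GTID_SUBTRACT('<source_gtid_executed>', @@global.gtid_executed)
    AS missing_on_target;
```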

Examples of Common Issues 

  • SQL Applier thread stopped with failure : 
    • A SQL command running on the Source is not valid on the HeatWave database  
    • The privilege is not granted / valid on the HeatWave database
    • The record being updated / deleted is not found in the HeatWave database
    • Duplicate Key in the target database for an insert operation
Very often, these errors are due to inconsistent data between the Source and Target Database, or to a manual operation making changes directly on the Target Database.
  • Source GTIDs have been purged, so a GTID that is missing on the HeatWave database can no longer be found on the Source.   
  • Duplicate Key Error due to EVENT : Suppose we have a table orders with a primary key. If an EVENT creates a new record on both the source and target databases, the replication process may encounter a duplicate key error. 
  • Group Replication Limitations: Suppose we have an HA configuration with HeatWave and the group replication has a maximum transaction size limit. If a transaction exceeds this limit, it may fail. 
  • Max Binlog Cache Size: The max_binlog_cache_size variable controls the maximum size of the binary log cache. If this value is too small, it can cause issues with large transactions.
  • Binlog Expiry: The binlog_expire_logs_seconds variable controls how long binary logs are kept on the source database. If this value is too small, it can cause issues with replication. The default setting with HeatWave is 3600 seconds, i.e. 1 hour.
  • Relay Log Max Size: The max_relay_log_size variable controls the maximum size of the relay log on the HeatWave cluster. If this value is too small, it can cause issues with replication if the Applier fails and the relay log is not large enough to hold the transactions. 
  • MySQL 8.0.x Considerations : When using MySQL 8.0.x, creating stored procedures or triggers with a DEFINER different from the administrator user may result in an error due to the lack of the SET_USER_ID privilege. The error log may show the following error, ERROR 1227 (42000): 
    • Access denied; you need (at least one of) the SUPER or SET_USER_ID privilege(s) for this operation
    • Ref : https://docs.oracle.com/en-us/iaas/mysql-database/doc/troubleshooting-inbound-replication.html

To resolve this issue with MySQL 8.0.x, contact Oracle Support to grant the SET_USER_ID privilege to the Applier username.

Upgrading MySQL 8.0.x to 8.4.<latest> or 9.<latest> is highly recommended. SET_USER_ID was deprecated in 8.2 and removed in 8.4. With MySQL 8.4 or above, the administrator is granted the new privilege SET_ANY_DEFINER instead.
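On 8.4 or later, the administrator can therefore grant the replacement privilege directly instead of contacting support; a sketch with an illustrative user name:

```sql
-- MySQL 8.4+: SET_ANY_DEFINER replaces SET_USER_ID for creating stored
-- programs with a DEFINER other than the current user.
GRANT SET_ANY_DEFINER ON *.* TO 'app_admin'@'%';
```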

Summary 

By following these troubleshooting steps and best practices, you can ensure that your HeatWave replication channel is working correctly and that your data is being replicated accurately and efficiently. 


Please also find the inbound replication troubleshooting document: https://docs.oracle.com/en-us/iaas/mysql-database/doc/troubleshooting-inbound-replication.html