prosync
prosync — utility to replay changes to a source database on a destination database of a different type
Synopsis
prosync config generate [option...]
prosync init [option...]
prosync run [option...]
prosync complete [option...]
Description #
prosync is a utility to perform the change data capture (CDC) of a source database and replay the changes on a destination database of a different type.
prosync is not responsible for:
Database schema transfer.
Transferring data that is missing from log files.
Data verification after the data transfer.
Supported Operations #
DML:
INSERT,UPDATE,DELETEDDL:
TRUNCATE
Usage #
Migrating the Database #
prosync participates in the hot DB migration (see Section 4.5 for more details).
Preparing for a Migration of a Hot DB #
To prepare for the migration of a hot DB, perform the following steps:
Prepare and set up the destination DB for the migration:
Create the necessary schemas and tables.
Remove or turn off triggers.
Generate the configuration file:
prosync config generate -o
filenameUpdate the configuration file with appropriate values.
Initialize prosync with the command:
prosync init -f
filenameFor an Oracle source, prosync will store the current SCN to start the subsequent replication with this value. For a PostgreSQL/Postgres Pro source, prosync will create a logical replication slot for subsequent reading of changes.
Performing the Migration of a Hot DB #
Perform the migration as follows:
Launch the procopy load:
procopy load -f
filenameWhen it finishes, launch the prosync replay of changes:
prosync run -f
filenameWait until the difference between
Parsing SCNandOracle latest SCNdecreases to an admissible value. For the migration of a hot DB, it is critical for this difference to decrease:Parsing SCN 3671057 Oracle latest SCN 4995972
Disable the load on the source DB and wait until
Parsing SCNandOracle latest SCNvirtually stop changing. Stop prosync.Note
Changes always occur in Oracle even if nobody changes the data, so
Oracle latest SCNis continuously growing. Therefore, the difference betweenParsing SCNandOracle latest SCNnever reaches zero.
Excluding Columns from Tasks of the Table Type #
When executing tasks of the TABLE type (see the section called “Types of Tasks” for details) during data migration involving prosync, do not exclude (through exclude_columns or include_columns) columns that are used by LogMiner as a row key. Otherwise, executing DELETE or UPDATE commands can change more than one row.
For tables without keys (heap), do not exclude any columns as LogMiner uses the whole row for identification.
Note
These rules do not apply for data migrations that are only performed by procopy.
Notes for PostgreSQL/Postgres Pro Sources #
When migrating the PostgreSQL/Postgres Pro DB, a replication slot is needed for running prosync replay of changes. Therefore:
Working with Shardman #
This section explains how prosync works with Shardman as a source.
Configuration #
For example, in the YAML configuration file, specify:
source:
driver_name: shardman
shards:
shard-1:
dsn: postgresql://host1:5432,host2:5432/postgres?user=postgres&password=fqs291dbb&target_session_attrs=prefer-standby
shard-2:
dsn: postgresql://host3:5432,host4:5432/postgres?user=postgres&password=fqs291dbb&target_session_attrs=prefer-standby
Where shard-1 and shard-2 are shard names and dsn parameters specify BiHA nodes in these shards.
Note
When source.driver_name is shardman, configuration of each shard must be explicitly specified. No topology autodetection takes place. However, see the section called “Partially Specifying Shards” for more details.
Possible values of target_session_attrs in dsn are as follows:
read-write,primary— the connection is only to the master. Reconnects to master will occur (when the master changes or is inaccessible).read-only— the connection is always to a standby. Reconnects to other standbys will occur (if the current node was promoted to master or is inaccessible).prefer-standby— the preferred connection is to a standby. If the BiHA cluster has more standbys, reconnects to them will occur. If only the master is left, a reconnect to it will occur.
Configure prosync_options as follows:
Set
use_replica_auto_sync_slotsto true if you need failover replication slots and replication from a standby in the BiHA cluster.Set
use_failover_slotsto true if you need failover replication slots and replication only from the master.It is recommended to set
use_replica_auto_sync_slotsto true and indsn, settarget_session_attrstoprefer-standby.
Partially Specifying Shards #
By default, exclusion of shards from the configuration is prohibited as it can cause data loss in the destination when working with sharded and local tables. However, running prosync init or run with the --allow-partial-shardman-read option allows you to exclude some shards. With this option, prosync will work without specifying all the shards in the configuration file. This is useful when:
Data is manually distributed across shards and is located, for example, in specific local tables.
Data is located only in global tables, or data from these tables is only needed.
Processing Global Tables #
Global tables from only one shard are processed. The shard is chosen by sorting shard names in the configuration file.
Two replication slots are created on the chosen shard:
A slot for global tables only.
A slot for sharded and local tables.
On the other shards, replication slots for sharded and local tables are only created.
Processing Local Tables #
By default, processing of local table is prohibited as the result of this processing in the destination is unpredictable. However, the prosync init or run command with the --allow-shardman-local-tables option processes local tables. This is useful when:
At the business logic level, local tables on all shards contain different data.
The local table is on one of the shards.
Additional Information #
Tasks of the Schema or Table types are supported. For more details, see the section called “Types of Tasks”.
If a local table located on one of the shards is specified in the configuration file, a replication slot that includes this table will automatically be created on this shard.
So for different local tables, replication slots with different names and different table lists will be created on different shards.
prosync run outputs Shardman-specific statistics. See the section called “Output Statistics” for details.
Command-Line Reference #
This section describes prosync commands. Optional parameters are enclosed in square brackets.
config generate #
prosync config generate [-f|--formatjson|yaml] [-o|--outputfilename] [-c|--source_config] [global_options]
Generates a prosync configuration file.
-fjson|yaml--formatjson|yamlSpecifies the format of the configuration file: JSON or YAML. The default is YAML.
-ofilename--outputfilenameSpecifies the filename of the configuration file. By default, the configuration file will be output to stdout.
-c--source-configUse the procopy configuration file as the basis of the prosync configuration file and add missing options there.
init #
prosync init -f|--filefilename[-u|--update-config] [--allow-partial-shardman-read] [--allow-shardman-local-tables] [global_options]
Gets the current SCN from the source database. All the hosts specified in the configuration file must be accessible.
-ffilename--filefilenameName of the configuration file, where the connection options are specified.
-u--update-configUse the procopy configuration file as the basis and add missing options there.
--allow-partial-shardman-readRun without the specification of all cluster shards in the configuration file. This may result in data loss.
--allow-shardman-local-tablesProcess local tables of a Shardman cluster. This may result in inconsistent data in the destination.
run #
prosync run -f|--filefilename[--log-levelerror|warn|info|debug] [--dry-run|--read-only] [-c|--clear] [--with-table-stats] [--with-dev-stats] [--allow-key-exclude] [--allow-partial-shardman-read] [--allow-shardman-local-tables] [--final-scn] [--skip-scn-save] [global_options]
Launches the replay of changes to the source DB on the destination DB. Outputs statistics on the replay.
-ffilename--filefilenameName of the configuration file, where the connection options are specified.
--log-levelerror|warn|info|debugLogging level. The default is
info.--dry-runTurns off writing to the destination. prosync will perform all the operations except writing. This flag is useful for checking the performance of writing.
--read-onlyTurns off all the prosync logic except reading from the source. This flag is useful for checking the performance of reading.
-c--clearClear the screen before each next output of the statistics. When enabled, it is recommended to output logs to a file, as follows:
prosync run ... -c 2>prosync.log
--with-table-statsCollect and report the progress per tables and operations. See the section called “Output Statistics” for details.
--with-dev-statsCollect and report the statistics for development. See the section called “Output Statistics” for details.
--allow-key-excudeAllows excluding key columns from replication. If enabled, prosync may not work correctly for tables with unique constraints and heap tables when including or excluding certain columns. Set to true with caution and at the risk of data loss.
--allow-partial-shardman-readRun without the specification of all cluster shards in the configuration file. This may result in data loss.
--allow-shardman-local-tablesProcess local tables of a Shardman cluster. This may result in inconsistent data in the destination.
--final-scnstringSCN to stop the syncronization. The default is “0”.
--skip-scn-saveSkip saving the SCN.
Output Statistics #
The following statistics is output regardless of the options:
Total Measures — general statistics as a table:
Read— number of operations read from the source.Write— number of operations written to the destination.Transactions— number of transcations read.Read bytes— number of bytes read.Write bytes— number of bytes written.
Shards — statistics on shards as a table (for Shardman source):
Shard name— name of a shard. If global tables are transferred, a row with the shard name and the “_global” postfix is added with the statistics on transferring global tables from this shard.State— name of the current status:receiveLogs— reading or waiting for reading data from the source;processLogs— processing the read data;reconnecting— reconnection.Time lag— time of the destination lagging behind the source. Computed as the difference between the completion time of the last transferred operation in the source and the current time on the source server.Bytes lag— destination lagging behind the source in bytes. Computed as the difference between the LSN of the last transferred operation in the source and the current LSN on the source server. The operations on all the tables on the source server are taken into account.Avg 60 sec, speed— the average speed of writing bytes to the destination for 60 seconds.
Additional — more general statistics:
Confirmed data time— time of the transferred data on the source server. Computed as the current source time minus the maximumTime lag. This value can be useful to check data in the source and destination.App time— current time in the prosync application.App duration— time of the prosync operation since the process start.
If --with-table-stats is specified, more statistics on tables is output:
Tables Measures — more statistics on tables as a table:
<schema>.<table_name>
Insert— number of transferredInsertoperations, where <schema>.<table_name> is the name of the schema and table in the destination andInsertis the type of the transferred operation. Each transferred operation is in a separate row.<schema>.<table_name>
Write bytes— number of bytes transferred.
If --with-dev-stats is specified, more statistics is output:
Total Measures:
Read time— Time spent reading data.Process time— Time spent processing data.Write time— Time spent writing data.Write batch count— Number of batches written to the destination.Write, current applying batch importers— Number of loaders that are currently writing batches.
Additional
Latest batch insert time— Time the last batch was written.
complete #
prosync complete [-ffilename] [global_options]
Clears the replication slot used by run. The slot name is determined from the configuration file, whose default name is config.yaml.
Global Options #
--log-levelstringLog level of the command. Possible values are
error,warn,info, anddebug. The default isinfo.