prosync

prosync — utility to replay changes to a source database on a destination database of a different type

Synopsis

prosync config generate [option...]

prosync init [option...]

prosync run [option...]

prosync complete [option...]

Description #

prosync is a utility to perform the change data capture (CDC) of a source database and replay the changes on a destination database of a different type.

prosync is not responsible for:

  • Database schema transfer.

  • Transferring data that is missing from log files.

  • Data verification after the data transfer.

Supported Operations #

  • DML: INSERT, UPDATE, DELETE

  • DDL: TRUNCATE

Usage #

Migrating the Database #

prosync participates in the hot DB migration (see Section 4.5 for more details).

Preparing for a Migration of a Hot DB #

To prepare for the migration of a hot DB, perform the following steps:

  1. Prepare and set up the destination DB for the migration:

    • Create the necessary schemas and tables.

    • Remove or turn off triggers.

  2. Generate the configuration file:

    prosync config generate -o filename
    

  3. Update the configuration file with appropriate values.

  4. Initialize prosync with the command:

    prosync init -f filename
    

    For an Oracle source, prosync will store the current SCN to start the subsequent replication with this value. For a PostgreSQL/Postgres Pro source, prosync will create a logical replication slot for subsequent reading of changes.

Performing the Migration of a Hot DB #

Perform the migration as follows:

  1. Launch the procopy load:

    procopy load -f filename
    

  2. When it finishes, launch the prosync replay of changes:

    prosync run -f filename
    

  3. Wait until the difference between Parsing SCN and Oracle latest SCN decreases to an admissible value. For the migration of a hot DB, it is critical for this difference to decrease:

    Parsing SCN          3671057
    Oracle latest SCN    4995972
    

  4. Disable the load on the source DB and wait until Parsing SCN and Oracle latest SCN virtually stop changing. Stop prosync.

    Note

    Changes always occur in Oracle even if nobody changes the data, so Oracle latest SCN is continuously growing. Therefore, the difference between Parsing SCN and Oracle latest SCN never reaches zero.

Excluding Columns from Tasks of the Table Type #

When executing tasks of the TABLE type (see the section called “Types of Tasks” for details) during data migration involving prosync, do not exclude (through exclude_columns or include_columns) columns that are used by LogMiner as a row key. Otherwise, executing DELETE or UPDATE commands can change more than one row.

For tables without keys (heap), do not exclude any columns as LogMiner uses the whole row for identification.

Note

These rules do not apply for data migrations that are only performed by procopy.

Notes for PostgreSQL/Postgres Pro Sources #

When migrating the PostgreSQL/Postgres Pro DB, a replication slot is needed for running prosync replay of changes. Therefore:

  • Before launching the replay of changes by prosync run, execute the init command, which will create the replication slot.

  • When run finishes, execute complete command to clear the replication slot.

Working with Shardman #

This section explains how prosync works with Shardman as a source.

Configuration #

For example, in the YAML configuration file, specify:

source:
    driver_name: shardman
    shards:
        shard-1:
            dsn: postgresql://host1:5432,host2:5432/postgres?user=postgres&password=fqs291dbb&target_session_attrs=prefer-standby
        shard-2:
            dsn: postgresql://host3:5432,host4:5432/postgres?user=postgres&password=fqs291dbb&target_session_attrs=prefer-standby

Where shard-1 and shard-2 are shard names and dsn parameters specify BiHA nodes in these shards.

Note

When source.driver_name is shardman, configuration of each shard must be explicitly specified. No topology autodetection takes place. However, see the section called “Partially Specifying Shards” for more details.

Possible values of target_session_attrs in dsn are as follows:

  • read-write, primary — the connection is only to the master. Reconnects to master will occur (when the master changes or is inaccessible).

  • read-only — the connection is always to a standby. Reconnects to other standbys will occur (if the current node was promoted to master or is inaccessible).

  • prefer-standby — the preferred connection is to a standby. If the BiHA cluster has more standbys, reconnects to them will occur. If only the master is left, a reconnect to it will occur.

Configure prosync_options as follows:

  • Set use_replica_auto_sync_slots to true if you need failover replication slots and replication from a standby in the BiHA cluster.

  • Set use_failover_slots to true if you need failover replication slots and replication only from the master.

  • It is recommended to set use_replica_auto_sync_slots to true and in dsn, set target_session_attrs to prefer-standby.

Partially Specifying Shards #

By default, exclusion of shards from the configuration is prohibited as it can cause data loss in the destination when working with sharded and local tables. However, running prosync init or run with the --allow-partial-shardman-read option allows you to exclude some shards. With this option, prosync will work without specifying all the shards in the configuration file. This is useful when:

  • Data is manually distributed across shards and is located, for example, in specific local tables.

  • Data is located only in global tables, or data from these tables is only needed.

Processing Global Tables #

Global tables from only one shard are processed. The shard is chosen by sorting shard names in the configuration file.

Two replication slots are created on the chosen shard:

  • A slot for global tables only.

  • A slot for sharded and local tables.

On the other shards, replication slots for sharded and local tables are only created.

Processing Local Tables #

By default, processing of local table is prohibited as the result of this processing in the destination is unpredictable. However, the prosync init or run command with the --allow-shardman-local-tables option processes local tables. This is useful when:

  • At the business logic level, local tables on all shards contain different data.

  • The local table is on one of the shards.

Additional Information #

Tasks of the Schema or Table types are supported. For more details, see the section called “Types of Tasks”.

If a local table located on one of the shards is specified in the configuration file, a replication slot that includes this table will automatically be created on this shard.

So for different local tables, replication slots with different names and different table lists will be created on different shards.

prosync run outputs Shardman-specific statistics. See the section called “Output Statistics” for details.

Command-Line Reference #

This section describes prosync commands. Optional parameters are enclosed in square brackets.

config generate #

prosync config generate [-f|--format json|yaml]
[-o|--output filename] [-c|--source_config]
[global_options]

Generates a prosync configuration file.

-f json|yaml
--format json|yaml

Specifies the format of the configuration file: JSON or YAML. The default is YAML.

-o filename
--output filename

Specifies the filename of the configuration file. By default, the configuration file will be output to stdout.

-c
--source-config

Use the procopy configuration file as the basis of the prosync configuration file and add missing options there.

init #

prosync init -f|--file filename [-u|--update-config]
[--allow-partial-shardman-read] [--allow-shardman-local-tables]
[global_options]

Gets the current SCN from the source database. All the hosts specified in the configuration file must be accessible.

-f filename
--file filename

Name of the configuration file, where the connection options are specified.

-u
--update-config

Use the procopy configuration file as the basis and add missing options there.

--allow-partial-shardman-read

Run without the specification of all cluster shards in the configuration file. This may result in data loss.

--allow-shardman-local-tables

Process local tables of a Shardman cluster. This may result in inconsistent data in the destination.

run #

prosync run -f|--file filename
[--log-level error|warn|info|debug] [--dry-run|--read-only]
[-c|--clear] [--with-table-stats] [--with-dev-stats] [--allow-key-exclude]
[--allow-partial-shardman-read] [--allow-shardman-local-tables]
[--final-scn] [--skip-scn-save]
[global_options]

Launches the replay of changes to the source DB on the destination DB. Outputs statistics on the replay.

-f filename
--file filename

Name of the configuration file, where the connection options are specified.

--log-level error|warn|info|debug

Logging level. The default is info.

--dry-run

Turns off writing to the destination. prosync will perform all the operations except writing. This flag is useful for checking the performance of writing.

--read-only

Turns off all the prosync logic except reading from the source. This flag is useful for checking the performance of reading.

-c
--clear

Clear the screen before each next output of the statistics. When enabled, it is recommended to output logs to a file, as follows:

prosync run ... -c 2>prosync.log

--with-table-stats

Collect and report the progress per tables and operations. See the section called “Output Statistics” for details.

--with-dev-stats

Collect and report the statistics for development. See the section called “Output Statistics” for details.

--allow-key-excude

Allows excluding key columns from replication. If enabled, prosync may not work correctly for tables with unique constraints and heap tables when including or excluding certain columns. Set to true with caution and at the risk of data loss.

--allow-partial-shardman-read

Run without the specification of all cluster shards in the configuration file. This may result in data loss.

--allow-shardman-local-tables

Process local tables of a Shardman cluster. This may result in inconsistent data in the destination.

--final-scn string

SCN to stop the syncronization. The default is 0.

--skip-scn-save

Skip saving the SCN.

Output Statistics #

The following statistics is output regardless of the options:

  • Total Measures — general statistics as a table:

    • Read — number of operations read from the source.

    • Write — number of operations written to the destination.

    • Transactions — number of transcations read.

    • Read bytes — number of bytes read.

    • Write bytes — number of bytes written.

  • Shards — statistics on shards as a table (for Shardman source):

    • Shard name — name of a shard. If global tables are transferred, a row with the shard name and the _global postfix is added with the statistics on transferring global tables from this shard.

    • State — name of the current status: receiveLogs — reading or waiting for reading data from the source; processLogs — processing the read data; reconnecting — reconnection.

    • Time lag — time of the destination lagging behind the source. Computed as the difference between the completion time of the last transferred operation in the source and the current time on the source server.

    • Bytes lag — destination lagging behind the source in bytes. Computed as the difference between the LSN of the last transferred operation in the source and the current LSN on the source server. The operations on all the tables on the source server are taken into account.

    • Avg 60 sec, speed — the average speed of writing bytes to the destination for 60 seconds.

  • Additional — more general statistics:

    • Confirmed data time — time of the transferred data on the source server. Computed as the current source time minus the maximum Time lag. This value can be useful to check data in the source and destination.

    • App time — current time in the prosync application.

    • App duration — time of the prosync operation since the process start.

If --with-table-stats is specified, more statistics on tables is output:

  • Tables Measures — more statistics on tables as a table:

    • <schema>.<table_name> Insert — number of transferred Insert operations, where <schema>.<table_name> is the name of the schema and table in the destination and Insert is the type of the transferred operation. Each transferred operation is in a separate row.

    • <schema>.<table_name> Write bytes — number of bytes transferred.

If --with-dev-stats is specified, more statistics is output:

  • Total Measures:

    • Read time — Time spent reading data.

    • Process time — Time spent processing data.

    • Write time — Time spent writing data.

    • Write batch count — Number of batches written to the destination.

    • Write, current applying batch importers — Number of loaders that are currently writing batches.

  • Additional

    • Latest batch insert time — Time the last batch was written.

complete #

prosync complete [-f filename] [global_options]

Clears the replication slot used by run. The slot name is determined from the configuration file, whose default name is config.yaml.

Global Options #

--log-level string

Log level of the command. Possible values are error, warn, info, and debug. The default is info.