procheck
procheck — utility to verify data after a migration
Synopsis
procheck [option...]
Description #
procheck is a utility for data verification using specialized tests. It is intended to be used after a data migration performed by procopy and prosync. The configuration parameters, specified in the configuration file, define how the tests are performed.
procheck performs the following verification:
Compares the values of unique identifiers in source and destination tables.
Compares the number of rows in tables.
Compares data row-by-row.
Compares query results.
Usage #
Generating the Configuration File #
To generate a sample configuration file with the config_file name, execute the command:
procheck --generate_config config_file
You can later update configuration values there.
Launching the DB Verification #
To launch the DB verification that is specified in the config_file configuration file, execute the command:
procheck -c config_file
Configuring the DB Connection #
The connection to the source DB is specified in the source section of the configuration file, and the connection to the destination DB is specified in the destination section of the configuration file. See also an example of the configuration file.
Specifying the Tests to Perform #
Tests that procheck performs are defined in the comparison_rules section of the configuration file. See also an example of the configuration file.
For each schema, the source and destination databases must be specified, as well as a list of tests.
To exclude unnecessary tables from tests, use the optional configuration parameter exclude_tables, which can be defined for the whole schema and for an individual test.
To exclude table columns from row-wise comparison, use the optional configuration parameter exclude_columns, which can also be defined for the whole schema and for an individual test.
You can define the mapping of table names between the source and destination databases at the schema level. To do this, use the optional configuration parameter table-mapping.
To define the mapping of column names between the source and destination databases at the schema level, use the optional configuration parameter column-mapping.
To specify tables to be included in tests, use the required configuration parameter tables, which should be defined for each test.
You can set the the optional global parameter skip_lobs to specify whether to compare large objects for specific tables. The global parameter can be redefined for specific tables using the skip_lobs parameter in the comparison_rules section.
Types of tests are listed in the table below:
Table 6.1. procheck Types of Tests
Configuration value of type | Description |
|---|---|
min_id | Comparison of minimum values of a unique key. Only performed for tables with a unique key that consists of one column. |
max_id | Comparison of maximum values of a unique key. Only performed for tables with a unique key that consists of one column. |
minmax_id | Comparison of minimum and maximum values of a unique key. Only performed for tables with a unique key that consists of one column. |
count | Comparison of the number of rows in tables. |
query | Comparison of results of user queries to the source and destination DBs. The queries are specified in the source_query and destination_query parameters. When writing the queries, you must ensure the same order of rows and columns in the results. Column names do not matter for the comparison. The tables and exclude_tables parameters are not used for tests of this type. |
all_rows | Comparison of all rows in tables. For columns of the BLOB, CLOB, BYTEA, BFILE, SFILE, and TEXT types, values of the MD5 hash are compared. If the value of skip_lobs is true, comparison of these types of columns is skipped. The test is only performed if the table has a unique key. If this key is numeric and consists of one column, the table can be split into chunks by the value of this key to be compared in parallel. The maximum number of rows in a chunk and the maximum number of chunks processed in parallel are controlled by the chunk_size and max_parallel_chunks parameters, respectively. |
table_list | Verifying the availability of tables in DB. |
col_types | Verifying the comparability of data types in the source and destination databases. |
For tables for which tests were skipped because of the lack of a required key, count tests are automatically performed. If the count test is already specified in the configuration file for some table, the test is not repeated. Automatically preforming count tests can be turned off with the skip_auto_count_test configuration parameter.
Specifying the Row Filtering #
You can filter table rows to perform tests on. To specify filters for tables of the source DB, use the source_where configuration parameter. To specify filters for tables of the destination DB, use the destination_where configuration parameter. Both can be specified either at the schema level or for an individual test. Filters specified for individual tests override those specified at the schema level. The filter condition can be any SQL statement that returns the boolean type.
You need to specify destination_where only if column names in the filter for the destination DB differ from column names in the source DB. In this case, the table name in the destination DB must be specified.
Return Codes #
procheck may return one of the following codes:
0 — Successful termination. All the tests have been successfully passed, that is, have the
PASSEDstatus.1 — The application terminated with an error.
3 — The application terminated due to an error in the configuration or in command-line options.
4 — The application terminated due to an error of the database access.
5 — The application terminated due to an error of the file input/output.
6 — The application terminated due to an error of the terminal input/output.
7 — Normal termination, but at least one test completed with the
ERRORstatus.8 — Normal termination, all the tests completed without errors, but at least one test completed with the
FAILEDstatus.
Command-Line Reference #
This section describes procheck commands. Optional parameters are enclosed in square brackets.
procheck #
procheck -c|--generate_config config_file
[--json] [-v|--version]
[config_options] [output_control_options]
Perfroms the tests if the -c option is specified and generates a procheck configuration file if the --generate_config option is specified. config_file specifies the name of the configuration file.
--jsonOutput the results of tests in the JSON format and save them to the
file. By default, the results are output in a plain-text tabular human-readable format and saved to theconfig_file_results.jsonfile.config_file_results.txt-v--versionOutput the procheck version.
Command-Line Configuration Options #
Additionally, command-line configuration options can be specified, which override respective settings in the configuration file:
-p--skip_passed_tablesSets the
skip_passed_tablesconfiguration parameter totrue, which means no display of detailed information on tables with thePASSEDtest status.-sinteger--chunk_sizeintegerSets the value of the
chunk_sizeconfiguration parameter.-ginteger--max_parallel_chunksintegerSets the value of the
max_parallel_chunksconfiguration parameter.-tinteger--max_parallel_tablesintegerSets the value of the
max_parallel_tablesconfiguration parameter.
Command-Line Output-Control Options #
Command-line options can also be specified that control displaying the information on the screen:
-iOnly display information on tables: list of columns and their types.
--no_tuiDisplay no progress information for tests.
-q--quietDisplay no progress information on tests and their results.
The --skip_passed_tables option above also controls displaying the information on the screen.