5.2. Configuration Parameters Shared by Applications #

The following is an example of the procopy configuration file fragment in the YAML format that lists configuration parameters used both in procopy and prosync:

source:
    driver_name: oracle
    host: enter host
    port: 1521
    database: enter database
    username: enter username
    password: enter password if necessary
    max_conn: 50
    options:
        - PREFETCH_ROWS=500
destination:
    driver_name: postgresql
    host: enter host
    port: 5432
    database: enter database
    username: enter username
    password: enter password if necessary
    max_conn: 50
    options:
        - sslmode=disable
tasks:
    - id: tab1
      table:
        source_table: source_table
        destination_table: destination_table
        exclude_columns:
            - field4
        include_columns:
            - field1
            - field2
            - field3
            - field4
        column_mapping:
            field1: field2
            field2: field1
        where: (COL1 IS NULL OR COL1 > 10) AND COL2 <> 'value'
      batch_bytes: 976.6KiB
      truncate: false
      transform:
        field1:
            null_to_value: ""
        field2:
            values_to_null:
                - "NULL"
                - NULL_VAL
                - INVALID
      snapshot_id: null
    - id: tab2
      query:
        sql: SELECT id, name FROM users ORDER BY id
        destination_table: users_table
        destination_columns:
            - id
            - login
      batch_bytes: 976.6KiB
      truncate: false
      snapshot_id: null
    - id: schema_task
      schema:
        source_schema: schema1
        destination_schema: schema2
        exclude_tables:
            - TABLE1
        include_tables:
            - TABLE1
            - TABLE2
            - TABLE2
        table_mapping:
            TABLE2: main_table
      batch_bytes: 976.6KiB
      truncate: false
      snapshot_id: null

For how to set values for time configuration parameters, refer to Section 4.3.6.

The following table explains configuration parameters that are used both in procopy and prosync:

Table 5.2. Configuration Parameters Shared Between Applications

NameDescriptionDefault ValueExample
sourceParameters to connect to the source  
source.dsnDSN for the connection. If specified, this parameter has precedence: it defines the actual settings. For BiHA cluster:
dsn: "postgresql://db-biha-1:5432,db-biha-2:5432,
db-biha-3:5432/postgres?user=user&
password=password"
source.driver_nameDriver for the connection. Possible values: oracle, postgres, parquet (although Parquet is only used as a destination).  
source.hostHost name or IP address  
source.portConnection port  
source.databaseName of the database to connect to  
source.usernameConnection login  
source.passwordConnection password. Leave empty if the password is not required.  
source.max_connMaximum number of open connections50 
source.optionsAdditional connection parameters 
options:
- PREFETCH_ROWS=500
- sslmode=disable
destinationParameters to connect to the destination  
destination.dsnDSN for the connection. If specified, this parameter has precedence: it defines the actual settings. For BiHA cluster:
dsn: "postgresql://db-biha-1:5432,db-biha-2:5432,
db-biha-3:5432/postgres?user=user&
password=password"
destination.driver_nameDriver for the connection. Possible values: oracle, postgres, parquet.  
destination.hostHost name or IP address  
destination.portConnection port  
destination.databaseName of the database to connect to  
destination.usernameConnection login  
destination.passwordConnection password. Leave empty if the password is not required.  
destination.max_connMaximum number of connections allowed for the connection pool50 
destination.optionsAdditional connection parameters 
options:
- PREFETCH_ROWS=500
- sslmode=disable
tasksList of tasks. A task can be of the Query, Schema, or Table type. For more details, see the section called “Types of Tasks”.  
tasks.idTask name. Must be unique. It is recommended to avoid changing it between tasks. ID: "task_schema_1"
tasks.tableSection for a task of the Table type. For more details, see the section called “Types of Tasks”.   
tasks.table.source_tableQualified name of the source table: <schema>.<table>. The names that are specified without quotes are meant to be case-insensitive.  
source_table:
"SYSTEM"."PRODUCTS"
tasks.table.destination_tableQualified name of the destination table: <schema>.<table>. The names that are specified without quotes are meant to be case-insensitive.  
destination_table:
"public"."products"
tasks.table.exclude_columnsList of columns to exclude from data loading. List columns with case-sensitive names exactly as they are called in the source DB. Quotes are not needed. Incompatible with tasks.table.include_columns.  
exclude_columns:
- COLUMN_1
- COLUMN_2
tasks.table.include_columnsList of columns such that only these columns will be loaded. List columns with case-sensitive names exactly as they are called in the source DB. Quotes are not needed. Incompatible with tasks.table.exclude_columns.  
include_columns:
- COLUMN_1
- COLUMN_2
tasks.table.column_mappingMapping of column names in the source database to names in the destination database if these names are different, in the format: source_name: destination_name. Specify columns with case-sensitive names exactly as they are called in the appropriate DB. Quotes are not needed.  
column_mapping:
"COLUMN_1": "new_column_1"
"COLUMN_2": "new_column_2"
tasks.table.whereFiltering conditions for the source table, which enables limiting the number of rows to load. An SQL statement that returns boolean.  where: "(COL1 IS NULL OR COL1 > 10) AND COL2 <> 'value'"
tasks.table.uniq_keysUnique key to be used for the selection. If not specified, the most suitable key will be chosen automatically.  
tasks.table.last_rowDeprecated  
tasks.table.index_hintA hint for an index name to be used for an index scan. In some situations, the index that is actually chosen for the index scan can differ from the one specified. 
index_hint: "/+ INDEX (employees emp_department_ix)/" # Oracle
index_hint: "/+ IndexScan(tableName indexName)/" # Postgres Pro
index_hint: "/+ IndexScanOnly(tableName indexName)/" # Postgres Pro
tasks.table.disable_order_byDisables sorting during a selection. If not specified, the global setting from procopy_options is used.  
tasks.querySection for a task of the Query type. For more details, see the section called “Types of Tasks”.  
tasks.query.sqlUser's SQL query for the task  
sql: "SELECT id, name FROM users ORDER BY id"
tasks.query.destination_tableQualified name of the destination table: <schema>.<table>. If the schema name or table name is specified without quotes, this name is meant to be case-insensitive.  
destination_table:
"public"."products"
tasks.query.destination_columnsList of destination columns where data will be inserted. Column names must be in the same order as in the query. List columns with case-sensitive names exactly as they are called in the destination DB. Quotes are not needed.  
destination_columns
- id
- login
tasks.query.destination_column_typesTypes of destination columns where data will be inserted. Must be listed in the same order as in tasks.query.destination_columns. Only used to migrate data to the Parquet format. Type names must follow the types of the source DB. Therefore, Parquet types should not be used here.  For migration PostgreSQL -> Parquet:
destination_column_types:
- text
- integer
- boolean
tasks.schemaSection for a task of the Schema type. For more details, see the section called “Types of Tasks”.   
tasks.schema.source_schemaName of the source schema to load. Must be exactly as in the database. No quotes are needed.  
source_schema: SYSTEM
source_schema: BOOK.shelves
tasks.schema.destination_schemaName of the destination schema. Must be exactly as in the database. No quotes are needed.  
destination_schema: public
destination_schema: book.SHELVES
tasks.schema.exclude_tablesList of tables to exclude from the data loading. List tables with case-sensitive names exactly as they are called in the source DB. Quotes are not needed. Incompatible with tasks.schema.include_tables.  
exclude_tables:
- PRODUCTS
- TOKENS
tasks.schema.include_tablesList of tables such that only these tables will be loaded. List tables with case-sensitive names exactly as they are called in the source DB. Quotes are not needed. Incompatible with tasks.schema.exclude_tables.  
include_tables:
- PRODUCTS
- TOKENS
tasks.schema.table_mappingMapping of table names in the source database to names in the destination database if these names are different, in the format <source_name>: <destination_name>. List tables with case-sensitive names exactly as they are called in the appropriate DB. Quotes are not needed.  
table_mapping:
  "source_name_tab1": "destination_name_tab1"
  "source_name_tab2": "destination_name_tab2"
tasks.batch_bytesLimitation on the batch size for the current task, in bytes. If not specified, the global limitation procopy_options.batch_bytes is used.  
tasks.truncateIf enabled, a destination table (tables) will be truncated before the data loading. If not specified, the global flag procopy_options.truncate is used.  
tasks.transformRules to transform null values of the specified columns. Each rule can contain the following fields:
  • null_to_value contains the value to which null values of the column will be converted during the data loading.

  • values_to_null contains an array of column values to be converted to null during the data loading.

  • null_char_replace contains a string to replace the symbol with the zero code (\0x00, CHR(0)). It can be one symbol, a longer string, or an empty string.

    For tasks of the Schema type, if keys do not contain dots, all the columns with this name in all tables will be processed according to the specified rules. And if keys contain a dot, the first dot will be considered as a separator of the table name from the column name, so the column in only one table will be processed.

    It is important to specify tables and columns in the same case as in the destination DB.

 
transform:
column1:
null_to_value: "UNKNOWN"
null_char_replace: "\n"
table1.column2: # for tasks of the Schema type
values_to_null: ["NOTSET", "notset", "NotSet"]
tasks.disable_index_hintDisables index hints when creating select queries. If not specified, the global flag procopy_options.disable_index_hint is used.  
tasks.window_rowsNumber of rows selected at a time. If not specified, the global value of the procopy_options.window_rows is used.  
tasks.encoder_chan_sizeSize of a buffer used by encoders. If not specified, the global value of the procopy_options.encoder_chan_size is used.  
tasks.snapshot_idIdentifier of the PostgreSQL/Postgres Pro snapshot. If not specified, the global value of the procopy_options.snapshot_id is used.  
tasks.enable_auto_snapshotIf true, a snapshot is created in PostgreSQL/Postgres Pro to be used to get data.  
task.sub_task_rowsNumber of rows in a task for parallel reading of tables. Used to split reading large tables into subtasks. The value of zero turns off splitting into subtasks.0