5.2. Configuration Parameters Shared by Applications #
The following is an example of the procopy configuration file fragment in the YAML format that lists configuration parameters used both in procopy and prosync:
source:
driver_name: oracle
host: enter host
port: 1521
database: enter database
username: enter username
password: enter password if necessary
max_conn: 50
options:
- PREFETCH_ROWS=500
destination:
driver_name: postgresql
host: enter host
port: 5432
database: enter database
username: enter username
password: enter password if necessary
max_conn: 50
options:
- sslmode=disable
tasks:
- id: tab1
table:
source_table: source_table
destination_table: destination_table
exclude_columns:
- field4
include_columns:
- field1
- field2
- field3
- field4
column_mapping:
field1: field2
field2: field1
where: (COL1 IS NULL OR COL1 > 10) AND COL2 <> 'value'
batch_bytes: 976.6KiB
truncate: false
transform:
field1:
null_to_value: ""
field2:
values_to_null:
- "NULL"
- NULL_VAL
- INVALID
snapshot_id: null
- id: tab2
query:
sql: SELECT id, name FROM users ORDER BY id
destination_table: users_table
destination_columns:
- id
- login
batch_bytes: 976.6KiB
truncate: false
snapshot_id: null
- id: schema_task
schema:
source_schema: schema1
destination_schema: schema2
exclude_tables:
- TABLE1
include_tables:
- TABLE1
- TABLE2
- TABLE2
table_mapping:
TABLE2: main_table
batch_bytes: 976.6KiB
truncate: false
snapshot_id: null
For how to set values for time configuration parameters, refer to Section 4.3.6.
The following table explains configuration parameters that are used both in procopy and prosync:
Table 5.2. Configuration Parameters Shared Between Applications
| Name | Description | Default Value | Example |
|---|---|---|---|
source | Parameters to connect to the source | ||
source.dsn | DSN for the connection. If specified, this parameter has precedence: it defines the actual settings. | For BiHA cluster:dsn: "postgresql://db-biha-1:5432,db-biha-2:5432, db-biha-3:5432/postgres?user=user& password=password" | |
source.driver_name | Driver for the connection. Possible values: oracle, postgres, parquet (although Parquet is only used as a destination). | ||
source.host | Host name or IP address | ||
source.port | Connection port | ||
source.database | Name of the database to connect to | ||
source.username | Connection login | ||
source.password | Connection password. Leave empty if the password is not required. | ||
source.max_conn | Maximum number of open connections | 50 | |
source.options | Additional connection parameters | options: - PREFETCH_ROWS=500 - sslmode=disable | |
destination | Parameters to connect to the destination | ||
destination.dsn | DSN for the connection. If specified, this parameter has precedence: it defines the actual settings. | For BiHA cluster:dsn: "postgresql://db-biha-1:5432,db-biha-2:5432, db-biha-3:5432/postgres?user=user& password=password" | |
destination.driver_name | Driver for the connection. Possible values: oracle, postgres, parquet. | ||
destination.host | Host name or IP address | ||
destination.port | Connection port | ||
destination.database | Name of the database to connect to | ||
destination.username | Connection login | ||
destination.password | Connection password. Leave empty if the password is not required. | ||
destination.max_conn | Maximum number of connections allowed for the connection pool | 50 | |
destination.options | Additional connection parameters | options: - PREFETCH_ROWS=500 - sslmode=disable | |
tasks | List of tasks. A task can be of the Query, Schema, or Table type. For more details, see the section called “Types of Tasks”. | ||
tasks.id | Task name. Must be unique. It is recommended to avoid changing it between tasks. | ID: "task_schema_1" | |
tasks.table | Section for a task of the Table type. For more details, see the section called “Types of Tasks”. | ||
tasks.table.source_table | Qualified name of the source table: <schema>.<table>. The names that are specified without quotes are meant to be case-insensitive. | source_table: "SYSTEM"."PRODUCTS" | |
tasks.table.destination_table | Qualified name of the destination table: <schema>.<table>. The names that are specified without quotes are meant to be case-insensitive. | destination_table: "public"."products" | |
tasks.table.exclude_columns | List of columns to exclude from data loading. List columns with case-sensitive names exactly as they are called in the source DB. Quotes are not needed. Incompatible with tasks.table.include_columns. | exclude_columns: - COLUMN_1 - COLUMN_2 | |
tasks.table.include_columns | List of columns such that only these columns will be loaded. List columns with case-sensitive names exactly as they are called in the source DB. Quotes are not needed. Incompatible with tasks.table.exclude_columns. | include_columns: - COLUMN_1 - COLUMN_2 | |
tasks.table.column_mapping | Mapping of column names in the source database to names in the destination database if these names are different, in the format: “source_name”: “destination_name”. Specify columns with case-sensitive names exactly as they are called in the appropriate DB. Quotes are not needed. | column_mapping: "COLUMN_1": "new_column_1" "COLUMN_2": "new_column_2" | |
tasks.table.where | Filtering conditions for the source table, which enables limiting the number of rows to load. An SQL statement that returns boolean. | where: "(COL1 IS NULL OR COL1 > 10) AND COL2 <> 'value'" | |
tasks.table.uniq_keys | Unique key to be used for the selection. If not specified, the most suitable key will be chosen automatically. | ||
tasks.table.last_row | Deprecated | ||
tasks.table.index_hint | A hint for an index name to be used for an index scan. In some situations, the index that is actually chosen for the index scan can differ from the one specified. | index_hint: "/+ INDEX (employees emp_department_ix)/" # Oracle index_hint: "/+ IndexScan(tableName indexName)/" # Postgres Pro index_hint: "/+ IndexScanOnly(tableName indexName)/" # Postgres Pro | |
tasks.table.disable_order_by | Disables sorting during a selection. If not specified, the global setting from procopy_options is used. | ||
tasks.query | Section for a task of the Query type. For more details, see the section called “Types of Tasks”. | ||
tasks.query.sql | User's SQL query for the task | sql: "SELECT id, name FROM users ORDER BY id" | |
tasks.query.destination_table | Qualified name of the destination table: <schema>.<table>. If the schema name or table name is specified without quotes, this name is meant to be case-insensitive. | destination_table: "public"."products" | |
tasks.query.destination_columns | List of destination columns where data will be inserted. Column names must be in the same order as in the query. List columns with case-sensitive names exactly as they are called in the destination DB. Quotes are not needed. | destination_columns - id - login | |
tasks.query.destination_column_types | Types of destination columns where data will be inserted. Must be listed in the same order as in tasks.query.destination_columns. Only used to migrate data to the Parquet format. Type names must follow the types of the source DB. Therefore, Parquet types should not be used here. | For migration PostgreSQL -> Parquet: destination_column_types: - text - integer - boolean | |
tasks.schema | Section for a task of the Schema type. For more details, see the section called “Types of Tasks”. | ||
tasks.schema.source_schema | Name of the source schema to load. Must be exactly as in the database. No quotes are needed. | source_schema: SYSTEM source_schema: BOOK.shelves | |
tasks.schema.destination_schema | Name of the destination schema. Must be exactly as in the database. No quotes are needed. | destination_schema: public destination_schema: book.SHELVES | |
tasks.schema.exclude_tables | List of tables to exclude from the data loading. List tables with case-sensitive names exactly as they are called in the source DB. Quotes are not needed. Incompatible with tasks.schema.include_tables. | exclude_tables: - PRODUCTS - TOKENS | |
tasks.schema.include_tables | List of tables such that only these tables will be loaded. List tables with case-sensitive names exactly as they are called in the source DB. Quotes are not needed. Incompatible with tasks.schema.exclude_tables. | include_tables: - PRODUCTS - TOKENS | |
tasks.schema.table_mapping | Mapping of table names in the source database to names in the destination database if these names are different, in the format <source_name>: <destination_name>. List tables with case-sensitive names exactly as they are called in the appropriate DB. Quotes are not needed. | table_mapping: "source_name_tab1": "destination_name_tab1" "source_name_tab2": "destination_name_tab2" | |
tasks.batch_bytes | Limitation on the batch size for the current task, in bytes. If not specified, the global limitation procopy_options.batch_bytes is used. | ||
tasks.truncate | If enabled, a destination table (tables) will be truncated before the data loading. If not specified, the global flag procopy_options.truncate is used. | ||
tasks.transform | Rules to transform null values of the specified columns. Each rule can contain the following fields:
| transform: column1: null_to_value: "UNKNOWN" null_char_replace: "\n" table1.column2: # for tasks of the Schema type values_to_null: ["NOTSET", "notset", "NotSet"] | |
tasks.disable_index_hint | Disables index hints when creating select queries. If not specified, the global flag procopy_options.disable_index_hint is used. | ||
tasks.window_rows | Number of rows selected at a time. If not specified, the global value of the procopy_options.window_rows is used. | ||
tasks.encoder_chan_size | Size of a buffer used by encoders. If not specified, the global value of the procopy_options.encoder_chan_size is used. | ||
tasks.snapshot_id | Identifier of the PostgreSQL/Postgres Pro snapshot. If not specified, the global value of the procopy_options.snapshot_id is used. | ||
tasks.enable_auto_snapshot | If true, a snapshot is created in PostgreSQL/Postgres Pro to be used to get data. | ||
task.sub_task_rows | Number of rows in a task for parallel reading of tables. Used to split reading large tables into subtasks. The value of zero turns off splitting into subtasks. | 0 |