3.6. Instance Data Collection and Post-Processing

3.6.1. Instance Data Collection

Instance data collection by agents includes the following steps:

  1. Connecting to the default instance database.

  2. Collecting global objects:

    • roles

    • tablespaces

    • databases

  3. Collecting local objects for each database:

    • extensions

    • schemas

    • tables

    • indexes

    • sequences

    • functions

    • languages

  4. Sending the collected objects to the manager.

  5. Sending the DELETE /instances/objects request to the manager with the instance_id attribute and the data collection time, so that the manager deletes obsolete objects, for example, objects that no longer exist in the instance.

  6. Sending the POST /instances/objects/post_processing request to the manager to start post-processing.

    If the manager is already performing post-processing, for example, because it did not finish after the previous data collection cycle, it returns 429 Too Many Requests. In this case, the agent finishes the current data collection cycle and postpones post-processing to the next cycle.
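
The last three steps can be sketched as follows. This is a minimal illustration, not the agent's actual code: the send callable, the POST /instances/objects path used for uploading batches, and the payload field names other than instance_id are assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable

# Hypothetical transport: send(method, path, payload) returns an HTTP status code.
Send = Callable[[str, str, dict], int]


def finish_collection_cycle(send: Send, instance_id: int,
                            batches: Iterable[list],
                            collected_at: datetime) -> str:
    """Run steps 4-6 and report whether post-processing was started."""
    # Step 4: send the collected objects to the manager, batch by batch.
    # The upload path and payload shape are assumptions for this sketch.
    for batch in batches:
        send("POST", "/instances/objects",
             {"instance_id": instance_id, "objects": batch})

    # Step 5: ask the manager to delete objects not seen in this cycle.
    send("DELETE", "/instances/objects",
         {"instance_id": instance_id,
          "collected_at": collected_at.isoformat()})

    # Step 6: request post-processing. A 429 status means the manager is
    # still busy, so the agent ends the cycle and retries in the next one.
    status = send("POST", "/instances/objects/post_processing", {})
    if status == 429:
        return "postponed"
    return "started"
```

Passing the transport in as a function keeps the sketch independent of any particular HTTP client, so the 429 branch can be exercised with a stub.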

3.6.1.1. Important Considerations

  • Every instance can potentially contain hundreds of thousands of objects, so agents collect data in batches.

    You can set the batch size in the ppem-agent.yml agent configuration file using the collectors.instance_objects.batch_size parameter, whose value is the number of objects in each batch.

  • Agents send extended requests for collecting the following table and index data:

    • For tables:

      • the table size for each of the main, vm, and fsm forks, in bytes

      • the bloat size, in bytes

      • the TOAST size, in bytes

      • the number of tuples

      • the number of pages

      • the total index size, in bytes

      • the storage parameters

      • the table file path

    • For indexes:

      • the index size, in bytes

      • the bloat size, in bytes

      • the storage parameters

      • the index file path

    Extended requests can be resource-intensive, so they are sent once every 5 data collection cycles to avoid overloading the instance. You can control whether and how often extended requests are sent in the ppem-agent.yml agent configuration file using the following parameters:

    • collectors.instance_objects.extended.enabled: Specifies whether extended requests are sent.

      Possible values:

      • true

      • false

    • collectors.instance_objects.extended.interval: The interval for sending extended requests.

      Alternatively, you can specify this interval in the crontab format using the collectors.instance_objects.extended.schedule parameter. This parameter takes precedence over collectors.instance_objects.extended.interval.

  • For composite objects, i.e., databases, schemas, and tables, agents collect information on all their dependencies. Based on this information, the manager generates summary information after the data collection cycle is completed.

  • Agents automatically reconnect to the Postgres Pro DBMS instance once every 10 requests to avoid bloating the cache of the corresponding backend.
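
The agent-side behaviors described above (batching, the precedence of schedule over interval, and periodic reconnection) can be sketched as follows. This is an illustration under stated assumptions, not the agent's implementation: the function and class names are hypothetical, the extended dictionary only mirrors the parameter names from ppem-agent.yml, and treating a missing enabled key as disabled is a choice made for the example.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional


def in_batches(objects: Iterable, batch_size: int) -> Iterator[list]:
    """Yield lists of at most batch_size objects."""
    it = iter(objects)
    while batch := list(islice(it, batch_size)):
        yield batch


def extended_timing(extended: dict) -> Optional[tuple]:
    """Pick the setting that controls when extended requests are sent.

    Returns None when extended requests are disabled; otherwise a pair
    ("schedule", value) or ("interval", value). The schedule wins when
    both are set.
    """
    # Assumption for this sketch only: a missing key means disabled.
    if not extended.get("enabled", False):
        return None
    if extended.get("schedule"):
        return ("schedule", extended["schedule"])
    return ("interval", extended.get("interval"))


class ReconnectingSession:
    """Hands out a connection and reopens it after every `limit` requests."""

    def __init__(self, connect: Callable[[], object], limit: int = 10):
        self._connect = connect  # hypothetical connection factory
        self._limit = limit
        self._conn = None
        self._used = 0

    def connection(self):
        """Return the connection to use for the next request."""
        if self._conn is None or self._used >= self._limit:
            if self._conn is not None:
                self._conn.close()  # drop the backend and its cache
            self._conn = self._connect()
            self._used = 0
        self._used += 1
        return self._conn
```

With this sketch, a caller would request session.connection() before each query and iterate over in_batches(objects, batch_size) when sending results, so neither the request count nor the batch boundaries need to be tracked at the call site.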

3.6.2. Post-Processing

Post-processing by the manager includes the following steps:

  1. Updating the size_bytes field of the collected objects:

    • For tables, the value is calculated as:

      relation_size + visibility_map_size + free_space_map_size + TOAST_size

    • For indexes, the value equals relation_size.

    • For schemas and databases, the value is the total size of all their dependencies.

    Note

    The sizes of tables in the pg_toast schema are not counted separately because they are already included in the sizes of the database tables they belong to.

  2. Re-generating the following summary information for all composite objects:

    • For databases:

      • the total sum of table sizes, in bytes

      • the total sum of index sizes, in bytes

      • the number of tables

      • the number of indexes

      • the total bloat, in bytes

      • available_xid_total and available_xid_percent

    • For schemas:

      • the number of tables

      • the number of indexes

      • the table size and bloat

      • the index size and bloat

    • For tables:

      • the number of indexes

      • the total sum of index sizes

      • the total sum of bloat sizes

    The summary information is stored in a JSONB column of the instance_objects table. The structure of the summary information depends on the composite object type:

    database:
      tables_count: INT
      tables_all_size_bytes: BIGINT
      tables_all_bloat_size_bytes: BIGINT
      indexes_count: INT
      indexes_all_size_bytes: BIGINT
      indexes_all_bloat_size_bytes: BIGINT
      available_xid_total: BIGINT
      available_xid_percent: BIGINT

    schema:
      tables_count: INT
      tables_all_size_bytes: BIGINT
      tables_all_bloat_size_bytes: BIGINT
      indexes_count: INT
      indexes_all_size_bytes: BIGINT
      indexes_all_bloat_size_bytes: BIGINT

    table:
      indexes_count: INT
      indexes_all_size_bytes: BIGINT
      indexes_all_bloat_size_bytes: BIGINT
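
The size calculation and the table and schema summaries can be sketched as follows. This is a minimal illustration, assuming in-memory Table and Index classes that are hypothetical and not part of the product. It also assumes that tables_all_size_bytes is the sum of the size_bytes values of the tables. The summary keys follow the structure listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    relation_size: int
    bloat_size: int = 0

    @property
    def size_bytes(self) -> int:
        # For indexes, size_bytes equals relation_size.
        return self.relation_size


@dataclass
class Table:
    relation_size: int
    visibility_map_size: int
    free_space_map_size: int
    toast_size: int
    bloat_size: int = 0
    indexes: list = field(default_factory=list)

    @property
    def size_bytes(self) -> int:
        # relation_size + visibility_map_size + free_space_map_size + TOAST_size
        return (self.relation_size + self.visibility_map_size
                + self.free_space_map_size + self.toast_size)


def table_summary(table: Table) -> dict:
    """Build the summary for one table from its indexes."""
    return {
        "indexes_count": len(table.indexes),
        "indexes_all_size_bytes": sum(i.size_bytes for i in table.indexes),
        "indexes_all_bloat_size_bytes": sum(i.bloat_size for i in table.indexes),
    }


def schema_summary(tables: list) -> dict:
    """Build the summary for a schema from its tables and their indexes."""
    indexes = [i for t in tables for i in t.indexes]
    return {
        "tables_count": len(tables),
        # Assumption: the total is the sum of the tables' size_bytes values.
        "tables_all_size_bytes": sum(t.size_bytes for t in tables),
        "tables_all_bloat_size_bytes": sum(t.bloat_size for t in tables),
        "indexes_count": len(indexes),
        "indexes_all_size_bytes": sum(i.size_bytes for i in indexes),
        "indexes_all_bloat_size_bytes": sum(i.bloat_size for i in indexes),
    }
```

A database summary would aggregate the same way across all schemas. The available_xid_total and available_xid_percent fields are omitted here because the text does not describe how they are derived.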