Thread: Question about VACUUM behavior with sub-transactions in stored procedures

Question about VACUUM behavior with sub-transactions in stored procedures

From
Кириллов Вячеслав
Date:

Hi hackers,,

I have a question regarding the behavior of the auto VACUUM in PostgreSQL in the context of using stored procedures with sub-transactions.


As I understand it, the parameters that control the initiation of VACUUM are set in the configuration file, such as autovacuum_vacuum_threshold, autovacuum_vacuum_scale_factor, and are stored in the system tables pg_stat_user_tables or pg_class (please correct me if I'm wrong). These system tables are updated after each completed transaction, and VACUUM analyzes them to determine whether to clean up dead rows, depending on the configured thresholds.


Here is the scenario: we have several stored procedures that modify or update table data. These procedures use sub-transactions, which are committed via COMMIT. However, from my understanding, the system table, which VACUUM checks, is not updated until the main (outermost) transaction completes. This means that during the execution of the procedures, a significant number of dead rows may accumulate, and only after the final COMMIT of the main transaction do these dead rows become visible for VACUUM.


As a result, there could be a sharp spike in CPU usage when VACUUM runs after the completion of the procedures, as it begins to clean up a large number of accumulated dead rows.

I would like to know if this behavior is expected and correct? Or could there be a potential issue or bug in this scenario?


To illustrate the issue, here’s an example:

CREATE TABLE bloat
(
    id integer generated always as identity,
    d timestamptz 
);

CREATE OR REPLACE PROCEDURE update_multiple_bloat()
LANGUAGE plpgsql
AS $$
DECLARE
    row_to_update RECORD;
BEGIN
    FOR row_to_update IN SELECT * FROM bloat
    LOOP
        UPDATE bloat SET d = CURRENT_TIMESTAMP WHERE d = row_to_update.d;
        COMMIT;
    END LOOP;
END;
$$;

CREATE OR REPLACE PROCEDURE insert_multiple_into_bloat(num_records integer)
LANGUAGE plpgsql
AS $$
DECLARE
    i integer := 1;
BEGIN
    LOOP
        EXIT WHEN i > num_records;
        INSERT INTO bloat (d) VALUES (CURRENT_TIMESTAMP);
        i := i + 1;
        COMMIT;
    END LOOP;
END;
$$;

DO $$
DECLARE
    row_data RECORD;
    counter INT := 0;
BEGIN

    BEGIN
        INSERT INTO bloat (d) VALUES (CURRENT_TIMESTAMP);
        COMMIT;
    END;​

    BEGIN
        call insert_multiple_into_bloat(100);
    END;

    BEGIN
        call update_multiple_bloat();
    END;

END $$;


Thank you in advance for your help!


With Regards,
Vyacheslav Kirillov!


Re: Question about VACUUM behavior with sub-transactions in stored procedures

From
"David G. Johnston"
Date:
On Monday, October 21, 2024, Кириллов Вячеслав <vkirillov@diasoft.ru> wrote:

I have a question regarding the behavior of the auto VACUUM in PostgreSQL in the context of using stored procedures with sub-transactions.


This is a general usage inquiry not suited to discussion on -hackers.  We have a -general mailing list to discuss how to use the product.  This list is for discussing patches.

Here is the scenario: we have several stored procedures that modify or update table data. These procedures use sub-transactions, which are committed via COMMIT.


This isn’t how sub-transactions work.  They are created mainly by save points and are not independently committed (by the user in SQL).  What you are using are full transactions.


David J.

Re: Question about VACUUM behavior with sub-transactions in stored procedures

From
Кириллов Вячеслав
Date:

Hello everyone,
I’d like to revisit the topic of auto VACUUM’s interaction with stored procedures that perform transactions, with a more technical clarification as suggested earlier.

Let’s consider the behavior of VACUUM and system table updates after transaction commits in procedures that frequently open and commit transactions.
As I understand, statistics updates in PostgreSQL, which VACUUM later analyzes, are performed in pgstat_report_stat, called within db/src/backend/tcop/postgres.c in the PostgresMain function. Specifically:
stats_timeout = pgstat_report_stat(false);
if (stats_timeout > 0) { if (!get_timeout_active(IDLE_STATS_UPDATE_TIMEOUT)) enable_timeout_after(IDLE_STATS_UPDATE_TIMEOUT, stats_timeout); } else { /* all stats flushed, no need for the timeout */ if (get_timeout_active(IDLE_STATS_UPDATE_TIMEOUT)) disable_timeout(IDLE_STATS_UPDATE_TIMEOUT, false); }
Inside procedures, when _SPI_commit is called in db/src/backend/executor/spi.c, the main command responsible for completing a transaction is CommitTransactionCommand(). My question is the following:​
  1. Is it expected behavior that system table updates are deferred until all nested transactions are complete? This would mean that auto VACUUM might not account for dead tuples accumulated during procedure execution until the entire main transaction is finished.
  2. Is it possible or advisable to call pgstat_report_stat after each CommitTransactionCommand() within procedures so that auto VACUUM can track intermediate changes and prevent an accumulation of dead rows?
  3. To what extent would this approach be viable in terms of performance and correctness?
 
I look forward to any insights and advice you can offer on this matter.

Best regards,
Vyacheslav Kirillov



От: David G. Johnston <david.g.johnston@gmail.com>
Отправлено: 21 октября 2024 г. 16:55
Кому: Кириллов Вячеслав
Копия: pgsql-hackers@postgresql.org
Тема: Re: Question about VACUUM behavior with sub-transactions in stored procedures
 
On Monday, October 21, 2024, Кириллов Вячеслав <vkirillov@diasoft.ru> wrote:

I have a question regarding the behavior of the auto VACUUM in PostgreSQL in the context of using stored procedures with sub-transactions.


This is a general usage inquiry not suited to discussion on -hackers.  We have a -general mailing list to discuss how to use the product.  This list is for discussing patches.

Here is the scenario: we have several stored procedures that modify or update table data. These procedures use sub-transactions, which are committed via COMMIT.


This isn’t how sub-transactions work.  They are created mainly by save points and are not independently committed (by the user in SQL).  What you are using are full transactions.


David J.

Re: Question about VACUUM behavior with sub-transactions in stored procedures

From
Кириллов Вячеслав
Date:
Hello everyone,
I’d like to revisit the topic of auto VACUUM’s interaction with stored procedures that perform transactions, with a more technical clarification as suggested earlier.

Let’s consider the behavior of VACUUM and system table updates after transaction commits in procedures that frequently open and commit transactions.
As I understand, statistics updates in PostgreSQL, which VACUUM later analyzes, are performed in pgstat_report_stat, called within db/src/backend/tcop/postgres.c in the PostgresMain function. Specifically:
stats_timeout = pgstat_report_stat(false);
if (stats_timeout > 0) { if (!get_timeout_active(IDLE_STATS_UPDATE_TIMEOUT)) enable_timeout_after(IDLE_STATS_UPDATE_TIMEOUT, stats_timeout); } else { /* all stats flushed, no need for the timeout */ if (get_timeout_active(IDLE_STATS_UPDATE_TIMEOUT)) disable_timeout(IDLE_STATS_UPDATE_TIMEOUT, false); }
Inside procedures, when _SPI_commit is called in db/src/backend/executor/spi.c, the main command responsible for completing a transaction is CommitTransactionCommand(). My question is the following:​
  1. Is it expected behavior that system table updates are deferred until all nested transactions are complete? This would mean that auto VACUUM might not account for dead tuples accumulated during procedure execution until the entire main transaction is finished.
  2. Is it possible or advisable to call pgstat_report_stat after each CommitTransactionCommand() within procedures so that auto VACUUM can track intermediate changes and prevent an accumulation of dead rows?
  3. To what extent would this approach be viable in terms of performance and correctness?
 
I look forward to any insights and advice you can offer on this matter.

Best regards,
Vyacheslav Kirillov​