Re: BUG #16967: Extremely slow update statement in trigger

From: David Fetter
Subject: Re: BUG #16967: Extremely slow update statement in trigger
Date: 2021-04-23 01:50:12
Msg-id: 20210423015012.GI2586@fetter.org
In response to: BUG #16967: Extremely slow update statement in trigger (PG Bug reporting form <noreply@postgresql.org>)
Responses: Re: BUG #16967: Extremely slow update statement in trigger (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-bugs
On Fri, Apr 16, 2021 at 04:52:15PM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      16967
> Logged by:          Nicolas Burri
> Email address:      postgres@burri.li
> PostgreSQL version: 13.2
> Operating system:   Linux & Windows
> Description:        
> 
> Hi,
> During a data migration our production system spent several hours in a
> simple update statement of a table with very few columns and about 200k
> records. We managed to reproduce the problem and a minimal setup for
> reproduction can be found here:
> https://www.dropbox.com/s/mxjavpl43s48hdg/bug_report.sql. The bug seems to
> be related to query plan caching and can be reproduced on Postgres 11.11
> and 13.2 (I have not run tests with other versions).
> 
> Description:
> We use a statement level trigger to log changes to records in a certain
> table. On every update, the trigger calls a function comparing “old table”
> and “new table”. For every record changed by the statement, the function
> inserts a new record in a “log” table. In the following scenario, execution
> times explode:
> First, execute an update statement that affects no rows. This query is fast
> and completes within milliseconds. Then, execute a second update statement
> that affects a lot of records. At 200k records, this query runs for more
> than 4h on my workstation. If we call “discard plans” before executing the
> second update statement, or if we do not execute the first statement at all,
> the update of all rows completes within about 1 second.
> 
> Thanks and best Regards
> Nicolas
> 

I'm including what I found on Dropbox below. In the future, please include
the entire content in the email itself so it stays independent of sites
that might no longer be there when people look back.

I've managed to reproduce the extremely long execution, which I
haven't yet tried to wait out.

I did find something interesting: here's the EXPLAIN for the update
that's intended to change every row in the table. Note that the
estimated row count is 1:

    shackle@[local]:5414/slow_stmt_triggers(14devel)(110490) # explain update demo set type = 'black' where type='blue';
                              QUERY PLAN                          
    ══════════════════════════════════════════════════════════════
     Update on demo  (cost=0.00..3976.35 rows=0 width=0)
       ->  Seq Scan on demo  (cost=0.00..3976.35 rows=1 width=34)
             Filter: ((type)::text = 'blue'::text)
    (3 rows)

    Time: 0.707 ms

Next, I run ANALYZE on that table to collect statistics, as people
generally do just after a bulk load to ensure that the planner is
acting on reasonably accurate statistics.

    shackle@[local]:5414/slow_stmt_triggers(14devel)(110490) # analyze demo;
    ANALYZE
    Time: 35.511 ms

...and run the EXPLAIN again:

    shackle@[local]:5414/slow_stmt_triggers(14devel)(110490) # explain update demo set type = 'black' where type='blue';
                                QUERY PLAN                             
    ═══════════════════════════════════════════════════════════════════
     Update on demo  (cost=0.00..3582.00 rows=0 width=0)
       ->  Seq Scan on demo  (cost=0.00..3582.00 rows=200000 width=34)
             Filter: ((type)::text = 'blue'::text)
    (3 rows)

    Time: 0.772 ms

Now the row estimate is 200,000, which matches exactly the number of
rows in the table.
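
As a quick side check (my addition, not part of the original exchange),
the freshly gathered statistics are also visible in the pg_stats view:

    select attname, n_distinct, most_common_vals
    from pg_stats
    where tablename = 'demo';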

Now, I run the whole-table UPDATE, and:

    shackle@[local]:5414/slow_stmt_triggers(14devel)(110490) # update demo set type = 'black' where type='blue';
    UPDATE 200000
    Time: 2208.754 ms (00:02.209)

Perhaps my machine is a little slower than the one the original test
was run on, but updating a little under a hundred thousand rows per
second, with the trigger INSERTing as many rows into the log table in
the same time, doesn't seem terrible.
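
For anyone curious where that time goes, EXPLAIN ANALYZE on the update
reports per-trigger timing. A sketch (my addition; note that EXPLAIN
ANALYZE actually executes the update, trigger and all, so this one
flips the colors back):

    explain analyze
    update demo set type = 'blue' where type = 'black';

The output includes a "Trigger log_trigger: ..." line showing how much
of the runtime the statement trigger accounts for.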


tl;dr: running ANALYZE after a bulk load is a good idea, and I don't
see a bug here.
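
If you want to bake that into the repro script, the minimal change is
an ANALYZE right after the bulk load, along these lines (my addition):

    insert into demo (id, type) select i, 'blue' from generate_series(1,200000) s(i);
    analyze demo;  -- refresh planner statistics before the first UPDATE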

Original self-contained repro:

-----8<-------------------------------------------------------------------------
-- on this table we will execute a simple update statement
create table demo (
    id int not null primary key,
    type varchar(5)
);

-- this is our log table that has a record written to for every updated record in the "demo" table.
create table log (
    data_after text,
    data_before text
);

-- This function is executed by an update trigger (defined below) attached to the "demo" table.
-- The function joins the "before" and "after" data provided by a statement-level trigger.
-- For all records containing a changed value, both "before" and "after" data are written to the
-- "log" table.
CREATE OR REPLACE FUNCTION log_function()
    RETURNS TRIGGER
AS
$$
BEGIN
    -- Pair old and new transition-table rows by position and log every
    -- row whose contents actually changed.
    INSERT INTO log (data_after, data_before)
    SELECT new_data.row AS data_after,
           old_data.row AS data_before
    FROM (SELECT row_number() OVER () AS id,
                 ROW (new_table.*)
          FROM new_table) new_data
         JOIN (SELECT row_number() OVER () AS id,
                      ROW (old_table.*)
               FROM old_table) old_data
           ON (new_data.id = old_data.id)
    WHERE new_data.row IS DISTINCT FROM old_data.row;
    RETURN NULL;
END;
$$
    LANGUAGE plpgsql;

-- This attaches an update trigger to the "demo" table. On UPDATE, log_function() is called.
CREATE TRIGGER log_trigger
    AFTER UPDATE ON demo
    REFERENCING OLD TABLE AS old_table NEW TABLE AS new_table
    FOR EACH STATEMENT EXECUTE PROCEDURE log_function();

truncate demo; -- just to make sure all tables are empty
truncate log; -- just to make sure all tables are empty

-- Generate 200k sample records in the "demo" table. The "log" table is still empty after the
-- execution.
insert into demo (id, type) select i, 'blue' from generate_series(1,200000) s(i);

-- No rows are updated and the query executes in 16ms. After the execution of this query, "log"
-- is still empty, as no records have changed.
update demo set type = 'black' where type='red';

-- If run after the query above, this query does not stop within a reasonable amount of time
-- (hours). However, if the preceding red -> black update statement is not run first, this query
-- takes less than 1.2s: it updates all 200k records in the "demo" table and creates 200k new
-- records in the "log" table.
update demo set type = 'black' where type='blue';

-- Additional information: if we call "discard plans" between the two update statements above,
-- the second one also completes in about 1.2s.
truncate demo; -- just to make sure all tables are empty
truncate log; -- just to make sure all tables are empty
-- Generate 200k sample records in the "demo" table. The "log" table is still empty after the
-- execution.
insert into demo (id, type) select i, 'blue' from generate_series(1,200000) s(i);
-- No rows are updated and the query executes in 16ms. After the execution of this query, "log"
-- is still empty, as no records have changed.
update demo set type = 'black' where type='red';
discard plans;
update demo set type = 'black' where type='blue';

-- Just for checking results; not part of the reproduction.
select * from demo;
select * from log;


-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


