Re: Resetting spilled txn statistics in pg_stat_replication - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Resetting spilled txn statistics in pg_stat_replication |
Date | |
Msg-id | CAA4eK1KtR90LZVhASiDpqc9yHp8ALWAXMewkR4r+ce3Zj4fcrw@mail.gmail.com Whole thread Raw |
In response to | Re: Resetting spilled txn statistics in pg_stat_replication (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>) |
Responses |
Re: Resetting spilled txn statistics in pg_stat_replication
Re: Resetting spilled txn statistics in pg_stat_replication Re: Resetting spilled txn statistics in pg_stat_replication |
List | pgsql-hackers |
On Tue, Oct 13, 2020 at 12:17 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 13 Oct 2020 at 15:27, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Oct 13, 2020 at 11:49 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Tue, 13 Oct 2020 at 14:53, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > The original theory I have given above [1] which is an interleaved > > > > autovacumm transaction. Let me try to explain in a bit more detail. > > > > Say when transaction T-1 is performing Insert ('INSERT INTO stats_test > > > > SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) > > > > g(i);') a parallel autovacuum transaction occurs. The problem as seen > > > > in buildfarm will happen when autovacuum transaction happens after 80% > > > > or more of the Insert is done. > > > > > > > > In such a situation we will start decoding 'Insert' first and need to > > > > spill multiple times due to the amount of changes (more than threshold > > > > logical_decoding_work_mem) and then before we encounter Commit of > > > > transaction that performed Insert (and probably some more changes from > > > > that transaction) we will encounter a small transaction (autovacuum > > > > transaction). The decode of that small transaction will send the > > > > stats collected till now which will lead to the problem shown in > > > > buildfarm. > > > > > > That seems a possible scenario. > > > > > > I think probably this also explains the reason why spill_count > > > slightly varied and spill_txns was still 1. The spill_count value > > > depends on how much the process spilled out transactions before > > > encountering the commit of an autovacuum transaction. Since we have > > > the spill statistics per reorder buffer, not per transactions, it's > > > possible. > > > > > > > Okay, here is an updated version (changed some comments) of the patch > > I posted some time back. What do you think? I have tested this on both > > Windows and Linux environments. I think it is a bit tricky to > > reproduce the exact scenario so if you are fine we can push this and > > check or let me know if you any better idea? > > I agree to check if the spill_counts and spill_txns are positive. > I am able to reproduce this problem via debugger. Basically, execute the Insert mentioned above from one the psql sessions and in ExecInsert() stop the execution once 'estate->es_processed > 4000' and then from another psql terminal execute some DDL which will be ignored but will any try to decode commit. Then perform 'continue' in the first session. This will lead to inconsistent stats value depending upon at what time DDL is performed. I'll push the patch as I am more confident now. > I > thought we can reduce the number of tuples to insert to the half. It > would help to reduce the likelihood of other transactions interfere > and speed up the test (currently, the stats.sql test takes almost 1 > sec in my environment). But it might lead to another problem like the > logical decoding doesn't spill out the transaction on some > environment. > Yeah, and in other cases, in spill.sql we are using the same amount of data to test spilling. -- With Regards, Amit Kapila.
pgsql-hackers by date: