Re: Resetting spilled txn statistics in pg_stat_replication - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Resetting spilled txn statistics in pg_stat_replication |
Date | |
Msg-id | CA+fd4k59yKGXVTahoez_G_aARPSNks=uQFK-XtDD_LS9tU22qA@mail.gmail.com Whole thread Raw |
In response to | Re: Resetting spilled txn statistics in pg_stat_replication (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Resetting spilled txn statistics in pg_stat_replication
|
List | pgsql-hackers |
On Tue, 13 Oct 2020 at 16:12, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Oct 13, 2020 at 12:17 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 13 Oct 2020 at 15:27, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Oct 13, 2020 at 11:49 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Tue, 13 Oct 2020 at 14:53, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > The original theory I have given above [1] which is an interleaved > > > > > autovacumm transaction. Let me try to explain in a bit more detail. > > > > > Say when transaction T-1 is performing Insert ('INSERT INTO stats_test > > > > > SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) > > > > > g(i);') a parallel autovacuum transaction occurs. The problem as seen > > > > > in buildfarm will happen when autovacuum transaction happens after 80% > > > > > or more of the Insert is done. > > > > > > > > > > In such a situation we will start decoding 'Insert' first and need to > > > > > spill multiple times due to the amount of changes (more than threshold > > > > > logical_decoding_work_mem) and then before we encounter Commit of > > > > > transaction that performed Insert (and probably some more changes from > > > > > that transaction) we will encounter a small transaction (autovacuum > > > > > transaction). The decode of that small transaction will send the > > > > > stats collected till now which will lead to the problem shown in > > > > > buildfarm. > > > > > > > > That seems a possible scenario. > > > > > > > > I think probably this also explains the reason why spill_count > > > > slightly varied and spill_txns was still 1. The spill_count value > > > > depends on how much the process spilled out transactions before > > > > encountering the commit of an autovacuum transaction. Since we have > > > > the spill statistics per reorder buffer, not per transactions, it's > > > > possible. > > > > > > > > > > Okay, here is an updated version (changed some comments) of the patch > > > I posted some time back. What do you think? I have tested this on both > > > Windows and Linux environments. I think it is a bit tricky to > > > reproduce the exact scenario so if you are fine we can push this and > > > check or let me know if you any better idea? > > > > I agree to check if the spill_counts and spill_txns are positive. > > > > I am able to reproduce this problem via debugger. Basically, execute > the Insert mentioned above from one the psql sessions and in > ExecInsert() stop the execution once 'estate->es_processed > 4000' and > then from another psql terminal execute some DDL which will be ignored > but will any try to decode commit. Then perform 'continue' in the > first session. This will lead to inconsistent stats value depending > upon at what time DDL is performed. Thanks! I'm also able to reproduce this in a similar way and have confirmed the patch fixes it. > I'll push the patch as I am more > confident now. +1. Let's check how the tests are going to be. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: