Re: Replication slot stats misgivings - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Replication slot stats misgivings
Date
Msg-id CAA4eK1Li_m6WVkHpcf4437+b1kAg4zbWc90q5ynjWD93Xen5Xw@mail.gmail.com
In response to Re: Replication slot stats misgivings  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Apr 29, 2021 at 8:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > This is the first test and inserts just one small record, so how it
> > can lead to spill of data. Do you mean to say that may be some
> > background process has written some transaction which leads to a spill
> > of data?
>
> autovacuum, say?
>
> > Yeah, something like this could happen. Another possibility here could
> > be that before the stats collector has processed drop and create
> > messages, we have enquired about the stats which lead to it giving us
> > the old stats. Note, that we don't wait for 'drop' or 'create' message
> > to be delivered. So, there is a possibility of the same. What do you
> > think?
>
> You should take a close look at the stats test in the main regression
> tests.  We had to jump through *high* hoops to get that to be stable,
> and yet it still fails semi-regularly.  This looks like pretty much the
> same thing, and so I'm pessimistically inclined to guess that it will
> never be entirely stable.
>

True, it is possible that we can't make it entirely stable, but I would
like to try some more before giving up on this. Otherwise, I guess the
other possibility is to remove some of the recently added tests or
change them to be more forgiving. For example, we can change the
currently failing test to not check the 'spill*' counts and rely on
just the 'total*' counts. That will work even in the scenarios we
discussed for this failure, but it will reduce the
effectiveness/completeness of the test case.
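
To make the "more forgiving" idea concrete, here is a rough sketch of a
check that polls until the 'total*' counters show up and deliberately
ignores the 'spill*' counters. It is only an illustration, not the
regression test itself: wait_for_slot_stats and make_delayed_fetcher are
hypothetical helpers, the dictionary keys are assumed stand-ins for the
slot stats counters, and the fake fetcher merely simulates the stats
collector answering with old stats for a while.

```python
import time
from typing import Callable, Dict


def wait_for_slot_stats(fetch_stats: Callable[[], Dict[str, int]],
                        timeout_s: float = 5.0,
                        interval_s: float = 0.05) -> Dict[str, int]:
    """Poll until the total_* counters are positive; never look at spill_*."""
    deadline = time.monotonic() + timeout_s
    while True:
        stats = fetch_stats()
        # Only the total_* counters are checked, so a spill caused by some
        # unrelated background transaction cannot fail the check.
        if stats.get("total_txns", 0) > 0 and stats.get("total_bytes", 0) > 0:
            return stats
        if time.monotonic() >= deadline:
            raise TimeoutError("slot stats did not show up in time")
        time.sleep(interval_s)


def make_delayed_fetcher(stale_calls: int) -> Callable[[], Dict[str, int]]:
    """Hypothetical stand-in for querying the stats: stale for N calls."""
    calls = {"n": 0}

    def fetch() -> Dict[str, int]:
        calls["n"] += 1
        if calls["n"] <= stale_calls:
            # Old stats, as if the create/drop messages were not processed yet.
            return {"total_txns": 0, "total_bytes": 0, "spill_txns": 0}
        # Fresh stats; spill_txns is non-zero to mimic unrelated activity.
        return {"total_txns": 1, "total_bytes": 128, "spill_txns": 3}

    return fetch


if __name__ == "__main__":
    stats = wait_for_slot_stats(make_delayed_fetcher(stale_calls=2),
                                timeout_s=1.0, interval_s=0.001)
    print(stats["total_txns"], stats["total_bytes"])
```

The trade-off is the one noted above: such a check tolerates both the
delayed-message case and a stray spill, but it no longer verifies the
spill counters at all.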

> (At least not before the fabled stats collector rewrite, which may well
> introduce some entirely new set of failure modes.)
>
> Do we really need this test in this form?  Perhaps it could be converted
> to a TAP test that's a bit more forgiving.
>

We have a TAP test for slot stats, but there we are checking some
scenarios across a restart. We can surely move these tests there as
well, but it is not apparent to me how that would make a difference.

-- 
With Regards,
Amit Kapila.


