Thread: Intermittent buildfarm failures on wrasse
For the past five days or so, wrasse has been intermittently failing due to unexpectedly not using an Index Only Scan plan in the create_index test [1], eg

@@ -1910,11 +1910,15 @@
 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)
 ORDER BY unique1;
-                       QUERY PLAN
--------------------------------------------------------
- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+         ->  Bitmap Index Scan on tenk1_unique1
+               Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)

 SELECT unique1 FROM tenk1
 WHERE unique1 IN (1,42,7)

The most probable explanation for this seems to be that tenk1's pg_class.relallvisible value hasn't been set high enough to make an IOS look cheaper than the alternatives. Where that ought to be getting set is the "VACUUM ANALYZE tenk1" step in test_setup.sql. It's plausible I guess that a background autovacuum is preventing that command from setting relallvisible as high as it ought to be --- but if so, why are we only seeing two plans changing, on only one animal?

But what I'm really confused about is that this test arrangement has been stable since early February. Why has wrasse suddenly started showing a 25% failure rate when it never failed this way before that? Somebody has to have recently committed a change that affects this. Checking the commit log up to the onset of the failures on 8 April, I only see two plausible candidates:

* shared-memory pgstats

* Peter's recent VACUUM changes

Any connection to pgstats is, um, pretty obscure. I'd finger the VACUUM changes as a more likely trigger except that the last interesting-looking one was f3c15cbe5 on 3 April, and wrasse got through "make check" 38 times after that before its first failure of this kind. That doesn't square with the 25% failure rate since then, so I'm kind of forced to the conclusion that the pgstats work changed some behavior that it should not have.

Any ideas? I'm tempted to add something like

SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1';

so that we can confirm or refute the theory that relallvisible is the driving factor.

regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-08%2003%3A48%3A30
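For reference, a slightly expanded form of that probe (purely illustrative; the message above only proposes the boolean form) would report the raw pg_class numbers that drive the index-only-scan costing:

    -- Illustrative diagnostic: show the stats the planner uses for IOS costing.
    SELECT relpages, relallvisible, reltuples,
           relallvisible = relpages AS fully_visible
    FROM pg_class
    WHERE relname = 'tenk1';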
On Wed, Apr 13, 2022 at 3:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm tempted to add something like > > SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1'; > > so that we can confirm or refute the theory that relallvisible is > the driving factor. It would be fairly straightforward to commit a temporary debugging patch that has the autovacuum logging stuff report directly on how VACUUM set new_rel_allvisible in pg_class. We should probably be doing that already, just because it's useful information that is already close at hand. Might be a bit trickier to make sure that wrasse reliably reported on all relevant VACUUMs, since that would have to include manual VACUUMs (which would really have to use VACUUM VERBOSE), as well as autovacuums. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Wed, Apr 13, 2022 at 3:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm tempted to add something like >> SELECT relallvisible = relpages FROM pg_class WHERE relname = 'tenk1'; >> so that we can confirm or refute the theory that relallvisible is >> the driving factor. > It would be fairly straightforward to commit a temporary debugging > patch that has the autovacuum logging stuff report directly on how > VACUUM set new_rel_allvisible in pg_class. We should probably be doing > that already, just because it's useful information that is already > close at hand. Doesn't look like wrasse has autovacuum logging enabled, though. After a bit more navel-contemplation I see a way that the pgstats work could have changed timing in this area. We used to have a rate limit on how often stats reports would be sent to the collector, which'd ensure half a second or so delay before a transaction's change counts became visible to the autovac daemon. I've not looked at the new code, but I'm betting that that's gone and the autovac launcher might start a worker nearly immediately after some foreground process finishes inserting some rows. So that could result in autovac activity occurring concurrently with test_setup where it didn't before. As to what to do about it ... maybe apply the FREEZE and DISABLE_PAGE_SKIPPING options in test_setup's vacuums? It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps not sufficient. regards, tom lane
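Spelled out, that suggestion amounts to changing test_setup.sql's existing command to something along these lines (a sketch of the idea only, not a committed change):

    -- Hypothetical variant of test_setup.sql's existing "VACUUM ANALYZE tenk1;"
    VACUUM (ANALYZE, FREEZE, DISABLE_PAGE_SKIPPING) tenk1;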
On Thu, 14 Apr 2022 at 10:54, Tom Lane <tgl@sss.pgh.pa.us> wrote: > After a bit more navel-contemplation I see a way that the pgstats > work could have changed timing in this area. We used to have a > rate limit on how often stats reports would be sent to the > collector, which'd ensure half a second or so delay before a > transaction's change counts became visible to the autovac daemon. > I've not looked at the new code, but I'm betting that that's gone > and the autovac launcher might start a worker nearly immediately > after some foreground process finishes inserting some rows. > So that could result in autovac activity occurring concurrently > with test_setup where it didn't before. It's not quite clear to me why the manual vacuum wouldn't just cancel the autovacuum and complete the job. I can't quite see how there's room for competing page locks here. Also, see [1]. One of the reported failing tests there is the same as one of the failing tests on wrasse. My investigation for the AIO branch found that relallvisible was not equal to relpages. I don't recall the reason why that was happening now. > As to what to do about it ... maybe apply the FREEZE and > DISABLE_PAGE_SKIPPING options in test_setup's vacuums? > It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps > not sufficient. We should likely try and confirm it's due to relallvisible first. David [1] https://www.postgresql.org/message-id/20220224153339.pqn64kseb5gpgl74@alap3.anarazel.de
On Wed, Apr 13, 2022 at 3:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > After a bit more navel-contemplation I see a way that the pgstats > work could have changed timing in this area. We used to have a > rate limit on how often stats reports would be sent to the > collector, which'd ensure half a second or so delay before a > transaction's change counts became visible to the autovac daemon. > I've not looked at the new code, but I'm betting that that's gone > and the autovac launcher might start a worker nearly immediately > after some foreground process finishes inserting some rows. > So that could result in autovac activity occurring concurrently > with test_setup where it didn't before. But why should it matter? The test_setup.sql VACUUM of tenk1 should leave relallvisible and relpages in the same state, either way (or very close to it). The only way that it seems like it could matter is if OldestXmin was held back during test_setup.sql's execution of the VACUUM command. > As to what to do about it ... maybe apply the FREEZE and > DISABLE_PAGE_SKIPPING options in test_setup's vacuums? > It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps > not sufficient. BTW, the work on VACUUM for Postgres 15 probably makes VACUUM test flappiness issues less of a problem -- unless they're issues involving something holding back OldestXmin when it shouldn't (in which case it won't have any effect on test stability). I would expect that to be the case, at least, since VACUUM now does almost all of the same work for any individual page that it cannot get a cleanup lock on. There is surprisingly little difference between a page that gets processed by lazy_scan_prune and a page that gets processed by lazy_scan_noprune. -- Peter Geoghegan
Hi,

On April 13, 2022 7:06:33 PM EDT, David Rowley <dgrowleyml@gmail.com> wrote:
>On Thu, 14 Apr 2022 at 10:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> After a bit more navel-contemplation I see a way that the pgstats
>> work could have changed timing in this area. We used to have a
>> rate limit on how often stats reports would be sent to the
>> collector, which'd ensure half a second or so delay before a
>> transaction's change counts became visible to the autovac daemon.
>> I've not looked at the new code, but I'm betting that that's gone
>> and the autovac launcher might start a worker nearly immediately
>> after some foreground process finishes inserting some rows.
>> So that could result in autovac activity occurring concurrently
>> with test_setup where it didn't before.
>
>It's not quite clear to me why the manual vacuum wouldn't just cancel
>the autovacuum and complete the job. I can't quite see how there's
>room for competing page locks here. Also, see [1]. One of the
>reported failing tests there is the same as one of the failing tests
>on wrasse. My investigation for the AIO branch found that
>relallvisible was not equal to relpages. I don't recall the reason why
>that was happening now.
>
>> As to what to do about it ... maybe apply the FREEZE and
>> DISABLE_PAGE_SKIPPING options in test_setup's vacuums?
>> It seems like DISABLE_PAGE_SKIPPING is necessary but perhaps
>> not sufficient.
>
>We should likely try and confirm it's due to relallvisible first.

We had this issue before, and not just on the aio branch. On my phone right now, so won't look up references.

IIRC the problem in matter isn't skipped pages, but that the horizon simply isn't new enough to mark pages as all visible. An independent autovac worker starting is enough for that, for example. Previously the data load and vacuum were further apart, preventing this kind of issue.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Wed, Apr 13, 2022 at 4:13 PM Andres Freund <andres@anarazel.de> wrote:
> IIRC the problem in matter isn't skipped pages, but that the horizon simply isn't new enough to mark pages as all visible.

Sometimes OldestXmin can go backwards in VACUUM operations that are run in close succession against the same table, due to activity from other databases in the same cluster (perhaps other factors are involved at times). That's why the following assertion that I recently added to vacuumlazy.c will fail pretty quickly without the "vacrel->NewRelfrozenXid == OldestXmin" part of its test:

    Assert(vacrel->NewRelfrozenXid == OldestXmin ||
           TransactionIdPrecedesOrEquals(aggressive ? FreezeLimit :
                                         vacrel->relfrozenxid,
                                         vacrel->NewRelfrozenXid));

If you remove "vacrel->NewRelfrozenXid == OldestXmin", and run the regression tests, the remaining assertion will fail quite easily. Though perhaps not with a serial "make check".

--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Wed, Apr 13, 2022 at 4:13 PM Andres Freund <andres@anarazel.de> wrote: >> IIRC the problem in matter isn't skipped pages, but that the horizon simply isn't new enough to mark pages as all visible. > Sometimes OldestXmin can go backwards in VACUUM operations that are > run in close succession against the same table, due to activity from > other databases in the same cluster (perhaps other factors are > involved at times). I've been doing some testing locally by inserting commands to manually set tenk1's relallvisible to zero. I first did that in test_setup.sql ... and it had no effect whatsoever. Further experimentation showed that the "CREATE INDEX ON tenk1" steps in create_index.sql itself generally suffice to fix relallvisible; although if you force it back to zero after the last such command, you get the same plan diffs wrasse is showing. And you don't get any others, which I thought curious until I realized that sanity_check.sql's database-wide VACUUM offers yet another opportunity to heal the incorrect value. If you force it back to zero again after that, a bunch of later tests start to show plan differences, which is what I'd been expecting. So what seems to be happening on wrasse is that a background autovacuum (or really autoanalyze?) is preventing pages from being marked all-visible not only during test_setup.sql but also create_index.sql; but it's gone by the time sanity_check.sql runs. Which is odd in itself because not that much time elapses between create_index and sanity_check, certainly less than the time from test_setup to create_index. It seems like a reliable fix might require test_setup to wait for any background autovac to exit before it does its own vacuums. Ick. And we still lack an explanation of why this only now broke. I remain suspicious that pgstats is behaving unexpectedly. regards, tom lane
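The manipulation described above is roughly the following (superuser-only, and purely for experimentation; the EXPLAIN is just to observe which plan comes out):

    -- Pretend no pages of tenk1 are all-visible, then see which plan is chosen.
    UPDATE pg_class SET relallvisible = 0 WHERE relname = 'tenk1';
    EXPLAIN (COSTS OFF)
      SELECT unique1 FROM tenk1 WHERE unique1 IN (1, 42, 7) ORDER BY unique1;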
On Wed, Apr 13, 2022 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So what seems to be happening on wrasse is that a background
> autovacuum (or really autoanalyze?) is preventing pages from
> being marked all-visible not only during test_setup.sql but
> also create_index.sql; but it's gone by the time sanity_check.sql
> runs.

I agree that it would need to be an autoanalyze (due to the PROC_IN_VACUUM optimization).

> It seems like a reliable fix might require test_setup to wait
> for any background autovac to exit before it does its own
> vacuums. Ick.

This is hardly a new problem, really. I wonder if it's worth inventing a comprehensive solution. Some kind of infrastructure that makes VACUUM establish a next XID up-front (by calling ReadNextTransactionId()), and then find a way to run with an OldestXmin that's >= the earlier "next" XID value. If necessary by waiting.

--
Peter Geoghegan
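A very rough sketch of what that could look like inside VACUUM (no such code exists; the function choices, the retry interval, and "rel" standing for the relation being vacuumed are all guesses made for illustration):

    /* Hypothetical wait-for-horizon loop, sketched for illustration only */
    TransactionId target_xid = ReadNextTransactionId();

    for (;;)
    {
        TransactionId oldest_xmin = GetOldestNonRemovableTransactionId(rel);

        if (TransactionIdFollowsOrEquals(oldest_xmin, target_xid))
            break;              /* horizon has caught up; safe to proceed */
        pg_usleep(100 * 1000L); /* retry every 100ms */
    }

Whether such waiting belongs in vacuumlazy.c itself or somewhere higher up is exactly what the rest of the thread goes on to discuss.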
Peter Geoghegan <pg@bowt.ie> writes: > On Wed, Apr 13, 2022 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It seems like a reliable fix might require test_setup to wait >> for any background autovac to exit before it does its own >> vacuums. Ick. > This is hardly a new problem, really. I wonder if it's worth inventing > a comprehensive solution. Yeah, we have band-aided around this type of problem repeatedly. Making a fix that's readily accessible from any test script seems like a good idea. regards, tom lane
On Wed, Apr 13, 2022 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeah, we have band-aided around this type of problem repeatedly. > Making a fix that's readily accessible from any test script > seems like a good idea. We might even be able to consistently rely on this new option, given *any* problem involving test stability and VACUUM. Having a one-size-fits-all solution to these kinds of stability problems would be nice -- no more DISABLE_PAGE_SKIPPING bandaids. That general approach will be possible provided an inability to acquire a cleanup lock during VACUUM (which can more or less occur at random in most environments) doesn't ever lead to unexpected test results. There is good reason to think that it might work out that way. Simulating problems with acquiring cleanup locks during VACUUM left me with the impression that that could really work: https://postgr.es/m/CAH2-WzkiB-qcsBmWrpzP0nxvrQExoUts1d7TYShg_DrkOHeg4Q@mail.gmail.com -- Peter Geoghegan
Hi,

On 2022-04-13 16:45:44 -0700, Peter Geoghegan wrote:
> On Wed, Apr 13, 2022 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > So what seems to be happening on wrasse is that a background
> > autovacuum (or really autoanalyze?) is preventing pages from
> > being marked all-visible not only during test_setup.sql but
> > also create_index.sql; but it's gone by the time sanity_check.sql
> > runs.
>
> I agree that it would need to be an autoanalyze (due to the
> PROC_IN_VACUUM optimization).

That's not a reliable protection - the snapshot is established normally at first; only after a while do we set PROC_IN_VACUUM...

Greetings,

Andres Freund
Peter Geoghegan <pg@bowt.ie> writes: > On Wed, Apr 13, 2022 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, we have band-aided around this type of problem repeatedly. >> Making a fix that's readily accessible from any test script >> seems like a good idea. > We might even be able to consistently rely on this new option, given > *any* problem involving test stability and VACUUM. Having a > one-size-fits-all solution to these kinds of stability problems would > be nice -- no more DISABLE_PAGE_SKIPPING bandaids. My guess is that you'd need both this new wait-for-horizon behavior *and* DISABLE_PAGE_SKIPPING. But the two together ought to make for pretty reproducible behavior. I noticed while scanning the commit log that some patches have tried adding a FREEZE option, which seems more like waving a dead chicken than anything that'd improve stability. We'd not necessarily have to embed wait-for-horizon into VACUUM itself. It seems like a SQL-accessible function could be written and then called before any problematic VACUUM. I like this better for something we're thinking of jamming in post-feature-freeze; we'd not be committing to the feature quite as much as if we added a VACUUM option. regards, tom lane
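From a test script's point of view that might end up looking something like this (the function name here is invented; nothing like it exists at this point):

    -- Hypothetical helper; waits until the removable-XID horizon has caught up.
    SELECT wait_for_removable_cutoff();
    VACUUM (ANALYZE, DISABLE_PAGE_SKIPPING) tenk1;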
Hi, On 2022-04-13 18:54:06 -0400, Tom Lane wrote: > We used to have a rate limit on how often stats reports would be sent > to the collector, which'd ensure half a second or so delay before a > transaction's change counts became visible to the autovac daemon. Just for posterity: That's not actually gone. But what is gone is the rate limiting in autovacuum about requesting recent stats for a table / autovac seeing slightly older stats. Greetings, Andres Freund
On Wed, Apr 13, 2022 at 5:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > My guess is that you'd need both this new wait-for-horizon behavior > *and* DISABLE_PAGE_SKIPPING. But the two together ought to make > for pretty reproducible behavior. I noticed while scanning the > commit log that some patches have tried adding a FREEZE option, > which seems more like waving a dead chicken than anything that'd > improve stability. I think that it's more likely that FREEZE will correct problems, out of the two: * FREEZE forces an aggressive VACUUM whose FreezeLimit is as recent a cutoff value as possible (FreezeLimit will be equal to OldestXmin). * DISABLE_PAGE_SKIPPING also forces an aggressive VACUUM. But unlike FREEZE it makes VACUUM not use the visibility map, even in the case of all-frozen pages. And it changes nothing about FreezeLimit. It's also a certainty that VACUUM(FREEZE, DISABLE_PAGE_SKIPPING) is not a sensible remedy for any problem with test stability, but there are still some examples of that combination in the regression tests. The only way it could make sense is if the visibility map might be corrupt, but surely we're not expecting that anyway (and if we were we'd be testing it more directly). I recently argued that DISABLE_PAGE_SKIPPING should have nothing to do with aggressive vacuuming -- that should all be left up to VACUUM FREEZE. It seems more logical to make DISABLE_PAGE_SKIPPING mean "don't use the visibility map to skip anything", without bringing aggressiveness into it at all. That would be less confusing. > We'd not necessarily have to embed wait-for-horizon into VACUUM > itself. It seems like a SQL-accessible function could be written > and then called before any problematic VACUUM. I like this better > for something we're thinking of jamming in post-feature-freeze; > we'd not be committing to the feature quite as much as if we > added a VACUUM option. Hmm. I would say that the feature has zero appeal to users anyway. Maybe it can and should be done through an SQL function for other reasons, though. Users already think that there are several different flavors of VACUUM, which isn't really true. -- Peter Geoghegan
Hi, On 2022-04-13 20:35:50 -0400, Tom Lane wrote: > Peter Geoghegan <pg@bowt.ie> writes: > > On Wed, Apr 13, 2022 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Yeah, we have band-aided around this type of problem repeatedly. > >> Making a fix that's readily accessible from any test script > >> seems like a good idea. > > > We might even be able to consistently rely on this new option, given > > *any* problem involving test stability and VACUUM. Having a > > one-size-fits-all solution to these kinds of stability problems would > > be nice -- no more DISABLE_PAGE_SKIPPING bandaids. > > My guess is that you'd need both this new wait-for-horizon behavior > *and* DISABLE_PAGE_SKIPPING. But the two together ought to make > for pretty reproducible behavior. I noticed while scanning the > commit log that some patches have tried adding a FREEZE option, > which seems more like waving a dead chicken than anything that'd > improve stability. I think most of those we've ended up replacing by using temp tables in those tests instead, since they're not affected by the global horizon anymore. > We'd not necessarily have to embed wait-for-horizon into VACUUM > itself. I'm not sure it'd be quite reliable outside of vacuum though, due to the horizon potentially going backwards (in otherwise harmless ways)? > It seems like a SQL-accessible function could be written > and then called before any problematic VACUUM. I like this better > for something we're thinking of jamming in post-feature-freeze; > we'd not be committing to the feature quite as much as if we > added a VACUUM option. We could otherwise just disable IOS for that query, for now. Greetings, Andres Freund
On Wed, Apr 13, 2022 at 6:03 PM Peter Geoghegan <pg@bowt.ie> wrote: > I think that it's more likely that FREEZE will correct problems, out of the two: > > * FREEZE forces an aggressive VACUUM whose FreezeLimit is as recent a > cutoff value as possible (FreezeLimit will be equal to OldestXmin). The reason why that might have helped (at least in the past) is that it's enough to force us to wait for a cleanup lock to prune and freeze, if necessary. Which was never something that DISABLE_PAGE_SKIPPING could do. -- Peter Geoghegan
On Wed, Apr 13, 2022 at 6:05 PM Andres Freund <andres@anarazel.de> wrote: > I think most of those we've ended up replacing by using temp tables in > those tests instead, since they're not affected by the global horizon > anymore. Maybe, but it's a pain to have to work that way. You can't do it in cases like this, because a temp table is not workable. So that's not an ideal long term solution. > > We'd not necessarily have to embed wait-for-horizon into VACUUM > > itself. > > I'm not sure it'd be quite reliable outside of vacuum though, due to the > horizon potentially going backwards (in otherwise harmless ways)? I agree, since vacuumlazy.c would need to either be given its own OldestXmin, or knowledge of a wait-up-to XID. Either way we have to make non-trivial changes to vacuumlazy.c. -- Peter Geoghegan
Andres Freund <andres@anarazel.de> writes:
> On 2022-04-13 20:35:50 -0400, Tom Lane wrote:
>> It seems like a SQL-accessible function could be written
>> and then called before any problematic VACUUM. I like this better
>> for something we're thinking of jamming in post-feature-freeze;
>> we'd not be committing to the feature quite as much as if we
>> added a VACUUM option.

> We could otherwise just disable IOS for that query, for now.

The entire point of that test case is to verify the shape of the IOS plan, so no, that's not an acceptable answer. But if we're looking for quick hacks, we could do

update pg_class set relallvisible = relpages where relname = 'tenk1';

just before that test.

I'm still suspicious of the pgstat changes, though. I checked into things here by doing

initdb
edit postgresql.conf to set log_autovacuum_min_duration = 0
pg_ctl start && make installcheck-parallel

and what I see is that the first reported autovacuum activity begins exactly one minute after the postmaster starts, which is what I'd expect given the autovacuum naptime rules. On my machine, of course, the installcheck-parallel run is long gone by then. But even on the much slower wrasse, we should be well past create_index by the time any autovac worker launches --- you can see from wrasse's reported test runtimes that only about 10 seconds have elapsed when it gets to the end of create_index.

This suggests to me that whatever is holding the (presumed) conflicting snapshot must be the autovac launcher, because what else could it be? So I'm suspicious that we broke something in that area, though I am baffled why only wrasse would be telling us so.

regards, tom lane
Hi, Noah, any chance you could enable log_autovacuum_min_duration=0 on wrasse? On 2022-04-13 21:23:12 -0400, Tom Lane wrote: > I'm still suspicious of the pgstat changes, though. I checked into > things here by doing > > initdb > edit postgresql.conf to set log_autovacuum_min_duration = 0 > pg_ctl start && make installcheck-parallel > > and what I see is that the first reported autovacuum activity begins > exactly one minute after the postmaster starts, which is what I'd > expect given the autovacuum naptime rules. It'd not necessarily have to be autovacuum. A CREATE INDEX or VACUUM using parallelism, could also cause this, I think. It'd be a narrow window, of course... Does sparc have wider alignment rules for some types? Perhaps that'd be enough to put some tables to be sufficiently larger to trigger parallel vacuum? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes:
> Noah, any chance you could enable log_autovacuum_min_duration=0 on
> wrasse?

+1

> Does sparc have wider alignment rules for some types? Perhaps that'd be
> enough to put some tables to be sufficiently larger to trigger parallel
> vacuum?

No, the configure results on wrasse look pretty ordinary:

checking size of void *... 8
checking size of size_t... 8
checking size of long... 8
checking alignment of short... 2
checking alignment of int... 4
checking alignment of long... 8
checking alignment of double... 8

I wondered for a moment about force_parallel_mode, but wrasse doesn't appear to be setting that, and in any case I'm pretty sure it only affects plannable statements.

regards, tom lane
On Wed, Apr 13, 2022 at 06:51:12PM -0700, Andres Freund wrote: > Noah, any chance you could enable log_autovacuum_min_duration=0 on > wrasse? Done. Also forced hourly builds.
Noah Misch <noah@leadboat.com> writes: > On Wed, Apr 13, 2022 at 06:51:12PM -0700, Andres Freund wrote: >> Noah, any chance you could enable log_autovacuum_min_duration=0 on >> wrasse? > Done. Also forced hourly builds. Thanks! We now have two failing runs with the additional info [1][2], and in both, it's clear that the first autovac worker doesn't launch until 1 minute after postmaster start, by which time we're long done with the test scripts of interest. So whatever is breaking this is not an autovac worker. I think I'm going to temporarily add a couple of queries to check what tenk1's relallvisible actually is, just so we can confirm positively that that's what's causing the plan change. (I'm also curious about whether the CREATE INDEX steps manage to change it at all.) regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-14%2013%3A28%3A14 [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-14%2004%3A48%3A13
Hi, On 2022-04-14 12:01:23 -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > On Wed, Apr 13, 2022 at 06:51:12PM -0700, Andres Freund wrote: > >> Noah, any chance you could enable log_autovacuum_min_duration=0 on > >> wrasse? > > > Done. Also forced hourly builds. Thanks! Can you repro the problem manually on wrasse, perhaps even outside the buildfarm script? That might be simpler than debugging via the BF... > Thanks! We now have two failing runs with the additional info [1][2], > and in both, it's clear that the first autovac worker doesn't launch > until 1 minute after postmaster start, by which time we're long done > with the test scripts of interest. So whatever is breaking this is > not an autovac worker. I did some experiments around that too, and didn't find any related problems. For a second I was wondering if it's caused by the time of initdb (which ends up with a working pgstat snapshot now, but didn't before), but that's just a few more seconds. While the BF scripts don't show timestamps for initdb, the previous step's log output confirms that it's just a few seconds... > I think I'm going to temporarily add a couple of queries to check > what tenk1's relallvisible actually is, just so we can confirm > positively that that's what's causing the plan change. (I'm also > curious about whether the CREATE INDEX steps manage to change it > at all.) I wonder if we should make VACUUM log the VERBOSE output at DEBUG1 unconditionally. This is like the third bug where we needed that information, and it's practically impossible to include in regression output. Then we'd know what the xid horizon is, whether pages were skipped, etc. It also just generally seems like a good thing. Greetings, Andres Freund
On Thu, Apr 14, 2022 at 9:18 AM Andres Freund <andres@anarazel.de> wrote: > I wonder if we should make VACUUM log the VERBOSE output at DEBUG1 > unconditionally. This is like the third bug where we needed that > information, and it's practically impossible to include in regression > output. Then we'd know what the xid horizon is, whether pages were > skipped, etc. I like the idea of making VACUUM log the VERBOSE output as a configurable user-visible feature. We'll then be able to log all VACUUM statements (not just autovacuum worker VACUUMs). -- Peter Geoghegan
Andres Freund <andres@anarazel.de> writes: > Thanks! Can you repro the problem manually on wrasse, perhaps even > outside the buildfarm script? I'm working on that right now, actually... > I wonder if we should make VACUUM log the VERBOSE output at DEBUG1 > unconditionally. This is like the third bug where we needed that > information, and it's practically impossible to include in regression > output. Then we'd know what the xid horizon is, whether pages were > skipped, etc. Right at the moment it seems like we also need visibility into what CREATE INDEX is doing. I'm not sure I'd buy into permanent changes here (at least not ones made in haste), but temporarily adding more logging seems perfectly reasonable. regards, tom lane
Hi, On 2022-04-14 12:26:20 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > Thanks! Can you repro the problem manually on wrasse, perhaps even > > outside the buildfarm script? Ah, cool. > I'm working on that right now, actually... > > > I wonder if we should make VACUUM log the VERBOSE output at DEBUG1 > > unconditionally. This is like the third bug where we needed that > > information, and it's practically impossible to include in regression > > output. Then we'd know what the xid horizon is, whether pages were > > skipped, etc. > > Right at the moment it seems like we also need visibility into what > CREATE INDEX is doing. > I'm not sure I'd buy into permanent changes here (at least not ones made > in haste), but temporarily adding more logging seems perfectly reasonable. I think it might be worth leaving in, but let's debate that separately? I'm thinking of something like the attached. Greetings, Andres Freund
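The attached patch itself isn't reproduced in the archive text. The general shape of the idea - emit the per-table VACUUM report at a DEBUG level when neither VERBOSE nor the autovacuum log threshold asked for it - would be something like the following sketch; this is a guess at the approach, not the actual patch:

    /* Sketch only: pick an elevel so the report is always emitted somewhere */
    if (verbose)
        elevel = INFO;
    else if (IsAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
        elevel = LOG;
    else
        elevel = DEBUG1;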
On Thu, Apr 14, 2022 at 9:48 AM Andres Freund <andres@anarazel.de> wrote:
> I think it might be worth leaving in, but let's debate that separately?
> I'm thinking of something like the attached.

The current convention for the "extra" ereport()s that VACUUM VERBOSE outputs at INFO elevel is to use DEBUG2 elevel in all other cases (these extra messages are considered part of VACUUM VERBOSE output, but are *not* considered part of the autovacuum log output).

It looks like you're changing the elevel convention for these "extra" messages with this patch. That might be fine, but don't forget about similar ereports() in vacuumparallel.c. I think that the elevel should probably remain uniform across all of these messages. Though I don't particularly care if it's DEBUG2 or DEBUG5.

--
Peter Geoghegan
On Thu, Apr 14, 2022 at 10:07 AM Peter Geoghegan <pg@bowt.ie> wrote:
> It looks like you're changing the elevel convention for these "extra"
> messages with this patch. That might be fine, but don't forget about
> similar ereports() in vacuumparallel.c. I think that the elevel should
> probably remain uniform across all of these messages. Though I don't
> particularly care if it's DEBUG2 or DEBUG5.

Also, don't forget to do something here, with the assertion and with the message:

    if (verbose)
    {
        /*
         * Aggressiveness already reported earlier, in dedicated
         * VACUUM VERBOSE ereport
         */
        Assert(!params->is_wraparound);
        msgfmt = _("finished vacuuming \"%s.%s.%s\": index scans: %d\n");
    }
    else if (params->is_wraparound)
    {

Presumably we will need to report on antiwraparound-ness in the verbose-debug-elevel-for-autovacuum case (and not allow this assertion to fail).

--
Peter Geoghegan
I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> Thanks! Can you repro the problem manually on wrasse, perhaps even
>> outside the buildfarm script?

> I'm working on that right now, actually...

So far, reproducing it manually has been a miserable failure: I've run about 180 cycles of the core regression tests with no error. Not sure what's different between my test scenario and wrasse's.

Meanwhile, wrasse did fail with my relallvisible check in place [1], and what that shows is that relallvisible is *zero* to start with and remains so throughout the CREATE INDEX sequence. That pretty definitively proves that it's not a page-skipping problem but an xmin-horizon-too-old problem. We're no closer to understanding where that horizon value is coming from, though.

regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-04-14%2019%3A28%3A12
On Thu, Apr 14, 2022 at 2:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Meanwhile, wrasse did fail with my relallvisible check in place [1], > and what that shows is that relallvisible is *zero* to start with > and remains so throughout the CREATE INDEX sequence. That pretty > definitively proves that it's not a page-skipping problem but > an xmin-horizon-too-old problem. We're no closer to understanding > where that horizon value is coming from, though. Have you looked at the autovacuum log output in more detail? It might be possible to debug further, but looks like there are no XIDs to work off of in the log_line_prefix that's in use on wrasse. The CITester log_line_prefix is pretty useful -- I wonder if we can standardize on that within the buildfarm, too. -- Peter Geoghegan
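For what it's worth, getting XIDs into the prefix is just a matter of adding %x; the following is one possible setting, not a claim about what CITester actually uses, and %x prints 0 until a transaction ID has actually been assigned:

    -- Hypothetical prefix with the backend's XID included
    ALTER SYSTEM SET log_line_prefix = '%m [%p:%l] %x %q%a ';
    SELECT pg_reload_conf();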
Peter Geoghegan <pg@bowt.ie> writes:
> Have you looked at the autovacuum log output in more detail?

I don't think there's anything to be learned there. The first autovacuum in wrasse's log happens long after things went south:

2022-04-14 22:49:15.177 CEST [9427:1] LOG:  automatic vacuum of table "regression.pg_catalog.pg_type": index scans: 1
        pages: 0 removed, 49 remain, 49 scanned (100.00% of total)
        tuples: 539 removed, 1112 remain, 0 are dead but not yet removable
        removable cutoff: 8915, older by 1 xids when operation ended
        index scan needed: 34 pages from table (69.39% of total) had 1107 dead item identifiers removed
        index "pg_type_oid_index": pages: 14 in total, 0 newly deleted, 0 currently deleted, 0 reusable
        index "pg_type_typname_nsp_index": pages: 13 in total, 0 newly deleted, 0 currently deleted, 0 reusable
        avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
        buffer usage: 193 hits, 0 misses, 0 dirtied
        WAL usage: 116 records, 0 full page images, 14113 bytes
        system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

If we captured equivalent output from the manual VACUUM in test_setup, maybe something could be learned. However, it seems virtually certain to me that the problematic xmin is in some background process (eg autovac launcher) and thus wouldn't show up in the postmaster log, log_line_prefix or no.

regards, tom lane
On Thu, Apr 14, 2022 at 3:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > If we captured equivalent output from the manual VACUUM in test_setup, > maybe something could be learned. However, it seems virtually certain > to me that the problematic xmin is in some background process > (eg autovac launcher) and thus wouldn't show up in the postmaster log, > log_line_prefix or no. A bunch of autovacuums that ran between "2022-04-14 22:49:16.274" and "2022-04-14 22:49:19.088" all have the same "removable cutoff". The logs from this time show a period of around three seconds (likely more) where something held back OldestXmin generally. That does seem a bit fishy to me, even though it happened about a minute after the failure itself took place. -- Peter Geoghegan
On Thu, Apr 14, 2022 at 3:28 PM Peter Geoghegan <pg@bowt.ie> wrote: > A bunch of autovacuums that ran between "2022-04-14 22:49:16.274" and > "2022-04-14 22:49:19.088" all have the same "removable cutoff". Are you aware of Andres' commit 02fea8fd? That work prevented exactly the same set of symptoms (the same index-only scan create_index regressions), which was apparently necessary following the rearrangements to the regression tests to remove cross-script dependencies (Tom's commit cc50080a82). This was the thread that led to Andres' commit, which was just over a month ago: https://www.postgresql.org/message-id/flat/CAJ7c6TPJNof1Q+vJsy3QebgbPgXdu2ErPvYkBdhD6_Ckv5EZRg@mail.gmail.com -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > Are you aware of Andres' commit 02fea8fd? That work prevented exactly > the same set of symptoms (the same index-only scan create_index > regressions), Hm. I'm starting to get the feeling that the real problem here is we've "optimized" the system to the point where repeatable results from VACUUM are impossible :-( regards, tom lane
On Thu, Apr 14, 2022 at 6:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hm. I'm starting to get the feeling that the real problem here is > we've "optimized" the system to the point where repeatable results > from VACUUM are impossible :-( I don't think that there is any fundamental reason why VACUUM cannot have repeatable results. Anyway, I suppose it's possible that problems reappeared here due to some other patch. Something else could have broken Andres' earlier band aid solution (which was to set synchronous_commit=on in test_setup). Is there any patch that could plausibly have had that effect, whose commit fits with our timeline for the problems seen on wrasse? -- Peter Geoghegan
Hi,

On 2022-04-14 21:32:27 -0400, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > Are you aware of Andres' commit 02fea8fd? That work prevented exactly
> > the same set of symptoms (the same index-only scan create_index
> > regressions),
>
> Hm. I'm starting to get the feeling that the real problem here is
> we've "optimized" the system to the point where repeatable results
> from VACUUM are impossible :-(

The synchronous_commit issue is an old one. It might actually be worth addressing it by flushing out pending async commits instead. It just started to be noticeable when the tenk1 load and vacuum were moved closer together.

What do you think about applying a polished version of what I posted in https://postgr.es/m/20220414164830.63rk5zqsvtqqk7qz%40alap3.anarazel.de ? That'd tell us a bit more about the horizon etc.

It doesn't have to be the autovacuum launcher. I think it shouldn't even be taken into account - it's not database-connected, and tenk1 isn't a shared relation. All very odd.

It's also interesting that it only happens in the installcheck cases, afaics, not the check ones. Although that might just be because there's more of them...

Greetings,

Andres Freund
Peter Geoghegan <pg@bowt.ie> writes: > Anyway, I suppose it's possible that problems reappeared here due to > some other patch. Something else could have broken Andres' earlier > band aid solution (which was to set synchronous_commit=on in > test_setup). That band-aid only addressed the situation of someone having turned off synchronous_commit in the first place; which is not the case on wrasse or most/all other buildfarm animals. Whatever we're dealing with here is something independent of that. > Is there any patch that could plausibly have had that effect, whose > commit fits with our timeline for the problems seen on wrasse? I already enumerated my suspects, back at the top of this thread. regards, tom lane
On Thu, Apr 14, 2022 at 6:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > That band-aid only addressed the situation of someone having turned > off synchronous_commit in the first place; which is not the case > on wrasse or most/all other buildfarm animals. Whatever we're > dealing with here is something independent of that. That was the intent, but that in itself doesn't mean that it isn't something to do with setting hint bits (not the OldestXmin horizon being held back). I'd really like to know what the removable cutoff looks like for these VACUUM operations, which is something like Andres' VACUUM VERBOSE debug patch should tell us. > > Is there any patch that could plausibly have had that effect, whose > > commit fits with our timeline for the problems seen on wrasse? > > I already enumerated my suspects, back at the top of this thread. Right, but I thought that the syncronous_commit thing was new information that made that worth revisiting. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > That was the intent, but that in itself doesn't mean that it isn't > something to do with setting hint bits (not the OldestXmin horizon > being held back). Oh! You mean that maybe the OldestXmin horizon was fine, but something decided not to update hint bits (and therefore also not the all-visible bit) anyway? Worth investigating I guess. > I'd really like to know what the removable cutoff > looks like for these VACUUM operations, which is something like > Andres' VACUUM VERBOSE debug patch should tell us. Yeah. I'd hoped to investigate this manually and not have to clutter the main repo with debugging commits. However, since I've still utterly failed to reproduce the problem on wrasse's host, I think we're going to be forced to do that. regards, tom lane
On Thu, Apr 14, 2022 at 7:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Oh! You mean that maybe the OldestXmin horizon was fine, but something > decided not to update hint bits (and therefore also not the all-visible > bit) anyway? Worth investigating I guess. Yes. That is starting to seem like a plausible alternative explanation. > > I'd really like to know what the removable cutoff > > looks like for these VACUUM operations, which is something like > > Andres' VACUUM VERBOSE debug patch should tell us. > > Yeah. I'd hoped to investigate this manually and not have to clutter > the main repo with debugging commits. Suppose that the bug was actually in 06f5295af6, "Add single-item cache when looking at topmost XID of a subtrans XID". Doesn't that fit your timeline just as well? I haven't really started to investigate that theory (just putting dinner on here). Just a wild guess at this point. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > Suppose that the bug was actually in 06f5295af6, "Add single-item > cache when looking at topmost XID of a subtrans XID". Doesn't that fit > your timeline just as well? I'd dismissed that on the grounds that there are no subtrans XIDs involved in tenk1's contents. However, if that patch was faulty enough, maybe it affected other cases besides the advertised one? I've not read it. regards, tom lane
On Thu, Apr 14, 2022 at 06:52:49PM -0700, Andres Freund wrote: > On 2022-04-14 21:32:27 -0400, Tom Lane wrote: > > Peter Geoghegan <pg@bowt.ie> writes: > > > Are you aware of Andres' commit 02fea8fd? That work prevented exactly > > > the same set of symptoms (the same index-only scan create_index > > > regressions), > > > > Hm. I'm starting to get the feeling that the real problem here is > > we've "optimized" the system to the point where repeatable results > > from VACUUM are impossible :-( > > The synchronous_commit issue is an old one. It might actually be worth > addressing it by flushing out pending async commits out instead. It just > started to be noticeable when tenk1 load and vacuum were moved closer. > > > What do you think about applying a polished version of what I posted in > https://postgr.es/m/20220414164830.63rk5zqsvtqqk7qz%40alap3.anarazel.de > ? That'd tell us a bit more about the horizon etc. No objection. > It's also interesting that it only happens in the installcheck cases, > afaics, not the check ones. Although that might just be because there's > more of them... I suspect the failure is somehow impossible in "check". Yesterday, I cranked up the number of locales, so there are now a lot more installcheck. Before that, each farm run had one "check" and two "installcheck". Those days saw ten installcheck failures, zero check failures. Like Tom, I'm failing to reproduce this outside the buildfarm client. I wrote a shell script to closely resemble the buildfarm installcheck sequence, but it's lasted a dozen runs without failing.
On Thu, Apr 14, 2022 at 07:45:15PM -0700, Noah Misch wrote: > On Thu, Apr 14, 2022 at 06:52:49PM -0700, Andres Freund wrote: > > On 2022-04-14 21:32:27 -0400, Tom Lane wrote: > > > Peter Geoghegan <pg@bowt.ie> writes: > > > > Are you aware of Andres' commit 02fea8fd? That work prevented exactly > > > > the same set of symptoms (the same index-only scan create_index > > > > regressions), > > > > > > Hm. I'm starting to get the feeling that the real problem here is > > > we've "optimized" the system to the point where repeatable results > > > from VACUUM are impossible :-( > > > > The synchronous_commit issue is an old one. It might actually be worth > > addressing it by flushing out pending async commits out instead. It just > > started to be noticeable when tenk1 load and vacuum were moved closer. > > > > > > What do you think about applying a polished version of what I posted in > > https://postgr.es/m/20220414164830.63rk5zqsvtqqk7qz%40alap3.anarazel.de > > ? That'd tell us a bit more about the horizon etc. > > No objection. > > > It's also interesting that it only happens in the installcheck cases, > > afaics, not the check ones. Although that might just be because there's > > more of them... > > I suspect the failure is somehow impossible in "check". Yesterday, I cranked > up the number of locales, so there are now a lot more installcheck. Before > that, each farm run had one "check" and two "installcheck". Those days saw > ten installcheck failures, zero check failures. > > Like Tom, I'm failing to reproduce this outside the buildfarm client. I wrote > a shell script to closely resemble the buildfarm installcheck sequence, but > it's lasted a dozen runs without failing. But 24s after that email, it did reproduce the problem. Same symptoms as the last buildfarm runs, including visfrac=0. I'm attaching my script. (It has various references to my home directory, so it's not self-contained.)
Noah Misch <noah@leadboat.com> writes: > Like Tom, I'm failing to reproduce this outside the buildfarm client. This is far from the first time that I've failed to reproduce a buildfarm result manually, even on the very machine hosting the animal. I would like to identify the cause(s) of that. One obvious theory is that the environment under a cron job is different --- but the only thing I know of that should be different is possibly nice'ing the job priorities. I did try a fair number of test cycles under "nice" in this case. Anybody have other ideas? regards, tom lane
Noah Misch <noah@leadboat.com> writes: > But 24s after that email, it did reproduce the problem. Ain't that always the way. > Same symptoms as the > last buildfarm runs, including visfrac=0. I'm attaching my script. (It has > various references to my home directory, so it's not self-contained.) That's interesting, because I see you used installcheck-parallel, which I'd not been using on the grounds that wrasse isn't parallelizing these runs. That puts a big hole in my working assumption that the problem is one of timing. I'm now suspecting that the key issue is something about how wrasse is building the executables that I did not exactly reproduce. I did not try to copy the build details involving stuff under your home directory (like the private openldap version), mainly because it didn't seem like openldap or uuid or perl could be involved at all. But maybe somehow? regards, tom lane
On Thu, Apr 14, 2022 at 7:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > This is far from the first time that I've failed to reproduce a buildfarm > result manually, even on the very machine hosting the animal. I would > like to identify the cause(s) of that. One obvious theory is that the > environment under a cron job is different --- but the only thing I know > of that should be different is possibly nice'ing the job priorities. > I did try a fair number of test cycles under "nice" in this case. > Anybody have other ideas? Well, Noah is running wrasse with 'fsync = off'. And did so in the script as well. That seems like it definitely could matter. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
> Well, Noah is running wrasse with 'fsync = off'. And did so in the
> script as well.

As am I. I duplicated wrasse's config to the extent of

cat >>$PGDATA/postgresql.conf <<EOF
log_line_prefix = '%m [%p:%l] %q%a '
log_connections = 'true'
log_disconnections = 'true'
log_statement = 'all'
fsync = off
log_autovacuum_min_duration = 0
EOF

One thing I'm eyeing now is that it looks like Noah is re-initdb'ing each time, whereas I'd just stopped and started the postmaster of an existing installation. That does not seem like it could matter but ...

regards, tom lane
On Thu, Apr 14, 2022 at 11:06:04PM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > But 24s after that email, it did reproduce the problem. > > Ain't that always the way. Quite so. > > Same symptoms as the > > last buildfarm runs, including visfrac=0. I'm attaching my script. (It has > > various references to my home directory, so it's not self-contained.) > > That's interesting, because I see you used installcheck-parallel, > which I'd not been using on the grounds that wrasse isn't parallelizing > these runs. That puts a big hole in my working assumption that the > problem is one of timing. With "make installcheck-tests TESTS='test_setup create_index'" it remains reproducible. The attached script reproduced it in 103s and then in 703s. > I'm now suspecting that the key issue is > something about how wrasse is building the executables that I did > not exactly reproduce. I did not try to copy the build details > involving stuff under your home directory (like the private openldap > version), mainly because it didn't seem like openldap or uuid or > perl could be involved at all. But maybe somehow? Can't rule it out entirely. I think I've now put world-read on all the directories referenced in the script, in the event you'd like to use them.
I wrote: > One thing I'm eyeing now is that it looks like Noah is re-initdb'ing > each time, whereas I'd just stopped and started the postmaster of > an existing installation. That does not seem like it could matter > but ... Well, damn. I changed my script that way and it failed on the tenth iteration (versus a couple hundred successful iterations the other way). So somehow this is related to time-since-initdb, not time-since-postmaster-start. Any ideas? Anyway, I'm too tired to do more tonight, but now that I can reproduce it I will stick some debugging logic in tomorrow. I no longer think we should clutter the git repo with any more short-term hacks. regards, tom lane PS a bit later: I've not yet reproduced it a second time, so the failure rate is unfortunately a lot less than one-in-ten. Still, this eliminates the idea that there's some secret sauce in Noah's build details.
Hi, On 2022-04-14 22:40:51 -0400, Tom Lane wrote: > Peter Geoghegan <pg@bowt.ie> writes: > > Suppose that the bug was actually in 06f5295af6, "Add single-item > > cache when looking at topmost XID of a subtrans XID". Doesn't that fit > > your timeline just as well? > > I'd dismissed that on the grounds that there are no subtrans XIDs > involved in tenk1's contents. However, if that patch was faulty > enough, maybe it affected other cases besides the advertised one? > I've not read it. I was planning to complain about that commit, fwiw. Without so much as an assertion verifying the cache is correct it seems quite dangerous to me. And looking at it, it has obvious wraparound issues... But that can't matter here, obviously. We also reach SubTransGetTopmostTransaction() from XactLockTableWait() but I don't quite see how we reach that here either... Greetings, Andres Freund
On Thu, Apr 14, 2022 at 11:56:15PM -0400, Tom Lane wrote: > Anyway, I'm too tired to do more tonight, but now that I can reproduce it > I will stick some debugging logic in tomorrow. I no longer think we > should clutter the git repo with any more short-term hacks. Sounds good. I've turned off the wrasse buildfarm client for the moment. It's less useful than your local setup, and they'd compete for resources.
Hi, On 2022-04-14 23:56:15 -0400, Tom Lane wrote: > I wrote: > > One thing I'm eyeing now is that it looks like Noah is re-initdb'ing > > each time, whereas I'd just stopped and started the postmaster of > > an existing installation. That does not seem like it could matter > > but ... > > Well, damn. I changed my script that way and it failed on the tenth > iteration (versus a couple hundred successful iterations the other > way). Just to make sure: This is also on wrasse? What DSM backend do we end up with on solaris? With shared memory stats we're using DSM a lot earlier and more commonly than before. Another thing that might be worth trying is to enable checksums. I've caught weird bugs with that in the past. And it's possible that bgwriter writes out a page that we then read back in quickly after, or something like that. > So somehow this is related to time-since-initdb, not > time-since-postmaster-start. Any ideas? Perhaps it makes a difference that we start with a "young" database xid age wise? We've had bugs around subtracting xids and ending up on some special one in the past. Greetings, Andres Freund
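Both of those are easy to check on the animal itself; nothing wrasse-specific is assumed here:

    -- Which dynamic shared memory implementation did the server pick?
    SHOW dynamic_shared_memory_type;
    -- Are data checksums enabled in this cluster?
    SHOW data_checksums;

Turning checksums on for a test cluster would just mean initdb'ing it with --data-checksums.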
Hi, On 2022-04-14 19:45:15 -0700, Noah Misch wrote: > I suspect the failure is somehow impossible in "check". Yesterday, I cranked > up the number of locales, so there are now a lot more installcheck. Before > that, each farm run had one "check" and two "installcheck". Those days saw > ten installcheck failures, zero check failures. I notice that the buildfarm appears to run initdb with syncing enabled ("syncing data to disk ... ok" in the initdb steps). Whereas pg_regress uses --no-sync. I wonder if that's what makes the difference? Now that you reproduced it, does it still reproduce with --no-sync added? Also worth noting that pg_regress doesn't go through pg_ctl... Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2022-04-14 23:56:15 -0400, Tom Lane wrote: >> Well, damn. I changed my script that way and it failed on the tenth >> iteration (versus a couple hundred successful iterations the other >> way). > Just to make sure: This is also on wrasse? Right, gcc211 with a moderately close approximation to wrasse's build details. Why that shows the problem when we've not seen it elsewhere remains to be seen. > What DSM backend do we end up with on solaris? With shared memory stats > we're using DSM a lot earlier and more commonly than before. That ... is an interesting point. It seems to be just "posix" though. >> So somehow this is related to time-since-initdb, not >> time-since-postmaster-start. Any ideas? > Perhaps it makes a difference that we start with a "young" database xid > age wise? We've had bugs around subtracting xids and ending up on some > special one in the past. It does seem like it's got to be related to small XID and/or small LSN values. No clue right now, but more news tomorrow, I hope. regards, tom lane
On Thu, Apr 14, 2022 at 10:12:05PM -0700, Andres Freund wrote: > On 2022-04-14 19:45:15 -0700, Noah Misch wrote: > > I suspect the failure is somehow impossible in "check". Yesterday, I cranked > > up the number of locales, so there are now a lot more installcheck. Before > > that, each farm run had one "check" and two "installcheck". Those days saw > > ten installcheck failures, zero check failures. > > I notice that the buildfarm appears to run initdb with syncing enabled > ("syncing data to disk ... ok" in the initdb steps). Whereas pg_regress > uses --no-sync. Yep. > I wonder if that's what makes the difference? Now that you reproduced > it, does it still reproduce with --no-sync added? It does; the last version of my script used "initdb -N ...". > Also worth noting that pg_regress doesn't go through pg_ctl... Hmmm.
The morning's first result is that during a failing run, the vacuum in test_setup sees 2022-04-15 16:01:43.064 CEST [4436:75] pg_regress/test_setup LOG: statement: VACUUM ANALYZE tenk1; 2022-04-15 16:01:43.064 CEST [4436:76] pg_regress/test_setup LOG: vacuuming "regression.public.tenk1" 2022-04-15 16:01:43.064 CEST [4436:77] pg_regress/test_setup STATEMENT: VACUUM ANALYZE tenk1; 2022-04-15 16:01:43.071 CEST [4436:78] pg_regress/test_setup LOG: finished vacuuming "regression.public.tenk1": index scans:0 pages: 0 removed, 345 remain, 345 scanned (100.00% of total) tuples: 0 removed, 10000 remain, 0 are dead but not yet removable removable cutoff: 724, older by 26 xids when operation ended index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed avg read rate: 2.189 MB/s, avg write rate: 2.189 MB/s buffer usage: 695 hits, 2 misses, 2 dirtied WAL usage: 1 records, 0 full page images, 188 bytes system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2022-04-15 16:01:43.071 CEST [4436:79] pg_regress/test_setup STATEMENT: VACUUM ANALYZE tenk1; OldestXmin = 724 is too old to consider tenk1's contents as all-visible: regression=# select distinct xmin from tenk1; xmin ------ 749 (1 row) In fact, right after initdb pg_controldata shows Latest checkpoint's NextXID: 0:724 Latest checkpoint's oldestXID: 716 So there's no longer any doubt that something is holding back OldestXmin. I will go put some instrumentation into the code that's computing that. regards, tom lane
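A related check can be made from SQL with the pg_visibility contrib module; this is only a sketch, assuming the extension is available in the installation under test (the regression tests themselves don't load it). It reports how many of tenk1's pages actually have their all-visible bit set in the visibility map, independently of what pg_class.relallvisible claims:

    CREATE EXTENSION IF NOT EXISTS pg_visibility;
    SELECT * FROM pg_visibility_map_summary('tenk1');   -- counts of all-visible and all-frozen pages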
Hi, On 2022-04-15 10:15:32 -0400, Tom Lane wrote: > The morning's first result is that during a failing run, > the vacuum in test_setup sees > > 2022-04-15 16:01:43.064 CEST [4436:75] pg_regress/test_setup LOG: statement: VACUUM ANALYZE tenk1; > 2022-04-15 16:01:43.064 CEST [4436:76] pg_regress/test_setup LOG: vacuuming "regression.public.tenk1" > 2022-04-15 16:01:43.064 CEST [4436:77] pg_regress/test_setup STATEMENT: VACUUM ANALYZE tenk1; > 2022-04-15 16:01:43.071 CEST [4436:78] pg_regress/test_setup LOG: finished vacuuming "regression.public.tenk1": indexscans: 0 > pages: 0 removed, 345 remain, 345 scanned (100.00% of total) > tuples: 0 removed, 10000 remain, 0 are dead but not yet removable > removable cutoff: 724, older by 26 xids when operation ended > index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed > avg read rate: 2.189 MB/s, avg write rate: 2.189 MB/s > buffer usage: 695 hits, 2 misses, 2 dirtied > WAL usage: 1 records, 0 full page images, 188 bytes > system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s > 2022-04-15 16:01:43.071 CEST [4436:79] pg_regress/test_setup STATEMENT: VACUUM ANALYZE tenk1; The horizon advancing by 26 xids during tenk1's vacuum seems like quite a bit, given there's no normal concurrent activity during test_setup. > In fact, right after initdb pg_controldata shows > Latest checkpoint's NextXID: 0:724 > Latest checkpoint's oldestXID: 716 So that's the xmin that e.g. the autovac launcher ends up with during start... If I make get_database_list() sleep for 5s within the scan, I can reproduce on x86-64. Greetings, Andres Freund
I wrote: > the vacuum in test_setup sees > ... > removable cutoff: 724, older by 26 xids when operation ended > ... BTW, before I forget: the wording of this log message is just awful. On first sight, I thought that it meant that we'd computed OldestXmin a second time and discovered that it advanced by 26 xids while the VACUUM was running. Looking at the code, I see that's not so: diff = (int32) (ReadNextTransactionId() - OldestXmin); appendStringInfo(&buf, _("removable cutoff: %u, older by %d xids when operation ended\n"), OldestXmin, diff); but good luck understanding what it actually means from the message text alone. I think more appropriate wording would be something like "removable cutoff: %u, which was %d xids old when operation ended\n" Also, is it really our practice to spell XID in lower-case in user-facing messages? Thoughts, better ideas? regards, tom lane
Hi, On April 15, 2022 11:12:10 AM EDT, Andres Freund <andres@anarazel.de> wrote: >Hi, > >On 2022-04-15 10:15:32 -0400, Tom Lane wrote: >> The morning's first result is that during a failing run, >> the vacuum in test_setup sees >> >> 2022-04-15 16:01:43.064 CEST [4436:75] pg_regress/test_setup LOG: statement: VACUUM ANALYZE tenk1; >> 2022-04-15 16:01:43.064 CEST [4436:76] pg_regress/test_setup LOG: vacuuming "regression.public.tenk1" >> 2022-04-15 16:01:43.064 CEST [4436:77] pg_regress/test_setup STATEMENT: VACUUM ANALYZE tenk1; >> 2022-04-15 16:01:43.071 CEST [4436:78] pg_regress/test_setup LOG: finished vacuuming "regression.public.tenk1": indexscans: 0 >> pages: 0 removed, 345 remain, 345 scanned (100.00% of total) >> tuples: 0 removed, 10000 remain, 0 are dead but not yet removable >> removable cutoff: 724, older by 26 xids when operation ended >> index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed >> avg read rate: 2.189 MB/s, avg write rate: 2.189 MB/s >> buffer usage: 695 hits, 2 misses, 2 dirtied >> WAL usage: 1 records, 0 full page images, 188 bytes >> system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s >> 2022-04-15 16:01:43.071 CEST [4436:79] pg_regress/test_setup STATEMENT: VACUUM ANALYZE tenk1; > >The horizon advancing by 26 xids during tenk1's vacuum seems like quite >a bit, given there's no normal concurrent activity during test_setup. > > >> In fact, right after initdb pg_controldata shows >> Latest checkpoint's NextXID: 0:724 >> Latest checkpoint's oldestXID: 716 > >So that's the xmin that e.g. the autovac launcher ends up with during >start... > >If I make get_database_list() sleep for 5s within the scan, I can >reproduce on x86-64. Off for a bit, but I realized that we likely don't exclude the launcher because it's not database associated... -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
I wrote: > So there's no longer any doubt that something is holding back OldestXmin. > I will go put some instrumentation into the code that's computing that. The something is the logical replication launcher. In the failing runs, it is advertising xmin = 724 (the post-initdb NextXID) and continues to do so well past the point where tenk1 gets vacuumed. Discuss. regards, tom lane
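One way to watch for this sort of thing while it happens is to look at the xmins advertised in pg_stat_activity; a sketch, not specific to wrasse. While the launcher holds an xmin it should show up here (its backend_type is 'logical replication launcher'):

    SELECT pid, backend_type, backend_xmin, state
      FROM pg_stat_activity
     WHERE backend_xmin IS NOT NULL
     ORDER BY age(backend_xmin) DESC;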
Andres Freund <andres@anarazel.de> writes: > Off for a bit, but I realized that we likely don't exclude the launcher because it's not database associated... Yeah. I think this bit in ComputeXidHorizons needs rethinking: /* * Normally queries in other databases are ignored for anything but * the shared horizon. ... */ if (in_recovery || MyDatabaseId == InvalidOid || proc->databaseId == MyDatabaseId || proc->databaseId == 0) /* always include WalSender */ { The "proc->databaseId == 0" business apparently means to include only walsender processes, and it's broken because that condition doesn't include only walsender processes. At this point we have the following conclusions: 1. A slow transaction in the launcher's initial get_database_list() call fully explains these failures. (I had been thinking that the launcher's xact would have to persist as far as the create_index script, but that's not so: it only has to last until test_setup begins vacuuming tenk1. The CREATE INDEX steps are not doing any visibility map changes of their own, but what they are doing is updating relallvisible from the results of visibilitymap_count(). That's why they undid the effects of manually poking relallvisible, without actually inserting any data better than what the initial VACUUM computed.) 2. We can probably explain why only wrasse sees this as some quirk of the Solaris scheduler. I'm satisfied to blame it-happens-in- installcheck-but-not-check on that too. 3. It remains unclear why we suddenly started seeing this last week. I suppose it has to be a side-effect of the pgstats changes, but the mechanism is obscure. Probably not worth the effort to pin down exactly why. As for fixing it, what I think would be the preferable answer is to fix the above-quoted logic so that it indeed includes only walsenders and not random other background workers. (Why does it need to include walsenders, anyway? The commentary sucks.) Alternatively, or perhaps also, we could do what was discussed previously and make a hack to allow delaying vacuum until the system is quiescent. regards, tom lane
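To make the second alternative concrete, here is a rough sketch of what a wait-until-quiescent step ahead of the vacuum could look like; purely illustrative, not a worked-out patch, and it assumes nothing beyond pg_stat_activity being visible to the test role:

    DO $$
    BEGIN
      LOOP
        PERFORM 1 FROM pg_stat_activity
          WHERE backend_xmin IS NOT NULL AND pid <> pg_backend_pid();
        EXIT WHEN NOT FOUND;        -- no other backend is advertising an xmin
        PERFORM pg_sleep(0.1);      -- brief pause before rechecking
      END LOOP;
    END $$;
    VACUUM ANALYZE tenk1;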
Andres Freund <andres@anarazel.de> writes: > On 2022-04-15 10:15:32 -0400, Tom Lane wrote: >> removable cutoff: 724, older by 26 xids when operation ended > The horizon advancing by 26 xids during tenk1's vacuum seems like quite > a bit, given there's no normal concurrent activity during test_setup. Hah, so you were taken in by this wording too. See my complaint about it downthread. regards, tom lane
Hi, On April 15, 2022 11:23:40 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >I wrote: >> So there's no longer any doubt that something is holding back OldestXmin. >> I will go put some instrumentation into the code that's computing that. > >The something is the logical replication launcher. In the failing runs, >it is advertising xmin = 724 (the post-initdb NextXID) and continues to >do so well past the point where tenk1 gets vacuumed. > >Discuss. That explains it. Before shmstats, autovac needed to wait for the stats collector to write out stats. Now it's near instantaneous. So the issue probably existed before, just unlikely to ever be reached. We can't just ignore database-less xmins for non-shared rels, because walsender propagates hot_standby_feedback that way. But we can probably add a flag somewhere indicating whether a database-less PGPROC has to be accounted in the horizon for non-shared rels. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Fri, Apr 15, 2022 at 8:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > BTW, before I forget: the wording of this log message is just awful. > On first sight, I thought that it meant that we'd computed OldestXmin > a second time and discovered that it advanced by 26 xids while the VACUUM > was running. > "removable cutoff: %u, which was %d xids old when operation ended\n" How the output appears when placed right before the output describing how VACUUM advanced relfrozenxid is an important consideration. I want the format and wording that we use to imply a relationship between these two things. Right now, that other line looks like this: "new relfrozenxid: %u, which is %d xids ahead of previous value\n" Do you think that this juxtaposition works well? > Also, is it really our practice to spell XID in lower-case in > user-facing messages? There are examples of both. This could easily be changed to "XIDs". -- Peter Geoghegan
Andres Freund <andres@anarazel.de> writes: > On April 15, 2022 11:23:40 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The something is the logical replication launcher. In the failing runs, >> it is advertising xmin = 724 (the post-initdb NextXID) and continues to >> do so well past the point where tenk1 gets vacuumed. > That explains it. Before shmstats, autovac needed to wait for the stats collector to write out stats. Now it's near instantaneous. So the issue probably existed before, just unlikely to ever be reached. Um, this is the logical replication launcher, not the autovac launcher. Your observation that a sleep in get_database_list() reproduces it confirms that, and I don't entirely see why the timing of the LR launcher would have changed. (On thinking about it, I suppose the AV launcher might trigger this too, but that is not the PID I saw in testing.) > We can't just ignore database-less xmins for non-shared rels, because walsender propagates hot_standby_feedback that way. But we can probably add a flag somewhere indicating whether a database-less PGPROC has to be accounted in the horizon for non-shared rels. Yeah, I was also thinking about a flag in PGPROC being a more reliable way to do this. Is there anything besides walsenders that should set that flag? regards, tom lane
Peter Geoghegan <pg@bowt.ie> writes: > On Fri, Apr 15, 2022 at 8:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> BTW, before I forget: the wording of this log message is just awful. >> [ so how about ] >> "removable cutoff: %u, which was %d xids old when operation ended\n" > How the output appears when placed right before the output describing > how VACUUM advanced relfrozenxid is an important consideration. I want > the format and wording that we use to imply a relationship between > these two things. Right now, that other line looks like this: > "new relfrozenxid: %u, which is %d xids ahead of previous value\n" > Do you think that this juxtaposition works well? Seems all right to me; do you have a better suggestion? regards, tom lane
On Fri, Apr 15, 2022 at 9:40 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Do you think that this juxtaposition works well? > > Seems all right to me; do you have a better suggestion? No. At first I thought that mixing "which is" and "which was" wasn't quite right. I changed my mind, though. Your new wording is fine. I'll update the log output some time today. -- Peter Geoghegan
Hi, On 2022-04-15 12:36:52 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On April 15, 2022 11:23:40 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> The something is the logical replication launcher. In the failing runs, > >> it is advertising xmin = 724 (the post-initdb NextXID) and continues to > >> do so well past the point where tenk1 gets vacuumed. > > > That explains it. Before shmstats, autovac needed to wait for the stats collector to write out stats. Now it's near instantaneous. So the issue probably existed before, just unlikely to ever be reached. > > Um, this is the logical replication launcher, not the autovac > launcher. Short term confusion... > Your observation that a sleep in get_database_list() reproduces it > confirms that I don't understand what you mean here? get_database_list() is autovac launcher code? So being able to reproduce the problem by putting in a sleep there doesn't seem to confirm anything about the logical rep launcher? > , and I don't entirely see why the timing of the LR launcher > would have changed. Could still be related to the autovac launcher not requesting / pgstats not writing / launcher not reading the stats file(s). That obviously is going to have some scheduler impact. > > We can't just ignore database-less xmins for non-shared rels, because walsender propagates hot_standby_feedback that way. But we can probably add a flag somewhere indicating whether a database-less PGPROC has to be accounted in the horizon for non-shared rels. > > Yeah, I was also thinking about a flag in PGPROC being a more reliable > way to do this. Is there anything besides walsenders that should set > that flag? Not that I can think of. It's only because of hs_feedback that we need to. I guess it's possible that somebody built some extension that needs something similar, but then they'd need to set that flag... Greetings, Andres Freund
I wrote: > Um, this is the logical replication launcher, not the autovac launcher. > Your observation that a sleep in get_database_list() reproduces it > confirms that, and I don't entirely see why the timing of the LR launcher > would have changed. Oh, to clarify: I misread "get_database_list()" as "get_subscription_list()", which is the part of the LR launcher startup that causes the problem for me. So what we actually have confirmed is that BOTH of those launchers are problematic for this. And AFAICS neither of them needs to be causing horizon adjustments for non-shared tables. (It's possible that the AV launcher is responsible in some of wrasse's reports, but it's been the LR launcher in five out of five sufficiently-instrumented failures for me.) regards, tom lane
Hi, On 2022-04-15 09:29:20 -0700, Peter Geoghegan wrote: > On Fri, Apr 15, 2022 at 8:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > BTW, before I forget: the wording of this log message is just awful. > > On first sight, I thought that it meant that we'd computed OldestXmin > > a second time and discovered that it advanced by 26 xids while the VACUUM > > was running. > > > "removable cutoff: %u, which was %d xids old when operation ended\n" > > How the output appears when placed right before the output describing > how VACUUM advanced relfrozenxid is an important consideration. I want > the format and wording that we use to imply a relationship between > these two things. Right now, that other line looks like this: > > "new relfrozenxid: %u, which is %d xids ahead of previous value\n" > > Do you think that this juxtaposition works well? I don't think they're actually that comparable. One shows how much relfrozenxid advanced, to a large degree influenced by the time between aggressive (or "unintentionally aggressive") vacuums. The other shows the age of OldestXmin at the end of the vacuum. Which is influenced by what's currently running. Greetings, Andres Freund
On Fri, Apr 15, 2022 at 10:05 AM Andres Freund <andres@anarazel.de> wrote: > I don't think they're actually that comparable. One shows how much > relfrozenxid advanced, to a large degree influenced by the time between > aggressive (or "unintentionally aggressive") vacuums. It matters more in the extreme cases. The most recent possible value for our new relfrozenxid is OldestXmin/removable cutoff. So when something holds back OldestXmin, it also holds back new relfrozenxid values. > The other shows > the age of OldestXmin at the end of the vacuum. Which is influenced by > what's currently running. As well as the age of OldestXmin at the start of VACUUM. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Fri, Apr 15, 2022 at 10:05 AM Andres Freund <andres@anarazel.de> wrote: >> The other shows >> the age of OldestXmin at the end of the vacuum. Which is influenced by >> what's currently running. > As well as the age of OldestXmin at the start of VACUUM. Is it worth capturing and logging both of those numbers? Why is the age at the end more interesting than the age at the start? regards, tom lane
On Fri, Apr 15, 2022 at 10:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > As well as the age of OldestXmin at the start of VACUUM. > > Is it worth capturing and logging both of those numbers? Why is > the age at the end more interesting than the age at the start? As Andres said, that's often more interesting because most of the time OldestXmin is not held back by much (not enough to matter). Users will often look at the output of successive related VACUUM operations. Often the way things change over time is much more interesting than the details at any particular point in time. Especially in the kinds of extreme cases I'm thinking about. -- Peter Geoghegan
Hi, On 2022-04-15 10:23:56 -0700, Peter Geoghegan wrote: > On Fri, Apr 15, 2022 at 10:15 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > As well as the age of OldestXmin at the start of VACUUM. > > > > Is it worth capturing and logging both of those numbers? Why is > > the age at the end more interesting than the age at the start? > > As Andres said, that's often more interesting because most of the time > OldestXmin is not held back by much (not enough to matter). I think it'd be interesting - particularly for large relations or when looking to adjust autovac cost limits. It's not rare for autovac to take long enough that another autovac is necessary immediately again. Also helps to interpret the "dead but not yet removable" counts. Something like: removable cutoff: %u, age at start: %u, age at end: %u... Greetings, Andres Freund
On Fri, Apr 15, 2022 at 10:43 AM Andres Freund <andres@anarazel.de> wrote: > I think it'd be interesting - particularly for large relations or when > looking to adjust autovac cost limits. > Something like: > removable cutoff: %u, age at start: %u, age at end: %u... Part of the problem here is that we determine VACUUM's FreezeLimit by calculating `OldestXmin - vacuum_freeze_min_age` (more or less [1]). Why should we do less freezing due to the presence of an old snapshot? Sure, that has to happen with those XIDs that are fundamentally ineligible for freezing due to the presence of the old snapshot -- but what about those XIDs that *are* eligible, and still don't get frozen at first? We should determine FreezeLimit by calculating `NextXID - vacuum_freeze_min_age ` instead (and then clamp, to make sure that it's always <= OldestXmin). That approach would make our final FreezeLimit "strictly age-based". [1] We do something a bit like this when OldestXmin is already very old -- then FreezeLimit is the same value as OldestXmin (see WARNING from vacuum_set_xid_limits() function). That's better than nothing, but doesn't change the fact that our general approach to calculating FreezeLimit makes little sense. -- Peter Geoghegan
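A worked example with made-up numbers (nothing here comes from the wrasse runs): suppose NextXID is 1000724, OldestXmin is being held back at 1000000, and vacuum_freeze_min_age is 500000.

    -- current formula: the cutoff tracks the held-back horizon
    SELECT 1000000 - 500000 AS freezelimit_current;                    -- 500000
    -- proposed: age the cutoff from NextXID, then clamp to OldestXmin
    SELECT LEAST(1000724 - 500000, 1000000) AS freezelimit_proposed;   -- 500724

With the proposed calculation, XIDs in the 500000-500724 range get frozen as well; all of them are already older than OldestXmin, so the old snapshot never required them to stay unfrozen in the first place.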
Andres Freund <andres@anarazel.de> writes: > On 2022-04-15 12:36:52 -0400, Tom Lane wrote: >> Yeah, I was also thinking about a flag in PGPROC being a more reliable >> way to do this. Is there anything besides walsenders that should set >> that flag? > Not that I can think of. It's only because of hs_feedback that we need > to. I guess it's possible that somebody build some extension that needs > something similar, but then they'd need to set that flag... Here's a WIP patch for that. The only exciting thing in it is that because of some undocumented cowboy programming in walsender.c, the Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); in ProcArrayInstallRestoredXmin fires unless we skip that. I could use some help filling in the XXX comments, because it's far from clear to me *why* walsenders need this to happen. regards, tom lane diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index be40261393..a04a4a8e5d 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -285,6 +285,15 @@ InitWalSender(void) MarkPostmasterChildWalSender(); SendPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE); + /* + * Also, mark ourselves in PGPROC as affecting vacuum horizons in all + * databases. XXX explain why + */ + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); + MyProc->statusFlags |= PROC_AFFECTS_ALL_HORIZONS; + ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; + LWLockRelease(ProcArrayLock); + /* Initialize empty timestamp buffer for lag tracking. */ lag_tracker = MemoryContextAllocZero(TopMemoryContext, sizeof(LagTracker)); } diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index f6e98aae29..8b867bd56c 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -1821,10 +1821,12 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h) * to prune still needed data away). If the current backend never * connects to a database that is harmless, because * data_oldest_nonremovable will never be utilized. + * + * XXX explain PROC_AFFECTS_ALL_HORIZONS, too */ if (in_recovery || MyDatabaseId == InvalidOid || proc->databaseId == MyDatabaseId || - proc->databaseId == 0) /* always include WalSender */ + (statusFlags & PROC_AFFECTS_ALL_HORIZONS)) { /* * We can ignore this backend if it's running CREATE INDEX @@ -2681,10 +2683,14 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc) /* Install xmin */ MyProc->xmin = TransactionXmin = xmin; - /* Flags being copied must be valid copy-able flags. */ - Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); - MyProc->statusFlags = proc->statusFlags; - ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; + /* walsender cheats by passing proc == MyProc, don't check flags */ + if (proc != MyProc) + { + /* Flags being copied must be valid copy-able flags. 
*/ + Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); + MyProc->statusFlags = proc->statusFlags; + ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags; + } result = true; } diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index 809b74db5e..e132665938 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -60,6 +60,9 @@ struct XidCache #define PROC_VACUUM_FOR_WRAPAROUND 0x08 /* set by autovac only */ #define PROC_IN_LOGICAL_DECODING 0x10 /* currently doing logical * decoding outside xact */ +#define PROC_AFFECTS_ALL_HORIZONS 0x20 /* proc's xmin must be + * included in vacuum horizons + * in all databases */ /* flags reset at EOXact */ #define PROC_VACUUM_STATE_MASK \
Hi, On April 15, 2022 2:14:47 PM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Andres Freund <andres@anarazel.de> writes: >> On 2022-04-15 12:36:52 -0400, Tom Lane wrote: >>> Yeah, I was also thinking about a flag in PGPROC being a more reliable >>> way to do this. Is there anything besides walsenders that should set >>> that flag? > >> Not that I can think of. It's only because of hs_feedback that we need >> to. I guess it's possible that somebody built some extension that needs >> something similar, but then they'd need to set that flag... > >Here's a WIP patch for that. The only exciting thing in it is that >because of some undocumented cowboy programming in walsender.c, the > Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); >in ProcArrayInstallRestoredXmin fires unless we skip that. > >I could use some help filling in the XXX comments, because it's far >from clear to me *why* walsenders need this to happen. I'm out for the rest of the day due to family events (visiting my girlfriend's parents till Wednesday), I can take a stab at formulating something after. If you want to commit before: The reason is that walsenders use their xmin to represent the xmin of standbys when using hot_standby_feedback. Since we're only transmitting global horizons up from standbys, it has to influence globally (and it would be hard to represent per-db horizons anyway). Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund <andres@anarazel.de> writes: > On April 15, 2022 2:14:47 PM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I could use some help filling in the XXX comments, because it's far >> from clear to me *why* walsenders need this to happen. > If you want to commit before: The reason is that walsenders use their xmin to represent the xmin of standbys when usinghot_standby_feedback. Since we're only transmitting global horizons up from standbys, it has to influence globally (andit would be hard to represent per db horizons anyway). Got it. I rewrote the comments and pushed. Noah, it should be safe to turn wrasse back on. regards, tom lane
Hi, On 2022-04-15 11:12:34 -0700, Peter Geoghegan wrote: > On Fri, Apr 15, 2022 at 10:43 AM Andres Freund <andres@anarazel.de> wrote: > > I think it'd be interesting - particularly for large relations or when > > looking to adjust autovac cost limits. > > > Something like: > > removable cutoff: %u, age at start: %u, age at end: %u... > > Part of the problem here is that we determine VACUUM's FreezeLimit by > calculating `OldestXmin - vacuum_freeze_min_age` (more or less [1]). What the message outputs is OldestXmin and not FreezeLimit though. And FreezeLimit doesn't affect "dead but not yet removable". IOW, the following might be right, but that seems independent of improving the output of diff = (int32) (ReadNextTransactionId() - OldestXmin); appendStringInfo(&buf, _("removable cutoff: %u, which was %d XIDs old when operation ended\n"), OldestXmin, diff); > Why should we do less freezing due to the presence of an old snapshot? > Sure, that has to happen with those XIDs that are fundamentally > ineligible for freezing due to the presence of the old snapshot -- but > what about those XIDs that *are* eligible, and still don't get frozen > at first? > > We should determine FreezeLimit by calculating `NextXID - > vacuum_freeze_min_age ` instead (and then clamp, to make sure that > it's always <= OldestXmin). That approach would make our final > FreezeLimit "strictly age-based". > > [1] We do something a bit like this when OldestXmin is already very > old -- then FreezeLimit is the same value as OldestXmin (see WARNING > from vacuum_set_xid_limits() function). That's better than nothing, > but doesn't change the fact that our general approach to calculating > FreezeLimit makes little sense. Greetings, Andres Freund
On Sun, Apr 17, 2022 at 7:36 AM Andres Freund <andres@anarazel.de> wrote: > > Part of the problem here is that we determine VACUUM's FreezeLimit by > > calculating `OldestXmin - vacuum_freeze_min_age` (more or less [1]). > > What the message outputs is OldestXmin and not FreezeLimit though. My higher level point is that there is a general tendency to assume that OldestXmin is the same thing as NextXID, which it isn't. It's an easy enough mistake to make, though, in part because they're usually quite close together. The "Routine Vacuuming" docs seem to suggest that they're the same thing, or at least that's what I take away from the following sentence: "This implies that if a table is not otherwise vacuumed, autovacuum will be invoked on it approximately once every autovacuum_freeze_max_age minus vacuum_freeze_min_age transactions". > And FreezeLimit doesn't affect "dead but not yet removable". But OldestXmin affects FreezeLimit. Anyway, I'm not opposed to showing the age at the start as well. But from the point of view of issues like this tenk1 issue, it would be more useful to just report on new_rel_allvisible. It would also be more useful to users. -- Peter Geoghegan
On 2022-Apr-15, Tom Lane wrote: > Here's a WIP patch for that. The only exciting thing in it is that > because of some undocumented cowboy programming in walsender.c, the > Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); > in ProcArrayInstallRestoredXmin fires unless we skip that. Hmm, maybe a better use of that define is to use it to select which flags to copy, rather than to ensure they are the only ones set. What about this? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "What do the years matter? What really matters is finding out that, when all is said and done, the best age in life is being alive" (Mafalda)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2022-Apr-15, Tom Lane wrote: >> Here's a WIP patch for that. The only exciting thing in it is that >> because of some undocumented cowboy programming in walsender.c, the >> Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); >> in ProcArrayInstallRestoredXmin fires unless we skip that. > Hmm, maybe a better use of that define is to use it to select which flags > to copy, rather than to ensure they are the only ones set. What > about this? Yeah, I thought about that too, but figured that the author probably had a reason for writing the assertion the way it was. If we want to redefine PROC_COPYABLE_FLAGS as "flags associated with xmin", that's fine by me. But I'd suggest that both the name of the macro and the comment for it in proc.h should be revised to match that definition. Another point is that as you've coded it, the code doesn't so much copy those flags as union them with whatever the recipient had, which seems wrong. I could go with either Assert(!(MyProc->statusFlags & PROC_XMIN_FLAGS)); MyProc->statusFlags |= (proc->statusFlags & PROC_XMIN_FLAGS); or MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) | (proc->statusFlags & PROC_XMIN_FLAGS); Perhaps the latter is more future-proof. regards, tom lane
On Wed, Apr 20, 2022 at 3:29 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > On 2022-Apr-15, Tom Lane wrote: > >> Here's a WIP patch for that. The only exciting thing in it is that > >> because of some undocumented cowboy programming in walsender.c, the > >> Assert((proc->statusFlags & (~PROC_COPYABLE_FLAGS)) == 0); > >> in ProcArrayInstallRestoredXmin fires unless we skip that. > > > Hmm, maybe a better use of that define is to use it to select which flags > > to copy, rather than to ensure they are the only ones set. What > > about this? > > Yeah, I thought about that too, but figured that the author probably > had a reason for writing the assertion the way it was. The motivation behind the assertion was that when we copy the whole statusFlags from the leader process to the worker process, we want to make sure that the flags we're copying are a known subset of the flags that are valid to copy from the leader. > If we want > to redefine PROC_COPYABLE_FLAGS as "flags associated with xmin", > that's fine by me. But I'd suggest that both the name of the macro > and the comment for it in proc.h should be revised to match that > definition. > > Another point is that as you've coded it, the code doesn't so much > copy those flags as union them with whatever the recipient had, > which seems wrong. I could go with either > > Assert(!(MyProc->statusFlags & PROC_XMIN_FLAGS)); > MyProc->statusFlags |= (proc->statusFlags & PROC_XMIN_FLAGS); > > or > > MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) | > (proc->statusFlags & PROC_XMIN_FLAGS); > > Perhaps the latter is more future-proof. Copying only xmin-related flags in this way also makes sense to me and there is no problem at least for now. A note would be that when we introduce a new flag that needs to be copied in the future, we need to make sure to add it to PROC_XMIN_FLAGS so it is copied. Otherwise a similar issue we fixed by 0f0cfb494004befb0f6e could happen again. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On 2022-Apr-20, Masahiko Sawada wrote: > > MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) | > > (proc->statusFlags & PROC_XMIN_FLAGS); > > > > Perhaps the latter is more future-proof. > Copying only xmin-related flags in this way also makes sense to me and > there is no problem at least for now. A note would be that when we > introduce a new flag that needs to be copied in the future, we need to > make sure to add it to PROC_XMIN_FLAGS so it is copied. Otherwise a > similar issue we fixed by 0f0cfb494004befb0f6e could happen again. OK, done this way -- patch attached. Reading the comment I wrote about it, I wonder if flags PROC_AFFECTS_ALL_HORIZONS and PROC_IN_LOGICAL_DECODING should also be included. I think the only reason we don't care at this point is that walsenders (logical or otherwise) do not engage in snapshot copying. But if we were to implement usage of parallel workers sharing a common snapshot to do table sync in parallel, then it ISTM it would be important to copy at least the latter. Not sure there are any cases where we might care about the former. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Every machine is a smoke machine if you operate it wrong enough." https://twitter.com/libseybieda/status/1541673325781196801
On Sun, May 15, 2022 at 12:29 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2022-Apr-20, Masahiko Sawada wrote: > > > > MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) | > > > (proc->statusFlags & PROC_XMIN_FLAGS); > > > > > > Perhaps the latter is more future-proof. > > > Copying only xmin-related flags in this way also makes sense to me and > > there is no problem at least for now. A note would be that when we > > introduce a new flag that needs to be copied in the future, we need to > > make sure to add it to PROC_XMIN_FLAGS so it is copied. Otherwise a > > similar issue we fixed by 0f0cfb494004befb0f6e could happen again. > > OK, done this way -- patch attached. Thank you for updating the patch. > > Reading the comment I wrote about it, I wonder if flags > PROC_AFFECTS_ALL_HORIZONS and PROC_IN_LOGICAL_DECODING should also be > included. I think the only reason we don't care at this point is that > walsenders (logical or otherwise) do not engage in snapshot copying. > But if we were to implement usage of parallel workers sharing a common > snapshot to do table sync in parallel, then it ISTM it would be > important to copy at least the latter. Not sure there are any cases > were we might care about the former. Yeah, it seems to be inconsistent between the comment (and the new name) and the flags actually included. I think we can include all xmin-related flags to PROC_XMIN_FLAGS as the comment says. That way, it would be useful also for other use cases, and I don't see any downside for now. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
I was looking at backpatching this to pg13. That made me realize that commit dc7420c2c927 changed things in 14; and before that commit, the bitmask that is checked is PROCARRAY_FLAGS_VACUUM, which is defined independently of whatever proc.h says. As far as I can tell, there's no problem with the patches I'm posting here (the backpatched versions for pg13 and pg14). But it's something to be aware of; and if we do want to add the additional bits to the bitmask, we should do that in a separate master-only commit. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/