Re: Parallel Append subplan order instability on aye-aye - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Parallel Append subplan order instability on aye-aye
Date
Msg-id 22315.1563378828@sss.pgh.pa.us
Whole thread Raw
In response to Re: Parallel Append subplan order instability on aye-aye  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Parallel Append subplan order instability on aye-aye  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> writes:
> Surely it can't be that since that just sets what *pages gets set to.
> Tom mentioned that following was returning 0 pages and tuples:

> -- Temporary hack to investigate whether extra vacuum/analyze is happening
> select relname, relpages, reltuples
> from pg_class
> where relname like '__star' order by relname;
>  relname | relpages | reltuples
> ---------+----------+-----------
>  a_star  |        1 |         3

I poked around a little and came up with a much simpler theory:
VACUUM will not change relpages/reltuples if it does not scan any pages
(cf. special case for tupcount_pages == 0 in heap_vacuum_rel, at line 343
in HEAD's vacuumlazy.c).  And, because sanity_check.sql's VACUUM is a
plain unaggressive vacuum, all that it takes to make it skip over a_star's
one page is for somebody else to have a pin on that page.  So a chance
collision with the bgwriter or checkpointer could cause the observed
symptom, not just for a_star but for the other single-page relations that
are at stake here.  Those pages are certainly dirty after create_misc.sql,
so it's hardly implausible for one of these processes to be holding pin
while trying to write out the buffer at the time sanity_check.sql runs.

A brute-force way to fix this (or at least reduce the odds quite a bit)
would be to have sanity_check.sql issue a CHECKPOINT before its VACUUM,
thereby guaranteeing that none of these pages are still in need of being
written.  Not sure how much that'd penalize the regression tests' runtime,
or whether we'd have a loss of test coverage of VACUUM behaviors.

            regards, tom lane



pgsql-hackers by date:

Previous
From: r.zharkov@postgrespro.ru
Date:
Subject: Re: Intermittent pg_ctl failures on Windows
Next
From: Tom Lane
Date:
Subject: sepgsql seems rather thoroughly broken on Fedora 30