Re: SimpleLruTruncate() mutual exclusion - Mailing list pgsql-hackers

From Tom Lane
Subject Re: SimpleLruTruncate() mutual exclusion
Date
Msg-id 29566.1564419497@sss.pgh.pa.us
Whole thread Raw
In response to Re: SimpleLruTruncate() mutual exclusion  (Noah Misch <noah@leadboat.com>)
Responses Re: SimpleLruTruncate() mutual exclusion  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
Noah Misch <noah@leadboat.com> writes:
> On Sun, Feb 17, 2019 at 11:31:03PM -0800, Noah Misch wrote:
>>> b. Arrange so only one backend runs vac_truncate_clog() at a time.  Other than
>>> AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
>>> checkpoint, or in the startup process.  Hence, also arrange for only one
>>> backend to call SimpleLruTruncate(AsyncCtl) at a time.

>> More specifically, restrict vac_update_datfrozenxid() to one backend per
>> database, and restrict vac_truncate_clog() and asyncQueueAdvanceTail() to one
>> backend per cluster.  This, attached, was rather straightforward.

> Rebased.  The conflicts were limited to comments and documentation.

I tried to review this, along with your adjacent patch to adjust the
segment-roundoff logic, but I didn't get far.  I see the point that
the roundoff might create problems when we are close to hitting
wraparound.  I do not, however, see why serializing vac_truncate_clog
helps.  I'm pretty sure it was intentional that multiple backends
could run truncation directory scans concurrently, and I don't really
want to give that up if possible.

Also, if I understand the data-loss hazard properly, it's what you
said in the other thread: the latest_page_number could advance after
we make our decision about what to truncate, and then maybe we could
truncate newly-added data.  We surely don't want to lock out the
operations that can advance last_page_number, so how does serializing
vac_truncate_clog help?

I wonder whether what we need to do is add some state in shared
memory saying "it is only safe to create pages before here", and
make SimpleLruZeroPage or its callers check that.  The truncation
logic would advance that value, but only after completing a scan.
(Oh ..., hmm, maybe the point of serializing truncations is to
ensure that we know that we can safely advance that?)

Can you post whatever script you used to detect/reproduce the problem
in the first place?

            regards, tom lane

PS: also, re-reading this code, it's worrisome that we are not checking
for failure of the unlink calls.  I think the idea was that it didn't
really matter because if we did re-use an existing file we'd just
re-zero it; hence failing the truncate is an overreaction.  But probably
some comments about that are in order.



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: should there be a hard-limit on the number of transactionspending undo?
Next
From: Jeevan Ladhe
Date:
Subject: concerns around pg_lsn