Re: SimpleLruTruncate() mutual exclusion - Mailing list pgsql-hackers

From Noah Misch
Subject Re: SimpleLruTruncate() mutual exclusion
Msg-id 20190628170628.GA1238361@rfd.leadboat.com
In response to SimpleLruTruncate() mutual exclusion  (Noah Misch <noah@leadboat.com>)
Responses Re: SimpleLruTruncate() mutual exclusion
Re: SimpleLruTruncate() mutual exclusion
List pgsql-hackers
On Sun, Feb 17, 2019 at 11:31:03PM -0800, Noah Misch wrote:
> I'm forking this thread from
> https://postgr.es/m/flat/20190202083822.GC32531@gust.leadboat.com, which
> reported a race condition involving the "apparent wraparound" safety check in
> SimpleLruTruncate():
> 
> On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:
> > 1. The result of the test is valid only until we release the SLRU ControlLock,
> >    which we do before SlruScanDirCbDeleteCutoff() uses the cutoff to evaluate
> >    segments for deletion.  Once we release that lock, latest_page_number can
> >    advance.  This creates a TOCTOU race condition, allowing excess deletion:
> > 
> >    [local] test=# table trunc_clog_concurrency ;
> >    ERROR:  could not access status of transaction 2149484247
> >    DETAIL:  Could not open file "pg_xact/0801": No such file or directory.
> 
> > Fixes are available:
> 
> > b. Arrange so only one backend runs vac_truncate_clog() at a time.  Other than
> >    AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
> >    checkpoint, or in the startup process.  Hence, also arrange for only one
> >    backend to call SimpleLruTruncate(AsyncCtl) at a time.
> 
> More specifically, restrict vac_update_datfrozenxid() to one backend per
> database, and restrict vac_truncate_clog() and asyncQueueAdvanceTail() to one
> backend per cluster.  This, attached, was rather straightforward.

Rebased.  The conflicts were limited to comments and documentation.

> I wonder about performance in a database with millions of small relations,
> particularly considering my intent to back-patch this.  In such databases,
> vac_update_datfrozenxid() can be a major part of the VACUUM's cost.  Two
> things work in our favor.  First, vac_update_datfrozenxid() runs once per
> VACUUM command, not once per relation.  Second, Autovacuum has this logic:
> 
>      * ... we skip
>      * this if (1) we found no work to do and (2) we skipped at least one
>      * table due to concurrent autovacuum activity.  In that case, the other
>      * worker has already done it, or will do so when it finishes.
>      */
>     if (did_vacuum || !found_concurrent_worker)
>         vac_update_datfrozenxid();
> 
> That makes me relatively unworried.  I did consider some alternatives:
> 
> - Run vac_update_datfrozenxid()'s pg_class scan before taking a lock.  If we
>   find the need for pg_database updates, take the lock and scan pg_class again
>   to get final numbers.  This doubles the work in cases that end up taking the
> >   lock, so I'm not betting on it being a net win.
> 
> - Use LockWaiterCount() to skip vac_update_datfrozenxid() if some other
>   backend is already waiting.  This is similar to Autovacuum's
>   found_concurrent_worker test.  It is tempting.  I'm not proposing it,
>   because it changes the states possible when manual VACUUM completes.  Today,
>   you can predict post-VACUUM datfrozenxid from post-VACUUM relfrozenxid
>   values.  If manual VACUUM could skip vac_update_datfrozenxid() this way,
>   datfrozenxid could lag until some concurrent VACUUM finishes.

Attachment
