Re: [COMMITTERS] pgsql: Fix inadequate locking during get_rel_oids(). - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [COMMITTERS] pgsql: Fix inadequate locking during get_rel_oids().
Date
Msg-id 20180306001152.qxguen2kclystz7p@alap3.anarazel.de
Responses Re: [COMMITTERS] pgsql: Fix inadequate locking during get_rel_oids().  (Andres Freund <andres@anarazel.de>)
Re: [COMMITTERS] pgsql: Fix inadequate locking during get_rel_oids().  (Andres Freund <andres@anarazel.de>)
Re: [COMMITTERS] pgsql: Fix inadequate locking during get_rel_oids().  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi Tom,

On 2017-09-29 20:26:42 +0000, Tom Lane wrote:
> get_rel_oids used to not take any relation locks at all, but that stopped
> being a good idea with commit 3c3bb9933, which inserted a syscache lookup
> into the function.  A concurrent DROP TABLE could now produce "cache lookup
> failed", which we don't want to have happen in normal operation.  The best
> solution seems to be to transiently take a lock on the relation named by
> the RangeVar (which also makes the result of RangeVarGetRelid a lot less
> spongy).  But we shouldn't hold the lock beyond this function, because we
> don't want VACUUM to lock more than one table at a time.  (That would not
> be a big problem right now, but it will become one after the pending
> feature patch to allow multiple tables to be named in VACUUM.)

I'm not sure how big a problem this is, but I suspect it is
one. Autovacuum used to skip a relation when it was locked and the
vacuum wasn't an anti-wraparound one.  But after this change it appears
we'll get stuck behind this new lock, even if VACOPT_NOWAIT is
specified.  That's bad because relations that are AEL-locked
(e.g. because of some long-running DDL) can now block global autovacuum
progress.
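
For readers unfamiliar with the distinction, the difference between waiting
on a lock and skipping a locked relation can be sketched with a toy model.
This is not PostgreSQL code: `vacuum_pass`, the `tables` dict, and the use of
`threading.Lock` as a stand-in for a relation lock are all assumptions made
purely for illustration.

```python
import threading

def vacuum_pass(tables, nowait):
    """Toy model: visit each table, taking its lock first.

    `tables` maps a (hypothetical) table name to a threading.Lock that
    stands in for a relation lock. With nowait=True a busy table is
    skipped; with nowait=False the caller waits for the lock holder.
    """
    processed, skipped = [], []
    for name, lock in tables.items():
        # With blocking=False, acquire() returns False at once if the
        # lock is already held instead of waiting for it.
        if not lock.acquire(blocking=not nowait):
            skipped.append(name)
            continue
        try:
            processed.append(name)  # stand-in for the actual vacuum work
        finally:
            lock.release()
    return processed, skipped

tables = {"a": threading.Lock(), "b": threading.Lock(), "c": threading.Lock()}
tables["b"].acquire()  # simulate long-running DDL holding a lock on "b"
processed, skipped = vacuum_pass(tables, nowait=True)
print(processed, skipped)  # ['a', 'c'] ['b']
tables["b"].release()
```

In this model, calling `vacuum_pass(tables, nowait=False)` while "b" is still
held would stall at "b" and never reach "c", which is the kind of stall
described above.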

I noticed this when reviewing a patch that adds NOWAIT (now renamed to
SKIP LOCKED) as a user-visible knob, and saw it get stuck behind an
AEL. The updated version of the patch
http://archives.postgresql.org/message-id/7327B413-1A57-477F-A6A0-6FD80CB5482D%40amazon.com
attempts to fix the issue, although I've not yet looked at it
in any detail.

I suspect we should break out that part once reviewed, and backpatch to
10?

Greetings,

Andres Freund

