Re: Fwd: Fwd: Postgres attach partition: AccessExclusive lock set on different tables depending on how attaching is performed - Mailing list pgsql-general

From Alvaro Herrera
Subject Re: Fwd: Fwd: Postgres attach partition: AccessExclusive lock set on different tables depending on how attaching is performed
Msg-id 202411131149.rax7xn7gnkxi@alvherre.pgsql
In response to Re: Fwd: Fwd: Postgres attach partition: AccessExclusive lock set on different tables depending on how attaching is performed  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general

On 2024-Nov-10, Tom Lane wrote:

> This surprised me a bit too, because I thought we took a
> slightly-less-than-exclusive lock for FK additions or deletions.
> Tracing through it, I find that CloneFkReferencing opens the
> referenced relation with ShareRowExclusiveLock as I expected.
> But then we conclude that we can drop the existing FK enforcement
> triggers for the table being attached.  That causes us to take
> AccessExclusiveLock on the trigger itself, which is fine because
> nobody's really paying attention to that.  But then RemoveTriggerById
> takes AccessExclusiveLock on the trigger's table.  We already had
> that on the table being attached, but not on the other table.

Oooh.

> I wonder whether it'd be all right for RemoveTriggerById to take
> only ShareRowExclusiveLock on the trigger's table.  This seems
> OK in terms of basic semantics: that's enough to lock out
> anything that might want to fire triggers on the table.  However,
> this comment for AlterTableGetLockLevel gives me pause:
> 
>  * Also note that pg_dump uses only an AccessShareLock, meaning that anything
>  * that takes a lock less than AccessExclusiveLock can change object definitions
>  * while pg_dump is running. Be careful to check that the appropriate data is
>  * derived by pg_dump using an MVCC snapshot, rather than syscache lookups,
>  * otherwise we might end up with an inconsistent dump that can't restore.
> 
> I think pg_dump uses pg_get_triggerdef, which is probably not
> safe in these terms.

Looking at pg_get_triggerdef_worker, it is not using syscache but a
systable scan, which uses the catalog snapshot.  A catalog snapshot is
indeed implemented as an MVCC snapshot (so strictly speaking it _is_ an
MVCC snapshot), but the invalidation rules are different from a normal
MVCC snapshot, so AFAIU it's still unsafe.
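
To make the lock-level reasoning concrete, here is a minimal Python sketch of
the table-level lock conflict matrix as given in the PostgreSQL documentation
("Explicit Locking").  It is an illustration only, not PostgreSQL code: the
names CONFLICTS and conflicts() are made up for this example.  It shows that
ShareRowExclusiveLock still blocks ordinary writers (which take
RowExclusiveLock and could fire triggers) but, unlike AccessExclusiveLock,
does not block an AccessShareLock holder such as pg_dump.

```python
# Table-level lock modes, weakest to strongest (names as in the PostgreSQL docs).
MODES = [
    "AccessShareLock", "RowShareLock", "RowExclusiveLock",
    "ShareUpdateExclusiveLock", "ShareLock", "ShareRowExclusiveLock",
    "ExclusiveLock", "AccessExclusiveLock",
]

# For each mode, the set of modes it conflicts with (hypothetical helper table
# transcribed from the documented conflict matrix).
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "RowShareLock": {"ExclusiveLock", "AccessExclusiveLock"},
    "RowExclusiveLock": {"ShareLock", "ShareRowExclusiveLock",
                         "ExclusiveLock", "AccessExclusiveLock"},
    "ShareUpdateExclusiveLock": {"ShareUpdateExclusiveLock", "ShareLock",
                                 "ShareRowExclusiveLock", "ExclusiveLock",
                                 "AccessExclusiveLock"},
    "ShareLock": {"RowExclusiveLock", "ShareUpdateExclusiveLock",
                  "ShareRowExclusiveLock", "ExclusiveLock",
                  "AccessExclusiveLock"},
    "ShareRowExclusiveLock": {"RowExclusiveLock", "ShareUpdateExclusiveLock",
                              "ShareLock", "ShareRowExclusiveLock",
                              "ExclusiveLock", "AccessExclusiveLock"},
    "ExclusiveLock": set(MODES) - {"AccessShareLock"},
    "AccessExclusiveLock": set(MODES),
}


def conflicts(held: str, requested: str) -> bool:
    """Return True if a request for `requested` must wait behind `held`."""
    return requested in CONFLICTS[held]


if __name__ == "__main__":
    # Writers (INSERT/UPDATE/DELETE) are blocked by either lock level.
    print(conflicts("ShareRowExclusiveLock", "RowExclusiveLock"))
    print(conflicts("AccessExclusiveLock", "RowExclusiveLock"))
    # Plain readers and pg_dump are blocked only by AccessExclusiveLock.
    print(conflicts("ShareRowExclusiveLock", "AccessShareLock"))
    print(conflicts("AccessExclusiveLock", "AccessShareLock"))
```

The last two checks are the crux: with the weaker lock, pg_dump keeps running
while the trigger is dropped, so what it reads has to be consistent with its
own snapshot.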

> An alternative answer might be what Alvaro was muttering about
> the other day: redesign FKs for partitioned tables so that we
> do not have to change the set of triggers when attaching/detaching.

Hmm, I hadn't thought about this idea in those terms, but perhaps we
could reimplement this by not having one trigger for each RI check, but
instead a single trigger which internally determines which FK
constraints exist on the table and does the necessary work in a single
pass.  Then we don't need to add/drop triggers all the time, but we just
add it with the first FK in the table, and remove it when dropping the
last FK.

For tables with many FKs, this could be a win, because we'd only go
through the trigger machinery once.  If a table has both outgoing and
incoming FKs, maybe we could have _one_ single trigger.
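
As a rough illustration of the single-trigger idea, here is a toy Python
model.  Everything in it is hypothetical (the Table and ForeignKey classes,
ri_trigger, the trigger_changes counter); it says nothing about how this
would actually look in the backend.  It only shows the bookkeeping: the
trigger is created with the first FK, dropped with the last, and checks all
FKs in one pass.

```python
from dataclasses import dataclass, field


@dataclass
class ForeignKey:
    name: str
    column: str
    referenced_keys: set  # stand-in for the referenced table's key values


@dataclass
class Table:
    name: str
    fks: dict = field(default_factory=dict)
    has_ri_trigger: bool = False
    trigger_changes: int = 0  # how many times a trigger was created or dropped

    def add_fk(self, fk: ForeignKey) -> None:
        # Only the first FK on the table creates the (single) RI trigger.
        if not self.fks:
            self.has_ri_trigger = True
            self.trigger_changes += 1
        self.fks[fk.name] = fk

    def drop_fk(self, name: str) -> None:
        del self.fks[name]
        # Only dropping the last FK removes the trigger.
        if not self.fks:
            self.has_ri_trigger = False
            self.trigger_changes += 1

    def ri_trigger(self, row: dict) -> list:
        """Check every FK in one pass; return the names of violated ones."""
        return [
            fk.name
            for fk in self.fks.values()
            if row.get(fk.column) is not None
            and row[fk.column] not in fk.referenced_keys
        ]


if __name__ == "__main__":
    t = Table("orders")
    t.add_fk(ForeignKey("orders_customer_fk", "customer_id", {1, 2}))
    t.add_fk(ForeignKey("orders_product_fk", "product_id", {10, 20}))
    print(t.trigger_changes)  # trigger created once, not once per FK
    print(t.ri_trigger({"customer_id": 1, "product_id": 99}))
```

In this model, adding or dropping an FK in the middle of the table's lifetime
never touches the trigger set, which is the property that would avoid the
extra lock on the referenced table.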

(I think this would be orthogonal to the project to stop using SPI for
RI triggers.)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/


