Thread: Re: locking [user] catalog tables vs 2pc vs logical rep

Re: locking [user] catalog tables vs 2pc vs logical rep

From
Ajin Cherian
Date:


On Tue, Mar 16, 2021 at 1:36 AM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Feb 23, 2021 at 3:59 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> The 2pc decoding added in
>
> commit a271a1b50e9bec07e2ef3a05e38e7285113e4ce6
> Author: Amit Kapila <akapila@postgresql.org>
> Date:   2021-01-04 08:34:50 +0530
>
>     Allow decoding at prepare time in ReorderBuffer.
>
> has a deadlock danger when used in a way that takes advantage of
> separate decoding of the 2PC PREPARE.
>
>
> I assume the goal of decoding the 2PC PREPARE is so one can wait for the
> PREPARE to have logically replicated, before doing the COMMIT PREPARED.
>
>
> However, currently it's pretty easy to get into a state where logical
> decoding cannot progress until the 2PC transaction has
> committed/aborted. Which essentially would lead to undetected deadlocks.
>
> The problem is that catalog tables accessed during logical decoding need
> to get locked (otherwise e.g. a table rewrite could happen
> concurrently). But if the prepared transaction itself holds a lock on a
> catalog table, logical decoding will block on that lock - which won't be
> released until replication progresses. A deadlock.
>
> A trivial example:
>
> SELECT pg_create_logical_replication_slot('test', 'test_decoding');
> CREATE TABLE foo(id serial primary key);
> BEGIN;
> LOCK pg_class;
> INSERT INTO foo DEFAULT VALUES;
> PREPARE TRANSACTION 'foo';
>
> -- hangs waiting for pg_class to be unlocked
> SELECT pg_logical_slot_get_changes('test', NULL, NULL, 'two-phase-commit', '1');
>
>
> Now, more realistic versions of this scenario would probably lock a
> 'user catalog table' containing replication metadata instead of
> pg_class, but ...
>
>
> At first this seems to be a significant issue. But on the other hand, if
> you were to shut the cluster down in this situation (or disconnect all
> sessions), you have broken cluster on your hand - without logical
> decoding being involved. As it turns out, we need to read pg_class to
> log in...  And I can't remember this being reported to be a problem?
>
>
> Perhaps all that we need to do is to disallow 2PC prepare if [user]
> catalog tables have been locked exclusively? Similar to how we're
> disallowing preparing tables with temp table access.
>

Even I felt we should not allow prepare a transaction that has locked
system tables, as it does not allow creating a new session after
restart and also causes the deadlock while logical decoding of
prepared transaction.
I have made a patch to make the prepare transaction fail in this
scenario. Attached the patch for the same.
Thoughts?


The patch applies fine on HEAD and "make check" passes fine. No major comments on the patch, just a minor comment:

If you could change the error from, " cannot PREPARE a transaction that has a lock on user catalog/system table(s)"
to "cannot PREPARE a transaction that has an exclusive lock on user catalog/system table(s)" that would be a more
accurate instruction to the user.

regards,
Ajin Cherian
Fujitsu Australia

Re: locking [user] catalog tables vs 2pc vs logical rep

From
vignesh C
Date:
On Wed, Mar 31, 2021 at 2:35 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> The patch applies fine on HEAD and "make check" passes fine. No major comments on the patch, just a minor comment:
>
> If you could change the error from, " cannot PREPARE a transaction that has a lock on user catalog/system table(s)"
> to "cannot PREPARE a transaction that has an exclusive lock on user catalog/system table(s)" that would be a more
> accurate instruction to the user.
>

Thanks for reviewing the patch.
Please find the updated patch which includes the fix for the same.

Regards,
Vignesh

Attachment

Re: locking [user] catalog tables vs 2pc vs logical rep

From
vignesh C
Date:
On Wed, Mar 31, 2021 at 5:47 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, Mar 31, 2021 at 2:35 PM Ajin Cherian <itsajin@gmail.com> wrote:
> >
> > The patch applies fine on HEAD and "make check" passes fine. No major comments on the patch, just a minor comment:
> >
> > If you could change the error from, " cannot PREPARE a transaction that has a lock on user catalog/system
table(s)"
> > to "cannot PREPARE a transaction that has an exclusive lock on user catalog/system table(s)" that would be a more
> > accurate instruction to the user.
> >
>
> Thanks for reviewing the patch.
> Please find the updated patch which includes the fix for the same.

This similar problem exists in case of synchronous replication setup
having synchronous_standby_names referring to the subscriber, when we
do the steps "begin;lock pg_class; insert into test1 values(10);
commit". In this case while decoding of commit, the commit will wait
while trying to acquire a lock on pg_class relation, stack trace for
the same is given below:
#4 0x0000556936cd5d37 in ProcSleep (locallock=0x556937de8728,
lockMethodTable=0x5569371c2620 <default_lockmethod>) at proc.c:1361
#5 0x0000556936cc294a in WaitOnLock (locallock=0x556937de8728,
owner=0x556937e3cd90) at lock.c:1858
#6 0x0000556936cc1231 in LockAcquireExtended (locktag=0x7ffcbb23cff0,
lockmode=1, sessionLock=false, dontWait=false, reportMemoryError=true,
locallockp=0x7ffcbb23cfe8)
at lock.c:1100
#7 0x0000556936cbdbce in LockRelationOid (relid=1259, lockmode=1) at lmgr.c:117
#8 0x00005569367afb12 in relation_open (relationId=1259, lockmode=1)
at relation.c:56
#9 0x00005569368888a2 in table_open (relationId=1259, lockmode=1) at table.c:43
#10 0x0000556936e90a91 in RelidByRelfilenode (reltablespace=0,
relfilenode=16385) at relfilenodemap.c:192
#11 0x0000556936c40361 in ReorderBufferProcessTXN (rb=0x556937e8e760,
txn=0x556937eb8778, commit_lsn=23752880, snapshot_now=0x556937ea0a90,
command_id=0, streaming=false)
at reorderbuffer.c:2122
#12 0x0000556936c411b7 in ReorderBufferReplay (txn=0x556937eb8778,
rb=0x556937e8e760, xid=590, commit_lsn=23752880, end_lsn=23752928,
commit_time=672204445820756,
origin_id=0, origin_lsn=0) at reorderbuffer.c:2589
#13 0x0000556936c41239 in ReorderBufferCommit (rb=0x556937e8e760,
xid=590, commit_lsn=23752880, end_lsn=23752928,
commit_time=672204445820756, origin_id=0, origin_lsn=0)
at reorderbuffer.c:2613
#14 0x0000556936c2f4d9 in DecodeCommit (ctx=0x556937e8c750,
buf=0x7ffcbb23d610, parsed=0x7ffcbb23d4b0, xid=590, two_phase=false)
at decode.c:744

Thoughts?

Regards,
Vignesh



Re: locking [user] catalog tables vs 2pc vs logical rep

From
Amit Kapila
Date:
On Tue, Apr 20, 2021 at 9:57 AM vignesh C <vignesh21@gmail.com> wrote:
>
> This similar problem exists in case of synchronous replication setup
> having synchronous_standby_names referring to the subscriber, when we
> do the steps "begin;lock pg_class; insert into test1 values(10);
> commit". In this case while decoding of commit, the commit will wait
> while trying to acquire a lock on pg_class relation,
>

So, this appears to be an existing caveat of synchronous replication.
If that is the case, I am not sure if it is a good idea to just block
such ops for the prepared transaction. Also, what about other
operations which acquire an exclusive lock on [user]_catalog_tables
like:
cluster pg_trigger using pg_class_oid_index, similarly cluster on any
user_catalog_table, then the other problematic operation could
truncate of user_catalog_table as is discussed in another thread [1].
I think all such operations can block even with synchronous
replication. I am not sure if we can create examples for all cases
because for ex. we don't have use of user_catalog_tables in-core but
maybe for others, we can try to create examples and see what happens?

If all such operations can block for synchronous replication and
prepared transactions replication then we might want to document them
as caveats at page:
https://www.postgresql.org/docs/devel/logicaldecoding-synchronous.html
and then also give the reference for these caveats at prepared
transactions page:https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-commits.html

What do you think?

As this appears to be an existing caveat of logical replication, I
have added the Petr and Peter E in this email.

[1] - https://www.postgresql.org/message-
id/OSBPR01MB4888314C70DA6B112E32DD6AED2B9%40OSBPR01MB4888.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.



Re: locking [user] catalog tables vs 2pc vs logical rep

From
Michael Paquier
Date:
On Mon, May 24, 2021 at 10:03:01AM +0530, Amit Kapila wrote:
> So, this appears to be an existing caveat of synchronous replication.
> If that is the case, I am not sure if it is a good idea to just block
> such ops for the prepared transaction. Also, what about other
> operations which acquire an exclusive lock on [user]_catalog_tables
> like:
> cluster pg_trigger using pg_class_oid_index, similarly cluster on any
> user_catalog_table, then the other problematic operation could
> truncate of user_catalog_table as is discussed in another thread [1].
> I think all such operations can block even with synchronous
> replication. I am not sure if we can create examples for all cases
> because for ex. we don't have use of user_catalog_tables in-core but
> maybe for others, we can try to create examples and see what happens?
>
> If all such operations can block for synchronous replication and
> prepared transactions replication then we might want to document them
> as caveats at page:
> https://www.postgresql.org/docs/devel/logicaldecoding-synchronous.html
> and then also give the reference for these caveats at prepared
> transactions page:https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-commits.html
>
> What do you think?

It seems to me that the 2PC issues on catalog tables and the issues
related to logical replication in synchonous mode are two distinct
things that need to be fixed separately.

The issue with LOCK taken on a catalog while a PREPARE TRANSACTION
holds locks around is bad enough in itself as it could lock down a
user from a cluster as long as the PREPARE TRANSACTION is not removed
from WAL (say the relation is critical for the connection startup).
This could be really disruptive for the user even if he tried to take
a lock on an object he owns, and the way to recover is not easy here,
and the way to recover involves either an old backup or worse,
pg_resetwal.

The second issue with logical replication is still disruptive, but it
looks to me more like a don't-do-it issue, and documenting the caveats
sounds fine enough.

Looking at the patch from upthread..

+           /*
+            * Make note that we've locked a system table or an user catalog
+            * table. This flag will be checked later during prepare transaction
+            * to fail the prepare transaction.
+            */
+           if (lockstmt->mode >= ExclusiveLock &&
+               (IsCatalogRelationOid(reloid) ||
+                RelationIsUsedAsCatalogTable(rel)))
+               MyXactFlags |= XACT_FLAGS_ACQUIREDEXCLUSIVELOCK_SYSREL;
I think that I'd just use IsCatalogRelationOid() here, and I'd be more
severe and restrict all attempts for any lock levels.  It seems to me
that this needs to happen within RangeVarCallbackForLockTable().
I would also rename the flag as just XACT_FLAGS_LOCKEDCATALOG.

+                errmsg("cannot PREPARE a transaction that has an exclusive lock on user catalog/system table(s)")));
What about "cannot PREPARE a transaction that has locked a catalog
relation"?
--
Michael

Attachment

RE: locking [user] catalog tables vs 2pc vs logical rep

From
"osumi.takamichi@fujitsu.com"
Date:
On Monday, May 24, 2021 1:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Apr 20, 2021 at 9:57 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > This similar problem exists in case of synchronous replication setup
> > having synchronous_standby_names referring to the subscriber, when we
> > do the steps "begin;lock pg_class; insert into test1 values(10);
> > commit". In this case while decoding of commit, the commit will wait
> > while trying to acquire a lock on pg_class relation,
> >
> 
> So, this appears to be an existing caveat of synchronous replication.
> If that is the case, I am not sure if it is a good idea to just block such ops for the
> prepared transaction. Also, what about other operations which acquire an
> exclusive lock on [user]_catalog_tables
> like:
> cluster pg_trigger using pg_class_oid_index, similarly cluster on any
> user_catalog_table, then the other problematic operation could truncate of
> user_catalog_table as is discussed in another thread [1].
> I think all such operations can block even with synchronous replication. I am not
> sure if we can create examples for all cases because for ex. we don't have use
> of user_catalog_tables in-core but maybe for others, we can try to create
> examples and see what happens?
> 
> If all such operations can block for synchronous replication and prepared
> transactions replication then we might want to document them as caveats at
> page:
> https://www.postgresql.org/docs/devel/logicaldecoding-synchronous.html
> and then also give the reference for these caveats at prepared transactions
> page:https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-com
> mits.html
> 
> What do you think?
I've checked the behavior of CLUSTER command
in synchronous mode, one of the examples above, as well.

IIUC, you meant pg_class, and
the deadlock happens when I run cluster commands on pg_class using its index in synchronous mode.
The command I used is "BEGIN; CLUSTER pg_class USING pg_class_oid_index; END;".
This deadlock comes from 2 processes, the backend to wait synchronization of the standby
and the walsender process which wants to take a lock on pg_class.
Therefore, I think we need to do something, at least documentation fix,
as you mentioned.

From the perspective of restating,
when I restart the locked pub with fast and immediate mode,
in both cases, the pub succeeded in restart and accepted
interactive psql connections. So, after the restart,
we are released from the lock.


Best Regards,
    Takamichi Osumi


Re: locking [user] catalog tables vs 2pc vs logical rep

From
Amit Kapila
Date:
On Tue, May 25, 2021 at 1:43 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> On Monday, May 24, 2021 1:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Tue, Apr 20, 2021 at 9:57 AM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > This similar problem exists in case of synchronous replication setup
> > > having synchronous_standby_names referring to the subscriber, when we
> > > do the steps "begin;lock pg_class; insert into test1 values(10);
> > > commit". In this case while decoding of commit, the commit will wait
> > > while trying to acquire a lock on pg_class relation,
> > >
> >
> > So, this appears to be an existing caveat of synchronous replication.
> > If that is the case, I am not sure if it is a good idea to just block such ops for the
> > prepared transaction. Also, what about other operations which acquire an
> > exclusive lock on [user]_catalog_tables
> > like:
> > cluster pg_trigger using pg_class_oid_index, similarly cluster on any
> > user_catalog_table, then the other problematic operation could truncate of
> > user_catalog_table as is discussed in another thread [1].
> > I think all such operations can block even with synchronous replication. I am not
> > sure if we can create examples for all cases because for ex. we don't have use
> > of user_catalog_tables in-core but maybe for others, we can try to create
> > examples and see what happens?
> >
> > If all such operations can block for synchronous replication and prepared
> > transactions replication then we might want to document them as caveats at
> > page:
> > https://www.postgresql.org/docs/devel/logicaldecoding-synchronous.html
> > and then also give the reference for these caveats at prepared transactions
> > page:https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-com
> > mits.html
> >
> > What do you think?
> I've checked the behavior of CLUSTER command
> in synchronous mode, one of the examples above, as well.
>
> IIUC, you meant pg_class, and
> the deadlock happens when I run cluster commands on pg_class using its index in synchronous mode.
> The command I used is "BEGIN; CLUSTER pg_class USING pg_class_oid_index; END;".
> This deadlock comes from 2 processes, the backend to wait synchronization of the standby
> and the walsender process which wants to take a lock on pg_class.
>

Have you tried to prepare this transaction? That won't be allowed. I
wanted to see if we can generate some scenarios where it is blocked
for prepared xacts decoding and for synchronous replication.

> Therefore, I think we need to do something, at least documentation fix,
> as you mentioned.
>

Yes, I think that is true.

-- 
With Regards,
Amit Kapila.



Re: locking [user] catalog tables vs 2pc vs logical rep

From
Amit Kapila
Date:
On Tue, May 25, 2021 at 12:40 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, May 24, 2021 at 10:03:01AM +0530, Amit Kapila wrote:
> > So, this appears to be an existing caveat of synchronous replication.
> > If that is the case, I am not sure if it is a good idea to just block
> > such ops for the prepared transaction. Also, what about other
> > operations which acquire an exclusive lock on [user]_catalog_tables
> > like:
> > cluster pg_trigger using pg_class_oid_index, similarly cluster on any
> > user_catalog_table, then the other problematic operation could
> > truncate of user_catalog_table as is discussed in another thread [1].
> > I think all such operations can block even with synchronous
> > replication. I am not sure if we can create examples for all cases
> > because for ex. we don't have use of user_catalog_tables in-core but
> > maybe for others, we can try to create examples and see what happens?
> >
> > If all such operations can block for synchronous replication and
> > prepared transactions replication then we might want to document them
> > as caveats at page:
> > https://www.postgresql.org/docs/devel/logicaldecoding-synchronous.html
> > and then also give the reference for these caveats at prepared
> > transactions page:https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-commits.html
> >
> > What do you think?
>
> It seems to me that the 2PC issues on catalog tables and the issues
> related to logical replication in synchonous mode are two distinct
> things that need to be fixed separately.
>

Fair enough. But the way we were looking at them as they will also
block (lead to deadlock) for logical replication of prepared
transactions and also logical replication in synchonous mode without
prepared transactions. Now, if we want to deal with the 2PC issues
separately that should be fine as well. However, for that we need to
see which all operations we want to block on [user]_catalog_tables.
The first one is lock command, then there are other operations like
Cluster which take exclusive lock on system catalog tables and we
allow them to be part of prepared transactions (example Cluster
pg_trigger using pg_trigger_oid_index;), another kind of operation is
Truncate on user_catalog_tables. Now, some of these might not allow
connecting after restart so we might need to think whether we want to
prohibit all such operations or only some of them.

>
> The second issue with logical replication is still disruptive, but it
> looks to me more like a don't-do-it issue, and documenting the caveats
> sounds fine enough.
>

Right, that is what I also think.

-- 
With Regards,
Amit Kapila.



Re: locking [user] catalog tables vs 2pc vs logical rep

From
Tom Lane
Date:
Amit Kapila <amit.kapila16@gmail.com> writes:
> Fair enough. But the way we were looking at them as they will also
> block (lead to deadlock) for logical replication of prepared
> transactions and also logical replication in synchonous mode without
> prepared transactions. Now, if we want to deal with the 2PC issues
> separately that should be fine as well. However, for that we need to
> see which all operations we want to block on [user]_catalog_tables.
> The first one is lock command, then there are other operations like
> Cluster which take exclusive lock on system catalog tables and we
> allow them to be part of prepared transactions (example Cluster
> pg_trigger using pg_trigger_oid_index;), another kind of operation is
> Truncate on user_catalog_tables. Now, some of these might not allow
> connecting after restart so we might need to think whether we want to
> prohibit all such operations or only some of them.

2PC has pretty much always worked like that, and AFAIR there have been
a grand total of zero complaints about it.  It seems quite likely to
me that you're proposing to expend a lot of effort on restrictions
that will hurt more people than they help.  Maybe that score is only
about one to zero, but still you should account for the possibility
that you're breaking legitimate use-cases.

            regards, tom lane



Re: locking [user] catalog tables vs 2pc vs logical rep

From
Petr Jelinek
Date:
On 26 May 2021, at 05:04, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, May 25, 2021 at 12:40 PM Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On Mon, May 24, 2021 at 10:03:01AM +0530, Amit Kapila wrote:
>>> So, this appears to be an existing caveat of synchronous replication.
>>> If that is the case, I am not sure if it is a good idea to just block
>>> such ops for the prepared transaction. Also, what about other
>>> operations which acquire an exclusive lock on [user]_catalog_tables
>>> like:
>>> cluster pg_trigger using pg_class_oid_index, similarly cluster on any
>>> user_catalog_table, then the other problematic operation could
>>> truncate of user_catalog_table as is discussed in another thread [1].
>>> I think all such operations can block even with synchronous
>>> replication. I am not sure if we can create examples for all cases
>>> because for ex. we don't have use of user_catalog_tables in-core but
>>> maybe for others, we can try to create examples and see what happens?
>>>
>>> If all such operations can block for synchronous replication and
>>> prepared transactions replication then we might want to document them
>>> as caveats at page:
>>> https://www.postgresql.org/docs/devel/logicaldecoding-synchronous.html
>>> and then also give the reference for these caveats at prepared
>>> transactions page:https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-commits.html
>>>
>>> What do you think?
>>
>> It seems to me that the 2PC issues on catalog tables and the issues
>> related to logical replication in synchonous mode are two distinct
>> things that need to be fixed separately.
>>
>
> Fair enough. But the way we were looking at them as they will also
> block (lead to deadlock) for logical replication of prepared
> transactions and also logical replication in synchonous mode without
> prepared transactions. Now, if we want to deal with the 2PC issues
> separately that should be fine as well. However, for that we need to
> see which all operations we want to block on [user]_catalog_tables.
> The first one is lock command, then there are other operations like
> Cluster which take exclusive lock on system catalog tables and we
> allow them to be part of prepared transactions (example Cluster
> pg_trigger using pg_trigger_oid_index;), another kind of operation is
> Truncate on user_catalog_tables. Now, some of these might not allow
> connecting after restart so we might need to think whether we want to
> prohibit all such operations or only some of them.
>


IIRC this was discussed the first time 2PC decoding was proposed and everybody seemed fine with the limitation so I'd
votefor just documenting it, same way as the sync rep issue. 

If you'd prefer fixing it by blocking something, wouldn't it make more sense to simply not allow PREPARE if one of
theseoperations was executed, similarly to what we do with temp table access? 

--
Petr


Re: locking [user] catalog tables vs 2pc vs logical rep

From
vignesh C
Date:
On Tue, May 25, 2021 at 12:40 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, May 24, 2021 at 10:03:01AM +0530, Amit Kapila wrote:
> > So, this appears to be an existing caveat of synchronous replication.
> > If that is the case, I am not sure if it is a good idea to just block
> > such ops for the prepared transaction. Also, what about other
> > operations which acquire an exclusive lock on [user]_catalog_tables
> > like:
> > cluster pg_trigger using pg_class_oid_index, similarly cluster on any
> > user_catalog_table, then the other problematic operation could
> > truncate of user_catalog_table as is discussed in another thread [1].
> > I think all such operations can block even with synchronous
> > replication. I am not sure if we can create examples for all cases
> > because for ex. we don't have use of user_catalog_tables in-core but
> > maybe for others, we can try to create examples and see what happens?
> >
> > If all such operations can block for synchronous replication and
> > prepared transactions replication then we might want to document them
> > as caveats at page:
> > https://www.postgresql.org/docs/devel/logicaldecoding-synchronous.html
> > and then also give the reference for these caveats at prepared
> > transactions page:https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-commits.html
> >
> > What do you think?
>
> It seems to me that the 2PC issues on catalog tables and the issues
> related to logical replication in synchonous mode are two distinct
> things that need to be fixed separately.
>
> The issue with LOCK taken on a catalog while a PREPARE TRANSACTION
> holds locks around is bad enough in itself as it could lock down a
> user from a cluster as long as the PREPARE TRANSACTION is not removed
> from WAL (say the relation is critical for the connection startup).
> This could be really disruptive for the user even if he tried to take
> a lock on an object he owns, and the way to recover is not easy here,
> and the way to recover involves either an old backup or worse,
> pg_resetwal.
>
> The second issue with logical replication is still disruptive, but it
> looks to me more like a don't-do-it issue, and documenting the caveats
> sounds fine enough.
>
> Looking at the patch from upthread..
>
> +           /*
> +            * Make note that we've locked a system table or an user catalog
> +            * table. This flag will be checked later during prepare transaction
> +            * to fail the prepare transaction.
> +            */
> +           if (lockstmt->mode >= ExclusiveLock &&
> +               (IsCatalogRelationOid(reloid) ||
> +                RelationIsUsedAsCatalogTable(rel)))
> +               MyXactFlags |= XACT_FLAGS_ACQUIREDEXCLUSIVELOCK_SYSREL;
> I think that I'd just use IsCatalogRelationOid() here, and I'd be more
> severe and restrict all attempts for any lock levels.  It seems to me
> that this needs to happen within RangeVarCallbackForLockTable().
> I would also rename the flag as just XACT_FLAGS_LOCKEDCATALOG.
>
> +                errmsg("cannot PREPARE a transaction that has an exclusive lock on user catalog/system
table(s)")));
> What about "cannot PREPARE a transaction that has locked a catalog
> relation"?

At this point it is not clear if we are planning to fix this issue by
throwing an error or document it. I will fix these comments once we
come to consensus.

Regards,
Vignesh



Re: locking [user] catalog tables vs 2pc vs logical rep

From
Amit Kapila
Date:
On Wed, May 26, 2021 at 1:53 PM Petr Jelinek
<petr.jelinek@enterprisedb.com> wrote:
>
> On 26 May 2021, at 05:04, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, May 25, 2021 at 12:40 PM Michael Paquier <michael@paquier.xyz> wrote:
> >>
> >> It seems to me that the 2PC issues on catalog tables and the issues
> >> related to logical replication in synchonous mode are two distinct
> >> things that need to be fixed separately.
> >>
> >
> > Fair enough. But the way we were looking at them as they will also
> > block (lead to deadlock) for logical replication of prepared
> > transactions and also logical replication in synchonous mode without
> > prepared transactions. Now, if we want to deal with the 2PC issues
> > separately that should be fine as well. However, for that we need to
> > see which all operations we want to block on [user]_catalog_tables.
> > The first one is lock command, then there are other operations like
> > Cluster which take exclusive lock on system catalog tables and we
> > allow them to be part of prepared transactions (example Cluster
> > pg_trigger using pg_trigger_oid_index;), another kind of operation is
> > Truncate on user_catalog_tables. Now, some of these might not allow
> > connecting after restart so we might need to think whether we want to
> > prohibit all such operations or only some of them.
> >
>
>
> IIRC this was discussed the first time 2PC decoding was proposed and everybody seemed fine with the limitation so I'd
votefor just documenting it, same way as the sync rep issue.
 
>

+1.

> If you'd prefer fixing it by blocking something, wouldn't it make more sense to simply not allow PREPARE if one of
theseoperations was executed, similarly to what we do with temp table access?
 
>

The point was that even if somehow we block for prepare, there doesn't
seem to be a simple way for synchronous logical replication which can
also have similar problems. So, I would prefer to document it and we
can even think to backpatch the sync rep related documentation.
Michael seems to be a bit interested in dealing with some of the 2PC
issues due to reasons different than logical replication which I am
not completely sure is a good idea and Tom also feels that is not a
good idea.

-- 
With Regards,
Amit Kapila.