On Monday, May 24, 2021 1:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Apr 20, 2021 at 9:57 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > This similar problem exists in case of synchronous replication setup
> > having synchronous_standby_names referring to the subscriber, when we
> > do the steps "begin;lock pg_class; insert into test1 values(10);
> > commit". In this case while decoding of commit, the commit will wait
> > while trying to acquire a lock on pg_class relation,
> >
>
> So, this appears to be an existing caveat of synchronous replication.
> If that is the case, I am not sure if it is a good idea to just block such ops for the
> prepared transaction. Also, what about other operations which acquire an
> exclusive lock on [user]_catalog_tables
> like:
> cluster pg_trigger using pg_class_oid_index, similarly cluster on any
> user_catalog_table, then the other problematic operation could truncate of
> user_catalog_table as is discussed in another thread [1].
> I think all such operations can block even with synchronous replication. I am not
> sure if we can create examples for all cases because for ex. we don't have use
> of user_catalog_tables in-core but maybe for others, we can try to create
> examples and see what happens?
>
> If all such operations can block for synchronous replication and prepared
> transactions replication then we might want to document them as caveats at
> page:
> https://www.postgresql.org/docs/devel/logicaldecoding-synchronous.html
> and then also give the reference for these caveats at prepared transactions
> page:https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-com
> mits.html
>
> What do you think?
I've checked the behavior of CLUSTER command
in synchronous mode, one of the examples above, as well.
IIUC, you meant pg_class, and
the deadlock happens when I run cluster commands on pg_class using its index in synchronous mode.
The command I used is "BEGIN; CLUSTER pg_class USING pg_class_oid_index; END;".
This deadlock comes from 2 processes, the backend to wait synchronization of the standby
and the walsender process which wants to take a lock on pg_class.
Therefore, I think we need to do something, at least documentation fix,
as you mentioned.
From the perspective of restating,
when I restart the locked pub with fast and immediate mode,
in both cases, the pub succeeded in restart and accepted
interactive psql connections. So, after the restart,
we are released from the lock.
Best Regards,
Takamichi Osumi