Re: Truncate in synchronous logical replication failed - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Truncate in synchronous logical replication failed
Date
Msg-id CAA4eK1JfDjgr1YckE4Z9=Xwk48hqb58PBP7+4_Ehdi7ja+MRvg@mail.gmail.com
Whole thread Raw
In response to Re: Truncate in synchronous logical replication failed  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Truncate in synchronous logical replication failed  (Japin Li <japinli@hotmail.com>)
List pgsql-hackers
On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
> <petr.jelinek@enterprisedb.com> wrote:
> >
> > > On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > The problem happens only when we try to fetch IDENTITY_KEY attributes
> > > because pgoutput uses RelationGetIndexAttrBitmap() to get that
> > > information which locks the required indexes. Now, because TRUNCATE
> > > has already acquired an exclusive lock on the index, it seems to
> > > create a sort of deadlock where the actual Truncate operation waits
> > > for logical replication of operation to complete and logical
> > > replication waits for actual Truncate operation to finish.
> > >
> > > Do we really need to use RelationGetIndexAttrBitmap() to build
> > > IDENTITY_KEY attributes? During decoding, we don't even lock the main
> > > relation, we just scan the system table and build that information
> > > using a historic snapshot. Can't we do something similar here?
> > >
> > > Adding Petr J. and Peter E. to know their views as this seems to be an
> > > old problem (since the decoding of Truncate operation is introduced).
> >
> > We used RelationGetIndexAttrBitmap because it already existed, no other reason.
> >
>
> Fair enough. But I think we should do something about it because using
> the same (RelationGetIndexAttrBitmap) just breaks the synchronous
> logical replication. I think this is broken since the logical
> replication of Truncate is supported.
>
> > I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed.
> >
>
> Are you telling that we need AccessShareLock on the index? If so, why
> is it different from how we access the relation during decoding
> (basically in ReorderBufferProcessTXN, we directly use
> RelationIdGetRelation() without any lock on the relation)? I think we
> do it that way because we need it to process WAL entries and we need
> the relation state based on the historic snapshot, so, even if the
> relation is later changed/dropped, we are fine with the old state we
> got. Isn't the same thing applies here in logicalrep_write_attrs? If
> that is true then some equivalent of RelationGetIndexAttrBitmap where
> we use RelationIdGetRelation instead of index_open should work?
>

Today, again I have thought about this and don't see a problem with
the above idea. If the above understanding is correct, then I think
for our purpose in pgoutput, we just need to call RelationGetIndexList
and then build the idattr list for relation->rd_replidindex. We can
then cache it in relation->rd_idattr. I am not sure if it is really
required to do all the other work in RelationGetIndexAttrBitmap.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Luc Vlaming
Date:
Subject: Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
Next
From: Japin Li
Date:
Subject: Forget close an open relation in ReorderBufferProcessTXN()