RE: Truncate in synchronous logical replication failed - Mailing list pgsql-hackers

From osumi.takamichi@fujitsu.com
Subject RE: Truncate in synchronous logical replication failed
Date
Msg-id OSBPR01MB488889351DE334FE69F1D05EED4C9@OSBPR01MB4888.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Truncate in synchronous logical replication failed  (Japin Li <japinli@hotmail.com>)
Responses Re: Truncate in synchronous logical replication failed
List pgsql-hackers
Hi


On Friday, April 16, 2021 11:02 AM Japin Li <japinli@hotmail.com>
> On Thu, 15 Apr 2021 at 19:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Apr 15, 2021 at 4:30 PM Japin Li <japinli@hotmail.com> wrote:
> >>
> >>
> >> On Thu, 15 Apr 2021 at 18:22, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >> > On Wed, Apr 14, 2021 at 3:31 PM Amit Kapila
> <amit.kapila16@gmail.com> wrote:
> >> >>
> >> >> On Tue, Apr 13, 2021 at 8:07 PM Petr Jelinek
> >> >> <petr.jelinek@enterprisedb.com> wrote:
> >> >> >
> >> >> > > On 12 Apr 2021, at 08:58, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >> >> > >
> >> >> > > The problem happens only when we try to fetch IDENTITY_KEY
> >> >> > > attributes because pgoutput uses RelationGetIndexAttrBitmap()
> >> >> > > to get that information which locks the required indexes. Now,
> >> >> > > because TRUNCATE has already acquired an exclusive lock on the
> >> >> > > index, it seems to create a sort of deadlock where the actual
> >> >> > > Truncate operation waits for logical replication of operation
> >> >> > > to complete and logical replication waits for actual Truncate
> operation to finish.
> >> >> > >
> >> >> > > Do we really need to use RelationGetIndexAttrBitmap() to build
> >> >> > > IDENTITY_KEY attributes? During decoding, we don't even lock
> >> >> > > the main relation, we just scan the system table and build
> >> >> > > that information using a historic snapshot. Can't we do something
> similar here?
> >> >> > >
> >> >> > > Adding Petr J. and Peter E. to know their views as this seems
> >> >> > > to be an old problem (since the decoding of Truncate operation is
> introduced).
> >> >> >
> >> >> > We used RelationGetIndexAttrBitmap because it already existed, no
> other reason.
> >> >> >
> >> >>
> >> >> Fair enough. But I think we should do something about it because
> >> >> using the same (RelationGetIndexAttrBitmap) just breaks the
> >> >> synchronous logical replication. I think this is broken since the
> >> >> logical replication of Truncate is supported.
> >> >>
> >> >> > I am not sure what exact locking we need but I would have guessed
> at least AccessShareLock would be needed.
> >> >> >
> >> >>
> >> >> Are you telling that we need AccessShareLock on the index? If so,
> >> >> why is it different from how we access the relation during
> >> >> decoding (basically in ReorderBufferProcessTXN, we directly use
> >> >> RelationIdGetRelation() without any lock on the relation)? I think
> >> >> we do it that way because we need it to process WAL entries and we
> >> >> need the relation state based on the historic snapshot, so, even
> >> >> if the relation is later changed/dropped, we are fine with the old
> >> >> state we got. Isn't the same thing applies here in
> >> >> logicalrep_write_attrs? If that is true then some equivalent of
> >> >> RelationGetIndexAttrBitmap where we use RelationIdGetRelation
> instead of index_open should work?
> >> >>
> >> >
> >> > Today, again I have thought about this and don't see a problem with
> >> > the above idea. If the above understanding is correct, then I think
> >> > for our purpose in pgoutput, we just need to call
> >> > RelationGetIndexList and then build the idattr list for
> relation->rd_replidindex.
> >>
> >> Sorry, I don't know how can we build the idattr without open the index.
> >> If we should open the index, then we should use NoLock, since the
> >> TRUNCATE side hold AccessExclusiveLock. As Osumi points out in [1],
> >> The NoLock mode assumes that the  appropriate lock on the index is
> already taken.
> >>
> >
> > Why can't we use RelationIdGetRelation() by passing the required
> > indexOid to it?
>
> Thanks for your reminder.  It might be a way to solve this problem.
Yeah. I've made the 1st patch for this issue.

In my env, with the patch
the TRUNCATE in synchronous logical replication doesn't hang.
It's OK with make check-world as well.


Best Regards,
    Takamichi Osumi


Attachment

pgsql-hackers by date:

Previous
From: Japin Li
Date:
Subject: Re: Truncate in synchronous logical replication failed
Next
From: Noah Misch
Date:
Subject: Re: Allowing to create LEAKPROOF functions to non-superuser