RE: Truncate in synchronous logical replication failed - Mailing list pgsql-hackers
| From | osumi.takamichi@fujitsu.com |
|---|---|
| Subject | RE: Truncate in synchronous logical replication failed |
| Date | |
| Msg-id | OSBPR01MB4888C84371882A4E001E3A44ED459@OSBPR01MB4888.jpnprd01.prod.outlook.com Whole thread Raw |
| In response to | Re: Truncate in synchronous logical replication failed (Amit Kapila <amit.kapila16@gmail.com>) |
| Responses |
Re: Truncate in synchronous logical replication failed
|
| List | pgsql-hackers |
On Thursday, April 22, 2021 6:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Apr 20, 2021 at 9:00 AM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
> >
> > On Tuesday, April 20, 2021 10:53 AM Ajin Cherian <itsajin@gmail.com>
> wrote:
> > >
> > > I reviewed the patch, ran make check, no issues. One minor comment:
> > >
> > > Could you add the comment similar to RelationGetIndexAttrBitmap() on
> > > why the redo, it's not very obvious to someone reading the code, why
> > > we are refetching the index list here.
> > >
> > > + /* Check if we need to redo */
> > >
> > > + newindexoidlist = RelationGetIndexList(relation);
> > Yeah, makes sense. Fixed.
>
> I don't think here we need to restart to get a stable list of indexes as we do in
> RelationGetIndexAttrBitmap. The reason is here we build the cache entry
> using a historic snapshot and all the later changes are absorbed while
> decoding WAL.
I rechecked this and I agree with this.
I don't see any problem to remove the redo check.
Based on this direction, I'll share my several minor comments.
[1] a typo of RelationGetIdentityKeyBitmap's comment
+ * This is a special purpose function used during logical replication. Here,
+ * unlike RelationGetIndexAttrBitmap(), we don't a acquire lock on the required
We have "a" in an unnatural place. It should be "we don't acquire...".
[2] suggestion to fix RelationGetIdentityKeyBitmap's comment
+ * later changes are absorbed while decoding WAL. Due to this reason, we don't
+ * need to retry here in case of a change in the set of indexes.
I think it's better to use "Because of" instead of "Due to".
This patch works to solve the deadlock.
[3] wrong comment in RelationGetIdentityKeyBitmap
+ /* Save some values to compare after building attributes */
I wrote this comment for the redo check
that has been removed already. We can delete it.
[4] suggestion to remove local variable relreplindex in RelationGetIdentityKeyBitmap
I think we don't need relreplindex.
We can just pass relaton->rd_replidindex to RelationIdGetRelation().
There is no more reference of the variable.
[5] suggestion to fix the place to release indexoidlist
I thought we can do its list_free() ealier,
after checking if there is no indexes.
[6] suggestion to remove period in one comment.
+ /* Quick exit if we already computed the result. */
This comes from RelationGetIndexAttrBitmap (and my posted versions)
but I think we can remove the period to give better alignment shared with other comments in the function.
> I have updated that and modified few comments in the
> attached patch. Can you please test this in clobber_cache_always mode? I
> think just testing subscription/t/010_truncate.pl would be sufficient.
I did it. It didn't fail. No problem.
> Now, this bug exists in prior versions (>= v11) where we have introduced
> decoding of Truncate. Do we want to backpatch this? As no user has reported
> this till now apart from Tang who I think got it while doing some other patch
> testing, we might want to just push it in HEAD. I am fine either way. Petr,
> others, do you have any opinion on this matter?
I think we are fine with applying this patch to the HEAD only,
since nobody has complained about the hang issue.
Best Regards,
Takamichi Osumi
pgsql-hackers by date: