Re: Forget close an open relation in ReorderBufferProcessTXN() - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Forget close an open relation in ReorderBufferProcessTXN()
Date
Msg-id CAA4eK1Ks+p8wDbzhDr7yMYEWDbWFRJAd_uOY-moikc+zr9ER+g@mail.gmail.com
Whole thread Raw
In response to Re: Forget close an open relation in ReorderBufferProcessTXN()  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Forget close an open relation in ReorderBufferProcessTXN()  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Thu, Apr 15, 2021 at 10:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > On Thu, Apr 15, 2021 at 4:00 PM Japin Li <japinli@hotmail.com> wrote:
> >>
> >> The RelationIdGetRelation() comment says:
> >>
> > Caller should eventually decrement count. (Usually,
> > that happens by calling RelationClose().)
> >>
> >> However, it doesn't do it in ReorderBufferProcessTXN().
> >> I think we should close it, here is a patch that fixes it. Thoughts?
> >>
>
> > +1. Your fix looks correct to me but can we test it in some way?
>
> I think this code has a bigger problem: it should not be using
> RelationIdGetRelation and RelationClose directly.  99.44% of
> the backend goes through relation_open or one of the other
> relation.c wrappers, so why doesn't this?
>

I think it is because relation_open expects either caller to have a
lock on the relation or don't use 'NoLock' lockmode. AFAIU, we don't
need to acquire a lock on relation while decoding changes from WAL
because it uses a historic snapshot to build a relcache entry and all
the later changes to the rel are absorbed while decoding WAL.

I think it is also important to *not* acquire any lock on relation
otherwise it can lead to some sort of deadlock or infinite wait in the
decoding process. Consider a case for operations like Truncate (or if
the user has acquired an exclusive lock on the relation in some other
way say via Lock command) which acquires an exclusive lock on
relation, it won't get replicated in synchronous mode (when
synchronous_standby_name is configured). The truncate operation will
wait for the transaction to be replicated to the subscriber and the
decoding process will wait for the Truncate operation to finish.

> Possibly the answer is "it copied the equally misguided code
> in pgoutput.c".
>

I think it is following what is done during decoding, otherwise, it
will lead to the problems as described above. We are already
discussing one of the similar problems [1] where pgoutput
unintentionally acquired a lock on the index and lead to a sort of
deadlock.

If the above understanding is correct, I think we might want to
improve comments in this area.

[1] -
https://www.postgresql.org/message-id/OS0PR01MB6113C2499C7DC70EE55ADB82FB759%40OS0PR01MB6113.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Bogus collation version recording in recordMultipleDependencies
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Performance degradation of REFRESH MATERIALIZED VIEW