Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Date
Msg-id CALj2ACW1jAJTi-SmBCStZ6+a1C3DeAc8-_Zuw1LYFzo8D33k2w@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
> On 2021/01/29 14:12, Bharath Rupireddy wrote:
> > On Fri, Jan 29, 2021 at 10:28 AM Fujii Masao
> > <masao.fujii@oss.nttdata.com> wrote:
> >> On 2021/01/29 11:09, Tom Lane wrote:
> >>> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
> >>>> On Fri, Jan 29, 2021 at 1:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
> >>>>> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
> >>>>> telling us is that the patch's behavior is unstable in the face
> >>>>> of unexpected cache flushes.
> >>>
> >>>> Thanks a lot! It looks like the syscache invalidation messages are
> >>>> generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
> >>>> which pgfdw_inval_callback gets called many times in which the cached
> >>>> entries are marked as invalid and closed if they are not used in the
> >>>> txn. The new function postgres_fdw_get_connections outputs the
> >>>> information of the cached connections such as name if the connection
> >>>> is still open and their validity. Hence the output of the
> >>>> postgres_fdw_get_connections became unstable in the buildfarm member.
> >>>> I will further analyze making tests stable, meanwhile any suggestions
> >>>> are welcome.
> >>>
> >>> I do not think you should regard this as "we need to hack the test
> >>> to make it stable".  I think you should regard this as "this is a
> >>> bug".  A cache flush should not cause user-visible state changes.
> >>> In particular, the above analysis implies that you think a cache
> >>> flush is equivalent to end-of-transaction, which it absolutely
> >>> is not.
> >>>
> >>> Also, now that I've looked at pgfdw_inval_callback, it scares
> >>> the heck out of me.  Actually disconnecting a connection during
> >>> a cache inval callback seems quite unsafe --- what if that happens
> >>> while we're using the connection?
> >>
> >> If the connection is still used in the transaction, pgfdw_inval_callback()
> >> marks it as invalidated and doesn't close it. So I was not thinking that
> >> this is so unsafe.
> >>
> >> The disconnection code in pgfdw_inval_callback() was added in commit
> >> e3ebcca843 to fix connection leak issue, and it's back-patched. If this
> >> change is really unsafe, we need to revert it immediately at least from back
> >> branches because the next minor release is scheduled soon.
> >
> > I think we can remove disconnect_pg_server in pgfdw_inval_callback and
> > make entries only invalidated. Anyways, those connections can get
> > closed at the end of main txn in pgfdw_xact_callback. Thoughts?
>
> But this revives the connection leak issue. So isn't it better to
> to do that after we confirm that the current code is really unsafe?

IMO, connections will not leak, because the invalidated connections
eventually will get closed in pgfdw_xact_callback at the main txn end.

IIRC, when we were finding a way to close the invalidated connections
so that they don't leaked, we had two options:

1) let those connections (whether currently being used in the xact or
not) get marked invalidated in pgfdw_inval_callback and closed in
pgfdw_xact_callback at the main txn end as shown below

        if (PQstatus(entry->conn) != CONNECTION_OK ||
            PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
            entry->changing_xact_state ||
            entry->invalidated).   ----> by adding this
        {
            elog(DEBUG3, "discarding connection %p", entry->conn);
            disconnect_pg_server(entry);
        }

2) close the unused connections right away in pgfdw_inval_callback
instead of marking them invalidated. Mark used connections as
invalidated in pgfdw_inval_callback and close them in
pgfdw_xact_callback at the main txn end.

We went with option (2) because we thought this would ease some burden
on pgfdw_xact_callback closing a lot of invalid connections at once.

Hope that's fine.

I will respond to postgres_fdw_get_connections issue separately.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Next
From: David Rowley
Date:
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans