Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Date
Msg-id 7fa16ff3-5b37-0617-7a27-22e839695ef5@oss.nttdata.com
Whole thread Raw
In response to Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers

On 2021/01/29 14:46, Bharath Rupireddy wrote:
> On Fri, Jan 29, 2021 at 11:08 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>>
>> On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao
>> <masao.fujii@oss.nttdata.com> wrote:
>>> On 2021/01/29 14:12, Bharath Rupireddy wrote:
>>>> On Fri, Jan 29, 2021 at 10:28 AM Fujii Masao
>>>> <masao.fujii@oss.nttdata.com> wrote:
>>>>> On 2021/01/29 11:09, Tom Lane wrote:
>>>>>> Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
>>>>>>> On Fri, Jan 29, 2021 at 1:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
>>>>>>>> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
>>>>>>>> telling us is that the patch's behavior is unstable in the face
>>>>>>>> of unexpected cache flushes.
>>>>>>
>>>>>>> Thanks a lot! It looks like the syscache invalidation messages are
>>>>>>> generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
>>>>>>> which pgfdw_inval_callback gets called many times in which the cached
>>>>>>> entries are marked as invalid and closed if they are not used in the
>>>>>>> txn. The new function postgres_fdw_get_connections outputs the
>>>>>>> information of the cached connections such as name if the connection
>>>>>>> is still open and their validity. Hence the output of the
>>>>>>> postgres_fdw_get_connections became unstable in the buildfarm member.
>>>>>>> I will further analyze making tests stable, meanwhile any suggestions
>>>>>>> are welcome.
>>>>>>
>>>>>> I do not think you should regard this as "we need to hack the test
>>>>>> to make it stable".  I think you should regard this as "this is a
>>>>>> bug".  A cache flush should not cause user-visible state changes.
>>>>>> In particular, the above analysis implies that you think a cache
>>>>>> flush is equivalent to end-of-transaction, which it absolutely
>>>>>> is not.
>>>>>>
>>>>>> Also, now that I've looked at pgfdw_inval_callback, it scares
>>>>>> the heck out of me.  Actually disconnecting a connection during
>>>>>> a cache inval callback seems quite unsafe --- what if that happens
>>>>>> while we're using the connection?
>>>>>
>>>>> If the connection is still used in the transaction, pgfdw_inval_callback()
>>>>> marks it as invalidated and doesn't close it. So I was not thinking that
>>>>> this is so unsafe.
>>>>>
>>>>> The disconnection code in pgfdw_inval_callback() was added in commit
>>>>> e3ebcca843 to fix connection leak issue, and it's back-patched. If this
>>>>> change is really unsafe, we need to revert it immediately at least from back
>>>>> branches because the next minor release is scheduled soon.
>>>>
>>>> I think we can remove disconnect_pg_server in pgfdw_inval_callback and
>>>> make entries only invalidated. Anyways, those connections can get
>>>> closed at the end of main txn in pgfdw_xact_callback. Thoughts?
>>>
>>> But this revives the connection leak issue. So isn't it better to
>>> to do that after we confirm that the current code is really unsafe?
>>
>> IMO, connections will not leak, because the invalidated connections
>> eventually will get closed in pgfdw_xact_callback at the main txn end.
>>
>> IIRC, when we were finding a way to close the invalidated connections
>> so that they don't leaked, we had two options:
>>
>> 1) let those connections (whether currently being used in the xact or
>> not) get marked invalidated in pgfdw_inval_callback and closed in
>> pgfdw_xact_callback at the main txn end as shown below
>>
>>          if (PQstatus(entry->conn) != CONNECTION_OK ||
>>              PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
>>              entry->changing_xact_state ||
>>              entry->invalidated).   ----> by adding this
>>          {
>>              elog(DEBUG3, "discarding connection %p", entry->conn);
>>              disconnect_pg_server(entry);
>>          }
>>
>> 2) close the unused connections right away in pgfdw_inval_callback
>> instead of marking them invalidated. Mark used connections as
>> invalidated in pgfdw_inval_callback and close them in
>> pgfdw_xact_callback at the main txn end.
>>
>> We went with option (2) because we thought this would ease some burden
>> on pgfdw_xact_callback closing a lot of invalid connections at once.
> 
> Also, see the original patch for the connection leak issue just does
> option (1), see [1]. But in [2] and [3], we chose option (2).
> 
> I feel, we can go for option (1), with the patch attached in [1] i.e.
> having have_invalid_connections whenever any connection gets invalided
> so that we don't quickly exit in pgfdw_xact_callback and the
> invalidated connections get closed properly. Thoughts?

Before going for (1) or something, I'd like to understand what the actual
issue of (2), i.e., the current code is. Otherwise other approaches might
have the same issue.


Regarding (1), as far as I understand correctly, even when the transaction
doesn't use foreign tables at all, it needs to scan the connection cache
entries if necessary. I was thinking to avoid this. I guess that this doesn't
work with at least the postgres_fdw 2PC patch that Sawada-san is proposing
because with the patch the commit/rollback callback is performed only
for the connections used in the transaction.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Next
From: "Joel Jacobson"
Date:
Subject: Re: Add primary keys to system catalogs