Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
Date
Msg-id de2e1392-b1c3-592a-34c0-d0925d5ab91f@oss.nttdata.com
Whole thread Raw
In response to Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
List pgsql-hackers

On 2020/10/02 0:46, Bharath Rupireddy wrote:
> On Thu, Oct 1, 2020 at 8:10 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>> pg_stat_clear_snapshot() can be used to reset the entry.
>>
> 
> Thanks. I wasn't knowing it.
> 
>>
>> +               EXIT WHEN proccnt = 0;
>> +    END LOOP;
>>
>> Isn't it better to sleep here, to avoid th busy loop?
>>
> 
> +1.
> 
>>
>> So what I thought was something like
>>
>> CREATE OR REPLACE PROCEDURE wait_for_backend_termination()
>> LANGUAGE plpgsql
>> AS $$
>> BEGIN
>>       LOOP
>>           PERFORM * FROM pg_stat_activity WHERE application_name = 'fdw_retry_check';
>>           EXIT WHEN NOT FOUND;
>>           PERFORM pg_sleep(1), pg_stat_clear_snapshot();
>>       END LOOP;
>> END;
>> $$;
>>
> 
> Changed.
> 
> Attaching v8 patch, please review it.. Both make check and make
> check-world passes on v8.

Thanks for updating the patch! It basically looks good to me.
I tweaked the patch as follows.

+        if (!entry->conn ||
+            PQstatus(entry->conn) != CONNECTION_BAD ||

With the above change, if entry->conn is NULL, an error is thrown and no new
connection is reestablished. But why? IMO it's more natural to reestablish
new connection in that case. So I removed "!entry->conn" from the above
condition.

+        ereport(DEBUG3,
+                (errmsg("could not start remote transaction on connection %p",
+                 entry->conn)),

I replaced errmsg() with errmsg_internal() because the translation of
this debug message is not necessary.


+SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE
+backend_type = 'client backend' AND application_name = 'fdw_retry_check';
+CALL wait_for_backend_termination();

Since we always use pg_terminate_backend() and wait_for_backend_termination()
together, I merged them into one function.

I simplied the comments on the regression test.

  Attached is the updated version of the patch. If this patch is ok,
  I'd like to mark it as ready for committer.


> I have another question not related to this patch: though we have
> wait_pid() function, we are not able to use it like
> pg_terminate_backend() in other modules, wouldn't be nice if we can
> make it generic under the name pg_wait_pid() and usable across all pg
> modules?

I thought that, too. But I could not come up with good idea for *real* use case
of that function. At least that's useful for the regression test, though.
Anyway, IMO it's worth proposing that and hearing more opinions about that
from other hackers.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment

pgsql-hackers by date:

Previous
From: Kuntal Ghosh
Date:
Subject: Incorrect assumption in heap_prepare_freeze_tuple
Next
From: Andres Freund
Date:
Subject: Re: buildfarm animal shoveler failing with "Illegal instruction"