Thread: Shouldn't postgres_fdw report warning when it gives up getting result from foreign server?

Hi,

postgres_fdw reports no log message when it sends "ABORT TRANSACTION" etc
and gives up getting a reply from a foreign server because of timeout or
connection trouble. This makes the troubleshooting a bit harder when
using postgres_fdw.

So how about making postgres_fdw report a warning in that case?
Specifically I'm thinking to change pgfdw_get_cleanup_result()
in postgres_fdw/connection.c so that it reports a warning in case of
a timeout or connection failure (error of PQconsumeInput()).

BTW, pgfdw_get_cleanup_result() does almost the same things as
what pgfdw_get_result() does. So it might be good idea to refactor
those function to reduce the code duplication.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Fri, Nov 19, 2021 at 3:31 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> Hi,
>
> postgres_fdw reports no log message when it sends "ABORT TRANSACTION" etc
> and gives up getting a reply from a foreign server because of timeout or
> connection trouble. This makes the troubleshooting a bit harder when
> using postgres_fdw.
>
> So how about making postgres_fdw report a warning in that case?
> Specifically I'm thinking to change pgfdw_get_cleanup_result()
> in postgres_fdw/connection.c so that it reports a warning in case of
> a timeout or connection failure (error of PQconsumeInput()).

How about adding the warning message in pgfdw_abort_cleanup instead of
pgfdw_get_cleanup_result?

Just before this in pgfdw_abort_cleanup seems better to me.

    /*
     * If a command has been submitted to the remote server by using an
     * asynchronous execution function, the command might not have yet
     * completed.  Check to see if a command is still being processed by the
     * remote server, and if so, request cancellation of the command.
     */
    if (PQtransactionStatus(entry->conn) == PQTRANS_ACTIVE &&
        !pgfdw_cancel_query(entry->conn))
        return;                 /* Unable to cancel running query */

    if (!pgfdw_exec_cleanup_query(entry->conn, sql, false))
        return;

> BTW, pgfdw_get_cleanup_result() does almost the same things as
> what pgfdw_get_result() does. So it might be good idea to refactor
> those function to reduce the code duplication.

Yeah, this seems to be an opportunity. But, the function should deal
with the timeout separately, I'm concerned that the function will
eventually be having if (timeout_param_specified) {  } else { } sort
of code. We can see how much duplicate code we save here vs the
readability or complexity that comes with the single function.

Regards,
Bharath Rupireddy.




On 2021/11/19 22:13, Bharath Rupireddy wrote:
> How about adding the warning message in pgfdw_abort_cleanup instead of
> pgfdw_get_cleanup_result?
> 
> Just before this in pgfdw_abort_cleanup seems better to me.

I was thinking pgfdw_get_cleanup_result() is better because it can
easily report different warning messages based on cases of a timeout
or connection failure, respectively. Since pgfdw_get_cleanup_result()
returns true in both those cases, ISTM that it's not easy to
distinguish them in pgfdw_abort_cleanup().

Anyway, attached is the patch (pgfdw_get_cleanup_result_v1.patch)
that makes pgfdw_get_cleanup_result() report a warning message.


> Yeah, this seems to be an opportunity. But, the function should deal
> with the timeout separately, I'm concerned that the function will
> eventually be having if (timeout_param_specified) {  } else { } sort
> of code. We can see how much duplicate code we save here vs the
> readability or complexity that comes with the single function.

Please see the attached patch (refactor_pgfdw_get_result_v1.patch).
This is still WIP, but you can check how much the refactoring can
simplify the code.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment
On Fri, Nov 19, 2021 at 9:14 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> On 2021/11/19 22:13, Bharath Rupireddy wrote:
> > How about adding the warning message in pgfdw_abort_cleanup instead of
> > pgfdw_get_cleanup_result?
> >
> > Just before this in pgfdw_abort_cleanup seems better to me.
>
> I was thinking pgfdw_get_cleanup_result() is better because it can
> easily report different warning messages based on cases of a timeout
> or connection failure, respectively. Since pgfdw_get_cleanup_result()
> returns true in both those cases, ISTM that it's not easy to
> distinguish them in pgfdw_abort_cleanup().
>
> Anyway, attached is the patch (pgfdw_get_cleanup_result_v1.patch)
> that makes pgfdw_get_cleanup_result() report a warning message.

It reports "remote SQL command: (cancel request)" which isn't a sql
query, but it looks okay to me as we report (cancel request). The
pgfdw_get_cleanup_result_v1 patch LGTM.

> > Yeah, this seems to be an opportunity. But, the function should deal
> > with the timeout separately, I'm concerned that the function will
> > eventually be having if (timeout_param_specified) {  } else { } sort
> > of code. We can see how much duplicate code we save here vs the
> > readability or complexity that comes with the single function.
>
> Please see the attached patch (refactor_pgfdw_get_result_v1.patch).
> This is still WIP, but you can check how much the refactoring can
> simplify the code.

I think we can discuss this refactoring patch separately. Thoughts?

Regards,
Bharath Rupireddy.




On 2021/11/20 1:38, Bharath Rupireddy wrote:
> It reports "remote SQL command: (cancel request)" which isn't a sql
> query, but it looks okay to me as we report (cancel request). The
> pgfdw_get_cleanup_result_v1 patch LGTM.

BTW, we can hide the message "remote SQL command: .." in cancel request case,
but which would make the debug and troubleshooting harder. So I decided to
use the string "(cancel request)" as SQL command string. Probably what string
should be used as SQL command might be debatable.


> I think we can discuss this refactoring patch separately. Thoughts?

Yes! I will consider again if it's worth doing the refactoring,
if yes, I will start new thread for the discussion for that.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Mon, Nov 22, 2021 at 8:25 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2021/11/20 1:38, Bharath Rupireddy wrote:
> > It reports "remote SQL command: (cancel request)" which isn't a sql
> > query, but it looks okay to me as we report (cancel request). The
> > pgfdw_get_cleanup_result_v1 patch LGTM.
>
> BTW, we can hide the message "remote SQL command: .." in cancel request case,
> but which would make the debug and troubleshooting harder.

Yeah, let's not hide the message.

> So I decided to
> use the string "(cancel request)" as SQL command string. Probably what string
> should be used as SQL command might be debatable.

For a cancel request maybe we can just say without te errcontext:
                    ereport(WARNING,
                            (errmsg("could not get result of cancel
request due to timeout")));

See the below existing message using "cancel request":
                     errmsg("could not send cancel request: %s",

For SQL command we can say:
                    ereport(WARNING,
                            (errmsg("could not get query result due to
timeout"),
                             query ? errcontext("remote SQL command:
%s", query) : 0));

Regards,
Bharath Rupireddy.




On 2021/11/22 14:16, Bharath Rupireddy wrote:
>> BTW, we can hide the message "remote SQL command: .." in cancel request case,
>> but which would make the debug and troubleshooting harder.
> 
> Yeah, let's not hide the message.

Yes!


> For a cancel request maybe we can just say without te errcontext:
>                      ereport(WARNING,
>                              (errmsg("could not get result of cancel
> request due to timeout")));
> 
> See the below existing message using "cancel request":
>                       errmsg("could not send cancel request: %s",
> 
> For SQL command we can say:
>                      ereport(WARNING,
>                              (errmsg("could not get query result due to
> timeout"),
>                               query ? errcontext("remote SQL command:
> %s", query) : 0));

I wonder how pgfdw_get_cleanup_result() can determine which
log message to report. Probably we can add new boolean argument
to pgfdw_get_cleanup_result() so that it should be set to true
for cancel request case, but false for query case. Then
pgfdw_get_cleanup_result() can decide wihch message to log
based on that argument. But it seems not good design to me.
Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



On Fri, Dec 3, 2021 at 2:26 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> > For a cancel request maybe we can just say without te errcontext:
> >                      ereport(WARNING,
> >                              (errmsg("could not get result of cancel
> > request due to timeout")));
> >
> > See the below existing message using "cancel request":
> >                       errmsg("could not send cancel request: %s",
> >
> > For SQL command we can say:
> >                      ereport(WARNING,
> >                              (errmsg("could not get query result due to
> > timeout"),
> >                               query ? errcontext("remote SQL command:
> > %s", query) : 0));
>
> I wonder how pgfdw_get_cleanup_result() can determine which
> log message to report. Probably we can add new boolean argument
> to pgfdw_get_cleanup_result() so that it should be set to true
> for cancel request case, but false for query case. Then
> pgfdw_get_cleanup_result() can decide wihch message to log
> based on that argument. But it seems not good design to me.
> Thought?

Let's not use the boolean just for the cancel request which isn't
scalable IMO. Maybe a macro/enum?

Otherwise, we could just do, although it doesn't look elegant:

if (pgfdw_get_cleanup_result(conn, endtime, &result, "(cancel request)"))

if (strcmp(query, "(cancel request)") == 0)
    WARNING without  "remote SQL command:
else
    WARNING with "remote SQL command:

Regards,
Bharath Rupireddy.




On 2021/12/03 23:04, Bharath Rupireddy wrote:
> Let's not use the boolean just for the cancel request which isn't
> scalable IMO. Maybe a macro/enum?
> 
> Otherwise, we could just do, although it doesn't look elegant:
> 
> if (pgfdw_get_cleanup_result(conn, endtime, &result, "(cancel request)"))
> 
> if (strcmp(query, "(cancel request)") == 0)
>      WARNING without  "remote SQL command:
> else
>      WARNING with "remote SQL command:

Yeah, I agree that's not elegant..

So I'd like to propose new patch with different design from
what I proposed before. Patch attached.

This patch changes pgfdw_exec_cleanup_query() so that it tells
its callers the information about whether the timeout expired
or not. Then the callers (pgfdw_exec_cleanup_query and
pgfdw_cancel_query) report the warning messages based on
the results from pgfdw_exec_cleanup_query().

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment
On Mon, Dec 6, 2021 at 1:47 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Yeah, I agree that's not elegant..
>
> So I'd like to propose new patch with different design from
> what I proposed before. Patch attached.
>
> This patch changes pgfdw_exec_cleanup_query() so that it tells
> its callers the information about whether the timeout expired
> or not. Then the callers (pgfdw_exec_cleanup_query and
> pgfdw_cancel_query) report the warning messages based on
> the results from pgfdw_exec_cleanup_query().

+1 for adding a new timed_out param to pgfdw_get_cleanup_result.
pgfdw_get_cleanup_result_v2 patch looks good to me.

Regards,
Bharath Rupireddy.




On 2021/12/06 20:50, Bharath Rupireddy wrote:
> On Mon, Dec 6, 2021 at 1:47 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>> Yeah, I agree that's not elegant..
>>
>> So I'd like to propose new patch with different design from
>> what I proposed before. Patch attached.
>>
>> This patch changes pgfdw_exec_cleanup_query() so that it tells
>> its callers the information about whether the timeout expired
>> or not. Then the callers (pgfdw_exec_cleanup_query and
>> pgfdw_cancel_query) report the warning messages based on
>> the results from pgfdw_exec_cleanup_query().
> 
> +1 for adding a new timed_out param to pgfdw_get_cleanup_result.
> pgfdw_get_cleanup_result_v2 patch looks good to me.

Thanks for the review! I pushed the patch.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION