Re: walreceiver is uninterruptible on win32 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: walreceiver is uninterruptible on win32
Date
Msg-id z2g603c8f071004150726rf33e1c27vf48503eaf9630a6d@mail.gmail.com
Whole thread Raw
In response to Re: walreceiver is uninterruptible on win32  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: walreceiver is uninterruptible on win32  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Wed, Apr 14, 2010 at 11:13 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Apr 14, 2010 at 11:15 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>> http://archives.postgresql.org/pgsql-hackers/2010-04/msg00077.php
>>>> As for the code itself, don't we need a CHECK_FOR_INTERRUPTS in there
>>>> for it to be actually useful?
>>>
>>> Since the backend version of select() receives any incoming signals
>>> on Win32, CHECK_FOR_INTERRUPTS seems not to be needed in the loop,
>>> at least in walreceiver. No? The patch doesn't put it in there, and
>>> I was able to interrupt walreceiver executing libpqrcv_PQexec() when
>>> I tested the patch on Win32.
>>
>> It will call the signal handler, yes. Normally, the signal handler
>> just sets a flag somewhere, which needs to be checked with
>> CHECK_FOR_INTERRUPTS.
>>
>> From how I read the walreceiver.c code (which I'm not that familiar
>> with), the signal handlers call ProcessWalRcvInterrupts() which in
>> turn has CHECK_FOR_INTERRUPTS in it, and this is where it ends up
>> being called.
>
> Yes. While establishing replication connection (i.e., executing
> walrcv_connect function), the SIGTERM signal handler directly calls
> ProcessWalRcvInterrupts() which does CHECK_FOR_INTERRUPTS() and
> elog(FATAL).
>
>>>> The patch also seems confused about whether it's intending to be a
>>>> general-purpose solution or not.  You can maybe take some shortcuts
>>>> if it's only going to be for walreceiver, but if you're going to put
>>>> it in dblink it is *not* acceptable to take shortcuts like not
>>>> processing errors completely.
>>>
>>> The patch still takes some shortcuts since we decided to postpone
>>> the fix for dblink to 9.1 or later.
>>
>> Given those shortcuts, can't we simplify it even further like
>> attached?
>
> No, we need to repeat PQgetResult() at least two times. The first call
> of it reads the RowDescription, DataRow and CommandComplete messages
> and switches the state to PGASYNC_READY. The second one reads the
> ReadyForQuery message and switches the state to PGASYNC_IDLE. So if we
> don't repeat it, libpqrcv_PQexec() would end in a half-finished state.
>
>> (If nothing else, your code did PQclear() on an
>> uninitialized pointer - must've been pure luck that it worked)
>
> PQclear(NULL) might be called in my patch, but this is not a problem
> since PQclear() does nothing if the specified PGresult argument is NULL.
>
>> Looking at the call-sites, there are bugs now - if PQexec() returns
>> NULL, we don't deal with it. It also doesn't always free the result
>> properly. I've added checks for that.
>
> As Tom pointed out in another post, we don't need to treat the
> "result is NULL" case as special. OTOH, as you pointed out, I
> forgot calling PQclear() when the second call of libpqrcv_PQexec()
> in libpqrcv_connect() fails. I added it to the patch. Thanks!
>
>> Finally, I've updated some of the comments.
>
> Thanks a lot! I applied that improvements to the patch.
>
> I attached the revised patch.

I have to admit to finding this confusing.  According to the comments:

+         /*
+          * Don't emulate the PQexec()'s behavior of returning the last
+          * result when there are many, since walreceiver never sends a
+          * query returning multiple results.
+          */

...but it looks like the code actually is implementing some sort of
loop-that-returns-the-last result.

...Robert


pgsql-hackers by date:

Previous
From: "Erik Rijkers"
Date:
Subject: Re: testing HS/SR - invalid magic number
Next
From: Alvaro Herrera
Date:
Subject: Re: Streaming replication and a disk full in primary