Re: NOTIFY does not work as expected - Mailing list pgsql-bugs

From Jeff Janes
Subject Re: NOTIFY does not work as expected
Date
Msg-id CAMkU=1yxjDmqpFGMN04EXz86XoQUufQfh5zTscwbvdCikPosEA@mail.gmail.com
Whole thread Raw
In response to Re: NOTIFY does not work as expected  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: NOTIFY does not work as expected  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
On Tue, Jul 3, 2018 at 9:42 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Jul 3, 2018 at 12:53 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Jul 3, 2018 at 12:30 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

In my hands, it bisects down to this:

commit 4f85fde8eb860f263384fffdca660e16e77c7f76
Author: Andres Freund <andres@anarazel.de>
Date:   Tue Feb 3 22:25:20 2015 +0100

    Introduce and use infrastructure for interrupt processing during client reads.

But that was committed in 9.5.dev, not between 9.6.2 and 9.6.9.

It is on the server side.  This is testing with psql, and it doesn't seem to matter which version of it.  Maybe there is something between 9.6.2 and 9.6.9 that shows up with another client or more.

Further diagnosis here is that in the "working" case the client receives a single packet from the server containing both the pg_sleep response, and async response, in that order, and the client processes both of them.  In the "broken" case, the client receives a single packet from the server containing the pg_sleep response, and processes it, and then blocks on user input.  The async response is immediately available in the next packet if the client would ask for it, but the client doesn't do so.

It looks like I was wrong here.  The 2nd packet with the async message is not generally sent immediately, the server backend can hold it up until the next time it either hears from the frontend, or until it gets a SIGUSR1 due to another incoming NOTIFY.

But I still see this undesired behavior showing up in 9.5dev in the mentioned commit, not showing up between 9.6.2 and 9.6.9.

Andrey confirmed to me off list that he was mistaken about it working the way he expected in 9.6.2, the issue started earlier.

Reading through the comments touched by the commit, it seems obvious what the bug is.  It says "cause the processing to occur just before we next go idle", but also says "This is called just *after* waiting for a frontend command", which is too late to be "before we next go idle"

I've attached a simple proof-of-concept patch which calls ProcessClientReadInterrupt() before the client read is started so it can process pre-existing interrupt flags, not just when it is interrupted during the read or when the read is done.  (That does seem to render the name of the function inapt).  

I don't know if this is the correct way to fix it, it is just a POC.  It looks like cache invalidation catch-up demands might also be involved in the delay.

With this patch, the async arrives right after the pg_sleep response, but it is still in a second packet.  That means it doesn't fix the apparent regression when viewed with psql, as psql doesn't process the 2nd packet right away.  So I instead tested it with this Perl script:

perl -le 'use DBI; my $dbh=DBI->connect("dbi:Pg:port=9999;host=localhost"); $dbh->do("listen test"); $dbh->do("select pg_sleep(5)"); warn "done with sleep"; while (true) { $r=$dbh->pg_notifies(); warn join "\t", @$r if $r}'
 
Without my patch, notices sent during the pg_sleep will not get delivered until another notice is sent after the sleep is over.  With the they patch, they get delivered as soon as the pg_sleep is done.

Cheers

Jeff
Attachment

pgsql-bugs by date:

Previous
From:
Date:
Subject: RE: BUG #15261: Insert binary data in bytea rejected due to UTF-8encoding errors
Next
From: Michael Monscheuer
Date:
Subject: Year 2024: Extracting week from date gives wrong results for the lasttwo days of the year