Thread: RE: BUG #14891: Old cancel request presented by pgbouncer honoredafter skipping a query.

Well, taking pgbouncer down to a 60 second connection lifetime was 
insufficient to completely alleviate this. We had another transaction 
get killed yesterday due to a different query being cancelled. I'm still 
unsure how this isn't affecting other people.

At this point, we're going to change our application to just abandon the 
current postgres connection when a user hits cancel instead of actually 
sending the cancel request. Ugly hack, but I'm not sure what else to do 
at this point. I'm still open to suggestions though. Or really, any 
discussion about this issue at all. Would this belong better on a 
different list?

-----

The following bug has been logged on the website:

Bug reference:      14891
Logged by:          Skarsol
Email address:      postgresql(at)skarsol(dot)com
PostgreSQL version: 9.6.3
Operating system:   Linux 4.4.8-hardened-r1 #4 SMP Mon Jun 12
Description:

This might be a symptom of the issue discussed in the ToDo "Changes to 
make
cancellations more reliable and more secure" but as it is related to the
pgbouncer bug I've opened at
https://github.com/pgbouncer/pgbouncer/issues/245 I figured I'd post it 
over
here just to make sure.

As the last step of this bug, pgbouncer 1.7.2 presents a cancel request 
to
postgres 9.6.3. This request targets pid 29330 which is connected to
pgbouncer on port 33024. That pid then accepts a new query, returns a 
result
set, accepts another new query, and then cancels that one out.

Expected behavior would have been for either no cancel (as that pid was
between queries at the time) or to cancel the first query. Cancelling 
the
2nd query is just weird (to me).

I have no idea how much of this is related to whatever pgbouncer is 
doing to
delay the cancel in the first place before presenting it to postgres.

I'm aware that we're 2 minor versions behind, but I don't see anything 
that
seems relevant to this in the changelogs.

Image of the relevant wireshark display at
https://user-images.githubusercontent.com/1915152/32578433-d5d4a71c-c4a2-11e7-9d25-f59d5afbb06b.jpg


If I understand the bug report correctly you have two layers of
pgbouncer and it's only the second layer that is introducing these
latencies? I wonder if that should help point out why these latencies
are happening but I don't have any immediate ideas.

I suppose there are two avenues for addressing this, but they're both
probably large projects, not simple bug fixes....

1) PGBouncer is the one breaking the session <-> tcp stream
correspondence. Perhaps it should learn about cancellation messages
and learn about how to determine whether that backend session is still
allocated to the same incoming session? I'm not sure how that would
happen though since the cancellation request doesn't have any
information about which session is the intended target.

2) The postgres client could include some meta information about what
request the client is intending to cancel. But that has a few gotchas
as well. Clients stream many requests and process the responses as
they arrive. Just because the client has processed up to a certain
point doesn't mean the server hasn't streamed out responses past that.
It's quite common to want to cancel precisely because a stream of
responses are coming.


On 2017-12-19 10:43, Greg Stark wrote:
> If I understand the bug report correctly you have two layers of
> pgbouncer and it's only the second layer that is introducing these
> latencies? I wonder if that should help point out why these latencies
> are happening but I don't have any immediate ideas.

To clarify, we also see the issue in cases where there is only one PGB 
between the client and the database.