Thread: RE: BUG #14891: Old cancel request presented by pgbouncer honoredafter skipping a query.
RE: BUG #14891: Old cancel request presented by pgbouncer honoredafter skipping a query.
From
Skarsol
Date:
Well, taking pgbouncer down to a 60 second connection lifetime was insufficient to completely alleviate this. We had another transaction get killed yesterday due to a different query being cancelled. I'm still unsure how this isn't affecting other people. At this point, we're going to change our application to just abandon the current postgres connection when a user hits cancel instead of actually sending the cancel request. Ugly hack, but I'm not sure what else to do at this point. I'm still open to suggestions though. Or really, any discussion about this issue at all. Would this belong better on a different list? ----- The following bug has been logged on the website: Bug reference: 14891 Logged by: Skarsol Email address: postgresql(at)skarsol(dot)com PostgreSQL version: 9.6.3 Operating system: Linux 4.4.8-hardened-r1 #4 SMP Mon Jun 12 Description: This might be a symptom of the issue discussed in the ToDo "Changes to make cancellations more reliable and more secure" but as it is related to the pgbouncer bug I've opened at https://github.com/pgbouncer/pgbouncer/issues/245 I figured I'd post it over here just to make sure. As the last step of this bug, pgbouncer 1.7.2 presents a cancel request to postgres 9.6.3. This request targets pid 29330 which is connected to pgbouncer on port 33024. That pid then accepts a new query, returns a result set, accepts another new query, and then cancels that one out. Expected behavior would have been for either no cancel (as that pid was between queries at the time) or to cancel the first query. Cancelling the 2nd query is just weird (to me). I have no idea how much of this is related to whatever pgbouncer is doing to delay the cancel in the first place before presenting it to postgres. I'm aware that we're 2 minor versions behind, but I don't see anything that seems relevant to this in the changelogs. Image of the relevant wireshark display at https://user-images.githubusercontent.com/1915152/32578433-d5d4a71c-c4a2-11e7-9d25-f59d5afbb06b.jpg
Re: BUG #14891: Old cancel request presented by pgbouncer honoredafter skipping a query.
From
Greg Stark
Date:
If I understand the bug report correctly you have two layers of pgbouncer and it's only the second layer that is introducing these latencies? I wonder if that should help point out why these latencies are happening but I don't have any immediate ideas. I suppose there are two avenues for addressing this, but they're both probably large projects, not simple bug fixes.... 1) PGBouncer is the one breaking the session <-> tcp stream correspondence. Perhaps it should learn about cancellation messages and learn about how to determine whether that backend session is still allocated to the same incoming session? I'm not sure how that would happen though since the cancellation request doesn't have any information about which session is the intended target. 2) The postgres client could include some meta information about what request the client is intending to cancel. But that has a few gotchas as well. Clients stream many requests and process the responses as they arrive. Just because the client has processed up to a certain point doesn't mean the server hasn't streamed out responses past that. It's quite common to want to cancel precisely because a stream of responses are coming.
Re: BUG #14891: Old cancel request presented by pgbouncer honoredafter skipping a query.
From
Skarsol
Date:
On 2017-12-19 10:43, Greg Stark wrote: > If I understand the bug report correctly you have two layers of > pgbouncer and it's only the second layer that is introducing these > latencies? I wonder if that should help point out why these latencies > are happening but I don't have any immediate ideas. To clarify, we also see the issue in cases where there is only one PGB between the client and the database.