Re: Query cancel seems to be broken in master since Oct 17 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Query cancel seems to be broken in master since Oct 17
Date
Msg-id 21605.1476799419@sss.pgh.pa.us
In response to Re: Query cancel seems to be broken in master since Oct 17  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Query cancel seems to be broken in master since Oct 17  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Query cancel seems to be broken in master since Oct 17  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 10/18/2016 04:13 PM, Tom Lane wrote:
>> There's a smoking gun in the postmaster log:
>> 2016-10-18 09:10:34.547 EDT [18502] LOG:  wrong key in cancel request for process 18491

> Ok, I've reverted that commit for now. It clearly needs more thought,
> because of this, and the pademelon failure discussed on the other thread.

I think that was an overreaction.  The problem is pretty obvious after
adding some instrumentation:

2016-10-18 09:57:47.508 EDT [21229] LOG:  wrong key (0x7B7E4D5E, expected 0xF0F804017B7E4D5E) in cancel request for process 21228

To wit, the various cancel_key backend variables are declared as "long",
and the new code
    if (!pg_strong_random(&MyCancelKey, sizeof(MyCancelKey)))

is therefore computing an 8-byte random value on 64-bit-long machines.
But only 4 bytes go to the client and come back.
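
To make the mismatch concrete, here is a minimal standalone sketch using the two values from the log line above. It is not the backend code: wide_key_t, key_sent_to_client and cancel_key_matches are hypothetical names, and uint64_t stands in for "long" on a 64-bit-long machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cancel key stored in a 64-bit "long". */
typedef uint64_t wide_key_t;

/* Model of the wire format: only the low 32 bits reach the client. */
static uint32_t
key_sent_to_client(wide_key_t stored)
{
    return (uint32_t) stored;
}

/*
 * Model of the check on the way back: the full stored value is compared
 * with the 4 bytes the client returned, widened to 64 bits.
 */
static bool
cancel_key_matches(wide_key_t stored, uint32_t received)
{
    return stored == (wide_key_t) received;
}
```

With the stored value 0xF0F804017B7E4D5E, the client gets 0x7B7E4D5E, and the comparison fails because the upper 32 bits of the stored value are nonzero. It only succeeds when those upper bits happen to be zero.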

The cleanest fix might be to change those various "long" variables
to uint32.  You'd have to think about how to handle the ntohl/htonl
calls that are used on them, though.
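
A rough sketch of what the uint32 variant could look like, assuming a POSIX system for htonl/ntohl. The helper names encode_cancel_key, decode_cancel_key and key_survives_round_trip are made up for illustration and are not existing backend functions.

```c
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: convert a 32-bit key to network byte order. */
static uint32_t
encode_cancel_key(uint32_t key)
{
    return htonl(key);
}

/* Hypothetical helper: convert a received key back to host byte order. */
static uint32_t
decode_cancel_key(uint32_t wire)
{
    return ntohl(wire);
}

/*
 * With the key held in exactly 32 bits, everything that is stored is
 * also sent, so the round trip returns the original value.
 */
static bool
key_survives_round_trip(uint32_t key)
{
    return decode_cancel_key(encode_cancel_key(key)) == key;
}
```

Since htonl and ntohl already take and return uint32_t, a key of that width needs no casts at the call sites, and sizeof() on the variable is 4 regardless of how wide "long" is.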
        regards, tom lane


