Thread: Inexplicable UPDATE...RETURNING behaviour

Inexplicable UPDATE...RETURNING behaviour

From

Joe Wildish

Date:

15 April 2019, 23:07:53

Hello all,

We are seeing an inexplicable behaviour when issuing an "UPDATE..RETURNING" statement. I am unsure if it is a Postgres
bug.Additional eyes-on would be much appreicated. 

When issuing the following statement we are seeing multiple rows UPDATE'd despite the use of LIMIT 1 and despite the
"uid"column in the "some_queue" table having a PRIMARY KEY constraint on it: 

UPDATE queue.some_queue AS q
   SET (state, awaiting) = ('executing', FALSE)
  FROM (SELECT uid
          FROM queue.some_queue
         WHERE awaiting
           AND process_after <= CURRENT_TIMESTAMP AT TIME ZONE 'UTC'
         ORDER BY process_after ASC
           FOR UPDATE SKIP LOCKED
         LIMIT 1)
    AS dq(uid)
 WHERE q.uid = dq.uid
 RETURNING q.uid;

However, when using the following statement, which (AFAIK) is semantically equivalent, we see only a single row being
updated/dequeued:

UPDATE queue.some_queue AS q
   SET (state, awaiting) = ('executing', FALSE)
 WHERE uid = (SELECT uid
                FROM queue.some_queue
               WHERE awaiting
                 AND process_after <= CURRENT_TIMESTAMP AT TIME ZONE 'UTC'
               ORDER BY process_after ASC
                 FOR UPDATE SKIP LOCKED
               LIMIT 1)
 RETURNING uid;

IMO the two statements should yield the same result. But, we see the first one updating multiple rows and therefore
dequeingmultiple uids, yet the second one functions as intended (ie. single item is dequeued). 

We can replicate this locally in tests but I can't explain it. Is this a bug, or am I overlooking something?

Cheers,
-Joe

PG. Postgres 10.6 in production, and the same behaviour with 10.5 + 11.2 in dev.

Re: Inexplicable UPDATE...RETURNING behaviour

From

Tom Lane

Date:

15 April 2019, 23:58:49

Joe Wildish <joe-postgresql.org@elusive.cx> writes:
> We are seeing an inexplicable behaviour when issuing an "UPDATE..RETURNING" statement. I am unsure if it is a
Postgresbug. Additional eyes-on would be much appreicated. 

> When issuing the following statement we are seeing multiple rows UPDATE'd despite the use of LIMIT 1 and despite the
"uid"column in the "some_queue" table having a PRIMARY KEY constraint on it: 

> UPDATE queue.some_queue AS q
>    SET (state, awaiting) = ('executing', FALSE)
>   FROM (SELECT uid
>           FROM queue.some_queue
>          WHERE awaiting
>            AND process_after <= CURRENT_TIMESTAMP AT TIME ZONE 'UTC'
>          ORDER BY process_after ASC
>            FOR UPDATE SKIP LOCKED
>          LIMIT 1)
>     AS dq(uid)
>  WHERE q.uid = dq.uid
>  RETURNING q.uid;

Yeah, there was another similar complaint a few weeks ago --- has this
suddenly gotten to be a popular coding idea?

The basic problem with what you have here is that FOR UPDATE (especially
with SKIP LOCKED) makes the sub-select's output unstable by definition.
If it's executed more than once then you might get different rows back,
allowing the outer UPDATE's join to potentially match multiple rows from
the outer instance of queue.some_queue.  Typically, since it's LIMIT 1,
I'd think that the planner would put dq on the outside of a nestloop plan
and you'd escape seeing any problem --- but if it gets put on the inside
of a nestloop, it's going to misbehave.

There are (at least) two different ways that the sub-select's output
might change when re-executed, even though it's still using the same
snapshot as before:

1. If some other transaction releases a row lock in between, SKIP LOCKED
might not skip that row any more.

2. The row returned the first time will absolutely not get chosen the
second time, given this particular query formulation, because its latest
updated version will have awaiting = false thanks to the action of the
outer UPDATE, so it'll fail the inner WHERE test.

The way I'd recommend fixing it is to put the FOR UPDATE into a WITH
to guarantee single execution:

WITH dq(uid) AS (SELECT uid ... LIMIT 1)
UPDATE queue.some_queue q SET ...
FROM dq
WHERE q.uid = dq.uid
RETURNING q.uid;

            regards, tom lane

Re: Inexplicable UPDATE...RETURNING behaviour

From

Joe Wildish

Date:

26 April 2019, 12:25:54

Hi Tom,

> On 16 Apr 2019, at 00:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Joe Wildish <joe-postgresql.org@elusive.cx> writes:
>> We are seeing an inexplicable behaviour when issuing an "UPDATE..RETURNING" statement. I am unsure if it is a
Postgresbug. Additional eyes-on would be much appreicated. 
>
>> When issuing the following statement we are seeing multiple rows UPDATE'd despite the use of LIMIT 1 and despite the
"uid"column in the "some_queue" table having a PRIMARY KEY constraint on it: 
>
>> UPDATE queue.some_queue AS q
>>   SET (state, awaiting) = ('executing', FALSE)
>>  FROM (SELECT uid
>>          FROM queue.some_queue
>>         WHERE awaiting
>>           AND process_after <= CURRENT_TIMESTAMP AT TIME ZONE 'UTC'
>>         ORDER BY process_after ASC
>>           FOR UPDATE SKIP LOCKED
>>         LIMIT 1)
>>    AS dq(uid)
>> WHERE q.uid = dq.uid
>> RETURNING q.uid;
>
> Yeah, there was another similar complaint a few weeks ago --- has this
> suddenly gotten to be a popular coding idea?

I can't comment on that specifically. However, in my case, the reason this came up was simply that the original code
wasissuing a SELECT to grab a UID and then immediately issuing an UPDATE against the same UID. This isn't necessarily a
problem,of course, given the semantics of FOR UPDATE SKIP LOCKED LIMIT 1. It just seemed neater to have a single
statementhandle the dequeue operation. An added bonus was that tracking statements via pg_stat_activity for
queue-relatedoperations became easier to comprehend, as the dequeue was no longer split between two statements. 

> The basic problem with what you have here is that FOR UPDATE (especially
> with SKIP LOCKED) makes the sub-select's output unstable by definition.
> If it's executed more than once then you might get different rows back,
> allowing the outer UPDATE's join to potentially match multiple rows from
> the outer instance of queue.some_queue.  Typically, since it's LIMIT 1,
> I'd think that the planner would put dq on the outside of a nestloop plan
> and you'd escape seeing any problem --- but if it gets put on the inside
> of a nestloop, it's going to misbehave.
> ... 8< ...

Unfortunately I can't remember exactly what the plans were doing exactly --- we did take a look at EXPLAIN to see if we
couldfigure out an explanation for the behaviour, but I foolishly didn't keep a copy of the outputs. 

> The way I'd recommend fixing it is to put the FOR UPDATE into a WITH
> to guarantee single execution:
>
> WITH dq(uid) AS (SELECT uid ... LIMIT 1)
> UPDATE queue.some_queue q SET ...
> FROM dq
> WHERE q.uid = dq.uid
> RETURNING q.uid;

Thanks. We are actually now running a version of the code now that does "UPDATE .. WHERE q.uid = (SELECT .. LIMIT 1)
RETURNINGq.uid" and the multiple dequeues have gone away. The subselect therefore appears to only being executed once
inthis scenario too. 

Ironically my original version of this code used the WITH construct. I switched to using the subselect instead, purely
tomake it easier to write a query over pg_stat_activity that classified each statement on the basis of if they were
INSERTing,UPDATEing or DELETEing from the queues :-) 

Thanks for you help.

Cheers,
-Joe