Re: delete statement returning too many results - Mailing list pgsql-general

From Tom Lane
Subject Re: delete statement returning too many results
Date
Msg-id 1011786.1669743348@sss.pgh.pa.us
Whole thread Raw
In response to Re: delete statement returning too many results  (Harmen <harmen@lijzij.de>)
Responses Re: delete statement returning too many results  (Arlo Louis O'Keeffe <gnomelver@k5d.de>)
List pgsql-general
Harmen <harmen@lijzij.de> writes:
> On Mon, Nov 28, 2022 at 12:11:53PM -0500, Tom Lane wrote:
>> So basically it's unsafe to run the sub-select more than once,
>> but the query as written leaves it up to the planner whether
>> to do that.  I'd suggest rephrasing as [...]

> I'm not the original poster, but I do use similar constructions for simple
> postgres queues. I've been trying for a while, but I don't understand where the
> extra rows come from, or what's "silent" about SKIP LOCKED.

Sorry, I should not have blamed SKIP LOCKED in particular; this
construction will misbehave with or without that.  The issue is with
using SELECT FOR UPDATE inside a DELETE or UPDATE that then modifies
the row that the subquery returned.  The next execution of the subquery
will, or should, return a different row: either some not-deleted row,
or the modified row.  So in this context, the result of the subquery
is volatile.  The point of putting it in a MATERIALIZED CTE is to
lock the result down regardless of that.

> Because we get different results depending on the plan postgres picks, I can
> see two options: either the query is broken, or postgres is broken.

You can argue that the planner should treat volatile subqueries
differently than it does today.  But the only reasonable way of
tightening the semantics would be to force re-execution of such a
subquery every time, even when it's not visibly dependent on the
outer query.  That would be pretty bad for performance, and I doubt
it would make the OP happy in this example, because what it would
mean is that his query "fails" every time not just sometimes.
(Because of that, I don't have too much trouble concluding that
the query is broken, whether or not you feel that postgres is
also broken.)

The bigger picture here is that we long ago decided that the planner
should not inquire too closely into the volatility of subqueries,
primarily because there are use-cases where people intentionally rely
on them not to be re-executed.  As an example, these queries give
different results:

regression=# select random() from generate_series(1,3);
       random
---------------------
  0.7637195395988317
 0.09569374432524946
   0.490132093120365
(3 rows)

regression=# select (select random()) from generate_series(1,3);
       random
--------------------
 0.9730230633436501
 0.9730230633436501
 0.9730230633436501
(3 rows)

In the second case, the sub-select is deemed to be independent
of the outer query and executed only once.  You can argue that
if that's what you want you should be forced to put the sub-select
in a materialized CTE to make that plain.  But we felt that that
would make many more people unhappy than happy, so we haven't
done it.  Maybe the question could be revisited once all PG
versions lacking the MATERIALIZED syntax are long dead.

            regards, tom lane



pgsql-general by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Re: Seeking the correct term of art for the (unique) role that is usually called "postgres"—and the mental model that underlies it all
Next
From: Young Seung Andrew Ko
Date:
Subject: PostgreSQL extension for processing Graph queries (Apache AGE)