On Fri, Oct 5, 2012 at 10:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote:
>>> There's no guarantee that the planner won't re-sort the rows coming from
>>> the sub-select, unfortunately.
>
>> More often than not you can prevent the planner from doing that by putting a
>> OFFSET 0 in the query. Not 100% but better than nothing.
>
> No, that will accomplish exactly nothing. The ORDER BY is already an
> optimization fence. The problem is that of the several ways the planner
> might choose to join the subquery output to the original table, not all
> will produce the join rows in the same order as the subquery's result
> is. For instance, when I tried his example I initially got
>
> Delete on test (cost=400.88..692.85 rows=18818 width=34)
> -> Merge Join (cost=400.88..692.85 rows=18818 width=34)
> Merge Cond: (test.g = x.g)
> -> Sort (cost=135.34..140.19 rows=1940 width=10)
> Sort Key: test.g
> -> Seq Scan on test (cost=0.00..29.40 rows=1940 width=10)
> -> Sort (cost=265.53..270.38 rows=1940 width=32)
> Sort Key: x.g
> -> Subquery Scan on x (cost=135.34..159.59 rows=1940 width=32)
> -> Sort (cost=135.34..140.19 rows=1940 width=10)
> Sort Key: test_1.ctid
> -> Seq Scan on test test_1 (cost=0.00..29.40 rows=1940 width=10)
>
> which is going to do the deletes in "g" order, not ctid order;
> and then after an ANALYZE I got
>
> Delete on test (cost=90.83..120.58 rows=1000 width=34)
> -> Hash Join (cost=90.83..120.58 rows=1000 width=34)
> Hash Cond: (test.g = x.g)
> -> Seq Scan on test (cost=0.00..16.00 rows=1000 width=10)
> -> Hash (cost=78.33..78.33 rows=1000 width=32)
> -> Subquery Scan on x (cost=65.83..78.33 rows=1000 width=32)
> -> Sort (cost=65.83..68.33 rows=1000 width=10)
> Sort Key: test_1.ctid
> -> Seq Scan on test test_1 (cost=0.00..16.00 rows=1000 width=10)
>
> which is going to do the deletes in ctid order, but that's an artifact
> of using a seqscan on the test table; the order of the subquery's output
> is irrelevant, since it got hashed.
>
>> We really need ORDER BY for DML.
>
> Meh. That's outside the SQL standard (not only outside the letter of
> the standard, but foreign to its very conceptual model) and I don't
> think the problem really comes up that often. Personally, if I had to
> deal with this I'd use a plpgsql function (or DO command) that does
Can't it be forced like this (assuming it is in fact a vanilla order
by problem)?
EXPLAIN DELETE FROM test USING (SELECT g FROM test ORDER BY
ctid FOR UPDATE) x where x.g = test.g;
(emphasis on 'for update')
merlin