Re: allow LIMIT in UPDATE and DELETE - Mailing list pgsql-general

From Csaba Nagy
Subject Re: allow LIMIT in UPDATE and DELETE
Date
Msg-id 1148044679.17461.367.camel@coppola.muc.ecircle.de
Whole thread Raw
In response to Re: allow LIMIT in UPDATE and DELETE  ("chris smith" <dmagick@gmail.com>)
Responses Re: allow LIMIT in UPDATE and DELETE
List pgsql-general
> I asked that question a while ago..
>
> http://archives.postgresql.org/pgsql-general/2005-04/msg00386.php
>
> and got this response:
>
> http://archives.postgresql.org/pgsql-general/2005-04/msg00387.php
>
> Works quite well :)

I don't doubt that it works well, in fact that's what I plan to use
myself... I was not asking for a workaround, that I can figure out
myself :-)

Regarding the arguments of the post you linked:

"It isn't in the SQL standard" - neither LIMIT on SELECT, still it is a
very useful feature;

"it would have undefined behavior" - so does LIMIT on SELECT without
order by, and it is again still useful when you don't care about which
entry you get, you just want one of them. It certainly provides a
planning advantage in some cases where ordering would mean a sort;

The subquery delete will definitely have more involved plan than a
delete with limit. On some of my tables that would make a difference,
even if not that big one due to caching effects (the same rows are
deleted/updated which were already visited by the subquery). I can't say
for sure how big is the penalty of doing the subqery, but the plans I
have seen involve something like:

db=# prepare test_001(bigint, bigint, smallint) as
db-# DELETE FROM big_table
db-# WHERE (col1, col2) IN
db-#     (SELECT col1, col2 FROM big_table
db(#      WHERE col1=$2
db(#        AND col3 IS NOT NULL
db(#        AND col4 =$3
db(#        AND col5 <> 'o'
db(#        LIMIT 1000);
PREPARE
db=# explain execute test_001(1,1,1);
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2166.10..2282.99 rows=1 width=6)
   ->  HashAggregate  (cost=2166.10..2166.10 rows=29 width=16)
         ->  Subquery Scan "IN_subquery"  (cost=0.00..2165.95 rows=29
width=16)
               ->  Limit  (cost=0.00..2165.66 rows=29 width=16)
                     ->  Index Scan using idx_big_table_col3 on
big_table  (cost=0.00..2165.66 rows=29 width=16)
                           Index Cond: (col1 = $2)
                           Filter: ((col3 IS NOT NULL) AND (col4 = $3)
AND ("col5" <> 'o'::bpchar))
   ->  Index Scan using pk_big_table on big_table  (cost=0.00..4.02
rows=1 width=22)
         Index Cond: ((big_table.col1 = "outer".col1) AND
(big_table.col2 = "outer".col2))
(9 rows)

idx_big_table_col3 - is a partial index where col3 is not null, which
means ~ 10% of the total rows. The estimates are a generic estimate, the
worst case is that there are a few 100K rows selected by the subselect
before the limit. So I guess the worst case can have as much as double
cost than a plan for DELETE with LIMIT would have.

With the LIMIT allowed on DELETE it would be something like:

               ->  Limit  (cost=0.00..2165.66 rows=29 width=16)
                     ->  Index Scan using idx_big_table_col3 on
big_table  (cost=0.00..2165.66 rows=29 width=16)
                           Index Cond: (col1 = $2)
                           Filter: ((col3 IS NOT NULL) AND (col4 = $3)
AND ("col5" <> 'o'::bpchar))


That would spare a HashAggregate and an index scan. The index scan would
very likely not be a problem, as the same index entries are visited in
the subquery and likely are cached, and the HashAggregate should be also
fast for the max 1000 rows it has to handle, but they are still
completely unnecessary for my purpose, so I still think the LIMIT on
DELETE and UPDATE would make perfect sense.

Cheers,
Csaba.



pgsql-general by date:

Previous
From: "Dawid Kuroczko"
Date:
Subject: Re: allow LIMIT in UPDATE and DELETE
Next
From: "Nik"
Date:
Subject: Insert into partition table hangs