Re: allow LIMIT in UPDATE and DELETE - Mailing list pgsql-general
From: Csaba Nagy
Subject: Re: allow LIMIT in UPDATE and DELETE
Msg-id: 1148044679.17461.367.camel@coppola.muc.ecircle.de
In response to: Re: allow LIMIT in UPDATE and DELETE ("chris smith" <dmagick@gmail.com>)
List: pgsql-general
> I asked that question a while ago..
>
> http://archives.postgresql.org/pgsql-general/2005-04/msg00386.php
>
> and got this response:
>
> http://archives.postgresql.org/pgsql-general/2005-04/msg00387.php
>
> Works quite well :)

I don't doubt that it works well; in fact that's what I plan to use
myself... I was not asking for a workaround, that much I can figure out
myself :-)

Regarding the arguments in the post you linked: "It isn't in the SQL
standard" - neither is LIMIT on SELECT, yet it is a very useful feature.
"It would have undefined behavior" - so does LIMIT on SELECT without
ORDER BY, and it is still useful when you don't care which rows you get,
you just want some of them. It certainly provides a planning advantage
in cases where ordering would mean a sort.

The subquery delete will definitely have a more involved plan than a
delete with LIMIT would. On some of my tables that would make a
difference, even if not a big one, due to caching effects (the rows
being deleted or updated were already visited by the subquery). I can't
say for sure how big the penalty of the subquery is, but the plans I
have seen look like this:

db=# prepare test_001(bigint, bigint, smallint) as
db-# DELETE FROM big_table
db-# WHERE (col1, col2) IN
db-#     (SELECT col1, col2 FROM big_table
db(#      WHERE col1 = $2
db(#      AND col3 IS NOT NULL
db(#      AND col4 = $3
db(#      AND col5 <> 'o'
db(#      LIMIT 1000);
PREPARE
db=# explain execute test_001(1,1,1);
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2166.10..2282.99 rows=1 width=6)
   ->  HashAggregate  (cost=2166.10..2166.10 rows=29 width=16)
         ->  Subquery Scan "IN_subquery"  (cost=0.00..2165.95 rows=29 width=16)
               ->  Limit  (cost=0.00..2165.66 rows=29 width=16)
                     ->  Index Scan using idx_big_table_col3 on big_table  (cost=0.00..2165.66 rows=29 width=16)
                           Index Cond: (col1 = $2)
                           Filter: ((col3 IS NOT NULL) AND (col4 = $3) AND ("col5" <> 'o'::bpchar))
   ->  Index Scan using pk_big_table on big_table  (cost=0.00..4.02 rows=1 width=22)
         Index Cond: ((big_table.col1 = "outer".col1) AND (big_table.col2 = "outer".col2))
(9 rows)

idx_big_table_col3 is a partial index covering the rows where col3 is
not null, which is ~10% of the table (a sketch of its definition is at
the end of this mail). The row estimates are generic, since the plan is
built for the prepared statement without knowing the actual parameter
values; in the worst case the subselect matches a few hundred thousand
rows before the LIMIT cuts in. So I guess the worst case can cost as
much as double what a plan for DELETE with LIMIT would. With LIMIT
allowed on DELETE, the plan would be something like:

   ->  Limit  (cost=0.00..2165.66 rows=29 width=16)
         ->  Index Scan using idx_big_table_col3 on big_table  (cost=0.00..2165.66 rows=29 width=16)
               Index Cond: (col1 = $2)
               Filter: ((col3 IS NOT NULL) AND (col4 = $3) AND ("col5" <> 'o'::bpchar))

That would spare the HashAggregate and the second index scan. The index
scan would very likely not be a problem, as it visits the same index
entries as the subquery and they are likely cached, and the
HashAggregate should also be fast for the at most 1000 rows it has to
handle, but both are still completely unnecessary for my purpose, so I
still think LIMIT on DELETE and UPDATE would make perfect sense.
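To make the proposal concrete, what I would like to be able to write is
something like this (hypothetical syntax of course; no PostgreSQL
version accepts a LIMIT clause here):

-- Hypothetical syntax, for illustration only - this is what the
-- feature request amounts to; PostgreSQL rejects LIMIT on DELETE.
DELETE FROM big_table
WHERE col1 = $2
  AND col3 IS NOT NULL
  AND col4 = $3
  AND col5 <> 'o'
LIMIT 1000;

That statement could be planned exactly as the shorter plan fragment
above: one limited index scan, nothing else.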
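For reference, the partial index mentioned above is defined along these
lines (a sketch from memory - the indexed column is my assumption based
on the Index Cond in the plan; the WHERE clause is the part that
matters):

-- Partial index: only rows with col3 set are indexed (~10% of the
-- table), which keeps the index small.
CREATE INDEX idx_big_table_col3 ON big_table (col1)
WHERE col3 IS NOT NULL;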
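And in case somebody finds this thread looking for the workaround
itself: you just execute the prepared delete repeatedly, committing in
between so each chunk stays small, until it reports 0 deleted rows.
Something like this (the parameter values are made up for the example):

db=# execute test_001(1, 1, 1);
DELETE 1000
db=# execute test_001(1, 1, 1);
DELETE 1000
db=# execute test_001(1, 1, 1);
DELETE 0

Cheers,

Csaba.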