Thread: Deleting 100 rows which meets certain criteria
I am newbie to postgres/SQL. I want to delete all rows exceeding 400 hours (10 days) and I want to limit deletion of only 100 records at a time. I was trying the following in PostgreSQL: DELETE from mytable WHERE (now() - timestamp_field > INTERVAL '400 hour' ) LIMIT 100; Looks like DELETE syntax doesn't support LIMIT. Is there any other way to achieve this? thanks Shul
On Wed, Dec 30, 2009 at 12:11 PM, shulkae <shulkae@gmail.com> wrote: > I am newbie to postgres/SQL. > > I want to delete all rows exceeding 400 hours (10 days) and I want to > limit deletion of only 100 records at a time. > > I was trying the following in PostgreSQL: > > DELETE from mytable WHERE (now() - timestamp_field > INTERVAL '400 > hour' ) LIMIT 100; > > Looks like DELETE syntax doesn't support LIMIT. > > Is there any other way to achieve this? > > thanks > Shul > You're correct to infer that DELETE does not support LIMIT clauses. The reason for this is there is no way to tell exactly which rows will be deleted (unless ORDER BY is used - but there are still other issues with that as well). You could, however, do something like such: DELETE FROM "table" WHERE "column" IN (SELECT "column" FROM "table" LIMIT 100 OFFSET 0); (But again, you would need explicit ORDER BY clauses to determine exactly which rows are actually deleted.
shulkae <shulkae@gmail.com> wrote: > I am newbie to postgres/SQL. > > I want to delete all rows exceeding 400 hours (10 days) and I want to > limit deletion of only 100 records at a time. > > I was trying the following in PostgreSQL: > > DELETE from mytable WHERE (now() - timestamp_field > INTERVAL '400 > hour' ) LIMIT 100; > > Looks like DELETE syntax doesn't support LIMIT. > > Is there any other way to achieve this? Select all rows with this condition, order by this field descending, limit 100. Now you have all possible rows to delete. Simple example: test=*# select * from shulkae ; i ---- 2 4 1 6 4 1 9 11 13 2 17 15 (12 rows) Time: 0.211 ms test=*# delete from shulkae where i in (select * from shulkae where i > 10 order by i desc limit 2); DELETE 2 Time: 0.542 ms test=*# select * from shulkae ; i ---- 2 4 1 6 4 1 9 11 13 2 (10 rows) Time: 0.200 ms test=*# Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
shulkae wrote: > DELETE from mytable WHERE (now() - timestamp_field > INTERVAL '400 > hour' ) LIMIT 100; > Force of habit (not sure if the optimizer does this trick for you) is first to rewrite this as follows: DELETE from mytable WHERE timestamp_field < (now() - INTERVAL '400 hour' ) LIMIT 100; Just to turn the comparison into a constant being compared with the field. If there's a useful primary key on this table, you can do this to delete: DELETE FROM mytable where pkey IN (SELECT pkey from mytable WHERE timestamp_field < (now() - INTERVAL '400 hour' ) LIMIT 100); If there's not a primary key, you can use a hidden field named ctid to get your record list: http://www.postgresql.org/docs/current/static/ddl-system-columns.html And then use that as the way to communicate the candidate deletion list out of the subselect: DELETE FROM mytable where ctid IN (SELECT ctid from mytable WHERE timestamp_field < (now() - INTERVAL '400 hour' ) LIMIT 100); The main advantage of using the primary key is that the result will be more portable to other databases--the ctid field is very much a PostgreSQL specific hack. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
On Wed, 2009-12-30 at 09:11 -0800, shulkae wrote: > I want to delete all rows exceeding 400 hours (10 days) and I want to > limit deletion of only 100 records at a time. If your table has a primary key or you can contrive one, then the DELETE FROM tbl WHERE pk in (SELECT pk FROM tbl WHERE hours>400) construct sketched by others will work fine. The much harder case is where you have a table that doesn't have a pk. For instance, if you have a table like name | hours -----+------ bob | 400 bob | 20 and you naively do DELETE FROM tbl WHERE name in (SELECT name FROM tbl WHERE hours>400) then you'll delete *all* bob rows, which I suspect is not what you want. In cases without a pk, try something like BEGIN; CREATE TABLE tbl2 AS SELECT * FROM tbl WHERE hours>400; TRUNCATE tbl; INSERT INTO tbl SELECT * FROM tbl2; COMMIT; The effect is to select the rows you want and replace the contents of tbl. The above will work if you have indexes or views that depend on tbl, but there are simpler variations on this theme if that is not a concern. And for a completely different approach, consider creating a view: CREATE VIEW tbl_le_400 AS SELECT * FROM tbl WHERE hours<=400; Then use tbl_le_400 in lieu of tbl for your selects. -Reece
On Wed, Dec 30, 2009 at 1:43 PM, APseudoUtopia <apseudoutopia@gmail.com> wrote:
You're correct to infer that DELETE does not support LIMIT clauses.
The reason for this is there is no way to tell exactly which rows will
be deleted (unless ORDER BY is used - but there are still other issues
with that as well).
You could, however, do something like such:
DELETE FROM "table" WHERE "column" IN (SELECT "column" FROM "table"
LIMIT 100 OFFSET 0);
(But again, you would need explicit ORDER BY clauses to determine
exactly which rows are actually deleted.
Are there any plans to disable that method as well, since it suffers from the same problem?