Thread: Deleting 100 rows which meets certain criteria

Deleting 100 rows which meets certain criteria

From
shulkae
Date:
I am newbie to postgres/SQL.

I want to delete all rows exceeding 400 hours (10 days) and I want to
limit deletion of only 100 records at a time.

I was trying the following in PostgreSQL:

DELETE from mytable WHERE (now() - timestamp_field  > INTERVAL '400
hour' ) LIMIT 100;

Looks like DELETE syntax doesn't support LIMIT.

Is there any other way to achieve this?

thanks
Shul

Re: Deleting 100 rows which meets certain criteria

From
APseudoUtopia
Date:
On Wed, Dec 30, 2009 at 12:11 PM, shulkae <shulkae@gmail.com> wrote:
> I am newbie to postgres/SQL.
>
> I want to delete all rows exceeding 400 hours (10 days) and I want to
> limit deletion of only 100 records at a time.
>
> I was trying the following in PostgreSQL:
>
> DELETE from mytable WHERE (now() - timestamp_field  > INTERVAL '400
> hour' ) LIMIT 100;
>
> Looks like DELETE syntax doesn't support LIMIT.
>
> Is there any other way to achieve this?
>
> thanks
> Shul
>

You're correct to infer that DELETE does not support LIMIT clauses.
The reason for this is there is no way to tell exactly which rows will
be deleted (unless ORDER BY is used - but there are still other issues
with that as well).

You could, however, do something like such:
DELETE FROM "table" WHERE "column" IN (SELECT "column" FROM "table"
LIMIT 100 OFFSET 0);

(But again, you would need explicit ORDER BY clauses to determine
exactly which rows are actually deleted.

Re: Deleting 100 rows which meets certain criteria

From
Andreas Kretschmer
Date:
shulkae <shulkae@gmail.com> wrote:

> I am newbie to postgres/SQL.
>
> I want to delete all rows exceeding 400 hours (10 days) and I want to
> limit deletion of only 100 records at a time.
>
> I was trying the following in PostgreSQL:
>
> DELETE from mytable WHERE (now() - timestamp_field  > INTERVAL '400
> hour' ) LIMIT 100;
>
> Looks like DELETE syntax doesn't support LIMIT.
>
> Is there any other way to achieve this?

Select all rows with this condition, order by this field descending, limit 100.

Now you have all possible rows to delete. Simple example:

test=*# select * from shulkae ;
 i
----
  2
  4
  1
  6
  4
  1
  9
 11
 13
  2
 17
 15
(12 rows)

Time: 0.211 ms
test=*# delete from shulkae where i in (select * from shulkae where i > 10 order by i desc limit 2);
DELETE 2
Time: 0.542 ms
test=*# select * from shulkae ;
 i
----
  2
  4
  1
  6
  4
  1
  9
 11
 13
  2
(10 rows)

Time: 0.200 ms
test=*#



Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: Deleting 100 rows which meets certain criteria

From
Greg Smith
Date:
shulkae wrote:
> DELETE from mytable WHERE (now() - timestamp_field  > INTERVAL '400
> hour' ) LIMIT 100;
>

Force of habit (not sure if the optimizer does this trick for you) is
first to rewrite this as follows:

DELETE from mytable WHERE timestamp_field <  (now() - INTERVAL '400
hour' ) LIMIT 100;

Just to turn the comparison into a constant being compared with the field.

If there's a useful primary key on this table, you can do this to delete:

DELETE FROM mytable where pkey IN (SELECT pkey from mytable WHERE
timestamp_field <  (now() - INTERVAL '400 hour' ) LIMIT 100);

If there's not a primary key, you can use a hidden field named ctid to
get your record list:

http://www.postgresql.org/docs/current/static/ddl-system-columns.html

And then use that as the way to communicate the candidate deletion list
out of the subselect:

DELETE FROM mytable where ctid IN (SELECT ctid from mytable WHERE
timestamp_field <  (now() - INTERVAL '400 hour' ) LIMIT 100);

The main advantage of using the primary key is that the result will be
more portable to other databases--the ctid field is very much a
PostgreSQL specific hack.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


Re: Deleting 100 rows which meets certain criteria

From
Reece Hart
Date:
On Wed, 2009-12-30 at 09:11 -0800, shulkae wrote:
> I want to delete all rows exceeding 400 hours (10 days) and I want to
> limit deletion of only 100 records at a time.

If your table has a primary key or you can contrive one, then the
DELETE FROM tbl WHERE pk in (SELECT pk FROM tbl WHERE hours>400)
construct sketched by others will work fine.

The much harder case is where you have a table that doesn't have a pk.
For instance, if you have a table like

name | hours
-----+------
bob  | 400
bob  | 20

and you naively do
DELETE FROM tbl WHERE name in (SELECT name FROM tbl WHERE hours>400)
then you'll delete *all* bob rows, which I suspect is not what you want.

In cases without a pk, try something like
BEGIN;
CREATE TABLE tbl2 AS SELECT * FROM tbl WHERE hours>400;
TRUNCATE tbl;
INSERT INTO tbl SELECT * FROM tbl2;
COMMIT;

The effect is to select the rows you want and replace the contents of
tbl.  The above will work if you have indexes or views that depend on
tbl, but there are simpler variations on this theme if that is not a
concern.


And for a completely different approach, consider creating a view:
CREATE VIEW tbl_le_400 AS
SELECT * FROM tbl WHERE hours<=400;

Then use tbl_le_400 in lieu of tbl for your selects.


-Reece



Re: Deleting 100 rows which meets certain criteria

From
Anthony
Date:
On Wed, Dec 30, 2009 at 1:43 PM, APseudoUtopia <apseudoutopia@gmail.com> wrote:
You're correct to infer that DELETE does not support LIMIT clauses.
The reason for this is there is no way to tell exactly which rows will
be deleted (unless ORDER BY is used - but there are still other issues
with that as well).

You could, however, do something like such:
DELETE FROM "table" WHERE "column" IN (SELECT "column" FROM "table"
LIMIT 100 OFFSET 0);

(But again, you would need explicit ORDER BY clauses to determine
exactly which rows are actually deleted.

Are there any plans to disable that method as well, since it suffers from the same problem?