Thread: how to delete many rows from a huge table?

From: mARK bLOORE
I have a table of about 1 billion rows, and I want to delete about 5
million rows whose keys are listed in another table.
The big table is

             Table "public.backlinks"
     Column      | Type  | Modifiers | Description
-----------------+-------+-----------+-------------
 key             | bytea | not null  |
 backlink        | text  | not null  |
 backlink_hash   | bytea | not null  |
 url             | text  | not null  |
 time_downloaded | date  | not null  |
Triggers:
    insert_backlinks_trigger BEFORE INSERT ON backlinks FOR EACH ROW
EXECUTE PROCEDURE backlinks_insert_trigger()
Has OIDs: no

It is divided into 64 partitions, like

            Table "public.backlinks_0"
     Column      | Type  | Modifiers | Description
-----------------+-------+-----------+-------------
 key             | bytea | not null  |
 backlink        | text  | not null  |
 backlink_hash   | bytea | not null  |
 url             | text  | not null  |
 time_downloaded | date  | not null  |
Indexes:
    "backlinks_0_pkey" PRIMARY KEY, btree (key, backlink_hash),
tablespace "fastdisk"
Check constraints:
    "backlinks_0_key_check" CHECK (partition(key) = 0)
Inherits: backlinks
Has OIDs: no

The partitions are very evenly filled.  The partition function just
takes the first byte of the key modulo 64.
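The partition() function itself is not shown in the thread; given the
check constraint above, one plausible definition (a sketch, using
PostgreSQL's built-in get_byte()) would be:

```sql
-- Hypothetical sketch: the actual partition() definition is not shown
-- in the thread.  "First byte of the key modulo 64", using the
-- built-in get_byte(); IMMUTABLE so it can be used in a CHECK constraint.
CREATE OR REPLACE FUNCTION partition(k bytea) RETURNS integer AS $$
    SELECT get_byte(k, 0) % 64;
$$ LANGUAGE sql IMMUTABLE;
```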

The table listing what I want to delete has just the key values.

If I just do

DELETE FROM backlinks b USING bad_links bl WHERE b.key = bl.key;

then it grinds for an hour or so and runs out of memory.

If I do

DELETE FROM backlinks b WHERE b.key IN (SELECT bl.key FROM bad_links
bl LIMIT 40 OFFSET 0);

it finishes in milliseconds, but if I set the LIMIT to 50, it runs for
minutes until I kill it.  EXPLAIN shows a sequential scan on several
of the partitions in the 50 case, but not in the 40 case.
Auto-vacuuming is running on the DB.

If I put that DELETE into a procedure and loop over the offset, it acts
like the 50 case, even if I set the LIMIT to 1.

I am currently running a shell loop doing the delete in chunks of 40,
but it has been running all weekend and is only halfway through.
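(One alternative to the OFFSET-based chunking, sketched here but not
tried in the thread: since the partitioning is by partition(key), the
delete can target one partition table at a time, keeping each
statement's working set small and letting the planner use each
partition's primary-key index.  Assumes the partitions are named
backlinks_0 .. backlinks_63 as shown above.)

```sql
-- Sketch only: run this for n = 0 .. 63, e.g. from a shell loop,
-- substituting the partition number in both places.  Each statement
-- touches one partition and only the matching slice of bad_links.
DELETE FROM backlinks_0 b
USING bad_links bl
WHERE b.key = bl.key
  AND partition(bl.key) = 0;
```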

What am I doing wrong?

--
mARK bLOORE <mbloore@gmail.com>

Re: how to delete many rows from a huge table?

From: Scott Marlowe
On Mon, May 25, 2009 at 1:29 PM, mARK bLOORE <mbloore@gmail.com> wrote:
> The table listing what I want to delete has just the key values.
>
> If I just do
>
> DELETE FROM backlinks b USING bad_links bl WHERE b.key = bl.key;
>
> then it grinds for an hour or so and runs out of memory.
>
> If I do
>
> DELETE FROM backlinks b WHERE b.key IN (SELECT bl.key FROM bad_links
> bl LIMIT 40 OFFSET 0);
>
> it finishes in milliseconds, but if I set the LIMIT to 50, it runs for
> minutes til I kill it.  EXPLAIN says that it is doing a sequential
> scan on several of the partitions in the 50 case, but not the 40.
> Auto-vacuuming is running on the DB.
>
> If I put that DELETE into a procedure and loop on the offset, it acts
> like the 50 case, even if i set the LIMIT to 1.

What does EXPLAIN of those three queries show you?  I'm guessing that
cranking up work_mem may help.
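(A minimal way to try that suggestion before rerunning the delete; the
value here is illustrative, not from the thread, and should be sized to
the machine's available RAM.)

```sql
-- Illustrative session-level setting; not a server-wide change.
SET work_mem = '256MB';
EXPLAIN DELETE FROM backlinks b USING bad_links bl WHERE b.key = bl.key;
```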