I have a table of about 1 billion rows, and I want to delete about 5
million of them, which are listed in another table.
The big table is
Table "public.backlinks"
Column | Type | Modifiers | Description
-----------------+-------+-----------+-------------
key | bytea | not null |
backlink | text | not null |
backlink_hash | bytea | not null |
url | text | not null |
time_downloaded | date | not null |
Triggers:
insert_backlinks_trigger BEFORE INSERT ON backlinks FOR EACH ROW
EXECUTE PROCEDURE backlinks_insert_trigger()
Has OIDs: no
It is divided into 64 partitions, each like this:
Table "public.backlinks_0"
Column | Type | Modifiers | Description
-----------------+-------+-----------+-------------
key | bytea | not null |
backlink | text | not null |
backlink_hash | bytea | not null |
url | text | not null |
time_downloaded | date | not null |
Indexes:
"backlinks_0_pkey" PRIMARY KEY, btree (key, backlink_hash),
tablespace "fastdisk"
Check constraints:
"backlinks_0_key_check" CHECK (partition(key) = 0)
Inherits: backlinks
Has OIDs: no
The partitions are very evenly filled. The partition function just
takes the first byte of the key modulo 64.
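The partition function named in the check constraints is essentially
this (a minimal sketch from memory; the real definition may differ in
detail):

CREATE OR REPLACE FUNCTION partition(k bytea) RETURNS integer AS $$
    -- first byte of the key, modulo the 64 partitions
    SELECT get_byte($1, 0) % 64;
$$ LANGUAGE sql IMMUTABLE;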
The table listing the rows I want to delete, bad_links, holds just the
key values.
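Something like this (the exact DDL is assumed here):

CREATE TABLE bad_links (
    key bytea PRIMARY KEY  -- assumed: same type as backlinks.key
);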
If I just do
DELETE FROM backlinks b USING bad_links bl WHERE b.key = bl.key;
then it grinds for an hour or so and runs out of memory.
If I do
DELETE FROM backlinks b
WHERE b.key IN (SELECT bl.key FROM bad_links bl LIMIT 40 OFFSET 0);
it finishes in milliseconds, but if I set the LIMIT to 50, it runs for
minutes until I kill it. EXPLAIN shows sequential scans on several of
the partitions in the LIMIT 50 case, but not with LIMIT 40.
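For concreteness, I am comparing plans by prefixing the same statement
with EXPLAIN, e.g.:

EXPLAIN DELETE FROM backlinks b
WHERE b.key IN (SELECT bl.key FROM bad_links bl LIMIT 50 OFFSET 0);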
Auto-vacuuming is running on the DB.
If I put that DELETE into a procedure and loop on the offset, it acts
like the LIMIT 50 case, even if I set the LIMIT to 1.
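The looping version is roughly this (a sketch; the function name and
loop details are illustrative, not my exact code):

CREATE OR REPLACE FUNCTION delete_bad_links(chunk integer)
RETURNS void AS $$
DECLARE
    total bigint;
    skip  bigint := 0;
BEGIN
    -- walk bad_links in chunks of the given size, as described above
    SELECT count(*) INTO total FROM bad_links;
    WHILE skip < total LOOP
        DELETE FROM backlinks b
        WHERE b.key IN
            (SELECT bl.key FROM bad_links bl LIMIT chunk OFFSET skip);
        skip := skip + chunk;
    END LOOP;
END;
$$ LANGUAGE plpgsql;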
I am currently running a shell loop that does the delete in chunks of
40, but it has been running all weekend and is only halfway through.
What am I doing wrong?
--
mARK bLOORE <mbloore@gmail.com>