Thread: efficient deletes on subqueries

efficient deletes on subqueries

From
"postgresql"
Date:
I have a very large table with > 1 million entries and I wish to delete rows
which match any entries in a second table.
What is the most efficient way of doing this?

CREATE TABLE AA (keyA integer NOT NULL, info TEXT);
CREATE TABLE obsolete_AA_entries (keyA integer NOT NULL);

I want to remove all entries from AA where keyA matches that from
obsolete_AA_entries,
i.e. SELECT * FROM AA NATURAL JOIN obsolete_AA_entries;

Both of the tables are UNIQUE indiced on keyA.

DELETE FROM AA WHERE EXISTS(         SELECT * from obsolete_AA_entries o where AA.keyA = o.keyA);
seems to be faster than
DELETE FROM AA where (keyA) in (SELECT * from obsolete_AA_entries);

However, both are sequentially going through AA which is huge rather than
looking up values one by one from obsolete_AA_entries which is small.
How do I persuade the database to change its query strategy?
Thanks a lot
Llew




Re: efficient deletes on subqueries

From
Tom Lane
Date:
"postgresql" <postgres@lg.ndirect.co.uk> writes:
> I want to remove all entries from AA where keyA matches that from
> obsolete_AA_entries,
> i.e. SELECT * FROM AA NATURAL JOIN obsolete_AA_entries;

> DELETE FROM AA WHERE EXISTS(
>           SELECT * from obsolete_AA_entries o where AA.keyA = o.keyA);
> seems to be faster than
> DELETE FROM AA where (keyA) in (SELECT * from obsolete_AA_entries);

> However, both are sequentially going through AA which is huge rather than
> looking up values one by one from obsolete_AA_entries which is small.
> How do I persuade the database to change its query strategy?

TryDELETE FROM AA WHERE AA.keyA = obsolete_AA_entries.keyA;

This is not valid SQL according to the SQL standard, but Postgres takes
it anyway.

Turning sub-SELECT queries into plannable joins is on the todo list,
but it's not done yet :-(
        regards, tom lane