Thread: efficient deletes on subqueries
I have a very large table with > 1 million entries and I wish to delete rows which match any entries in a second table. What is the most efficient way of doing this? CREATE TABLE AA (keyA integer NOT NULL, info TEXT); CREATE TABLE obsolete_AA_entries (keyA integer NOT NULL); I want to remove all entries from AA where keyA matches that from obsolete_AA_entries, i.e. SELECT * FROM AA NATURAL JOIN obsolete_AA_entries; Both of the tables are UNIQUE indiced on keyA. DELETE FROM AA WHERE EXISTS( SELECT * from obsolete_AA_entries o where AA.keyA = o.keyA); seems to be faster than DELETE FROM AA where (keyA) in (SELECT * from obsolete_AA_entries); However, both are sequentially going through AA which is huge rather than looking up values one by one from obsolete_AA_entries which is small. How do I persuade the database to change its query strategy? Thanks a lot Llew
"postgresql" <postgres@lg.ndirect.co.uk> writes: > I want to remove all entries from AA where keyA matches that from > obsolete_AA_entries, > i.e. SELECT * FROM AA NATURAL JOIN obsolete_AA_entries; > DELETE FROM AA WHERE EXISTS( > SELECT * from obsolete_AA_entries o where AA.keyA = o.keyA); > seems to be faster than > DELETE FROM AA where (keyA) in (SELECT * from obsolete_AA_entries); > However, both are sequentially going through AA which is huge rather than > looking up values one by one from obsolete_AA_entries which is small. > How do I persuade the database to change its query strategy? TryDELETE FROM AA WHERE AA.keyA = obsolete_AA_entries.keyA; This is not valid SQL according to the SQL standard, but Postgres takes it anyway. Turning sub-SELECT queries into plannable joins is on the todo list, but it's not done yet :-( regards, tom lane