Hello.
I have a database with company table that have a bunch of related (delete=cascade) tables.
Also it has 1<->M relation to company_descr table.
Once we've found that ~half of our companies do not have any description and we would like to remove them.
First this I've tried was
delete from company where id not in (select company_id from company_descr);
I've tried to analyze command, but unlike to other RDBM I've used it did not include cascade deletes/checks into query plan. That is first problem.
It was SLOW. To make it faster I've done next thing:
create temporary table comprm(id) as select id from company;
delete from comprm where id in (select company_id from company_descr);
delete from company where id in (select id from comprm);
That was much better. So the question is why postgresql can't do such a thing.
But it was better only until "removing" dataset was small (~5% of all table).
As soon as I've tried to remove 50% I've got speed problems. I've ensured I have all indexes for both ends of foreign key.
I've tried to remove all cascaded entries by myself, e.g.:
create temporary table comprm(id) as select id from company;
delete from comprm where id in (select company_id from company_descr);
delete from company_alias where company_id in (select id from comprm);
...
delete from company where id in (select id from comprm);
It did not help until I drop all constraints before and recreate all constraints after.
Now I have it work for 15minutes, while previously it could not do in a day.
Is it OK? I'd say, some (if not all) of the optimizations could be done by postgresql optimizer.