Re: Deleting orphaned records to establish Ref Integrity - Mailing list pgsql-general

From Tom Lane
Subject Re: Deleting orphaned records to establish Ref Integrity
Date
Msg-id 10442.1117694169@sss.pgh.pa.us
Whole thread Raw
In response to Deleting orphaned records to establish Ref Integrity  ("Roman F" <romanf@fusemail.com>)
Responses Re: Deleting orphaned records to establish Ref Integrity  (Greg Stark <gsstark@mit.edu>)
List pgsql-general
"Roman F" <romanf@fusemail.com> writes:
> ...  Executing something like
> the following statement would work, but even with indexes it takes an
> insane amount of time to execute for each of the tables:

> DELETE FROM child_table WHERE parentid NOT IN
>   (SELECT parentid FROM parent_table)

Uh, what sort of query plan are you getting for that?

PG 7.4 and up can do a reasonable job with NOT IN if the sub-select is
small enough to fit into an in-memory hash table (of size sort_mem).
I'm betting that your sort_mem setting is not high enough to encourage
the planner to try the hash method.  You could try increasing sort_mem
... but given the size of your tables, you might end up with a hash
table large enough to drive the system into swapping, in which case
it'll still be mighty slow.

Another idea is to try an outer join:

    SELECT child_table.parentid INTO tmp_table
    FROM child_table LEFT JOIN parent_table
         ON (child_table.parentid = parent_table.parentid)
    WHERE parent_table.parentid IS NULL;

which essentially does a join and then pulls out just the child_table
rows that failed to match.  This will probably end up getting done via a
merge join or hybrid hash join, either of which are more scalable than
the NOT IN code.  You still have to do the actual deletions in
child_table, but as long as there aren't too many, a NOT IN using
tmp_table should work OK.

            regards, tom lane

pgsql-general by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: interval integer comparison
Next
From: "Dinesh Pandey"
Date:
Subject: index row size 2728 exceeds btree maximum, 2713