Thread: Feature Request: Better handling of foreign keys in DELETE statements

Feature Request: Better handling of foreign keys in DELETE statements

From
Daniel Migowski
Date:

Hi,

 

I face the following problem: I have a large table with 12 million addresses, referenced by 20 other tables (some containing about one million entries). There are indexes on the foreign keys.

 

Now I wanted to delete about 10 million addresses (that are not referenced anymore from anywhere), and have a statement like:

 

DELETE FROM address

WHERE id NOT IN (SELECT address_id FROM bank where address_id IS NOT NULL)

   AND id NOT IN (SELECT poboxaddress_id FROM bank where poboxaddress_id IS NOT NULL)

   AND id NOT IN (SELECT address_id FROM bankconnection where address_id IS NOT NULL)

...lots more...

 

This takes more than 10 hours here (I had to cancel the statement).

 

I have two suggestions:

 

1.       Currently for each row to be deleted, a SELECT is done in each column referencing the deleted entry. This takes really a lot of time. It is possible to check in an elegant way if an entry can be deleted, like in the above query.  I know it is not easy to autocreate such a statement, but this would make deletions much faster.

2.       I would have loved a special option “UNREREFENCED” given to the delete statement, so all rows referenced from anywhere would automagically be excluded from my delete statement. When this keyword is given, no FK checks have to be done, because FK referenciality cannot be violated anyway.

 

DELETE UNREFERENCED FROM address WHERE …;

 

Thanks for your time and this great database product.

 

Regards,

Daniel Migowski

Re: Feature Request: Better handling of foreign keys in DELETE statements

From
Chris Angelico
Date:
On Tue, Dec 20, 2011 at 3:01 AM, Daniel Migowski <dmigowski@ikoffice.de> wrote:
> DELETE FROM address
>
> WHERE id NOT IN (SELECT address_id FROM bank where address_id IS NOT NULL)
>
>    AND id NOT IN (SELECT poboxaddress_id FROM bank where poboxaddress_id IS
> NOT NULL)
>
>    AND id NOT IN (SELECT address_id FROM bankconnection where address_id IS
> NOT NULL)

You may find it faster if you rewrite it as an outer join.

SELECT id FROM address
LEFT JOIN bank ON address.id=bank.address_id
LEFT JOIN pobox ON address.id=pobox.address_id
LEFT JOIN bankconnection ON address.id=bankconnection.address_id
WHERE bank.address_id IS NULL
AND pobox.address_id IS NULL
AND bankconnection.address_id IS NULL

Something like that. (This will enumerate the IDs to be deleted. I
always prefer to start with safe operations before moving to huge
deletions!) Alternatively, use a multi-pass approach:

CREATE TABLE deleteme (ID same_type_as_address)
INSERT INTO deleteme SELECT id FROM address
DELETE FROM deleteme WHERE id IN (SELECT address_id FROM bank)
DELETE FROM deleteme WHERE id IN (SELECT address_id FROM pobox)
DELETE FROM deleteme WHERE id IN (SELECT address_id FROM bankconnection)
-- etc --
DELETE FROM address WHERE id IN (SELECT id FROM deleteme)

Chris Angelico

Re: Feature Request: Better handling of foreign keys in DELETE statements

From
Pavel Stehule
Date:
Hello

2011/12/19 Daniel Migowski <dmigowski@ikoffice.de>:
> Hi,
>
>
>
> I face the following problem: I have a large table with 12 million
> addresses, referenced by 20 other tables (some containing about one million
> entries). There are indexes on the foreign keys.
>
>
>
> Now I wanted to delete about 10 million addresses (that are not referenced
> anymore from anywhere), and have a statement like:
>

you can disable check per session if you need

ALTER TABLE ... DISABLE TRIGGER ALL;

Regards

Pavel Stehule

>
>
> DELETE FROM address
>
> WHERE id NOT IN (SELECT address_id FROM bank where address_id IS NOT NULL)
>
>    AND id NOT IN (SELECT poboxaddress_id FROM bank where poboxaddress_id IS
> NOT NULL)
>
>    AND id NOT IN (SELECT address_id FROM bankconnection where address_id IS
> NOT NULL)
>
> ...lots more...
>
>
>
> This takes more than 10 hours here (I had to cancel the statement).
>
>
>
> I have two suggestions:
>
>
>
> 1.       Currently for each row to be deleted, a SELECT is done in each
> column referencing the deleted entry. This takes really a lot of time. It is
> possible to check in an elegant way if an entry can be deleted, like in the
> above query.  I know it is not easy to autocreate such a statement, but this
> would make deletions much faster.
>
> 2.       I would have loved a special option “UNREREFENCED” given to the
> delete statement, so all rows referenced from anywhere would automagically
> be excluded from my delete statement. When this keyword is given, no FK
> checks have to be done, because FK referenciality cannot be violated anyway.
>
>
>
> DELETE UNREFERENCED FROM address WHERE …;
>
>
>
> Thanks for your time and this great database product.
>
>
>
> Regards,
>
> Daniel Migowski

Re: Feature Request: Better handling of foreign keys in DELETE statements

From
Marti Raudsepp
Date:
On Mon, Dec 19, 2011 at 18:14, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> you can disable check per session if you need
> ALTER TABLE ... DISABLE TRIGGER ALL;

This is NOT a per-session command, this applies to all connections and
queries, and may allow incorrect data into the table. Don't do this!
(And it doesn't address the question anyway)

Regards,
Marti

Re: Feature Request: Better handling of foreign keys in DELETE statements

From
Pavel Stehule
Date:
2011/12/19 Marti Raudsepp <marti@juffo.org>:
> On Mon, Dec 19, 2011 at 18:14, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> you can disable check per session if you need
>> ALTER TABLE ... DISABLE TRIGGER ALL;
>
> This is NOT a per-session command, this applies to all connections and
> queries, and may allow incorrect data into the table. Don't do this!
> (And it doesn't address the question anyway)
>

you have true,

sorry

Regards

Pavel

> Regards,
> Marti

Re: Feature Request: Better handling of foreign keys in DELETE statements

From
Hannes Erven
Date:
Hi Daniel,


> Now I wanted to delete about 10 million addresses (that are not
> referenced anymore from anywhere), and have a statement like:

What about:

DELETE FROM address
WHERE id IN (
    SELECT id FROM address
    EXCEPT
    (
     SELECT address_id FROM tab1
    UNION ALL
    SELECT address_id FROM tab2
    UNION ALL
    ...
    )
)
?

You could also easily restrict that query to multiple ranges of
adress.ids to control memory usage and transaction duration.



-hannes