Thread: Getting the list of foreign keys (for deleting data from the database)

Getting the list of foreign keys (for deleting data from the database)

From
Mario Splivalo
Date:
I have a large, in-house built, ERP system that I need to clean up from
old/stale data.

As all the tables are FK-related I could do 'DELETE FROM' from the
top-most table (invoices, or stock documents, or whatever) to remove all
data from all the related tables, but that is, of course, extremely slow
(The datadir is around 20GB in size, and I need to remove 4/5 of the
data from the database - fiscal years 2014, 2013, 2012 and 2011 - only
2015 should remain).

Instead of doing DELETE FROM table WHERE date_created < '2015-01-01' I
was thinking of doing something like this:

SELECT foo_drop_all_constraints();
SELECT * FROM table INTO table_copy WHERE date_created >= '2015-01-01';
DROP TABLE table;
ALTER TABLE table_copy RENAME TO table;
SELECT foo_restore_all_constraints();

Of course, this is simple if I have only one table, but when there is
over 400 tables that are 'linked' with foreign keys, things get a bit
complicated.

Suppose I have a table_detail that has column table_id which is FK
pointing to table(id), I would need to do  something like this:

SELECT foo_drop_all_constraints();
SELECT * FROM table INTO table_copy WHERE date_created >= '2015-01-01';
SELECT table_detail.* INTO table_detail_copy FROM table_detail JOIN
table_copy ON table_detail.table_id = table_copy.id
DROP TABLE table;
DROP TABLE table_copy
ALTER TABLE table_copy RENAME TO table;
ALTER TABLE table_detail_copy TO table_detail;
SELECT foo_restore_all_constraints();

Now, what am I asking is - is there a tool which would help me find all
the _detail tables? I know I could query pg_constraints and similar
views but before I go onto hacking into those I'm wondering if there is
something that could aid me in doing so.

Of course, if this is not the best approach I'd appreciate different
views/opinions.
Mario


-- 
Mario Splivalo
mario@splivalo.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."



Re: Getting the list of foreign keys (for deleting data from the database)

From
Thomas Kellerer
Date:
Mario Splivalo schrieb am 02.08.2015 um 17:04:
> Suppose I have a table_detail that has column table_id which is FK
> pointing to table(id), I would need to do  something like this:
>
> SELECT foo_drop_all_constraints();
> SELECT * FROM table INTO table_copy WHERE date_created >= '2015-01-01';
> SELECT table_detail.* INTO table_detail_copy FROM table_detail JOIN
> table_copy ON table_detail.table_id = table_copy.id
> DROP TABLE table;
> DROP TABLE table_copy
> ALTER TABLE table_copy RENAME TO table;
> ALTER TABLE table_detail_copy TO table_detail;
> SELECT foo_restore_all_constraints();
>
> Now, what am I asking is - is there a tool which would help me find all
> the _detail tables? I know I could query pg_constraints and similar
> views but before I go onto hacking into those I'm wondering if there is
> something that could aid me in doing so.

The SQL tool I maintain (http:://www.sql-workbench.net) has such a feature.

It supports a (SQL Workbench specific) command that generates (recursively) the delete
statements starting with the "root" table given a condition on the root table:
   http://www.sql-workbench.net/manual/wb-commands.html#command-gendelete

The generated SQL script honors the FKs and thus there is no need to drop all constraints.

In your case it would be something like:
   WbGenerateDelete -table=root_table -columnValue="date_created >= '2015-01-01'";

The output is a script with the DELETEs in the right order - or at least it _should_.

I have to admit that I had to deal with one or two really large schemas (> 700 tables) where the
delete statements where not ordered properly, especially if there are multiple FKs to/from the
same table.

Note that the generated statements are not pretty and far from being efficient. 
> Of course, if this is not the best approach I'd appreciate different
> views/opinions.

In my experience, setting all the FKs to "on delete cascade" and properly indexing the FK
columns is very often faster than doing the deletes all "manually".

Another option (if you need to do that very often) is to partition the tables by e.g. year.
Then getting rid of all the data for a year is as simple as dropping the partitions for that year.

However partitioning and foreign key constraints don't work together in Postgres, which is a real shame.

Thomas






Re: Getting the list of foreign keys (for deleting data from the database)

From
Adrian Klaver
Date:
On 08/02/2015 08:04 AM, Mario Splivalo wrote:
> I have a large, in-house built, ERP system that I need to clean up from
> old/stale data.
>
> As all the tables are FK-related I could do 'DELETE FROM' from the
> top-most table (invoices, or stock documents, or whatever) to remove all
> data from all the related tables, but that is, of course, extremely slow
> (The datadir is around 20GB in size, and I need to remove 4/5 of the
> data from the database - fiscal years 2014, 2013, 2012 and 2011 - only
> 2015 should remain).

I have an answer of sorts below.

I do have some questions in the meantime though.

What is the purpose of an ERP that has no history?

In particular how do you do the P(lan) part without reference to the past?

>
> Instead of doing DELETE FROM table WHERE date_created < '2015-01-01' I
> was thinking of doing something like this:
>
> SELECT foo_drop_all_constraints();
> SELECT * FROM table INTO table_copy WHERE date_created >= '2015-01-01';
> DROP TABLE table;
> ALTER TABLE table_copy RENAME TO table;
> SELECT foo_restore_all_constraints();
>
> Of course, this is simple if I have only one table, but when there is
> over 400 tables that are 'linked' with foreign keys, things get a bit
> complicated.
>
> Suppose I have a table_detail that has column table_id which is FK
> pointing to table(id), I would need to do  something like this:
>
> SELECT foo_drop_all_constraints();
> SELECT * FROM table INTO table_copy WHERE date_created >= '2015-01-01';
> SELECT table_detail.* INTO table_detail_copy FROM table_detail JOIN
> table_copy ON table_detail.table_id = table_copy.id
> DROP TABLE table;
> DROP TABLE table_copy
> ALTER TABLE table_copy RENAME TO table;
> ALTER TABLE table_detail_copy TO table_detail;
> SELECT foo_restore_all_constraints();
>
> Now, what am I asking is - is there a tool which would help me find all
> the _detail tables? I know I could query pg_constraints and similar
> views but before I go onto hacking into those I'm wondering if there is
> something that could aid me in doing so.

My guess is for this case it will be less resource intensive to just do 
the DELETE(s), in smaller batches then a year, then to replicate the 
referential integrity in your own code.

If this is going to be a regular(yearly) thing I would look at partitioning:

http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html


>
> Of course, if this is not the best approach I'd appreciate different
> views/opinions.
>
>     Mario
>
>


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Getting the list of foreign keys (for deleting data from the database)

From
Mario Splivalo
Date:
On 08/02/2015 05:25 PM, Adrian Klaver wrote:
> On 08/02/2015 08:04 AM, Mario Splivalo wrote:
>> I have a large, in-house built, ERP system that I need to clean up from
>> old/stale data.
>>
>> As all the tables are FK-related I could do 'DELETE FROM' from the
>> top-most table (invoices, or stock documents, or whatever) to remove all
>> data from all the related tables, but that is, of course, extremely slow
>> (The datadir is around 20GB in size, and I need to remove 4/5 of the
>> data from the database - fiscal years 2014, 2013, 2012 and 2011 - only
>> 2015 should remain).
> 
> I have an answer of sorts below.
> 
> I do have some questions in the meantime though.
> 
> What is the purpose of an ERP that has no history?
> 
> In particular how do you do the P(lan) part without reference to the past?

I don't need that data in the 'current' database - it makes backups and
archiving harder. The customers can still access 'old' databases if they
need to check data that exists there.

>>
>> Now, what am I asking is - is there a tool which would help me find all
>> the _detail tables? I know I could query pg_constraints and similar
>> views but before I go onto hacking into those I'm wondering if there is
>> something that could aid me in doing so.
> 
> My guess is for this case it will be less resource intensive to just do
> the DELETE(s), in smaller batches then a year, then to replicate the
> referential integrity in your own code.

Yup, that would work. Actually, I am using that approach on some other
databases, I have a cronjob that runs every hour that deletes all data
older than 8765 hours from the database, thus keeping only the
year-worth of data.

Unfortunately, I inherited this and I need to 'purge' old data from the
database.
Mario
-- 
Mario Splivalo
mario@splivalo.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."



Re: Re: Getting the list of foreign keys (for deleting data from the database)

From
Mario Splivalo
Date:
On 08/02/2015 05:20 PM, Thomas Kellerer wrote:
> Mario Splivalo schrieb am 02.08.2015 um 17:04:
>> Suppose I have a table_detail that has column table_id which is FK
>> pointing to table(id), I would need to do  something like this:
>>
>> SELECT foo_drop_all_constraints();
>> SELECT * FROM table INTO table_copy WHERE date_created >= '2015-01-01';
>> SELECT table_detail.* INTO table_detail_copy FROM table_detail JOIN
>> table_copy ON table_detail.table_id = table_copy.id
>> DROP TABLE table;
>> DROP TABLE table_copy
>> ALTER TABLE table_copy RENAME TO table;
>> ALTER TABLE table_detail_copy TO table_detail;
>> SELECT foo_restore_all_constraints();
>>
>> Now, what am I asking is - is there a tool which would help me find all
>> the _detail tables? I know I could query pg_constraints and similar
>> views but before I go onto hacking into those I'm wondering if there is
>> something that could aid me in doing so.
> 
> The SQL tool I maintain (http:://www.sql-workbench.net) has such a feature.
> 
> It supports a (SQL Workbench specific) command that generates
> (recursively) the delete
> statements starting with the "root" table given a condition on the root
> table:
> 
>    http://www.sql-workbench.net/manual/wb-commands.html#command-gendelete

Oho! Thank you, I will check this out immediately!
> 
> The generated SQL script honors the FKs and thus there is no need to
> drop all constraints.

The main reason for dropping FKs is because of the speed. It is WAY
faster to copy non-deleting data to a new (temporary) table, then drop
originating table and then rename the temporary table.

> 
> In your case it would be something like:
> 
>    WbGenerateDelete -table=root_table -columnValue="date_created >=
> '2015-01-01'";
> 
> The output is a script with the DELETEs in the right order - or at least
> it _should_.
> 
> I have to admit that I had to deal with one or two really large schemas
> (> 700 tables) where the
> delete statements where not ordered properly, especially if there are
> multiple FKs to/from the
> same table.
> 
> Note that the generated statements are not pretty and far from being
> efficient.

We'll see. If I can adapt/change those so that they INSERT INTO instead
of DELETE, then I'm 'riding on horse' (I'm on donkey now).

>  
>> Of course, if this is not the best approach I'd appreciate different
>> views/opinions.
> 
> In my experience, setting all the FKs to "on delete cascade" and
> properly indexing the FK
> columns is very often faster than doing the deletes all "manually".

All the constraints are set to 'on delete cascade' - deleting data just
from the top-most tables currently takes over 3 days to complete.

> Another option (if you need to do that very often) is to partition the
> tables by e.g. year.
> Then getting rid of all the data for a year is as simple as dropping the
> partitions for that year.

Yup, this is a very good suggestion! But for now I first need to get rid
of 'unneeded' data from the database.

> However partitioning and foreign key constraints don't work together in
> Postgres, which is a real shame.

+1
Mario

-- 
Mario Splivalo
mario@splivalo.hr

"I can do it quick, I can do it cheap, I can do it well. Pick any two."