Thread: Delete all records NOT referenced by Foreign Keys
I'd like to run a clean up command on my tables to eliminate rows that I'm no longer using in the database. I want to do something like this: DELETE FROM tablename WHERE IS_REFERENCED_BY_FOREIGN_KEY IS FALSE; Does anyone know how something like this could be done in PostgreSQL? I know I can search all the tables that I know refer to this table and see if my primary key exists, but I want a solution that does not require me to rewrite my code every time a new foreign key constraint is added to the database. There must be a way to ask PostgreSQL for a reference count on a given row or something. Thanks, Dante --------- D. Dante Lorenso dante@lorenso.com
On Sat, Dec 13, 2003 at 02:20:15 -0600, "D. Dante Lorenso" <dante@lorenso.com> wrote: > I'd like to run a clean up command on my tables to > eliminate rows that I'm no longer using in the database. > > I want to do something like this: > > DELETE FROM tablename > WHERE IS_REFERENCED_BY_FOREIGN_KEY IS FALSE; > > Does anyone know how something like this could be done > in PostgreSQL? I know I can search all the tables that > I know refer to this table and see if my primary key > exists, but I want a solution that does not require me to > rewrite my code every time a new foreign key constraint > is added to the database. > > There must be a way to ask PostgreSQL for a reference count > on a given row or something. If you are more concerned about flexibility than speed you can do something like the following: Set all of your foreign key references to the desired table to use an on delete restrict clause. Have your application read all of the key values from the desired table and for each key issue a delete of that key in its own transaction. This will fail for keys that are referenced (because of the restrict clause). A more complicated, less future proof, but more efficient approach would be to have your application find out which tables have references to the table of interest by looking at the system catalog and then write a delete query using appropiate where not exist clauses.
Bruno Wolff III wrote: >On Sat, Dec 13, 2003 at 02:20:15 -0600, > "D. Dante Lorenso" <dante@lorenso.com> wrote: > > >>I'd like to run a clean up command on my tables to >>eliminate rows that I'm no longer using in the database. >> >>I want to do something like this: >> >> DELETE FROM tablename >> WHERE IS_REFERENCED_BY_FOREIGN_KEY IS FALSE; >> >>Does anyone know how something like this could be done >>in PostgreSQL? I know I can search all the tables that >>I know refer to this table and see if my primary key >>exists, but I want a solution that does not require me to >>rewrite my code every time a new foreign key constraint >>is added to the database. >> >>There must be a way to ask PostgreSQL for a reference count >>on a given row or something. >> >> > >If you are more concerned about flexibility than speed you can do something >like the following: > >Set all of your foreign key references to the desired table to use an >on delete restrict clause. > >Have your application read all of the key values from the desired table >and for each key issue a delete of that key in its own transaction. >This will fail for keys that are referenced (because of the restrict clause). > > This is something very ugly indeed and is what I'll have to resort to unless I can find something cleaner. Ideally, I would be able to run this cleanup on a subset of the table data after an insert into the table. I would like the query to be fast, though. Does anyone know if there is any way to say something like: DELETE FROM tablename IGNORE ERRORS; Where a delete that is possible is performed but ones that throw referencial integrity voilations would silently fail without abandoning the entire transaction? I have the 'on delete restrict' clause on my foreign keys already. >A more complicated, less future proof, but more efficient approach would >be to have your application find out which tables have references to the >table of interest by looking at the system catalog and then write a >delete query using appropiate where not exist clauses. > >---------------------------(end of broadcast)--------------------------- >TIP 7: don't forget to increase your free space map settings > >
On Sat, Dec 13, 2003 at 09:48:16PM -0600, D. Dante Lorenso wrote: > This is something very ugly indeed and is what I'll have to resort to unless > I can find something cleaner. Ideally, I would be able to run this cleanup > on a subset of the table data after an insert into the table. I would like > the query to be fast, though. What about just: delete from a where a.id not in (select id from b); or the equivalent exists query. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > (... have gone from d-i being barely usable even by its developers > anywhere, to being about 20% done. Sweet. And the last 80% usually takes > 20% of the time, too, right?) -- Anthony Towns, debian-devel-announce
Attachment
Martijn van Oosterhout wrote: >On Sat, Dec 13, 2003 at 09:48:16PM -0600, D. Dante Lorenso wrote: > > >>This is something very ugly indeed and is what I'll have to resort to unless >>I can find something cleaner. Ideally, I would be able to run this cleanup >>on a subset of the table data after an insert into the table. I would like >>the query to be fast, though. >> >> > >What about just: > >delete from a where a.id not in (select id from b); > >or the equivalent exists query. > > You missed the previous part of the thread. I have N tables that have a foreign key to the table in question. Tomorrow there may be more or fewer foreign key references. Without having to know which tables have foreign keys on my table, I want to delete all rows that are not used by any any other table. PG already can block a delete when it knows that foreign key exists, so why can't I perform a query that says... DELETE FROM tablename WHERE FOREIGN_KEY_EXISTS(oid) IS FALSE; You see? Something like what I seek never requires ME the developer or DBA to know about foreign key relationships because I know that PostgreSQL already does. To NOT have this functionality does not cause problems, but it does cause me to waste disk space on rows that are no longer in use. I just want to do some automated cleanup on tables and just leave that process running in a crontab nightly or something. I don't want to have to re-write the cleanup process every time a new dependency is introduced or removed. I think Bruno had a good idea about using the system tables to determine relationships, but how to do that is beyond my PostgreSQL expertise at the moment. I just think there's gotta be an easier way, though...something like what I describe above. Dante ---------- D. Dante Lorenso dante@lorenso.com
On Sun, 14 Dec 2003, D. Dante Lorenso wrote: > PG already can block a delete when it knows that foreign key exists, so > why can't I perform a query that says... > > DELETE FROM tablename > WHERE FOREIGN_KEY_EXISTS(oid) IS FALSE; That's fairly different from the checks that are performed for the foreign keys which happen after the action has happened and errors to prevent the action from being visible. The where clause happens long before that. If the above has to check each referencing table for matching rows for each row in tablename, I'd also expect it to perform poorly.
"D. Dante Lorenso" <dante@lorenso.com> writes: > To NOT have this functionality does not cause problems, but it does cause > me to waste disk space on rows that are no longer in use. I just want to > do some automated cleanup on tables and just leave that process running > in a crontab nightly or something. I don't want to have to re-write the > cleanup process every time a new dependency is introduced or removed. You could just try to delete every record in the desired tables and see if you get a foreign key violation. You would have to do each delete in a separate transaction and just ignore any errors. so you would have to do something like foreach table in <list of "auto-cleanup" tables> select id from table delete from table where id = xx This would mean your auto-cleanup crontab script doesn't even have to look in the system catalog to find out the dependencies. It just depends on postgres knowing all the dependencies and checking them all. Inevitably though there will be some tables that have some implicit depenencies that cannot be represented as foreign key references. Or are just the master records and don't need anything else in the database to depend on them. So you'll need a list somewhere of tables that are purely subservient to other tables and can be summarily cleaned up this way. -- greg
Stephan Szabo wrote: >On Sun, 14 Dec 2003, D. Dante Lorenso wrote: > > > >>PG already can block a delete when it knows that foreign key exists, so >>why can't I perform a query that says... >> >> DELETE FROM tablename >> WHERE FOREIGN_KEY_EXISTS(oid) IS FALSE; >> >> > >That's fairly different from the checks that are performed for the foreign >keys which happen after the action has happened and errors to prevent the >action from being visible. The where clause happens long before that. If >the above has to check each referencing table for matching rows for each >row in tablename, I'd also expect it to perform poorly. > > My hope was that there was some sort of (semaphore? / counter?) associated with each row that indicated whether a dependency existed at all. Although that would most likely not be an indexed column, I could apply additional WHERE constraints to avoid a full table scan. If such a counter existed, it would certainly speed up deletes when no foreign key checks were necessary. But I suppose it would also slow down deletes when it becomes necessary to decrement a counter for the dependencies created by the row being deleted. So...internally, how does a row KNOW that it can not be deleted because of a foreign key constraint? Whatever that mechanism is, isn't there a way for me to make use of that as I try my query? And... if not...how about something like this in PL/SQL: -- Find all the address records which might need to be removed... FOR my_rec IN SELECT addr_id FROM address WHERE acct_id = in_acct_id AND addr_is_active IS FALSE LOOP -- try to delete this record... DELETE FROM address WHERE addr_id = my_rec.addr_id; END LOOP; This will loop through the records one at a time and try to delete them. However, I don't want to have any exceptions thrown if the DELETE action can not be performed. Is there a TRY/CATCH type of code that I can surround the DELETE with to prevent the entire operation from being aborted on the first error found? Dante ---------- D. Dante Lorenso dante@lorenso.com
On Sun, Dec 14, 2003 at 03:02:49AM -0600, D. Dante Lorenso wrote: > My hope was that there was some sort of (semaphore? / counter?) associated > with each row that indicated whether a dependency existed at all. Although > that would most likely not be an indexed column, I could apply additional > WHERE constraints to avoid a full table scan. Ah, I see. There is no counter. When you delete a row, it does a check on the referencing table to see if it would break any foreign keys. The system has a defined trigger for that purpose. If you don't want to have the trigger error out, you do the same test. That's what my query did. In fact, you'll be able to do it more efficiently, since you know you'll be deleting many rows, you can arrange to only scan the referencing table once. The table with the foreign keys in it would be pg_constraint I think. You can use -E on psql to see you \d gets the info. > This will loop through the records one at a time and try to delete them. > However, I don't want to have any exceptions thrown if the DELETE action > can not be performed. Is there a TRY/CATCH type of code that I can > surround the DELETE with to prevent the entire operation from being > aborted on the first error found? That would be subtransactions, and they're not done yet. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > (... have gone from d-i being barely usable even by its developers > anywhere, to being about 20% done. Sweet. And the last 80% usually takes > 20% of the time, too, right?) -- Anthony Towns, debian-devel-announce
Attachment
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Does anyone know how something like this could be done > in PostgreSQL? I know I can search all the tables that > I know refer to this table and see if my primary key > exists, but I want a solution that does not require me to > rewrite my code every time a new foreign key constraint > is added to the database. Here is a function that removes all non-referenced rows from a table. Make a backup before using of course. :) CREATE OR REPLACE FUNCTION delete_nonrefs(TEXT) RETURNS TEXT AS ' DECLARE mytable ALIAS FOR $1; mytext TEXT; myrec RECORD; deltext TEXT; myrows INTEGER; BEGIN mytext := '' SELECT c2.relname AS tname, SUBSTRING((SELECT pg_get_constraintdef(r.oid)) FROM \'\'\\\\\\\\)[^(]+.([^)]+)\'\') AS fkcol, SUBSTRING((SELECT pg_get_constraintdef(r.oid)) FROM \'\'\\\\\\\\(([^)]+)\'\') AS mycol FROM pg_class c, pg_class c2, pg_constraint r WHERE c.relname = \' || quote_literal(mytable) || \' AND r.confrelid = c.oid AND r.contype = \'\'f\'\' AND c2.oid = r.conrelid''; FOR myrec IN EXECUTE mytext LOOP IF deltext IS NULL THEN deltext := \' \\nWHERE \'; ELSE deltext := deltext || \'\\nAND \'; END IF; deltext := deltext || \'NOT EXISTS \' || \'(SELECT 1 FROM \' || quote_ident(myrec.tname) || \' t WHERE \' || quote_ident(mytable) || \'.\' || myrec.fkcol || \' = t.\' || myrec.mycol || \')\'; END LOOP; IF deltext IS NULL THEN RETURN \'Table \' || quote_ident(mytable) || \' is not referenced by any other tables\'; END IF; deltext := \'DELETE FROM \' || quote_ident(mytable) || deltext; EXECUTE deltext; GET DIAGNOSTICS myrows := ROW_COUNT; RETURN \'Rows deleted from table \' || quote_ident(mytable) || \': \' || myrows; END; ' LANGUAGE plpgsql STABLE STRICT; - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200312141306 -----BEGIN PGP SIGNATURE----- iD8DBQE/3KeCvJuQZxSWSsgRAnNwAJ4v0bh/ATZtTaPqqid43qZuaFB/0ACdG+GL m6AtGBa3tNKsoZmy1ir6/KY= =S39B -----END PGP SIGNATURE-----
On Sun, 14 Dec 2003, D. Dante Lorenso wrote: > Stephan Szabo wrote: > > My hope was that there was some sort of (semaphore? / counter?) associated > with each row that indicated whether a dependency existed at all. Although > that would most likely not be an indexed column, I could apply additional > WHERE constraints to avoid a full table scan. > > If such a counter existed, it would certainly speed up deletes when no > foreign key checks were necessary. But I suppose it would also slow > down deletes when it becomes necessary to decrement a counter for the > dependencies created by the row being deleted. More importantly, unless you can defer the counter updates, that means taking out a write lock on the pk row in even more situations than now which makes things even more deadlock prone. If you do defer the updates in some way that removes that likelihood, you need a way to store the local changes to the counters. > So...internally, how does a row KNOW that it can not be deleted because of > a foreign key constraint? Whatever that mechanism is, isn't there a way for > me to make use of that as I try my query? It's a bunch of triggers each doing their own statements in C. The row has no knowledge of it, only the triggers do.