Thread: Rows violating foreign key constraint exist
Hi,

I am using PG version 10.5.

I saw a table where we have a foreign key defined, but a few thousand rows
violate the foreign key constraint.

I understand that one way this could happen is if we had manually disabled
the triggers that do the FK integrity checks and re-enabled them
afterwards. Is there any way to confirm whether this happened -- does PG
internally maintain something like an audit history of DDL changes?

I am not sure if this is due to some mistake on our end or if there are
any known issues in PG 10.5 that would cause this.

I tried ALTER TABLE ... VALIDATE CONSTRAINT for this particular FK and it
doesn't report any errors. The documentation pretty clearly states that
VALIDATE CONSTRAINT only checks constraints created with the NOT VALID
clause.

It might be useful to have an option to check the integrity of any FK (not
just ones created with NOT VALID). Please let me know if there is already
any way to do this.

Also, is there any way to make sure the FK checking trigger can never be
disabled (so that such a case can never arise)?

How do I proceed from here -- do I just delete the inconsistent rows, or
is there something more I have to do?

Thanks for your help.

Regards,
Nanda
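For reference, the current state of the internal FK triggers can be
inspected with a catalog query like the following (a sketch -- it shows
whether any constraint triggers are disabled right now, but says nothing
about whether they were disabled and re-enabled in the past):

```sql
-- List internal constraint-enforcement triggers and their state.
-- tgenabled: 'O' = enabled (origin), 'D' = disabled,
--            'R' = replica only,     'A' = always.
SELECT tgrelid::regclass AS table_name,
       tgname,
       tgenabled
FROM pg_trigger
WHERE tgisinternal          -- internal triggers back FK constraints
  AND tgconstraint <> 0;    -- only triggers belonging to a constraint
```

Any row showing tgenabled = 'D' would mean enforcement is currently off
for that constraint.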
On Thu, 2019-11-28 at 18:55 +0530, Nandakumar M wrote:
> I am using PG version 10.5.
>
> Saw a table where we have foreign key defined but few thousand rows
> violate the foreign key constraint.
>
> I understand that one possibility of this happening is if we had
> manually disabled the triggers that do FK integrity checks and re
> enabled them afterwards. Is there any way to confirm if this has
> happened -- does PG internally maintain something like an audit
> history of DDL changes?
>
> I am not sure if this is due to some mistake in our end or if there
> are any known issues in PG 10.5 that would cause this.
>
> I tried the ALTER TABLE ... VALIDATE CONSTRAINT for this particular FK
> and it doesn't report any errors. The documentation pretty clearly
> mentions that 'VALIDATE CONSTRAINT' is used only to check those
> constraints created with 'NOT VALID' clause.
>
> It might be useful to have an option to check integrity of any FK (not
> just ones created with NOT VALID clause). Please let me know if there
> is already any way to do this.
>
> Also, is there any way to make sure the FK checking trigger can never
> be disabled (so that such a case will never arise)?
>
> How do I proceed from here - Do I just delete the inconsistent rows or
> is there something more I have to do?
>
> Thanks for your help.

It could be that somebody disabled the triggers, but that would have to
be a superuser. And I hope that people randomly disabling system triggers
on tables don't have superuser access to your database.

There is no way to ascertain after the fact that this happened. It could
be in the log if you have "log_statement = 'ddl'" and keep old logs
around.

The other option is that you are suffering from data corruption, perhaps
because of a software bug, but most likely because of hardware problems.

If you don't know better, assume the worst.

I would test the hardware for problems.
Once you are sure the hardware is fine, manually fix the corruption by
deleting the rows that violate the constraint. Then create a new
PostgreSQL cluster with "initdb", dump the original database with
"pg_dumpall" and restore it to the new cluster. That should get rid of
all data corruption.

Make sure you upgrade to 10.11.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
On 11/28/19 5:25 AM, Nandakumar M wrote:
> Hi,
>
> I am using PG version 10.5.
>
> Saw a table where we have foreign key defined but few thousand rows
> violate the foreign key constraint.

Have you verified that the FK is not in the parent table and is just not
some index error/corruption?

> It might be useful to have an option to check integrity of any FK (not
> just ones created with NOT VALID clause). Please let me know if there
> is already any way to do this.

Assuming no corruption, how about?:

select distinct on(fk_field) fk_field
from child_table
where child_table.fk_field not in
    (select parent_field from parent_table);

> Also, is there any way to make sure the FK checking trigger can never
> be disabled (so that such a case will never arise)?

Not sure that can happen as it is baked into existing code. For instance:

https://www.postgresql.org/docs/11/app-pgdump.html

"
--disable-triggers

    This option is relevant only when creating a data-only dump. It
    instructs pg_dump to include commands to temporarily disable triggers
    on the target tables while the data is reloaded. Use this if you have
    referential integrity checks or other triggers on the tables that you
    do not want to invoke during data reload.
...
"

> How do I proceed from here - Do I just delete the inconsistent rows or
> is there something more I have to do?
>
> Thanks for your help.
>
> Regards,
> Nanda

--
Adrian Klaver
adrian.klaver@aklaver.com
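One caveat with NOT IN: if parent_field contains any NULLs, the query
above returns no rows even when violations exist, because NOT IN against
a set containing NULL never evaluates to true. A NOT EXISTS form (same
placeholder table and column names) avoids that pitfall:

```sql
-- NOT EXISTS is NULL-safe, unlike NOT IN with a nullable subquery.
SELECT DISTINCT c.fk_field
FROM child_table c
WHERE c.fk_field IS NOT NULL        -- NULL FK values never violate the constraint
  AND NOT EXISTS (
      SELECT 1
      FROM parent_table p
      WHERE p.parent_field = c.fk_field
  );
```

Each row returned is an FK value present in the child table but missing
from the parent.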
Hi,

> It could be that somebody disabled the triggers, but that would have to
> be a superuser. And I hope that people randomly disabling system triggers
> on tables don't have superuser access to your database.

It is unlikely that this happened. So I am assuming corruption.

But I am able to query both the referred and referring tables
successfully, without any 'missing chunk' or similar errors that usually
indicate corruption. Is it possible that corruption might cause data loss
like this without any errors?

> The other option is that you are suffering from data corruption, perhaps
> because of a software bug, but most likely because of hardware problems.
>
> If you don't know better, assume the worst.
>
> I would test the hardware for problems.
> Once you are sure the hardware is fine, manually fix the corruption
> by deleting rows that violate the constraint.

How do I test for hardware problems? Are there any tools for this? I am
running PG on a Windows machine.

Thanks.

Regards,
Nanda
Hi,

> Have you verified that the FK is not in the parent table and is just not
> some index error/corruption?

Yes.

> > Also, is there any way to make sure the FK checking trigger can never
> > be disabled (so that such a case will never arise)?
>
> Not sure that can happen as it is baked into existing code. For instance:
>
> https://www.postgresql.org/docs/11/app-pgdump.html
>
> "
> --disable-triggers
>
>     This option is relevant only when creating a data-only dump. It
>     instructs pg_dump to include commands to temporarily disable triggers
>     on the target tables while the data is reloaded. Use this if you have
>     referential integrity checks or other triggers on the tables that you
>     do not want to invoke during data reload.

I found this thread, which discusses the same topic as here:

https://www.postgresql.org/message-id/20190715160926.GA17140%40alvherre.pgsql

PG already allows a new FK to be created with an ADD CONSTRAINT ... NOT
VALID clause, which can be validated later using ALTER TABLE ... VALIDATE
CONSTRAINT. I guess what we are looking for here is the same, but for
existing FKs, i.e. something like:

`ALTER TABLE distributors ALTER CONSTRAINT distfk NOT VALID;`
`ALTER TABLE distributors VALIDATE CONSTRAINT distfk;`

Regards,
Nanda
Hi,

> PG already allows a new FK to be created with ADD CONSTRAINT ... NOT
> VALID clause which can be validated later using ALTER TABLE ...
> VALIDATE CONSTRAINT.
> I guess what we are looking for here is the same but for existing FKs.
>
> i.e Something like
>
> `ALTER TABLE distributors ALTER CONSTRAINT distfk NOT VALID;`
> `ALTER TABLE distributors VALIDATE CONSTRAINT distfk;`

I was wrong about this. ADD CONSTRAINT ... NOT VALID just postpones the
integrity check on existing data. There is no equivalent for that when
altering an existing FK.

Maybe DISABLE ALL TRIGGERS could mark the FKs as NOT VALID; ALTER TABLE
... VALIDATE CONSTRAINT would then report inconsistencies, if any. Also,
VALIDATE CONSTRAINT would have to verify that the trigger implementing
the FK is enabled. I am not sure whether this would break backwards
compatibility.

Regards,
Nanda
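For the archives: a workaround that exists today is to drop the existing
constraint and re-create it as NOT VALID in one transaction, then
validate it. A sketch, reusing the distributors/distfk names from above
(the column and referenced table are illustrative and must match the
original constraint definition):

```sql
BEGIN;
ALTER TABLE distributors DROP CONSTRAINT distfk;
-- Re-create the same FK without checking existing rows yet.
ALTER TABLE distributors
    ADD CONSTRAINT distfk FOREIGN KEY (address)
    REFERENCES addresses (address) NOT VALID;
COMMIT;

-- This scans the table and reports any violating rows.
ALTER TABLE distributors VALIDATE CONSTRAINT distfk;
```

Doing the drop and re-add in one transaction means there is no window in
which new violating rows could slip in unchecked.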
Nandakumar M <m.nanda92@gmail.com> writes:
>> It could be that somebody disabled the triggers, but that would have to
>> be a superuser. And I hope that people randomly disabling system triggers
>> on tables don't have superuser access to your database.

> It is unlikely that this happened. So I am assuming corruption.
> But I am able to query both the referred and referring table
> successfully without any 'missing chunk' or similar errors that
> usually indicate corruption.
> Is it possible that corruption might cause data loss like this without
> any errors?

The most likely "corruption" explanation is something wrong with the
indexes on the referenced and/or referencing column, causing rows to not
be found when referential actions should have found them. Random querying
of the tables wouldn't necessarily expose that --- you'd need to be sure
that your queries use the questionable indexes, and maybe even search for
some of the specific rows that seem mis-indexed.

			regards, tom lane
On Fri, Nov 29, 2019 at 7:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The most likely "corruption" explanation is something wrong with the
> indexes on the referenced and/or referencing column, causing rows to
> not be found when referential actions should have found them. Random
> querying of the tables wouldn't necessarily expose that --- you'd need
> to be sure that your queries use the questionable indexes, and maybe
> even search for some of the specific rows that seem mis-indexed.

Or try using contrib/amcheck, which is available in Postgres 10. Perhaps
try the query here, modified to verify all B-Tree indexes (not just those
indexes in the pg_catalog schema):

https://www.postgresql.org/docs/10/amcheck.html

--
Peter Geoghegan
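A version of the documentation's query with the pg_catalog restriction
and row limit removed might look like this (a sketch; it must be run by a
superuser, after installing the extension):

```sql
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Check every B-Tree index in the current database.
SELECT bt_index_check(index => c.oid), c.relname, c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
WHERE am.amname = 'btree'
  AND c.relpersistence != 't'   -- skip temp indexes of other sessions
  AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC;
```

bt_index_check raises an error on the first corrupt index it encounters;
a clean run returns one row per index checked.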
Hi,

Sorry about the delay in getting back with the results.

> On Fri, Nov 29, 2019 at 7:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > The most likely "corruption" explanation is something wrong with the
> > indexes on the referenced and/or referencing column, causing rows to
> > not be found when referential actions should have found them. Random
> > querying of the tables wouldn't necessarily expose that --- you'd need
> > to be sure that your queries use the questionable indexes, and maybe
> > even search for some of the specific rows that seem mis-indexed.

This indeed is the problem. A select query with a criterion on the FK
column did not return any rows. However, after I disabled index scans and
bitmap index scans and tried the same query, this time PG performed a
sequential scan and returned 80 rows.

> Or try using contrib/amcheck, which is available in Postgres 10.
> Perhaps try the query here, modified to verify all B-Tree indexes (not
> just those indexes in the pg_catalog schema):
>
> https://www.postgresql.org/docs/10/amcheck.html

I tried the amcheck query on all indexes in the database and it did not
raise any errors.

How do I identify that such corruption exists in the database? Will
enabling page checksums be of any help here?

Thanks.

Regards,
Nanda
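For anyone hitting this later: if I read the docs correctly, in PG 10
bt_index_check() only verifies the structure of the index itself, so an
index that is merely missing entries for some heap rows passes cleanly;
the heapallindexed option that catches this case was added to amcheck in
PG 11. The index-versus-sequential-scan comparison described above can be
reproduced per session like this (table, column, and value are
placeholders):

```sql
-- Query via the index (default plan).
SELECT count(*) FROM child_table WHERE fk_field = 42;

-- Force a sequential scan in this session and compare.
SET enable_indexscan = off;
SET enable_bitmapscan = off;
SELECT count(*) FROM child_table WHERE fk_field = 42;
RESET enable_indexscan;
RESET enable_bitmapscan;
-- If the two counts differ, the index is missing (or has spurious) entries.
```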