Thread: pg_restore fails due to foreign key violation

pg_restore fails due to foreign key violation

From
Olga Vingurt
Date:
Hi,

We are using PostgresSQL 9.5.10 and pg_dump/pg_restore to export and import database.

We encountered an issue (which is not easily reproducible) when running pg_restore:
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 3624; 2606 37504 FK CONSTRAINT <fk_name> postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  insert or update on table "<table_name>" violates foreign key constraint "fk_name"
DETAIL:  Key (<column_name>)=(fbc3dd78-643f-419e-9f0f-72d81ef22cf3) is not present in table "<table_name>".
    Command was: ALTER TABLE ONLY <table_name> ADD CONSTRAINT <fk_name> FOREIGN KEY 

The following command was used to backup the database:
postgresql-9.5.10-2-windows-x64-binaries\pgsql\bin\pg_dump.exe -Fc --verbose --no-password -U postgres -d, postgres -f db.dump -E utf8

The following command was used to restore the database:
postgresql-9.5.10-2-windows-x64-binaries\pgsql\bin\pg_restore.exe --verbose --no-password --single-transaction --no-owner --no-privileges -U postgres -d postgres db.dump

After playing with the dump and importing schema first and data next without the triggers we indeed see that data is missing in the table i.e. dump is not consistent. 
We don't stop the application which uses database during the dump but according to the documentation the dump still should be consistent.

How is it possible that pg_dump created dump which is not consistent? Did it happen because we haven't stopped the application?
Is there any way to create dumps without stopping the application?

Thanks,
Olga


Re: pg_restore fails due to foreign key violation

From
Andreas Kretschmer
Date:
Am 10.12.18 um 11:15 schrieb Olga Vingurt:
> After playing with the dump and importing schema first and data next 
> without the triggers we indeed see that data is missing in the table 
> i.e. dump is not consistent.
> We don't stop the application which uses database during the dump but 
> according to the documentation the dump still should be consistent.


please check if the data are consistent on the source database system.


>
> How is it possible that pg_dump created dump which is not consistent? 
> Did it happen because we haven't stopped the application?


No, but maybe you have corrupted indexes on the source system.



Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Re: pg_restore fails due to foreign key violation

From
Martín Marqués
Date:
Hi,

El lun., 10 dic. 2018 a las 7:21, Andreas Kretschmer (<andreas@a-kretschmer.de>) escribió:

Am 10.12.18 um 11:15 schrieb Olga Vingurt:
> After playing with the dump and importing schema first and data next
> without the triggers we indeed see that data is missing in the table
> i.e. dump is not consistent.
> We don't stop the application which uses database during the dump but
> according to the documentation the dump still should be consistent.


please check if the data are consistent on the source database system.


I would start by running amcheck to see if there is index corruption somewhere, as that can lead to data corruption (in particular if the corrupted index is a unique or primary key index)
 
Regards,

--
Martín Marqués
It’s not that I have something to hide,
it’s that I have nothing I want you to see

Re: pg_restore fails due to foreign key violation

From
Olga Vingurt
Date:
The data indeed wasn't consistent on the source system and foreign key index was corrupted.
After manually cleaning not relevant records and running REINDEX on the table pd_dump and pg_restore worked as expected.

The only question left is how we got into corrupted data state.
In the event logs (PorstgeSQL is runnign on Wondows Server) we found error which looks relevant:
 
ERROR:  could not truncate file "base/12373/17254" to 19 blocks: Permission denied
CONTEXT:  automatic vacuum of table "postgres.public.<table_name>"

Re: pg_restore fails due to foreign key violation

From
Tom Lane
Date:
Olga Vingurt <olga.vingurt@gmail.com> writes:
> The only question left is how we got into corrupted data state.
> In the event logs (PorstgeSQL is runnign on Wondows Server) we found error
> which looks relevant:

> ERROR:  could not truncate file "base/12373/17254" to 19 blocks: Permission denied
> CONTEXT:  automatic vacuum of table "postgres.public.<table_name>"

Hm.  In theory, that truncation failure in itself shouldn't have caused a
problem --- autovacuum is just trying to remove some empty pages, and if
they don't get removed, they'd still be empty.  However, there's a problem
if the pages are empty because we just deleted some recently-dead tuples,
because the state of the pages on-disk might be different from what it
is in-memory.  In that case the truncation failure effectively results in
those pages reverting to some prior state, possibly bringing dead tuples
"back to life".  We've been poking at that problem off and on for awhile;
the most recent thread is here:

https://www.postgresql.org/message-id/flat/5BBC590AE8DF4ED1A170E4D48F1B53AC%40tunaPC

Assuming that that is the issue (which is unproven, but seems probable)
then your report makes the third report we've seen of this failure mode
in about a decade --- and it's been there, in principle, much longer than
that.  Nobody's thought of a fix that isn't either very unpleasant for
performance (eg, "uselessly" writing out pages we're about to truncate
away) or bad for robustness (eg, PANIC'ing on truncate failure).  So
while it's certainly bad news when it happens, I'm not sure that any
fix is going to get committed soon.

In the short term, what you need to do is figure out what caused the
permission failure.  The general belief among pgsql-hackers is that
shoddy antivirus products tend to cause this, but I don't know details.

            regards, tom lane


Re: pg_restore fails due to foreign key violation

From
Olga Vingurt
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm.  In theory, that truncation failure in itself shouldn't have caused a
problem --- autovacuum is just trying to remove some empty pages, and if
they don't get removed, they'd still be empty.  However, there's a problem
if the pages are empty because we just deleted some recently-dead tuples,
because the state of the pages on-disk might be different from what it
is in-memory. 

It indeed looks like that was exactly the issue. 
The error we saw in the event log happened only once and mentioned the specific table we had issues with.  
We had rows in the table which should have been deleted due to foreign key constraint (ON DELETE CASCADE configured for the foreign key) and when I tried to select one of the rows by using the column with the foreign key nothing returned in the query so I guess the matching index was missing the rows.

In the short term, what you need to do is figure out what caused the
permission failure.  The general belief among pgsql-hackers is that
shoddy antivirus products tend to cause this, but I don't know details.

There is no antivirus on the Windows server. As it happened only once (in a few years we installed on the server) and we don't have any additional info why PostgreSQL got "Permission denied" error we will hope for the best i.e. that we won't get into this situation again.
Thanks a lot for the help!

Regards, 
Olga