Thread: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

On 9.3.17, I tried to restore a tar from pg_dump.  It failed thusly:

bash-4.1$ pg_restore -d spc_test_1 agency_backup.spc.2017.06.05_10.30.01.tar

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 10608; 0 107743 TABLE DATA tbl_payment spc
pg_restore: [archiver (db)] COPY failed for table "tbl_payment": ERROR:  new row for relation "tbl_payment" violates check constraint "authorized_approvers_only"
DETAIL:  Failing row contains (286541, 3685, 2015-09-14, ADJUST, null, null, 137798, 93.00, HONEY, 4841, 2, SHONCRE, September adjustment, 2015-10-01, null, null, null, null, null, f, f, t, f, f, f, f, null, null, null, null, 6, 2015-09-14 16:43:37, 25, 2016-02-08 16:34:20, f, null, null, null, Adjusting approved_at to changed_at for first few approvals
, 6, 2015-09-14 16:43:37, 2015-09-17).
CONTEXT:  COPY tbl_payment, line 179785: "286541        3685    2015-09-14      ADJUST  \N      \N      137798  93.00   HONEY   4841    2
        SHONCRE September adjustment    2015-10-0..."
WARNING: errors ignored on restore: 1

The rest of the DB is fine, but tbl_payment has 0 rows.  I believe this is because tbl_payment has a constraint that calls a function has_perm() that relies on data in a couple of other tables, and that tbl_payment is being restored before those tables.  I was able to created a new dump in Custom format, reorder the List file, and restore that successfully.

So I can switch to Custom format for future backups.  But regarding the existing backups I have in Tar format, is there any way to successfully restore them?  Specifically:
  • Any way to ignore or delay constraint checking?  Something like disable-triggers?
  • Any way to tell pg_restore to skip past the failing row, and restore the rest of what was in tbl_payment?
  • Some other way to go about this?
I also wonder if you folks might consider adding something like a --test_restore option to pg_dump that would attempt to create a new (scratch) DB from the output it creates, and report any errors?  I know the pieces are all there for us users to do that ourselves, but it would be handy for automated backups and might help us to avoid creating backups that won't restore successfully.  In my case, I think the problem started from changes we made about 9 months ago, and happily I discovered it during development/testing and not after a DB crash, which is why I'm also happily not gouging my eyeballs out right now. :)

Cheers, and thanks in advance!

Ken


--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

From
"David G. Johnston"
Date:
On Mon, Jun 5, 2017 at 3:35 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote:
I believe this is because tbl_payment has a constraint that calls a function has_perm() that relies on data in a couple of other tables

​Indeed this is the cause.  That configuration is not supported.  If you need to lookup values in other tables you either need to use an actual FK constraint or create a trigger for the validation.
 
So I can switch to Custom format for future backups.  But regarding the existing backups I have in Tar format, is there any way to successfully restore them?  Specifically:
  • Any way to ignore or delay constraint checking?  Something like disable-triggers?
​Using and then disabling triggers is the "closest" solution​.
  • Any way to tell pg_restore to skip past the failing row, and restore the rest of what was in tbl_payment?
​No, COPY doesn't have that capability and that is what is being used under the hood.
  • Some other way to go about this?
​Ideally figure out how to write an actual FK constraint - otherwise use triggers.​
 
I also wonder if you folks might consider adding something like a --test_restore option to pg_dump

-1; pg_dump should not be trying to restore things.​  The core developers shouldn't really concern themselves with the various and sundry ways people might want to setup such a process.  You have tools for dump, and tools for restore, and you can combine them in whatever fashion you deem useful.  Or otherwise acquire someone else's ideas.

​David J.​

On 06/05/2017 03:35 PM, Ken Tanzer wrote:
> On 9.3.17, I tried to restore a tar from pg_dump.  It failed thusly:
>
> bash-4.1$ pg_restore -d spc_test_1 agency_backup.spc.2017.06.05_10.30.01.tar
>
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 10608; 0 107743 TABLE
> DATA tbl_payment spc
> pg_restore: [archiver (db)] COPY failed for table "tbl_payment": ERROR:
>   new row for relation "tbl_payment" violates check constraint
> "authorized_approvers_only"
> DETAIL:  Failing row contains (286541, 3685, 2015-09-14, ADJUST, null,
> null, 137798, 93.00, HONEY, 4841, 2, SHONCRE, September adjustment,
> 2015-10-01, null, null, null, null, null, f, f, t, f, f, f, f, null,
> null, null, null, 6, 2015-09-14 16:43:37, 25, 2016-02-08 16:34:20, f,
> null, null, null, Adjusting approved_at to changed_at for first few
> approvals
> , 6, 2015-09-14 16:43:37, 2015-09-17).
> CONTEXT:  COPY tbl_payment, line 179785: "286541        3685
>   2015-09-14      ADJUST  \N      \N      137798  93.00   HONEY   4841    2
>          SHONCRE September adjustment    2015-10-0..."
> WARNING: errors ignored on restore: 1
>
> The rest of the DB is fine, but tbl_payment has 0 rows.  I believe this
> is because tbl_payment has a constraint that calls a function has_perm()
> that relies on data in a couple of other tables, and that tbl_payment is
> being restored before those tables.  I was able to created a new dump in
> Custom format, reorder the List file, and restore that successfully.

See this thread for more info:
https://www.postgresql.org/message-id/alpine.DEB.2.20.1703311620581.12863%40tglase.lan.tarent.de

 From the docs:

https://www.postgresql.org/docs/9.6/static/sql-createtable.html

"Currently, CHECK expressions cannot contain subqueries nor refer to
variables other than columns of the current row. The system column
tableoid may be referenced, but not any other system column.

>
> So I can switch to Custom format for future backups.  But regarding the
> existing backups I have in Tar format, is there any way to successfully
> restore them?  Specifically:
>
>   * Any way to ignore or delay constraint checking?  Something like
>     disable-triggers?
>
>   * Any way to tell pg_restore to skip past the failing row, and restore
>     the rest of what was in tbl_payment?
>
>   * Some other way to go about this?

Change the check constraint to a trigger.

>
> I also wonder if you folks might consider adding something like a
> --test_restore option to pg_dump that would attempt to create a new
> (scratch) DB from the output it creates, and report any errors?  I know

Not that I know of. It would be easy enough to point pg_restore at your
own scratch database for testing purposes.

> the pieces are all there for us users to do that ourselves, but it would
> be handy for automated backups and might help us to avoid creating
> backups that won't restore successfully.  In my case, I think the
> problem started from changes we made about 9 months ago, and happily I
> discovered it during development/testing and not after a DB crash, which
> is why I'm also happily not gouging my eyeballs out right now. :)
>
> Cheers, and thanks in advance!
>
> Ken
>
>
> --
> AGENCY Software
> A Free Software data system
> By and for non-profits
> /http://agency-software.org//
> /https://agency-software.org/demo/client/
> ken.tanzer@agency-software.org
> <https://mail.google.com/mail/?view=cm&fs=1&tf=1&to=ken.tanzer@agency-software.org>
> (253) 245-3801
>
> Subscribe to the mailing list
> <https://mail.google.com/mail/?view=cm&fs=1&tf=1&to=agency-general-request@lists.sourceforge.net&body=subscribe> to
> learn more about AGENCY or
> follow the discussion.


--
Adrian Klaver
adrian.klaver@aklaver.com


Thanks Adrian and David.  That all makes sense, and I gather the answer regarding the existing dumps is "no, they can't be restored."  So be it.  Here's a couple of follow-on comments::

Ideally figure out how to write an actual FK constraint - otherwise use triggers.

I can't really make this an FK.  I can (and probably will) put this into a trigger.  Although it seems like an extra layer of wrapping just to call a function.  I'm curious if there's any conceptual reason why constraints couldn't (as an option) be restored after all the data is loaded, and whether there would be any negative consequences of that?  I could see if your data still didn't pass the CHECKs, it's already loaded.  But the constraint could then be marked not valid?


-1; pg_dump should not be trying to restore things.​  The core developers shouldn't really concern themselves with the various and sundry ways people might want to setup such a process.  You have tools for dump, and tools for restore, and you can combine them in whatever fashion you deem useful.  Or otherwise acquire someone else's ideas.

I get that as a general principle.  OTOH, being able to restore your backups isn't just a random or inconsequential feature.  I have access to the superuser and can create DBs, but users in more locked down scenarios might not be able to do so.


From the docs:
https://www.postgresql.org/docs/9.6/static/sql-createtable.html
"Currently, CHECK expressions cannot contain subqueries nor refer to variables other than columns of the current row. The system column tableoid may be referenced, but not any other system column.

I wonder if that should say "should not," or be followed by something like this:

n.b., In CHECK expressions, Postgres will not prevent you from calling functions that reference other rows or tables.  However, doing so may have undesirable consequences, including the possible inability to restore from output created by pg_dump.

(Are there other possible pitfalls too, or is that the only one?)

Cheers,
Ken


--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.
On 6/5/2017 5:15 PM, Ken Tanzer wrote:
> I can't really make this an FK.  I can (and probably will) put this
> into a trigger.  Although it seems like an extra layer of wrapping
> just to call a function.  I'm curious if there's any conceptual reason
> why constraints couldn't (as an option) be restored after all the data
> is loaded, and whether there would be any negative consequences of
> that?  I could see if your data still didn't pass the CHECKs, it's
> already loaded.  But the constraint could then be marked not valid?


when you have constraints that rely on calling functions, how would it
know what order to check things in ?


--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

From
"David G. Johnston"
Date:
On Mon, Jun 5, 2017 at 5:15 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote:  
From the docs:
https://www.postgresql.org/docs/9.6/static/sql-createtable.html
"Currently, CHECK expressions cannot contain subqueries nor refer to variables other than columns of the current row. The system column tableoid may be referenced, but not any other system column.

I wonder if that should say "should not," or be followed by something like this:


Make it say "must not" and I'd agree to change the word "cannot" and leave the rest.  Adding a note regarding functions seems appropriate.

Aside from being a bit more verbose there is nothing useful that writing this as "CHECK function()" provides that you don't also get by writing "CREATE TRIGGER". In a green field we'd probably lock down CHECK a bit more but there is too much code that is technically wrong but correctly functioning that we don't want to break.  IOW, we cannot mandate that the supplied function be immutable even though we should.  And we don't even enforce immutable execution if a function is defined that way.

​David J.​

On 6/5/2017 5:32 PM, David G. Johnston wrote:
On Mon, Jun 5, 2017 at 5:15 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote:  
From the docs:
https://www.postgresql.org/docs/9.6/static/sql-createtable.html
"Currently, CHECK expressions cannot contain subqueries nor refer to variables other than columns of the current row. The system column tableoid may be referenced, but not any other system column.

I wonder if that should say "should not," or be followed by something like this:


Make it say "must not" and I'd agree to change the word "cannot" and leave the rest.  Adding a note regarding functions seems appropriate.

Aside from being a bit more verbose there is nothing useful that writing this as "CHECK function()" provides that you don't also get by writing "CREATE TRIGGER". In a green field we'd probably lock down CHECK a bit more but there is too much code that is technically wrong but correctly functioning that we don't want to break.  IOW, we cannot mandate that the supplied function be immutable even though we should.  And we don't even enforce immutable execution if a function is defined that way.


indeed, any sort of constraint that invokes a function call which looks at other tables could later be invalidated if those other tables change, and postgres would be none the smarter.   the same goes for trigger based checks.



-- 
john r pierce, recycling bits in santa cruz
Aside from being a bit more verbose there is nothing useful that writing this as "CHECK function()" provides that you don't also get by writing "CREATE TRIGGER".

I agree you get the same result.  It may be a minor issue, but for me it is convenient to see the logic spelled out when using \d on the table.

Cheers,
Ken

--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

From
"David G. Johnston"
Date:
On Mon, Jun 5, 2017 at 5:40 PM, John R Pierce <pierce@hogranch.com> wrote:
​i​
ndeed, any sort of constraint that invokes a function call which looks at other tables could later be invalidated if those other tables change, and postgres would be none the smarter.   the same goes for trigger based checks.

​Yes.  I could imagine a new kind of "multi-referential trigger" that would specify all relations it touches and the function to fire when each of them is updated.  While you'd still have to write the functions correctly it would at least allow one to explicitly model the multi-table dynamic in pg_catalog.  Lacking that CHECK is no worse than TRIGGER and we've decided to say "use triggers".

David J.​
On 6/5/2017 5:49 PM, David G. Johnston wrote:
On Mon, Jun 5, 2017 at 5:40 PM, John R Pierce <pierce@hogranch.com> wrote:
​i​
ndeed, any sort of constraint that invokes a function call which looks at other tables could later be invalidated if those other tables change, and postgres would be none the smarter.   the same goes for trigger based checks.

​ Yes.  I could imagine a new kind of "multi-referential trigger" that would specify all relations it touches and the function to fire when each of them is updated.  While you'd still have to write the functions correctly it would at least allow one to explicitly model the multi-table dynamic in pg_catalog.  Lacking that CHECK is no worse than TRIGGER and we've decided to say "use triggers".


at $job, the policy is, AVOID ALL TRIGGERS AND FANCY CONSTRAINTS :)

they don't even like using foreign key references, and rely on code logic to do most joins in the performance-critical OLTP side of things.


-- 
john r pierce, recycling bits in santa cruz
On 06/05/2017 05:15 PM, Ken Tanzer wrote:
> Thanks Adrian and David.  That all makes sense, and I gather the answer
> regarding the existing dumps is "no, they can't be restored."  So be
> it.  Here's a couple of follow-on comments::
>
>     Ideally figure out how to write an actual FK constraint - otherwise
>     use triggers.
>
>
> I can't really make this an FK.  I can (and probably will) put this into
> a trigger.  Although it seems like an extra layer of wrapping just to
> call a function.  I'm curious if there's any conceptual reason why
> constraints couldn't (as an option) be restored after all the data is
> loaded, and whether there would be any negative consequences of that?  I
> could see if your data still didn't pass the CHECKs, it's already
> loaded.  But the constraint could then be marked not valid?

Not sure why just know that if I stay within the guidelines it works, if
I do not its does not work:)

>
>
>     -1; pg_dump should not be trying to restore things.​  The core
>     developers shouldn't really concern themselves with the various and
>     sundry ways people might want to setup such a process.  You have
>     tools for dump, and tools for restore, and you can combine them in
>     whatever fashion you deem useful.  Or otherwise acquire someone
>     else's ideas.
>
>
> I get that as a general principle.  OTOH, being able to restore your
> backups isn't just a random or inconsequential feature.  I have access
> to the superuser and can create DBs, but users in more locked down
> scenarios might not be able to do so.
>

See that, but in your scenario you wanted to create a 'scratch' database
so you are back to a user with privileges.  Then there is the whole
overhead of doing a restore twice. Basically, if you have no way to test
your backup/restore procedure before hand you are flying blind.


--
Adrian Klaver
adrian.klaver@aklaver.com


I can't really make this an FK.  I can (and probably will) put this into a trigger.  Although it seems like an extra layer of wrapping just to call a function.  I'm curious if there's any conceptual reason why constraints couldn't (as an option) be restored after all the data is loaded, and whether there would be any negative consequences of that?  I could see if your data still didn't pass the CHECKs, it's already loaded.  But the constraint could then be marked not valid?

Not sure why just know that if I stay within the guidelines it works, if I do not its does not work:)


That's fair enough, leaving aside the curiosity part.  Usually though the things you can't do just aren't allowed.  It's easier to overlook something that you shouldn't (but can) do!

 
See that, but in your scenario you wanted to create a 'scratch' database so you are back to a user with privileges. 


Yeah, I was thinking pg_dump could just conjure it up in the ether (and then discard it), but I can see that doesn't really work.


Basically, if you have no way to test your backup/restore procedure before hand you are flying blind.


In this case, we had tested the restore part.  But then we changed the DB in a way that made it stop working.  Good reminder to retest that periodically!

Ken




--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

From
"David G. Johnston"
Date:
On Mon, Jun 5, 2017 at 5:59 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote:
I can't really make this an FK.  I can (and probably will) put this into a trigger.  Although it seems like an extra layer of wrapping just to call a function.  I'm curious if there's any conceptual reason why constraints couldn't (as an option) be restored after all the data is loaded, and whether there would be any negative consequences of that?  I could see if your data still didn't pass the CHECKs, it's already loaded.  But the constraint could then be marked not valid?

Not sure why just know that if I stay within the guidelines it works, if I do not its does not work:)


That's fair enough, leaving aside the curiosity part.  Usually though the things you can't do just aren't allowed.  It's easier to overlook something that you shouldn't (but can) do!


​I find in life most things that are prohibited are actually doable - you're just punished if you get caught doing them.  In all seriousness though I agree it would be nice if that's how this worked; but decades of historical precedent makes actual preventive enforcement ​difficult if not impossible.

Since "test your backups" covers this potential problem, and so many possible others, any non-trivial effort to solve the actual problem is hard to justify spending time on.

I do get the "make \d show relevant information" argument and that is one that seems easier to solve, since adding explicit dependencies during trigger creation would be a purely new feature.

David J.

I do get the "make \d show relevant information" argument and that is one that seems easier to solve...

Maybe I'm missing something, but I'm not sure how you'd solve this or change what \d shows for a table.  Right now I get to see this in my \d:
"authorized_approvers_only" CHECK (approved_by IS NULL OR has_perm(approved_by, 'APPROVE_PAYMENT'::character varying, 'W'::character 
varying))
But when I move that to a trigger, I'll only see the trigger name.  Any while this procedure would be really short, others not so much, so you wouldn't really want to automatically display it inline.

Ken

--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: [GENERAL] Help with restoring a dump in Tar format? (dependencies/ordering)

From
"David G. Johnston"
Date:
On Mon, Jun 5, 2017 at 6:21 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote:
I do get the "make \d show relevant information" argument and that is one that seems easier to solve...

Maybe I'm missing something, but I'm not sure how you'd solve this or change what \d shows for a table.  Right now I get to see this in my \d:
"authorized_approvers_only" CHECK (approved_by IS NULL OR has_perm(approved_by, 'APPROVE_PAYMENT'::character varying, 'W'::character 
varying))
But when I move that to a trigger, I'll only see the trigger name.  Any while this procedure would be really short, others not so much, so you wouldn't really want to automatically display it inline.

​FWIW​

​I wouldn't show the trigger functions but I'd show something like:

CREATE ​trg_tbl2_exists_tbl3_missing_or_vice_versa 
TRIGGER ON tbl1 CHANGES EXECUTE func_tbl1
REFERENCES tbl2 CHANGES EXECUTE func_tbl2
REFERENCES tbl3 CHANGES EXECUTE func_tbl3;

FOR tbl1
DEPENDS ON tbl2, tbl3 VIA TRIGGER ​trg_tbl2_exists_tbl3_missing_or_vice_versa

​FOR tbl2
DEPENDED ON BY tbl1 VIA TRIGGER ​​trg_tbl2_exists_tbl3_missing_or_vice_versa

FOR tbl3
DEPENDED ON BY tbl1 VIA TRIGGER ​​trg_tbl2_exists_tbl3_missing_or_vice_versa

I suspect the possibility to enforce that trigger execution doesn't touch tables other than those specified.

​David J.

Ken Tanzer <ken.tanzer@gmail.com> writes:
> ...The rest of the DB is fine, but tbl_payment has 0 rows.  I believe this is
> because tbl_payment has a constraint that calls a function has_perm() that
> relies on data in a couple of other tables, and that tbl_payment is being
> restored before those tables.  I was able to created a new dump in Custom
> format, reorder the List file, and restore that successfully.

> So I can switch to Custom format for future backups.  But regarding the
> existing backups I have in Tar format, is there any way to successfully
> restore them?

FWIW, the business with making and editing a list file should work just
fine with a tar-format dump, not only with a custom-format dump.  The
metadata is all there in either case.

As already noted, it's hard to get pg_dump/pg_restore to cope
automatically with hidden dependencies like what you have here.
The fact that those other tables would need to be restored first
simply isn't visible to pg_dump.

            regards, tom lane


> So I can switch to Custom format for future backups.  But regarding the
> existing backups I have in Tar format, is there any way to successfully
> restore them?

FWIW, the business with making and editing a list file should work just
fine with a tar-format dump, not only with a custom-format dump.  The
metadata is all there in either case.

I had tried that originally, but got an error:

bash-4.1$ pg_restore -L spc_restore_list.tmp -d spc_test_1 agency_backup.spc.2017.06.05_10.30.01.tar 

pg_restore: [tar archiver] restoring data out of order is not supported in this archive format: "10608.dat" is required, but comes before "10760.dat" in the archive file.

The pg_dump doc page kinda suggests but doesn't quite say that you can't re-order tar files; between that and the error message I gave up on that possibility.  Are you suggesting it should work?

The alternative archive file formats must be used with pg_restore to rebuild the database. They allow pg_restore to be selective about what is restored, or even to reorder the items prior to being restored. The archive file formats are designed to be portable across architectures.

When used with one of the archive file formats and combined with pg_restorepg_dump provides a flexible archival and transfer mechanism. pg_dump can be used to backup an entire database, then pg_restore can be used to examine the archive and/or select which parts of the database are to be restored. The most flexible output file formats are the "custom" format (-Fc) and the "directory" format(-Fd). They allow for selection and reordering of all archived items, support parallel restoration, and are compressed by default. The "directory" format is the only format that supports parallel dumps.

Cheers,
Ken
--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.
On 06/05/2017 05:59 PM, Ken Tanzer wrote:

>
>
>     Not sure why just know that if I stay within the guidelines it
>     works, if I do not its does not work:)
>
>
> That's fair enough, leaving aside the curiosity part.  Usually though
> the things you can't do just aren't allowed.  It's easier to overlook
> something that you shouldn't (but can) do!

Yes, what you ran into is just a subset of a bigger issue. That being,
there are many ways you can dump a database and not get what you wanted
on the restore. Another example, that is similar, is using the -n switch
to pg_dump when you have cross schema references in the schema you did dump.


>
> Ken
>
>
>
>



--
Adrian Klaver
adrian.klaver@aklaver.com


Ken Tanzer <ken.tanzer@gmail.com> writes:
>> FWIW, the business with making and editing a list file should work just
>> fine with a tar-format dump, not only with a custom-format dump.  The
>> metadata is all there in either case.

> The pg_dump doc page kinda suggests but doesn't quite say that you can't
> re-order tar files; between that and the error message I gave up on that
> possibility.  Are you suggesting it should work?

[ sorry for slow response ]

Ah, right: you can reorder simple object declarations, but you can't
change the relative order in which TABLE DATA objects are restored.
This is because the code doesn't support seeking in the tar file,
so it has to either read or skip each table-data subfile as it comes
to it.

It seems to me that that's just a small matter of programming to fix,
but few people use the tar format so nobody's bothered.

            regards, tom lane