Thread: Re: Proposal: More flexible backup/restore via pg_dump

Re: Proposal: More flexible backup/restore via pg_dump

From: Philip Warner

At 10:17 26/06/00 +0200, Zeugswetter Andreas SB wrote:
>
>A problem I see with an index at file end is that you will need to read the
>file twice, and that may be very undesirable if e.g. the backup is on tape
>or a compressed file.

The proposal has actually come a fairly long way after extensive
discussions with Tom Lane, and I have added the current plans at the end of
this message. The TOC-at-end problem is an issue that I am trying to deal
with; I am planning a 'custom' format that has the large parts (data dumps)
compressed, to avoid compressing the entire file. This means
that you would not need to uncompress the entire file to get to the TOC, or
to restore just the schema. It also allows good random access to defns and
data. I'm also considering putting the dumped data at the end of the file,
but this has issues when you want to restore table data before defining
indexes, for example.

I must admit that I've been working on the assumption that people using
PostgreSQL don't have multi-GB (compressed) database dumps, so that (in
theory) a restore can be loaded onto disk from tape before being used. I
know this is pretty evil, but it will cover 95% of users. For those people
with huge backups, they will have to suffer tapes that go backwards and
forwards a bit. From the details below, you will see that this is unavoidable.

Sanity Check: does fseek work on tapes? If not, what is the correct way to
read a particular block/byte from a file on a tape?

-----------------------------------------------------------
Updated Proposal:
-------------------------

For the sake of argument, call the new utilities pg_backup and pg_restore.

pg_backup
---------

Dump schema [and data] in OID order (to try to make restores sequential,
for when tar/tape storage is used). Each dumped item has a TOC entry which
includes the OID and description, and for those items for which we know
some dependencies (functions for types & aggregates; types for tables;
superclasses for classes; any more?), it will also dump the dependency OIDs.

Each object (table defn, table data, function defn, type defn etc) is
dumped to a separate file/thing in the output file. The TOC entries go into
a separate file/thing (probably only one file/thing for the whole TOC).
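
To make this concrete, a TOC entry would carry something like the
following (a minimal sketch in C; every name and field here is an
illustrative guess, not a settled layout):

typedef struct
{
    unsigned int  oid;          /* OID of the dumped object */
    char         *desc;         /* eg. "TABLE", "FUNCTION", "DATA" */
    unsigned int *depOids;      /* dependency OIDs, where known */
    int           nDeps;
    long          offset;       /* where this item's (compressed) data
                                 * member starts, for random access */
} TocEntry;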

The output scheme will be encapsulated, and in the initial version will be
a custom format (since I can't see an API for tar files), and a
dump-to-a-directory format. Future use of tar, DB, PostgreSQL or even a
Make file should not be excluded in the IO design. This last goal *may* not
be achieved, but I don't see why it can't be at this stage. Hopefully
someone with appropriate skills & motivation can do a tar archive 8-}.

The result of a pg_backup should be a single file with metadata and
optional data, along with whatever dependency and extra data is available
to pg_backup, or provided by the DBA.


pg_restore
----------

Reads a backup file and dumps SQL suitable for sending to psql.

Options will include:

- No Data (--no-data? -nd? -s?)
- No metadata (--no-schema? -ns? -d?)
- Specification of items to dump from an input file; this allows custom
ordering AND custom selection of multiple items. Basically, I will allow
the user to dump part of the TOC, edit it, and tell pg_restore to use the
edited partial TOC. (--item-list=<file>? -l=<file>?)
- Dump TOC (--toc-only? -c?)

[Wish List]
- Data For a single table (--table=<name>? -t=<name>)
- Defn/Data for a single OID; (--oid=<oid>? -o=<oid>?)
- User defined dependencies. Allow the DB developer to specify once for
their DB what the dependencies are, then use that file as a guide to the
restore process. (--deps=<file> -D=<file>)

pg_restore will use the same custom IO routines to allow IO to
tar/directory/custom files. In the first pass, I will do custom file IO.

If a user selects to restore the entire metadata, then it will be dumped
according to the default policy (OID order). If they select to specify the
items from an input file, then the file ordering is used.


-------

Typical backup procedure:
   pg_backup mydb mydb.bkp

or *maybe* 
   pg_backup mydb > mydb.bkp

BUT AFAIK, fseek does not work on STDOUT, and at the current time pg_backup
will use fseek.
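
The check itself is cheap, at least. A sketch in plain C (the helper
name is made up):

#include <stdio.h>

/* fseek fails (eg. with ESPIPE) on pipes and most tape devices, so a
 * no-op seek tells us whether fseek-based output will work at all. */
static int
output_is_seekable(FILE *fp)
{
    return fseek(fp, 0L, SEEK_CUR) == 0;
}

int
main(void)
{
    if (!output_is_seekable(stdout))
        fprintf(stderr, "stdout is not seekable; cannot rewrite TOC\n");
    return 0;
}

So pg_backup could at least detect a pipe and fail gracefully.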


Typical restore procedure:
   pg_restore mydb mydb.bkp | psql 

A user will be able to extract only the schema (-s), only the data (-d), a
specific table (-t=name), or even edit the object order and selection via:
   pg_restore --dump-toc mydb.bkp > mytoc.txt
   vi mytoc.txt   {ie. reorder TOC elements as per known dependency problems}
   pg_restore --item-list=mytoc.txt mydb.bkp | psql
 

FWIW, I envisage the --dump-toc output to look like:

ID; FUNCTION FRED(INT4)
ID; TYPE MY_TYPE
ID; TABLE MY_TABLE
ID; DATA MY_TABLE
ID; INDEX MY_TABLE_IX1
...etc.

so editing and reordering the dump plan should not be too onerous.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Proposal: More flexible backup/restore via pg_dump

From: Giles Lean

> I must admit that I've been working on the assumption that people using
> PostgreSQL don't have multi-GB (compressed) database dumps, so that (in
> theory) a restore can be loaded onto disk from tape before being
> used.

Are you also assuming that a backup fits in a single file,
i.e. that anyone with >2GB of backup has some sort of large file
support?

> Sanity Check: does fseek work on tapes? If not, what is the correct way to
> read a particular block/byte from a file on a tape?

As someone else answered: no.  You can't portably assume random access
to tape blocks.

> The output scheme will be encapsulated, and in the initial version will be
> a custom format (since I can't see an API for tar files)

You can use a standard format without there being a standard API.

Using either tar or cpio format as defined for POSIX would allow a lot
of us to understand your on-tape format with a very low burden on you
for documentation.  (If you do go this route you might want to think
about cpio format; it is less restrictive about filename length than
tar.)
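
For reference, the POSIX ustar header is a fixed 512-byte block of
NUL/space-padded ASCII fields, with numeric fields stored as octal
strings. In C it looks like this:

struct ustar_header
{
    char name[100];     /* the 100-byte name limit mentioned above */
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];      /* octal string */
    char mtime[12];
    char chksum[8];
    char typeflag;
    char linkname[100];
    char magic[6];      /* "ustar" plus NUL */
    char version[2];    /* "00" */
    char uname[32];
    char gname[32];
    char devmajor[8];
    char devminor[8];
    char prefix[155];   /* ustar's extension of the name field */
    char pad[12];       /* padding out to 512 bytes */
};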

There is also plenty of code lying around for reading and writing tar
and cpio formats that you could steal^H^H^H^H^H reuse.  The BSD pax
code should have a suitable license.

> pg_restore will use the same custom IO routines to allow IO to
> tar/directory/custom files. In the first pass, I will do custom file
> IO.

Presumably you'd expect this file I/O to be through some standard API
that other backends would also use?  I'd be interested to see this;
I've got code for an experimental libtar somewhere around here, so I
could offer comments at least.

> BUT AFAIK, fseek does not work on STDOUT, and at the current time pg_backup
> will use fseek.

Whether fseek works on standard output depends on what standard output
is.  If it's a pipe, no.  If it's a file, yes.  If it's a tape, no.  If
it's a ...

Not using fseek() would be a win if you can see a way to do it.

Regards,

Giles


Re: Proposal: More flexible backup/restore via pg_dump

From: Philip Warner

At 07:00 27/06/00 +1000, Giles Lean wrote:
>
>Are you also assuming that a backup fits in a single file,
>i.e. that anyone with >2GB of backup has some sort of large file
>support?

That's up to the format used to save the database; in the case of the
'custom' format, yes. But that is the size after compression. This is not
substantially different to pg_dump's behaviour, except that pg_dump can be
piped to a tape drive...

The objectives of the API components are to (a) make it very easy to add new
metadata to dump (eg. tablespaces), and (b) make it easy to add new output
formats (eg. tar archives). Basically the metadata dumping side makes one
call to register the thing to be saved, passing an optional function
pointer to dump data (eg. table contents) - this *could* even be used to
implement dumping of BLOBs.
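
The registration call would look something like this (a sketch only;
every name and parameter is provisional):

typedef int (*DataDumperPtr)(void *archiveHandle, void *dumpArg);

void ArchiveEntry(void *archiveHandle,
                  unsigned int oid,      /* OID of the object */
                  const char *desc,      /* "TABLE", "FUNCTION", ... */
                  const char *defn,      /* SQL to recreate it */
                  DataDumperPtr dumpFn,  /* optional data dumper; NULL if
                                          * there is no data. This is the
                                          * hook BLOB dumping could use. */
                  void *dumpArg);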

The 'archiver' format provider must supply some basic IO routines
(Read/WriteBuf and Read/WriteByte) and a number of hook functions which
it can use to output the data. It needs to provide at least one function
that actually writes data somewhere. It also has to provide the associated
function to read the data.
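
In other words, each format provider fills in a table along these
lines (again provisional, not a final interface):

#include <stddef.h>

typedef struct
{
    size_t (*WriteBuf)(void *AH, const void *buf, size_t len);
    size_t (*ReadBuf)(void *AH, void *buf, size_t len);
    int    (*WriteByte)(void *AH, int b);
    int    (*ReadByte)(void *AH);
    /* ...plus hooks called as each TOC entry and data member is
     * written or read... */
} ArchiverHooks;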

>
>As someone else answered: no.  You can't portably assume random access
>to tape blocks.

This is probably an issue. One of the motivations for this utility is to
allow partial restores (eg. table data for one table only), and
arbitrarily ordered restores. But I may have a solution:

Write the schema and TOC out at the start of the file/tape, then compressed
data with headers for each indicating which TOC item they correspond to.
This metadata can be loaded into /tmp, so fseek is possible. The actual
data restoration (assuming constraints are not defined [THIS IS A PROBLEM])
can be done by scanning the rest of the tape in its own order, since RI
will not be an issue. I think I'm happy with this.
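
The restore pass then degenerates into a single sequential scan, which
is tape-friendly. Roughly (hypothetical names, header format
simplified):

#include <stdio.h>

typedef struct
{
    int  tocId;     /* which TOC item this data member belongs to */
    long len;       /* length of the (compressed) member */
} MemberHeader;

/* Look the tocId up in the TOC already loaded from the front of the
 * tape and emit the restore SQL; here we just skip the bytes so the
 * scan stays aligned. */
static void
restore_member(FILE *in, int tocId, long len)
{
    long i;

    for (i = 0; i < len; i++)
        if (fgetc(in) == EOF)
            return;
}

static void
scan_and_restore(FILE *in)
{
    MemberHeader hdr;

    /* purely sequential: no fseek needed, so tapes are fine */
    while (fread(&hdr, sizeof(hdr), 1, in) == 1)
        restore_member(in, hdr.tocId, hdr.len);
}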

But the problem is the constraints: AFAIK there is no 'ALTER TABLE ADD
CONSTRAINT...' so PK, FK, Not Null constraints have to be applied before
data load (*please* tell me I'm wrong). This also means that for large
databases, I should apply indexes to make PK/FK checks fast, but they will
slow data load.

Any ideas?


>> The output scheme will be encapsulated, and in the initial version will be
>> a custom format (since I can't see an API for tar files)
>
>You can use a standard format without there being a standard API.

Being a relatively lazy person, I was hoping to leave that as an exercise
for the reader...


>Using either tar or cpio format as defined for POSIX would allow a lot
>of us to understand your on-tape format with a very low burden on you
>for documentation.  (If you do go this route you might want to think
>about cpio format; it is less restrictive about filename length than
>tar.)

Tom Lane was also very favorably disposed to tar format. As I said above,
the archive interfaces should be pretty amenable to adding tar support -
it's just I'd like to get a version working with custom and directory based
formats to ensure the flexibility is there. As I see it, the 'backup to
directory' format should be easy to use as a basis for the 'backup to tar'
code.

The problem I have with tar is that it does not support random access to
the associated data. For reordering large backups, or (ultimately) single
BLOB extraction, this is a performance problem.

If you have a tar spec (or suitably licensed code), please mail it to me,
and I'll be able to make more informed comments.


>Presumably you'd expect this file I/O to be through some standard API
>that other backends would also use?  I'd be interested to see this;
>I've got code for an experimental libtar somewhere around here, so I
>could offer comments at least.

No problem: I should have a working version pretty soon. The API is
strictly purpose-built; it would be adaptable to a more general archive
format, but as you say, tar is fine for most purposes.


>> BUT AFAIK, fseek does not work on STDOUT, and at the current time pg_backup
>> will use fseek.
>
>Not using fseek() would be a win if you can see a way to do it.

I think I probably can, if I can work my way around the RI problems.
Unfortunately the most effective solution will be to allow reordering of
the table data restoration, but that requires multiple passes through the
file to find the table data...


Bye for now,

Philip



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Proposal: More flexible backup/restore via pg_dump

From: Giles Lean

> If you have a tar spec (or suitably licensed code), please mail it to me,
> and I'll be able to make more informed comments.

The POSIX tar format is documented as part of the "Single Unix
Specification, version 2" which is available from:

http://www.opengroup.org
http://www.opengroup.org/publications/catalog/t912.htm

You can download the standard as HTML.  They keep moving the location
around, so if the second URL breaks, start from the top. They do want an
email address from you, and they will spam this address with
invitations to conferences.  There's no such thing as a free lunch, I
guess.

For source code, any FreeBSD, NetBSD, or OpenBSD mirror will have pax
which understands both cpio and tar format and is BSD licensed:

ftp://ftp.au.netbsd.org/pub/NetBSD/NetBSD-current/src/bin/pax/

Regards,

Giles



Re: Proposal: More flexible backup/restore via pg_dump

From: Philip Warner

At 13:10 27/06/00 -0700, Stephan Szabo wrote:
>
>Actually, there is an ALTER TABLE ADD CONSTRAINT for foreign key
>constraints.
>

This is good to know; presumably at some stage in the future the rest will
be added, and the backup/restore can be amended to apply constraints after
data load. In the meantime, I suppose people with tape drives who need to
reorder the data load will have to make multiple passes (or copy the file
locally).



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Proposal: More flexible backup/restore via pg_dump

From: Stephan Szabo

On Tue, 27 Jun 2000, Philip Warner wrote:

> But the problem is the constraints: AFAIK there is no 'ALTER TABLE ADD
> CONSTRAINT...' so PK, FK, Not Null constraints have to be applied before
> data load (*please* tell me I'm wrong). This also means that for large
> databases, I should apply indexes to make PK/FK checks fast, but they will
> slow data load.

Actually, there is an ALTER TABLE ADD CONSTRAINT for foreign key
constraints.  Of course, if the existing data fails the constraint, the
constraint doesn't get made, and if you're in a transaction, it'll force
a rollback.
In fact, you really can't always apply foreign key constraints at
schema reload time, because you can have tables with circular
dependencies.  Those would have to be created after data load.



Re: Proposal: More flexible backup/restore via pg_dump

From: Bruce Momjian

Can I ask for a status?

> [Philip Warner's proposal of 26 June, quoted in its entirety, snipped;
> see the first message in this thread]


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: Proposal: More flexible backup/restore via pg_dump

From: Philip Warner

At 20:49 10/10/00 -0400, Bruce Momjian wrote:
>Can I ask on a status?

Done.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/