Re: pg_dump additional options for performance - Mailing list pgsql-patches

From Joshua D. Drake
Subject Re: pg_dump additional options for performance
Date
Msg-id 1217095404.16378.184.camel@jd-laptop
Whole thread Raw
In response to Re: pg_dump additional options for performance  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg_dump additional options for performance
List pgsql-patches
On Sat, 2008-07-26 at 13:43 -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:

> > I dislike, and doubt that I'd use, this approach.  At the end of the
> > day, it ends up processing the same (very large amount of data) multiple
> > times.
>
> Well, that's easily avoided: just replace the third step by restoring
> directly to the target database.
>
> pg_restore --schema-before-data whole.dump >before.sql
> edit before.sql
> pg_restore --schema-after-data whole.dump >after.sql
> edit after.sql
> psql -f before.sql target_db
> pg_restore --data-only -d target_db whole.dump
> psql -f after.sql target_db

It seems to me we continue to hack a solution without a clear idea of
the problems involved. There are a number of what I would consider
significant issues with the backup / restore facilities as a whole with
PostgreSQL.

1. We use text based backups, even with custom format. We need a fast
binary representation as well.

2. We have no concurrency which means, anyone with any database over 50G
has unacceptable restore times.

3. We have to continue develop hacks to define custom utilization. Why
am I passing pre-data anything? It should be automatic. For example:

pg_backup (not dump, we aren't dumping. Dumping is usually associated
with some sort of crash or fould human behavoir. We are backing up).
   pg_backup -U <user> -D database -F -f mybackup.sqlc

If I were to extract <mybackup.sqlc> I would get:

  mybackup.datatypes
  mybackup.tables
  mybackup.data
  mybackup.primary_keys
  mybackup.indexes
  mybackup.constraints
  mybackup.grants

All would be the SQL representation.

Further I could do this:

   pg_restore -U <user> -D <database> --data-types -f mybackup.sqlc

Which would restore just the SQL representation of the data types.

Or:

   pg_restore -U <user> -D <database> --tables -f mybackup.sqlc

Which would restore *only* the tables. Yes it would error if I didn't
also specify --data-types.

Further we need to have concurrency capability. Once we have restored
datatypes and tables, there is zero reason not to launch connections on
data (and then primary keys and indexes) so:

   pg_restore -U <user> -D <database> -C 4 --full -f mybackup.sqlc

Which would launch four connections to the database, and perform a full
restore per mybackup.sqlc.

Oh and pg_dumpall? It should have been removed right around the release
of 7.2, pg_dump -A please.

Anyway, I leave other peeps to flame me into oblivion.

Sincerely,

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




pgsql-patches by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_dump additional options for performance
Next
From: daveg
Date:
Subject: Re: pg_dump additional options for performance