Re: pg_dump additional options for performance - Mailing list pgsql-patches
| From | Joshua D. Drake |
|---|---|
| Subject | Re: pg_dump additional options for performance |
| Date | |
| Msg-id | 1217095404.16378.184.camel@jd-laptop |
| In response to | Re: pg_dump additional options for performance (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: pg_dump additional options for performance |
| List | pgsql-patches |
On Sat, 2008-07-26 at 13:43 -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > I dislike, and doubt that I'd use, this approach. At the end of the
> > day, it ends up processing the same (very large amount of data) multiple
> > times.
>
> Well, that's easily avoided: just replace the third step by restoring
> directly to the target database.
>
> pg_restore --schema-before-data whole.dump >before.sql
> edit before.sql
> pg_restore --schema-after-data whole.dump >after.sql
> edit after.sql
> psql -f before.sql target_db
> pg_restore --data-only -d target_db whole.dump
> psql -f after.sql target_db

It seems to me we continue to hack at a solution without a clear idea of the problems involved. There are a number of what I would consider significant issues with PostgreSQL's backup/restore facilities as a whole:

1. We use text-based backups, even with the custom format. We need a fast binary representation as well.

2. We have no concurrency, which means anyone with a database over 50G has unacceptable restore times.

3. We have to keep developing hacks to customize how a backup is used. Why am I passing pre-data anything? It should be automatic. For example:

pg_backup (not dump; we aren't dumping. Dumping is usually associated with some sort of crash or foul human behavior. We are backing up.)

    pg_backup -U <user> -D database -F -f mybackup.sqlc

If I were to extract <mybackup.sqlc> I would get:

    mybackup.datatypes
    mybackup.tables
    mybackup.data
    mybackup.primary_keys
    mybackup.indexes
    mybackup.constraints
    mybackup.grants

All would be the SQL representation. Further, I could do this:

    pg_restore -U <user> -D <database> --data-types -f mybackup.sqlc

Which would restore just the SQL representation of the data types. Or:

    pg_restore -U <user> -D <database> --tables -f mybackup.sqlc

Which would restore *only* the tables. Yes, it would error if I didn't also specify --data-types.

Further, we need concurrency capability. Once we have restored data types and tables, there is zero reason not to launch multiple connections for the data (and then primary keys and indexes), so:

    pg_restore -U <user> -D <database> -C 4 --full -f mybackup.sqlc

Which would launch four connections to the database and perform a full restore per mybackup.sqlc. (A rough sketch of approximating this with existing tools is at the end of this message.)

Oh, and pg_dumpall? It should have been removed right around the release of 7.2. pg_dump -A, please.

Anyway, I leave other peeps to flame me into oblivion.

Sincerely,

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
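
As a rough, untested sketch only: the concurrency idea above can be approximated with the options already discussed in this thread (the proposed --schema-before-data/--schema-after-data switches plus the existing --data-only and --table), by running one data-only pg_restore per table in the background. The table names below are placeholders, not from the original message:

    # restore the "before" schema, then load table data with parallel jobs
    pg_restore --schema-before-data whole.dump | psql target_db
    for t in customers orders order_items; do
        pg_restore --data-only --table="$t" -d target_db whole.dump &
    done
    wait   # let every data load finish before building indexes and constraints
    pg_restore --schema-after-data whole.dump | psql target_db

Each background pg_restore opens its own connection, which is roughly the effect the hypothetical -C 4 flag would have, minus pg_restore doing the scheduling itself.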