Thread: fine tuned database dump/reload?
Does postgresql have any facility to dump anything more fine grained than a database to a text file?

For example, to mention a bad word, MySQL's dump command allows you to specify individual tables to dump - and not only that - you can also specify a SQL query that must be satisfied for a row to make it into the dump file - this is extremely useful for the way that we use our database.

Also - how does postgresql handle foreign keys if you dump an entire database along with the create table statements? Because my foreign keys require that the tables be created in a certain order - MySQL fails on this front - but since it allows me to specify the tables to dump, and it dumps them in that order, I can specify the correct order.

PostgreSQL's pg_dump command seems rather limited in its abilities. Maybe I'm missing the command I'm looking for.

Thanks,

Dan
On Tue, 2005-10-11 at 11:20, Dan Armbrust wrote:
> PostgreSQL's pg_dump command seems rather limited in its abilities.
> Maybe I'm missing the command I'm looking for.

Actually, this is one of those instances where it only seems more limited than it really is. pg_dump can dump a whole database full of dependencies and restore it. MySQL makes you jump through hoops to do the same thing; hence, it has the ability to dump certain tables and parts of those tables so that YOU can do the work of figuring out what to tell it to get a coherent backup.

Admittedly, it would be nice to have some more selectivity in pg_dump, but for now the selectivity actually resides in pg_restore. I.e., back up the whole database, then restore what you need.

Now, features that would create a .bad or .log file during restores and toss bad tuples in there while continuing, or having a threshold and all that, would be very nice. But mostly, I find myself still liking the way postgresql does backups a lot more than the way oracle does them; even though sqlldr and such are far more powerful, they also require a lot more of the DBA to get simple jobs done.

Normally, when I need certain tables or parts of them, I just fork the database (create database mycopy with template masterdb) and then prune that copy down to what I need and back that up. No interruptions in service, and I can test just those data in the copy database to make sure it's what I really want.

It's not that MySQL is more featureful or powerful, just that PostgreSQL generally calls for a different way of looking at things. Don't be afraid of "big commands", like creating a database from another database as a template, for this kind of thing. PostgreSQL handles them well.
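[Editor's note: a minimal sketch of the fork-and-prune approach described above, run from psql. The database, table, and column names are hypothetical; CREATE DATABASE ... TEMPLATE requires that no one else is connected to the template database.]

  -- Fork the database:
  CREATE DATABASE mycopy WITH TEMPLATE masterdb;

  -- Connect to the copy and prune it down to just the rows you need:
  \c mycopy
  DELETE FROM mytable WHERE terminologyid <> 'foo';

  -- Then back up only the pruned copy, from the shell:
  --   pg_dump mycopy > mycopy.sql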
Dan Armbrust wrote:
> Does postgresql have any facility to dump anything more fine grained
> than a database to a text file?
>
> For example, to mention a bad word, MySQL's dump command allows you to
> specify individual tables to dump
<snip>
> PostgreSQL's pg_dump command seems rather limited in its abilities.
> Maybe I'm missing the command I'm looking for.

pg_dump --help
...
  -n, --schema=SCHEMA      dump the named schema only
  -s, --schema-only        dump only the schema, no data
  -t, --table=TABLE        dump the named table only
...
Dan Armbrust <daniel.armbrust.list@gmail.com> writes:
> Does postgresql have any facility to dump anything more fine grained
> than a database to a text file?

You can dump a single table with the -t option. Right now you can't specify an arbitrary set of tables to dump.

> For example, to mention a bad word, MySQL's dump command allows you to
> specify individual tables to dump - and not only that - you can also
> specify a SQL query that must be satisfied for a row to make it into
> the dump file - this is extremely useful for the way that we use our
> database.

You'd have to write custom client code for this, currently (or use 'psql', see below). People have been talking about making COPY able to read from views (and maybe from SELECT statements as well), so you might have more flexibility in a later version.

> Also - how does postgresql handle foreign keys if you dump an entire
> database along with the create table statements? Because my foreign
> keys require that the tables be created in a certain order - MySQL
> fails on this front - but since it allows me to specify the tables to
> dump, and it dumps them in that order, I can specify the correct
> order.

You don't have to worry about this. The SQL created by the dump adds all the foreign key constraints after populating the tables.

> PostgreSQL's pg_dump command seems rather limited in its
> abilities. Maybe I'm missing the command I'm looking for.

Its primary purpose historically has been to back up whole databases, and it does that very well. You might look at what 'psql' can do for you--it's pretty powerful and can be controlled from a script.

-Doug
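[Editor's note: for reference, a single-table dump with -t looks like this; the database and table names are placeholders.]

  # Schema + data for one table, as SQL on stdout:
  pg_dump -t mytable mydb > mytable.sql

  # Data only, for when the table already exists on the target:
  pg_dump -a -t mytable mydb > mytable_data.sql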
Bricklen Anderson wrote:
> pg_dump --help
> ...
>   -n, --schema=SCHEMA      dump the named schema only
>   -s, --schema-only        dump only the schema, no data
>   -t, --table=TABLE        dump the named table only
> ...

Just after I sent the e-mail, I realized that I forgot that I saw that option - but on my first reading of the description (and being an annoying user who didn't try it first) it didn't appear that it would allow me to specify more than one table.

But I tested it, and it does let me specify the -t option more than once. So that gets me past that issue.

Dan
Dan Armbrust wrote:
> But I tested it, and it does let me specify the -t option more than
> once. So that gets me past that issue.

Now I'm just filling the mailing list with misinformation. It actually ignores all but the last -t flag - so this only allows me to specify one table at a time, rather than several tables.

I need to write up my use case so the maintainers can see why I want to be able to dump things in such a specific way - it's not for backup purposes - so I'm kind of misusing the intent of the tool. More info in a bit.

Dan
Dan Armbrust wrote:
> Now I'm just filling the mailing list with misinformation. It actually
> ignores all but the last -t flag - so this only allows me to specify one
> table at a time, rather than several tables.

Disregard my last reply, yours hadn't arrived yet.
My use case is not so much for database backup purposes as it is for fine grained export and import.

Our database schema consists of 15 tables that represent a terminology. Each database can hold multiple terminologies - every table has a terminologyId column which distinguishes one terminology from another.

I now realize that I don't need to do anything special with individual tables since PostgreSQL handles foreign keys properly - so that question is void - I'm fine with working with a whole database at a time.

However, quite often I will load up a terminology on our development or test server (which also contains many other very large terminologies). When I am satisfied that it is ready, I want to put this terminology onto our production servers. But I don't want to have to export millions of rows that I don't need.

With MySQL, the ability to do a dump of data which satisfies a where clause (for example: terminologyId='foo') gives me the ability to dump the data out to a file very quickly, move the file to the server it needs to be loaded on, and then re-load that data into the production database.

In PostgreSQL, the only way that I see to do this is to have my desired data in a database all by itself, so that I can use pg_dump to back up the entire database. Then I can load that into the existing production database.

Is there a better way to do this? Is there a flag I could specify for psql that would cause it to output INSERT or COPY statements as a result of a query - select * from foo where terminologyId=foo? Then I could just have 15 select statements batched up in a file, and pipe the output into a new file.

I suppose this is kind of an obscure use case - but a flag on pg_dump where I could specify a where condition would certainly be handy.

Thanks,

Dan
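[Editor's note: a sketch of the whole-database route Dan describes, using a data-only dump so it can be loaded into a production database where the tables already exist. Database names are hypothetical, and as discussed later in the thread, data-only loads can still hit foreign key ordering problems.]

  # On staging: dump only the data from the single-terminology copy
  pg_dump -a staging_term_db > term_data.sql

  # On production: load it into the existing database
  psql production_db -f term_data.sql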
Dan Armbrust wrote:
> Now I'm just filling the mailing list with misinformation. It actually
> ignores all but the last -t flag - so this only allows me to specify one
> table at a time, rather than several tables.
There's always the low-tech option of:
for table in `cat /tmp/table_list.txt`
do
    # Quote the name and append each table's dump to a single file; the dumps
    # land in list-file order, so order the list to satisfy any foreign keys
    pg_dump -t "$table" my_db >> /tmp/my_tables.dmp
done
Dan Armbrust wrote:
> Is there a better way to do this? Is there a flag I could specify for
> psql that would cause it to output INSERT or COPY statements as a result
> of a query - select * from foo where terminologyId=foo? Then I could
> just have 15 select statements batched up in a file, and pipe the output
> into a new file.

It's easy enough to get psql to output tab-separated columns to a file from its query (see the various backslash formatting commands and \o). Spit out the 15 different files you want to /tmp, then use a script with COPY ... FROM /tmp/file1 etc.

If you use COPY the files need to be readable by the backend process, otherwise look into \copy in a psql script.

If it gets much more complicated, I knock up a short Perl script. Others probably prefer Python or Ruby.

--
Richard Huxton
Archonet Ltd
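[Editor's note: a rough sketch of the psql round trip Richard describes; file paths, table, and column names are placeholders.]

  -- export.sql: run as  psql mydb -f export.sql
  -- tuples only, unaligned, tab-separated, output to a file:
  \t
  \a
  \f '\t'
  \o /tmp/file1
  SELECT * FROM table1 WHERE terminologyid = 'foo';
  \o

  -- then, on the target database (file must be readable by the backend):
  COPY table1 FROM '/tmp/file1';
  -- or, from any client, in a psql script:
  -- \copy table1 from /tmp/file1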
The TODO list has an open item to allow multiple -t options and other complexity.

---------------------------------------------------------------------------

Dan Armbrust wrote:
> Now I'm just filling the mailing list with misinformation. It actually
> ignores all but the last -t flag - so this only allows me to specify one
> table at a time, rather than several tables.
>
> I need to write up my use case so the maintainers can see why I want to
> be able to dump things in such a specific way - it's not for backup
> purposes - so I'm kind of misusing the intent of the tool. More info in
> a bit.

--
Bruce Momjian  http://candle.pha.pa.us
Since no one else has mentioned it, there has been discussion this week on one of the lists (probably -general or -hackers) about expanding the capabilities of pg_dump. I've advocated for allowing a file that specifies what objects to dump and what kind of filtering to apply to the name. Allowing for only dumping certain rows in a table (applying a WHERE clause) would be a neat addition to that.

--
Jim C. Nasby, Sr. Engineering Consultant  jnasby@pervasive.com
Jim C. Nasby wrote:
> I've advocated for allowing a file that specifies what objects to dump
> and what kind of filtering to apply to the name. Allowing for only
> dumping certain rows in a table (applying a WHERE clause) would be a
> neat addition to that.

That would be exactly what I need - the ability to apply a where clause. That - and the ability to specify multiple tables to dump, instead of just one.

Because currently, I still get into foreign key trouble in this scenario:

I dump data (and only data) from database A. This data has foreign key constraints. I want to load this data into database B - where the tables already exist, they already have foreign keys, and there is some existing data.

My dump from database A won't load - because the tables were dumped in the wrong order to satisfy the foreign keys on the reload - so the only way I can load this data back into an existing database is to dump each table individually and reload in the proper order, or manually drop my foreign keys while I load the data.

Both ways are a pain - and it makes it overly difficult to export/import chunks of data from a database.

Dan
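[Editor's note: the drop-and-recreate workaround Dan mentions looks something like this; table, column, and constraint names are hypothetical.]

  -- On database B, before loading the data-only dump:
  ALTER TABLE child_table DROP CONSTRAINT child_parent_fk;

  -- ... load the dumped data here ...

  -- Re-create the constraint afterwards; this re-validates every row:
  ALTER TABLE child_table
    ADD CONSTRAINT child_parent_fk
    FOREIGN KEY (parent_id) REFERENCES parent_table (id);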
pg_dump handles table ordering properly.

On Fri, Oct 14, 2005 at 10:24:55AM -0500, Dan Armbrust wrote:
> My dump from database A won't load - because the tables were dumped in
> the wrong order to satisfy the foreign keys on the reload ...

--
Jim C. Nasby, Sr. Engineering Consultant  jnasby@pervasive.com
Jim C. Nasby wrote:
> pg_dump handles table ordering properly.

Maybe I missed something then, because it didn't last time I tried to move some data. I had to drop my foreign keys before I could reload it.

Dan
Dan Armbrust <daniel.armbrust.list@gmail.com> writes:
> Maybe I missed something then, because it didn't last time I tried to
> move some data. I had to drop my foreign keys before I could reload
> it.

What version was this? Older versions definitely had some problems with table ordering. In 8.0.X, the tables in the dump all get loaded before the FK constraints are created, and I have never seen a problem.

-Doug
Douglas McNaught <doug@mcnaught.org> writes:
> What version was this? Older versions definitely had some problems
> with table ordering. In 8.0.X, the tables in the dump all get loaded
> before the FK constraints are created, and I have never seen a
> problem.

Even in current versions, you have to do a full (schema+data) dump for it to work "right". Data-only dumps are prone to FK problems. This is insoluble in the general case, though you can work around it to some extent with the --disable-triggers option.

regards, tom lane
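[Editor's note: for the data-only case Tom mentions, that workaround looks roughly like this. Database names are placeholders; with triggers disabled the FK checks are skipped entirely, so the data must already be consistent.]

  # Data-only dump that emits commands to disable triggers during reload:
  pg_dump -a --disable-triggers mydb > data.sql

  # The restore must run as a superuser for the trigger disabling to work:
  psql -U postgres target_db -f data.sql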
On Oct 17, 2005, at 9:34 AM, Dan Armbrust wrote:
> Maybe I missed something then, because it didn't last time I tried
> to move some data. I had to drop my foreign keys before I could
> reload it.

This is my experience as well. pg_dump doesn't always work in the presence of FKs.
Vivek Khera wrote:
> This is my experience as well. pg_dump doesn't always work in the
> presence of FKs.
I've had few problems, and every one has been fixed by a newer version of pg_dump. I would imagine a mode that dumps for an older version would be useful, so that if you found something in 7.4 that didn't dump with 7.4's pg_dump, but can be dumped with 8.0, it wouldn't use any 8.0isms in its dump. Or does that already happen automagically? I can't tell from the docs that it does.
I have seen people show up with issues with dumps and seen them fixed fairly quickly, either with a workaround via pg_restore or a patch.
What problems have you run into lately? And could pg_restore fix them, or were they too hairy?