Thread: fine tuned database dump/reload?

fine tuned database dump/reload?

From: Dan Armbrust

Does postgresql have any facility to dump anything more fine grained
than a database to a text file?

For example, to mention a bad word, MySQL's dump command allows you to
specify individual tables to dump - and not only that - you can also
specify a SQL query that must be satisfied for a row to make it into the
dump file - this is extremely useful for the way that we use our database.

Also - how does postgresql handle foreign keys if you dump an entire
database along with the create table statements?  Because my foreign
keys require that the tables be created in a certain order - MySQL fails
on this front - but since it allows me to specify the tables to dump,
and it dumps them in that order, I can specify the correct order.

PostgreSQL's pg_dump command seems rather limited in its abilities.
Maybe I'm missing the command I'm looking for.

Thanks,

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/

Re: fine tuned database dump/reload?

From: Scott Marlowe

On Tue, 2005-10-11 at 11:20, Dan Armbrust wrote:
> Does postgresql have any facility to dump anything more fine grained
> than a database to a text file?
>
> For example, to mention a bad word, MySQL's dump command allows you to
> specify individual tables to dump - and not only that - you can also
> specify a SQL query that must be satisfied for a row to make it into the
> dump file - this is extremely useful for the way that we use our database.
>
> Also - how does postgresql handle foreign keys if you dump an entire
> database along with the create table statements?  Because my foreign
> keys require that the tables be created in a certain order - MySQL fails
> on this front - but since it allows me to specify the tables to dump,
> and it dumps them in that order, I can specify the correct order.
>
> PostgreSQL's pg_dump command seems rather limited in its abilities.
> Maybe I'm missing the command I'm looking for.

Actually, this is one of those instances where pg_dump only seems more
limited than it really is.

pg_dump can dump a whole database full of dependencies, and the dump
can be restored as-is.

MySQL makes you jump through hoops to do the same thing.  Hence, it has
the ability to dump certain tables and parts of those tables so that YOU
can do the work of figuring out what to tell it to get a coherent
backup.

Admittedly, it would be nice to have some more selectivity in pg_dump,
but for now, it's actually in pg_restore where the selectivity resides.
I.e., back up the whole database, then restore what you need.
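
For instance, a rough sketch (database and table names here are made up):

  # dump the whole database in custom format
  pg_dump -Fc mydb > mydb.dump

  # later, restore just the one table you need from the archive
  pg_restore -d otherdb -t mytable mydb.dump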

Now, features that would create a .bad or .log file during restores and
toss bad tuples in there while continuing, or having a threshold and all
that, would be very nice.  But mostly, I find myself still liking the
way postgresql does backups a lot more than the way oracle does them;
even though sqlldr and such are far more powerful, they also require a
lot more of the DBA to get simple jobs done.

Normally, when I need certain tables or parts of them, I just fork the
database (create database mycopy with template masterdb) and then prune
that database down to what I need and back that up.  No interruptions in
service, and I can test just those data in that copy database to make
sure it's what I really want.
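
In shell terms that workflow is roughly this (names made up; note that
creating a database from a template requires that the template have no
active connections):

  # fork: create mycopy using masterdb as its template
  createdb -T masterdb mycopy

  # prune the copy down to just the rows you want
  psql mycopy -c "DELETE FROM mytable WHERE keep = false"

  # back up what's left
  pg_dump mycopy > subset.sql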

It's not that MySQL is more featureful or powerful, just that PostgreSQL
generally calls for a different way of looking at things.  Don't be
afraid of "big commands", like creating a database from another database
as a template for this kind of thing. PostgreSQL handles them well.

Re: fine tuned database dump/reload?

From: Bricklen Anderson

Dan Armbrust wrote:
> Does postgresql have any facility to dump anything more fine grained
> than a database to a text file?
>
> For example, to mention a bad word, MySQL's dump command allows you to
> specify individual tables to dump
<snip>
> PostgreSQL's pg_dump command seems rather limited in its abilities.
> Maybe I'm missing the command I'm looking for.
>
> Thanks,
>
> Dan
>

pg_dump --help
...
  -n, --schema=SCHEMA      dump the named schema only
  -s, --schema-only        dump only the schema, no data
  -t, --table=TABLE        dump the named table only
...


Re: fine tuned database dump/reload?

From: Douglas McNaught

Dan Armbrust <daniel.armbrust.list@gmail.com> writes:

> Does postgresql have any facility to dump anything more fine grained
> than a database to a text file?

You can dump a single table with the -t option.  Right now you can't
specify an arbitrary set of tables to dump.
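
For example (names made up):

  pg_dump -t mytable mydb > mytable.sql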

> For example, to mention a bad word, MySQL's dump command allows you to
> specify individual tables to dump - and not only that - you can also
> specify a SQL query that must be satisfied for a row to make it into
> the dump file - this is extremely useful for the way that we use our
> database.

You'd have to write custom client code for this, currently (or use
'psql', see below).  People have been talking about making COPY able
to read from views (and maybe from SELECT statements as well), so you
might have more flexibility in a later version.

> Also - how does postgresql handle foreign keys if you dump an entire
> database along with the create table statements?  Because my foreign
> keys require that the tables be created in a certain order - MySQL
> fails on this front - but since it allows me to specify the tables to
> dump, and it dumps them in that order, I can specify the correct
> order.

You don't have to worry about this.  The SQL created by the dump adds
all the foreign key constraints after populating the tables.

> PostgreSQL's pg_dump command seems rather limited in its
> abilities. Maybe I'm missing the command I'm looking for.

Its primary purpose historically has been to back up whole databases,
and it does that very well.  You might look at what 'psql' can do for
you--it's pretty powerful and can be controlled from a script.

-Doug

Re: fine tuned database dump/reload?

From: Dan Armbrust

Bricklen Anderson wrote:

>
> pg_dump --help
> ...
>   -n, --schema=SCHEMA      dump the named schema only
>   -s, --schema-only        dump only the schema, no data
>   -t, --table=TABLE        dump the named table only
> ...
>

Just after I sent the e-mail, I realized I had forgotten that I'd seen
that option - but on my first reading of the description (and being an
annoying user who didn't try it first) it didn't appear that it would
allow me to specify more than one table.

But I tested it, and it does let me specify the -t flag more than
once.  So that gets me past that issue.

Dan


--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/

Re: fine tuned database dump/reload?

From: Dan Armbrust

Dan Armbrust wrote:
> Bricklen Anderson wrote:
>
>>
>> pg_dump --help
>> ...
>>   -n, --schema=SCHEMA      dump the named schema only
>>   -s, --schema-only        dump only the schema, no data
>>   -t, --table=TABLE        dump the named table only
>> ...
>>
>
> Just after I sent the e-mail, I realized I had forgotten that I'd seen
> that option - but on my first reading of the description (and being an
> annoying user who didn't try it first) it didn't appear that it would
> allow me to specify more than one table.
>
> But I tested it, and it does let me specify the -t flag more than
> once.  So that gets me past that issue.
>
> Dan
>
>
Now I'm just filling the mailing list with misinformation.  It actually
ignores all but the last -t flag - so this only allows me to specify one
table at a time, rather than several tables.

I need to write up my use case so the maintainers can see why I want to
be able to dump things in such a specific way - it's not for backup
purposes - so I'm kind of misusing the intent of the tool.  More info in
a bit.

Dan



--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/

Re: fine tuned database dump/reload?

From: Bricklen Anderson

Dan Armbrust wrote:
>>
> Now I'm just filling the mailing list with misinformation.  It actually
> ignores all but the last -t flag - so this only allows me to specify one
> table at a time, rather than several tables.
>
> I need to write up my use case so the maintainers can see why I want to
> be able to dump things in such a specific way - it's not for backup
> purposes - so I'm kind of misusing the intent of the tool.  More info in
> a bit.
>
> Dan
>
Disregard my last reply, yours hadn't arrived yet.



Re: fine tuned database dump/reload?

From: Dan Armbrust

My use case is not so much for database backup purposes as it is for
fine-grained export and import.

Our database schema consists of 15 tables that represent a terminology.
Each database can hold multiple terminologies - every table has a
terminologyId column which distinguishes one terminology from another.

I now realize that I don't need to do anything special with individual
tables since PostgreSQL handles foreign keys properly - so that question
is void - I'm fine with working with a whole database at a time.

However, quite often I will load up a terminology on our development or
test server (which also contains many other very large terminologies).
When I am satisfied that it is ready, I want to put this terminology
onto our production servers.  But I don't want to have to export
millions of rows that I don't need.

With MySQL, the ability to do a dump of data which satisfies a where
clause (for example: terminologyId='foo') gives me the ability to dump
the data out to a file very quickly - move the file to the server it
needs to be loaded on, and then re-load that data into the production
database.
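
The MySQL invocation I mean is along these lines (table names made up):

  mysqldump --where="terminologyId='foo'" mydb table1 table2 > foo.sql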

In PostgreSQL, the only way that I see to do this is to have my desired
data in a database all by itself, so that I can use pg_dump to backup
the entire database.  Then I can load that into the existing production
database.

Is there a better way to do this?  Is there a flag I could specify for
psql that would cause it to output INSERT or COPY statements as a result
of a query - select * from foo where terminologyId=foo?  Then I could
just have 15 select statements batched up in a file, and pipe the output
into a new file.

I suppose this is kind of an obscure use case - but a flag on pg_dump
where I could specify a where condition would certainly be handy.

Thanks,

Dan

--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/

Re: fine tuned database dump/reload?

From: Doug Bloebaum

On 10/11/05, Dan Armbrust wrote:
> Now I'm just filling the mailing list with misinformation.  It actually
> ignores all but the last -t flag - so this only allows me to specify one
> table at a time, rather than several tables.

There's always the low-tech option of:

# dump each table listed (one name per line) in /tmp/table_list.txt,
# appending them all to a single file
for table in $(cat /tmp/table_list.txt)
do
  pg_dump -t "$table" my_db >> /tmp/my_tables.dmp
done

It's far from elegant, but it does allow you to dump a specific list of tables.

Re: fine tuned database dump/reload?

From: Richard Huxton

Dan Armbrust wrote:
>
> Is there a better way to do this?  Is there a flag I could specify for
> psql that would cause it to output INSERT or COPY statements as a result
> of a query - select * from foo where terminologyId=foo?  Then I could
> just have 15 select statements batched up in a file, and pipe the output
> into a new file.

It's easy enough to get psql to output tab-separated columns to a file
from its query (see the various backslash formatting commands and \o).
Spit out the 15 different files you want to /tmp, then use a script with
COPY ... FROM /tmp/file1 etc.

If you use COPY the files need to be readable by the backend process,
otherwise look into \copy in a psql script.
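
A rough sketch of both halves (table, column, and database names made
up):

  # unaligned (-A), tuples-only (-t) output, tab as the field separator
  psql -A -t -F "$(printf '\t')" devdb \
    -c "SELECT * FROM concept WHERE terminologyid = 'foo'" \
    -o /tmp/concept.tsv

  # reload client-side with \copy (tab-separated is its default format)
  psql proddb -c "\copy concept from '/tmp/concept.tsv'"

  # caveat: NULLs print as empty strings here; see \pset null if your
  # tables allow them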

If it gets much more complicated, I knock up a short Perl script. Others
probably prefer Python or Ruby.
--
   Richard Huxton
   Archonet Ltd

Re: fine tuned database dump/reload?

From: Bruce Momjian

The TODO list has an open item to allow multiple -t options and more
complex selection.


---------------------------------------------------------------------------

Dan Armbrust wrote:
> Now I'm just filling the mailing list with misinformation.  It actually
> ignores all but the last -t flag - so this only allows me to specify one
> table at a time, rather than several tables.
>
> I need to write up my use case so the maintainers can see why I want to
> be able to dump things in such a specific way - it's not for backup
> purposes - so I'm kind of misusing the intent of the tool.  More info in
> a bit.
>
> Dan

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: fine tuned database dump/reload?

From: "Jim C. Nasby"

Since no one else has mentioned it, there has been discussion this week
on one of the lists (probably -general or -hackers) about expanding the
capabilities of pg_dump. I've advocated for allowing a file that
specifies what objects to dump and what kind of filtering to apply to
the name. Allowing for only dumping certain rows in a table (applying a
WHERE clause) would be a neat addition to that.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: fine tuned database dump/reload?

From: Dan Armbrust

Jim C. Nasby wrote:
> Since no one else has mentioned it, there has been discussion this week
> on one of the lists (probably -general or -hackers) about expanding the
> capabilities of pg_dump. I've advocated for allowing a file that
> specifies what objects to dump and what kind of filtering to apply to
> the name. Allowing for only dumping certain rows in a table (applying a
> WHERE clause) would be a neat addition to that.

That would be exactly what I need - the ability to apply a where clause.
That, and the ability to specify multiple tables to dump, instead of
just one.

Because currently, I still get into foreign key trouble in this scenario:

I dump data (and only data) from database A.  This data has foreign key
constraints.  I want to load this data into database B - where the
tables already exist, they already have foreign keys, and there is some
existing data.

My dump from database A won't load - because the tables were dumped in
the wrong order to satisfy the foreign keys on the reload - so the only
way I can load this data back into an existing database is to dump each
table individually, and reload in the proper order, or manually drop my
foreign keys while I load the data.

Both ways are a pain, and they make it overly difficult to export/import
chunks of data from a database.

Dan


--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/

Re: fine tuned database dump/reload?

From: "Jim C. Nasby"

pg_dump handles table ordering properly.

On Fri, Oct 14, 2005 at 10:24:55AM -0500, Dan Armbrust wrote:
> That would be exactly what I need - the ability to apply a where clause.
> That, and the ability to specify multiple tables to dump, instead of
> just one.
>
> Because currently, I still get into foreign key trouble in this scenario:
>
> I dump data (and only data) from database A.  This data has foreign key
> constraints.  I want to load this data into database B - where the
> tables already exist, they already have foreign keys, and there is some
> existing data.
>
> My dump from database A won't load - because the tables were dumped in
> the wrong order to satisfy the foreign keys on the reload - so the only
> way I can load this data back into an existing database is to dump each
> table individually, and reload in the proper order, or manually drop my
> foreign keys while I load the data.
>
> Both ways are a pain, and they make it overly difficult to export/import
> chunks of data from a database.
>
> Dan

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: fine tuned database dump/reload?

From: Dan Armbrust

Jim C. Nasby wrote:
> pg_dump handles table ordering properly.

Maybe I missed something then, because it didn't last time I tried to
move some data.  I had to drop my foreign keys before I could reload it.

Dan


--
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/

Re: fine tuned database dump/reload?

From: Douglas McNaught

Dan Armbrust <daniel.armbrust.list@gmail.com> writes:

> Jim C. Nasby wrote:
>  > pg_dump handles table ordering properly.
>  >
>
> Maybe I missed something then, because it didn't last time I tried to
> move some data.  I had to drop my foreign keys before I could reload
> it.

What version was this?  Older versions definitely had some problems
with table ordering.  In 8.0.X, the tables in the dump all get loaded
before the FK constraints are created, and I have never seen a
problem.

-Doug

Re: fine tuned database dump/reload?

From: Tom Lane

Douglas McNaught <doug@mcnaught.org> writes:
> Dan Armbrust <daniel.armbrust.list@gmail.com> writes:
>> Maybe I missed something then, because it didn't last time I tried to
>> move some data.  I had to drop my foreign keys before I could reload
>> it.

> What version was this?  Older versions definitely had some problems
> with table ordering.  In 8.0.X, the tables in the dump all get loaded
> before the FK constraints are created, and I have never seen a
> problem.

Even in current versions, you have to do a full (schema+data) dump for
it to work "right".  Data-only dumps are prone to FK problems.  This
is insoluble in the general case, though you can work around it to some
extent with the --disable-triggers option.
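
For instance, a data-only move along these lines (names made up; note
that reloading with triggers disabled generally requires superuser
privileges):

  # data-only dump with trigger (and thus FK) checks disabled on reload
  pg_dump -a --disable-triggers -t concept devdb > concept-data.sql
  psql proddb -f concept-data.sql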

            regards, tom lane

Re: fine tuned database dump/reload?

From: Vivek Khera

On Oct 17, 2005, at 9:34 AM, Dan Armbrust wrote:

> Jim C. Nasby wrote:
> > pg_dump handles table ordering properly.
> >
>
> Maybe I missed something then, because it didn't last time I tried
> to move some data.  I had to drop my foreign keys before I could
> reload it.

This is my experience as well.  pg_dump doesn't always work in the
presence of FKs.


Re: fine tuned database dump/reload?

From: "Scott Marlowe"


On Mon 10/17/2005 3:35 PM, Vivek Khera wrote:


> On Oct 17, 2005, at 9:34 AM, Dan Armbrust wrote:
>
> > Jim C. Nasby wrote:
> > > pg_dump handles table ordering properly.
> >
> > Maybe I missed something then, because it didn't last time I tried
> > to move some data.  I had to drop my foreign keys before I could
> > reload it.
>
> This is my experience as well.  pg_dump doesn't always work in the
> presence of FKs.

I've had few problems, and every one has been fixed by a newer version
of pg_dump.  I would imagine a mode that dumps for an older version
would be useful, so that if you found something in 7.4 that didn't dump
with 7.4's pg_dump, but could be dumped with 8.0, it wouldn't use any
8.0isms in its dump.  Or does that already happen automagically?  I
can't tell from the docs whether it does.

I have seen people show up with issues with dumps and seen them fixed
fairly quickly, either with a workaround via pg_restore or with a patch.

What problems have you run into lately?  And could pg_restore fix them, or were they too hairy?

Re: fine tuned database dump/reload?

From: Vivek Khera


On Oct 18, 2005, at 12:52 AM, Scott Marlowe wrote:

> What problems have you run into lately?  And could pg_restore fix
> them, or were they too hairy?

The only issues I have are when not doing a full dump/restore of a DB --
i.e., only pulling data into an existing table structure.  The ordering
is not guaranteed, as Tom says.