Thread: db partial dumping with pg_dump
We have a patch for pg_dump which adds the ability to dump part of a
table. For example, to dump the Top.Science.Astronomy hierarchy from the
dmoz catalog:

    pg_dump -d -t dmoz -w "select * from dmoz where path <@ 'Top.Science.Astronomy'" dmoz

We have found it very useful. We'd like to extend it to work with COPY as
well, but COPY has no support for SELECT (of course, it would work only
for the special case where ALL columns are retrieved).

The question is: is it worth submitting the patch for pg_dump and looking
into the COPY code?

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
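[For context: the <@ operator in the example comes from the contrib
ltree module, which stores tree-structured labels such as dmoz category
paths. A minimal sketch of the kind of table the command assumes -- the
column names here are hypothetical:

    -- requires contrib/ltree to be installed in the database
    CREATE TABLE dmoz (
        id   serial PRIMARY KEY,
        path ltree    -- e.g. 'Top.Science.Astronomy.Telescopes'
    );

    -- the rows the -w clause selects: everything at or below
    -- Top.Science.Astronomy in the tree
    SELECT * FROM dmoz WHERE path <@ 'Top.Science.Astronomy';
]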
On Tue, 2002-08-13 at 07:49, Oleg Bartunov wrote:
> We have a patch for pg_dump which adds the ability to dump part of a
> table. For example, to dump the Top.Science.Astronomy hierarchy from
> the dmoz catalog:
>
>     pg_dump -d -t dmoz -w "select * from dmoz where path <@ 'Top.Science.Astronomy'" dmoz
>
> We have found it very useful. We'd like to extend it to work with COPY
> as well, but COPY has no support for SELECT (of course, it would work
> only for the special case where ALL columns are retrieved).
>
> The question is: is it worth submitting the patch for pg_dump and
> looking into the COPY code?

I've been asked by co-workers for information on how to do this type of
thing. They do partial table dumps to a development system for the
purpose of finding / eliminating bugs.

That said, it may be smart to make the 'select * from <table>' part
yourself, and let the user supply a where clause.
On 13 Aug 2002, Rod Taylor wrote:
> I've been asked by co-workers for information on how to do this type
> of thing. They do partial table dumps to a development system for the
> purpose of finding / eliminating bugs.
>
> That said, it may be smart to make the 'select * from <table>' part
> yourself, and let the user supply a where clause.

Find the patch (against 7.2) in the attachment. Note that it works with
the -d (insert mode) option.

	Regards,
		Oleg
This patch seems extremely messy to me. Unless I'm missing something,
-w just plain fails except when you are dumping a specific table (ie,
-t must be given as well). And heaven help you if you specify a
different table in -t than the one -w is selecting from. This isn't
well thought out.

I'm not at all convinced that such a thing belongs in pg_dump anyway.
It'd be more useful as a manually-invokable feature, I think. You can
almost do this in psql with

    select * from table where something
    \g outfile

but I don't think you can get psql to emit the data in a form that can
be reloaded reliably (it won't quote data characters that look the same
as column delimiters, for instance).

What would seem to make sense is adding a WHERE-clause option to
COPY TO, and then you could go

    COPY table TO 'myfile' WHERE ...

We already have column-list support in COPY, so we can already slice the
table vertically --- WHERE would let you slice it horizontally, which
seems a natural extension. (BTW, has anyone taught psql's \copy about
column lists? AFAIR the original patch was only against the backend.)

I'm finding it hard to visualize situations where I'd want the extra
baggage of pg_dump for something like this. If I want the schema at
all, I'll probably want it separate from the data so that I can hack
the schema conveniently --- so I'd want to do a "pg_dump -s -t table"
and then do the selective copying separately.

			regards, tom lane
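[A sketch of the two-step workflow described above, reusing the dmoz
example from the start of the thread. The file names are hypothetical,
and the COPY ... WHERE form is only a proposal at this point -- it does
not exist in any released version:

    # schema only, easy to hand-edit before reloading
    pg_dump -s -t dmoz dmoz > dmoz-schema.sql

    # data for just the slice of interest, with COPY's reload-safe
    # quoting (proposed syntax)
    psql dmoz -c "COPY dmoz TO '/tmp/astronomy.copy' WHERE path <@ 'Top.Science.Astronomy'"
]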
On Tue, 13 Aug 2002, Tom Lane wrote:
> This patch seems extremely messy to me. Unless I'm missing something,
> -w just plain fails except when you are dumping a specific table (ie,
> -t must be given as well). And heaven help you if you specify a
> different table in -t than the one -w is selecting from. This isn't
> well thought out.

You're right. The patch I sent was in no way ready for submission to the
sources! I was just asking about the feature, and attached the patch to
the message for Rod Taylor.

> but I don't think you can get psql to emit the data in a form that can
> be reloaded reliably (it won't quote data characters that look the same
> as column delimiters, for instance).

That was the reason we didn't use psql for dumping.

> I'm finding it hard to visualize situations where I'd want the extra
> baggage of pg_dump for something like this. If I want the schema at
> all, I'll probably want it separate from the data so that I can hack
> the schema conveniently --- so I'd want to do a "pg_dump -s -t table"
> and then do the selective copying separately.

That'd be nice! I often need such a feature because the databases at
work are often too large to play with at home :-)

	Regards,
		Oleg
On Tue, 2002-08-13 at 20:24, Tom Lane wrote:
> What would seem to make sense is adding a WHERE-clause option to
> COPY TO, and then you could go
>
>     COPY table TO 'myfile' WHERE ...

What about:

    COPY table TO 'myfile' [ WHERE cond ] [ AS INSERT [ WITH COLUMN ] ];

to get the data as INSERT INTO statements (pg_dump -d), with an optional
column list (pg_dump -D)?

----------------
Hannu
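[To illustrate the difference between the two proposed output modes,
using a hypothetical row from the dmoz table:

    -- pg_dump -d style (plain AS INSERT in the proposed syntax):
    INSERT INTO dmoz VALUES (42, 'Top.Science.Astronomy.Telescopes');

    -- pg_dump -D style (AS INSERT WITH COLUMN in the proposed syntax):
    INSERT INTO dmoz (id, path)
        VALUES (42, 'Top.Science.Astronomy.Telescopes');
]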
Actually, loading all this stuff into COPY is not the way to go, I think.

Informix had:

    UNLOAD TO 'filename'
    SELECT ...

I have to admit, this is a superior way to do things compared to what we
have. Is it possible for us to get:

    COPY TO 'filename'
    SELECT ...

It allows any arbitrary table, group by, even order by combination.

---------------------------------------------------------------------------

Hannu Krosing wrote:
> What about:
>
>     COPY table TO 'myfile' [ WHERE cond ] [ AS INSERT [ WITH COLUMN ] ];
>
> to get the data as INSERT INTO statements (pg_dump -d), with an
> optional column list (pg_dump -D)?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Wed, 2002-08-14 at 00:03, Bruce Momjian wrote:
> Actually, loading all this stuff into COPY is not the way to go, I
> think.
>
> Informix had:
>
>     UNLOAD TO 'filename'
>     SELECT ...
>
> I have to admit, this is a superior way to do things compared to what
> we have. Is it possible for us to get:
>
>     COPY TO 'filename'
>     SELECT ...
>
> It allows any arbitrary table, group by, even order by combination.

It would be more in line with the rest of the system to just allow a
subselect as the 'table':

    COPY (select id, name from mystuff where id > 10) over10stuff TO stdout;

-----------------
Hannu
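[A couple more examples of what the subselect form would allow -- again
hypothetical, since none of this syntax is implemented:

    -- the horizontal slice from the start of the thread:
    COPY (select * from dmoz where path <@ 'Top.Science.Astronomy')
        astro TO stdout;

    -- ORDER BY, joins, and aggregates would come along for free:
    COPY (select id, path from dmoz order by path)
        sorted TO '/tmp/dmoz-sorted.copy';
]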
Hannu Krosing wrote:
> It would be more in line with the rest of the system to just allow a
> subselect as the 'table':
>
>     COPY (select id, name from mystuff where id > 10) over10stuff TO stdout;

Yep, that would work too. Clearly, we should shoot for something that
leverages the existing SELECT code rather than hang more clauses off of
COPY.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Clearly, we should shoot for something that leverages the existing
> SELECT code rather than hang more clauses off of COPY.

Yeah, that's a good point. COPY IN is still a special case, I think,
but it seems like COPY OUT could be reimplemented as a special tuple
destination for the regular executor machinery. The question is how
much of a performance hit we would be taking. If this slows down
pg_dump a lot, the extra code is worth keeping.

			regards, tom lane
> I'm finding it hard to visualize situations where I'd want the extra
> baggage of pg_dump for something like this. If I want the schema at
> all, I'll probably want it separate from the data so that I can hack
> the schema conveniently --- so I'd want to do a "pg_dump -s -t table"
> and then do the selective copying separately.

Tell you what I'd kill for:

    pg_dump -t users_\* db > users.sql

Sort of thing.

Chris
On Tue, 2002-08-13 at 19:54, Christopher Kings-Lynne wrote:
> Tell you what I'd kill for:
>
>     pg_dump -t users_\* db > users.sql
>
> Sort of thing.

I'd kill for

    pg_restore --ignore-missing-objects --clean ...

so I could run refreshes against databases whose schemas are changing,
rather than having to constantly edit the output of -l. I'd also kill
for

    pg_restore --ignore-existing-objects ...

so I could run the darn thing against a database that already has
pl/pgsql installed in template1 when the dump file wants to install it
again, etc.

ss
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: > Tell you what I'd kill for: > pg_dump -t users_\* db > users.sql Yeah, I was just about to start working on pg_dump's -t switch. It absolutely *must* be made schema-aware for 7.3 (-t schema.table doesn't work today) and it seems like we might as well try to allow wildcard support. Another thought I had was to accept multiple -t switches, eg "pg_dump -t foo -t bar -t schema.baz*" to dump all tables matching any of the patterns. regards, tom lane
> Yeah, I was just about to start working on pg_dump's -t switch. It
> absolutely *must* be made schema-aware for 7.3 (-t schema.table
> doesn't work today), and it seems like we might as well try to allow
> wildcard support. Another thought I had was to accept multiple -t
> switches, eg "pg_dump -t foo -t bar -t schema.baz*", to dump all
> tables matching any of the patterns.

What about a switch to set the schema search path as well? You could
also just have the tables comma-separated in the -t?

Chris
Scott Shattuck <ss@technicalpursuit.com> writes:
> I'd also kill for
>
>     pg_restore --ignore-existing-objects ...
>
> so I could run the darn thing against a database that already has
> pl/pgsql installed in template1 when the dump file wants to install it
> again, etc.

In general, I think it's a mistake for pg_restore to bail out on errors.
The underlying pg_dump scripts have been built and optimized on the
assumption that psql would keep plugging after seeing an error. For
example, scripts containing "\connect - foo" still work if there's no
"foo" user ... but only because psql doesn't go belly-up. pg_restore is
way less forgiving.

I think the ideal behavior for pg_restore would be to abandon work on
the current dump item upon seeing a SQL error, but to pick up with the
next one. (Of course we could have an --anal-retentive switch to bail
on first error, but I doubt it'd be used much.) Errors associated with
ownership switches shouldn't cause failure in any case.

Anybody care to submit patches to make this happen?

			regards, tom lane
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: > What about a switch to set schema search path as well? I'm leaning to the thought that this is a bad idea, and that pg_dump should act in a way that's search-path-independent. But I need to think more. > You could also just have the tables comma separated in the -t ?? Don't like it --- that makes comma a special character in pg_dump tablename patterns, which is an exception we don't need to make. The multiple-dash-t approach needs no such assumption. regards, tom lane