Thread: Feature: give pg_dump a WHERE clause expression
Greetings, I have developed a fairly simple patch to the pg_dump utility, against the version 8.3.1 source code. It adds a new parameter, -w/--where=EXPR, which lets you specify an expression to be used in a WHERE clause when the data is dumped. I have implemented and tested that it works when generating either COPY statements (the default) or INSERT statements (-d and -D); these two modes of operation have two different sections of code that select the data to be dumped.

Though this change could arguably be removed, when a -w/--where expression is specified it is also noted in the comments of the dump output, so anyone viewing the dump can see that it is not necessarily all of the data. When -w/--where is not specified, the dump output is exactly as if this patch had not been applied.

I've also updated the pg_dump.sgml file to describe the new flag. The code changes should conform to the existing code style within pg_dump. The patch should be applied from the root of the source tree with the -p1 option to the patch command.

Please give any feedback if the patch needs improvement. Thanks for a great DB!
Davy Durham <pubaddr5@davyandbeth.com> writes: > I have added a new parameter, -w/--where=EXPR > This lets you specify an expression that will be used in a WHERE > clause when the data is dumped. This seems pretty poorly thought out. It can hardly work in a dump of more than one table, which means that there's not any real reason to use pg_dump at all. Just do a COPY (SELECT ...) TO somefile. regards, tom lane
On Sun, 2008-06-01 at 10:43 -0400, Tom Lane wrote: > Davy Durham <pubaddr5@davyandbeth.com> writes: > > I have added a new parameter, -w/--where=EXPR > > This lets you specify an expression that will be used in a WHERE > > clause when the data is dumped. > > This seems pretty poorly thought out. It can hardly work in a dump > of more than one table, which means that there's not any real reason > to use pg_dump at all. Just do a COPY (SELECT ...) TO somefile. > > regards, tom lane >

Well, my primary reason for writing the patch was to have a standard SQL file using INSERT statements, in order to load some of a table's data into a database other than PostgreSQL which does not support the COPY statement. I'll admit that the single where clause would often not be applicable across all tables in a database, but when pg_dump is told specific tables to dump (a nice existing feature of pg_dump for doing something specialized other than a simple entire-database backup), then it can be useful.

My particular case is that I have several tables that are simple event logs. Each table has a timestamp column. I'm periodically bringing these tables into sync on another database, and I only want to pull rows newer than the last sync. So a where clause of 'ts > $last_sync' works for me. However, I'm sure there are other uses too.

== Thinking Further ==

Beyond serving my own needs, I'm trying to generically extend the general idea that pg_dump already supports:
1) pg_dump can be made to dump an entire database
2) pg_dump can be made to dump only requested tables
3) [my addition] pg_dump can be made to dump only requested rows from requested tables

However, it's not so generic in that the one where clause applies to all tables. So, if this patch is not acceptable as-is, what would you feel about this: I could enhance the -t/--table=NAME option to accept more than a simple NAME.
Rather, it could accept something in the form: --table=<table_name>:<where-clause expression> For example, pg_dump --table='foo:col1 > 10 AND f2 < 14'

Currently, the user can specify -t/--table multiple times to have more than one table dumped, or can pass a pattern to a single -t option to request multiple tables. This way, a user could specify a WHERE clause per table he has requested to dump. Granted, the WHERE clause may then apply to multiple tables if a pattern was used, but that may very well be desirable to the user.

Unless you disagree, this is a more generic solution (than what my patch contains) to allowing the user of pg_dump to further refine what they wish to dump. Thoughts? Thanks for the feedback.
Davy Durham <pubaddr5@davyandbeth.com> writes: > So, if this patch is not acceptable as-is, what would you feel about > this: > I could enhance the -t/--table=NAME option to accept more than a > simple NAME. Rather it could accept something in the form: > --table=<table_name>:<where-clause expression> Well, that would at least address the complaint that it doesn't scale to multiple tables, but the whole thing still seems like a frammish that will never see enough use to justify maintaining it. (BTW, what will you do with a table whose name contains a colon?) regards, tom lane
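The splitting ambiguity Tom raises can be sketched in a few lines of shell. The --table=<name>:<where> syntax here is the proposed (never merged) form, shown only as a parsing exercise: a naive split on the first ':' misparses any table name that itself contains a colon.

```shell
# Hypothetical sketch of splitting the proposed --table=<name>:<where> form.
split_table_arg() {
  local arg=$1
  local name=${arg%%:*}    # text before the first ':'
  local where=${arg#*:}    # text after the first ':'
  printf 'table=%s where=%s\n' "$name" "$where"
}

split_table_arg 'foo:col1 > 10 AND f2 < 14'   # works: table=foo
split_table_arg '"my:table":ts > now()'       # misparses: name stops at the first ':'
```

A quoted identifier such as "my:table" would need either colon escaping or a parser aware of double-quoting, which is the maintenance cost Tom is alluding to.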
Tom Lane wrote: > Davy Durham <pubaddr5@davyandbeth.com> writes: > >> So, if this patch is not acceptable as-is, what would you feel about >> this: >> I could enhance the -t/--table=NAME option to accept more than a >> simple NAME. Rather it could accept something in the form: >> > > >> --table=<table_name>:<where-clause expression> >> > > Well, that would at least address the complaint that it doesn't scale > to multiple tables, but the whole thing still seems like a frammish > that will never see enough use to justify maintaining it. > > (BTW, what will you do with a table whose name contains a colon?) > > > ISTM this would be better off waiting until we turn large parts of pg_dump into a library, as has been often discussed, at which point it should be relatively simple to write a custom client to do what the OP wants. I agree that it does not at all belong in pg_dump. cheers andrew
On Sun, Jun 01, 2008 at 04:13:34PM -0400, Andrew Dunstan wrote: > > > Tom Lane wrote: > >Davy Durham <pubaddr5@davyandbeth.com> writes: > > > >>So, if this patch is not acceptable as-is, what would you feel about > >>this: > >> I could enhance the -t/--table=NAME option to accept more than a > >> simple NAME. Rather it could accept something in the form: > >> > > > > > >> --table=<table_name>:<where-clause expression> > >> > > > >Well, that would at least address the complaint that it doesn't scale > >to multiple tables, but the whole thing still seems like a frammish > >that will never see enough use to justify maintaining it. > > > >(BTW, what will you do with a table whose name contains a colon?) > > > > > > > > ISTM this would be better off waiting until we turn large parts of > pg_dump into a library, as has been often discussed, at which point it > should be relatively simple to write a custom client to do what the OP > wants. I agree that it does not at all belong in pg_dump. I can't imagine many of my clients ever writing another C program or even being willing to pay me to do so. While modularizing pg_dump is a fine idea, I don't think it addresses the same set of use cases and users as this proposal. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
On Sun, 2008-06-01 at 15:47 -0400, Tom Lane wrote: > Davy Durham <pubaddr5@davyandbeth.com> writes: > > So, if this patch is not acceptable as-is, what would you feel about > > this: > > I could enhance the -t/--table=NAME option to accept more than a > > simple NAME. Rather it could accept something in the form: > > > --table=<table_name>:<where-clause expression> > > Well, that would at least address the complaint that it doesn't scale > to multiple tables, but the whole thing still seems like a frammish > that will never see enough use to justify maintaining it. >

The code is not all that much to maintain as it is, and making it an addition to an existing parameter instead of a new one may not increase the code size by much at all. BTW, I looked it up, and mysqldump supports an option like mine, but it too is global for all tables and cannot be specified per table, AFAICT.

> (BTW, what will you do with a table whose name contains a colon?) >

I thought about that, but didn't know whether table names could contain a colon; I see now that it is possible by enclosing the table name in double-quotes. I suppose the user could escape the colon, as I believe they may already have to do if a table name contains '*' or '?'. Is there another separator character that would be more appropriate?

Another option I just thought of is to leave the -w/--where flag in place, but have it apply to all subsequent -t/--table flags. So you could do something like: pg_dump -w 'expr1' -t tab1 -t tab2 -w 'expr2' -t tab3 so that expr1 filters tab1 and tab2, and expr2 filters tab3. This would be a simple modification to the existing patch: track the where clause per table rather than in a single global variable. However, the order in which arguments are given to pg_dump then becomes significant, which you may object to. But then again, if the order of the tables in the dump file matters, the -t/--table flags already determine the order in which the tables are dumped.
(By pointing this out, I mean argument order may already matter in some circumstances.) This also avoids the ':' syntax problem you mentioned above. > regards, tom lane >
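The order-dependent semantics Davy proposes can be illustrated with a small client-side argument-parsing sketch. This is a hypothetical demo of the proposed behavior (each -w sets the filter for every -t that follows it), not anything pg_dump actually does:

```shell
# Hypothetical demo: each -w applies to the -t flags that follow it.
parse_dump_args() {
  local where=''
  while [ $# -gt 0 ]; do
    case $1 in
      -w) where=$2; shift 2 ;;
      -t) printf '%s: %s\n' "$2" "${where:-<no filter>}"; shift 2 ;;
      *)  shift ;;
    esac
  done
}

parse_dump_args -w 'expr1' -t tab1 -t tab2 -w 'expr2' -t tab3
```

Running this prints tab1 and tab2 paired with expr1, and tab3 with expr2, which is exactly the pairing the proposal describes without needing any ':' separator.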
daveg wrote: >> ISTM this would be better off waiting until we turn large parts of >> pg_dump into a library, as has been often discussed, at which point it >> should be relatively simple to write a custom client to do what the OP >> wants. I agree that it does not at all belong in pg_dump. >> > > I can't imagine many of my clients ever writing another C program or even > being willing to pay me to do so. While modularizing pg_dump is a fine idea, > I don't think it addresses the same set of use cases and users as this > proposal. > > > It's not clear to me that your use case is very compelling. Does your foreign database not support import via CSV or XML? Postgres can now produce both of these for any arbitrary query. cheers andrew
On Sun, Jun 01, 2008 at 04:40:15PM -0400, Andrew Dunstan wrote: > daveg wrote: > > > >I can't imagine many of my clients ever writing another C program or even > >being willing to pay me to do so. While modularizing pg_dump is a fine > >idea, > >I don't think it addresses the same set of use cases and users as this > >proposal. > > It's not clear to me that your use case is very compelling. Does your > foreign database not support import via CSV or XML? Postgres can now > produce both of these for any arbitrary query. The foreign database in question is postgresql. The feature that the proposed patch enables is to create pg_dump custom format archives for multiple tables with a predicate. No amount of csv or xml will do that. Contrived example: pg_dump -Fc --table="*._stats:where ts >= now()::date" -f todays_stats.pgd If I have not been successful in explaining this clearly, please reply privately to avoid cluttering the list. If you simply disagree about the usefulness of the feature, I'm fine with that. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
* daveg (daveg@sonic.net) wrote: > The feature that the proposed patch enables is to create pg_dump custom > format archives for multiple tables with a predicate. No amount of csv or > xml will do that. Contrived example: Uh, pg_dump's custom format really isn't particularly special, to be honest. Is there some reason you're interested in using it over, as was suggested, COPY/CSV/etc format? You could, of course, gzip the output of COPY (as pg_dump does) if you're concerned about space.. Thanks, Stephen
* Davy Durham (pubaddr5@davyandbeth.com) wrote: > The code is not all that much to maintain as it is, and making it an > addition to an existing parameter instead of a new one may not increase > the code size by much more at all. I'm generally in favor of 'creature-comfort' kind of additions, but I have to agree with Tom on this one. I don't see much of a use case, and it's trivial to create a shell-script to do what you're looking for with psql and COPY statements. Thanks, Stephen
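The shell-script route Stephen mentions can be sketched as follows: emit one \copy (SELECT ...) per table with the predicate inlined, then pipe the generated script to psql. Table names, the predicate, and the output paths here are all placeholders.

```shell
# Sketch (placeholder names): generate a psql script of filtered \copy commands.
copy_script() {
  local where=$1; shift
  local t
  for t in "$@"; do
    printf "\\\\copy (SELECT * FROM %s WHERE %s) TO '%s.copy'\n" "$t" "$where" "$t"
  done
}

copy_script 'ts > now()::date' tab1 tab2
# usage: copy_script 'ts > now()::date' tab1 tab2 | psql mydb
```

This covers the data-extraction half of the request in a few lines, though it produces COPY-format files rather than INSERT statements.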
On Sun, 2008-06-01 at 20:53 -0400, Stephen Frost wrote: > * Davy Durham (pubaddr5@davyandbeth.com) wrote: > > The code is not all that much to maintain as it is, and making it an > > addition to an existing parameter instead of a new one may not increase > > the code size by much more at all. > > I'm generally in favor of 'creature-comfort' kind of additions, but I > have to agree with Tom on this one. I don't see much of a use case, and > it's trivial to create a shell-script to do what you're looking for with > psql and COPY statements. > To reiterate, it is not possible to use the COPY command to create standard SQL INSERT statements that can be conveniently loaded by another db. No?
* Davy Durham (pubaddr5@davyandbeth.com) wrote: > To reiterate, it is not possible to use the COPY command to create > standard SQL INSERT statements that can be conveniently loaded by > another db. No? Erm, thankfully, PostgreSQL (what you're loading the data into?) can take more than just SQL INSERT statements. Of course, on the other hand, you *could* use COPY to create SQL INSERT statements through an appropriately crafted query. PG makes that reasonably straight-forward, actually. Stephen
On Sun, 2008-06-01 at 22:02 -0400, Stephen Frost wrote: > * Davy Durham (pubaddr5@davyandbeth.com) wrote: > > To reiterate, it is not possible to use the COPY command to create > > standard SQL INSERT statements that can be conveniently loaded by > > another db. No? > > Erm, thankfully, PostgreSQL (what you're loading the data into?) can > take more than just SQL INSERT statements. > No, the database I'm loading into is not PostgreSQL, but that's okay. I think that was someone else. > Of course, on the other hand, you *could* use COPY to create SQL INSERT > statements through an appropriately crafted query. PG makes that > reasonably straight-forward, actually. > > Stephen
* Davy Durham (pubaddr5@davyandbeth.com) wrote: > No, the database I'm loading into is not PostgreSQL, but that's okay. I > think that was someone else. This gets back to the other question then- what database is it and does it support CSV or XML imports? > > Of course, on the other hand, you *could* use COPY to create SQL INSERT > > statements through an appropriately crafted query. PG makes that > > reasonably straight-forward, actually. This is an option too, of course.. select 'insert into XYZ VALUES (' || coalesce(quote_literal(col1),'NULL') || ',' || coalesce(quote_literal(col2),'NULL') || ');'; or so. I suppose it might be interesting to consider an 'insert-format' output for COPY, which pg_dump could possibly be refactored to use when requested. It'd be nice if it was easier to have COPY support more formats but right now it's kind of a pain. Stephen
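The quote_literal trick Stephen shows can also be done client-side. Here is a minimal sketch that turns tab-separated values (psql -At style, with \N for NULL) into an INSERT statement. It is deliberately simplified: every non-NULL value is quoted as text, and only the \N escape is handled.

```shell
# Simplified sketch: build an INSERT from field values (\N means NULL).
to_insert() {
  local table=$1; shift
  local vals='' v
  for v in "$@"; do
    if [ "$v" = '\N' ]; then
      v=NULL
    else
      v="'$(printf '%s' "$v" | sed "s/'/''/g")'"   # double embedded quotes
    fi
    vals=${vals:+$vals,}$v
  done
  printf 'INSERT INTO %s VALUES (%s);\n' "$table" "$vals"
}

to_insert xyz 42 "O'Brien" '\N'
```

A real converter would also need to undo COPY's other backslash escapes and cast non-text columns, which is part of why an 'insert-format' output built into COPY would be attractive.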
Davy Durham <pubaddr5@davyandbeth.com> writes: > To reiterate, it is not possible to use the COPY command to create > standard SQL INSERT statements that can be conveniently loaded by > another db. No? Fair point, but the question here is about how useful this incremental feature really is compared to its incremental maintenance cost. It is *possible* to achieve what you want without any pg_dump changes: create a fresh table via CREATE TABLE tmp_table AS SELECT * FROM my_table WHERE ... and then pg_dump that. So the issue is how often does this problem come up and is it worth maintaining more code to make it a bit easier to deal with? My thought is probably not, and that seems to be the majority opinion so far. regards, tom lane
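Tom's workaround can be scripted in three steps. Shown here as a dry run (the commands are echoed, not executed) with placeholder database and table names:

```shell
# Dry-run sketch of the CREATE TABLE AS workaround (placeholder names):
# materialize the filtered rows, dump just that table, then drop it.
# Remove the echoes to run against a real server.
workaround_cmds() {
  local db=$1 table=$2 where=$3
  echo "psql $db -c \"CREATE TABLE tmp_table AS SELECT * FROM $table WHERE $where\""
  echo "pg_dump --table=tmp_table -f filtered.sql $db"
  echo "psql $db -c \"DROP TABLE tmp_table\""
}

workaround_cmds mydb my_table 'ts > now()::date'
```

The cost is an extra copy of the filtered rows on the server and the loss of the original table's DDL in the dump, which is the "bit easier" convenience the patch was trying to buy.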
On Sun, 2008-06-01 at 22:19 -0400, Stephen Frost wrote: > > or so. I suppose it might be interesting to consider an 'insert-format' > output for COPY, which pg_dump could possibly be refactored to use when > requested. It'd be nice if it was easier to have COPY support more > formats but right now it's kind of a pain.

That also sounds reasonable to me. It seems like a worthy format for COPY to output, and sure, pg_dump could then be simplified a bit by using it. Beyond that, the easiest approach would probably be for pg_dump to let the user pass the very keyword (or full COPY statement parameters) that COPY would use as the output format, rather than implementing flags for everything COPY already supports; perhaps keep the existing -d/-D flags for backwards compatibility. This makes both pg_dump and COPY more flexible for use cases like mine.
Davy Durham <pubaddr5@davyandbeth.com> writes: > On Sun, 2008-06-01 at 22:19 -0400, Stephen Frost wrote: >> or so. I suppose it might be interesting to consider an 'insert-format' >> output for COPY, which pg_dump could possibly be refactored to use when >> requested. > And sure.. pg_dump could then be simplified a bit by using this. Not really. pg_dump has to support old server versions, so you won't get any meaningful simplification by moving functionality out of pg_dump into the backend; at least not before you've forgotten having made the change :-( regards, tom lane
On Sun, Jun 01, 2008 at 08:50:13PM -0400, Stephen Frost wrote: > * daveg (daveg@sonic.net) wrote: > > The feature that the proposed patch enables is to create pg_dump custom > > format archives for multiple tables with a predicate. No amount of csv or > > xml will do that. Contrived example: > > Uh, pg_dump's custom format really isn't particularly special, to be > honest. Is there some reason you're interested in using it over, as was > suggested, COPY/CSV/etc format? You could, of course, gzip the output > of COPY (as pg_dump does) if you're concerned about space..

You really underrate the power and utility of the pg_dump/pg_restore combination. These are very useful tools. I sense that they are sometimes considered ugly stepchildren, in need of comprehensive reformation, but that glorious future should not stand in the way of buttering today's bread.

Use case: I have a client with 50+ databases on 40+ hosts with more coming online all the time; they have a couple dozen tables (the set varies over time) on each db that need to have summaries of daily activity posted to a central database. These also need to be backed up for offline archiving. The archives are sometimes selectively re-loaded to diagnose or verify events in the past.

Of course something can always be written to do most of this, or Skytools or even Slony might be useful for the posting part, but the proposed pg_dump feature pretty much handles the whole thing in a very nice way that will require about 10 lines of shell script and will work unchanged even as tables and databases are added and dropped from the set. The operations staff at the site are familiar with pg_dump and selective restores using the list options -l/-L etc, so it would be very easy for them to use. It is also an advantage that the pg_dump archives contain a complete set of related data in one container that can be loaded, or selectively loaded, in one command instead of picking through thousands of csv or copy files.
pg_dump also handles issues like permissions, ownership, indexes, and constraints, which a csv file does not. The argument that one could just use copy and tar and gzip applies to the whole existence of pg_dump itself: pg_dump is just a more convenient way to copy out tables and as such does not NEED to exist at all. However, it is convenient that it does exist and has convenient features to selectively handle schemas and tables. What is being suggested is that pg_dump take advantage of a new feature added to postgresql (copy with a where clause) to provide even more convenience and power than it already has.

If the patch is intrusive, or ugly, or interferes with future plans, or is likely to confuse people, then certainly it should not go in. But it seems trivial in terms of code and complexity and does no other harm. Merely because one does not have a use case at present does not mean that use cases don't exist. The patch submitter (Davy) seems to have one, and I (dg) have at least two separate client sites that could use this immediately. Oddly, or perhaps predictably, Davy's use case is quite different from mine, and has no appeal to me at all, but the one feature combined with the existing abilities of pg_dump covers both.

I'll stop now; this is not that big a deal, but it seems like a useful idea and worth more consideration than it has received. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
On Sun, Jun 01, 2008 at 02:08:47PM -0500, Davy Durham wrote: > On Sun, 2008-06-01 at 10:43 -0400, Tom Lane wrote: > > Davy Durham <pubaddr5@davyandbeth.com> writes: > > > I have added a new parameter, -w/--where=EXPR This lets you > > > specify an expression that will be used in a WHERE clause when > > > the data is dumped. > > > > This seems pretty poorly thought out. It can hardly work in a > > dump of more than one table, which means that there's not any real > > reason to use pg_dump at all. Just do a COPY (SELECT ...) TO > > somefile. > > Well, my primary reason for writing the patch was to have a standard > SQL file using INSERT statements in order to load the some of a > table's data into a database other than postgresql which does not > support the COPY statement. Have you tried DBI-Link <http://pgfoundry.org/projects/dbi-link/>? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
"daveg" <daveg@sonic.net> writes: > The argument that one could just use copy and tar and gzip applies to the > whole existance of pg_dump itself. pg_dump is just a more convenient way to > copy out tables and as such does not NEED to exist at all. However it is > convenient to that is does exist and does have convenient features to > selectively handle schemas and tables. That's not really true, pg_dump does a *lot* more work than just group table data together. Its real value comes in reconstructing DDL for all the objects and understanding the dependencies between them. That's functionality that isn't available elsewhere and can't be reproduced without just reimplementing pg_dump. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Gregory Stark <stark@enterprisedb.com> writes: > "daveg" <daveg@sonic.net> writes: >> The argument that one could just use copy and tar and gzip applies to the >> whole existance of pg_dump itself. > That's not really true, pg_dump does a *lot* more work than just group table > data together. Its real value comes in reconstructing DDL for all the objects > and understanding the dependencies between them. That's functionality that > isn't available elsewhere and can't be reproduced without just reimplementing > pg_dump. Indeed. The other altar that pg_dump spends a lot of time worshipping at is cross-version compatibility. It's got quite enough on its plate without people trying to convert it into an ETL tool. Which is not to say that we don't need an ETL tool, just that it should be separate from pg_dump. regards, tom lane