Thread: New pg_dump options: exclude tables/schemas, multiple all, wildcards

New pg_dump options: exclude tables/schemas, multiple all, wildcards

From
Greg Sabino Mullane
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
NotDashEscaped: You need GnuPG to verify this message


Attached is a patch to hopefully make pg_dump a lot more useful.
I started out by making it simply able to avoid dumping a single
table, but, inspired by David Fetter's patch last November, also
added in support for multiple items and limited wildcard matching.

-n and -N control the schemas, and -t and -T control the tables.

Wildcards can be a star at the start, the end, or on both sides
of a term. The patch acts inclusively with conflicts: the -t
option trumps the -N option.

Some examples:

To dump all tables beginning with the string "slony", plus
all tables with the word "log" inside of them:

pg_dump -t "slony*" -t "*log*"

To dump all schemas except "dev" and "qa", and all tables
except those ending in "large":

pg_dump -N "dev" -N "qa" -T "*large"

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200601152100
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFDyv9NvJuQZxSWSsgRAup9AKD110JJtJBYYPV5JxFROovfeddrSACg3IZ3
BqczBImC8UCVmik3YFHvDeQ=
=Y9zs
-----END PGP SIGNATURE-----


Attachment

Re: New pg_dump options: exclude tables/schemas, multiple all, wildcards

From
Alvaro Herrera
Date:
Greg Sabino Mullane wrote:

> Attached is a patch to hopefully make pg_dump a lot more useful.
> I started out by making it simply able to avoid dumping a single
> table, but, inspired by David Fetter's patch last November, also
> added in support for multiple items and limited wildcard matching.

I wonder if there's a way to have the server process the matching?  That
way we could have LIKE expressions in the switches, which would be
simpler in the code and more powerful.  I don't know how pg_dump works
so I can't really answer the question.  We desperately need this
capability however, as patches have been floating since before 8.0 and
we still don't have it.

--
Alvaro Herrera                           Developer, http://www.PostgreSQL.org
"Si quieres ser creativo, aprende el arte de perder el tiempo"

Re: New pg_dump options: exclude tables/schemas, multiple all, wildcards

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> I wonder if there's a way to have the server process the matching?  That
> way we could have LIKE expressions in the switches, which would be
> simpler in the code and more powerful.  I don't know how pg_dump works
> so I can't really answer the question.  We desperately need this
> capability however, as patches have been floating since before 8.0 and
> we still don't have it.

It won't fit into the existing code easily, but it could probably be done.
I toyed around with making the regex more robust, but three things
stopped me:

1) The "star at start" and "star at end" catches probably 99% of the cases,
and is way better than what we have now, so better a bird in the hand...

2) It would be a lot more work to send it to the backend or import some
of the regex code.

and most importantly:

3) It would require yet more arguments to pg_dump. The moment we start allowing
regular expression characters that are also valid identifier names (e.g. "."
and "_") we'll need some way to tell pg_dump whether we mean a literal search
or a regular expression one. Which probably means more arguments or at least
modifying the existing one in a possibly nonintuitive, and definitely more
complex, manner. I'm open to suggestions, however, but I don't want to make
things too byzantine for the users.

- --
Greg Sabino Mullane greg@endpoint.com  greg@turnstep.com
PGP Key: 0x14964AC8 200601171718
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFDzW5jvJuQZxSWSsgRAqrhAJoDvsOerxbi1ay3heRyfhubk3sw1wCdGDd6
6GAk6NVRjfwELzQeLeA7m5s=
=e6WP
-----END PGP SIGNATURE-----



Re: New pg_dump options: exclude tables/schemas, multiple all, wildcards

From
Tom Lane
Date:
"Greg Sabino Mullane" <greg@turnstep.com> writes:
> 2) It would be a lot more work to send it to the backend or import some
> of the regex code.

Importing regex code into pg_dump certainly sounds like a loser.
However, it doesn't seem to me that it'd be that hard to issue
commands like
    select relname from pg_class where relname like <pattern>
then save aside this list to match against stuff-to-dump.

> 3) It would require yet more arguments to pg_dump. The moment we start allowing
> regular expression characters that are also valid identifier names (e.g. "."
> and "_") we'll need some way to tell pg_dump whether we mean a literal search
> or a regular expression one.

However, we are going to have that problem in spades if we do a
half-baked pattern feature now and then want to improve it later.
I think it'd be better to get it right the first time.

In practice, I don't think that LIKE-style patterns (% and _ wildcards)
will pose a serious compatibility problem if we just decree that the
-n and -t switches now take patterns rather than plain names.  I agree
that regex-style patterns would open some gotchas, but what's wrong with
standardizing on LIKE patterns?

            regards, tom lane

Re: New pg_dump options: exclude tables/schemas, multiple all, wildcards

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> In practice, I don't think that LIKE-style patterns (% and _ wildcards)
> will pose a serious compatibility problem if we just decree that the
> -n and -t switches now take patterns rather than plain names.  I agree
> that regex-style patterns would open some gotchas, but what's wrong with
> standardizing on LIKE patterns?

Sounds good, but the more I think about it, why don't we just use regexes via
the ~ operator? After all, if we want to exclude schemas starting with an
underscore from pg_dump, then -N '^_.*' is no worse than -N '\\_%' and has
the added advantage of being more like regexes people are used to. I guess
my earlier 'which is which' argument isn't too much to worry about either -
chances are very slim that an existing script is using a -t argument that
contains regular expressions. Plus, while the underscore is common in
namespace names, a period is not.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200601172005
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFDzZSpvJuQZxSWSsgRAiEQAKD5YXJjne5ZjbSUyHLiVKrEBtLPxQCfbsN8
JlQH5S+UVTogKpyRQJoU6jk=
=Sfcu
-----END PGP SIGNATURE-----



Re: New pg_dump options: exclude tables/schemas, multiple all, wildcards

From
Tom Lane
Date:
"Greg Sabino Mullane" <greg@turnstep.com> writes:
> Plus, while the underscore is common in
> namespace names, a period is not.

That's a good point ... there might actually be less risk of collision
with existing habits if we go with regex instead of LIKE conventions for
the patterns.  Even though regex has many more special characters, none
seem very likely to appear in ordinary table or schema names.

            regards, tom lane

Re: New pg_dump options: exclude tables/schemas, multiple

From
Bruce Momjian
Date:
Tom Lane wrote:
> > 3) It would require yet more arguments to pg_dump. The moment we start allowing
> > regular expression characters that are also valid identifier names (e.g. "."
> > and "_") we'll need some way to tell pg_dump whether we mean a literal search
> > or a regular expression one.
>
> However, we are going to have that problem in spades if we do a
> half-baked pattern feature now and then want to improve it later.
> I think it'd be better to get it right the first time.
>
> In practice, I don't think that LIKE-style patterns (% and _ wildcards)
> will pose a serious compatibility problem if we just decree that the
> -n and -t switches now take patterns rather than plain names.  I agree
> that regex-style patterns would open some gotchas, but what's wrong with
> standardizing on LIKE patterns?

I am concerned about the number of object names that have an underscore.
It seems regex would have fewer conflicts, even though it has more
special characters.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: New pg_dump options: exclude tables/schemas, multiple

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Tom Lane wrote:
> > > 3) It would require yet more arguments to pg_dump. The moment we start allowing
> > > regular expression characters that are also valid identifier names (e.g. "."
> > > and "_") we'll need some way to tell pg_dump whether we mean a literal search
> > > or a regular expression one.
> >
> > However, we are going to have that problem in spades if we do a
> > half-baked pattern feature now and then want to improve it later.
> > I think it'd be better to get it right the first time.
> >
> > In practice, I don't think that LIKE-style patterns (% and _ wildcards)
> > will pose a serious compatibility problem if we just decree that the
> > -n and -t switches now take patterns rather than plain names.  I agree
> > that regex-style patterns would open some gotchas, but what's wrong with
> > standardizing on LIKE patterns?
>
> I am concerned about the number of object names that have an underscore.
> It seems regex would have fewer conflicts, even though it has more
> special characters.

Sorry, I see the group came to the same conclusion.  I should have read
to the end of the thread.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: New pg_dump options: exclude tables/schemas, multiple

From
Bruce Momjian
Date:
Where are we on this patch?  Should we add code to do regex calls to the
backend using '~'.

---------------------------------------------------------------------------

Greg Sabino Mullane wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> NotDashEscaped: You need GnuPG to verify this message
>
>
> Attached is a patch to hopefully make pg_dump a lot more useful.
> I started out by making it simply able to avoid dumping a single
> table, but, inspired by David Fetter's patch last November, also
> added in support for multiple items and limited wildcard matching.
>
> -n and -N control the schemas, and -t and -T control the tables.
>
> Wildcards can be a star at the start, the end, or on both sides
> of a term. The patch acts inclusively with conflicts: the -t
> option trumps the -N option.
>
> Some examples:
>
> To dump all tables beginning with the string "slony", plus
> all tables with the word "log" inside of them:
>
> pg_dump -t "slony*" -t "*log*"
>
> To dump all schemas except "dev" and "qa", and all tables
> except those ending in "large":
>
> pg_dump -N "dev" -N "qa" -T "*large"
>
> --
> Greg Sabino Mullane greg@turnstep.com
> PGP Key: 0x14964AC8 200601152100
> http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
> -----BEGIN PGP SIGNATURE-----
>
> iD8DBQFDyv9NvJuQZxSWSsgRAup9AKD110JJtJBYYPV5JxFROovfeddrSACg3IZ3
> BqczBImC8UCVmik3YFHvDeQ=
> =Y9zs
> -----END PGP SIGNATURE-----
>

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: New pg_dump options: exclude tables/schemas, multiple

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> Where are we on this patch?  Should we add code to do regex calls
> to the backend using '~'.

Yes - I am planning to submit a new patch when I get a few spare
cycles. Hopefully no more than a few days from now.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200602011424
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFD4QsBvJuQZxSWSsgRAs6XAKCCEobXLaOQfTf0PXpFTl0f90cNWQCgi6wO
g4F6pcP6mjjLYsdkjrKAvF8=
=jJFm
-----END PGP SIGNATURE-----



Re: New pg_dump options: exclude tables/schemas, multiple

From
Bruce Momjian
Date:
Are we any closer to something to apply for this?

---------------------------------------------------------------------------

Greg Sabino Mullane wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> NotDashEscaped: You need GnuPG to verify this message
>
>
> Attached is a patch to hopefully make pg_dump a lot more useful.
> I started out by making it simply able to avoid dumping a single
> table, but, inspired by David Fetter's patch last November, also
> added in support for multiple items and limited wildcard matching.
>
> -n and -N control the schemas, and -t and -T control the tables.
>
> Wildcards can be a star at the start, the end, or on both sides
> of a term. The patch acts inclusively with conflicts: the -t
> option trumps the -N option.
>
> Some examples:
>
> To dump all tables beginning with the string "slony", plus
> all tables with the word "log" inside of them:
>
> pg_dump -t "slony*" -t "*log*"
>
> To dump all schemas except "dev" and "qa", and all tables
> except those ending in "large":
>
> pg_dump -N "dev" -N "qa" -T "*large"
>
> --
> Greg Sabino Mullane greg@turnstep.com
> PGP Key: 0x14964AC8 200601152100
> http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
> -----BEGIN PGP SIGNATURE-----
>
> iD8DBQFDyv9NvJuQZxSWSsgRAup9AKD110JJtJBYYPV5JxFROovfeddrSACg3IZ3
> BqczBImC8UCVmik3YFHvDeQ=
> =Y9zs
> -----END PGP SIGNATURE-----
>

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian   http://candle.pha.pa.us
  SRA OSS, Inc.   http://www.sraoss.com

  + If your life is a hard drive, Christ can be your backup. +

Re: New pg_dump options: exclude tables/schemas, multiple

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> Are we any closer to something to apply for this?

If by closer you mean closer to the top of my todo queue,
then yes. :) Otherwise, no time for it yet.

Boy, Tom issues an August 1st deadline and people are already
cracking down. Just kidding. *Hopefully* finished this weekend.
I'll even do some work on it tonight...

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200603021936
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFEB4+fvJuQZxSWSsgRAtaCAJ9Nccf3uQ+qNf5aQdJ9gud0wDm96QCbBZIQ
D5YxaKxmEQMS3paPxH1Ncv4=
=P5xi
-----END PGP SIGNATURE-----



Re: New pg_dump options: exclude tables/schemas, multiple

From
Bruce Momjian
Date:
This patch needs the modifications suggested in later emails,
specifically using regex and quering the server for matching objects.

URL added to TODO.

---------------------------------------------------------------------------

Greg Sabino Mullane wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> NotDashEscaped: You need GnuPG to verify this message
>
>
> Attached is a patch to hopefully make pg_dump a lot more useful.
> I started out by making it simply able to avoid dumping a single
> table, but, inspired by David Fetter's patch last November, also
> added in support for multiple items and limited wildcard matching.
>
> -n and -N control the schemas, and -t and -T control the tables.
>
> Wildcards can be a star at the start, the end, or on both sides
> of a term. The patch acts inclusively with conflicts: the -t
> option trumps the -N option.
>
> Some examples:
>
> To dump all tables beginning with the string "slony", plus
> all tables with the word "log" inside of them:
>
> pg_dump -t "slony*" -t "*log*"
>
> To dump all schemas except "dev" and "qa", and all tables
> except those ending in "large":
>
> pg_dump -N "dev" -N "qa" -T "*large"
>
> --
> Greg Sabino Mullane greg@turnstep.com
> PGP Key: 0x14964AC8 200601152100
> http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
> -----BEGIN PGP SIGNATURE-----
>
> iD8DBQFDyv9NvJuQZxSWSsgRAup9AKD110JJtJBYYPV5JxFROovfeddrSACg3IZ3
> BqczBImC8UCVmik3YFHvDeQ=
> =Y9zs
> -----END PGP SIGNATURE-----
>

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +