Thread: New pg_dump options: exclude tables/schemas, multiple all, wildcards
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 NotDashEscaped: You need GnuPG to verify this message Attached is a patch to hopefully make pg_dump a lot more useful. I started out by making it simply able to avoid dumping a single table, but, inspired by David Fetter's patch last November, also added in support for multiple items and limited wildcard matching. -n and -N control the schemas, and -t and -T control the tables. Wildcards can be a star at the start, the end, or on both sides of a term. The patch acts inclusively with conflicts: the -t option trumps the -N option. Some examples: To dump all tables beginning with the string "slony", plus all tables with the word "log" inside of them: pg_dump -t "slony*" -t "*log*" To dump all schemas except "dev" and "qa", and all tables except those ending in "large": pg_dump -N "dev" -N "qa" -T "*large" -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200601152100 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFDyv9NvJuQZxSWSsgRAup9AKD110JJtJBYYPV5JxFROovfeddrSACg3IZ3 BqczBImC8UCVmik3YFHvDeQ= =Y9zs -----END PGP SIGNATURE-----
Attachment
Greg Sabino Mullane wrote: > Attached is a patch to hopefully make pg_dump a lot more useful. > I started out by making it simply able to avoid dumping a single > table, but, inspired by David Fetter's patch last November, also > added in support for multiple items and limited wildcard matching. I wonder if there's a way to have the server process the matching? That way we could have LIKE expressions in the switches, which would be simpler in the code and more powerful. I don't know how pg_dump works so I can't really answer the question. We desperately need this capability however, as patches have been floating since before 8.0 and we still don't have it. -- Alvaro Herrera Developer, http://www.PostgreSQL.org "Si quieres ser creativo, aprende el arte de perder el tiempo"
Re: New pg_dump options: exclude tables/schemas, multiple all, wildcards
From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > I wonder if there's a way to have the server process the matching? That > way we could have LIKE expressions in the switches, which would be > simpler in the code and more powerful. I don't know how pg_dump works > so I can't really answer the question. We desperately need this > capability however, as patches have been floating since before 8.0 and > we still don't have it. It won't fit into the existing code easily, but it could probably be done. I toyed around with making the regex more robust, but three things stopped me: 1) The "star at start" and "star at end" catches probably 99% of the cases, and is way better than what we have now, so better a bird in the hand... 2) It would be a lot more work to send it to the backend or import some of the regex code. and most importantly: 3) It would require yet more arguments to pg_dump. The moment we start allowing regular expression characters that are also valid identifier names (e.g. "." and "_") we'll need some way to tell pg_dump whether we mean a literal search or a regular expression one. Which probably means more arguments or at least modifying the existing one in a possibly nonintuitive, and definitely more complex, manner. I'm open to suggestions, however, but I don't want to make things too byzantine for the users. - -- Greg Sabino Mullane greg@endpoint.com greg@turnstep.com PGP Key: 0x14964AC8 200601171718 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFDzW5jvJuQZxSWSsgRAqrhAJoDvsOerxbi1ay3heRyfhubk3sw1wCdGDd6 6GAk6NVRjfwELzQeLeA7m5s= =e6WP -----END PGP SIGNATURE-----
"Greg Sabino Mullane" <greg@turnstep.com> writes: > 2) It would be a lot more work to send it to the backend or import some > of the regex code. Importing regex code into pg_dump certainly sounds like a loser. However, it doesn't seem to me that it'd be that hard to issue commands like select relname from pg_class where relname like <pattern> then save aside this list to match against stuff-to-dump. > 3) It would require yet more arguments to pg_dump. The moment we start allowing > regular expression characters that are also valid identifier names (e.g. "." > and "_") we'll need some way to tell pg_dump whether we mean a literal search > or a regular expression one. However, we are going to have that problem in spades if we do a half-baked pattern feature now and then want to improve it later. I think it'd be better to get it right the first time. In practice, I don't think that LIKE-style patterns (% and _ wildcards) will pose a serious compatibility problem if we just decree that the -n and -t switches now take patterns rather than plain names. I agree that regex-style patterns would open some gotchas, but what's wrong with standardizing on LIKE patterns? regards, tom lane
Re: New pg_dump options: exclude tables/schemas, multiple all, wildcards
From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > In practice, I don't think that LIKE-style patterns (% and _ wildcards) > will pose a serious compatibility problem if we just decree that the > -n and -t switches now take patterns rather than plain names. I agree > that regex-style patterns would open some gotchas, but what's wrong with > standardizing on LIKE patterns? Sounds good, but the more I think about it, why don't we just use regexes via the ~ operator? After all, if we want to exclude schemas starting with an underscore from pg_dump, then -N '^_.*' is no worse than -N '\\_%' and has the added advantage of being more like regexes people are used to. I guess my earlier 'which is which' argument isn't too much to worry about either - chances are very slim that an existing script is using a -t argument that contains regular expressions. Plus, while the underscore is common in namespace names, a period is not. - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200601172005 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFDzZSpvJuQZxSWSsgRAiEQAKD5YXJjne5ZjbSUyHLiVKrEBtLPxQCfbsN8 JlQH5S+UVTogKpyRQJoU6jk= =Sfcu -----END PGP SIGNATURE-----
"Greg Sabino Mullane" <greg@turnstep.com> writes: > Plus, while the underscore is common in > namespace names, a period is not. That's a good point ... there might actually be less risk of collision with existing habits if we go with regex instead of LIKE conventions for the patterns. Even though regex has many more special characters, none seem very likely to appear in ordinary table or schema names. regards, tom lane
Tom Lane wrote: > > 3) It would require yet more arguments to pg_dump. The moment we start allowing > > regular expression characters that are also valid identifier names (e.g. "." > > and "_") we'll need some way to tell pg_dump whether we mean a literal search > > or a regular expression one. > > However, we are going to have that problem in spades if we do a > half-baked pattern feature now and then want to improve it later. > I think it'd be better to get it right the first time. > > In practice, I don't think that LIKE-style patterns (% and _ wildcards) > will pose a serious compatibility problem if we just decree that the > -n and -t switches now take patterns rather than plain names. I agree > that regex-style patterns would open some gotchas, but what's wrong with > standardizing on LIKE patterns? I am concerned about the number of object names that have an underscore. It seems regex would have fewer conflicts, even though it has more special characters. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian wrote: > Tom Lane wrote: > > > 3) It would require yet more arguments to pg_dump. The moment we start allowing > > > regular expression characters that are also valid identifier names (e.g. "." > > > and "_") we'll need some way to tell pg_dump whether we mean a literal search > > > or a regular expression one. > > > > However, we are going to have that problem in spades if we do a > > half-baked pattern feature now and then want to improve it later. > > I think it'd be better to get it right the first time. > > > > In practice, I don't think that LIKE-style patterns (% and _ wildcards) > > will pose a serious compatibility problem if we just decree that the > > -n and -t switches now take patterns rather than plain names. I agree > > that regex-style patterns would open some gotchas, but what's wrong with > > standardizing on LIKE patterns? > > I am concerned about the number of object names that have an underscore. > It seems regex would have fewer conflicts, even though it has more > special characters. Sorry, I see the group came to the same conclusion. I should have read to the end of the thread. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Where are we on this patch? Should we add code to do regex calls to the backend using '~'. --------------------------------------------------------------------------- Greg Sabino Mullane wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > NotDashEscaped: You need GnuPG to verify this message > > > Attached is a patch to hopefully make pg_dump a lot more useful. > I started out by making it simply able to avoid dumping a single > table, but, inspired by David Fetter's patch last November, also > added in support for multiple items and limited wildcard matching. > > -n and -N control the schemas, and -t and -T control the tables. > > Wildcards can be a star at the start, the end, or on both sides > of a term. The patch acts inclusively with conflicts: the -t > option trumps the -N option. > > Some examples: > > To dump all tables beginning with the string "slony", plus > all tables with the word "log" inside of them: > > pg_dump -t "slony*" -t "*log*" > > To dump all schemas except "dev" and "qa", and all tables > except those ending in "large": > > pg_dump -N "dev" -N "qa" -T "*large" > > -- > Greg Sabino Mullane greg@turnstep.com > PGP Key: 0x14964AC8 200601152100 > http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 > -----BEGIN PGP SIGNATURE----- > > iD8DBQFDyv9NvJuQZxSWSsgRAup9AKD110JJtJBYYPV5JxFROovfeddrSACg3IZ3 > BqczBImC8UCVmik3YFHvDeQ= > =Y9zs > -----END PGP SIGNATURE----- > [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Where are we on this patch? Should we add code to do regex calls > to the backend using '~'. Yes - I am planning to submit a new patch when I get a few spare cycles. Hopefully no more than a few days from now. - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200602011424 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFD4QsBvJuQZxSWSsgRAs6XAKCCEobXLaOQfTf0PXpFTl0f90cNWQCgi6wO g4F6pcP6mjjLYsdkjrKAvF8= =jJFm -----END PGP SIGNATURE-----
Are we any closer to something to apply for this? --------------------------------------------------------------------------- Greg Sabino Mullane wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > NotDashEscaped: You need GnuPG to verify this message > > > Attached is a patch to hopefully make pg_dump a lot more useful. > I started out by making it simply able to avoid dumping a single > table, but, inspired by David Fetter's patch last November, also > added in support for multiple items and limited wildcard matching. > > -n and -N control the schemas, and -t and -T control the tables. > > Wildcards can be a star at the start, the end, or on both sides > of a term. The patch acts inclusively with conflicts: the -t > option trumps the -N option. > > Some examples: > > To dump all tables beginning with the string "slony", plus > all tables with the word "log" inside of them: > > pg_dump -t "slony*" -t "*log*" > > To dump all schemas except "dev" and "qa", and all tables > except those ending in "large": > > pg_dump -N "dev" -N "qa" -T "*large" > > -- > Greg Sabino Mullane greg@turnstep.com > PGP Key: 0x14964AC8 200601152100 > http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 > -----BEGIN PGP SIGNATURE----- > > iD8DBQFDyv9NvJuQZxSWSsgRAup9AKD110JJtJBYYPV5JxFROovfeddrSACg3IZ3 > BqczBImC8UCVmik3YFHvDeQ= > =Y9zs > -----END PGP SIGNATURE----- > [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org -- Bruce Momjian http://candle.pha.pa.us SRA OSS, Inc. http://www.sraoss.com + If your life is a hard drive, Christ can be your backup. +
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Are we any closer to something to apply for this? If by closer you mean closer to the top of my todo queue, then yes. :) Otherwise, no time for it yet. Boy, Tom issues an August 1st deadline and people are already cracking down. Just kidding. *Hopefully* finished this weekend. I'll even do some work on it tonight... - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200603021936 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFEB4+fvJuQZxSWSsgRAtaCAJ9Nccf3uQ+qNf5aQdJ9gud0wDm96QCbBZIQ D5YxaKxmEQMS3paPxH1Ncv4= =P5xi -----END PGP SIGNATURE-----
This patch needs the modifications suggested in later emails, specifically using regex and quering the server for matching objects. URL added to TODO. --------------------------------------------------------------------------- Greg Sabino Mullane wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > NotDashEscaped: You need GnuPG to verify this message > > > Attached is a patch to hopefully make pg_dump a lot more useful. > I started out by making it simply able to avoid dumping a single > table, but, inspired by David Fetter's patch last November, also > added in support for multiple items and limited wildcard matching. > > -n and -N control the schemas, and -t and -T control the tables. > > Wildcards can be a star at the start, the end, or on both sides > of a term. The patch acts inclusively with conflicts: the -t > option trumps the -N option. > > Some examples: > > To dump all tables beginning with the string "slony", plus > all tables with the word "log" inside of them: > > pg_dump -t "slony*" -t "*log*" > > To dump all schemas except "dev" and "qa", and all tables > except those ending in "large": > > pg_dump -N "dev" -N "qa" -T "*large" > > -- > Greg Sabino Mullane greg@turnstep.com > PGP Key: 0x14964AC8 200601152100 > http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 > -----BEGIN PGP SIGNATURE----- > > iD8DBQFDyv9NvJuQZxSWSsgRAup9AKD110JJtJBYYPV5JxFROovfeddrSACg3IZ3 > BqczBImC8UCVmik3YFHvDeQ= > =Y9zs > -----END PGP SIGNATURE----- > [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +