Thread: Better to dump tabs as tabs, or \t?
Historically pg_dump has taken pains to dump ASCII control characters as backslash constructs, for instance \t for tab. I am thinking this is not such a great idea, and that it'd be more portable rather than less so if we got rid of that logic and just dumped tab as tab, etc. In particular, making this play nice with standard_conforming_strings seems unpleasant: we'll have to emit E'' strings which are certainly not portable, not even to older PG releases. The only good argument I can see for the current behavior is that it makes the dump file somewhat more robust against whitespace-mashing filters like typical email programs. But I wouldn't count on a dump file to come through such a thing completely unscathed anyway. Thoughts? regards, tom lane
Tom Lane wrote: > Historically pg_dump has taken pains to dump ASCII control characters > as backslash constructs, for instance \t for tab. I am thinking this > is not such a great idea, and that it'd be more portable rather than > less so if we got rid of that logic and just dumped tab as tab, etc. > In particular, making this play nice with standard_conforming_strings > seems unpleasant: we'll have to emit E'' strings which are certainly > not portable, not even to older PG releases. > > The only good argument I can see for the current behavior is that it > makes the dump file somewhat more robust against whitespace-mashing > filters like typical email programs. But I wouldn't count on a > dump file to come through such a thing completely unscathed anyway. > > > Go for it. Emailing a database dump as other than a properly encoded MIME attachment makes no sense these days. cheers andrew
On 5/27/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Historically pg_dump has taken pains to dump ASCII control characters > as backslash constructs, for instance \t for tab. I am thinking this > is not such a great idea, and that it'd be more portable rather than > less so if we got rid of that logic and just dumped tab as tab, etc. > In particular, making this play nice with standard_conforming_strings > seems unpleasant: we'll have to emit E'' strings which are certainly > not portable, not even to older PG releases. Could we just give a switch to pg_dump, which toggles between standard_confirming_strings and old escaped strings? IMHO this decision is similar to COPY/INSERT decision - it depends what the admin plans to with the dump, what tools are user on it, whether there is need to reload on older postgres, etc - and all of them are things that the postgres tools cannot deduce. By default, pg_dump should output standard_conforming_strings, that being in sync with policy to move to standard SQL quoting. And when the switch is given, pg_dump should put SET at the start of the dump, not use E'' stings, so giving option for being backwards compatible. Such option would considerably lower the pain of migrating data between versions. -- marko
"Marko Kreen" <markokr@gmail.com> writes: > On 5/27/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Historically pg_dump has taken pains to dump ASCII control characters >> as backslash constructs, for instance \t for tab. I am thinking this >> is not such a great idea, and that it'd be more portable rather than >> less so if we got rid of that logic and just dumped tab as tab, etc. >> In particular, making this play nice with standard_conforming_strings >> seems unpleasant: we'll have to emit E'' strings which are certainly >> not portable, not even to older PG releases. > Could we just give a switch to pg_dump, which toggles between > standard_confirming_strings and old escaped strings? The plan is that it'll dump according to what it finds as the standard_conforming_strings setting on the source server. If you feel a need to override that setting, you can use PGOPTIONS or the other usual ways to set a GUC variable for a program. However, my thought on the point at hand is to just go over to dumping control characters literally in either case. This is backwards-compatible to all PG versions and I don't know of a reason to think it wouldn't work (at least as well as the backslash constructs anyway) for portability to other databases. Note: this only affects strings dumped as part of SQL commands; COPY data isn't at issue, since we're not planning to change the semantics of that. COPY has always dumped tab as \t and I don't intend to change it. But pg_dump --inserts would be affected, also strings appearing in view definitions and such. We have some precedent for this in that pg_dump has by default dumped function definitions as $$ literals for a release or two now, and no one's complained of whitespace getting munged in function definitions. regards, tom lane
COPY wants \r and \n to be used because it checks for line endings, but your change is only for the SQL strings, and you are right, it is more porable to dump as actual bytes than backslashes. --------------------------------------------------------------------------- Tom Lane wrote: > "Marko Kreen" <markokr@gmail.com> writes: > > On 5/27/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Historically pg_dump has taken pains to dump ASCII control characters > >> as backslash constructs, for instance \t for tab. I am thinking this > >> is not such a great idea, and that it'd be more portable rather than > >> less so if we got rid of that logic and just dumped tab as tab, etc. > >> In particular, making this play nice with standard_conforming_strings > >> seems unpleasant: we'll have to emit E'' strings which are certainly > >> not portable, not even to older PG releases. > > > Could we just give a switch to pg_dump, which toggles between > > standard_confirming_strings and old escaped strings? > > The plan is that it'll dump according to what it finds as the > standard_conforming_strings setting on the source server. > If you feel a need to override that setting, you can use PGOPTIONS > or the other usual ways to set a GUC variable for a program. > > However, my thought on the point at hand is to just go over to > dumping control characters literally in either case. This is > backwards-compatible to all PG versions and I don't know of a > reason to think it wouldn't work (at least as well as the backslash > constructs anyway) for portability to other databases. > > Note: this only affects strings dumped as part of SQL commands; > COPY data isn't at issue, since we're not planning to change the > semantics of that. COPY has always dumped tab as \t and I don't > intend to change it. But pg_dump --inserts would be affected, > also strings appearing in view definitions and such. > > We have some precedent for this in that pg_dump has by default > dumped function definitions as $$ literals for a release or two > now, and no one's complained of whitespace getting munged in > function definitions. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > -- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +