Thread: Better to dump tabs as tabs, or \t?

Better to dump tabs as tabs, or \t?

From
Tom Lane
Date:
Historically pg_dump has taken pains to dump ASCII control characters
as backslash constructs, for instance \t for tab.  I am thinking this
is not such a great idea, and that it'd be more portable rather than
less so if we got rid of that logic and just dumped tab as tab, etc.
In particular, making this play nice with standard_conforming_strings
seems unpleasant: we'll have to emit E'' strings which are certainly
not portable, not even to older PG releases.

The only good argument I can see for the current behavior is that it
makes the dump file somewhat more robust against whitespace-mashing
filters like typical email programs.  But I wouldn't count on a
dump file to come through such a thing completely unscathed anyway.

Thoughts?
        regards, tom lane


Re: Better to dump tabs as tabs, or \t?

From
Andrew Dunstan
Date:
Tom Lane wrote:
> Historically pg_dump has taken pains to dump ASCII control characters
> as backslash constructs, for instance \t for tab.  I am thinking this
> is not such a great idea, and that it'd be more portable rather than
> less so if we got rid of that logic and just dumped tab as tab, etc.
> In particular, making this play nice with standard_conforming_strings
> seems unpleasant: we'll have to emit E'' strings which are certainly
> not portable, not even to older PG releases.
>
> The only good argument I can see for the current behavior is that it
> makes the dump file somewhat more robust against whitespace-mashing
> filters like typical email programs.  But I wouldn't count on a
> dump file to come through such a thing completely unscathed anyway.
>
>
>   

Go for it.

Emailing a database dump as other than a properly encoded MIME 
attachment makes no sense these days.

cheers

andrew





Re: Better to dump tabs as tabs, or \t?

From
"Marko Kreen"
Date:
On 5/27/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Historically pg_dump has taken pains to dump ASCII control characters
> as backslash constructs, for instance \t for tab.  I am thinking this
> is not such a great idea, and that it'd be more portable rather than
> less so if we got rid of that logic and just dumped tab as tab, etc.
> In particular, making this play nice with standard_conforming_strings
> seems unpleasant: we'll have to emit E'' strings which are certainly
> not portable, not even to older PG releases.

Could we just give a switch to pg_dump, which toggles between
standard_confirming_strings and old escaped strings?

IMHO this decision is similar to COPY/INSERT decision - it depends
what the admin plans to with the dump, what tools are user on it,
whether there is need to reload on older postgres, etc - and all
of them are things that the postgres tools cannot deduce.

By default, pg_dump should output standard_conforming_strings,
that being in sync with policy to move to standard SQL quoting.

And when the switch is given, pg_dump should put SET at the
start of the dump, not use E'' stings, so giving option
for being backwards compatible.

Such option would considerably lower the pain of migrating data
between versions.

-- 
marko


Re: Better to dump tabs as tabs, or \t?

From
Tom Lane
Date:
"Marko Kreen" <markokr@gmail.com> writes:
> On 5/27/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Historically pg_dump has taken pains to dump ASCII control characters
>> as backslash constructs, for instance \t for tab.  I am thinking this
>> is not such a great idea, and that it'd be more portable rather than
>> less so if we got rid of that logic and just dumped tab as tab, etc.
>> In particular, making this play nice with standard_conforming_strings
>> seems unpleasant: we'll have to emit E'' strings which are certainly
>> not portable, not even to older PG releases.

> Could we just give a switch to pg_dump, which toggles between
> standard_confirming_strings and old escaped strings?

The plan is that it'll dump according to what it finds as the
standard_conforming_strings setting on the source server.
If you feel a need to override that setting, you can use PGOPTIONS
or the other usual ways to set a GUC variable for a program.

However, my thought on the point at hand is to just go over to
dumping control characters literally in either case.  This is
backwards-compatible to all PG versions and I don't know of a
reason to think it wouldn't work (at least as well as the backslash
constructs anyway) for portability to other databases.

Note: this only affects strings dumped as part of SQL commands;
COPY data isn't at issue, since we're not planning to change the
semantics of that.  COPY has always dumped tab as \t and I don't
intend to change it.  But pg_dump --inserts would be affected,
also strings appearing in view definitions and such.

We have some precedent for this in that pg_dump has by default
dumped function definitions as $$ literals for a release or two
now, and no one's complained of whitespace getting munged in
function definitions.
        regards, tom lane


Re: Better to dump tabs as tabs, or \t?

From
Bruce Momjian
Date:
COPY wants \r and \n to be used because it checks for line endings, but
your change is only for the SQL strings, and you are right, it is more
porable to dump as actual bytes than backslashes.

---------------------------------------------------------------------------

Tom Lane wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
> > On 5/27/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Historically pg_dump has taken pains to dump ASCII control characters
> >> as backslash constructs, for instance \t for tab.  I am thinking this
> >> is not such a great idea, and that it'd be more portable rather than
> >> less so if we got rid of that logic and just dumped tab as tab, etc.
> >> In particular, making this play nice with standard_conforming_strings
> >> seems unpleasant: we'll have to emit E'' strings which are certainly
> >> not portable, not even to older PG releases.
> 
> > Could we just give a switch to pg_dump, which toggles between
> > standard_confirming_strings and old escaped strings?
> 
> The plan is that it'll dump according to what it finds as the
> standard_conforming_strings setting on the source server.
> If you feel a need to override that setting, you can use PGOPTIONS
> or the other usual ways to set a GUC variable for a program.
> 
> However, my thought on the point at hand is to just go over to
> dumping control characters literally in either case.  This is
> backwards-compatible to all PG versions and I don't know of a
> reason to think it wouldn't work (at least as well as the backslash
> constructs anyway) for portability to other databases.
> 
> Note: this only affects strings dumped as part of SQL commands;
> COPY data isn't at issue, since we're not planning to change the
> semantics of that.  COPY has always dumped tab as \t and I don't
> intend to change it.  But pg_dump --inserts would be affected,
> also strings appearing in view definitions and such.
> 
> We have some precedent for this in that pg_dump has by default
> dumped function definitions as $$ literals for a release or two
> now, and no one's complained of whitespace getting munged in
> function definitions.
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
> 

--  Bruce Momjian   http://candle.pha.pa.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +