Thread: pg_dump output portability

pg_dump output portability

From
Peter Eisentraut
Date:
I needed to move a PostgreSQL database to another product but I noticed
that the pg_dump output contains a few artifacts that make the output
nonportable.  Most of these should be relatively easy to fix.  Here's my
list:

* Boolean values should be dumped as true and false (rather than 't' and
'f') in INSERT-style output.

* Numeric and int8 should be dumped without quotes, except in cases like
'NaN'.

* Date, time, and timestamp literals should use standard prefixed syntax
like DATE 'yyyy-mm-dd'.

* Identifier quoting seems to be inconsistent.  The -n option gives you
portable behaviour (quoted only if mixed case or funny characters), but
the default -N doesn't actually quote some things that are generated by
the backend, including rule and index creation commands.  Is there a point
in having the -n behavior at all?

* Nonprintable characters in string literals are currently output as octal
escape sequences (e.g., \012).  It would be more portable to just print
out the characters as is.  This should be an option -- any opinions on
which might be a better default?

* The expression reverse-engineering code outputs ::text and similar casts
in many cases.  These should be CAST().

* It was once proposed to make SET SESSION AUTHORIZATION the default in
pg_dump.  What became of that?

* Is anyone working on using standard foreign key creation commands
instead of CREATE CONSTRAINT TRIGGER?

-- 
Peter Eisentraut   peter_e@gmx.net



Re: pg_dump output portability

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I needed to move a PostgreSQL database to another product but I noticed
> that the pg_dump output contains a few artifacts that make the output
> nonportable.  Most of these should be relatively easy to fix.

Most of these look like they would break a lot of people --- for
example, we can't just arbitrarily change the results of bool_out.

> * Identifier quoting seems to be inconsistent.  The -n option gives you
> portable behaviour (quoted only if mixed case or funny characters), but
> the default -N doesn't actually quote some things that are generated by
> the backend, including rule and index creation commands.  Is there a point
> in having the -n behavior at all?

You mean you'd rather eliminate the -N behavior, no?  I'd vote for that.

> * Nonprintable characters in string literals are currently output as octal
> escape sequences (e.g., \012).  It would be more portable to just print
> out the characters as is.  This should be an option -- any opinions on
> which might be a better default?

Again, I'm fairly suspicious of this; it seems likely to result in
failures to read in the data.  You can't just leave data newlines as-is
for example.

> * The expression reverse-engineering code outputs ::text and similar casts
> in many cases.  These should be CAST().

I will vote against this as being a major loss of legibility.  Perhaps
we could compromise on controlling it by a GUC variable, though.

> * It was once proposed to make SET SESSION AUTHORIZATION the default in
> pg_dump.  What became of that?

I think this is a good idea, and was meaning to do it but hadn't got
round to it.

> * Is anyone working on using standard foreign key creation commands
> instead of CREATE CONSTRAINT TRIGGER?

Rod Taylor submitted a patch for that, which I was planning to review
and apply shortly.
        regards, tom lane


Re: pg_dump output portability

From
Rod Taylor
Date:
> * Is anyone working on using standard foreign key creation commands
> instead of CREATE CONSTRAINT TRIGGER?

Submitted with the pg_constraint patch, and more recently updated to
match cvs tip.   I believe Tom wishes to review this prior to
application.



Re: pg_dump output portability

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> I needed to move a PostgreSQL database to another product but I noticed
                                        ^^^^^^^^^^^^^^^^^^

Surely this is a misprint.  ;-)


> that the pg_dump output contains a few artifacts that make the output
> nonportable.  Most of these should be relatively easy to fix.  Here's my
> list:

Maybe we need a "maximum portability" flag for pg_dump that will do some
of the things outlined below.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: pg_dump output portability

From
Peter Eisentraut
Date:
Tom Lane writes:

> Most of these look like they would break a lot of people --- for
> example, we can't just arbitrarily change the results of bool_out.

That wouldn't help anyway.  I meant to add code in pg_dump (and possibly
the rule recompiler).  That doesn't break anything.

> You mean you'd rather eliminate the -N behavior, no?  I'd vote for that.

Yes.  Or at least switch the default to "portable and readable".

> Again, I'm fairly suspicious of this; it seems likely to result in
> failures to read in the data.  You can't just leave data newlines as-is
> for example.

Why not?  You'd end up with

INSERT ... VALUES ('multi
line
literal', 'more data');

This is accepted by PostgreSQL now, is legal SQL, and is arguably at least
as readable as octal escape sequences.  (Note I'm not talking about doing
this in COPY, which is not portable anyway.)

> > * The expression reverse-engineering code outputs ::text and similar casts
> > in many cases.  These should be CAST().
>
> I will vote against this as being a major loss of legibility.  Perhaps
> we could compromise on controlling it by a GUC variable, though.

I was afraid of that, but to pick up the theme of the day, I'm not sure if
I want to overcomplexify things that much.  ;-)

-- 
Peter Eisentraut   peter_e@gmx.net



Re: pg_dump output portability

From
Bruce Momjian
Date:
Peter Eisentraut wrote:
> > I will vote against this as being a major loss of legibility.  Perhaps
> > we could compromise on controlling it by a GUC variable, though.
> 
> I was afraid of that, but to pick up the theme of the day, I'm not sure if
> I want to overcomplexify things that much.  ;-)

Tomorrow's theme is "sharing".  :-)

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: pg_dump output portability

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> Most of these look like they would break a lot of people --- for
>> example, we can't just arbitrarily change the results of bool_out.

> That wouldn't help anyway.  I meant to add code in pg_dump (and possibly
> the rule recompiler).  That doesn't break anything.

Ah.  But where exactly will you substitute true for 't'?  I don't think
pg_dump necessarily knows enough to apply that transformation.
ruleutils could and probably should do it for bool constants, but that's
only a small part of pg_dump output.

>> You mean you'd rather eliminate the -N behavior, no?  I'd vote for that.

> Yes.  Or at least switch the default to "portable and readable".

Switching the default is definitely fine with me, but I'd lean towards
ripping it out entirely, given that the backend-supplied chunks of stuff
are not going to have extra quotes.  We always tell people "always quote
or never quote" a given identifier; pg_dump scripts ought to follow that
rule.

>> Again, I'm fairly suspicious of this; it seems likely to result in
>> failures to read in the data.  You can't just leave data newlines as-is
>> for example.

> Why not?  You'd end up with

> INSERT ... VALUES ('multi
> line
> literal', 'more data');

> This is accepted by PostgreSQL now, is legal SQL, and is arguably at least
> as readable as octal escape sequences.  (Note I'm not talking about doing
> this in COPY, which is not portable anyway.)

Okay, I missed that context; I was thinking of COPY.  Yeah, in string
literals in INSERT it seems fairly reasonable to do nothing to the data
except double ' and \.  I am a little worried however about
character-set-encoding gotchas.  Hiroshi or Tatsuo might have more
insight here.
        regards, tom lane


Re: pg_dump output portability

From
Rod Taylor
Date:
On Wed, 2002-08-14 at 18:20, Bruce Momjian wrote:
> Peter Eisentraut wrote:
> > > I will vote against this as being a major loss of legibility.  Perhaps
> > > we could compromise on controlling it by a GUC variable, though.
> > 
> > I was afraid of that, but to pick up the theme of the day, I'm not sure if
> > I want to overcomplexify things that much.  ;-)
> 
> Tomorrow's theme is "sharing".  :-)

Brought to you by the letters S, Q, L and the number 99.



Re: pg_dump output portability

From
Peter Eisentraut
Date:
Tom Lane writes:

> Ah.  But where exactly will you substitute true for 't'?  I don't think
> pg_dump necessarily knows enough to apply that transformation.

Sure, it does it already for other types.  Look for BITOID in pg_dump.c.

> Switching the default is definitely fine with me, but I'd lean towards
> ripping it out entirely, given that the backend-supplied chunks of stuff
> are not going to have extra quotes.  We always tell people "always quote
> or never quote" a given identifier; pg_dump scripts ought to follow that
> rule.

It occurred to me that pg_dump would need to know which names are
keywords.  Ugh.

-- 
Peter Eisentraut   peter_e@gmx.net



Re: pg_dump output portability

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> Switching the default is definitely fine with me, but I'd lean towards
>> ripping it out entirely, given that the backend-supplied chunks of stuff
>> are not going to have extra quotes.  We always tell people "always quote
>> or never quote" a given identifier; pg_dump scripts ought to follow that
>> rule.

> It occurred to me that pg_dump would need to know which names are
> keywords.  Ugh.

Good point, but not insurmountable.  We could easily make pg_dump
include the backend's src/backend/parser/keywords.o file and then
use that subroutine to test for keyword-ness.  This would be a 100%
solution when dumping from a server of the same version, and about
a 99.9% solution when dumping from an older server.
        regards, tom lane