Re: CSV mode option for pg_dump - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: CSV mode option for pg_dump
Date
Msg-id 3864.24.211.165.134.1150284172.squirrel@www.dunslane.net
Whole thread Raw
In response to Re: CSV mode option for pg_dump  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: CSV mode option for pg_dump  (Bruce Momjian <pgman@candle.pha.pa.us>)
Re: CSV mode option for pg_dump  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
Tom Lane said:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> This is just nonsense. There is not the slightest reason that CSV data
>>  cannot be embedded in a text dump nor exist as the data members of a
>> custom or tar dump with the corresponding COPY statements modified
>> accordingly.
>
> Well, the really *core* question here is whether we trust the stability
> of the CSV format definition (and code) enough to want to rely on it
> for data dump/restore purposes.  I'm still a few years away from that,
> myself.  AFAICT the raison d'etre of the CSV code is "emit whatever it
> takes to satisfy this, that, and the other broken Microsoft
> application". That's fine as an export tool, but as a dump/reload tool,
> nyet.  If you put it in pg_dump you're just handing neophytes another
> foot-gun.
>


Well, I'm staggered. Really.

The CSV definition and its lack of formality is a complete red herring in
this, as are references to Microsoft.

The real issue in what you say here is your suggestion that we might not be
able to reproduce the input in some cases via a COPY CSV round trip. If that
is so it's a bug of our (probably my) making, and must be fixed. I assert
that it is not so. In fact all the tests I did during development and since
were premised on recovering the input exactly. The only CSV option that
breaks things in that respect is FORCE NOT NULL, and it is designed for data
coming in from a non Postgres source, so it's not relevant here. Even FORCE
QUOTE won't break things because it never quotes a null value, and the only
semantic significance to us of quoting is the null-ness of the value.

If the code is broken then it should be discoverable by test or code
analysis. There is no need to refer to any other application or standard. So
if you or anyone think there is a case that will not reproduce the data
exactly when the same CSV options are used for output and input, I challenge
you or them to provide a single example.

You say you're a few years away from trusting the code. Well, it's not so
huge that it's beyond analysis, and I'll be happy to explain anything that
puzzles you. Perhaps more importantly, it has been in use now for 18 months.
We discovered one problem with embedded line feeds very early in the 8.0
release cycle, and fixed it. After that I have not heard of a single
problem. And I assure you this code is widely used.

It probably isn't used much as a round trip mechanism, probably in part
because we haven't provided it as a pg_dump option. So maybe we have a
chicken/egg scenario here. We do have some round trip regression tests in
the copy test, and those can be beefed up if necessary to increase your
confidence level.

I'm happy to debate details, but general assertions of "we can't trust this
code" don't seem worth much to me.

cheers

andrew







pgsql-hackers by date:

Previous
From: Martijn van Oosterhout
Date:
Subject: Re: postgresql and process titles
Next
From: Bruce Momjian
Date:
Subject: Re: CSV mode option for pg_dump