Re: CSV mode option for pg_dump - Mailing list pgsql-hackers
From | Andrew Dunstan |
---|---|
Subject | Re: CSV mode option for pg_dump |
Date | |
Msg-id | 3864.24.211.165.134.1150284172.squirrel@www.dunslane.net Whole thread Raw |
In response to | Re: CSV mode option for pg_dump (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: CSV mode option for pg_dump
Re: CSV mode option for pg_dump |
List | pgsql-hackers |
Tom Lane said: > Andrew Dunstan <andrew@dunslane.net> writes: >> This is just nonsense. There is not the slightest reason that CSV data >> cannot be embedded in a text dump nor exist as the data members of a >> custom or tar dump with the corresponding COPY statements modified >> accordingly. > > Well, the really *core* question here is whether we trust the stability > of the CSV format definition (and code) enough to want to rely on it > for data dump/restore purposes. I'm still a few years away from that, > myself. AFAICT the raison d'etre of the CSV code is "emit whatever it > takes to satisfy this, that, and the other broken Microsoft > application". That's fine as an export tool, but as a dump/reload tool, > nyet. If you put it in pg_dump you're just handing neophytes another > foot-gun. > Well, I'm staggered. Really. The CSV definition and its lack of formality is a complete red herring in this, as are references to Microsoft. The real issue in what you say here is your suggestion that we might not be able to reproduce the input in some cases via a COPY CSV round trip. If that is so it's a bug of our (probably my) making, and must be fixed. I assert that it is not so. In fact all the tests I did during development and since were premised on recovering the input exactly. The only CSV option that breaks things in that respect is FORCE NOT NULL, and it is designed for data coming in from a non Postgres source, so it's not relevant here. Even FORCE QUOTE won't break things because it never quotes a null value, and the only semantic significance to us of quoting is the null-ness of the value. If the code is broken then it should be discoverable by test or code analysis. There is no need to refer to any other application or standard. So if you or anyone think there is a case that will not reproduce the data exactly when the same CSV options are used for output and input, I challenge you or them to provide a single example. You say you're a few years away from trusting the code. Well, it's not so huge that it's beyond analysis, and I'll be happy to explain anything that puzzles you. Perhaps more importantly, it has been in use now for 18 months. We discovered one problem with embedded line feeds very early in the 8.0 release cycle, and fixed it. After that I have not heard of a single problem. And I assure you this code is widely used. It probably isn't used much as a round trip mechanism, probably in part because we haven't provided it as a pg_dump option. So maybe we have a chicken/egg scenario here. We do have some round trip regression tests in the copy test, and those can be beefed up if necessary to increase your confidence level. I'm happy to debate details, but general assertions of "we can't trust this code" don't seem worth much to me. cheers andrew
pgsql-hackers by date: