Re: Updated COPY CSV patch - Mailing list pgsql-patches
From | Andrew Dunstan |
---|---|
Subject | Re: Updated COPY CSV patch |
Date | |
Msg-id | 2522.24.211.141.25.1081853904.squirrel@www.dunslane.net Whole thread Raw |
In response to | Updated COPY CSV patch (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses |
Re: Updated COPY CSV patch
Re: Updated COPY CSV patch |
List | pgsql-patches |
Bruce Momjian said: > > OK, here is a new version of the patch that includes the grammar changes we agreed upon, SGML changes, and \copy support. I will not make any more changes without contacting you so feel free to make adjustments and repost. Excellent. Quick work :-) I will test later today. Meanwhile, quick response to your comments. > > I have two open issues. First, CSV should support WITH OIDS, no? > Why on earth would you want to? OIDs only have emaning to postgresql. Dumping to/from CSVs should not be seen as an alternative to Postgresql's normal text or binary file formats. Rather, it is a way of exchanging data with other programs that can't conveniently read or write those formats, but can read/write CSVs. I expect the data to be making a one way trip. OIDs make no sense to those other programs. For all those reasons I disallowed use of WITH OIDS for CSV mode. If you think that we should have it, it will be simple enough to do. It just strikes me as a very odd thing to do. > Second, I found a problem with NULLs. If I do: > . > test=> create table test (x text, y text); > CREATE TABLE > test=> insert into test values ('', NULL); > INSERT 17221 1 > test=> > > then this: > > test=> copy test to '/tmp/b' with csv; > > creates: > > "", > > and this: > > test=> copy test to '/tmp/b' with csv NULL 'fred'; > > creates: > > ,fred > > Is that logical? A non-null field went from "" to nothing. > Yes, I think it is logical, although it is strange, I agree. See below. > I think it is caused by this code: > > bool force_quote = (strcmp(string, null_print) == 0); > CopyAttributeOutCSV(string, delim, quote, escape, > force_quote); > > The reason it happens is that when the null string is '', it matches a zero-length string, so the value is quoted. When the null stirng isn't blank, a zero-length string doesn't match the null string so it isn't quoted. I think we need to add special logic for zero-length strings so they are always quoted, even if there is a special null string. This will make our dumps more consistent, I think, or maybe the current behavior is OK. It just struck me as strange. > I agree it seems strange, but that's because you probably aren't that used to dealing with CSVs. The code you quote is there so that if *we* reload a CSV *we* created the null property is preserved. That's actually quite a neat property, and one I would have said is "beyond contract requirements" :-). The effect you have seen seems perverse because you have chosen a perverse null marker - I expect everyone (well, nearly everyone) who uses this will do what seems to me natural, and use '' as the null marker, and then, as you saw, the results look quite intuitive. The real test of this is how well it plays with other programs, rather than how well data makes a round trip from postgresql to postgresql. In that sense, we need to let it out into the wild a bit and see what happens. There is a bit of an impedance mismatch between the way databases look at data and the way spreadsheets look at data. And CSV is a horribly brain- dead data format. Its one saving grace is that it is a lowest common denominator format that practically every data handling proggram under the sun understands. So there is going to be a bit of oddness. And we may have to make the odd tweak here and there. One area that we should think about as an enhancement is NOT NULL fields. As it stands now, we will get what we normally get when we try to insert a null into a NOT NULL field, namely an error. If the field has a simple literal default we could force that. And in the special case of text/varchar fields, it would be reasonable to force an empty string even if no default is set. There isn't a nice easy answer, I'm afraid. We shouldn't hold up putting this in on that account, but handling this better is certainly a TODO. (MySQL solves this problem by requiring every NOT NULL field to have a default, and silently supplying one if it is not specified, and the inserting a null is the sane as inserting DEFAULT. I don't think we want to go down that road .... this bit me badly a few weeks back when I applied some schema changes someone had carelessly designed relying on this behaviour. Of course I was running PostgrSQL and the whole thing blew up on me.) > I did a dump/reload test with a null string and null, and it worked fine. > > Is there any data that can not be dumped/reloaded via CSV? > I know it works fine for geometric types. I have seen it dump ACLs quite happily, although I didn't try to reload them. I will do some testing soon with arrays and byteas. cheers andrew
pgsql-patches by date: