Re: Updated COPY CSV patch - Mailing list pgsql-patches

From Andrew Dunstan
Subject Re: Updated COPY CSV patch
Date
Msg-id 2522.24.211.141.25.1081853904.squirrel@www.dunslane.net
Whole thread Raw
In response to Updated COPY CSV patch  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: Updated COPY CSV patch
Re: Updated COPY CSV patch
List pgsql-patches
Bruce Momjian said:
>
> OK, here is a new version of the patch that includes the grammar changes
we agreed upon, SGML changes, and \copy support.  I will not make any more
changes without contacting you so feel free to make adjustments and repost.


Excellent. Quick work :-) I will test later today.

Meanwhile, quick response to your comments.


>
> I have two open issues.  First, CSV should support WITH OIDS, no?
>

Why on earth would you want to? OIDs only have emaning to postgresql.
Dumping to/from CSVs should not be seen as an alternative to Postgresql's
normal text or binary file formats. Rather, it is a way of exchanging
data  with other programs that can't conveniently read or write those
formats,  but can read/write CSVs. I expect the data to be making a one
way trip.  OIDs make no sense to those other programs.


For all those reasons I disallowed use of WITH OIDS for CSV mode.

If you think that we should have it, it will be simple enough to do. It
just strikes me as a very odd thing to do.

> Second, I found a problem with NULLs.  If I do:
> .
>        test=> create table test (x text, y text);
>        CREATE TABLE
>        test=> insert into test values ('', NULL);
>        INSERT 17221 1
>        test=>
>
> then this:
>
>        test=> copy test to '/tmp/b' with csv;
>
> creates:
>
>        "",
>
> and this:
>
>        test=> copy test to '/tmp/b' with csv NULL 'fred';
>
> creates:
>
>        ,fred
>
> Is that logical?  A non-null field went from "" to nothing.
>

Yes, I think it is logical, although it is strange, I agree. See below.


> I think it is caused by this code:
>
>         bool force_quote = (strcmp(string, null_print) == 0);
>         CopyAttributeOutCSV(string, delim, quote, escape,
>                             force_quote);
>
> The reason it happens is that when the null string is '', it matches a
zero-length string, so the value is quoted.  When the null stirng isn't
blank, a zero-length string doesn't match the null string so it isn't
quoted.    I think we need to add special logic for zero-length strings so
they are always quoted, even if there is a special null string.  This will
make our dumps more consistent, I think, or maybe the current behavior is
OK.  It just struck me as strange.
>

I agree it seems strange, but that's because you probably aren't that
used  to dealing with CSVs.

The code you quote is there so that if *we* reload a CSV *we* created the
null property is preserved. That's actually quite a neat property, and
one  I would have said is "beyond contract requirements" :-). The effect
you  have seen seems perverse because you have chosen a perverse null
marker -  I expect everyone (well, nearly everyone) who uses this will do
what seems  to me natural, and use '' as the null marker, and then, as you
saw, the  results look quite intuitive.

The real test of this is how well it plays with other programs, rather
than how well data makes a round trip from postgresql to postgresql. In
that sense, we need to let it out into the wild a bit and see what happens.

There is a bit of an impedance mismatch between the way databases look at
data and the way spreadsheets look at data. And CSV is a horribly brain-
dead data format. Its one saving grace is that it is a lowest common
denominator format that practically every data handling proggram under
the  sun understands. So there is going to be a bit of oddness. And we may
have  to make the odd tweak here and there.

One area that we should think about as an enhancement is NOT NULL fields.
As it stands now, we will get what we normally get when we try to insert
a  null into a NOT NULL field, namely an error. If the field has a simple
literal default we could force that. And in the special case of
text/varchar fields, it would be reasonable to force an empty string even
if no default is set. There isn't a nice easy answer, I'm afraid. We
shouldn't hold up putting this in on that account, but handling this
better is certainly a TODO.


(MySQL solves this problem by requiring every NOT NULL field to have a
default, and silently supplying one if it is not specified, and the
inserting a null is the sane as inserting DEFAULT. I don't think we want
to go down that road .... this bit me badly a few weeks back when I
applied some schema changes someone had carelessly designed relying on
this behaviour. Of course I was running PostgrSQL and the whole thing
blew  up on me.)



> I did a dump/reload test with a null string and null, and it worked fine.
>
> Is there any data that can not be dumped/reloaded via CSV?
>

I know it works fine for geometric types. I have seen it dump ACLs quite
happily, although I didn't try to reload them. I will do some testing
soon  with arrays and byteas.

cheers

andrew




pgsql-patches by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: aclitem accessor functions
Next
From: "Andrew Dunstan"
Date:
Subject: Re: Updated COPY CSV patch