Thread: COPY WITH CSV FORCE QUOTE *
Hi, FORCE QUOTE option of COPY WITH CSV requires an explicit column list, but '*' (all columns) would be also useful for typical usages. I searched the ML archive and found one request before: | COPY TO with FORCE QUOTE * | http://archives.postgresql.org/pgsql-sql/2008-08/msg00084.php The attached is a WIP patch add a support of '*' for FORCE QUOTE and FORCE NOT NULL options. I'd like to submit it for the next commit fest (8.5). Comments welcome. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
On Mon, May 11, 2009 at 11:49 PM, Itagaki Takahiro<itagaki.takahiro@oss.ntt.co.jp> wrote: > Hi, > > The attached is a WIP patch add a support of '*' for FORCE QUOTE > and FORCE NOT NULL options. I'd like to submit it for the next > commit fest (8.5). Comments welcome. > this patch applies almost cleanly except for a hunk in gram.y about the patch itself, i can find value for FORCE QUOTE * but what's the use case for FORCE NOT NULL? -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
Jaime Casanova <jcasanov@systemguards.com.ec> wrote: > i can find value for FORCE QUOTE * but what's > the use case for FORCE NOT NULL? NULLs are not quoted (to be ,, ) because empty strings are written as "". It comes from original implementation and not from my patch. I think we don't need to change the behavior. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
> Jaime Casanova <jcasanov@systemguards.com.ec> wrote: > > > i can find value for FORCE QUOTE * but what's > > the use case for FORCE NOT NULL? Oh, sorry. I misread your mail. The patch adds * options "FORCE QUOTE" and "FORCE NOT NULL", too. Both of * mean all-columns for each options. The attached is an updated version of patch; just add documenation to copy.sgml. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
On Tue, Jul 14, 2009 at 2:26 AM, Itagaki Takahiro<itagaki.takahiro@oss.ntt.co.jp> wrote: > >> Jaime Casanova <jcasanov@systemguards.com.ec> wrote: >> >> > i can find value for FORCE QUOTE * but what's >> > the use case for FORCE NOT NULL? > > Oh, sorry. I misread your mail. > The patch adds * options "FORCE QUOTE" and "FORCE NOT NULL", too. > Both of * mean all-columns for each options. > and i misunderstood your patch... is actually an extend of an existing functionality, and both options are necesary for consistency... do you hear an echo here? ;) -- Atentamente, Jaime Casanova Soporte y capacitación de PostgreSQL Asesoría y desarrollo de sistemas Guayaquil - Ecuador Cel. +59387171157
Itagaki Takahiro wrote: >> Jaime Casanova <jcasanov@systemguards.com.ec> wrote: >> >> >>> i can find value for FORCE QUOTE * but what's >>> the use case for FORCE NOT NULL? >>> > > Oh, sorry. I misread your mail. > The patch adds * options "FORCE QUOTE" and "FORCE NOT NULL", too. > Both of * mean all-columns for each options. > > I still don't understand the use case for FORCE NOT NULL *. FORCE QUOTE * is relatively benign. In particular, it doesn't affect the null-preserving properties of our CSV implementation, since it still won't (or shouldn't) quote null values. FORCE NOT NULL is in any case a fairly blunt instrument - it doesn't work for a column of any type that doesn't accept an empty string as valid input, such as numeric types. I note that this patch was (I think) originally submitted without prior discussion. That's not following best practice. cheers andrew
All, 1) Patch applies cleanly against CVS head. 2) Patch compiles and builds cleanly. 3) Unable to check docs because of general doc build problems. 4) Tested the following commands, using a 10MB table of PostgreSQL log data: postgres=# COPY marchlog TO '/tmp/marchlog1.csv' with csv header; COPY 81097 postgres=# COPY marchlog TO '/tmp/marchlog2.csv' with csv header force quote *; COPY 81097 postgres=# COPY marchlog TO '/tmp/marchlog3.csv' with csv header force quote process_id; COPY 81097 postgres=# COPY marchlog TO '/tmp/marchlog4.csv' with csv force quote *; COPY 81097 postgres=# COPY marchlog TO '/tmp/marchlog5.csv' with force quote *; ERROR: COPY force quote available only in CSV mode STATEMENT: COPY marchlog TO '/tmp/marchlog5.csv' with force quote *; ERROR: COPY force quote available only in CSV mode postgres=# COPY reloadlog FROM '/tmp/marchlog2.csv' with csv header; COPY 81097 postgres-# \copy marchlog TO '/tmp/marchlog5.csv' with csv force quote *; postgres-# Per discussion, I did not test FORCE QUOTE NOT NULL *. All output looked as expected. This patch did not seem to change eariler functionality, and seems to quote as specified. Unless there are other things we want to test (CLOBs?) I think the patch is probably ready for code review of the FORCE QUOTE * portion. -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
Andrew, > FORCE NOT NULL is in any case a fairly blunt instrument - it doesn't > work for a column of any type that doesn't accept an empty string as > valid input, such as numeric types. Con: this allows COPY to produce output which cannot be reloaded into PostgreSQL. Pro: there is a lot of extremely broken external software which expects "nulls" to be expressed as "". This improves compatiblity with them. -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
Josh Berkus wrote: > Andrew, > >> FORCE NOT NULL is in any case a fairly blunt instrument - it doesn't >> work for a column of any type that doesn't accept an empty string as >> valid input, such as numeric types. > > Con: this allows COPY to produce output which cannot be reloaded into > PostgreSQL. > > Pro: there is a lot of extremely broken external software which > expects "nulls" to be expressed as "". This improves compatiblity > with them. > FORCE NOT NULL is only valid when we import data, not when we export data, so what other programs expect to receive is irrelevant to any argument about FORCE NOT NULL. AFAICT on a brief look at the patch, it doesn't affect the quoting of nulls on export, it just allows * as an alias for all columns for FORCE QUOTE (as well as FORCE NOT NULL). But FORCE QUOTE has never forced quoting of null values, only non-null values. We have never quoted null values, and I'm fairly resistant to any suggestion that we should. As for importing data from programs that produce all values in quotes including null/missing values (your pro case above), arguably what we need is another flag that would turn an empty string into a null. cheers andrew
On Thu, Jul 16, 2009 at 2:47 PM, Josh Berkus<josh@agliodbs.com> wrote: > Unless there are other things we want to test (CLOBs?) I think the patch is > probably ready for code review of the FORCE QUOTE * portion. I think perhaps we should ask the patch author to remove the NOT NULL stuff first? ...Robert
On 7/16/09 12:53 PM, Robert Haas wrote: > On Thu, Jul 16, 2009 at 2:47 PM, Josh Berkus<josh@agliodbs.com> wrote: >> Unless there are other things we want to test (CLOBs?) I think the patch is >> probably ready for code review of the FORCE QUOTE * portion. > > I think perhaps we should ask the patch author to remove the NOT NULL > stuff first? Yes, current status is "Waiting on Author". -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
Andrew, > AFAICT on a brief look at the patch, it doesn't affect the quoting of > nulls on export, it just allows * as an alias for all columns for FORCE > QUOTE (as well as FORCE NOT NULL). But FORCE QUOTE has never forced > quoting of null values, only non-null values. We have never quoted null > values, and I'm fairly resistant to any suggestion that we should. See? That's what happens when I can't build the docs. ;-) (and there's no previous discussion of the feature). > > As for importing data from programs that produce all values in quotes > including null/missing values (your pro case above), arguably what we > need is another flag that would turn an empty string into a null. ooooh, TODO, please? There's a lot of this out there, and I've had to build sed into a lot of import routines. -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
Josh Berkus wrote: > Andrew, > >> AFAICT on a brief look at the patch, it doesn't affect the quoting of >> nulls on export, it just allows * as an alias for all columns for FORCE >> QUOTE (as well as FORCE NOT NULL). But FORCE QUOTE has never forced >> quoting of null values, only non-null values. We have never quoted null >> values, and I'm fairly resistant to any suggestion that we should. > > See? That's what happens when I can't build the docs. ;-) (and > there's no previous discussion of the feature). > >> >> As for importing data from programs that produce all values in quotes >> including null/missing values (your pro case above), arguably what we >> need is another flag that would turn an empty string into a null. > > ooooh, TODO, please? There's a lot of this out there, and I've had to > build sed into a lot of import routines. > +1 For that on the TODO, happens all the time...
Chris Spotts wrote: >>> >>> As for importing data from programs that produce all values in quotes >>> including null/missing values (your pro case above), arguably what we >>> need is another flag that would turn an empty string into a null. >> >> ooooh, TODO, please? There's a lot of this out there, and I've had >> to build sed into a lot of import routines. >> > +1 For that on the TODO, happens all the time... > Well, somebody had better suggest a syntax for it, preferably without adding yet another keyword. cheers andrew
Josh Berkus <josh@agliodbs.com> wrote: > On 7/16/09 12:53 PM, Robert Haas wrote: > > I think perhaps we should ask the patch author to remove the NOT NULL > > stuff first? > > Yes, current status is "Waiting on Author". OK, I removed "FORCE NOT NULL" stuff from the patch. The attached patch only adds "FORCE QUOTE *" feature. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Attachment
Itagaki-san, On Apple OS 10.5: 1. new patch applied cleanly 2. new patch built cleanly 3. regression tests pass 4. Tested following operations: postgres=# COPY marchlog TO '/tmp/marchlog1.csv' with csv header; COPY 81097 postgres=# COPY marchlog TO '/tmp/marchlog2.csv' with csv header force quote *; COPY 81097 postgres=# COPY marchlog TO '/tmp/marchlog3.csv' with csv header force quote process_id; COPY 81097 postgres=# COPY marchlog TO '/tmp/marchlog4.csv' with csv force quote *; COPY 81097 postgres=# COPY marchlog TO '/tmp/marchlog5.csv' with force quote *; ERROR: COPY force quote available only in CSV mode STATEMENT: COPY marchlog TO '/tmp/marchlog5.csv' with force quote *; ERROR: COPY force quote available only in CSV mode postgres=# COPY reloadlog FROM '/tmp/marchlog2.csv' with csv header; COPY 81097 postgres-# \copy marchlog TO '/tmp/marchlog5.csv' with csv force quote *; postgres-# 5. Regression tests for FORCE QUOTE present. 6. Docs present; not sure how good they are, see prior discussion. Stuff someone else should do: a. review code b. review code format I am done with this review. -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
On 7/16/09 1:55 PM, Andrew Dunstan wrote: > Well, somebody had better suggest a syntax for it, preferably without > adding yet another keyword. Actually, all that needs to happen is for NULL AS to accept '""' as a string. Right now that produces: ERROR: CSV quote character must not appear in the NULL specification What's the issue with that? I can see how NULL AS '"' would break things, but if I wanted NULL AS '"Josh"' shouldn't I be able to have it? -- Josh Berkus PostgreSQL Experts Inc. www.pgexperts.com
Josh Berkus wrote: > On 7/16/09 1:55 PM, Andrew Dunstan wrote: >> Well, somebody had better suggest a syntax for it, preferably without >> adding yet another keyword. > > Actually, all that needs to happen is for NULL AS to accept '""' as a > string. Right now that produces: > > ERROR: CSV quote character must not appear in the NULL specification > > What's the issue with that? I can see how NULL AS '"' would break > things, but if I wanted NULL AS '"Josh"' shouldn't I be able to have it? > See previous thread on -patches/-hackers "allow CSV quote in NULL" in July 2007. Quite apart from any other objection, doing what you suggest will not be column-by-column, like what I suggested. It's an all or nothing deal. cheers andrew
Josh Berkus wrote: > > Stuff someone else should do: > > a. review code > b. review code format > > I am done with this review. I have reviewed this and made a small tweak in the docco. I'm just about ready to commit this, but I'm still slightly worried that passing NULL to denote all columns in this piece of grammar: | FORCE QUOTE '*' { $$ = makeDefElem("force_quote", NULL); } might be less than robust - it just feels slightly hacky, so I'd appreciate others' thoughts. If nobody else is bothered I will commit the patch. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > I have reviewed this and made a small tweak in the docco. I'm just about > ready to commit this, but I'm still slightly worried that passing NULL > to denote all columns in this piece of grammar: > | FORCE QUOTE '*' > { > $$ = makeDefElem("force_quote", NULL); > } > might be less than robust - it just feels slightly hacky, so I'd > appreciate others' thoughts. I agree, that's ugly. Why don't you use an A_Star node? regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> I have reviewed this and made a small tweak in the docco. I'm just about >> ready to commit this, but I'm still slightly worried that passing NULL >> to denote all columns in this piece of grammar: >> > > >> | FORCE QUOTE '*' >> { >> $$ = makeDefElem("force_quote", NULL); >> } >> > > >> might be less than robust - it just feels slightly hacky, so I'd >> appreciate others' thoughts. >> > > I agree, that's ugly. Why don't you use an A_Star node? > > > OK, Done and committed. Nice little addition. cheers andrew