Thread: COPY .... (FORMAT binary) syntax doesn't work
This works fine... COPY pgbench_accounts TO '/tmp/acc' BINARY; This new format does not COPY pgbench_accounts FROM '/tmp/acc' (FORMAT BINARY); ERROR: syntax error at or near "BINARY" at character 47 which looks like I've mistyped something. Until you realise that this statement gives a completely different error message. COPY pgbench_accounts FROM '/tmp/acc' (FORMAT anyname); ERROR: COPY format "anyname" not recognized and we also note that there are no examples in the docs, nor regression tests to cover this situation. So I conclude that this hasn't ever worked since it was introduced in 9.0. The cause is that there is an overlap between the old and the new COPY syntax, relating to the word BINARY. It's the grammar generating the error, not post parse analysis. My attempts to fix that look pretty ugly, so I'm not even going to post them. I can stop the error on binary by causing errors on csv and text, obviously not a fix. Any grammar based fix looks like it would restrict the list of formats, which breaks the orginal intention of the syntax change. I'm advised that COPY pgbench_accounts FROM '/tmp/acc' (FORMAT 'BINARY'); works fine. Though that is not documented and I doubt anyone much uses that. My conclusion is that we should *not* fix this bug, but just alter the manual slightly to explain what the correct usage is (use quotes around 'binary'). Reason for that suggestion is that nobody ever reported this bug, so either few people use binary mode or they use the old syntax. Of course, that is not a normal suggestion, so feel free to walk up and down my spine with boots on for suggesting it. Thoughts? --Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 26.05.2013 04:31, Simon Riggs wrote: > This works fine... > COPY pgbench_accounts TO '/tmp/acc' BINARY; > > This new format does not > COPY pgbench_accounts FROM '/tmp/acc' (FORMAT BINARY); > ERROR: syntax error at or near "BINARY" at character 47 > > which looks like I've mistyped something. Until you realise that this > statement gives a completely different error message. > > COPY pgbench_accounts FROM '/tmp/acc' (FORMAT anyname); > ERROR: COPY format "anyname" not recognized > > and we also note that there are no examples in the docs, nor > regression tests to cover this situation. > > So I conclude that this hasn't ever worked since it was introduced in 9.0. > > The cause is that there is an overlap between the old and the new COPY > syntax, relating to the word BINARY. It's the grammar generating the > error, not post parse analysis. Hmm, the problem is that BINARY is a type_func_keyword, so it doesn't match the ColId rule used to capture the format argument. > My attempts to fix that look pretty ugly, so I'm not even going to > post them. I can stop the error on binary by causing errors on csv and > text, obviously not a fix. Any grammar based fix looks like it would > restrict the list of formats, which breaks the orginal intention of > the syntax change. This seems to work: --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -2528,3 +2528,7 @@ copy_generic_opt_elem: { $$ = makeDefElem($1, $2); } + | ColLabel BINARY + { + $$ = makeDefElem($1, (Node *) makeString("binary")); + } Am I missing something? - Heikki
Heikki Linnakangas <hlinnakangas@vmware.com> writes: > On 26.05.2013 04:31, Simon Riggs wrote: >> This new format does not [work:] >> COPY pgbench_accounts FROM '/tmp/acc' (FORMAT BINARY); >> ERROR: syntax error at or near "BINARY" at character 47 > This seems to work: > --- a/src/backend/parser/gram.y > +++ b/src/backend/parser/gram.y > @@ -2528,3 +2528,7 @@ copy_generic_opt_elem: > { > $$ = makeDefElem($1, $2); > } > + | ColLabel BINARY > + { > + $$ = makeDefElem($1, (Node *) makeString("binary")); > + } That only fixes things for the specific case of FORMAT BINARY. I think it would be better to generalize ColId_or_Sconst. This one-liner works, but it's pretty ugly: *** gram.y~ Sun May 26 11:58:42 2013 --- gram.y Sun May 26 12:00:03 2013 *************** opt_encoding: *** 1548,1553 **** --- 1548,1554 ---- ColId_or_Sconst: ColId { $$ = $1; } + | type_func_name_keyword { $$ = pstrdup($1); } | Sconst { $$ = $1; } ; More readable would be to invent an intermediate nonterminal falling between ColId and ColLabel, whose expansion would be "IDENT | unreserved_keyword | col_name_keyword | type_func_name_keyword", and then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst. Any thoughts about a name for that new nonterminal? BTW, I tried just replacing ColId with ColLabel here, and that *almost* works, but there are some places where we can see either ColId_or_Sconst or DEFAULT. I don't know of any sane way to express "all reserved keywords except DEFAULT" to bison, so the best we can realistically do is to accept all not-fully-reserved keywords in these places. regards, tom lane
On 26 May 2013 16:35, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: >> My attempts to fix that look pretty ugly, so I'm not even going to >> post them. I can stop the error on binary by causing errors on csv and >> text, obviously not a fix. Any grammar based fix looks like it would >> restrict the list of formats, which breaks the orginal intention of >> the syntax change. > > > This seems to work: This was almost exactly the fix I described above that only fixes that specific case and then breaks others. > --- a/src/backend/parser/gram.y > +++ b/src/backend/parser/gram.y > @@ -2528,3 +2528,7 @@ copy_generic_opt_elem: > { > $$ = makeDefElem($1, $2); > } > + | ColLabel BINARY > + { > + $$ = makeDefElem($1, (Node *) > makeString("binary")); > + } So, no that doesn't work. --Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 26 May 2013 17:10, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Heikki Linnakangas <hlinnakangas@vmware.com> writes: > More readable would be to invent an intermediate nonterminal falling > between ColId and ColLabel, whose expansion would be "IDENT | > unreserved_keyword | col_name_keyword | type_func_name_keyword", and > then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst. > Any thoughts about a name for that new nonterminal? Do you think complicating the parser in that way is worth the trouble for this case? Could that slow down parsing? We don't actually have to fix it; clearly not too many people are bothered, since no complaints in 3 years. Documenting 'binary' seems better. --Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On 26 May 2013 17:10, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> More readable would be to invent an intermediate nonterminal falling >> between ColId and ColLabel, whose expansion would be "IDENT | >> unreserved_keyword | col_name_keyword | type_func_name_keyword", and >> then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst. >> Any thoughts about a name for that new nonterminal? > Do you think complicating the parser in that way is worth the trouble > for this case? Could that slow down parsing? It makes the grammar tables a bit larger (1% or so IIRC). There would be some distributed penalty for that, but probably not much. Of course there's always the slippery-slope argument about that. > We don't actually have to fix it; clearly not too many people are > bothered, since no complaints in 3 years. Documenting 'binary' seems > better. Well, my thought is there are other cases. For instance: regression=# create role binary; ERROR: syntax error at or near "binary" LINE 1: create role binary; ^ regression=# create user cross; ERROR: syntax error at or near "cross" LINE 1: create user cross; ^ If we don't have to treat type_func_name_keywords as reserved in these situations, shouldn't we avoid doing so? regards, tom lane
On Mon, May 27, 2013 at 10:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 26 May 2013 17:10, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> More readable would be to invent an intermediate nonterminal falling >>> between ColId and ColLabel, whose expansion would be "IDENT | >>> unreserved_keyword | col_name_keyword | type_func_name_keyword", and >>> then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst. >>> Any thoughts about a name for that new nonterminal? > >> Do you think complicating the parser in that way is worth the trouble >> for this case? Could that slow down parsing? > > It makes the grammar tables a bit larger (1% or so IIRC). There would > be some distributed penalty for that, but probably not much. Of course > there's always the slippery-slope argument about that. > >> We don't actually have to fix it; clearly not too many people are >> bothered, since no complaints in 3 years. Documenting 'binary' seems >> better. > > Well, my thought is there are other cases. For instance: > > regression=# create role binary; > ERROR: syntax error at or near "binary" > LINE 1: create role binary; > ^ > regression=# create user cross; > ERROR: syntax error at or near "cross" > LINE 1: create user cross; > ^ > > If we don't have to treat type_func_name_keywords as reserved in these > situations, shouldn't we avoid doing so? I am almost always in favor of making more things less reserved, so +1 from me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 27 May 2013 15:31, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 26 May 2013 17:10, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> More readable would be to invent an intermediate nonterminal falling >>> between ColId and ColLabel, whose expansion would be "IDENT | >>> unreserved_keyword | col_name_keyword | type_func_name_keyword", and >>> then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst. >>> Any thoughts about a name for that new nonterminal? > >> Do you think complicating the parser in that way is worth the trouble >> for this case? Could that slow down parsing? > > It makes the grammar tables a bit larger (1% or so IIRC). There would > be some distributed penalty for that, but probably not much. Of course > there's always the slippery-slope argument about that. > >> We don't actually have to fix it; clearly not too many people are >> bothered, since no complaints in 3 years. Documenting 'binary' seems >> better. > > Well, my thought is there are other cases. For instance: > > regression=# create role binary; > ERROR: syntax error at or near "binary" > LINE 1: create role binary; > ^ > regression=# create user cross; > ERROR: syntax error at or near "cross" > LINE 1: create user cross; > ^ > > If we don't have to treat type_func_name_keywords as reserved in these > situations, shouldn't we avoid doing so? Seems reasonable argument, so +1. Sorry for delayed reply. --Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services