Thread: COPY .... (FORMAT binary) syntax doesn't work

COPY .... (FORMAT binary) syntax doesn't work

From
Simon Riggs
Date:
This works fine...
COPY pgbench_accounts TO '/tmp/acc' BINARY;

This new format does not
COPY pgbench_accounts FROM '/tmp/acc' (FORMAT BINARY);
ERROR:  syntax error at or near "BINARY" at character 47

which looks like I've mistyped something. Until you realise that this
statement gives a completely different error message.

COPY pgbench_accounts FROM '/tmp/acc' (FORMAT anyname);
ERROR:  COPY format "anyname" not recognized

and we also note that there are no examples in the docs, nor
regression tests to cover this situation.

So I conclude that this hasn't ever worked since it was introduced in 9.0.

The cause is that there is an overlap between the old and the new COPY
syntax, relating to the word BINARY. It's the grammar generating the
error, not post parse analysis.

My attempts to fix that look pretty ugly, so I'm not even going to
post them. I can stop the error on binary by causing errors on csv and
text, obviously not a fix. Any grammar based fix looks like it would
restrict the list of formats, which breaks the orginal intention of
the syntax change.

I'm advised that

COPY pgbench_accounts FROM '/tmp/acc' (FORMAT 'BINARY');

works fine. Though that is not documented and I doubt anyone much uses that.

My conclusion is that we should *not* fix this bug, but just alter the
manual slightly to explain what the correct usage is (use quotes
around 'binary'). Reason for that suggestion is that nobody ever
reported this bug, so either few people use binary mode or they use
the old syntax. Of course, that is not a normal suggestion, so feel
free to walk up and down my spine with boots on for suggesting it.

Thoughts?

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [BUGS] COPY .... (FORMAT binary) syntax doesn't work

From
Heikki Linnakangas
Date:
On 26.05.2013 04:31, Simon Riggs wrote:
> This works fine...
> COPY pgbench_accounts TO '/tmp/acc' BINARY;
>
> This new format does not
> COPY pgbench_accounts FROM '/tmp/acc' (FORMAT BINARY);
> ERROR:  syntax error at or near "BINARY" at character 47
>
> which looks like I've mistyped something. Until you realise that this
> statement gives a completely different error message.
>
> COPY pgbench_accounts FROM '/tmp/acc' (FORMAT anyname);
> ERROR:  COPY format "anyname" not recognized
>
> and we also note that there are no examples in the docs, nor
> regression tests to cover this situation.
>
> So I conclude that this hasn't ever worked since it was introduced in 9.0.
>
> The cause is that there is an overlap between the old and the new COPY
> syntax, relating to the word BINARY. It's the grammar generating the
> error, not post parse analysis.

Hmm, the problem is that BINARY is a type_func_keyword, so it doesn't 
match the ColId rule used to capture the format argument.

> My attempts to fix that look pretty ugly, so I'm not even going to
> post them. I can stop the error on binary by causing errors on csv and
> text, obviously not a fix. Any grammar based fix looks like it would
> restrict the list of formats, which breaks the orginal intention of
> the syntax change.

This seems to work:

--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -2528,3 +2528,7 @@ copy_generic_opt_elem:                 {                     $$ = makeDefElem($1, $2);
    }
 
+            | ColLabel BINARY
+                {
+                    $$ = makeDefElem($1, (Node *) makeString("binary"));
+                }


Am I missing something?

- Heikki



Re: [BUGS] COPY .... (FORMAT binary) syntax doesn't work

From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 26.05.2013 04:31, Simon Riggs wrote:
>> This new format does not [work:]
>> COPY pgbench_accounts FROM '/tmp/acc' (FORMAT BINARY);
>> ERROR:  syntax error at or near "BINARY" at character 47

> This seems to work:

> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y
> @@ -2528,3 +2528,7 @@ copy_generic_opt_elem:
>                   {
>                       $$ = makeDefElem($1, $2);
>                   }
> +            | ColLabel BINARY
> +                {
> +                    $$ = makeDefElem($1, (Node *) makeString("binary"));
> +                }

That only fixes things for the specific case of FORMAT BINARY.  I think
it would be better to generalize ColId_or_Sconst.  This one-liner works,
but it's pretty ugly:

*** gram.y~    Sun May 26 11:58:42 2013
--- gram.y    Sun May 26 12:00:03 2013
*************** opt_encoding:
*** 1548,1553 ****
--- 1548,1554 ----  ColId_or_Sconst:             ColId                                    { $$ = $1; }
+             | type_func_name_keyword                { $$ = pstrdup($1); }             | Sconst
       { $$ = $1; }         ;
 

More readable would be to invent an intermediate nonterminal falling
between ColId and ColLabel, whose expansion would be "IDENT |
unreserved_keyword | col_name_keyword | type_func_name_keyword", and
then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst.
Any thoughts about a name for that new nonterminal?

BTW, I tried just replacing ColId with ColLabel here, and that *almost*
works, but there are some places where we can see either ColId_or_Sconst
or DEFAULT.  I don't know of any sane way to express "all reserved
keywords except DEFAULT" to bison, so the best we can realistically do
is to accept all not-fully-reserved keywords in these places.
        regards, tom lane



Re: [BUGS] COPY .... (FORMAT binary) syntax doesn't work

From
Simon Riggs
Date:
On 26 May 2013 16:35, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

>> My attempts to fix that look pretty ugly, so I'm not even going to
>> post them. I can stop the error on binary by causing errors on csv and
>> text, obviously not a fix. Any grammar based fix looks like it would
>> restrict the list of formats, which breaks the orginal intention of
>> the syntax change.
>
>
> This seems to work:

This was almost exactly the fix I described above that only fixes that
specific case and then breaks others.

> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y
> @@ -2528,3 +2528,7 @@ copy_generic_opt_elem:
>                                 {
>                                         $$ = makeDefElem($1, $2);
>                                 }
> +                       | ColLabel BINARY
> +                               {
> +                                       $$ = makeDefElem($1, (Node *)
> makeString("binary"));
> +                               }

So, no that doesn't work.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [BUGS] COPY .... (FORMAT binary) syntax doesn't work

From
Simon Riggs
Date:
On 26 May 2013 17:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:

> More readable would be to invent an intermediate nonterminal falling
> between ColId and ColLabel, whose expansion would be "IDENT |
> unreserved_keyword | col_name_keyword | type_func_name_keyword", and
> then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst.
> Any thoughts about a name for that new nonterminal?

Do you think complicating the parser in that way is worth the trouble
for this case? Could that slow down parsing?

We don't actually have to fix it; clearly not too many people are
bothered, since no complaints in 3 years. Documenting 'binary' seems
better.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [BUGS] COPY .... (FORMAT binary) syntax doesn't work

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 26 May 2013 17:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> More readable would be to invent an intermediate nonterminal falling
>> between ColId and ColLabel, whose expansion would be "IDENT |
>> unreserved_keyword | col_name_keyword | type_func_name_keyword", and
>> then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst.
>> Any thoughts about a name for that new nonterminal?

> Do you think complicating the parser in that way is worth the trouble
> for this case? Could that slow down parsing?

It makes the grammar tables a bit larger (1% or so IIRC).  There would
be some distributed penalty for that, but probably not much.  Of course
there's always the slippery-slope argument about that.

> We don't actually have to fix it; clearly not too many people are
> bothered, since no complaints in 3 years. Documenting 'binary' seems
> better.

Well, my thought is there are other cases.  For instance:

regression=# create role binary;
ERROR:  syntax error at or near "binary"
LINE 1: create role binary;                   ^
regression=# create user cross;
ERROR:  syntax error at or near "cross"
LINE 1: create user cross;                   ^

If we don't have to treat type_func_name_keywords as reserved in these
situations, shouldn't we avoid doing so?
        regards, tom lane



Re: [BUGS] COPY .... (FORMAT binary) syntax doesn't work

From
Robert Haas
Date:
On Mon, May 27, 2013 at 10:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 26 May 2013 17:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> More readable would be to invent an intermediate nonterminal falling
>>> between ColId and ColLabel, whose expansion would be "IDENT |
>>> unreserved_keyword | col_name_keyword | type_func_name_keyword", and
>>> then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst.
>>> Any thoughts about a name for that new nonterminal?
>
>> Do you think complicating the parser in that way is worth the trouble
>> for this case? Could that slow down parsing?
>
> It makes the grammar tables a bit larger (1% or so IIRC).  There would
> be some distributed penalty for that, but probably not much.  Of course
> there's always the slippery-slope argument about that.
>
>> We don't actually have to fix it; clearly not too many people are
>> bothered, since no complaints in 3 years. Documenting 'binary' seems
>> better.
>
> Well, my thought is there are other cases.  For instance:
>
> regression=# create role binary;
> ERROR:  syntax error at or near "binary"
> LINE 1: create role binary;
>                     ^
> regression=# create user cross;
> ERROR:  syntax error at or near "cross"
> LINE 1: create user cross;
>                     ^
>
> If we don't have to treat type_func_name_keywords as reserved in these
> situations, shouldn't we avoid doing so?

I am almost always in favor of making more things less reserved, so +1 from me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [BUGS] COPY .... (FORMAT binary) syntax doesn't work

From
Simon Riggs
Date:
On 27 May 2013 15:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 26 May 2013 17:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> More readable would be to invent an intermediate nonterminal falling
>>> between ColId and ColLabel, whose expansion would be "IDENT |
>>> unreserved_keyword | col_name_keyword | type_func_name_keyword", and
>>> then replace ColId_or_Sconst with whatever-we-call-that_or_Sconst.
>>> Any thoughts about a name for that new nonterminal?
>
>> Do you think complicating the parser in that way is worth the trouble
>> for this case? Could that slow down parsing?
>
> It makes the grammar tables a bit larger (1% or so IIRC).  There would
> be some distributed penalty for that, but probably not much.  Of course
> there's always the slippery-slope argument about that.
>
>> We don't actually have to fix it; clearly not too many people are
>> bothered, since no complaints in 3 years. Documenting 'binary' seems
>> better.
>
> Well, my thought is there are other cases.  For instance:
>
> regression=# create role binary;
> ERROR:  syntax error at or near "binary"
> LINE 1: create role binary;
>                     ^
> regression=# create user cross;
> ERROR:  syntax error at or near "cross"
> LINE 1: create user cross;
>                     ^
>
> If we don't have to treat type_func_name_keywords as reserved in these
> situations, shouldn't we avoid doing so?


Seems reasonable argument, so +1. Sorry for delayed reply.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services