Thread: Patch to bring \copy syntax more in line with SQL copy

Patch to bring \copy syntax more in line with SQL copy

From
Bill Moran
Date:
This patch brings the \copy syntax of psql more in line with the SQL copy
syntax as described in the reference manual, while maintaining backward
compatibility with what was there prior to the patch.

The change to the syntax is to the DELIMITER[S] portion.

The man page for psql refers the reader to the SQL copy command docs for
details of syntax.  The SQL docs state that the syntax for COPY is
... [ DELIMITER [ AS ] 'delimiter' ] ...
(7.4 docs), while 7.2 docs
use the syntax:
... [ [USING] DELIMITERS 'delimiter' ]
Reading through the psql source code, it's obvious that at one time, the
syntax was
... [ WITH DELIMITER 'delimiter' ] ...

The patch results in the following syntax for the DELIMITER clause of
\copy:
[ [WITH|USING] DELIMITER[S] 'delimiter' ]

Ideally, this should cover all cases of old and new syntax, except where
"AS" is present.  The only drawback I can see is that \copy is now more
liberal in what it accepts, and may accept incomplete statements
without issuing an error (i.e. a WITH or USING clause at the end of a
\copy statement will simply be ignored, and no error generated)

This patch is against version 1.36 of src/bin/psql/copy.c
The resulting psql binary (based off a cvsup from HEAD on 1/18) compiles
and works for me on FreeBSD 5.1.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
*** copy.c.old    Sat Jan 17 22:17:58 2004
--- copy.c    Sat Jan 17 23:06:30 2004
***************
*** 226,283 ****
      token = strtokx(NULL, whitespace, NULL, NULL,
                      0, false, pset.encoding);

!     /*
!      * Allows old COPY syntax for backward compatibility 2002-06-19
!      */
!     if (token && strcasecmp(token, "using") == 0)
      {
          token = strtokx(NULL, whitespace, NULL, NULL,
                          0, false, pset.encoding);
!         if (!(token && strcasecmp(token, "delimiters") == 0))
!             goto error;
          token = strtokx(NULL, whitespace, NULL, "'",
                          '\\', false, pset.encoding);
          if (!token)
              goto error;
          result->delim = xstrdup(token);
-         token = strtokx(NULL, whitespace, NULL, NULL,
-                         0, false, pset.encoding);
-     }
-
-     if (token)
-     {
-         if (strcasecmp(token, "with") != 0)
-             goto error;
-         while ((token = strtokx(NULL, whitespace, NULL, NULL,
-                                 0, false, pset.encoding)) != NULL)
-         {
-             if (strcasecmp(token, "delimiter") == 0)
-             {
-                 token = strtokx(NULL, whitespace, NULL, "'",
-                                 '\\', false, pset.encoding);
-                 if (token && strcasecmp(token, "as") == 0)
-                     token = strtokx(NULL, whitespace, NULL, "'",
-                                     '\\', false, pset.encoding);
-                 if (token)
-                     result->delim = xstrdup(token);
-                 else
-                     goto error;
-             }
-             else if (strcasecmp(token, "null") == 0)
-             {
-                 token = strtokx(NULL, whitespace, NULL, "'",
-                                 '\\', false, pset.encoding);
-                 if (token && strcasecmp(token, "as") == 0)
-                     token = strtokx(NULL, whitespace, NULL, "'",
-                                     '\\', false, pset.encoding);
-                 if (token)
-                     result->null = xstrdup(token);
-                 else
-                     goto error;
-             }
-             else
-                 goto error;
-         }
      }

      free(line);
--- 226,252 ----
      token = strtokx(NULL, whitespace, NULL, NULL,
                      0, false, pset.encoding);

!     // Discard both "with" and "using" as syntactically neutral
!     if (token &&
!         (strcasecmp(token, "using") == 0 ||
!          strcasecmp(token, "with") == 0))
      {
          token = strtokx(NULL, whitespace, NULL, NULL,
                          0, false, pset.encoding);
!     }
!
!     /*
!      * Allow any combination of "with"|"using" and "delimiter"|"delimiters"
!      */
!     if (token &&
!         (strcasecmp(token, "delimiters") == 0 ||
!          strcasecmp(token, "delimiter") == 0))
!     {
          token = strtokx(NULL, whitespace, NULL, "'",
                          '\\', false, pset.encoding);
          if (!token)
              goto error;
          result->delim = xstrdup(token);
      }

      free(line);

Re: Patch to bring \copy syntax more in line with SQL

From
Neil Conway
Date:
A few quick comments:

- don't use C++-style comments

- please update the documentation

- update the comment at the top of copy.c

Bill Moran <wmoran@potentialtech.com> writes:
> Ideally, this should cover all cases of old and new syntax, except
> where "AS" is present.

ISTM it would be easy to allow for an optional 'AS' token following
the 'DELIMITER[S]' token.

> The only drawback I can see is that \copy is now more liberal in
> what it accepts, and may accept incomplete statements without
> issuing an error (i.e. a WITH or USING clause at the end of a \copy
> statement will simply be ignored, and no error generated)

Why can't you check for this?

-Neil


Re: Patch to bring \copy syntax more in line with SQL copy

From
Bill Moran
Date:
Neil Conway wrote:
> A few quick comments:
>
> - don't use C++-style comments

Oops ... sorry ... been doing too much PHP.

BTW: Why is this frowned apon so?  Are there compilers that have
problems with it?  In my own code, I generally use // and /* */
interchangeably, and I've never had any problems (with C).  Yet,
this isn't the first time I've submitted a patch and had it
pointed out that I shouldn't use C++ comments in C.

> - please update the documentation

I'm not aware that the documentation needs updated.  I changed the
code to be up to date with the docs I found.  If there's docs that
need updated, please point me specifically to them and I'll be
happy to generate a patch to the best of my ability.

> - update the comment at the top of copy.c

Oops ... got it.

> Bill Moran <wmoran@potentialtech.com> writes:
>
>>Ideally, this should cover all cases of old and new syntax, except
>>where "AS" is present.
>
> ISTM it would be easy to allow for an optional 'AS' token following
> the 'DELIMITER[S]' token.

Well ... keep in mind that I'm not as familiar with the code as you ...
this is my first hack into PostgreSQL, and I'm not terribly familiar
with the code yet.

Now that I'm looking over the code for strtokx(), I'm seeing that
you're absolutely right, it won't take much to accept the optional
AS.

>>The only drawback I can see is that \copy is now more liberal in
>>what it accepts, and may accept incomplete statements without
>>issuing an error (i.e. a WITH or USING clause at the end of a \copy
>>statement will simply be ignored, and no error generated)
>
> Why can't you check for this?

I suppose I can.  It's just that the code falls together so simply
without the check, and I can't see anyway to keep it that simple while
still checking ... but I suppose it'd be better to have it fail
properly than to save a few lines of code.

Thanks for the quick feedback.  I hope you didn't see that post to
hackers and take it as me bitching and whining or anything, it's just
that I'm not 100% familiar with how things flow within the PostgreSQL
group, and I was curious.

A revised patch will be forthcoming, as soon as I find some time ...
a day or two probably, over the weekend as worst case scenerio.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: Patch to bring \copy syntax more in line with SQL

From
Neil Conway
Date:
Bill Moran <wmoran@potentialtech.com> writes:
> Neil Conway wrote:
>> - don't use C++-style comments
>
> Oops ... sorry ... been doing too much PHP.
>
> BTW: Why is this frowned apon so?  Are there compilers that have
> problems with it?

IMHO it's not so much that it's considered THAT evil, it's just an
easy thing for reviewers to spot :-) C++-style comments aren't C89
(although they are C99), so plenty of oldish compilers would be well
within their rights to reject code that uses that style of
commenting. Since the rest of the code is consistent in not using
these comments, there's really no benefit to using them that I can
see.

> I'm not aware that the documentation needs updated.  I changed the
> code to be up to date with the docs I found.  If there's docs that
> need updated, please point me specifically to them and I'll be happy
> to generate a patch to the best of my ability.

doc/src/sgml/ref/psql-ref.sgml, circa line 705, obviously needs
updating. That's the only place I could see, although I didn't spend
much time looking.

> Thanks for the quick feedback.  I hope you didn't see that post to
> hackers and take it as me bitching and whining or anything, it's just
> that I'm not 100% familiar with how things flow within the PostgreSQL
> group, and I was curious.

No, it just reminded me about the patch. As for asking questions about
procedure, that's fine with me (more often than not it's I who's
asking :P)

-Neil


Re: Patch to bring \copy syntax more in line with SQL copy

From
Peter Eisentraut
Date:
Am Mittwoch, 21. Januar 2004 02:16 schrieb Bill Moran:
> BTW: Why is this frowned apon so?  Are there compilers that have
> problems with it?

Yes.  Search the archives for "AIX". :-)


Re: Patch to bring \copy syntax more in line with SQL copy

From
Dennis Bjorklund
Date:
On Wed, 21 Jan 2004, Peter Eisentraut wrote:

> Am Mittwoch, 21. Januar 2004 02:16 schrieb Bill Moran:
> > BTW: Why is this frowned apon so?  Are there compilers that have
> > problems with it?
>
> Yes.  Search the archives for "AIX". :-)

I googled a little for the aix compiler and it looks like it supports both
c99 if one tells it to (-langlvl=stdc99) and a just c++ comments if
one wants (-qcpluscmt).

Then again, I'm sure there are lots of old compilers out there, that does
not support it. Personally I would be happy to just demand c99 but I don't
think I get to choose.

--
/Dennis Björklund


Re: Patch to bring \copy syntax more in line with SQL copy

From
Tom Lane
Date:
Bill Moran <wmoran@potentialtech.com> writes:
> A revised patch will be forthcoming, as soon as I find some time ...

BTW, it would be good to update to CVS HEAD before you revise the patch,
as I just yesterday applied a couple of other patches in the same
general area.  I don't think there were any code conflicts, but if you
do anything in psql-ref.sgml without having synced, you will likely have
a patch conflict there.

            regards, tom lane

[update] Re: Patch to bring \copy syntax more in line with SQL copy

From
Bill Moran
Date:
Please find a pair of updated patches attached.

The first is against src/bin/pgsql/copy.c  It changes the acceptable
syntax for \copy to allow the following:

[ [with|using] delimiter[s] [as] delimiter ]

Improvements over the previously posted patch:
1) c++ style comments removed.
2) Comments with explanation of the syntax have been updated.
3) Cases of dangling [with|using] will be caught and complained about.
4) The optional [as] is handled.

As far as I can tell, this should make the \copy syntax equivalent to the
SQL copy syntax (as described in the docs) while still maintaining
backward compatibility with older syntaxes.

Lucky for me, I'm working on a project that uses \copy to import a lot of
test data, so I made it a point to try every combination of the above
syntax that I could imagine, as well as testing the dangling [with|using]
to ensure it complained.  It works properly as well as I can tell.

The second is against doc/src/sgml/ref/psql-ref.sgml, and only changes the
explaination of the \copy syntax.  I can't seem to get the doc build process
to work on my working copy, so I can't be sure that this is OK, but it's
a trivial enough change that it's probably right.

Both patches are against the most recent versions in CVS at the time of
diffing.

Bill Moran wrote:
> Neil Conway wrote:
>
>> A few quick comments:
>>
>> - don't use C++-style comments
>
>
> Oops ... sorry ... been doing too much PHP.
>
> BTW: Why is this frowned apon so?  Are there compilers that have
> problems with it?  In my own code, I generally use // and /* */
> interchangeably, and I've never had any problems (with C).  Yet,
> this isn't the first time I've submitted a patch and had it
> pointed out that I shouldn't use C++ comments in C.
>
>> - please update the documentation
>
>
> I'm not aware that the documentation needs updated.  I changed the
> code to be up to date with the docs I found.  If there's docs that
> need updated, please point me specifically to them and I'll be
> happy to generate a patch to the best of my ability.
>
>> - update the comment at the top of copy.c
>
>
> Oops ... got it.
>
>> Bill Moran <wmoran@potentialtech.com> writes:
>>
>>> Ideally, this should cover all cases of old and new syntax, except
>>> where "AS" is present.
>>
>>
>> ISTM it would be easy to allow for an optional 'AS' token following
>> the 'DELIMITER[S]' token.
>
>
> Well ... keep in mind that I'm not as familiar with the code as you ...
> this is my first hack into PostgreSQL, and I'm not terribly familiar
> with the code yet.
>
> Now that I'm looking over the code for strtokx(), I'm seeing that
> you're absolutely right, it won't take much to accept the optional
> AS.
>
>>> The only drawback I can see is that \copy is now more liberal in
>>> what it accepts, and may accept incomplete statements without
>>> issuing an error (i.e. a WITH or USING clause at the end of a \copy
>>> statement will simply be ignored, and no error generated)
>>
>>
>> Why can't you check for this?
>
>
> I suppose I can.  It's just that the code falls together so simply
> without the check, and I can't see anyway to keep it that simple while
> still checking ... but I suppose it'd be better to have it fail
> properly than to save a few lines of code.
>
> Thanks for the quick feedback.  I hope you didn't see that post to
> hackers and take it as me bitching and whining or anything, it's just
> that I'm not 100% familiar with how things flow within the PostgreSQL
> group, and I was curious.
>
> A revised patch will be forthcoming, as soon as I find some time ...
> a day or two probably, over the weekend as worst case scenerio.
>


--
Bill Moran
Potential Technologies
http://www.potentialtech.com
*** copy.c.old    Sat Jan 17 22:17:58 2004
--- copy.c    Sat Jan 24 11:21:55 2004
***************
*** 35,41 ****
   * parse_slash_copy
   * -- parses \copy command line
   *
!  * Accepted syntax: \copy table [(columnlist)] [with oids] from|to filename [with ] [ oids ] [ delimiter char] [ null
asstring ] 
   * (binary is not here yet)
   *
   * Old syntax for backward compatibility: (2002-06-19):
--- 35,41 ----
   * parse_slash_copy
   * -- parses \copy command line
   *
!  * Accepted syntax: \copy table [(columnlist)] [with oids] from|to filename [ with ] [ oids ] [[ with | using ]
delimiter[s][as] char] [ null as string ] 
   * (binary is not here yet)
   *
   * Old syntax for backward compatibility: (2002-06-19):
***************
*** 101,106 ****
--- 101,107 ----
      char       *line;
      char       *token;
      const char *whitespace = " \t\n\r";
+     int        havewith = 0;

      if (args)
          line = xstrdup(args);
***************
*** 226,284 ****
      token = strtokx(NULL, whitespace, NULL, NULL,
                      0, false, pset.encoding);

!     /*
!      * Allows old COPY syntax for backward compatibility 2002-06-19
!      */
!     if (token && strcasecmp(token, "using") == 0)
      {
          token = strtokx(NULL, whitespace, NULL, NULL,
                          0, false, pset.encoding);
!         if (!(token && strcasecmp(token, "delimiters") == 0))
!             goto error;
          token = strtokx(NULL, whitespace, NULL, "'",
                          '\\', false, pset.encoding);
          if (!token)
              goto error;
          result->delim = xstrdup(token);
-         token = strtokx(NULL, whitespace, NULL, NULL,
-                         0, false, pset.encoding);
-     }
-
-     if (token)
-     {
-         if (strcasecmp(token, "with") != 0)
-             goto error;
-         while ((token = strtokx(NULL, whitespace, NULL, NULL,
-                                 0, false, pset.encoding)) != NULL)
-         {
-             if (strcasecmp(token, "delimiter") == 0)
-             {
-                 token = strtokx(NULL, whitespace, NULL, "'",
-                                 '\\', false, pset.encoding);
-                 if (token && strcasecmp(token, "as") == 0)
-                     token = strtokx(NULL, whitespace, NULL, "'",
-                                     '\\', false, pset.encoding);
-                 if (token)
-                     result->delim = xstrdup(token);
-                 else
-                     goto error;
-             }
-             else if (strcasecmp(token, "null") == 0)
-             {
-                 token = strtokx(NULL, whitespace, NULL, "'",
-                                 '\\', false, pset.encoding);
-                 if (token && strcasecmp(token, "as") == 0)
-                     token = strtokx(NULL, whitespace, NULL, "'",
-                                     '\\', false, pset.encoding);
-                 if (token)
-                     result->null = xstrdup(token);
-                 else
-                     goto error;
-             }
-             else
-                 goto error;
-         }
      }

      free(line);

--- 227,266 ----
      token = strtokx(NULL, whitespace, NULL, NULL,
                      0, false, pset.encoding);

!     /* Discard both "with" and "using" as syntactically neutral */
!     if (token &&
!         (strcasecmp(token, "using") == 0 ||
!          strcasecmp(token, "with") == 0))
      {
          token = strtokx(NULL, whitespace, NULL, NULL,
                          0, false, pset.encoding);
!         havewith = 1;
!     }
!
!     /*
!      * Allow both "delimiter" and "delimiters"
!      */
!     if (token &&
!         (strcasecmp(token, "delimiters") == 0 ||
!          strcasecmp(token, "delimiter") == 0))
!     {
          token = strtokx(NULL, whitespace, NULL, "'",
                          '\\', false, pset.encoding);
+         /* Discard "as" as syntactically neutral */
+         if (strcasecmp(token, "as") == 0)
+         {
+             token = strtokx(NULL, whitespace, NULL, "'",
+                             '\\', false, pset.encoding);
+         }
          if (!token)
              goto error;
          result->delim = xstrdup(token);
      }
+     else
+     {
+         if (havewith)
+             goto error;
+     }

      free(line);

*** psql-ref.sgml.old    Sun Jan 25 18:45:11 2004
--- psql-ref.sgml    Sun Jan 25 18:47:11 2004
***************
*** 708,715 ****
      { <replaceable class="parameter">filename</replaceable> | stdin | stdout | - }
          [ <literal>with</literal> ]
              [ <literal>oids</literal> ]
!             [ <literal>delimiter [as] </literal> '<replaceable class="parameter">character</replaceable>' ]
!             [ <literal>null [as] </literal> '<replaceable class="parameter">string</replaceable>' ]</literal>
          </term>

          <listitem>
--- 708,715 ----
      { <replaceable class="parameter">filename</replaceable> | stdin | stdout | - }
          [ <literal>with</literal> ]
              [ <literal>oids</literal> ]
!             [ [ <literal>with</literal> | <literal>using</literal> ]
!             <literal>delimiter[s] [as] </literal> '<replaceable class="parameter">character</replaceable>' ]
 [ <literal>null [as] </literal> '<replaceable class="parameter">string</replaceable>' ]</literal> 
          </term>

          <listitem>

Re: [update] Re: Patch to bring \copy syntax more in line with SQL copy

From
Tom Lane
Date:
Bill Moran <wmoran@potentialtech.com> writes:
> As far as I can tell, this should make the \copy syntax equivalent to the
> SQL copy syntax (as described in the docs) while still maintaining
> backward compatibility with older syntaxes.

On reviewing this, I see that the existing code is much closer to what
the backend actually accepts (see backend/parser/gram.y) than your
proposed change.  As far as I can see, the only bugs in the psql code
are that it doesn't treat USING and WITH as optional in the right
places, and it doesn't allow OIDS in the WITH loop.  I've applied the
attached patch to bring psql into exact agreement with what the backend
actually does.

> The second is against doc/src/sgml/ref/psql-ref.sgml, and only changes the
> explaination of the \copy syntax.

psql-ref presently documents only the "preferred" syntax and not any of
the backwards-compatibility options.  I'm inclined to leave it alone.
If we do want to be more complete we should probably do what the COPY
reference page does, which is to describe the old syntax separately.

            regards, tom lane


*** src/bin/psql/copy.c.orig    Mon Jan 26 17:35:32 2004
--- src/bin/psql/copy.c    Wed Jan 28 17:10:13 2004
***************
*** 36,46 ****
   * parse_slash_copy
   * -- parses \copy command line
   *
!  * Accepted syntax: \copy table [(columnlist)] [with oids] from|to filename [with ] [ oids ] [ delimiter char] [ null
asstring ] 
   * (binary is not here yet)
   *
!  * Old syntax for backward compatibility: (2002-06-19):
!  * \copy table [(columnlist)] [with oids] from|to filename [ using delimiters char] [ with null as string ]
   *
   * table name can be double-quoted and can have a schema part.
   * column names can be double-quoted.
--- 36,54 ----
   * parse_slash_copy
   * -- parses \copy command line
   *
!  * The documented preferred syntax is:
!  *    \copy tablename [(columnlist)] from|to filename
!  *        [ with ] [ oids ] [ delimiter [as] char ] [ null [as] string ]
   * (binary is not here yet)
   *
!  * The pre-7.3 syntax was:
!  *    \copy tablename [(columnlist)] [with oids] from|to filename
!  *        [ [using] delimiters char ] [ with null as string ]
!  *
!  * The actual accepted syntax is a rather unholy combination of these,
!  * plus some undocumented flexibility (for instance, the clauses after
!  * WITH can appear in any order).  The accepted syntax matches what
!  * the backend grammar actually accepts (see backend/parser/gram.y).
   *
   * table name can be double-quoted and can have a schema part.
   * column names can be double-quoted.
***************
*** 243,248 ****
--- 251,259 ----
                          0, false, pset.encoding);
          if (!(token && strcasecmp(token, "delimiters") == 0))
              goto error;
+     }
+     if (token && strcasecmp(token, "delimiters") == 0)
+     {
          token = strtokx(NULL, whitespace, NULL, "'",
                          '\\', false, pset.encoding);
          if (!token)
***************
*** 254,265 ****

      if (token)
      {
!         if (strcasecmp(token, "with") != 0)
!             goto error;
!         while ((token = strtokx(NULL, whitespace, NULL, NULL,
!                                 0, false, pset.encoding)) != NULL)
          {
!             if (strcasecmp(token, "delimiter") == 0)
              {
                  token = strtokx(NULL, whitespace, NULL, "'",
                                  '\\', false, pset.encoding);
--- 265,286 ----

      if (token)
      {
!         /*
!          * WITH is optional.  Also, the backend will allow WITH followed by
!          * nothing, so we do too.
!          */
!         if (strcasecmp(token, "with") == 0)
!             token = strtokx(NULL, whitespace, NULL, NULL,
!                             0, false, pset.encoding);
!
!         while (token)
          {
!             /* someday allow BINARY here */
!             if (strcasecmp(token, "oids") == 0)
!             {
!                 result->oids = true;
!             }
!             else if (strcasecmp(token, "delimiter") == 0)
              {
                  token = strtokx(NULL, whitespace, NULL, "'",
                                  '\\', false, pset.encoding);
***************
*** 285,290 ****
--- 306,314 ----
              }
              else
                  goto error;
+
+             token = strtokx(NULL, whitespace, NULL, NULL,
+                             0, false, pset.encoding);
          }
      }