Re: [HACKERS] patches for items from TODO list - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: [HACKERS] patches for items from TODO list
Date
Msg-id 200505281703.j4SH3qk14807@candle.pha.pa.us
List pgsql-patches
Here is an updated version of the COPY \x patch.  It is the first patch
attached.

Also, I realized that if we support \x in COPY, we should also support
\x in strings to the backend.  This is the second patch.
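
For readers who want the \x rule on its own, here is a minimal standalone
sketch of the decoding: at most two hex digits are consumed after the "x".
The names hex_digit_value() and decode_hex_escape() are made up for this
illustration and do not appear in the attached patches, and the -1 return
for "no hex digit" is a convention of this sketch only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit; the caller must already have checked isxdigit(). */
static int
hex_digit_value(char hex)
{
    if (isdigit((unsigned char) hex))
        return hex - '0';
    return tolower((unsigned char) hex) - 'a' + 10;
}

/*
 * Decode the text that follows "\x" (hypothetical helper).
 * Consumes one or two hex digits from s[0..len) and returns the byte value,
 * or -1 if the first character is not a hex digit.  If "used" is not NULL,
 * it receives the number of characters consumed (0, 1, or 2).
 */
static int
decode_hex_escape(const char *s, size_t len, size_t *used)
{
    size_t      n = 0;
    int         val = 0;

    assert(s != NULL);
    while (n < 2 && n < len && isxdigit((unsigned char) s[n]))
    {
        val = (val << 4) + hex_digit_value(s[n]);
        n++;
    }
    if (used != NULL)
        *used = n;
    return (n > 0) ? (val & 0xff) : -1;
}
```

With this sketch, "3F" decodes to 0x3F, "4" decodes to 0x04, and "41B"
decodes to 'A' with the trailing "B" left unconsumed.  When no hex digit
follows, the COPY patch below appears to leave the character as a literal
"x" rather than report an error.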

Third, I found that psql has some unusual handling of escaped
numbers.  Instead of treating \ddd as octal, it treats \ddd as decimal,
\0ddd as octal, and \0xddd as hexadecimal.  It is basically following
the strtol() rules for an escaped value.  This seems confusing and
contradicts how the rest of our system works.  I looked at 'bash' and
found that it supports \000 and \x00 just like C, so I am confused why
we have the current behavior.  This third patch makes psql consistent
with the rest of our system for backslashes.  It does break backward
compatibility.  It wouldn't be a big deal to fix, except that we
document the current behavior in the psql manual page, and that adds
confusion.

FYI, here is the current psql behavior:

    test=> \set x '\42'
    test=> \echo :x
    *
    test=> \set x '\042'
    test=> \echo :x
    "
    test=> \set x '\0x42'
    test=> \echo :x
    B

The new behavior is:

    test=> \set x '\42'
    test=> \echo :x
    "
    test=> \set x '\042'
    test=> \echo :x
    "
    test=> \set x '\x42'
    test=> \echo :x
    B
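
The difference between the two behaviors comes down to the base passed to
strtol().  The following is a small sketch, not code from the patches; the
three function names are hypothetical.  It assumes the caller hands over
only the digits that the lexer matched, since strtol() itself will read as
many digits as it finds.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Old psql rule: the text after the backslash goes to strtol() with base 0,
 * so the prefix selects the radix: "42" is decimal, "042" is octal, and
 * "0x42" is hexadecimal.
 */
static char
old_psql_escape(const char *after_backslash)
{
    assert(after_backslash != NULL);
    return (char) strtol(after_backslash, NULL, 0);
}

/* New rule for \ddd: the digits are always read as octal. */
static char
new_octal_escape(const char *digits)
{
    assert(digits != NULL);
    return (char) strtol(digits, NULL, 8);
}

/* New rule for \xhh: the text after the "x" is always read as hexadecimal. */
static char
new_hex_escape(const char *hexdigits)
{
    assert(hexdigits != NULL);
    return (char) strtol(hexdigits, NULL, 16);
}
```

Under the old rule "42" yields character 42 ('*'), while under the new rule
the same digits are octal and yield character 34 ('"'), which matches the
two transcripts above.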

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.65
diff -c -c -r1.65 copy.sgml
*** doc/src/sgml/ref/copy.sgml    7 May 2005 02:22:45 -0000    1.65
--- doc/src/sgml/ref/copy.sgml    28 May 2005 14:02:59 -0000
***************
*** 424,436 ****
         <entry>Backslash followed by one to three octal digits specifies
         the character with that numeric code</entry>
        </row>
       </tbody>
      </tgroup>
     </informaltable>

!     Presently, <command>COPY TO</command> will never emit an octal-digits
!     backslash sequence, but it does use the other sequences listed above
!     for those control characters.
     </para>

     <para>
--- 424,441 ----
         <entry>Backslash followed by one to three octal digits specifies
         the character with that numeric code</entry>
        </row>
+       <row>
+        <entry><literal>\x</><replaceable>digits</></entry>
+        <entry>Backslash <literal>x</> followed by one or two hex digits specifies
+        the character with that numeric code</entry>
+       </row>
       </tbody>
      </tgroup>
     </informaltable>

!     Presently, <command>COPY TO</command> will never emit an octal or
!     hex-digits backslash sequence, but it does use the other sequences
!     listed above for those control characters.
     </para>

     <para>
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.244
diff -c -c -r1.244 copy.c
*** src/backend/commands/copy.c    7 May 2005 02:22:46 -0000    1.244
--- src/backend/commands/copy.c    28 May 2005 14:03:07 -0000
***************
*** 2274,2279 ****
--- 2274,2291 ----
      return result;
  }

+ /*
+  *    Return decimal value for a hexadecimal digit
+  */
+ static
+ int GetDecimalFromHex(char hex)
+ {
+     if (isdigit((unsigned char) hex))
+         return hex - '0';
+     else
+         return tolower((unsigned char) hex) - 'a' + 10;
+ }
+
  /*----------
   * Read the value of a single attribute, performing de-escaping as needed.
   *
***************
*** 2335,2340 ****
--- 2347,2353 ----
                  case '5':
                  case '6':
                  case '7':
+                     /* handle \013 */
                      {
                          int            val;

***************
*** 2360,2365 ****
--- 2373,2402 ----
                          c = val & 0377;
                      }
                      break;
+                 case 'x':
+                     /* Handle \x3F */
+                     if (line_buf.cursor < line_buf.len)
+                     {
+                         char hexchar = line_buf.data[line_buf.cursor];
+
+                         if (isxdigit(hexchar))
+                         {
+                             int val = GetDecimalFromHex(hexchar);
+
+                             line_buf.cursor++;
+                             if (line_buf.cursor < line_buf.len)
+                             {
+                                 hexchar = line_buf.data[line_buf.cursor];
+                                 if (isxdigit(hexchar))
+                                 {
+                                     line_buf.cursor++;
+                                     val = (val << 4) + GetDecimalFromHex(hexchar);
+                                 }
+                             }
+                             c = val & 0xff;
+                         }
+                     }
+                     break;
                  case 'b':
                      c = '\b';
                      break;

Index: doc/src/sgml/datatype.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.157
diff -c -c -r1.157 datatype.sgml
*** doc/src/sgml/datatype.sgml    1 May 2005 15:54:46 -0000    1.157
--- doc/src/sgml/datatype.sgml    28 May 2005 14:02:40 -0000
***************
*** 1118,1124 ****
     <para>
      When entering <type>bytea</type> values, octets of certain values
      <emphasis>must</emphasis> be escaped (but all octet values
!     <emphasis>may</emphasis> be escaped) when used as part of a string
      literal in an <acronym>SQL</acronym> statement. In general, to
      escape an octet, it is converted into the three-digit octal number
      equivalent of its decimal octet value, and preceded by two
--- 1118,1124 ----
     <para>
      When entering <type>bytea</type> values, octets of certain values
      <emphasis>must</emphasis> be escaped (but all octet values
!     <emphasis>can</emphasis> be escaped) when used as part of a string
      literal in an <acronym>SQL</acronym> statement. In general, to
      escape an octet, it is converted into the three-digit octal number
      equivalent of its decimal octet value, and preceded by two
Index: doc/src/sgml/libpq.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/libpq.sgml,v
retrieving revision 1.180
diff -c -c -r1.180 libpq.sgml
*** doc/src/sgml/libpq.sgml    26 Feb 2005 18:39:04 -0000    1.180
--- doc/src/sgml/libpq.sgml    28 May 2005 14:02:46 -0000
***************
*** 2229,2235 ****

  <para>
     Certain byte values <emphasis>must</emphasis> be escaped (but all
!    byte values <emphasis>may</emphasis> be escaped) when used as part
     of a <type>bytea</type> literal in an <acronym>SQL</acronym>
     statement. In general, to escape a byte, it is converted into the
     three digit octal number equal to the octet value, and preceded by
--- 2229,2235 ----

  <para>
     Certain byte values <emphasis>must</emphasis> be escaped (but all
!    byte values <emphasis>can</emphasis> be escaped) when used as part
     of a <type>bytea</type> literal in an <acronym>SQL</acronym>
     statement. In general, to escape a byte, it is converted into the
     three digit octal number equal to the octet value, and preceded by
Index: doc/src/sgml/syntax.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v
retrieving revision 1.99
diff -c -c -r1.99 syntax.sgml
*** doc/src/sgml/syntax.sgml    23 Dec 2004 05:37:40 -0000    1.99
--- doc/src/sgml/syntax.sgml    28 May 2005 14:02:58 -0000
***************
*** 254,270 ****

      <para>
       Another <productname>PostgreSQL</productname> extension is that
!      C-style backslash escapes are available:
!      <literal>\b</literal> is a backspace, <literal>\f</literal> is a
!      form feed, <literal>\n</literal> is a newline,
!      <literal>\r</literal> is a carriage return, <literal>\t</literal>
!      is a tab, and <literal>\<replaceable>xxx</replaceable></literal>,
!      where <replaceable>xxx</replaceable> is an octal number, is a
!      byte with the corresponding code.  (It is your responsibility
!      that the byte sequences you create are valid characters in the
!      server character set encoding.)  Any other character following a
!      backslash is taken literally.  Thus, to include a backslash in a
!      string constant, write two backslashes.
      </para>

      <para>
--- 254,271 ----

      <para>
       Another <productname>PostgreSQL</productname> extension is that
!      C-style backslash escapes are available: <literal>\b</literal> is a
!      backspace, <literal>\f</literal> is a form feed,
!      <literal>\n</literal> is a newline, <literal>\r</literal> is a
!      carriage return, <literal>\t</literal> is a tab. Also supported is
!      <literal>\<replaceable>digits</replaceable></literal>, where
!      <replaceable>digits</replaceable> represents an octal byte value, and
!      <literal>\x<replaceable>hexdigits</replaceable></literal>, where
!      <replaceable>hexdigits</replaceable> represents a hexadecimal byte value.
!      (It is your responsibility that the byte sequences you create are
!      valid characters in the server character set encoding.) Any other
!      character following a backslash is taken literally. Thus, to
!      include a backslash in a string constant, write two backslashes.
      </para>

      <para>
Index: src/backend/parser/scan.l
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/scan.l,v
retrieving revision 1.122
diff -c -c -r1.122 scan.l
*** src/backend/parser/scan.l    26 May 2005 01:24:29 -0000    1.122
--- src/backend/parser/scan.l    28 May 2005 14:03:10 -0000
***************
*** 193,200 ****
  xqstart            {quote}
  xqdouble        {quote}{quote}
  xqinside        [^\\']+
! xqescape        [\\][^0-7]
  xqoctesc        [\\][0-7]{1,3}

  /* $foo$ style quotes ("dollar quoting")
   * The quoted string starts with $foo$ where "foo" is an optional string
--- 193,201 ----
  xqstart            {quote}
  xqdouble        {quote}{quote}
  xqinside        [^\\']+
! xqescape        [\\][^0-7x]
  xqoctesc        [\\][0-7]{1,3}
+ xqhexesc        [\\]x[0-9A-Fa-f]{1,2}

  /* $foo$ style quotes ("dollar quoting")
   * The quoted string starts with $foo$ where "foo" is an optional string
***************
*** 435,440 ****
--- 436,445 ----
                      unsigned char c = strtoul(yytext+1, NULL, 8);
                      addlitchar(c);
                  }
+ <xq>{xqhexesc}  {
+                     unsigned char c = strtoul(yytext+2, NULL, 16);
+                     addlitchar(c);
+                 }
  <xq>{quotecontinue} {
                      /* ignore */
                  }
Index: doc/src/sgml/ref/psql-ref.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/psql-ref.sgml,v
retrieving revision 1.135
diff -c -c -r1.135 psql-ref.sgml
*** doc/src/sgml/ref/psql-ref.sgml    28 Apr 2005 13:09:59 -0000    1.135
--- doc/src/sgml/ref/psql-ref.sgml    28 May 2005 16:23:01 -0000
***************
*** 590,599 ****
      precede it by a backslash. Anything contained in single quotes is
      furthermore subject to C-like substitutions for
      <literal>\n</literal> (new line), <literal>\t</literal> (tab),
!     <literal>\</literal><replaceable>digits</replaceable>,
!     <literal>\0</literal><replaceable>digits</replaceable>, and
!     <literal>\0x</literal><replaceable>digits</replaceable> (the
!     character with the given decimal, octal, or hexadecimal code).
      </para>

      <para>
--- 590,598 ----
      precede it by a backslash. Anything contained in single quotes is
      furthermore subject to C-like substitutions for
      <literal>\n</literal> (new line), <literal>\t</literal> (tab),
!     <literal>\</literal><replaceable>digits</replaceable>, and
!     <literal>\x</literal><replaceable>digits</replaceable> (the
!     character with the given octal or hexadecimal code).
      </para>

      <para>
Index: src/bin/psql/psqlscan.l
===================================================================
RCS file: /cvsroot/pgsql/src/bin/psql/psqlscan.l,v
retrieving revision 1.10
diff -c -c -r1.10 psqlscan.l
*** src/bin/psql/psqlscan.l    26 May 2005 01:24:29 -0000    1.10
--- src/bin/psql/psqlscan.l    28 May 2005 16:23:10 -0000
***************
*** 849,877 ****
  "\\r"            { appendPQExpBufferChar(output_buf, '\r'); }
  "\\f"            { appendPQExpBufferChar(output_buf, '\f'); }

! "\\"[1-9][0-9]*    {
!                     /* decimal case */
!                     appendPQExpBufferChar(output_buf,
!                                           (char) strtol(yytext + 1, NULL, 0));
!                 }
!
! "\\"0[0-7]*        {
                      /* octal case */
                      appendPQExpBufferChar(output_buf,
!                                           (char) strtol(yytext + 1, NULL, 0));
                  }

! "\\"0[xX][0-9A-Fa-f]+    {
                      /* hex case */
                      appendPQExpBufferChar(output_buf,
!                                           (char) strtol(yytext + 1, NULL, 0));
!                 }
!
! "\\"0[xX]    {
!                     /* failed hex case */
!                     yyless(2);
!                     appendPQExpBufferChar(output_buf,
!                                           (char) strtol(yytext + 1, NULL, 0));
                  }

  "\\".            { emit(yytext + 1, 1); }
--- 849,864 ----
  "\\r"            { appendPQExpBufferChar(output_buf, '\r'); }
  "\\f"            { appendPQExpBufferChar(output_buf, '\f'); }

! "\\"[0-7]{1,3}    {
                      /* octal case */
                      appendPQExpBufferChar(output_buf,
!                                           (char) strtol(yytext + 1, NULL, 8));
                  }

! "\\"x[0-9A-Fa-f]{1,2}    {
                      /* hex case */
                      appendPQExpBufferChar(output_buf,
!                                           (char) strtol(yytext + 2, NULL, 16));
                  }

  "\\".            { emit(yytext + 1, 1); }
