Adding \x escape processing to COPY, psql, backend - Mailing list pgsql-patches

From Bruce Momjian
Subject Adding \x escape processing to COPY, psql, backend
Date
Msg-id 200505301933.j4UJXcK17113@candle.pha.pa.us
In response to Re: [HACKERS] patches for items from TODO list  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: Adding \x escape processing to COPY, psql, backend
List pgsql-patches
Bruce Momjian wrote:
> Here is an updated version of the COPY \x patch.  It is the first patch
> attached.
>
> Also, I realized that if we support \x in COPY, we should also support
> \x in strings to the backend.  This is the second patch.

Here is a new version of the three \x hex support patches.  I have added
\x for psql variables, which is the last patch.

I have IM'ed with Peter and he is now OK with the idea of supporting \x,
with the understanding that it doesn't take us any farther away from
compatibility than we are now.

I have already fixed the psql octal/decimal/hex behavior in another
patch.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.65
diff -c -c -r1.65 copy.sgml
*** doc/src/sgml/ref/copy.sgml    7 May 2005 02:22:45 -0000    1.65
--- doc/src/sgml/ref/copy.sgml    28 May 2005 14:02:59 -0000
***************
*** 424,436 ****
         <entry>Backslash followed by one to three octal digits specifies
         the character with that numeric code</entry>
        </row>
       </tbody>
      </tgroup>
     </informaltable>

!     Presently, <command>COPY TO</command> will never emit an octal-digits
!     backslash sequence, but it does use the other sequences listed above
!     for those control characters.
     </para>

     <para>
--- 424,441 ----
         <entry>Backslash followed by one to three octal digits specifies
         the character with that numeric code</entry>
        </row>
+       <row>
+        <entry><literal>\x</><replaceable>digits</></entry>
+        <entry>Backslash <literal>x</> followed by one or two hex digits specifies
+        the character with that numeric code</entry>
+       </row>
       </tbody>
      </tgroup>
     </informaltable>

!     Presently, <command>COPY TO</command> will never emit an octal or
!     hex-digits backslash sequence, but it does use the other sequences
!     listed above for those control characters.
     </para>

     <para>
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.244
diff -c -c -r1.244 copy.c
*** src/backend/commands/copy.c    7 May 2005 02:22:46 -0000    1.244
--- src/backend/commands/copy.c    28 May 2005 14:03:07 -0000
***************
*** 2274,2279 ****
--- 2274,2291 ----
      return result;
  }

+ /*
+  *    Return decimal value for a hexadecimal digit
+  */
+ static
+ int GetDecimalFromHex(char hex)
+ {
+     if (isdigit((unsigned char) hex))
+         return hex - '0';
+     else
+         return tolower((unsigned char) hex) - 'a' + 10;
+ }
+
  /*----------
   * Read the value of a single attribute, performing de-escaping as needed.
   *
***************
*** 2335,2340 ****
--- 2347,2353 ----
                  case '5':
                  case '6':
                  case '7':
+                     /* handle \013 */
                      {
                          int            val;

***************
*** 2360,2365 ****
--- 2373,2402 ----
                          c = val & 0377;
                      }
                      break;
+                 case 'x':
+                     /* Handle \x3F */
+                     if (line_buf.cursor < line_buf.len)
+                     {
+                         char hexchar = line_buf.data[line_buf.cursor];
+
+                         if (isxdigit(hexchar))
+                         {
+                             int val = GetDecimalFromHex(hexchar);
+
+                             line_buf.cursor++;
+                             if (line_buf.cursor < line_buf.len)
+                             {
+                                 hexchar = line_buf.data[line_buf.cursor];
+                                 if (isxdigit(hexchar))
+                                 {
+                                     line_buf.cursor++;
+                                     val = (val << 4) + GetDecimalFromHex(hexchar);
+                                 }
+                             }
+                             c = val & 0xff;
+                         }
+                     }
+                     break;
                  case 'b':
                      c = '\b';
                      break;
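
For readers who want to try the COPY behavior outside the backend, the
hunk above can be sketched as a standalone C function.  This is only an
illustration, not part of the patch: hex_digit_value() and
decode_hex_escape() are hypothetical names, and the plain buffer, length
and cursor arguments stand in for line_buf.  As I read the hunk, a \x
that is not followed by a hex digit leaves the character as a literal
'x', and the sketch assumes that.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value (0-15) of one hex digit; the caller must pass a valid hex digit. */
static int
hex_digit_value(char hex)
{
    unsigned char uc = (unsigned char) hex;

    assert(isxdigit(uc));
    if (isdigit(uc))
        return uc - '0';
    return tolower(uc) - 'a' + 10;
}

/*
 * Decode the digits that follow "\x" in buf, starting at *cursor.
 * Consumes one or two hex digits, advances *cursor past them, and
 * returns the byte value.  If no hex digit follows, nothing is
 * consumed and 'x' is returned (assumed to match the patch).
 */
static int
decode_hex_escape(const char *buf, size_t len, size_t *cursor)
{
    int val;

    if (*cursor >= len || !isxdigit((unsigned char) buf[*cursor]))
        return 'x';

    val = hex_digit_value(buf[(*cursor)++]);
    if (*cursor < len && isxdigit((unsigned char) buf[*cursor]))
        val = (val << 4) + hex_digit_value(buf[(*cursor)++]);

    return val & 0xff;
}
```

At most two digits are consumed, so input such as \x414 decodes to the
byte 0x41 followed by an ordinary '4'.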

Index: doc/src/sgml/syntax.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v
retrieving revision 1.99
diff -c -c -r1.99 syntax.sgml
*** doc/src/sgml/syntax.sgml    23 Dec 2004 05:37:40 -0000    1.99
--- doc/src/sgml/syntax.sgml    28 May 2005 14:02:58 -0000
***************
*** 254,270 ****

      <para>
       Another <productname>PostgreSQL</productname> extension is that
!      C-style backslash escapes are available:
!      <literal>\b</literal> is a backspace, <literal>\f</literal> is a
!      form feed, <literal>\n</literal> is a newline,
!      <literal>\r</literal> is a carriage return, <literal>\t</literal>
!      is a tab, and <literal>\<replaceable>xxx</replaceable></literal>,
!      where <replaceable>xxx</replaceable> is an octal number, is a
!      byte with the corresponding code.  (It is your responsibility
!      that the byte sequences you create are valid characters in the
!      server character set encoding.)  Any other character following a
!      backslash is taken literally.  Thus, to include a backslash in a
!      string constant, write two backslashes.
      </para>

      <para>
--- 254,271 ----

      <para>
       Another <productname>PostgreSQL</productname> extension is that
!      C-style backslash escapes are available: <literal>\b</literal> is a
!      backspace, <literal>\f</literal> is a form feed,
!      <literal>\n</literal> is a newline, <literal>\r</literal> is a
!      carriage return, <literal>\t</literal> is a tab. Also supported is
!      <literal>\<replaceable>digits</replaceable></literal>, where
!      <replaceable>digits</replaceable> represents an octal byte value, and
!      <literal>\x<replaceable>hexdigits</replaceable></literal>, where
!      <replaceable>hexdigits</replaceable> represents a hexadecimal byte value.
!      (It is your responsibility that the byte sequences you create are
!      valid characters in the server character set encoding.) Any other
!      character following a backslash is taken literally. Thus, to
!      include a backslash in a string constant, write two backslashes.
      </para>

      <para>
Index: src/backend/parser/scan.l
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/scan.l,v
retrieving revision 1.122
diff -c -c -r1.122 scan.l
*** src/backend/parser/scan.l    26 May 2005 01:24:29 -0000    1.122
--- src/backend/parser/scan.l    28 May 2005 14:03:10 -0000
***************
*** 193,200 ****
  xqstart            {quote}
  xqdouble        {quote}{quote}
  xqinside        [^\\']+
! xqescape        [\\][^0-7]
  xqoctesc        [\\][0-7]{1,3}

  /* $foo$ style quotes ("dollar quoting")
   * The quoted string starts with $foo$ where "foo" is an optional string
--- 193,201 ----
  xqstart            {quote}
  xqdouble        {quote}{quote}
  xqinside        [^\\']+
! xqescape        [\\][^0-7x]
  xqoctesc        [\\][0-7]{1,3}
+ xqhexesc        [\\]x[0-9A-Fa-f]{1,2}

  /* $foo$ style quotes ("dollar quoting")
   * The quoted string starts with $foo$ where "foo" is an optional string
***************
*** 435,440 ****
--- 436,445 ----
                      unsigned char c = strtoul(yytext+1, NULL, 8);
                      addlitchar(c);
                  }
+ <xq>{xqhexesc}  {
+                     unsigned char c = strtoul(yytext+2, NULL, 16);
+                     addlitchar(c);
+                 }
  <xq>{quotecontinue} {
                      /* ignore */
                  }
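
The new scanner action can also be exercised on its own.  The sketch
below is an assumption-labelled illustration rather than code from the
patch: hex_escape_token_to_byte() is a hypothetical helper, and its
argument plays the role of yytext for a token already matched by the
xqhexesc pattern (a backslash, an 'x', then one or two hex digits).

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Convert a matched hex escape token such as "\x41" to its byte value.
 * The token is assumed to have the shape the xqhexesc pattern accepts.
 */
static unsigned char
hex_escape_token_to_byte(const char *token)
{
    /* Sanity check on the assumed token shape. */
    assert(token[0] == '\\' && token[1] == 'x');

    /* Skip the backslash and the 'x', then parse the digits as base 16. */
    return (unsigned char) strtoul(token + 2, NULL, 16);
}
```

Because the pattern allows at most two hex digits, the parsed value
never exceeds 0xFF, so the cast to unsigned char does not truncate.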
Index: doc/src/sgml/ref/psql-ref.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/psql-ref.sgml,v
retrieving revision 1.137
diff -c -c -r1.137 psql-ref.sgml
*** doc/src/sgml/ref/psql-ref.sgml    30 May 2005 15:24:23 -0000    1.137
--- doc/src/sgml/ref/psql-ref.sgml    30 May 2005 19:07:20 -0000
***************
*** 589,596 ****
      single quote. To include a single quote into such an argument,
      precede it by a backslash. Anything contained in single quotes is
      furthermore subject to C-like substitutions for
!     <literal>\n</literal> (new line), <literal>\t</literal> (tab), and
!     <literal>\</literal><replaceable>digits</replaceable> (octal).
      </para>

      <para>
--- 589,597 ----
      single quote. To include a single quote into such an argument,
      precede it by a backslash. Anything contained in single quotes is
      furthermore subject to C-like substitutions for
!     <literal>\n</literal> (new line), <literal>\t</literal> (tab),
!     <literal>\</literal><replaceable>digits</replaceable> (octal),
!     and <literal>\x</literal><replaceable>digits</replaceable> (hexadecimal).
      </para>

      <para>
Index: src/bin/psql/psqlscan.l
===================================================================
RCS file: /cvsroot/pgsql/src/bin/psql/psqlscan.l,v
retrieving revision 1.12
diff -c -c -r1.12 psqlscan.l
*** src/bin/psql/psqlscan.l    30 May 2005 16:48:47 -0000    1.12
--- src/bin/psql/psqlscan.l    30 May 2005 19:07:25 -0000
***************
*** 250,257 ****
  xqstart            {quote}
  xqdouble        {quote}{quote}
  xqinside        [^\\']+
! xqescape        [\\][^0-7]
  xqoctesc        [\\][0-7]{1,3}

  /* $foo$ style quotes ("dollar quoting")
   * The quoted string starts with $foo$ where "foo" is an optional string
--- 250,258 ----
  xqstart            {quote}
  xqdouble        {quote}{quote}
  xqinside        [^\\']+
! xqescape        [\\][^0-7x]
  xqoctesc        [\\][0-7]{1,3}
+ xqhexesc        [\\]x[0-9A-Fa-f]{1,2}

  /* $foo$ style quotes ("dollar quoting")
   * The quoted string starts with $foo$ where "foo" is an optional string
***************
*** 467,472 ****
--- 468,476 ----
  <xq>{xqoctesc}  {
                      ECHO;
                  }
+ <xq>{xqhexesc}  {
+                     ECHO;
+                 }
  <xq>{quotecontinue} {
                      ECHO;
                  }
***************
*** 855,860 ****
--- 859,870 ----
                                            (char) strtol(yytext + 1, NULL, 8));
                  }

+ {xqhexesc}        {
+                     /* hex case */
+                     appendPQExpBufferChar(output_buf,
+                                           (char) strtol(yytext + 2, NULL, 16));
+                 }
+
  "\\".            { emit(yytext + 1, 1); }

  {other}|\n        { ECHO; }
