Re: [HACKERS] patches for items from TODO list - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Re: [HACKERS] patches for items from TODO list |
Date | |
Msg-id | 200505281703.j4SH3qk14807@candle.pha.pa.us |
Responses |
Re: [HACKERS] patches for items from TODO list
Adding \x escape processing to COPY, psql, backend |
List | pgsql-patches |
Here is an updated version of the COPY \x patch. It is the first patch
attached.

Also, I realized that if we support \x in COPY, we should also support
\x in strings to the backend. This is the second patch.

Third, I found out that psql has some unusual handling of escaped
numbers. Instead of using \ddd as octal, it treats \ddd as decimal,
\0ddd as octal, and \0xddd as hexadecimal. It is basically following
the strtol() rules for an escaped value. This seems confusing and
contradicts how the rest of our system works. I looked at 'bash' and
found that it supports \000 and \x00 just like C, so I am confused why
we have the current behavior.

This patch makes psql consistent with the rest of our system for
backslashes. It does break backward compatibility. It wouldn't be a
big deal to fix, except we document this in the psql manual page, and
that adds confusion.

FYI, here is the current psql behavior:

    test=> \set x '\42'
    test=> \echo :x
    *
    test=> \set x '\042'
    test=> \echo :x
    "
    test=> \set x '\0x42'
    test=> \echo :x
    B

The new behavior is:

    test=> \set x '\42'
    test=> \echo :x
    "
    test=> \set x '\042'
    test=> \echo :x
    "
    test=> \set x '\x42'
    test=> \echo :x
    B

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.65
diff -c -c -r1.65 copy.sgml
*** doc/src/sgml/ref/copy.sgml  7 May 2005 02:22:45 -0000  1.65
--- doc/src/sgml/ref/copy.sgml  28 May 2005 14:02:59 -0000
***************
*** 424,436 ****
         <entry>Backslash followed by one to three octal digits specifies
         the character with that numeric code</entry>
        </row>
       </tbody>
      </tgroup>
     </informaltable>
  
!     Presently, <command>COPY TO</command> will never emit an octal-digits
!     backslash sequence, but it does use the other sequences listed above
!     for those control characters.
     </para>
  
     <para>
--- 424,441 ----
         <entry>Backslash followed by one to three octal digits specifies
         the character with that numeric code</entry>
        </row>
+       <row>
+        <entry><literal>\x</><replaceable>digits</></entry>
+        <entry>Backslash <literal>x</> followed by one or two hex digits specifies
+        the character with that numeric code</entry>
+       </row>
       </tbody>
      </tgroup>
     </informaltable>
  
!     Presently, <command>COPY TO</command> will never emit an octal or
!     hex-digits backslash sequence, but it does use the other sequences
!     listed above for those control characters.
     </para>
  
     <para>
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.244
diff -c -c -r1.244 copy.c
*** src/backend/commands/copy.c  7 May 2005 02:22:46 -0000  1.244
--- src/backend/commands/copy.c  28 May 2005 14:03:07 -0000
***************
*** 2274,2279 ****
--- 2274,2291 ----
      return result;
  }
  
+ /*
+  *  Return decimal value for a hexadecimal digit
+  */
+ static
+ int GetDecimalFromHex(char hex)
+ {
+     if (isdigit(hex))
+         return hex - '0';
+     else
+         return tolower(hex) - 'a' + 10;
+ }
+ 
  /*----------
   * Read the value of a single attribute, performing de-escaping as needed.
   *
***************
*** 2335,2340 ****
--- 2347,2353 ----
                  case '5':
                  case '6':
                  case '7':
+                     /* handle \013 */
                      {
                          int         val;
  
***************
*** 2360,2365 ****
--- 2373,2402 ----
                          c = val & 0377;
                      }
                      break;
+                 case 'x':
+                     /* Handle \x3F */
+                     if (line_buf.cursor < line_buf.len)
+                     {
+                         char hexchar = line_buf.data[line_buf.cursor];
+ 
+                         if (isxdigit(hexchar))
+                         {
+                             int val = GetDecimalFromHex(hexchar);
+ 
+                             line_buf.cursor++;
+                             if (line_buf.cursor < line_buf.len)
+                             {
+                                 hexchar = line_buf.data[line_buf.cursor];
+                                 if (isxdigit(hexchar))
+                                 {
+                                     line_buf.cursor++;
+                                     val = (val << 4) + GetDecimalFromHex(hexchar);
+                                 }
+                             }
+                             c = val & 0xff;
+                         }
+                     }
+                     break;
                  case 'b':
                      c = '\b';
                      break;
Index: doc/src/sgml/datatype.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.157
diff -c -c -r1.157 datatype.sgml
*** doc/src/sgml/datatype.sgml  1 May 2005 15:54:46 -0000  1.157
--- doc/src/sgml/datatype.sgml  28 May 2005 14:02:40 -0000
***************
*** 1118,1124 ****
     <para>
      When entering <type>bytea</type> values, octets of certain values
      <emphasis>must</emphasis> be escaped (but all octet values
!     <emphasis>may</emphasis> be escaped) when used as part of a string
      literal in an <acronym>SQL</acronym> statement. In general, to
      escape an octet, it is converted into the three-digit octal number
      equivalent of its decimal octet value, and preceded by two
--- 1118,1124 ----
     <para>
      When entering <type>bytea</type> values, octets of certain values
      <emphasis>must</emphasis> be escaped (but all octet values
!     <emphasis>can</emphasis> be escaped) when used as part of a string
      literal in an <acronym>SQL</acronym> statement. In general, to
      escape an octet, it is converted into the three-digit octal number
      equivalent of its decimal octet value, and preceded by two
Index: doc/src/sgml/libpq.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/libpq.sgml,v
retrieving revision 1.180
diff -c -c -r1.180 libpq.sgml
*** doc/src/sgml/libpq.sgml  26 Feb 2005 18:39:04 -0000  1.180
--- doc/src/sgml/libpq.sgml  28 May 2005 14:02:46 -0000
***************
*** 2229,2235 ****
  
    <para>
     Certain byte values <emphasis>must</emphasis> be escaped (but all
!    byte values <emphasis>may</emphasis> be escaped) when used as part
     of a <type>bytea</type> literal in an <acronym>SQL</acronym>
     statement. In general, to escape a byte, it is converted into the
     three digit octal number equal to the octet value, and preceded by
--- 2229,2235 ----
  
    <para>
     Certain byte values <emphasis>must</emphasis> be escaped (but all
!    byte values <emphasis>can</emphasis> be escaped) when used as part
     of a <type>bytea</type> literal in an <acronym>SQL</acronym>
     statement. In general, to escape a byte, it is converted into the
     three digit octal number equal to the octet value, and preceded by
Index: doc/src/sgml/syntax.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v
retrieving revision 1.99
diff -c -c -r1.99 syntax.sgml
*** doc/src/sgml/syntax.sgml  23 Dec 2004 05:37:40 -0000  1.99
--- doc/src/sgml/syntax.sgml  28 May 2005 14:02:58 -0000
***************
*** 254,270 ****
  
      <para>
       Another <productname>PostgreSQL</productname> extension is that
!      C-style backslash escapes are available:
!      <literal>\b</literal> is a backspace, <literal>\f</literal> is a
!      form feed, <literal>\n</literal> is a newline,
!      <literal>\r</literal> is a carriage return, <literal>\t</literal>
!      is a tab, and <literal>\<replaceable>xxx</replaceable></literal>,
!      where <replaceable>xxx</replaceable> is an octal number, is a
!      byte with the corresponding code. (It is your responsibility
!      that the byte sequences you create are valid characters in the
!      server character set encoding.) Any other character following a
!      backslash is taken literally. Thus, to include a backslash in a
!      string constant, write two backslashes.
      </para>
  
      <para>
--- 254,271 ----
  
      <para>
       Another <productname>PostgreSQL</productname> extension is that
!      C-style backslash escapes are available: <literal>\b</literal> is a
!      backspace, <literal>\f</literal> is a form feed,
!      <literal>\n</literal> is a newline, <literal>\r</literal> is a
!      carriage return, <literal>\t</literal> is a tab. Also supported is
!      <literal>\<replaceable>digits</replaceable></literal>, where
!      <replaceable>ddd</replaceable> represents an octal byte value, and
!      <literal>\x<replaceable>hexdigits</replaceable></literal>, where
!      <replaceable>hexdigits</replaceable> represents a hexadecimal byte value.
!      (It is your responsibility that the byte sequences you create are
!      valid characters in the server character set encoding.) Any other
!      character following a backslash is taken literally. Thus, to
!      include a backslash in a string constant, write two backslashes.
      </para>
  
      <para>
Index: src/backend/parser/scan.l
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/scan.l,v
retrieving revision 1.122
diff -c -c -r1.122 scan.l
*** src/backend/parser/scan.l  26 May 2005 01:24:29 -0000  1.122
--- src/backend/parser/scan.l  28 May 2005 14:03:10 -0000
***************
*** 193,200 ****
  xqstart         {quote}
  xqdouble        {quote}{quote}
  xqinside        [^\\']+
! xqescape        [\\][^0-7]
  xqoctesc        [\\][0-7]{1,3}
  
  /* $foo$ style quotes ("dollar quoting")
   * The quoted string starts with $foo$ where "foo" is an optional string
--- 193,201 ----
  xqstart         {quote}
  xqdouble        {quote}{quote}
  xqinside        [^\\']+
! xqescape        [\\][^0-7x]
  xqoctesc        [\\][0-7]{1,3}
+ xqhexesc        [\\]x[0-9A-Fa-f]{1,2}
  
  /* $foo$ style quotes ("dollar quoting")
   * The quoted string starts with $foo$ where "foo" is an optional string
***************
*** 435,440 ****
--- 436,445 ----
                      unsigned char c = strtoul(yytext+1, NULL, 8);
                      addlitchar(c);
                  }
+ <xq>{xqhexesc}  {
+                     unsigned char c = strtoul(yytext+2, NULL, 16);
+                     addlitchar(c);
+                 }
  <xq>{quotecontinue} {
                      /* ignore */
                  }
Index: doc/src/sgml/ref/psql-ref.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/psql-ref.sgml,v
retrieving revision 1.135
diff -c -c -r1.135 psql-ref.sgml
*** doc/src/sgml/ref/psql-ref.sgml  28 Apr 2005 13:09:59 -0000  1.135
--- doc/src/sgml/ref/psql-ref.sgml  28 May 2005 16:23:01 -0000
***************
*** 590,599 ****
      precede it by a backslash. Anything contained in single quotes is
      furthermore subject to C-like substitutions for
      <literal>\n</literal> (new line), <literal>\t</literal> (tab),
!     <literal>\</literal><replaceable>digits</replaceable>,
!     <literal>\0</literal><replaceable>digits</replaceable>, and
!     <literal>\0x</literal><replaceable>digits</replaceable> (the
!     character with the given decimal, octal, or hexadecimal code).
      </para>
  
      <para>
--- 590,598 ----
      precede it by a backslash. Anything contained in single quotes is
      furthermore subject to C-like substitutions for
      <literal>\n</literal> (new line), <literal>\t</literal> (tab),
!     <literal>\</literal><replaceable>digits</replaceable>, and
!     <literal>\x</literal><replaceable>digits</replaceable> (the
!     character with the given octal or hexadecimal code).
      </para>
  
      <para>
Index: src/bin/psql/psqlscan.l
===================================================================
RCS file: /cvsroot/pgsql/src/bin/psql/psqlscan.l,v
retrieving revision 1.10
diff -c -c -r1.10 psqlscan.l
*** src/bin/psql/psqlscan.l  26 May 2005 01:24:29 -0000  1.10
--- src/bin/psql/psqlscan.l  28 May 2005 16:23:10 -0000
***************
*** 849,877 ****
  "\\r"           { appendPQExpBufferChar(output_buf, '\r'); }
  "\\f"           { appendPQExpBufferChar(output_buf, '\f'); }
  
! "\\"[1-9][0-9]* {
!                     /* decimal case */
!                     appendPQExpBufferChar(output_buf,
!                                           (char) strtol(yytext + 1, NULL, 0));
!                 }
! 
! "\\"0[0-7]*     {
                      /* octal case */
                      appendPQExpBufferChar(output_buf,
!                                           (char) strtol(yytext + 1, NULL, 0));
                  }
  
! "\\"0[xX][0-9A-Fa-f]+   {
                      /* hex case */
                      appendPQExpBufferChar(output_buf,
!                                           (char) strtol(yytext + 1, NULL, 0));
!                 }
! 
! "\\"0[xX]       {
!                     /* failed hex case */
!                     yyless(2);
!                     appendPQExpBufferChar(output_buf,
!                                           (char) strtol(yytext + 1, NULL, 0));
                  }
  
  "\\".           { emit(yytext + 1, 1); }
--- 849,864 ----
  "\\r"           { appendPQExpBufferChar(output_buf, '\r'); }
  "\\f"           { appendPQExpBufferChar(output_buf, '\f'); }
  
! "\\"[0-7]{1,3}  {
                      /* octal case */
                      appendPQExpBufferChar(output_buf,
!                                           (char) strtol(yytext + 1, NULL, 8));
                  }
  
! "\\"x[0-9A-Fa-f]{1,2}   {
                      /* hex case */
                      appendPQExpBufferChar(output_buf,
!                                           (char) strtol(yytext + 2, NULL, 16));
                  }
  
  "\\".           { emit(yytext + 1, 1); }
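
For readers who want to try the psql change outside the scanner, the
following is a minimal standalone sketch in C of the two decoding rules
the message describes. It is not code from the patch: decode_old() and
decode_new() are hypothetical helper names, and each takes a single
escape sequence as a C string rather than flex's yytext. The old rule
hands the text after the backslash to strtol() with base 0, so the
prefix picks the radix. The new rule reads up to three octal digits, or
'x' followed by up to two hex digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Old psql rule (sketch, hypothetical helper): pass everything after
 * the backslash to strtol() with base 0, so "42" is read as decimal,
 * "042" as octal, and "0x42" as hexadecimal.
 */
static char
decode_old(const char *esc)
{
    assert(esc != NULL && esc[0] == '\\');
    return (char) strtol(esc + 1, NULL, 0);
}

/*
 * New rule (sketch, hypothetical helper): \ddd is always octal, up to
 * three digits, and \xhh is always hexadecimal, up to two digits.
 */
static char
decode_new(const char *esc)
{
    char digits[4] = {0};   /* at most three digits plus terminator */
    int  n = 0;

    assert(esc != NULL && esc[0] == '\\');

    if (esc[1] == 'x')
    {
        /* copy at most two hex digits following the 'x' */
        while (n < 2 && isxdigit((unsigned char) esc[2 + n]))
        {
            digits[n] = esc[2 + n];
            n++;
        }
        return (char) strtol(digits, NULL, 16);
    }

    /* copy at most three octal digits following the backslash */
    while (n < 3 && esc[1 + n] >= '0' && esc[1 + n] <= '7')
    {
        digits[n] = esc[1 + n];
        n++;
    }
    return (char) strtol(digits, NULL, 8);
}
```

Called on the inputs from the sessions in the message, decode_old("\\42")
gives '*' and decode_new("\\42") gives '"', while "\\042" decodes to '"'
under both rules. The hex forms differ only in spelling: "\\0x42" under
the old rule and "\\x42" under the new one both give 'B'.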