Thread: \xDD patch for 7.5devel

\xDD patch for 7.5devel

From
Jason Godden
Date:
Hi all,

This is my first patch for PostgreSQL against the 7.5devel cvs (please advise if this is the wrong place to post
patches). This patch simply enables the \xDD (or \XDD) hexadecimal import in the copy command (im starting with the
simplestuff first).  I did notice that there may be a need to issue an error if an invalid octal or hex character is
foundfollowing a \ or \x.  No errors are currently flagged by the octal (or this hex) import.
 

Rgds,

Jason

cvs server: Diffing .
Index: copy.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/commands/copy.c,v
retrieving revision 1.213
diff -u -r1.213 copy.c
--- copy.c      6 Oct 2003 02:38:53 -0000       1.213
+++ copy.c      5 Nov 2003 11:19:33 -0000
@@ -48,9 +48,10 @@#include "utils/lsyscache.h"#include "utils/syscache.h"

-#define ISOCTAL(c) (((c) >= '0') && ((c) <= '7'))#define OCTVALUE(c) ((c) - '0')
+#define ISHEX(c) ((((c)>='0') && ((c)<='9')) || (((c)>='A') && ((c)<='F')) || (((c)>='a') && ((c)<='f')))
+#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55) : ((c)-'0')))
/* * Represents the different source/dest cases we need to worry about at
@@ -1947,6 +1948,33 @@                       c = line_buf.data[line_buf.cursor++];                       switch (c)
                 {
 
+                               case 'x':
+                               case 'X':
+                                       {
+                                               if (line_buf.cursor < line_buf.len)
+                                               {
+                                                       int hexval = 0;
+
+                                                       c = line_buf.data[line_buf.cursor];
+                                                       if (ISHEX(c))
+                                                       {
+                                                               line_buf.cursor++;
+                                                               hexval = HEXVALUE(c);
+                                                               if (line_buf.cursor < line_buf.len)
+                                                               {
+                                                                       c = line_buf.data[line_buf.cursor];
+                                                                       line_buf.cursor++;
+                                                                       if (ISHEX(c))
+                                                                       {
+                                                                               line_buf.cursor++;
+                                                                               hexval = (hexval << 4) + HEXVALUE(c);
+                                                                       }
+                                                               }
+                                                       }
+                                                       c = hexval;
+                                               }
+                                       }
+                                       break;                               case '0':
case'1':                               case '2':
 



Re: \xDD patch for 7.5devel

From
Peter Eisentraut
Date:
Jason Godden writes:

> This is my first patch for PostgreSQL against the 7.5devel cvs (please advise if this is the wrong place to post
patches). This patch simply enables the \xDD (or \XDD) hexadecimal import in the copy command (im starting with the
simplestuff first).  I did notice that there may be a need to issue an error if an invalid octal or hex character is
foundfollowing a \ or \x.  No errors are currently flagged by the octal (or this hex) import.
 

I think this belongs into the string literal parser (at least in addition
to COPY).

-- 
Peter Eisentraut   peter_e@gmx.net



Re: \xDD patch for 7.5devel

From
Christopher Kings-Lynne
Date:
>>This is my first patch for PostgreSQL against the 7.5devel cvs (please advise if this is the wrong place to post
patches). This patch simply enables the \xDD (or \XDD) hexadecimal import in the copy command (im starting with the
simplestuff first).  I did notice that there may be a need to issue an error if an invalid octal or hex character is
foundfollowing a \ or \x.  No errors are currently flagged by the octal (or this hex) import.
 
> 
> 
> I think this belongs into the string literal parser (at least in addition
> to COPY).

That's what always happens when I start working on something - someone 
points out something that makes it 100 times harder :P

Chris




Re: \xDD patch for 7.5devel

From
Tom Lane
Date:
Jason Godden <jasongodden@optushome.com.au> writes:
> This is my first patch for PostgreSQL against the 7.5devel cvs (please
> advise if this is the wrong place to post patches).

pgsql-patches in future, please.

> +#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55) : ((c)-'0')))

This seems excessively dependent on the assumption that the character
set is ASCII.  Why have you hard-coded numeric equivalents into this
macro?

BTW, the patch is incomplete because it is lacking documentation.
        regards, tom lane


Re: \xDD patch for 7.5devel

From
Markus Bertheau
Date:
В Срд, 05.11.2003, в 16:25, Tom Lane пишет:

> > +#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55) : ((c)-'0')))
>
> This seems excessively dependent on the assumption that the character
> set is ASCII.  Why have you hard-coded numeric equivalents into this
> macro?

What not ASCII compatible character sets are out there in use still
today?

--
Markus Bertheau <twanger@bluetwanger.de>


Re: \xDD patch for 7.5devel

From
Jason Godden
Date:
On Thu, 6 Nov 2003 06:25 am, Markus Bertheau wrote:
> В Срд, 05.11.2003, в 16:25, Tom Lane пишет:
> > > +#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55) :
> > > ((c)-'0')))
> >
> > This seems excessively dependent on the assumption that the character
> > set is ASCII.  Why have you hard-coded numeric equivalents into this
> > macro?
>
> What not ASCII compatible character sets are out there in use still
> today?

Ah, yes - didn't even think about the character sets.  If thats the case then
octal needs attention as well because it makes a similar assumption.  Peter
Eisentraut commented that this should be in the string literal parser.
Should this be the case? and if so should i migrate both octal and hex to
this parser?

Rgds,

Jason



Re: \xDD patch for 7.5devel

From
Larry Rosenman
Date:

--On Thursday, November 06, 2003 07:43:07 +1100 Jason Godden
<jasongodden@optushome.com.au> wrote:

> On Thu, 6 Nov 2003 06:25 am, Markus Bertheau wrote:
>> ? ???, 05.11.2003, ? 16:25, Tom Lane ?????:
>> > > +#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55)
>> > > : ((c)-'0')))
>> >
>> > This seems excessively dependent on the assumption that the character
>> > set is ASCII.  Why have you hard-coded numeric equivalents into this
>> > macro?
>>
>> What not ASCII compatible character sets are out there in use still
>> today?
>
EBCDIC as far as I know is still the default on IBM Mainframes (been 5+
years but...).



--
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

Re: \xDD patch for 7.5devel

From
Kurt Roeckx
Date:
On Wed, Nov 05, 2003 at 02:47:17PM -0600, Larry Rosenman wrote:
> 
> 
> --On Thursday, November 06, 2003 07:43:07 +1100 Jason Godden 
> <jasongodden@optushome.com.au> wrote:
> 
> >On Thu, 6 Nov 2003 06:25 am, Markus Bertheau wrote:
> >>? ???, 05.11.2003, ? 16:25, Tom Lane ?????:
> >>> > +#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55)
> >>> > : ((c)-'0')))
> >>>
> >>> This seems excessively dependent on the assumption that the character
> >>> set is ASCII.  Why have you hard-coded numeric equivalents into this
> >>> macro?
> >>
> >>What not ASCII compatible character sets are out there in use still
> >>today?
> >
> EBCDIC as far as I know is still the default on IBM Mainframes (been 5+ 
> years but...).

Linux on the s390, s390x runs in ASCII mode.  MVS, OS/390, z/OS
all use EBCDIC though.

But I don't think it has anything to do with which OS/hardware
you use but rather what charset is used during the communication.
It's probably about the charset that is used to send the "\xDD".
I guess question is that you can assume that that string is
encoded in ASCII.

If this is broken, I'd say that the octal encoding and other
quotes are broken too.



Kurt



Re: \xDD patch for 7.5devel

From
Stephan Szabo
Date:
On Thu, 6 Nov 2003, Jason Godden wrote:

> On Thu, 6 Nov 2003 06:25 am, Markus Bertheau wrote:
> > В Срд, 05.11.2003, в 16:25, Tom Lane пишет:
> > > > +#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55) :
> > > > ((c)-'0')))
> > >
> > > This seems excessively dependent on the assumption that the character
> > > set is ASCII.  Why have you hard-coded numeric equivalents into this
> > > macro?
> >
> > What not ASCII compatible character sets are out there in use still
> > today?
>
> Ah, yes - didn't even think about the character sets.  If thats the case then
> octal needs attention as well because it makes a similar assumption.  Peter

I haven't looked at the code in question, but assuming the digits are
contiguous and in order is safe, the C spec mandates that.  Assuming that
the letters are in order and contiguous is not safe.


Re: \xDD patch for 7.5devel

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> On Thu, 6 Nov 2003, Jason Godden wrote:
>> On Thu, 6 Nov 2003 06:25 am, Markus Bertheau wrote:
>>> +#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55) :
> ((c)-'0')))

> I haven't looked at the code in question, but assuming the digits are
> contiguous and in order is safe, the C spec mandates that.  Assuming that
> the letters are in order and contiguous is not safe.

I believe that's a true statement with respect to the character sets
used in the field; I dunno whether the C spec actually says that though.

My original concern about this macro had several different levels:

1. I can't see any reason why the subtractions are coded as "-55" and
not "-'A' + 10".  The existing coding is less understandable than doing
it right, and won't save anything in runtime (since the compiler will fold
the constants anyway), on top of being a character set dependency.

2. I don't much care for the assumption that lower case letters are
greater than upper case are greater than digits.  This could be avoided
at a fairly small runtime penalty by making range tests:

#define HEXVALUE(c) \(((c) >= '0' && (c) <= '9') ? ((c) - '0') : \ (((c) >= 'A' && (c) <= 'F') ? ((c) - 'A' + 10) : \
((c)- 'a' + 10)))
 

3. The third level would be to get rid of the assumption that letters
are contiguous, which would probably require making a lookup table to
map from char code to hex value.

I'm not sure level 3 is really worth doing, since AFAIK no one tries to
run Postgres on any EBCDIC platform.  (It's likely that there are other
places that depend on the letters-are-contiguous assumption anyway.)
I do think level 1 and probably level 2 are appropriate changes.
        regards, tom lane


Re: \xDD patch for 7.5devel

From
Jason Godden
Date:
> #define HEXVALUE(c) \
>     (((c) >= '0' && (c) <= '9') ? ((c) - '0') : \
>      (((c) >= 'A' && (c) <= 'F') ? ((c) - 'A' + 10) : \
>       ((c) - 'a' + 10)))
>
> 3. The third level would be to get rid of the assumption that letters
> are contiguous, which would probably require making a lookup table to
> map from char code to hex value.
>
> I'm not sure level 3 is really worth doing, since AFAIK no one tries to
> run Postgres on any EBCDIC platform.  (It's likely that there are other
> places that depend on the letters-are-contiguous assumption anyway.)
> I do think level 1 and probably level 2 are appropriate changes.
>
>             regards, tom lane

Hi Guys,

Thanks for the feedback.  Tom I'll make the changes proposed here to the macro 
and repost the patch to pgsql-patches (and do some reading on Unicode!).  I 
guess at this stage I would like to offer any of my time to any janitorial 
work that might be needed as until im more knowledgeable about the pg source 
I think any large scale stuff is off the cards.

Rgds,

Jason