Thread: PQunescapeBytea code

PQunescapeBytea code

From

"Jeroen T. Vermeulen"

Date:

30 October 2003, 18:24:47

Someething to consider for after the 7.4 release, perhaps...

As per today's CVS version, PQunescapeBytea() does the following when
it encounters an escaped character (i.e., a backslash) in the escaped
string strtext at offset i:

["if (strtext[i] == '\\')"]

i++;
if (strtext[i] == '\\')buffer[j++] = strtext[i++];
else
{if ((isdigit(strtext[i])) &&    (isdigit(strtext[i + 1])) &&    (isdigit(strtext[i + 2]))){    byte =
VAL(strtext[i++]);   byte = (byte << 3) + VAL(strtext[i++]);    buffer[j++] = (byte << 3) + VAL(strtext[i++]);}
 
}

This code completely ignores any other usage of the backslash in the
escaped string, generating no output for unknown escape sequences.  Is 
that the desired behaviour?  The code would be a little simpler if it 
were to allow al characters to be escaped, which means ignoring the 
backslash but not the following character:

i++;
if (isdigit(strtext[i]) && isdigit(strtext[i+1]) && isdigit(strtext[i+2]))
{byte = VAL(strtext[i]);i++;byte = (byte<<3) + VAL(strtext[i]);i++;byte = (byte<<3) + VAL(strtext[i]);buffer[j++] =
byte;
}
else
{buffer[j++] = strtext[i++];
}

In fact, the "else" part is identical to the normal (non-escaped) part of
the loop, so it could probably be merged--leaving only the octal parsing
part as a special case.

Then the whole loop could become something like this:

[unsigned char c;]

for (i=j=buflen=0; i<(int)strtextlen; ++i, buffer[j++]=c)
{c = strtext[i];if (c == '\\'){    c = strtext[i++];    /* Skip backslash */    if (isdigit(c) && isdigit(strtext[i+1])
&&isdigit(strtext[i+2]))    {        /* Parse octal number */        byte = VAL(strtext[i++]);        byte = (byte <<
3)+ VAL(strtext[i++]);        c = (byte << 3) + VAL(strtext[i]);    }}
 
}

...Which saves 8 lines, reduces the number of special cases, adds some
comments, and permits arbitrary characters to be escaped.


Jeroen

Re: PQunescapeBytea code

From

"Jeroen T. Vermeulen"

Date:

30 October 2003, 19:32:06

On Thu, Oct 30, 2003 at 08:24:13PM +0100, Jeroen T. Vermeulen wrote:
> 
> Then the whole loop could become something like this:

Okay, that code wasn't entirely correct but it gets the idea across.  In
C++ terms, what I arrived at was:
 string result; for (int i=0; i<F.size(); ++i) {   unsigned char c = p[i];   if (c == '\\')   {     c = p[++i];     if
(isdigit(c)&& isdigit(p[i+1]) && isdigit(p[i+2]))     {c = (VAL(p[c])<<9) | (VAL(p[i+1])<<3) | VAL(p[i+2]);i += 2;
}  }   result += char(c); }
 


Simple, no?


Jeroen

Re: PQunescapeBytea code

From

Adam Kavan

Date:

30 October 2003, 22:47:11

Actually I was looking at that code today and it does not ignore something 
if it is escaped by a backslash on not on the list.  It eats the backslash 
and then continues the loop so next time that character will be parsed 
normally.  However PQunescapeBytea is _very_ slow.  I am storing fairly 
large (several hundered K) byte strings into Bytea's and it can take 30 
seconds or more to convert them back into binary data.  I wrote a new 
version of PQunescapeBytea that uses pointers instead of arrays to store 
the string in, this increases the speed about 30 fold on my strings and 
still has the same behavior.  I wasn't sure if this would be something I 
should submit as a patch or not, is anyone interested in this?

If they are I'll try to figure out how to submit a patch.

--- Adam Kavan
--- akavan@cox.net

Re: PQunescapeBytea code

From

Bruce Momjian

Date:

31 October 2003, 00:47:09

Adam Kavan wrote:
> Actually I was looking at that code today and it does not ignore something 
> if it is escaped by a backslash on not on the list.  It eats the backslash 
> and then continues the loop so next time that character will be parsed 
> normally.  However PQunescapeBytea is _very_ slow.  I am storing fairly 
> large (several hundered K) byte strings into Bytea's and it can take 30 
> seconds or more to convert them back into binary data.  I wrote a new 
> version of PQunescapeBytea that uses pointers instead of arrays to store 
> the string in, this increases the speed about 30 fold on my strings and 
> still has the same behavior.  I wasn't sure if this would be something I 
> should submit as a patch or not, is anyone interested in this?
> 
> If they are I'll try to figure out how to submit a patch.

Are you testing againts 7.3.X or 7.4?  7.4 has a faster version.  If you
are testing against 7.4, do a diff -c against the old and new files and
send it to the patches list.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: PQunescapeBytea code

From

Tom Lane

Date:

31 October 2003, 16:22:27

"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes:
> This code completely ignores any other usage of the backslash in the
> escaped string, generating no output for unknown escape sequences.  Is 
> that the desired behaviour?

As Adam pointed out, the code does do the right thing for other
backslash sequences; it just processes the character on the following
loop iteration.  I'll add a comment to explain this.

I see another issue here, which is that for a zero-length input string
we will do malloc(0) and realloc(foo, 0), neither of which are very
portable.  Will fix.
        regards, tom lane