Thread: PQunescapeBytea code
Someething to consider for after the 7.4 release, perhaps... As per today's CVS version, PQunescapeBytea() does the following when it encounters an escaped character (i.e., a backslash) in the escaped string strtext at offset i: ["if (strtext[i] == '\\')"] i++; if (strtext[i] == '\\')buffer[j++] = strtext[i++]; else {if ((isdigit(strtext[i])) && (isdigit(strtext[i + 1])) && (isdigit(strtext[i + 2]))){ byte = VAL(strtext[i++]); byte = (byte << 3) + VAL(strtext[i++]); buffer[j++] = (byte << 3) + VAL(strtext[i++]);} } This code completely ignores any other usage of the backslash in the escaped string, generating no output for unknown escape sequences. Is that the desired behaviour? The code would be a little simpler if it were to allow al characters to be escaped, which means ignoring the backslash but not the following character: i++; if (isdigit(strtext[i]) && isdigit(strtext[i+1]) && isdigit(strtext[i+2])) {byte = VAL(strtext[i]);i++;byte = (byte<<3) + VAL(strtext[i]);i++;byte = (byte<<3) + VAL(strtext[i]);buffer[j++] = byte; } else {buffer[j++] = strtext[i++]; } In fact, the "else" part is identical to the normal (non-escaped) part of the loop, so it could probably be merged--leaving only the octal parsing part as a special case. Then the whole loop could become something like this: [unsigned char c;] for (i=j=buflen=0; i<(int)strtextlen; ++i, buffer[j++]=c) {c = strtext[i];if (c == '\\'){ c = strtext[i++]; /* Skip backslash */ if (isdigit(c) && isdigit(strtext[i+1]) &&isdigit(strtext[i+2])) { /* Parse octal number */ byte = VAL(strtext[i++]); byte = (byte << 3)+ VAL(strtext[i++]); c = (byte << 3) + VAL(strtext[i]); }} } ...Which saves 8 lines, reduces the number of special cases, adds some comments, and permits arbitrary characters to be escaped. Jeroen
On Thu, Oct 30, 2003 at 08:24:13PM +0100, Jeroen T. Vermeulen wrote: > > Then the whole loop could become something like this: Okay, that code wasn't entirely correct but it gets the idea across. In C++ terms, what I arrived at was: string result; for (int i=0; i<F.size(); ++i) { unsigned char c = p[i]; if (c == '\\') { c = p[++i]; if (isdigit(c)&& isdigit(p[i+1]) && isdigit(p[i+2])) {c = (VAL(p[c])<<9) | (VAL(p[i+1])<<3) | VAL(p[i+2]);i += 2; } } result += char(c); } Simple, no? Jeroen
Actually I was looking at that code today and it does not ignore something if it is escaped by a backslash on not on the list. It eats the backslash and then continues the loop so next time that character will be parsed normally. However PQunescapeBytea is _very_ slow. I am storing fairly large (several hundered K) byte strings into Bytea's and it can take 30 seconds or more to convert them back into binary data. I wrote a new version of PQunescapeBytea that uses pointers instead of arrays to store the string in, this increases the speed about 30 fold on my strings and still has the same behavior. I wasn't sure if this would be something I should submit as a patch or not, is anyone interested in this? If they are I'll try to figure out how to submit a patch. --- Adam Kavan --- akavan@cox.net
Adam Kavan wrote: > Actually I was looking at that code today and it does not ignore something > if it is escaped by a backslash on not on the list. It eats the backslash > and then continues the loop so next time that character will be parsed > normally. However PQunescapeBytea is _very_ slow. I am storing fairly > large (several hundered K) byte strings into Bytea's and it can take 30 > seconds or more to convert them back into binary data. I wrote a new > version of PQunescapeBytea that uses pointers instead of arrays to store > the string in, this increases the speed about 30 fold on my strings and > still has the same behavior. I wasn't sure if this would be something I > should submit as a patch or not, is anyone interested in this? > > If they are I'll try to figure out how to submit a patch. Are you testing againts 7.3.X or 7.4? 7.4 has a faster version. If you are testing against 7.4, do a diff -c against the old and new files and send it to the patches list. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes: > This code completely ignores any other usage of the backslash in the > escaped string, generating no output for unknown escape sequences. Is > that the desired behaviour? As Adam pointed out, the code does do the right thing for other backslash sequences; it just processes the character on the following loop iteration. I'll add a comment to explain this. I see another issue here, which is that for a zero-length input string we will do malloc(0) and realloc(foo, 0), neither of which are very portable. Will fix. regards, tom lane