Thread: DBD::Pg BYTEA Character Escaping
Hi All, I recently noticed that the DBD::Pg Perl module appears to be doing a lot of work escaping characters for BYTEA data types. It's importing Perl's POSIX support to check every character in BYTEA data with isprint(), and replacing it with its octal representation if its not printable. However, there are two issues with this approach. The first is efficiency. The way the code is currently written in DBD::Pg does a *lot* of unnecessary work, and I'd like to suggest an optimization (based on discussions on this topic on the Fun with Perl mail list: http://archive.develooper.com/fwp%40perl.org/msg00458.html -- patch supplied upon request). The second issue, however, is that it doesn't appear to me that it's even necessary that non-printable characters be replaced. Although Alex Pilosov says that such an approach is needed: http://www.geocrawler.com/mail/msg.php3?msg_id=6509224&list=10 Joe Conway found that there were only three characters ('\', "'", and "\0") that needed to be escaped, and it was those three characters that Bruce Momjian documented for the forthcoming 7.2 release: http://www.geocrawler.com/mail/msg.php3?msg_id=6547225&list=10 If that's true, then any solution escaping non-printable characters is overkill, and therefore only the three characters need to be escaped. And since it looks like two of them ('\' and "'") are already escaped before the non-printable characters are escaped in DBD::Pg, it then it seems that this code: if ($data_type == DBI::SQL_BINARY || $data_type == DBI::SQL_VARBINARY || $data_type == DBI::SQL_LONGVARBINARY) { $str=join("", map { isprint($_)?$_:'\\'.sprintf("%03o",ord($_)) } split //, $str); } Could be changed to: s/\0/\\000/g if $data_type == DBI::SQL_BINARY || $data_type == DBI::SQL_VARBINARY || $data_type == DBI::SQL_LONGVARBINARY; So, the reason I'm posting this query is because I'd like to get confirmation, if possible, on this conclusion. Based on the feedback I receive, I will submit patches to the DBD::Pg maintainer. Thanks! David PS: If discussion of this issue needs to be moved to the Hackers list, I'll be happy to do so. I just thought I'd try here, first. -- David Wheeler AIM: dwTheory David@Wheeler.net ICQ: 15726394 Yahoo!: dew7e Jabber: Theory@jabber.org
> If that's true, then any solution escaping non-printable characters is > overkill, and therefore only the three characters need to be escaped. > And since it looks like two of them ('\' and "'") are already escaped > before the non-printable characters are escaped in DBD::Pg, it then it > seems that this code: > > if ($data_type == DBI::SQL_BINARY || > $data_type == DBI::SQL_VARBINARY || > $data_type == DBI::SQL_LONGVARBINARY) { > $str=join("", map { isprint($_)?$_:'\\'.sprintf("%03o",ord($_)) } > split //, $str); > } > > Could be changed to: > > s/\0/\\000/g if $data_type == DBI::SQL_BINARY || > $data_type == DBI::SQL_VARBINARY || > $data_type == DBI::SQL_LONGVARBINARY; > > So, the reason I'm posting this query is because I'd like to get > confirmation, if possible, on this conclusion. Based on the feedback I > receive, I will submit patches to the DBD::Pg maintainer. > > Thanks! > > David > > PS: If discussion of this issue needs to be moved to the Hackers list, > I'll be happy to do so. I just thought I'd try here, first. Yes, you only need to escape NULL for bytea. The above patch looks fine to me. We can add it to 7.3. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
David Wheeler <david@wheeler.net> writes: > If that's true, then any solution escaping non-printable characters is > overkill, and therefore only the three characters need to be escaped. Check ... > Could be changed to: > s/\0/\\000/g if $data_type == DBI::SQL_BINARY || > $data_type == DBI::SQL_VARBINARY || > $data_type == DBI::SQL_LONGVARBINARY; Offhand I don't think you even need the check on the datatype; wouldn't it be faster and safer to do the substitution unconditionally? I can't see that there are any cases that work without this substitution and fail with it. regards, tom lane
On 17 Nov 2001, David Wheeler wrote: > The second issue, however, is that it doesn't appear to me that it's > even necessary that non-printable characters be replaced. Although Alex > Pilosov says that such an approach is needed: > > http://www.geocrawler.com/mail/msg.php3?msg_id=6509224&list=10 I didn't say it was needed :) I just had the easy way out and escaped everything that might possibly need to be escaped. > Joe Conway found that there were only three characters ('\', "'", and > "\0") that needed to be escaped, and it was those three characters that > Bruce Momjian documented for the forthcoming 7.2 release: Right. > http://www.geocrawler.com/mail/msg.php3?msg_id=6547225&list=10 > > If that's true, then any solution escaping non-printable characters is > overkill, and therefore only the three characters need to be escaped. > And since it looks like two of them ('\' and "'") are already escaped > before the non-printable characters are escaped in DBD::Pg, it then it > seems that this code: > > if ($data_type == DBI::SQL_BINARY || > $data_type == DBI::SQL_VARBINARY || > $data_type == DBI::SQL_LONGVARBINARY) { > $str=join("", map { isprint($_)?$_:'\\'.sprintf("%03o",ord($_)) } > split //, $str); > } > > Could be changed to: > > s/\0/\\000/g if $data_type == DBI::SQL_BINARY || > $data_type == DBI::SQL_VARBINARY || > $data_type == DBI::SQL_LONGVARBINARY; Yep. > > So, the reason I'm posting this query is because I'd like to get > confirmation, if possible, on this conclusion. Based on the feedback I > receive, I will submit patches to the DBD::Pg maintainer. Go right ahead. -alex
On Sat, 2001-11-17 at 21:46, Tom Lane wrote: > Offhand I don't think you even need the check on the datatype; wouldn't > it be faster and safer to do the substitution unconditionally? I can't > see that there are any cases that work without this substitution and > fail with it. In that case, I suggest the attached, even more simplified patch. It assumes that the quote character stored in $lp in the existing version is always "'" (a safe assumption for DBD:Pg, I think, in a way it wasn't for DBI, from whence the code for the LITERAL_PREFIX and LITERAL_SUFFIX was copied into DBD::Pg), and therefore performs the escaping of the three agreed-upon characters for all non-numeric data types. If you agree that this tack makes sense, I'll submit the patch to Edmund Mergl. Regards, Daviid -- David Wheeler AIM: dwTheory David@Wheeler.net ICQ: 15726394 Yahoo!: dew7e Jabber: Theory@jabber.org
Attachment
On Sun, 2001-11-18 at 08:40, Alex Pilosov wrote: > I didn't say it was needed :) I just had the easy way out and escaped > everything that might possibly need to be escaped. Ah, gotcha! <snip /> > > So, the reason I'm posting this query is because I'd like to get > > confirmation, if possible, on this conclusion. Based on the feedback I > > receive, I will submit patches to the DBD::Pg maintainer. > Go right ahead. Thanks. See other recent messages with patches attached. Regards, David -- David Wheeler AIM: dwTheory David@Wheeler.net ICQ: 15726394 Yahoo!: dew7e Jabber: Theory@jabber.org
On Sun, 2001-11-18 at 13:21, David Wheeler wrote: > If you agree that this tack makes sense, I'll submit the patch to Edmund > Mergl. <snip /> Actually, I've come up with an even simpler solution, and it should be faster for some datatypes, too. It's attached. Thanks, David -- David Wheeler AIM: dwTheory David@Wheeler.net ICQ: 15726394 Yahoo!: dew7e Jabber: Theory@jabber.org
Attachment
select version(); version --------------------------------------------------------------------- PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66 (1 row) (doh, I thought I was running 7.1.3!). SELECT * from arch_ranks_arch4 ; Backend message type 0x44 arrived while idle pqReadData() -- backend closed the channel unexpectedly. This probably means the backend terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. !> Trying to pg_dump the table gives me this: <snipped> 1526 2001-11-18 07:22:15+08 0 BLANK 0 0 0 0 1414414664 4714-11--2147483624 BC 218103907 \. ERROR: MemoryContextAlloc: invalid request size 1163153238 PQendcopy: resetting connection SQL query to dump the contents of Table 'arch_ranks_arch4' did not execute correctly. After we read all th e table contents from the backend, PQendcopy() failed. Explanation from backend: 'ERROR: MemoryContextAll oc: invalid request size 1163153238 '. The query was: 'COPY "arch_ranks_arch4" TO stdout; '. Another similar table is ok. I have vacuumed - no errors, but the problem still remained. These tables have had a lot of updates on them (24/7 every second or so). Fortunately it's not critical: just test tables. The type: CREATE TABLE "arch_ranks_arch4" ( "id" integer, "updated" timestamp with time zone, "valid" integer, "name" text, "specialty" text, "status" text, "ranking" integer, "power" integer, "land" integer, "forts" integer, "description" text ); I've truncated the table and it runs ok now. The server itself hasn't crashed - up for 69 days. Is it a fixed bug? Regards, Link.
Lincoln Yeoh <lyeoh@pop.jaring.my> writes: > Trying to pg_dump the table gives me this: > <snipped> > ERROR: MemoryContextAlloc: invalid request size 1163153238 > PQendcopy: resetting connection This looks like a corrupted-data problem ... > I've truncated the table and it runs ok now. ... but the evidence is now gone, so we can't really probe into it further :-(. You might be well advised to run some hardware diagnostics to see if you have any RAM problems, flaky disk controllers, that sort of thing. Not that Postgres has no bugs, of course, but we've seen quite a number of data-corruption reports that ultimately traced to hardware problems. The behavior during a SELECT seems odd also: > SELECT * from arch_ranks_arch4 ; > Backend message type 0x44 arrived while idle > pqReadData() -- backend closed the channel unexpectedly. This suggests that libpq and the backend got out of sync somehow, but I thought we'd fixed that class of problems years ago. If you can reproduce this it'd be worth looking into. regards, tom lane
At 11:18 PM 18-11-2001 -0500, Tom Lane wrote: >Lincoln Yeoh <lyeoh@pop.jaring.my> writes: >> Trying to pg_dump the table gives me this: >> <snipped> >> ERROR: MemoryContextAlloc: invalid request size 1163153238 >> PQendcopy: resetting connection > >This looks like a corrupted-data problem ... Yah. Timestamp was BC :). >> I've truncated the table and it runs ok now. > >... but the evidence is now gone, so we can't really probe into it >further :-(. You might be well advised to run some hardware diagnostics >to see if you have any RAM problems, flaky disk controllers, that sort >of thing. Not that Postgres has no bugs, of course, but we've seen >quite a number of data-corruption reports that ultimately traced to >hardware problems. >The behavior during a SELECT seems odd also: > >> SELECT * from arch_ranks_arch4 ; >> Backend message type 0x44 arrived while idle >> pqReadData() -- backend closed the channel unexpectedly. > >This suggests that libpq and the backend got out of sync somehow, >but I thought we'd fixed that class of problems years ago. If you >can reproduce this it'd be worth looking into. I'll try to upgrade to 7.1.3. Then if it happens again it'll mean more. Cheerio, Link.
On Sun, 2001-11-18 at 13:36, David Wheeler wrote: > Actually, I've come up with an even simpler solution, and it should be > faster for some datatypes, too. It's attached. May I assume by that the patch I submitted is fine? I'd like to send it on to the DBD::Pg maintainer. Best, David -- David Wheeler AIM: dwTheory David@Wheeler.net ICQ: 15726394 Yahoo!: dew7e Jabber: Theory@jabber.org
Tis cool by me, and I am to blame for the previous patch relating to bytea..:) -alex On 26 Nov 2001, David Wheeler wrote: > On Sun, 2001-11-18 at 13:36, David Wheeler wrote: > > > Actually, I've come up with an even simpler solution, and it should be > > faster for some datatypes, too. It's attached. > > May I assume by that the patch I submitted is fine? I'd like to send it > on to the DBD::Pg maintainer. > > Best, > > David > >
On Mon, 2001-11-26 at 22:03, Alex Pilosov wrote: > Tis cool by me, and I am to blame for the previous patch relating to > bytea..:) Thanks Alex. I'll submit it today. Regards, David -- David Wheeler AIM: dwTheory David@Wheeler.net ICQ: 15726394 Yahoo!: dew7e Jabber: Theory@jabber.org