DBD::Pg BYTEA Character Escaping - Mailing list pgsql-general

From David Wheeler
Subject DBD::Pg BYTEA Character Escaping
Date
Msg-id 1006059527.1307.14.camel@mercury.atomicode.com
Whole thread Raw
Responses Re: DBD::Pg BYTEA Character Escaping  (Bruce Momjian <pgman@candle.pha.pa.us>)
Re: DBD::Pg BYTEA Character Escaping  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: DBD::Pg BYTEA Character Escaping  (Alex Pilosov <alex@pilosoft.com>)
List pgsql-general
Hi All,

I recently noticed that the DBD::Pg Perl module appears to be doing a
lot of work escaping characters for BYTEA data types. It's importing
Perl's POSIX support to check every character in BYTEA data with
isprint(), and replacing it with its octal representation if its not
printable.

However, there are two issues with this approach. The first is
efficiency. The way the code is currently written in DBD::Pg does a
*lot* of unnecessary work, and I'd like to suggest an optimization
(based on discussions on this topic on the Fun with Perl mail list:
http://archive.develooper.com/fwp%40perl.org/msg00458.html -- patch
supplied upon request).

The second issue, however, is that it doesn't appear to me that it's
even necessary that non-printable characters be replaced. Although Alex
Pilosov says that such an approach is needed:

http://www.geocrawler.com/mail/msg.php3?msg_id=6509224&list=10

Joe Conway found that there were only three characters ('\', "'", and
"\0") that needed to be escaped, and it was those three characters that
Bruce Momjian documented for the forthcoming 7.2 release:

http://www.geocrawler.com/mail/msg.php3?msg_id=6547225&list=10

If that's true, then any solution escaping non-printable characters is
overkill, and therefore only the three characters need to be escaped.
And since it looks like two of them ('\' and "'") are already escaped
before the non-printable characters are escaped in DBD::Pg, it then it
seems that this code:

  if ($data_type == DBI::SQL_BINARY ||
      $data_type == DBI::SQL_VARBINARY ||
      $data_type == DBI::SQL_LONGVARBINARY) {
      $str=join("", map { isprint($_)?$_:'\\'.sprintf("%03o",ord($_)) }
                    split //, $str);
  }

Could be changed to:

  s/\0/\\000/g if $data_type == DBI::SQL_BINARY ||
                  $data_type == DBI::SQL_VARBINARY ||
                  $data_type == DBI::SQL_LONGVARBINARY;

So, the reason I'm posting this query is because I'd like to get
confirmation, if possible, on this conclusion. Based on the feedback I
receive, I will submit patches to the DBD::Pg maintainer.

Thanks!

David

PS: If discussion of this issue needs to be moved to the Hackers list,
I'll be happy to do so. I just thought I'd try here, first.

--
David Wheeler                                     AIM: dwTheory
David@Wheeler.net                                 ICQ: 15726394
                                               Yahoo!: dew7e
                                               Jabber: Theory@jabber.org


pgsql-general by date:

Previous
From: wyatt@draggoo.com
Date:
Subject: One time only trigger/function on every row...
Next
From: Bruce Momjian
Date:
Subject: Re: DBD::Pg BYTEA Character Escaping