Thread: DBD::Pg BYTEA Character Escaping

DBD::Pg BYTEA Character Escaping

From
David Wheeler
Date:
Hi All,

I recently noticed that the DBD::Pg Perl module appears to be doing a
lot of work escaping characters for BYTEA data types. It's importing
Perl's POSIX support to check every character in BYTEA data with
isprint(), and replacing it with its octal representation if its not
printable.

However, there are two issues with this approach. The first is
efficiency. The way the code is currently written in DBD::Pg does a
*lot* of unnecessary work, and I'd like to suggest an optimization
(based on discussions on this topic on the Fun with Perl mail list:
http://archive.develooper.com/fwp%40perl.org/msg00458.html -- patch
supplied upon request).

The second issue, however, is that it doesn't appear to me that it's
even necessary that non-printable characters be replaced. Although Alex
Pilosov says that such an approach is needed:

http://www.geocrawler.com/mail/msg.php3?msg_id=6509224&list=10

Joe Conway found that there were only three characters ('\', "'", and
"\0") that needed to be escaped, and it was those three characters that
Bruce Momjian documented for the forthcoming 7.2 release:

http://www.geocrawler.com/mail/msg.php3?msg_id=6547225&list=10

If that's true, then any solution escaping non-printable characters is
overkill, and therefore only the three characters need to be escaped.
And since it looks like two of them ('\' and "'") are already escaped
before the non-printable characters are escaped in DBD::Pg, it then it
seems that this code:

  if ($data_type == DBI::SQL_BINARY ||
      $data_type == DBI::SQL_VARBINARY ||
      $data_type == DBI::SQL_LONGVARBINARY) {
      $str=join("", map { isprint($_)?$_:'\\'.sprintf("%03o",ord($_)) }
                    split //, $str);
  }

Could be changed to:

  s/\0/\\000/g if $data_type == DBI::SQL_BINARY ||
                  $data_type == DBI::SQL_VARBINARY ||
                  $data_type == DBI::SQL_LONGVARBINARY;

So, the reason I'm posting this query is because I'd like to get
confirmation, if possible, on this conclusion. Based on the feedback I
receive, I will submit patches to the DBD::Pg maintainer.

Thanks!

David

PS: If discussion of this issue needs to be moved to the Hackers list,
I'll be happy to do so. I just thought I'd try here, first.

--
David Wheeler                                     AIM: dwTheory
David@Wheeler.net                                 ICQ: 15726394
                                               Yahoo!: dew7e
                                               Jabber: Theory@jabber.org


Re: DBD::Pg BYTEA Character Escaping

From
Bruce Momjian
Date:
> If that's true, then any solution escaping non-printable characters is
> overkill, and therefore only the three characters need to be escaped.
> And since it looks like two of them ('\' and "'") are already escaped
> before the non-printable characters are escaped in DBD::Pg, it then it
> seems that this code:
>
>   if ($data_type == DBI::SQL_BINARY ||
>       $data_type == DBI::SQL_VARBINARY ||
>       $data_type == DBI::SQL_LONGVARBINARY) {
>       $str=join("", map { isprint($_)?$_:'\\'.sprintf("%03o",ord($_)) }
>                     split //, $str);
>   }
>
> Could be changed to:
>
>   s/\0/\\000/g if $data_type == DBI::SQL_BINARY ||
>                   $data_type == DBI::SQL_VARBINARY ||
>                   $data_type == DBI::SQL_LONGVARBINARY;
>
> So, the reason I'm posting this query is because I'd like to get
> confirmation, if possible, on this conclusion. Based on the feedback I
> receive, I will submit patches to the DBD::Pg maintainer.
>
> Thanks!
>
> David
>
> PS: If discussion of this issue needs to be moved to the Hackers list,
> I'll be happy to do so. I just thought I'd try here, first.

Yes, you only need to escape NULL for bytea.  The above patch looks fine
to me.  We can add it to 7.3.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: DBD::Pg BYTEA Character Escaping

From
Tom Lane
Date:
David Wheeler <david@wheeler.net> writes:
> If that's true, then any solution escaping non-printable characters is
> overkill, and therefore only the three characters need to be escaped.

Check ...

> Could be changed to:

>   s/\0/\\000/g if $data_type == DBI::SQL_BINARY ||
>                   $data_type == DBI::SQL_VARBINARY ||
>                   $data_type == DBI::SQL_LONGVARBINARY;

Offhand I don't think you even need the check on the datatype; wouldn't
it be faster and safer to do the substitution unconditionally?  I can't
see that there are any cases that work without this substitution and
fail with it.

            regards, tom lane

Re: DBD::Pg BYTEA Character Escaping

From
Alex Pilosov
Date:
On 17 Nov 2001, David Wheeler wrote:

> The second issue, however, is that it doesn't appear to me that it's
> even necessary that non-printable characters be replaced. Although Alex
> Pilosov says that such an approach is needed:
>
> http://www.geocrawler.com/mail/msg.php3?msg_id=6509224&list=10
I didn't say it was needed :) I just had the easy way out and escaped
everything that might possibly need to be escaped.

> Joe Conway found that there were only three characters ('\', "'", and
> "\0") that needed to be escaped, and it was those three characters that
> Bruce Momjian documented for the forthcoming 7.2 release:
Right.

> http://www.geocrawler.com/mail/msg.php3?msg_id=6547225&list=10
>
> If that's true, then any solution escaping non-printable characters is
> overkill, and therefore only the three characters need to be escaped.
> And since it looks like two of them ('\' and "'") are already escaped
> before the non-printable characters are escaped in DBD::Pg, it then it
> seems that this code:
>
>   if ($data_type == DBI::SQL_BINARY ||
>       $data_type == DBI::SQL_VARBINARY ||
>       $data_type == DBI::SQL_LONGVARBINARY) {
>       $str=join("", map { isprint($_)?$_:'\\'.sprintf("%03o",ord($_)) }
>                     split //, $str);
>   }
>
> Could be changed to:
>
>   s/\0/\\000/g if $data_type == DBI::SQL_BINARY ||
>                   $data_type == DBI::SQL_VARBINARY ||
>                   $data_type == DBI::SQL_LONGVARBINARY;
Yep.

>
> So, the reason I'm posting this query is because I'd like to get
> confirmation, if possible, on this conclusion. Based on the feedback I
> receive, I will submit patches to the DBD::Pg maintainer.
Go right ahead.

-alex



Re: DBD::Pg BYTEA Character Escaping

From
David Wheeler
Date:
On Sat, 2001-11-17 at 21:46, Tom Lane wrote:

> Offhand I don't think you even need the check on the datatype; wouldn't
> it be faster and safer to do the substitution unconditionally?  I can't
> see that there are any cases that work without this substitution and
> fail with it.

In that case, I suggest the attached, even more simplified patch. It
assumes that the quote character stored in $lp in the existing version
is always "'" (a safe assumption for DBD:Pg, I think, in a way it wasn't
for DBI, from whence the code for the LITERAL_PREFIX and LITERAL_SUFFIX
was copied into DBD::Pg), and therefore performs the escaping of the
three agreed-upon characters for all non-numeric data types.

If you agree that this tack makes sense, I'll submit the patch to Edmund
Mergl.

Regards,

Daviid

--
David Wheeler                                     AIM: dwTheory
David@Wheeler.net                                 ICQ: 15726394
                                               Yahoo!: dew7e
                                               Jabber: Theory@jabber.org

Attachment

Re: DBD::Pg BYTEA Character Escaping

From
David Wheeler
Date:
On Sun, 2001-11-18 at 08:40, Alex Pilosov wrote:

> I didn't say it was needed :) I just had the easy way out and escaped
> everything that might possibly need to be escaped.

Ah, gotcha!

<snip />

> > So, the reason I'm posting this query is because I'd like to get
> > confirmation, if possible, on this conclusion. Based on the feedback I
> > receive, I will submit patches to the DBD::Pg maintainer.
> Go right ahead.

Thanks. See other recent messages with patches attached.

Regards,

David
--
David Wheeler                                     AIM: dwTheory
David@Wheeler.net                                 ICQ: 15726394
                                               Yahoo!: dew7e
                                               Jabber: Theory@jabber.org


Re: DBD::Pg BYTEA Character Escaping

From
David Wheeler
Date:
On Sun, 2001-11-18 at 13:21, David Wheeler wrote:

> If you agree that this tack makes sense, I'll submit the patch to Edmund
> Mergl.

<snip />

Actually, I've come up with an even simpler solution, and it should be
faster for some datatypes, too. It's attached.

Thanks,

David

--
David Wheeler                                     AIM: dwTheory
David@Wheeler.net                                 ICQ: 15726394
                                               Yahoo!: dew7e
                                               Jabber: Theory@jabber.org

Attachment

7.1.2: Backend message type 0x44 when selecting from a table

From
Lincoln Yeoh
Date:
select version();
                               version
---------------------------------------------------------------------
 PostgreSQL 7.1.2 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
(1 row)

(doh, I thought I was running 7.1.3!).

SELECT * from arch_ranks_arch4 ;
Backend message type 0x44 arrived while idle
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Trying to pg_dump the table gives me this:

<snipped>
1526    2001-11-18 07:22:15+08  0                       BLANK   0       0
    0       0
1414414664      4714-11--2147483624 BC  218103907

\.
ERROR:  MemoryContextAlloc: invalid request size 1163153238
PQendcopy: resetting connection
SQL query to dump the contents of Table 'arch_ranks_arch4' did not execute
correctly.  After we read all th
e table contents from the backend, PQendcopy() failed.  Explanation from
backend: 'ERROR:  MemoryContextAll
oc: invalid request size 1163153238
'.
The query was: 'COPY "arch_ranks_arch4" TO stdout;
'.

Another similar table is ok. I have vacuumed - no errors, but the problem
still remained.

These tables have had a lot of updates on them (24/7 every second or so).
Fortunately it's not critical: just test tables.

The type:
CREATE TABLE "arch_ranks_arch4" (
        "id" integer,
        "updated" timestamp with time zone,
        "valid" integer,
        "name" text,
        "specialty" text,
        "status" text,
        "ranking" integer,
        "power" integer,
        "land" integer,
        "forts" integer,
        "description" text
);

I've truncated the table and it runs ok now.

The server itself hasn't crashed - up for 69 days.

Is it a fixed bug?

Regards,
Link.



Re: 7.1.2: Backend message type 0x44 when selecting from a table

From
Tom Lane
Date:
Lincoln Yeoh <lyeoh@pop.jaring.my> writes:
> Trying to pg_dump the table gives me this:
> <snipped>
> ERROR:  MemoryContextAlloc: invalid request size 1163153238
> PQendcopy: resetting connection

This looks like a corrupted-data problem ...

> I've truncated the table and it runs ok now.

... but the evidence is now gone, so we can't really probe into it
further :-(.  You might be well advised to run some hardware diagnostics
to see if you have any RAM problems, flaky disk controllers, that sort
of thing.  Not that Postgres has no bugs, of course, but we've seen
quite a number of data-corruption reports that ultimately traced to
hardware problems.

The behavior during a SELECT seems odd also:

> SELECT * from arch_ranks_arch4 ;
> Backend message type 0x44 arrived while idle
> pqReadData() -- backend closed the channel unexpectedly.

This suggests that libpq and the backend got out of sync somehow,
but I thought we'd fixed that class of problems years ago.  If you
can reproduce this it'd be worth looking into.

            regards, tom lane

Re: 7.1.2: Backend message type 0x44 when selecting

From
Lincoln Yeoh
Date:
At 11:18 PM 18-11-2001 -0500, Tom Lane wrote:
>Lincoln Yeoh <lyeoh@pop.jaring.my> writes:
>> Trying to pg_dump the table gives me this:
>> <snipped>
>> ERROR:  MemoryContextAlloc: invalid request size 1163153238
>> PQendcopy: resetting connection
>
>This looks like a corrupted-data problem ...

Yah. Timestamp was BC :).

>> I've truncated the table and it runs ok now.
>
>... but the evidence is now gone, so we can't really probe into it
>further :-(.  You might be well advised to run some hardware diagnostics
>to see if you have any RAM problems, flaky disk controllers, that sort
>of thing.  Not that Postgres has no bugs, of course, but we've seen
>quite a number of data-corruption reports that ultimately traced to
>hardware problems.

>The behavior during a SELECT seems odd also:
>
>> SELECT * from arch_ranks_arch4 ;
>> Backend message type 0x44 arrived while idle
>> pqReadData() -- backend closed the channel unexpectedly.
>
>This suggests that libpq and the backend got out of sync somehow,
>but I thought we'd fixed that class of problems years ago.  If you
>can reproduce this it'd be worth looking into.

I'll try to upgrade to 7.1.3. Then if it happens again it'll mean more.

Cheerio,
Link.


Re: DBD::Pg BYTEA Character Escaping

From
David Wheeler
Date:
On Sun, 2001-11-18 at 13:36, David Wheeler wrote:

> Actually, I've come up with an even simpler solution, and it should be
> faster for some datatypes, too. It's attached.

May I assume by that the patch I submitted is fine? I'd like to send it
on to the DBD::Pg maintainer.

Best,

David

--
David Wheeler                                     AIM: dwTheory
David@Wheeler.net                                 ICQ: 15726394
                                               Yahoo!: dew7e
                                               Jabber: Theory@jabber.org


Re: DBD::Pg BYTEA Character Escaping

From
Alex Pilosov
Date:
Tis cool by me, and I am to blame for the previous patch relating to
bytea..:)

-alex
On 26 Nov 2001, David Wheeler wrote:

> On Sun, 2001-11-18 at 13:36, David Wheeler wrote:
>
> > Actually, I've come up with an even simpler solution, and it should be
> > faster for some datatypes, too. It's attached.
>
> May I assume by that the patch I submitted is fine? I'd like to send it
> on to the DBD::Pg maintainer.
>
> Best,
>
> David
>
>


Re: DBD::Pg BYTEA Character Escaping

From
David Wheeler
Date:
On Mon, 2001-11-26 at 22:03, Alex Pilosov wrote:
> Tis cool by me, and I am to blame for the previous patch relating to
> bytea..:)

Thanks Alex. I'll submit it today.

Regards,

David
--
David Wheeler                                     AIM: dwTheory
David@Wheeler.net                                 ICQ: 15726394
                                               Yahoo!: dew7e
                                               Jabber: Theory@jabber.org