Thread: BUG #2236: extremely slow to get unescaped bytea data from db

BUG #2236: extremely slow to get unescaped bytea data from db

From
"Kai Ronan"
Date:
The following bug has been logged online:

Bug reference:      2236
Logged by:          Kai Ronan
Email address:      support@kalador.com
PostgreSQL version: 8.0.1
Operating system:   redhat linux
Description:        extremely slow to get unescaped bytea data from db
Details:

Using php 5.1.2, trying to store images in database using bytea column and
pg_unescape_bytea() which is a PHP wrapper for PQunescapeBytea().

It took 77 seconds to extract a 400K gif image from the database.  Using a
text column and base64 escaping, the same image took <1 to extract from the
database.  lo functions also work fast.  Loading images to the db is fast in
all cases.

Code snippet is:

<?php
 // Connect to the database
 $dbconn = pg_connect('dbname=imagedb');

 // Get the bytea data
 $res = pg_query("SELECT data FROM image WHERE name='big.gif'");

 // Convert to binary and send to the browser
 header('Content-type: image/gif');
 echo pg_unescape_bytea( pg_fetch_result($res,'data') );
?>

Re: BUG #2236: extremely slow to get unescaped bytea data from db

From
Alvaro Herrera
Date:
Kai Ronan wrote:

>  // Get the bytea data
>  $res = pg_query("SELECT data FROM image WHERE name='big.gif'");

Do you have an index in the image.name column?  What does an

EXPLAIN ANALYZE SELECT data FROM image WHERE name='big.gif'

say?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #2236: extremely slow to get unescaped bytea data

From
Kalador Tech Support
Date:
This is a table with just one entry created to test the problem.  Should
not have any indexing issues.

I've since isolated the problem to the unescape_bytea function not the
SELECT.

I inserted the same image to a bytea column using base64 encoding, and
extracted it from the table (using base64 decoding) and this worked very
fast (<1 second).  So, it is the unescape_bytea function that is to blame.

Hope that helps,

Kai Ronan
Technical Support
Kalador Entertainment Inc.



Alvaro Herrera wrote:

>Kai Ronan wrote:
>
>
>
>> // Get the bytea data
>> $res = pg_query("SELECT data FROM image WHERE name='big.gif'");
>>
>>
>
>Do you have an index in the image.name column?  What does an
>
>EXPLAIN ANALYZE SELECT data FROM image WHERE name='big.gif'
>
>say?
>
>
>

Re: BUG #2236: extremely slow to get unescaped bytea data

From
Michael Fuhr
Date:
On Sat, Feb 04, 2006 at 04:06:11PM -0800, Kalador Tech Support wrote:
> I've since isolated the problem to the unescape_bytea function not the
> SELECT.
>
> I inserted the same image to a bytea column using base64 encoding, and
> extracted it from the table (using base64 decoding) and this worked very
> fast (<1 second).  So, it is the unescape_bytea function that is to blame.

pg_unescape_bytea is fast here; I just unescaped an 850K jpeg image
in about 0.18 seconds on a slow (500MHz) machine.

How did you determine that pg_unescape_bytea was the problem?  What
does something like the following show?

$tstart = microtime(true);
$data = pg_unescape_bytea(pg_fetch_result($res, 'data'));
$dt = microtime(true) - $tstart;
header("Content-Type: text/plain");
printf("unescape time = %.3fms, %d bytes\n", $dt * 1000.0, strlen($data));

--
Michael Fuhr

Re: BUG #2236: extremely slow to get unescaped bytea data

From
Michael Fuhr
Date:
On Wed, Feb 08, 2006 at 09:52:33AM -0800, Kalador Tech Support wrote:
> $tfetch = microtime(true);
>
> // Convert to binary and send to the browser
> // header('Content-type: image/gif');
> header('Content-type: text/plain');
> $data = pg_unescape_bytea($raw);
> $dt = microtime(true) - $tfetch;

The header call should be outside the section of code you're timing.
It probably doesn't matter, but the idea is to time pg_unescape_bytea
so that's all that should be timed (I agree with your decision to
time pg_fetch_result separately).

> When run from a browser, accessing an Apache webserver, the results
> displayed were:
>
> fetch    time = 3.632ms, 296043 bytes
> unescape time = 70625.518ms, 296043 bytes
>
> When running the same code from command line php, the results were:
>
> fetch    time = 3.618ms, 296043 bytes
> unescape time = 8.298ms, 296043 bytes
>
> Am running Apache 1.3.33.  I can't figure the difference between
> environments - command line php and apache php are compiled the same
> (except for the --with-apxs=... instruction for apache version).

Something must be different, although offhand I can't think of what it
might be.  Locale settings, perhaps?  I don't know if pg_unescape_bytea's
performance could be affected by that, but I've seen locale settings
cause horrible performance in string-manipulating code before.

> As an aside, I'm now using base64 encode/decode to insert data without
> problems.  For GIF images, at least, it is also a much smaller encoding
> than the escape_bytea encoding.

Too bad PHP's PostgreSQL extension doesn't handle binary results
(or does it?).  With a binary cursor or a way to tell pg_query and
friends that you want binary results, you could retrieve the binary
image data without having to mess with encoding and decoding.

Even if you're satisfied with what you're doing, it would be
interesting to find an explanation for pg_unescape_bytea's performance
discrepancy and determine whether it's a problem with PHP or with
the underlying libpq function.  Please let us know if you find
anything.

--
Michael Fuhr

Re: BUG #2236: extremely slow to get unescaped bytea data

From
Kalador Tech Support
Date:
I modified your code slightly to isolate the pg_unescape_bytea() from
the pg_fetch_result():

<?php
 // Connect to the database
 $dbconn = pg_connect('dbname=foo');

 // Get the bytea data
 $res = pg_query("SELECT data FROM bar WHERE name='image'");
 $tstart = microtime(true);
 $raw = pg_fetch_result($res, 'data');
 $fetch  = microtime(true) - $tstart;
 $tfetch = microtime(true);

 // Convert to binary and send to the browser
// header('Content-type: image/gif');
 header('Content-type: text/plain');
 $data = pg_unescape_bytea($raw);
 $dt = microtime(true) - $tfetch;
 printf("fetch    time = %.3fms, %d bytes\n", $fetch*1000.0,strlen($data));
 printf("unescape time = %.3fms, %d bytes\n", $dt*1000.0,strlen($data));
?>

When run from a browser, accessing an Apache webserver, the results
displayed were:

fetch    time = 3.632ms, 296043 bytes
unescape time = 70625.518ms, 296043 bytes

When running the same code from command line php, the results were:

fetch    time = 3.618ms, 296043 bytes
unescape time = 8.298ms, 296043 bytes

Am running Apache 1.3.33.  I can't figure the difference between
environments - command line php and apache php are compiled the same
(except for the --with-apxs=... instruction for apache version).

The code to insert into the database was:

<?php
 // Connect to the database
 $dbconn = pg_connect('dbname=foo');

 // Read in a binary file
 $data = file_get_contents('big.gif');

 // Escape the binary data
 $escaped = pg_escape_bytea($data);

 // Insert it into the database
 pg_query("INSERT INTO bar (name, data) VALUES ('image', '{$escaped}')");
?>

As an aside, I'm now using base64 encode/decode to insert data without
problems.  For GIF images, at least, it is also a much smaller encoding
than the escape_bytea encoding.

Kai Ronan
Technical Support
Kalador Entertainment Inc.



Michael Fuhr wrote:

>On Sat, Feb 04, 2006 at 04:06:11PM -0800, Kalador Tech Support wrote:
>
>
>>I've since isolated the problem to the unescape_bytea function not the
>>SELECT.
>>
>>I inserted the same image to a bytea column using base64 encoding, and
>>extracted it from the table (using base64 decoding) and this worked very
>>fast (<1 second).  So, it is the unescape_bytea function that is to blame.
>>
>>
>
>pg_unescape_bytea is fast here; I just unescaped an 850K jpeg image
>in about 0.18 seconds on a slow (500MHz) machine.
>
>How did you determine that pg_unescape_bytea was the problem?  What
>does something like the following show?
>
>$tstart = microtime(true);
>$data = pg_unescape_bytea(pg_fetch_result($res, 'data'));
>$dt = microtime(true) - $tstart;
>header("Content-Type: text/plain");
>printf("unescape time = %.3fms, %d bytes\n", $dt * 1000.0, strlen($data));
>
>
>

Re: BUG #2236: extremely slow to get unescaped bytea data

From
Tom Lane
Date:
Kalador Tech Support <support@kalador.com> writes:
> When run from a browser, accessing an Apache webserver, the results
> displayed were:

> fetch    time = 3.632ms, 296043 bytes
> unescape time = 70625.518ms, 296043 bytes

Is it possible that you are relying on an old (like 7.3) libpq?
I find this in the CVS logs:

2003-06-11 21:17  momjian

    * src/interfaces/libpq/fe-exec.c: I found the libpq function
    PGunescapeBytea a little slow. It was taking a minute and a half to
    decode a 500Kb on a fairly fast machine. I think the culprit is
    sscanf.
    ...
    The new function is significantly faster on my machine with the
    JPEGs being decoded in less than a second. I attach a modified
    libpq example program that I used for my testing.

    Ben Lamb.

            regards, tom lane

Re: BUG #2236: extremely slow to get unescaped bytea data

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Kalador Tech Support <support@kalador.com> writes:
> > When run from a browser, accessing an Apache webserver, the results
> > displayed were:
>
> > fetch    time = 3.632ms, 296043 bytes
> > unescape time = 70625.518ms, 296043 bytes
>
> Is it possible that you are relying on an old (like 7.3) libpq?
> I find this in the CVS logs:
>
> 2003-06-11 21:17  momjian
>
>     * src/interfaces/libpq/fe-exec.c: I found the libpq function
>     PGunescapeBytea a little slow. It was taking a minute and a half to
>     decode a 500Kb on a fairly fast machine. I think the culprit is
>     sscanf.
>     ...
>     The new function is significantly faster on my machine with the
>     JPEGs being decoded in less than a second. I attach a modified
>     libpq example program that I used for my testing.

I note in the PHP 4 sources that the PQunescapeBytea function seems to
have been copied there, "for the benefit of PostgreSQL 7.2 users".  It
says that it comes from 7.3 but I don't see any sscanf call.

There is no PQunescapeBytea call in the whole source that I can see, so
my guess is that the libpq function is not called at all.  So this may
be a PHP bug rather than a Postgres bug.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #2236: extremely slow to get unescaped bytea data

From
Michael Fuhr
Date:
On Thu, Feb 09, 2006 at 12:46:46PM -0300, Alvaro Herrera wrote:
> I note in the PHP 4 sources that the PQunescapeBytea function seems to
> have been copied there, "for the benefit of PostgreSQL 7.2 users".  It
> says that it comes from 7.3 but I don't see any sscanf call.
>
> There is no PQunescapeBytea call in the whole source that I can see, so
> my guess is that the libpq function is not called at all.  So this may
> be a PHP bug rather than a Postgres bug.

The OP claimed to be using PHP 5.1.2, which does have a call to
PQunescapeBytea(), although it also has the old code you're seeing
and a HAVE_PQUNESCAPEBYTEA macro that determines which to use.

Interesting that the command line php and the Apache module behave
differently.  I wonder if ldd would show the php executable and
libphp5.so linked against different versions of libpq; that would
add weight to Tom's suggestion that an old libpq might be responsible.

--
Michael Fuhr

Re: BUG #2236: extremely slow to get unescaped bytea data

From
Kalador Tech Support
Date:
You guys are smart!

Apache was running against an old libpq.  I shutdown apache, updated
/etc/ld.so.conf with the postgres lib dir, ran ldconfig, restarted
apache, and the problem went away.

The old libpq was libpq.so.3.0 (pre-installed on machine).  The new one
is libpq.so.3.2 (installed with 8.0.1)

Sorry for the false alarm - thanks for the help.

Kai Ronan
Technical Support
Kalador Entertainment Inc.



Michael Fuhr wrote:

>On Thu, Feb 09, 2006 at 12:46:46PM -0300, Alvaro Herrera wrote:
>
>
>>I note in the PHP 4 sources that the PQunescapeBytea function seems to
>>have been copied there, "for the benefit of PostgreSQL 7.2 users".  It
>>says that it comes from 7.3 but I don't see any sscanf call.
>>
>>There is no PQunescapeBytea call in the whole source that I can see, so
>>my guess is that the libpq function is not called at all.  So this may
>>be a PHP bug rather than a Postgres bug.
>>
>>
>
>The OP claimed to be using PHP 5.1.2, which does have a call to
>PQunescapeBytea(), although it also has the old code you're seeing
>and a HAVE_PQUNESCAPEBYTEA macro that determines which to use.
>
>Interesting that the command line php and the Apache module behave
>differently.  I wonder if ldd would show the php executable and
>libphp5.so linked against different versions of libpq; that would
>add weight to Tom's suggestion that an old libpq might be responsible.
>
>
>