Thread: ECPG patch to use prepare for improved performance

ECPG patch to use prepare for improved performance

From
"William Lawrance"
Date:
This patch for ECPG utilizes the "PQprepare" and "PQexecPrepared"
functions to cause SQL statements from ECPG to be cached. It does
this without requiring any changes in the user's source program.

It was developed during the preparation for a benchmark for a
large customer. This benchmark consists of several hundred programs
containing several thousand embedded SQL statements. The benchmark
has been successfully executed using Oracle, DB2, and PostgreSQL.
In the benchmark, Postgres proved far slower than the other
DBMSs. With this patch, approximately 30% of the elapsed time of
a three-hour run was saved.

The following approach is used:

    Within the "execute.c" module, routines are added to manage a cache
    of prepared statements. These routines are used to search, insert,
    and delete entries in the cache. The key for these cache entries is
    the text of the SQL statement as passed by ECPG from the application
    program.
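
A minimal sketch of a cache of this kind, not the code from the attached patch: a small fixed-size table keyed by the statement text, with search, insert, and delete routines. The table size, the djb2 hash, the one-entry-per-slot replacement policy, and all names (`CacheEntry`, `cache_search`, `cache_insert`, `cache_delete`) are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes, chosen for illustration only. */
#define CACHE_SLOTS  8
#define MAX_SQL_LEN  256
#define MAX_NAME_LEN 32

typedef struct
{
    int  in_use;                    /* 0 = free slot */
    char sql[MAX_SQL_LEN];          /* key: text of the SQL statement */
    char stmt_name[MAX_NAME_LEN];   /* name of the prepared statement */
} CacheEntry;

static CacheEntry cache[CACHE_SLOTS];

/* djb2 string hash, reduced to a slot index. */
static unsigned int
slot_for(const char *sql)
{
    unsigned long h = 5381;

    while (*sql)
        h = h * 33 + (unsigned char) *sql++;
    return (unsigned int) (h % CACHE_SLOTS);
}

/* Search: return the entry for this statement text, or NULL. */
static CacheEntry *
cache_search(const char *sql)
{
    CacheEntry *e = &cache[slot_for(sql)];

    if (e->in_use && strcmp(e->sql, sql) == 0)
        return e;
    return NULL;
}

/*
 * Insert: store sql -> stmt_name in its slot.
 * Returns 0 if the slot was free, 1 if an occupant was displaced (its
 * statement name is copied into evicted), -1 if an input is too long.
 */
static int
cache_insert(const char *sql, const char *stmt_name,
             char evicted[MAX_NAME_LEN])
{
    CacheEntry *e;
    int         displaced;

    if (strlen(sql) >= MAX_SQL_LEN || strlen(stmt_name) >= MAX_NAME_LEN)
        return -1;

    e = &cache[slot_for(sql)];
    displaced = e->in_use;
    if (displaced)
        strcpy(evicted, e->stmt_name);
    strcpy(e->sql, sql);
    strcpy(e->stmt_name, stmt_name);
    e->in_use = 1;
    return displaced;
}

/* Delete: free the entry for this statement text; returns 1 if found. */
static int
cache_delete(const char *sql)
{
    CacheEntry *e = cache_search(sql);

    if (e == NULL)
        return 0;
    e->in_use = 0;
    return 1;
}
```

When `cache_insert` returns 1, the caller would deallocate the displaced statement on the server before relying on the new entry.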

    Within the same module, the "ECPGexecute" function was replaced.
    This is the function that is called to execute a statement after
    some preliminary housekeeping is done. The original "ECPGexecute"
    function constructs an ASCII string by replacing each host variable
    with its current value and then calling "PQexec". The new
    "ECPGexecute" function does the following:

      - Build an array of the current values of the host variables.

      - Search the cache for an entry indicating that this statement
        has already been prepared via "PQprepare".

      - If no entry was found in the previous step, call "PQprepare"
        for the statement and then insert an entry for it into the
        cache. If this requires an entry to be re-used, execute a
        "DEALLOCATE PREPARE ..." for the previous contents.

      - At this point, the SQL statement has been prepared by libpq,
        either when the statement was executed in the past, or in
        the previous step.

      - Call "PQexecPrepared", using the array of parameters built
        in the first step above.
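
The steps above can be sketched as follows. This is an illustrative outline, not the patch itself: the three `backend_*` functions are hypothetical stand-ins that only count calls, where a real client would call libpq's `PQprepare` and `PQexecPrepared` and issue a DEALLOCATE. The slot count, the hash, and the name `execute_cached` are likewise assumptions.

```c
#include <assert.h>
#include <string.h>

#define SLOTS   4      /* assumed cache size */
#define MAX_SQL 128    /* assumed maximum statement length */

/* Hypothetical stand-ins for server round trips; they only count calls. */
static int n_prepare, n_exec, n_dealloc;

static void
backend_prepare(unsigned int slot, const char *sql)
{
    (void) slot; (void) sql;
    n_prepare++;                 /* real code: PQprepare() */
}

static void
backend_deallocate(unsigned int slot)
{
    (void) slot;
    n_dealloc++;                 /* real code: DEALLOCATE the old statement */
}

static void
backend_exec_prepared(unsigned int slot, int nparams,
                      const char *const *values)
{
    (void) slot; (void) nparams; (void) values;
    n_exec++;                    /* real code: PQexecPrepared() */
}

static char cached_sql[SLOTS][MAX_SQL];   /* "" marks an empty slot */

static unsigned int
slot_for(const char *sql)
{
    unsigned long h = 5381;      /* djb2 string hash */

    while (*sql)
        h = h * 33 + (unsigned char) *sql++;
    return (unsigned int) (h % SLOTS);
}

/*
 * values is the array of host-variable values built by the caller.
 * Returns 0 on success, -1 if the statement text is empty or too long.
 */
static int
execute_cached(const char *sql, int nparams, const char *const *values)
{
    unsigned int slot;

    if (sql[0] == '\0' || strlen(sql) >= MAX_SQL)
        return -1;
    slot = slot_for(sql);

    if (strcmp(cached_sql[slot], sql) != 0)     /* cache miss */
    {
        if (cached_sql[slot][0] != '\0')
            backend_deallocate(slot);           /* slot is being re-used */
        backend_prepare(slot, sql);
        strcpy(cached_sql[slot], sql);
    }

    /* Either way, the statement is prepared at this point. */
    backend_exec_prepared(slot, nparams, values);
    return 0;
}
```

Executing the same statement text twice with different parameter values prepares it once and executes it twice, which is where the saving would come from.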


Attachment

Re: ECPG patch to use prepare for improved performance

From
Michael Meskes
Date:
On Mon, May 07, 2007 at 02:46:29PM -0700, William Lawrance wrote:
> This patch for ECPG utilizes the "PQprepare" and "PQexecPrepared"
> functions to cause SQL statements from ECPG to be cached. It does
> this without requiring any changes in the user's source program.
> ...

I still do not understand why you prepare each statement. This might
help you with your test case, but I don't like to add this as a general
rule. If a user wants a prepared statement he/she should use the prepare
statement. I agree that the prepare logic has to be rewritten and this
is high on my agenda, but I will probably only do this for statements
issued with EXEC SQL PREPARE not for every single statement.

Michael
--
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

Re: ECPG patch to use prepare for improved performance

From
"William Lawrance"
Date:
This approach was used for several reasons:

1. No changes were required in the application source program. For
   an application involving thousands of SQL statements in hundreds
   of programs, this is important. This customer application has
   been tuned extensively by the customer for DB2, and he is not
   receptive to large changes.

2. The performance was improved by about 1 hour in the 3 hour
   elapsed time of the application. This is important to the
   customer in terms of accomplishing his work load in the
   time that has been allotted, based on his experience with DB2.
   Without this improvement, he is likely to consider it too slow.

I would like to emphasize that we aren't measuring an artificial
test program; this is a real customer's application. We loaded
7 million rows into 217 tables to run the application. I believe
it is representative of many real batch applications.


Is there a reason not to prepare each statement?

Could it be predicated upon a user-supplied option?

Other comments?



Re: ECPG patch to use prepare for improved performance

From
Alvaro Herrera
Date:
William Lawrance wrote:
> This approach was used for several reasons--
>
> 1. No changes were required in the application source program. For
>    an application involving thousands of SQL statements in hundreds
>    of programs, this is important. This customer application has
>    been tuned extensively by the customer for DB2, and he is not
>    receptive to large changes.
>
> 2. The performance was improved by about 1 hour in the 3 hour
>    elapsed time of the application. This is important to the
>    customer in terms of accomplishing his work load in the
>    time that has been allotted, based on his experience with DB2.
>    Without this improvement, he is likely to consider it too slow.
>
> I would like to emphasize that we aren't measuring an artificial
> test program; this is a real customer's application. We loaded
> 7 million rows into 217 tables to run the application. I believe
> it is representative of many real batch applications.
>
>
> Is there reason not to prepare each statement?

One reason is that a prepared statement is planned before its parameters
are passed (they are sent out of line afterwards), so the optimizer cannot
take the actual values into account and in certain cases chooses a worse
plan.

This used to be a problem with JDBC as well, until a workaround was
added so that the "unnamed" prepared statement is not planned until the
parameters are passed.  Without something similar, preparing every
statement may end up being a bad choice for ECPG applications as well.

> Other comments ?

Codewise I noticed you wrote your own hashing function, which seemed odd
to me at first sight.  We already have a hashing infrastructure, but I'm
not sure if it could be used in ECPG (mainly due to lack of ereport/elog
support).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: ECPG patch to use prepare for improved performance

From
Michael Meskes
Date:
On Wed, May 09, 2007 at 01:12:17PM -0700, William Lawrance wrote:
> 2. The performance was improved by about 1 hour in the 3 hour
>    elapsed time of the application. This is important to the
>    customer in terms of accomplishing his work load in the
>    time that has been allotted, based on his experience with DB2.
>    Without this improvement, he is likely to consider it too slow.

But this only holds for one customer. I don't think this will hold for
every single application. At least I do not see a reason why this
should hold every time.

> I would like to emphasize that we aren't measuring an artificial
> test program; this is a real customer's application. We loaded
> 7 million rows into 217 tables to run the application. I believe
> it is representative of many real batch applications.

But how about non-batch applications?

> Is there reason not to prepare each statement?

I'm completely against forcing such a design decision on the programmer.
Hopefully I will be able to add a real prepare statement soon.

> Could it be predicated upon a user supplied option ?

Yes, this is fine with me. If you could rearrange the patch I will test
and commit it.

Michael

Re: ECPG patch to use prepare for improved performance

From
"William Lawrance"
Date:
This updated patch for ECPG uses the current routines by
default. If an environment variable (ECPGUSEPREPARE) is set
to "yes", it uses the new routine that prepares and
caches each statement.
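
A minimal sketch of how such an opt-in check might look, assuming the value must match "yes" exactly (the message does not say whether the comparison is case-sensitive). The function names are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 only for the exact string "yes"; NULL (unset) returns 0. */
static int
value_enables_prepare(const char *value)
{
    return value != NULL && strcmp(value, "yes") == 0;
}

/* Reads ECPGUSEPREPARE from the environment; unset means "use the
 * current routines", matching the default described above. */
static int
use_prepare_enabled(void)
{
    return value_enables_prepare(getenv("ECPGUSEPREPARE"));
}
```

Reading the variable once at startup and storing the result would avoid a `getenv` call on every statement.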


Attachment

Re: ECPG patch to use prepare for improved performance

From
Andrew Dunstan
Date:
This seems like a very all-or-nothing approach. By contrast, the Perl
DBD::Pg driver lets you decide per statement whether you want it
server-prepared. Is that not possible here?

cheers

andrew

William Lawrance wrote:
> This updated patch for ECPG uses the current routines by
> default. If an environment variable (ECPGUSEPREPARE) is set
> to "yes", it uses the new routine that prepares and
> caches each statement.

Re: ECPG patch to use prepare for improved performance

From
Bruce Momjian
Date:
This has been saved for the 8.4 release:

    http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

William Lawrance wrote:
>
> This updated patch for ECPG uses the current routines by
> default. If an environment variable (ECPGUSEPREPARE) is set
> to "yes", it uses the new routine that prepares and
> caches each statement.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +