Thread: ECPG patch to use prepare for improved performance
This patch for ECPG utilizes the "PQprepare" and "PQexecPrepared" functions to cause SQL statements from ECPG to be cached. It does this without requiring any changes in the user's source program. It was developed during the preparation for a benchmark for a large customer. This benchmark consists of several hundred programs containing several thousand embedded SQL statements. The benchmark has been successfully executed using Oracle, DB2, and PostgreSQL. In the benchmark, Postgres is shown to be slower, by far, than the other DBMS systems. In a three hour execution, using this patch, approximately 30% was saved. The following approach is used: Within the "execute.c" module, routines are added to manage a cache of prepared statements. These routines are used to search, insert, and delete entries in the cache. The key for these cache entries is the text of the SQL statement as passed by ECPG from the application program. Within the same module, the "ECPGexecute" function was replaced. This is the function that is called to execute a statement after some preliminary housekeeping is done. The original "ECPGexecute" function constructs an ASCII string by replacing each host variable with its current value and then calling "PQexec". The new "ECPGexecute" function does the following: - build an array of the current values of the host variables. - search the cache for an entry indicating that this statement has already been prepare'd, via "PQprepare" - If no entry was found in the previous step, call "PQprepare" for the statement and then insert an entry for it into the cache. If this requires an entry to be re-used, execute a "DEALLOCATE PREPARE.." for the previous contents. - At this point, the SQL statement has been prepare'd by PQlib, either when the statement was executed in the past, or in the previous step. - call "PQexecPrepared", using the array of parameters built in the first step above.
Attachment
On Mon, May 07, 2007 at 02:46:29PM -0700, William Lawrance wrote: > This patch for ECPG utilizes the "PQprepare" and "PQexecPrepared" > functions to cause SQL statements from ECPG to be cached. It does > this without requiring any changes in the user's source program. > ... I still do not understand why you prepare each statement. This might help you with your test case, but I don't like to add this as a general rule. If a user wants a prepared statement he/she should use the prepare statement. I agree that the prepare logic has to be rewritten and this is high on my agenda, but I will probably only do this for statements issued with EXEC SQL PREPARE not for every single statement. Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
This approach was used for several reasons-- 1. No changes were required in the application source program. For an application involving thousands of SQL statements in hundreds of programs, this is important. This customer application has been tuned extensively by the customer for DB2, and he is not receptive to large changes. 2. The performance was improved by about 1 hour in the 3 hour elapsed time of the application. This is important to the customer in terms of accomplishing his work load in the time that has been allotted, based on his experience with DB2. Without this improvement, he is likely to consider it too slow. I would like to emphasize that we aren't measuring an artificial test program; this is a real customer's application. We loaded 7 million rows into 217 tables to run the application. I believe it is representative of many real batch applications. Is there reason not to prepare each statement? Could it be predicated upon a user supplied option ? Other comments ? -----Original Message----- From: Michael Meskes [mailto:meskes@postgresql.org] Sent: Wednesday, May 09, 2007 2:04 AM To: William Lawrance Cc: Pgsql-Patches Subject: Re: [PATCHES] ECPG patch to use prepare for improved performance On Mon, May 07, 2007 at 02:46:29PM -0700, William Lawrance wrote: > This patch for ECPG utilizes the "PQprepare" and "PQexecPrepared" > functions to cause SQL statements from ECPG to be cached. It does > this without requiring any changes in the user's source program. > ... I still do not understand why you prepare each statement. This might help you with your test case, but I don't like to add this as a general rule. If a user wants a prepared statement he/she should use the prepare statement. I agree that the prepare logic has to be rewritten and this is high on my agenda, but I will probably only do this for statements issued with EXEC SQL PREPARE not for every single statement. Michael
William Lawrance wrote: > This approach was used for several reasons-- > > 1. No changes were required in the application source program. For > an application involving thousands of SQL statements in hundreds > of programs, this is important. This customer application has > been tuned extensively by the customer for DB2, and he is not > receptive to large changes. > > 2. The performance was improved by about 1 hour in the 3 hour > elapsed time of the application. This is important to the > customer in terms of accomplishing his work load in the > time that has been allotted, based on his experience with DB2. > Without this improvement, he is likely to consider it too slow. > > I would like to emphasize that we aren't measuring an artificial > test program; this is a real customer's application. We loaded > 7 million rows into 217 tables to run the application. I believe > it is representative of many real batch applications. > > > Is there reason not to prepare each statement? One reason is that prepared statements have the parameters passed out of line after the planning is done, so in certain cases the optimizer makes a different choice which leads to worse plans. This used to be a problem with JDBC as well, until a workaround was added so that the "unnamed" prepared statement is not planned until the parameters are passed. If you don't do that, it may end up being a bad choice for applications as well. > Other comments ? Codewise I noticed you wrote your own hashing function, which seemed odd to me at first sight. We already have a hashing infrastructure, but I'm not sure if it could be used in ECPG (mainly due to lack of ereport/elog support). -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wed, May 09, 2007 at 01:12:17PM -0700, William Lawrance wrote: > 2. The performance was improved by about 1 hour in the 3 hour > elapsed time of the application. This is important to the > customer in terms of accomplishing his work load in the > time that has been allotted, based on his experience with DB2. > Without this improvement, he is likely to consider it too slow. But this only holds for one customer. I don't think this will hold for every single application. At least I do not see a reason why this should hold everytime. > I would like to emphasize that we aren't measuring an artificial > test program; this is a real customer's application. We loaded > 7 million rows into 217 tables to run the application. I believe > it is representative of many real batch applications. But how about non-batch applications? > Is there reason not to prepare each statement? I'm completely against forcing such a design decision on the programmer. Hopefully I will be able to add a real prepare statement soon. > Could it be predicated upon a user supplied option ? Yes, this is fine with me. If you could rearrange the patch I will test and commit it. Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
This updated patch for ECPG uses the current routines by default. If an environment variable (ECPGUSEPREPARE) is set to "yes", it uses the new routine that prepares and caches each statement. -----Original Message----- From: Michael Meskes [mailto:meskes@postgresql.org] Sent: Thursday, May 10, 2007 3:01 AM To: William Lawrance Cc: Michael Meskes; Pgsql-Patches Subject: Re: [PATCHES] ECPG patch to use prepare for improved performance On Wed, May 09, 2007 at 01:12:17PM -0700, William Lawrance wrote: > 2. The performance was improved by about 1 hour in the 3 hour > elapsed time of the application. This is important to the > customer in terms of accomplishing his work load in the > time that has been allotted, based on his experience with DB2. > Without this improvement, he is likely to consider it too slow. But this only holds for one customer. I don't think this will hold for every single application. At least I do not see a reason why this should hold everytime. > I would like to emphasize that we aren't measuring an artificial > test program; this is a real customer's application. We loaded > 7 million rows into 217 tables to run the application. I believe > it is representative of many real batch applications. But how about non-batch applications? > Is there reason not to prepare each statement? I'm completely against forcing such a design decision on the programmer. Hopefully I will be able to add a real prepare statement soon. > Could it be predicated upon a user supplied option ? Yes, this is fine with me. If you could rearrange the patch I will test and commit it. Michael -- Michael Meskes Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
Attachment
This seems like a very all or nothing approach. By contrast, the Perl DBD::Pg driver lets you decide per statement if you want it server-prepared or not. Is that not possible? cheers andrew William Lawrance wrote: > This updated patch for ECPG uses the current routines by > default. If an environment variable (ECPGUSEPREPARE) is set > to "yes", it uses the new routine that prepares and > caches each statement. > > > > > -----Original Message----- > From: Michael Meskes [mailto:meskes@postgresql.org] > Sent: Thursday, May 10, 2007 3:01 AM > To: William Lawrance > Cc: Michael Meskes; Pgsql-Patches > Subject: Re: [PATCHES] ECPG patch to use prepare for improved > performance > > > On Wed, May 09, 2007 at 01:12:17PM -0700, William Lawrance wrote: > >> 2. The performance was improved by about 1 hour in the 3 hour >> elapsed time of the application. This is important to the >> customer in terms of accomplishing his work load in the >> time that has been allotted, based on his experience with DB2. >> Without this improvement, he is likely to consider it too slow. >> > > But this only holds for one customer. I don't think this will hold for > every single application. At least I do not see a reason why this > should hold everytime. > > >> I would like to emphasize that we aren't measuring an artificial >> test program; this is a real customer's application. We loaded >> 7 million rows into 217 tables to run the application. I believe >> it is representative of many real batch applications. >> > > But how about non-batch applications? > > >> Is there reason not to prepare each statement? >> > > I'm completely against forcing such a design decision on the programmer. > Hopefully I will be able to add a real prepare statement soon. > > >> Could it be predicated upon a user supplied option ? >> > > Yes, this is fine with me. If you could rearrange the patch I will test > and commit it. > > Michael > > ------------------------------------------------------------------------ > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --------------------------------------------------------------------------- William Lawrance wrote: > > This updated patch for ECPG uses the current routines by > default. If an environment variable (ECPGUSEPREPARE) is set > to "yes", it uses the new routine that prepares and > caches each statement. > > > > > -----Original Message----- > From: Michael Meskes [mailto:meskes@postgresql.org] > Sent: Thursday, May 10, 2007 3:01 AM > To: William Lawrance > Cc: Michael Meskes; Pgsql-Patches > Subject: Re: [PATCHES] ECPG patch to use prepare for improved > performance > > > On Wed, May 09, 2007 at 01:12:17PM -0700, William Lawrance wrote: > > 2. The performance was improved by about 1 hour in the 3 hour > > elapsed time of the application. This is important to the > > customer in terms of accomplishing his work load in the > > time that has been allotted, based on his experience with DB2. > > Without this improvement, he is likely to consider it too slow. > > But this only holds for one customer. I don't think this will hold for > every single application. At least I do not see a reason why this > should hold everytime. > > > I would like to emphasize that we aren't measuring an artificial > > test program; this is a real customer's application. We loaded > > 7 million rows into 217 tables to run the application. I believe > > it is representative of many real batch applications. > > But how about non-batch applications? > > > Is there reason not to prepare each statement? > > I'm completely against forcing such a design decision on the programmer. > Hopefully I will be able to add a real prepare statement soon. > > > Could it be predicated upon a user supplied option ? > > Yes, this is fine with me. If you could rearrange the patch I will test > and commit it. > > Michael > -- > Michael Meskes > Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) > ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes@jabber.org > Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL! [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +