Thread: Performance problems with prepared statements
Hi I have been having some serious performance issues when using prepared statements which I can not re-produce when using a direct statement. Let me try to explain The query does an order by in descending order on several columns for which an index exists. The explain output as follows rascal=# explain SELECT oid, * FROM calllog WHERE calllog_mainteng = '124 ' AND calllog_phase = 8 AND calllog_self < 366942 OR calllog_mainteng = '124 ' AND calllog_phase < 8 ORDER BY calllog_mainteng DESC, calllog_phase DESC, calllog_self DESC limit 25; QUERY PLAN --------------------------------------------------------------------------------------------------------- Limit (cost=0.00..111.62 rows=25 width=2164) -> Index Scan Backward using calllog_rmc_idx on calllog (cost=0.00..53475.22 rows=11977 width=2164) Index Cond: (calllog_mainteng = '124 '::bpchar) Filter: (((calllog_phase = 8) AND (calllog_self < 366942)) OR (calllog_phase < 8)) When running the query directly from psql it returns the required rows in less than 100 milli-seconds. However, when using a prepared statement from my C application on the above query and executing it the query duration is as follows SELECT oid, * FROM calllog WHERE calllog_mainteng = '124 ' AND calllog_phase = 8 AND calllog_self < 366942 OR calllog_mainteng = '124 ' AND calllog_phase < 8 ORDER BY calllog_mainteng DESC, calllog_phase DESC, calllog_self DESC limit 25 Row[s] = 25, Duration = 435409.474 ms The index as per the explain is defined as follows "calllog_rmc_idx" UNIQUE, btree (calllog_mainteng, calllog_phase, calllog_self) VACUUM and all those good things done Version of PostgreSQL 8.1 and 8.2 enable_seqscan = off enable_sort = off Any advice/suggestions/thoughts much appreciated -- Regards Theo
Theo Kramer a écrit : > Hi > > I have been having some serious performance issues when using prepared > statements which I can not re-produce when using a direct statement. Let > me try to explain > > The query does an order by in descending order on several columns for > which an index exists. > > The explain output as follows > > rascal=# explain SELECT oid, * FROM calllog > WHERE calllog_mainteng = '124 ' > AND calllog_phase = 8 > AND calllog_self < 366942 > OR calllog_mainteng = '124 ' > AND calllog_phase < 8 > ORDER BY calllog_mainteng DESC, > calllog_phase DESC, > calllog_self DESC limit 25; > QUERY PLAN > --------------------------------------------------------------------------------------------------------- > Limit (cost=0.00..111.62 rows=25 width=2164) > -> Index Scan Backward using calllog_rmc_idx on calllog > (cost=0.00..53475.22 rows=11977 width=2164) > Index Cond: (calllog_mainteng = '124 '::bpchar) > Filter: (((calllog_phase = 8) AND (calllog_self < 366942)) OR > (calllog_phase < 8)) > > When running the query directly from psql it returns the required rows > in less than 100 milli-seconds. > > However, when using a prepared statement from my C application on the > above query and executing it the query duration is as follows > > SELECT oid, * FROM calllog > WHERE calllog_mainteng = '124 ' > AND calllog_phase = 8 > AND calllog_self < 366942 > OR calllog_mainteng = '124 ' > AND calllog_phase < 8 > ORDER BY calllog_mainteng DESC, > calllog_phase DESC, > calllog_self DESC limit 25 > Row[s] = 25, Duration = 435409.474 ms > > The index as per the explain is defined as follows > > "calllog_rmc_idx" UNIQUE, btree (calllog_mainteng, calllog_phase, > calllog_self) > > VACUUM and all those good things done > > Version of PostgreSQL 8.1 and 8.2 > > enable_seqscan = off > enable_sort = off > > Any advice/suggestions/thoughts much appreciated > Reading the manual, you can learn that prepared statement can (not) follow the same plan as direct query: the plan is make before pg know the value of the variable. See 'Notes' http://www.postgresql.org/docs/8.2/interactive/sql-prepare.html
On Wed, 2007-10-10 at 17:00 +0200, Cédric Villemain wrote: > <snip> > Reading the manual, you can learn that prepared statement can (not) > follow the same plan as direct query: > the plan is make before pg know the value of the variable. > > See 'Notes' http://www.postgresql.org/docs/8.2/interactive/sql-prepare.html Thanks, had missed that, however, I am afraid that I fail to see how preparing a query using PQprepare() and then executing it using PQexecPrepared(), is 8 thousand times slower than directly executing it.,, ( 403386.583ms/50.0ms = 8067 ). When doing a 'manual' prepare and explain analyze I get the following rascal=# prepare cq (char(12), smallint, integer) as SELECT oid, calllog_mainteng, calllog_phase, calllog_self FROM calllog WHERE calllog_mainteng = $1 AND calllog_phase = $2 AND calllog_self < $3 OR calllog_mainteng = $1 AND calllog_phase < $2 ORDER BY calllog_mainteng DESC, calllog_phase DESC, calllog_self DESC limit 25; PREPARE rascal=# explain analyze execute cq ('124 ', 8, 366942); QUERY PLAN --------------------------------------------------------------------------- Limit (cost=0.00..232.73 rows=25 width=26) (actual time=2.992..3.178 rows=25 loops=1) -> Index Scan Backward using calllog_rmc_idx on calllog (cost=0.00..38651.38 rows=4152 width=26) (actual time=2.986..3.116 rows=25 loops=1) Index Cond: (calllog_mainteng = $1) Filter: (((calllog_phase = $2) AND (calllog_self < $3)) OR (calllog_phase < $2)) Total runtime: 3.272 ms So I suspect that there is something more fundamental here... -- Regards Theo
On 10/10/07, Theo Kramer <theo@flame.co.za> wrote: > When running the query directly from psql it returns the required rows > in less than 100 milli-seconds. > > However, when using a prepared statement from my C application on the > above query and executing it the query duration is as follows > ... > Row[s] = 25, Duration = 435409.474 ms > How are you timing it? Does it really take 435 seconds to complete? Try the following in psql: postgres# PREPARE yourplan (VARCHAR, INT, INT) AS SELECT oid, * FROM calllog WHERE calllog_mainteng = $1 AND calllog_phase = $2 AND calllog_self < $3 OR calllog_mainteng = $1 AND calllog_phase < 8 ORDER BY calllog_mainteng DESC, calllog_phase DESC, calllog_self DESC limit 25; postgres# EXECUTE yourplan('124 ', 8, 366942); -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com Edison, NJ 08837 | http://www.enterprisedb.com/
On Wed, 2007-10-10 at 15:55 -0400, Jonah H. Harris wrote: > On 10/10/07, Theo Kramer <theo@flame.co.za> wrote: > > When running the query directly from psql it returns the required rows > > in less than 100 milli-seconds. > > > > However, when using a prepared statement from my C application on the > > above query and executing it the query duration is as follows > > ... > > Row[s] = 25, Duration = 435409.474 ms > > > > How are you timing it? Does it really take 435 seconds to complete? Fraid so - and I am running postgresql on a separate machine from the client machine - with the load going way up on the postgresql machine and the client machine remains idle until the query returns. Also the postgresql has only the one prepared statement executing during my tests. > Try the following in psql: Did that - see my previous email. -- Regards Theo
Theo Kramer a écrit : > On Wed, 2007-10-10 at 17:00 +0200, Cédric Villemain wrote: > >> <snip> >> Reading the manual, you can learn that prepared statement can (not) >> follow the same plan as direct query: >> the plan is make before pg know the value of the variable. >> >> See 'Notes' http://www.postgresql.org/docs/8.2/interactive/sql-prepare.html >> > > Thanks, had missed that, however, I am afraid that I fail to see how > preparing a query using PQprepare() and then executing it using > PQexecPrepared(), is 8 thousand times slower than directly executing > it.,, ( 403386.583ms/50.0ms = 8067 ). > > When doing a 'manual' prepare and explain analyze I get the following > > rascal=# prepare cq (char(12), smallint, integer) as SELECT oid, > calllog_mainteng, calllog_phase, calllog_self FROM calllog > WHERE calllog_mainteng = $1 > AND calllog_phase = $2 > AND calllog_self < $3 > OR calllog_mainteng = $1 > AND calllog_phase < $2 > ORDER BY calllog_mainteng DESC, > calllog_phase DESC, > calllog_self DESC limit 25; > PREPARE > rascal=# explain analyze execute cq ('124 ', 8, 366942); > QUERY > PLAN > --------------------------------------------------------------------------- > Limit (cost=0.00..232.73 rows=25 width=26) (actual time=2.992..3.178 > rows=25 loops=1) > -> Index Scan Backward using calllog_rmc_idx on calllog > (cost=0.00..38651.38 rows=4152 width=26) (actual time=2.986..3.116 > rows=25 loops=1) > Index Cond: (calllog_mainteng = $1) > Filter: (((calllog_phase = $2) AND (calllog_self < $3)) OR > (calllog_phase < $2)) > Total runtime: 3.272 ms > > > So I suspect that there is something more fundamental here... > my two cents: perhaps ... please check that with your C code And be sure you are not providing time from application. If you have a lot of data and/or a lag on your lan, it can be the cause of your so big difference between psql and C
Theo Kramer wrote: > Thanks, had missed that, however, I am afraid that I fail to see how > preparing a query using PQprepare() and then executing it using > PQexecPrepared(), is 8 thousand times slower than directly executing > it.,, ( 403386.583ms/50.0ms = 8067 ). > > When doing a 'manual' prepare and explain analyze I get the following > rascal=# explain analyze execute cq ('124 ', 8, 366942); > Total runtime: 3.272 ms > > So I suspect that there is something more fundamental here... OK, so there must be something different between the two scenarios. It can only be one of: 1. Query 2. DB Environment (user, locale, settings) 3. Network environment (server/client/network activity etc) Are you sure you have the parameter types correct in your long-running query? Try setting log_min_duration_statement=9000 or so to capture long-running queries. Make sure the user and any custom settings are the same. Compare SHOW ALL for both ways. You've said elsewhere you've ruled out the network environment, so there's not point worrying about that further. -- Richard Huxton Archonet Ltd
On Thu, 2007-10-11 at 10:12 +0100, Richard Huxton wrote: > Theo Kramer wrote: > > > > So I suspect that there is something more fundamental here... > > OK, so there must be something different between the two scenarios. It > can only be one of: > 1. Query > 2. DB Environment (user, locale, settings) > 3. Network environment (server/client/network activity etc) I suspect that it could also be in the way the libpq PQprepare(), and PQexecPrepared() are handled... as opposed to the way PREPARE and EXECUTE are handled. > > Are you sure you have the parameter types correct in your long-running > query? Yes - the problem surfaced during a going live session on an 80 user system... and we had to roll back to the previous system in a hurry. This was a part of the application that had missed testing, but I have had other reports from some of my other systems where this appears to be a problem but not of the magnitude that this one is. In any case I have managed to reproduce it in my test environment with configuration settings the same. > Try setting log_min_duration_statement=9000 or so to capture > long-running queries. Thanks - will give that a try. > > Make sure the user and any custom settings are the same. Compare SHOW > ALL for both ways. > You've said elsewhere you've ruled out the network environment, so > there's not point worrying about that further. It is definitely not a network problem - ie. the postgresql server load goes way up when this query is run. -- Regards Theo
On 10/11/07, Theo Kramer <theo@flame.co.za> wrote: > On Thu, 2007-10-11 at 10:12 +0100, Richard Huxton wrote: > > Theo Kramer wrote: > > > > > > So I suspect that there is something more fundamental here... > > > > OK, so there must be something different between the two scenarios. It > > can only be one of: > > 1. Query > > 2. DB Environment (user, locale, settings) > > 3. Network environment (server/client/network activity etc) > > I suspect that it could also be in the way the libpq PQprepare(), and > PQexecPrepared() are handled... as opposed to the way PREPARE and > EXECUTE are handled. PQexecPrepared is generally the fastest way to run queries from a C app as long as you get the right plan. Some suggestions * you can explain/explain analyze executing prepared statements from psql shell...try that and see if you can reproduce results * at worst case you can drop to execParams which is faster (and better) than PQexec, at least * if problem is plan related, you can always disable certain plan types (seqscan), prepare, and re-enable those plan types * do as Jonah suggested, first step is to try and reproduce problem from psql merlin
On 2007-10-10, Theo Kramer <theo@flame.co.za> wrote: > When doing a 'manual' prepare and explain analyze I get the following > > rascal=# prepare cq (char(12), smallint, integer) as SELECT oid, > calllog_mainteng, calllog_phase, calllog_self FROM calllog > WHERE calllog_mainteng = $1 > AND calllog_phase = $2 > AND calllog_self < $3 > OR calllog_mainteng = $1 > AND calllog_phase < $2 > ORDER BY calllog_mainteng DESC, > calllog_phase DESC, > calllog_self DESC limit 25; > PREPARE When you do this from the application, are you passing it 3 parameters, or 5? The plan is clearly taking advantage of the fact that the two occurrences of $1 and $2 are known to be the same value; if your app is using some interface that uses ? placeholders rather than numbered parameters, then the planner will not be able to make this assumption. Also, from the application, is the LIMIT 25 passed as a constant or is that also a parameter? -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
On 10/11/07, Andrew - Supernews <andrew+nonews@supernews.com> wrote: > On 2007-10-10, Theo Kramer <theo@flame.co.za> wrote: > > When doing a 'manual' prepare and explain analyze I get the following > > > > rascal=# prepare cq (char(12), smallint, integer) as SELECT oid, > > calllog_mainteng, calllog_phase, calllog_self FROM calllog > > WHERE calllog_mainteng = $1 > > AND calllog_phase = $2 > > AND calllog_self < $3 > > OR calllog_mainteng = $1 > > AND calllog_phase < $2 > > ORDER BY calllog_mainteng DESC, > > calllog_phase DESC, > > calllog_self DESC limit 25; > > PREPARE > > When you do this from the application, are you passing it 3 parameters, > or 5? The plan is clearly taking advantage of the fact that the two > occurrences of $1 and $2 are known to be the same value; if your app is > using some interface that uses ? placeholders rather than numbered > parameters, then the planner will not be able to make this assumption. > > Also, from the application, is the LIMIT 25 passed as a constant or is that > also a parameter? also, this looks a bit like a drilldown query, which is ordering the table on 2+ fields. if that's the case, row wise comparison is a better and faster approach. is this a converted cobol app? merlin
On Thu, 2007-10-11 at 16:04 -0400, Merlin Moncure wrote: > On 10/11/07, Andrew - Supernews <andrew+nonews@supernews.com> wrote: > > On 2007-10-10, Theo Kramer <theo@flame.co.za> wrote: > > > When doing a 'manual' prepare and explain analyze I get the following > > > > > > rascal=# prepare cq (char(12), smallint, integer) as SELECT oid, > > > calllog_mainteng, calllog_phase, calllog_self FROM calllog > > > WHERE calllog_mainteng = $1 > > > AND calllog_phase = $2 > > > AND calllog_self < $3 > > > OR calllog_mainteng = $1 > > > AND calllog_phase < $2 > > > ORDER BY calllog_mainteng DESC, > > > calllog_phase DESC, > > > calllog_self DESC limit 25; > > > PREPARE > > > > When you do this from the application, are you passing it 3 parameters, > > or 5? The plan is clearly taking advantage of the fact that the two > > occurrences of $1 and $2 are known to be the same value; if your app is > > using some interface that uses ? placeholders rather than numbered > > parameters, then the planner will not be able to make this assumption. > > > > Also, from the application, is the LIMIT 25 passed as a constant or is that > > also a parameter? > > also, this looks a bit like a drilldown query, which is ordering the > table on 2+ fields. if that's the case, row wise comparison is a > better and faster approach. Agreed - and having a look into that. > is this a converted cobol app? :) - on the right track - it is a conversion from an isam based package where I have changed the backed to PostgreSQL. Unfortunately there is way too much legacy design and application code to change things at a higher level. -- Regards Theo
On Thu, 2007-10-11 at 13:28 -0400, Merlin Moncure wrote: > On 10/11/07, Theo Kramer <theo@flame.co.za> wrote: > > On Thu, 2007-10-11 at 10:12 +0100, Richard Huxton wrote: > > > Theo Kramer wrote: > > > > > > > > So I suspect that there is something more fundamental here... > > > > > > OK, so there must be something different between the two scenarios. It > > > can only be one of: > > > 1. Query > > > 2. DB Environment (user, locale, settings) > > > 3. Network environment (server/client/network activity etc) > > > > I suspect that it could also be in the way the libpq PQprepare(), and > > PQexecPrepared() are handled... as opposed to the way PREPARE and > > EXECUTE are handled. > > PQexecPrepared is generally the fastest way to run queries from a C > app as long as you get the right plan. Some suggestions > > * you can explain/explain analyze executing prepared statements from > psql shell...try that and see if you can reproduce results Did that - see previous emails in this thread. > * at worst case you can drop to execParams which is faster (and > better) than PQexec, at least Thanks - will keep that option open. > * if problem is plan related, you can always disable certain plan > types (seqscan), prepare, and re-enable those plan types > * do as Jonah suggested, first step is to try and reproduce problem from psql No success on that. -- Regards Theo
On Thu, 2007-10-11 at 18:28 +0000, Andrew - Supernews wrote: > On 2007-10-10, Theo Kramer <theo@flame.co.za> wrote: > > When doing a 'manual' prepare and explain analyze I get the following > > > > rascal=# prepare cq (char(12), smallint, integer) as SELECT oid, > > calllog_mainteng, calllog_phase, calllog_self FROM calllog > > WHERE calllog_mainteng = $1 > > AND calllog_phase = $2 > > AND calllog_self < $3 > > OR calllog_mainteng = $1 > > AND calllog_phase < $2 > > ORDER BY calllog_mainteng DESC, > > calllog_phase DESC, > > calllog_self DESC limit 25; > > PREPARE > > When you do this from the application, are you passing it 3 parameters, > or 5? The plan is clearly taking advantage of the fact that the two > occurrences of $1 and $2 are known to be the same value; if your app is > using some interface that uses ? placeholders rather than numbered > parameters, then the planner will not be able to make this assumption. You may just have hit the nail on the head. I use numbered parameters but have $1 to $5 ... let me take a look to see if I can change this. > Also, from the application, is the LIMIT 25 passed as a constant or is that > also a parameter? A constant. > -- Regards Theo
Theo Kramer wrote: > On Thu, 2007-10-11 at 18:28 +0000, Andrew - Supernews wrote: >> When you do this from the application, are you passing it 3 parameters, >> or 5? The plan is clearly taking advantage of the fact that the two >> occurrences of $1 and $2 are known to be the same value; if your app is >> using some interface that uses ? placeholders rather than numbered >> parameters, then the planner will not be able to make this assumption. > > You may just have hit the nail on the head. I use numbered parameters > but have $1 to $5 ... let me take a look to see if I can change this. That'll be it. -- Richard Huxton Archonet Ltd
On 10/12/07, Theo Kramer <theo@flame.co.za> wrote: > On Thu, 2007-10-11 at 16:04 -0400, Merlin Moncure wrote: > > is this a converted cobol app? > > :) - on the right track - it is a conversion from an isam based package > where I have changed the backed to PostgreSQL. Unfortunately there is > way too much legacy design and application code to change things at a > higher level. fwiw, I converted a pretty large cobol app (acucobol) to postgresql backend translating queries on the fly. if this is a fresh effort, you definately want to use the row-wise comparison feature of 8.2. not only is it much simpler, it's much faster. with some clever caching strategies i was able to get postgresql performance to exceed the isam backend. btw, I used execprepared for virtually the entire system. example read next: select * from foo where (a,b,c) > (a1,b1,c1) order by a,b,c limit 25; example read previous: select * from foo where (a,b,c) < (a1,b1,c1) order by a desc, b desc, c desc limit 25; etc. this will use complete index for a,b,c and is much cleaner to prepare, and parse for the planner (the best you can get with standard tactics is to get backend to use index on a). Another big tip i can give you (also 8.2) is to check into advisory locks for isam style pessimistic locking. With some thin wrappers you can generate full row and table locking which is quite powerful. merlin
On Fri, 2007-10-12 at 09:02 -0400, Merlin Moncure wrote: > fwiw, I converted a pretty large cobol app (acucobol) to postgresql > backend translating queries on the fly. if this is a fresh effort, > you definately want to use the row-wise comparison feature of 8.2. > not only is it much simpler, it's much faster. with some clever > caching strategies i was able to get postgresql performance to exceed > the isam backend. btw, I used execprepared for virtually the entire > system. > > example read next: > select * from foo where (a,b,c) > (a1,b1,c1) order by a,b,c limit 25; > > example read previous: > select * from foo where (a,b,c) < (a1,b1,c1) order by a desc, b desc, > c desc limit 25; > > etc. this will use complete index for a,b,c and is much cleaner to > prepare, and parse for the planner (the best you can get with standard > tactics is to get backend to use index on a). > > Another big tip i can give you (also 8.2) is to check into advisory > locks for isam style pessimistic locking. With some thin wrappers you > can generate full row and table locking which is quite powerful. Very interesting - I have largely done the same thing, creating tables on the fly, translating isam calls, and creating, preparing and executing queries on the fly using the libpq PQprepare() and PQexecPrepared() statements... and it is running rather well at several sites, however, the initial port I did was for 8.0 and 8.1 so could not, at the time use, row level comparison, although I do have it on the latest version of my code working on 8.2 which is not yet released. The problem I have on row level comparison is that we have orders that are mixed, ie. a mixture of ascending and descending orders and do not know if it is possible to use row level comparison on that... eg. I haven't been able to transform the following it a row comparison query. select * from foo where (a = a1 and b = b1 and c >= c1) or (a = a1 and b < b1) or (a > a1) order by a, b desc, c; I have, however, found that transforming the above into a union based query performs substantially better. Also indexes containing mixed order columns will only be available on 8.3... But many thanks for your advice. -- Regards Theo
>>> On Fri, Oct 12, 2007 at 9:57 AM, in message <1192201021.6170.47.camel@localhost.localdomain>, Theo Kramer <theo@flame.co.za> wrote: > > select * from foo where > (a = a1 and b = b1 and c >= c1) or > (a = a1 and b < b1) or > (a > a1) > order by a, b desc, c; > > I have, however, found that transforming the above into a union based > query performs substantially better. Another approach which often performs better is to rearrange the logic so that the high-order predicate is AND instead of OR: select * from foo where ( a >= a1 and ( a > a1 or ( b <= b1 and ( b < b1 or ( c >= c1 ))))) order by a, b desc, c; With the right index and a limit on rows, this can do particularly well. -Kevin