Thread: BUG #1334: PREPARE creates bad execution plan (40x slower)
The following bug has been logged online:

Bug reference:      1334
Logged by:          A. Steinmetz
Email address:      ast@domdv.de
PostgreSQL version: 7.4.6
Operating system:   Linux
Description:        PREPARE creates bad execution plan (40x slower)
Details:

Direct execution of:

explain analyze INSERT INTO results
SELECT pagesearch.weight,pagesearch.pageid
FROM pagesearch,topictrace
WHERE pagesearch.wordid=924375 AND pagesearch.catid=topictrace.catid
AND topictrace.refid=1 LIMIT 1500;

gives:

 Subquery Scan "*SELECT*"  (cost=0.00..11348.27 rows=1500 width=8) (actual time=0.317..44.297 rows=1500 loops=1)
   ->  Limit  (cost=0.00..11333.27 rows=1500 width=8) (actual time=0.314..42.909 rows=1500 loops=1)
         ->  Nested Loop  (cost=0.00..40202.90 rows=5321 width=8) (actual time=0.311..42.185 rows=1500 loops=1)
               ->  Index Scan using pgscwdidx on pagesearch  (cost=0.00..173.32 rows=7580 width=12) (actual time=0.167..2.725 rows=1500 loops=1)
                     Index Cond: (wordid = 924375)
               ->  Index Scan using tptrc on topictrace  (cost=0.00..5.27 rows=1 width=4) (actual time=0.023..0.024 rows=1 loops=1500)
                     Index Cond: ((topictrace.refid = 1) AND ("outer".catid = topictrace.catid))
 Total runtime: 53.663 ms
(8 rows)

======================================================

Now, executing:

prepare t1 (integer,integer) as INSERT INTO results
SELECT pagesearch.weight,pagesearch.pageid
FROM pagesearch,topictrace
WHERE pagesearch.wordid=$1 AND pagesearch.catid=topictrace.catid
AND topictrace.refid=$2 LIMIT 1500;

explain analyze execute t1 (924375,1);

gives:

 Subquery Scan "*SELECT*"  (cost=6569.10..6619.22 rows=17 width=8) (actual time=2013.509..2039.757 rows=1500 loops=1)
   ->  Limit  (cost=6569.10..6619.05 rows=17 width=8) (actual time=2013.503..2038.543 rows=1500 loops=1)
         ->  Merge Join  (cost=6569.10..6619.05 rows=17 width=8) (actual time=2013.500..2037.904 rows=1500 loops=1)
               Merge Cond: ("outer".catid = "inner".catid)
               ->  Sort  (cost=701.29..721.28 rows=7996 width=12) (actual time=32.194..32.546 rows=1500 loops=1)
                     Sort Key: pagesearch.catid
                     ->  Index Scan using pgscwdidx on pagesearch  (cost=0.00..182.94 rows=7996 width=12) (actual time=0.176..15.574 rows=9267 loops=1)
                           Index Cond: (wordid = $1)
               ->  Sort  (cost=5867.81..5872.71 rows=1960 width=4) (actual time=1981.179..1988.281 rows=31483 loops=1)
                     Sort Key: topictrace.catid
                     ->  Index Scan using tptrc on topictrace  (cost=0.00..5760.63 rows=1960 width=4) (actual time=0.172..978.313 rows=650273 loops=1)
                           Index Cond: (refid = $2)
 Total runtime: 2155.218 ms
(13 rows)

=====================================================

This means that using a prepared statement instead of a direct query is
*40* times slower!

Some more information about the tables used:

CREATE TEMPORARY TABLE results (weight INTEGER,pageid INTEGER);
CREATE INDEX residx ON results (weight);

CREATE TABLE pagesearch (serial INTEGER PRIMARY KEY,wordid INTEGER,
weight INTEGER,pageid INTEGER,catid INTEGER,ages INTEGER);
CREATE INDEX pgscwdidx on pagesearch (wordid);

CREATE TABLE topictrace (serial INTEGER PRIMARY KEY,refid INTEGER,catid INTEGER);
CREATE INDEX tptrc on topictrace (refid,catid);

Data volumes in the non-temporary tables:
pagesearch: 48318888 rows
topictrace:  5271657 rows

Note: the same prepared statement works well with other typical databases
(e.g. MySQL, SQLite).
On Tue, 2004-11-30 at 22:19 +0000, PostgreSQL Bugs List wrote:
> This means that using a prepared statement instead of a direct query is
> *40* times slower!

Yes, it's a known (documented) issue that you can get inferior query
plans using prepared statements. I don't know of an easy way to fix
this: we cannot infer the value of the prepared query's parameters when
the planning is done.

An incremental improvement would be to delay planning until the first
time the query is executed (so PREPARE would just store the query in an
internal hash table, and the first EXECUTE would do the planning, then
store the plan in the hash table). This would allow us to make use of
the constant values of the query parameters, but there is no guarantee
that those values will be the same for future EXECUTEs (i.e. we could
actually see worse performance in the aggregate).

A more sophisticated approach would be to generate multiple plans at
PREPARE time and then choose the "best" plan for a given EXECUTE (by
matching the supplied query parameters to the closest guessed set of
parameters chosen by PREPARE). This is a _hard_ problem, though,
especially in the presence of multiple parameters.

> Note: the same prepared statement works well with other typical
> databases (e.g. MySQL, SQLite).

This isn't really relevant -- the fact that we fall over for this
particular query is as much bad luck as anything else.

-Neil
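Neil's point is easy to demonstrate on the reporter's schema: a statement
prepared with the constants folded in (no parameter markers) is planned
against the real values, so it should get the fast nested-loop plan from
the direct query. A minimal sketch, assuming the tables from the bug
report; the statement name is invented:

-- Same statement with the constants inlined: the planner can now use
-- its statistics for wordid = 924375 and refid = 1.
PREPARE t1_inlined AS
    INSERT INTO results
    SELECT pagesearch.weight, pagesearch.pageid
    FROM pagesearch, topictrace
    WHERE pagesearch.wordid = 924375
      AND pagesearch.catid = topictrace.catid
      AND topictrace.refid = 1
    LIMIT 1500;

EXPLAIN ANALYZE EXECUTE t1_inlined;

Of course this gives up the point of parameterization: the stored plan is
reusable only for that exact pair of values.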
FWIW this is a hard problem; Oracle is the only database I know of
that's tackled it.

On Wed, Dec 01, 2004 at 11:38:25AM +1100, Neil Conway wrote:
> Yes, it's a known (documented) issue that you can get inferior query
> plans using prepared statements. I don't know of an easy way to fix
> this: we cannot infer the value of the prepared query's parameters when
> the planning is done.
> [...]

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
"Jim C. Nasby" <decibel@decibel.org> writes: > FWIW this is a hard problem; Oracle is the only database I know of > that's tackled it. It seems fair to point out that this is the tradeoff you must buy into when using PREPARE. You can have a query plan that is tailored to the specific parameter value you supply, or you can have a generic query plan. The tailored plan will cost you planning time; the generic plan will save you planning time; but there's no free lunch. If your table statistics are such that you really need different plans for different parameter values, then you shouldn't be using PREPARE. I do not think this is a database bug --- it looks more like DBA misuse of the available tools. regards, tom lane
Tom Lane wrote:
> It seems fair to point out that this is the tradeoff you must buy into
> when using PREPARE. You can have a query plan that is tailored to the
> specific parameter value you supply, or you can have a generic query
> plan.
> [...]

There are a few PREPARE items on the TODO list. The big one is that SQL
PREPARE is not delayed until the first EXECUTE, so no actual parameter
values are used to generate the prepared plan. libpq prepare does do
this already. The full TODO item is:

* Allow finer control over the caching of prepared query plans

  Currently, queries prepared via the libpq API are planned on first
  execute using the supplied parameters --- allow SQL PREPARE to do the
  same. Also, allow control over replanning prepared queries either
  manually or automatically when statistics for execute parameters
  differ dramatically from those used during planning.

--
Bruce Momjian                           http://candle.pha.pa.us
pgman@candle.pha.pa.us
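Until that TODO item lands, the only manual "replanning" available in 7.4
is to drop and re-create the prepared statement, for example after ANALYZE
has refreshed the statistics the stored plan was built from. A sketch,
reusing the statement from the bug report:

ANALYZE pagesearch;
ANALYZE topictrace;

-- The stored plan is frozen at PREPARE time, so it must be dropped and
-- re-created to pick up the new statistics (still a generic plan, since
-- 7.4 plans before any parameter values are known).
DEALLOCATE t1;
PREPARE t1 (integer, integer) AS
    INSERT INTO results
    SELECT pagesearch.weight, pagesearch.pageid
    FROM pagesearch, topictrace
    WHERE pagesearch.wordid = $1
      AND pagesearch.catid = topictrace.catid
      AND topictrace.refid = $2
    LIMIT 1500;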