Thread: BUG #1334: PREPARE creates bad execution plan (40x slower)

BUG #1334: PREPARE creates bad execution plan (40x slower)

From: "PostgreSQL Bugs List"
The following bug has been logged online:

Bug reference:      1334
Logged by:          A. Steinmetz
Email address:      ast@domdv.de
PostgreSQL version: 7.4.6
Operating system:   Linux
Description:        PREPARE creates bad execution plan (40x slower)

Details:

Direct execution of:

explain analyze INSERT INTO results SELECT
pagesearch.weight,pagesearch.pageid FROM pagesearch,topictrace WHERE
pagesearch.wordid=924375 AND pagesearch.catid=topictrace.catid AND
topictrace.refid=1 LIMIT 1500;

gives:

 Subquery Scan "*SELECT*"  (cost=0.00..11348.27 rows=1500 width=8) (actual
time=0.317..44.297 rows=1500 loops=1)
   ->  Limit  (cost=0.00..11333.27 rows=1500 width=8) (actual
time=0.314..42.909 rows=1500 loops=1)
         ->  Nested Loop  (cost=0.00..40202.90 rows=5321 width=8) (actual
time=0.311..42.185 rows=1500 loops=1)
                ->  Index Scan using pgscwdidx on pagesearch
(cost=0.00..173.32 rows=7580 width=12) (actual time=0.167..2.725 rows=1500
loops=1)
                     Index Cond: (wordid = 924375)
               ->  Index Scan using tptrc on topictrace  (cost=0.00..5.27
rows=1 width=4) (actual time=0.023..0.024 rows=1 loops=1500)
                     Index Cond: ((topictrace.refid = 1) AND ("outer".catid
= topictrace.catid))
 Total runtime: 53.663 ms
(8 rows)



======================================================

Now, executing:

prepare t1 (integer,integer) as INSERT INTO results SELECT
pagesearch.weight,pagesearch.pageid FROM pagesearch,topictrace WHERE
pagesearch.wordid=$1 AND pagesearch.catid=topictrace.catid AND
topictrace.refid=$2 LIMIT 1500;

explain analyze execute t1 (924375,1);

gives:

 Subquery Scan "*SELECT*"  (cost=6569.10..6619.22 rows=17 width=8) (actual
time=2013.509..2039.757 rows=1500 loops=1)
   ->  Limit  (cost=6569.10..6619.05 rows=17 width=8) (actual
time=2013.503..2038.543 rows=1500 loops=1)
         ->  Merge Join  (cost=6569.10..6619.05 rows=17 width=8) (actual
time=2013.500..2037.904 rows=1500 loops=1)
               Merge Cond: ("outer".catid = "inner".catid)
               ->  Sort  (cost=701.29..721.28 rows=7996 width=12) (actual
time=32.194..32.546 rows=1500 loops=1)
                     Sort Key: pagesearch.catid
                     ->  Index Scan using pgscwdidx on pagesearch
(cost=0.00..182.94 rows=7996 width=12) (actual time=0.176..15.574 rows=9267
loops=1)
                           Index Cond: (wordid = $1)
               ->  Sort  (cost=5867.81..5872.71 rows=1960 width=4) (actual
time=1981.179..1988.281 rows=31483 loops=1)
                     Sort Key: topictrace.catid
                     ->  Index Scan using tptrc on topictrace
(cost=0.00..5760.63 rows=1960 width=4) (actual time=0.172..978.313
rows=650273 loops=1)
                           Index Cond: (refid = $2)
 Total runtime: 2155.218 ms
(13 rows)

=====================================================

This means that using a prepared statement instead of a direct query is *40*
times slower!

Some more information about the tables used:

CREATE TEMPORARY TABLE results (weight INTEGER,pageid INTEGER);
CREATE INDEX residx ON results (weight);
CREATE TABLE pagesearch (serial INTEGER PRIMARY KEY,wordid INTEGER,weight
INTEGER,pageid INTEGER,catid INTEGER,ages INTEGER);
CREATE INDEX pgscwdidx on pagesearch (wordid);
CREATE TABLE topictrace (serial INTEGER PRIMARY KEY,refid INTEGER,catid
INTEGER);
CREATE INDEX tptrc on topictrace (refid,catid);

Data volumes in the non-temporary tables:

pagesearch: 48318888 rows
topictrace: 5271657 rows

Note: the same prepared statement works well with other typical databases
(e.g. MySQL, SQLite).
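
For what it's worth, the mis-estimate driving the slow plan is visible in the
output above: the generic estimate for topictrace.refid = $2 is 1960 rows,
versus 650273 actually returned. The skew behind that can be inspected
directly; this is a diagnostic aside, not part of the original report:

-- Column statistics the planner consults for refid; a heavily skewed
-- distribution here is what makes a one-size-fits-all plan risky.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'topictrace' AND attname = 'refid';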

Re: BUG #1334: PREPARE creates bad execution plan (40x slower)

From: Neil Conway
On Tue, 2004-11-30 at 22:19 +0000, PostgreSQL Bugs List wrote:
> This means that using a prepared statement instead of a direct query is *40*
> times slower!

Yes, it's a known (documented) issue that you can get inferior query
plans using prepared statements. I don't know of an easy way to fix
this: we cannot infer the value of the prepared query's parameters when
the planning is done.

An incremental improvement would be to delay planning until the first time
a prepared query is executed (PREPARE would just store the query text in an
internal hash table, the first EXECUTE would do the planning, and the
resulting plan would then be stored in the hash table). This would let us
make use of the constant values of the query parameters, but there is no
guarantee that those values will be representative of future EXECUTEs (i.e.
we could actually see worse performance in the aggregate).
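
As an aside for anyone hitting this today: the closest user-level equivalent
of "plan with the real constants" in 7.4 is dynamic SQL in a PL/pgSQL
function, since a query string built at runtime and run with EXECUTE is
planned afresh, with literal values, on every call. A minimal sketch against
the reporter's schema (function name invented; assumes the plpgsql language
is installed):

CREATE FUNCTION search_insert(integer, integer) RETURNS void AS '
BEGIN
    -- The string is built with literal values, so each call is planned
    -- against the actual parameters, like the direct query above.
    EXECUTE ''INSERT INTO results ''
         || ''SELECT pagesearch.weight, pagesearch.pageid ''
         || ''FROM pagesearch, topictrace ''
         || ''WHERE pagesearch.wordid = '' || $1
         || '' AND pagesearch.catid = topictrace.catid ''
         || '' AND topictrace.refid = '' || $2
         || '' LIMIT 1500'';
    RETURN;
END;
' LANGUAGE plpgsql;

SELECT search_insert(924375, 1);

The flip side is that every call pays the planning cost, which is exactly the
trade-off discussed further down the thread.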

A more sophisticated approach would be to do something along the lines
of generating multiple plans at PREPARE time and then choosing the
"best" plan for a given EXECUTE (by matching the supplied query
parameters to the closest guessed set of parameters chosen by PREPARE).
This is a _hard_ problem though, especially in the presence of multiple
parameters.
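
To make the matching step concrete, here is a purely illustrative sketch;
every name below is invented and no such catalog exists in the server. If
PREPARE stored several plans, each tagged with the parameter values (or
selectivities) it was planned for, EXECUTE would have to pick the stored
plan whose guess is closest to the supplied value:

-- Invented illustration of the plan-matching step only.
CREATE TABLE plan_cache (
    stmt_name   text,
    plan_id     integer,    -- handle to a stored plan (hypothetical)
    guess_refid integer,    -- refid value this plan was built for
    guess_sel   real        -- estimated selectivity of that guess
);

-- For an EXECUTE whose refid has estimated selectivity 0.12, choose the
-- stored plan built for the closest selectivity.
SELECT plan_id
FROM plan_cache
WHERE stmt_name = 't1'
ORDER BY abs(guess_sel - 0.12)
LIMIT 1;

With several parameters the "distance" becomes multi-dimensional, which is
where the approach gets hard.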

> Note: the same prepared statement works well with other typical
> databases (e.g. MySQL, SQLite).

This isn't really relevant -- the fact that we fall over for this
particular query is as much bad luck as anything else.

-Neil

Re: BUG #1334: PREPARE creates bad execution plan (40x slower)

From: "Jim C. Nasby"
FWIW this is a hard problem; Oracle is the only database I know of
that's tackled it.

On Wed, Dec 01, 2004 at 11:38:25AM +1100, Neil Conway wrote:
> This is a _hard_ problem though, especially in the presence of multiple
> parameters.

--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: BUG #1334: PREPARE creates bad execution plan (40x slower)

From: Tom Lane
"Jim C. Nasby" <decibel@decibel.org> writes:
> FWIW this is a hard problem; Oracle is the only database I know of
> that's tackled it.

It seems fair to point out that this is the tradeoff you must buy into
when using PREPARE.  You can have a query plan that is tailored to the
specific parameter value you supply, or you can have a generic query
plan.  The tailored plan will cost you planning time; the generic plan
will save you planning time; but there's no free lunch.  If your table
statistics are such that you really need different plans for different
parameter values, then you shouldn't be using PREPARE.  I do not think
this is a database bug --- it looks more like DBA misuse of the
available tools.

            regards, tom lane
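
In practical terms that can mean parameterizing only the well-behaved value
and inlining the skewed one, so the planner gets to see it. Something along
these lines for the reporter's query (statement name invented; whether this
restores the fast plan depends on the statistics):

-- Keep wordid as a parameter, but inline the skewed refid as a literal
-- so the planner can use its statistics for that column.
PREPARE t1_ref1 (integer) AS
INSERT INTO results
SELECT pagesearch.weight, pagesearch.pageid
FROM pagesearch, topictrace
WHERE pagesearch.wordid = $1
  AND pagesearch.catid = topictrace.catid
  AND topictrace.refid = 1
LIMIT 1500;

EXECUTE t1_ref1 (924375);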

Re: BUG #1334: PREPARE creates bad execution plan (40x slower)

From: Bruce Momjian
Tom Lane wrote:
> If your table statistics are such that you really need different plans
> for different parameter values, then you shouldn't be using PREPARE.
> I do not think this is a database bug --- it looks more like DBA misuse
> of the available tools.

There are a few PREPARE items on the TODO list.  The big one is that
SQL-level PREPARE does not delay planning until the first EXECUTE, so no
actual parameter values are available when the prepared plan is generated.
libpq's prepare already delays planning in this way.

The full TODO item is:

* Allow finer control over the caching of prepared query plans

  Currently, queries prepared via the libpq API are planned on first
  execute using the supplied parameters --- allow SQL PREPARE to do the
  same.  Also, allow control over replanning prepared queries either
  manually or automatically when statistics for execute parameters
  differ dramatically from those used during planning.
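
Until such control exists, the only SQL-level knob is the blunt one:
deallocate and re-prepare, for example after an ANALYZE has refreshed the
statistics the plan was built from. Illustrative, reusing the reporter's
statement:

ANALYZE topictrace;

-- Throw away the old plan and build a new (still generic) one against
-- the refreshed statistics.
DEALLOCATE t1;
PREPARE t1 (integer, integer) AS
INSERT INTO results
SELECT pagesearch.weight, pagesearch.pageid
FROM pagesearch, topictrace
WHERE pagesearch.wordid = $1
  AND pagesearch.catid = topictrace.catid
  AND topictrace.refid = $2
LIMIT 1500;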


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073