Thread: Sub-optimal plan chosen

Sub-optimal plan chosen

From
bricklen
Date:
In the following query, We are seeing a sub-optimal plan being chosen. The following results are after running the query several times (after each change).

dev1=# select version();
                                                  version
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 8.2.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-14)



dev1=# EXPLAIN ANALYZE SELECT SUM (revenue) as revenue FROM statsdaily WHERE ofid = 38 AND date >= '2009-09-01' AND date <= '2999-01-01';
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=11796.19..11796.20 rows=1 width=8) (actual time=28.598..28.599 rows=1 loops=1)
   ->  Index Scan using statsdaily_unique_idx on statsdaily  (cost=0.00..11783.65 rows=5017 width=8) (actual time=0.043..25.374 rows=3125 loops=1)
         Index Cond: ((date >= '2009-09-01'::date) AND (date <= '2999-01-01'::date) AND (ofid = 38))
 Total runtime: 28.650 ms


dev1=# set enable_indexscan to off;


dev1=# EXPLAIN ANALYZE SELECT SUM (revenue) as revenue FROM statsdaily WHERE ofid = '38' AND date >= '2009-09-01' AND date <= '2999-01-01';
                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=13153.47..13153.48 rows=1 width=8) (actual time=7.746..7.747 rows=1 loops=1)
   ->  Bitmap Heap Scan on statsdaily  (cost=3622.22..13140.92 rows=5017 width=8) (actual time=0.941..4.865 rows=3125 loops=1)
         Recheck Cond: ((ofid = 38) AND (date >= '2009-09-01'::date))
         Filter: (date <= '2999-01-01'::date)
         ->  Bitmap Index Scan on statsdaily_ofid_sept2009_idx  (cost=0.00..3620.97 rows=5046 width=0) (actual time=0.551..0.551 rows=3125 loops=1)
               Index Cond: (ofid = 38)
 Total runtime: 7.775 ms


default_statistics_target = 100 (tried with 500, no change). Vacuum analyzed before initial query, and after each change to default_statistics_target.


The same query, with a different "ofid", will occasionally get the more optimal plan -- I assume that the distribution of data is the differentiator there.

Is there any other data I can provide to shed some light on this?

Thanks!

Re: Sub-optimal plan chosen

From
tv@fuzzy.cz
Date:
> default_statistics_target = 100 (tried with 500, no change). Vacuum
> analyzed
> before initial query, and after each change to default_statistics_target.

Modifying the statistics target is useful only if the estimates are
seriously off, which is not your case - so it won't help, at least not
reliably.

> The same query, with a different "ofid", will occasionally get the more
> optimal plan -- I assume that the distribution of data is the
> differentiator
> there.

Yes, the difference between costs of the two plans is quite small (11796
vs. 13153) so it's very sensible to data distribution.

> Is there any other data I can provide to shed some light on this?

You may try to play with the 'cost' constants - see this:

http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

You just need to modify them so that the bitmap index scan / bitmap heap
scan is prefered to plain index scan.

Just be careful - if set in the postgresql.conf, it affects all the
queries and may cause serious problems with other queries. So it deserves
proper testing ...

regards
Tomas


Re: Sub-optimal plan chosen

From
bricklen
Date:
Hi Tomas,

2009/9/10 <tv@fuzzy.cz>
> default_statistics_target = 100 (tried with 500, no change). Vacuum
> analyzed
> before initial query, and after each change to default_statistics_target.

Modifying the statistics target is useful only if the estimates are
seriously off, which is not your case - so it won't help, at least not
reliably.

> The same query, with a different "ofid", will occasionally get the more
> optimal plan -- I assume that the distribution of data is the
> differentiator
> there.

Yes, the difference between costs of the two plans is quite small (11796
vs. 13153) so it's very sensible to data distribution.

> Is there any other data I can provide to shed some light on this?

You may try to play with the 'cost' constants - see this:

http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

You just need to modify them so that the bitmap index scan / bitmap heap
scan is prefered to plain index scan.

Just be careful - if set in the postgresql.conf, it affects all the
queries and may cause serious problems with other queries. So it deserves
proper testing ...

regards
Tomas


Playing around with seq_page_cost (1) and random_page_cost (1), I can get the correct index selected. Applying those same settings to our production server does not produce the optimal plan, though.

Re: Sub-optimal plan chosen

From
tv@fuzzy.cz
Date:
> Playing around with seq_page_cost (1) and random_page_cost (1), I can get
> the correct index selected. Applying those same settings to our production
> server does not produce the optimal plan, though.

I doubt setting seq_page_cost and random_page_cost to the same value is
reasonable - random access is almost always more expensive than sequential
access.

Anyway, post the EXPLAIN ANALYZE output from the production server. Don't
forget there are other _cost values - try to modify them too, but I'm not
sure how these values relate to the bitmap heap scan / bitmap index plans.

regards
Tomas


Re: Sub-optimal plan chosen

From
Tom Lane
Date:
bricklen <bricklen@gmail.com> writes:
> Is there any other data I can provide to shed some light on this?

The table and index definitions?

The straight indexscan would probably win if the index column order
were ofid, date instead of date, ofid.  I can't tell if you have
any other queries for which the existing column order is preferable,
though.

            regards, tom lane

Re: Sub-optimal plan chosen

From
bricklen
Date:
On Thu, Sep 10, 2009 at 8:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
bricklen <bricklen@gmail.com> writes:
> Is there any other data I can provide to shed some light on this?

The table and index definitions?

The straight indexscan would probably win if the index column order
were ofid, date instead of date, ofid.  I can't tell if you have
any other queries for which the existing column order is preferable,
though.

                       regards, tom lane


Changing the order of the WHERE predicates didn't help. The indexes are mostly defined as single-column indexes, with the exception of the "statsdaily_unique_idx" one:

statsdaily_id_pk PRIMARY KEY, btree (id)
statsdaily_unique_idx UNIQUE, btree (date, idaf, idsite, ofid, idcreative, idoptimizer)
statsdaily_date_idx btree (date)
statsdaily_ofid_idx btree (ofid)
statsdaily_ofid_sept2009_idx btree (ofid) WHERE date >= '2009-09-01'::date

Re: Sub-optimal plan chosen

From
Robert Haas
Date:
2009/9/10  <tv@fuzzy.cz>:
>> Playing around with seq_page_cost (1) and random_page_cost (1), I can get
>> the correct index selected. Applying those same settings to our production
>> server does not produce the optimal plan, though.
>
> I doubt setting seq_page_cost and random_page_cost to the same value is
> reasonable - random access is almost always more expensive than sequential
> access.

If the data figures to be read from the OS cache, it's very
reasonable, and the right value is somewhere in the 0.05 - 0.10 range.

...Robert

Re: Sub-optimal plan chosen

From
bricklen
Date:
On Thu, Sep 10, 2009 at 9:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
2009/9/10  <tv@fuzzy.cz>:
>> Playing around with seq_page_cost (1) and random_page_cost (1), I can get
>> the correct index selected. Applying those same settings to our production
>> server does not produce the optimal plan, though.
>
> I doubt setting seq_page_cost and random_page_cost to the same value is
> reasonable - random access is almost always more expensive than sequential
> access.

If the data figures to be read from the OS cache, it's very
reasonable, and the right value is somewhere in the 0.05 - 0.10 range.


For the most part, it will indeed be cached. Thanks for the tip on the values.

Re: Sub-optimal plan chosen

From
Robert Haas
Date:
On Thu, Sep 10, 2009 at 12:56 PM, bricklen <bricklen@gmail.com> wrote:
> On Thu, Sep 10, 2009 at 8:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> bricklen <bricklen@gmail.com> writes:
>> > Is there any other data I can provide to shed some light on this?
>>
>> The table and index definitions?
>>
>> The straight indexscan would probably win if the index column order
>> were ofid, date instead of date, ofid.  I can't tell if you have
>> any other queries for which the existing column order is preferable,
>> though.
>>
>>                        regards, tom lane
>
>
> Changing the order of the WHERE predicates didn't help.

He's talking about the index definition, not the WHERE clause.  The
order of the WHERE clause is totally irrelevant.

...Robert

Re: Sub-optimal plan chosen

From
bricklen
Date:
On Thu, Sep 10, 2009 at 10:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 10, 2009 at 12:56 PM, bricklen <bricklen@gmail.com> wrote:
> On Thu, Sep 10, 2009 at 8:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> bricklen <bricklen@gmail.com> writes:
>> > Is there any other data I can provide to shed some light on this?
>>
>> The table and index definitions?
>>
>> The straight indexscan would probably win if the index column order
>> were ofid, date instead of date, ofid.  I can't tell if you have
>> any other queries for which the existing column order is preferable,
>> though.
>>
>>                        regards, tom lane
>
>
> Changing the order of the WHERE predicates didn't help.

He's talking about the index definition, not the WHERE clause.  The
order of the WHERE clause is totally irrelevant.


Ah, sorry, missed that.

Re: Sub-optimal plan chosen

From
bricklen
Date:
On Thu, Sep 10, 2009 at 10:07 AM, bricklen <bricklen@gmail.com> wrote:
On Thu, Sep 10, 2009 at 10:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 10, 2009 at 12:56 PM, bricklen <bricklen@gmail.com> wrote:
> On Thu, Sep 10, 2009 at 8:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> bricklen <bricklen@gmail.com> writes:
>> > Is there any other data I can provide to shed some light on this?
>>
>> The table and index definitions?
>>
>> The straight indexscan would probably win if the index column order
>> were ofid, date instead of date, ofid.  I can't tell if you have
>> any other queries for which the existing column order is preferable,
>> though.
>>
>>                        regards, tom lane
>
>
> Changing the order of the WHERE predicates didn't help.

He's talking about the index definition, not the WHERE clause.  The
order of the WHERE clause is totally irrelevant.


Ah, sorry, missed that.


I just created a new index as Tom said, and the query *does* use the new index (where ofid precedes date in the definition).

Re: Sub-optimal plan chosen

From
Tom Lane
Date:
bricklen <bricklen@gmail.com> writes:
> I just created a new index as Tom said, and the query *does* use the new
> index (where ofid precedes date in the definition).

And is it indeed faster than the other alternatives?

            regards, tom lane

Re: Sub-optimal plan chosen

From
bricklen
Date:
On Thu, Sep 10, 2009 at 10:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
bricklen <bricklen@gmail.com> writes:
> I just created a new index as Tom said, and the query *does* use the new
> index (where ofid precedes date in the definition).

And is it indeed faster than the other alternatives?

                       regards, tom lane

About the same as the earlier, faster plan:

 Aggregate  (cost=2342.79..2342.80 rows=1 width=8) (actual time=8.433..8.433 rows=1 loops=1)
   ->  Index Scan using statsdaily_ofid_date on statsdaily  (cost=0.00..2330.61 rows=4873 width=8) (actual time=0.089..5.043 rows=3125 loops=1)
         Index Cond: ((ofid = 38) AND (date >= '2009-09-01'::date) AND (date <= '2999-01-01'::date))
 Total runtime: 8.470 ms