Re: planner with index scan cost way off actual cost, advices to tweak cost constants? - Mailing list pgsql-performance

From Guillaume Cottenceau
Subject Re: planner with index scan cost way off actual cost, advices to tweak cost constants?
Date
Msg-id 87hd5tmt1z.fsf@meuh.mnc.lan
Whole thread Raw
In response to Re: planner with index scan cost way off actual cost, advices to tweak cost constants?  ("Guillaume Smet" <guillaume.smet@gmail.com>)
Responses Re: planner with index scan cost way off actual cost, advices to tweak cost constants?
List pgsql-performance
Guillaume,

Thanks for your answer.

> On 17 Mar 2006 11:09:50 +0100, Guillaume Cottenceau
> wrote:
> > Reading the documentation and postgresql list archives, I have
> > run ANALYZE right before my tests, I have increased the
> > statistics target to 50 for the considered table; my problem is
> > that the index scan cost reported by EXPLAIN seems to be around
> > 12.7 times higher that it should, a figure I suppose incompatible
> > (too large) for just random_page_cost and effective_cache_size
> > tweaks.
>
> It's not surprising you have a high cost for an index scan which is
> planned to return and returns so much rows. I really don't think the
> planner does something wrong on this one.

My point is that the planner's cost estimate is way above the
actual cost of the query, so the planner doesn't use the best
plan. Even if the index returns so much rows, actual cost of the
query is so that index scan (worst case, all disk cache flushed)
is still better than seq scan but the planner uses seq scan.

> AFAIK, increasing the statistics target won't do anything to reduce
> the cost as the planner estimation for the number of returned rows is
> already really accurate and probably can't be better.

Ok, thanks.

> > Of course real queries use smaller date ranges.
>
> What about providing us the respective plans for your real queries?
> And in a real case. It's a bad idea to compare index scan and seqscan

The original query is more complicated and sometimes involves
restricting the resultset with another constraint. I am not sure
it is very interesting to show it; I know that best performance
would be achieved with an index on the date column for the shown
query, and an index on the date column and the other column when
doing a query on these..

> when your data have to be loaded in RAM.

What do you mean? That I should not flush disk cache before
timing? I did so to find the worst case.. I am not sure it is the
best solution.. maybe half worst case would be? but this depends
a lot on whether the index pages would stay in disk cache or not
before next query.. which cannot be told for sure unless a full
serious timing of the real application is done (and my
application can be used in quite different scenarios, which means
such a test is not entirely possible/meaningful).

> Before doing so create an index on the date column to have the most
> effective index possible.

Yes, as I said, I know that doing this would improve a lot the
queries. My point was to understand why the cost of the index
scan is so "inaccurate" compared to actual cost. Adding an index
on the date column enlarges the data by 100-150M so I'd rather
save this if possible.

> > - I then tried to tweak random_page_cost and effective_cache_size
> >   following advices from documentation:
> >
> > SET random_page_cost = 2;
>
> random_page_cost is the way to go for this sort of thing but I don't
> think it's a good idea to have it too low globally and I'm still

Thanks, I suspected so.

> thinking the problem is that your test case is not accurate.

Ok.

--
Guillaume Cottenceau

pgsql-performance by date:

Previous
From: Tim Allen
Date:
Subject: partial indexes and inference
Next
From: Guillaume Cottenceau
Date:
Subject: Re: planner with index scan cost way off actual cost, advices to tweak cost constants?