Thread: tweaking costs to favor nestloop

From: Vincent van Leeuwen

Hi,

I'm running PG 7.3.2 on a dual P3 1 GHz, 4GB RAM, 5-disk RAID 5 (hardware) on
Debian Linux, kernel 2.4.21-rc3.

I'm unable to tweak the various _cost settings in such a way that the
attached query will use the right plan. The attachment contains the relevant
config file settings, table definitions, and EXPLAIN ANALYZE output with some
enable_* settings turned off.

Could anyone tell me what I'm doing wrong in the query itself, or which
config file settings I should change so that the planner chooses the fastest
plan by itself?
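
For reference, the sort of session-level experiment I've been running looks
like this (the SELECT is only a stand-in for the attached query, and "id" is
a guess at auth's primary key column):

    SET random_page_cost = 1.25;   -- our current production value
    SET enable_hashjoin = off;     -- one of the enable_* toggles I tried
    EXPLAIN ANALYZE
    SELECT r.idreporter
    FROM reportuser r
    JOIN auth a ON a.id = r.idreporter;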


Regards,

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/

[Attachment: config file settings, table definitions, EXPLAIN ANALYZE output]

Re: tweaking costs to favor nestloop

From: Tom Lane
Date: 2003-06-11 16:17:53 -0400

Vincent van Leeuwen <pgsql.spam@vinz.nl> writes:
> I'm unable to tweak the various _cost settings in such a way that the
> attached query will use the right plan.

You aren't going to be able to.  You've already overshot a reasonable
random_page_cost setting --- to judge by the relative actual costs of
the merge and hash join, a value somewhere around 3 is appropriate for
your setup.  (Assuming I did the math right --- if you set it to 3,
do you get a ratio of merge and hash estimated costs that agrees with
the ratio of actual runtimes?)
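
The check would look something like this (substitute your real query for the
stand-in SELECT; "a.id" is a guess at auth's primary key):

    SET random_page_cost = 3;
    SET enable_hashjoin = off;       -- forces the merge join plan
    EXPLAIN SELECT * FROM reportuser r JOIN auth a ON a.id = r.idreporter;
    RESET enable_hashjoin;
    SET enable_mergejoin = off;      -- forces the hash join plan
    EXPLAIN SELECT * FROM reportuser r JOIN auth a ON a.id = r.idreporter;
    -- if 3 is right for your setup, the ratio of these two estimated costs
    -- should be close to the ratio of the actual runtimes you measured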

The problem here is that the costing of the repeated inner index scans
isn't realistic: 35417 probes into "auth" are clearly taking much less
than 35417 times what a single probe could be expected to take.  We
talked about how repeated scans would win from caching of the upper
btree levels, but I think there's more to it than that.  It occurs to me
that the probes you are making are probably not random and uncorrelated.
They are driven by the values of reportuser.idreporter ... is it fair
to guess that most of the reportuser rows link to just a small fraction
of the total auth population?  If so, the caching could be eliminating
most of the reads, not just the upper btree levels, because we're
mostly hitting only small parts of the index and auth tables.
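
A quick way to check that guess, using the names from your plan:

    -- how much of auth do the inner-scan probes actually touch?
    SELECT count(DISTINCT idreporter) AS distinct_probe_targets,
           (SELECT count(*) FROM auth) AS total_auth_rows
    FROM reportuser;
    -- a small first number relative to the second means the probes keep
    -- hitting the same few rows, which cache very well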

I'm beginning to think that the only reasonable way to model this is to
cost the entire nestloop join as a unit, so that we have access to
statistics about the outer table as well as the indexed table.  That
would give us a shot at estimating how much of the index is likely to
get touched.

As of 7.3 I think all you can do is force nestloop by disabling the
other two join types.
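
That is:

    SET enable_mergejoin = off;
    SET enable_hashjoin = off;
    -- the nested loop is now the only join method left for the planner;
    -- these stick for the whole session, so RESET them when you're done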

            regards, tom lane

Re: tweaking costs to favor nestloop

From: Vincent van Leeuwen

On 2003-06-11 16:17:53 -0400, Tom Lane wrote:
> Vincent van Leeuwen <pgsql.spam@vinz.nl> writes:
> > I'm unable to tweak the various _cost settings in such a way that the
> > attached query will use the right plan.
>
> You aren't going to be able to.  You've already overshot a reasonable
> random_page_cost setting --- to judge by the relative actual costs of
> the merge and hash join, a value somewhere around 3 is appropriate for
> your setup.  (Assuming I did the math right --- if you set it to 3,
> do you get a ratio of merge and hash estimated costs that agrees with
> the ratio of actual runtimes?)
>
Well, random_page_cost is where it is right now because for a number of other
queries it seems to give the best result. Specifically, 1.25 seems to be the
sweet spot where a number of queries that were using seqscans but should use
indexscans started to use indexscans. Tweaking the cpu_index_tuple_cost by
rather large margins didn't seem to have any effect on the calculated costs.
Going back to a setting of 3 will hurt overall performance, unless we can
still get those other queries to use the right plan by tweaking other config
parameters.

How did you calculate the value of 3?

Another problem we've noticed is that on an idle database certain queries are
better off using an indexscan than a seqscan, something which the planner
already wanted to do. But when the load on the database gets a lot higher,
indexscans are consistently slower than seqscans (same query, same
parameters). So we had to fiddle around a bit to favor seqscans for those
queries: we set cpu_operator_cost a lot lower to favor a seqscan-plus-sort
over an index scan (possibly a reverse one, I don't remember exactly).
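
Roughly like this (the exact value we settled on is in our config; the number
here is just illustrative):

    SET cpu_operator_cost = 0.001;  -- well under the default (0.0025, I
                                    -- believe), which tips the estimates
                                    -- toward seqscan+sort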

> The problem here is that the costing of the repeated inner index scans
> isn't realistic: 35417 probes into "auth" are clearly taking much less
> than 35417 times what a single probe could be expected to take.  We
> talked about how repeated scans would win from caching of the upper
> btree levels, but I think there's more to it than that.  It occurs to me
> that the probes you are making are probably not random and uncorrelated.
> They are driven by the values of reportuser.idreporter ... is it fair
> to guess that most of the reportuser rows link to just a small fraction
> of the total auth population?  If so, the caching could be eliminating
> most of the reads, not just the upper btree levels, because we're
> mostly hitting only small parts of the index and auth tables.
>
Exactly. I think the 'auth' table is already completely in the kernel
filesystem cache to begin with, and probably largely in shared_buffers too,
since it's a small table that gets hit a lot, especially on its primary key,
which is what we use here.
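
The catalog would back that up; something like this (auth_pkey is a guess at
the primary key index name):

    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname IN ('auth', 'auth_pkey');
    -- pages are 8KB by default, so a small relpages count means the whole
    -- table and index fit comfortably in cache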

> I'm beginning to think that the only reasonable way to model this is to
> cost the entire nestloop join as a unit, so that we have access to
> statistics about the outer table as well as the indexed table.  That
> would give us a shot at estimating how much of the index is likely to
> get touched.
>
> As of 7.3 I think all you can do is force nestloop by disabling the
> other two join types.
>

Does 7.4 already have changes in this area that will affect this query?


Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/

Re: tweaking costs to favor nestloop

From: Tom Lane

Vincent van Leeuwen <pgsql.spam@vinz.nl> writes:
> How did you calculate the value of 3?

Estimated cost of an indexscan is approximately proportional to
random_page_cost, but cost of a seqscan isn't affected by it.
You had a hash join plan that used two seqscans (so its estimated
cost is unaffected by random_page_cost) plus a merge join plan
that had one indexscan input.  I just extrapolated the change in
the indexscan cost needed to make the ratio of total costs agree with
reality.  This is a pretty rough calculation of course, but I don't
believe small values of random_page_cost except for situations where all
your data is buffered in RAM.  It's real easy to get led down the garden
path by small test cases that get fully buffered (especially when you
repeat them over and over), and pick cost values that will not reflect
reality in a production environment.  I can't say whether that actually
happened to you, but it's something to be on your guard about.
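
Schematically, the extrapolation was:

    hash_est      (two seqscans: independent of random_page_cost)
    merge_est(r) = fixed_part + r * indexscan_page_fetches

    solve for r:  merge_est(r) / hash_est = merge_actual / hash_actual

which, with your numbers, came out near r = 3.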

> Another problem we've noticed is that on an idle database certain queries are
> better off using an indexscan than a seqscan, something which the planner
> already wanted to do. But when the load on the database gets a lot higher,
> indexscans are consistently slower than seqscans (same query, same
> parameters).

See above.  Increasing load reduces the chances that any one query will
find its data already buffered, since there's more competition for the
available buffer space.

> Does 7.4 already have changes in this area that will affect this query?

No.

            regards, tom lane

Re: tweaking costs to favor nestloop

From: "Jim C. Nasby"

Tom-
FWIW, these are the same kind of numbers I'm seeing for the project I'm
working on, i.e. nested loop estimates of 0.00-3.01 where reality is much
closer to 0.2. I agree that it probably makes sense to take the
correlation of both tables into account for nested-loop joins.

On Wed, Jun 11, 2003 at 04:17:53PM -0400, Tom Lane wrote:
> I'm beginning to think that the only reasonable way to model this is to
> cost the entire nestloop join as a unit, so that we have access to
> statistics about the outer table as well as the indexed table.  That
> would give us a shot at estimating how much of the index is likely to
> get touched.

--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"