Home > mailing lists

Re: osdl-dbt3 run results - puzzled by the execution - Mailing list pgsql-performance

From	Jenny Zhang
Subject	Re: osdl-dbt3 run results - puzzled by the execution
Date	September 19, 2003 19:54:24
Msg-id	1064007341.442.45.camel@ibm-a.pdx.osdl.net Whole thread Raw
In response to	Re: osdl-dbt3 run results - puzzled by the execution plans (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-performance

Tree view

On Thu, 2003-09-18 at 20:20, Tom Lane wrote:
> Jenny Zhang <jenny@osdl.org> writes:
> > ...  It seems to me that small
> > effective_cache_size favors the choice of nested loop joins (NLJ)
> > while the big effective_cache_size is in favor of merge joins (MJ).
>
> No, I wouldn't think that, because a nestloop plan will involve repeated
> fetches of the same tuples whereas a merge join doesn't (at least not
> when it sorts its inner input, as this plan does).  Larger cache
> improves the odds of a repeated fetch not having to do I/O.  In practice
> a larger cache area would also have some effects on access costs for the
> sort's temp file, but I don't think the planner's cost model for sorting
> takes that into account.
I think there is some misunderstanding here.  What I meant to say is:
From the plans we got, the optimizer favors the choice of nested loop
joins (NLJ) while the big effective_cache_size is in favor of merge
joins (MJ).  Which we think is not appropriate.  We verified that
sort_mem has no impact on the plans.  Though it would be nice to take
that into account.
>
> As Matt Clark points out nearby, the real question is whether these
> planner estimates have anything to do with reality.  EXPLAIN ANALYZE
> results would be far more interesting than plain EXPLAIN.
>
> > However, within the same run set consist of 6 runs, we see 2-3%
> > standard deviation for the run metrics associated with the multiple
> > stream part of the test (as opposed to the single stream part).
>
> <python> Och, laddie, we useta *dream* of 2-3% variation </python>
>
BTW, I am a she :-)
> > We would like to reduce the variation to be less than 1% so that a
> > 2% change between two different kernels would be significant.
>
> I think this is a pipe dream.  Variation in where the data gets laid
> down on your disk drive would alone create more than that kind of delta.
> I'm frankly amazed you could get repeatability within 2-3%.
>
Greg is right.  The repeatability is due to the aggregate results for a
whole test run.  As for individual query, the power test(single stream)
is very consistent, and the throughput test(multiple streams), any given
query execution time varies up to 15% if no swapping.  If we set
sort_mem too high and swapping occurs, the variation is bigger.

Jenny

pgsql-performance by date:

From: Jenny Zhang
Date: 19 September 2003, 15:28:03
Subject: Re: osdl-dbt3 run results - puzzled by the execution

From: Jenny Zhang
Date: 19 September 2003, 20:19:27
Subject: Re: osdl-dbt3 run results - puzzled by the execution

Re: osdl-dbt3 run results - puzzled by the execution - Mailing list pgsql-performance

Previous

Next