Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets - Mailing list pgsql-hackers

From Joshua Tolley
Subject Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date
Msg-id 20081223145146.GA5882@uber
Whole thread Raw
In response to Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets  ("Robert Haas" <robertmhaas@gmail.com>)
Responses Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
List pgsql-hackers
On Tue, Dec 23, 2008 at 09:22:27AM -0500, Robert Haas wrote:
> On Tue, Dec 23, 2008 at 2:21 AM, Bryce Cutt <pandasuit@gmail.com> wrote:
> > Because there is no nice way in PostgreSQL (that I know of) to derive
> > a histogram after a join (on an intermediate result) currently
> > usingMostCommonValues is only enabled on a join when the outer (probe)
> > side is a table scan (seq scan only actually).  See
> > getMostCommonValues (soon to be called
> > ExecHashJoinGetMostCommonValues) for the logic that determines this.

So my test case of "do a whole bunch of hash joins in a test query"
isn't really valid. Makes sense. I did another, more haphazard test on a
query with fewer joins, and saw noticeable speedups.

> It's starting to seem to me that the case where this patch provides a
> benefit is so narrow that I'm not sure it's worth the extra code.

Not that anyone asked, but I don't consider myself qualified to render
judgement on that point. Code size is, I guess, a maintainability issue,
and I'm not terribly experienced maintaining PostgreSQL :)
> Is it realistic to think that the MCVs of the base relation might
> still be applicable to the joinrel?  It's certainly easy to think of
> counterexamples, but it might be a good approximation more often than
> not.

It's equivalent to our assumption that distributions of values in
columns in the same table are independent. Making that assumption in
this case would probably result in occasional dramatic speed
improvements similar to the ones we've seen in less complex joins,
offset by just-as-occasional dramatic slowdowns of similar magnitude. In
other words, it will increase the variance of our results.

- Josh

pgsql-hackers by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: incoherent view of serializable transactions
Next
From: Emmanuel Cecchet
Date:
Subject: Re: incoherent view of serializable transactions