Re: HashJoin w/option to unique-ify inner rel - Mailing list pgsql-hackers

From Robert Haas
Subject Re: HashJoin w/option to unique-ify inner rel
Msg-id 603c8f070905092005i9e40572o1f217aba9a2c2c13@mail.gmail.com
In response to Re: HashJoin w/option to unique-ify inner rel  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, May 9, 2009 at 7:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> ... So it appears to me that instead of taking an average-case correction
>> as is done in this patch and the old coding, we have to explicitly model
>> the matched-tuple and unmatched-tuple cases separately.
>
> I've applied the attached patch that does things this way.  I did not do
> anything about improving the detailed modeling of hash-bucket searching
> as Robert suggested in some later messages.  I think that's probably
> worth looking at, but it's a second-order consideration --- this patch
> already seems to bring the estimates for semi/antijoins much closer
> to reality.

I'll take a look at this when I get a chance, but I'm just playing
with test cases, so I share your hope that Kevin (or someone else with
complex queries against real data) will test it out.

> I am a bit concerned about the extra time spent on repeated selectivity
> estimates.  It might not matter too much since it's only done for semi
> and anti joins which aren't that common.  It would be good though if
> someone who has a lot of such joins could test CVS HEAD and see if
> performance has gotten worse (Kevin?).  We could refactor things to
> reduce the duplication of effort but I'd prefer to leave that sort of
> thing to 8.5.

Agreed.  I was worried about that when I wrote the emails to which you
refer above, but I don't know how else to get good estimates for all
the relevant cases.

...Robert

