Re: [HACKERS] Performance degradation in TPC-H Q18 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Performance degradation in TPC-H Q18
Date
Msg-id CA+Tgmob43_EpJRT_xAizrZ6sBTwKqHSZQZJ330X2XsORcxNmfQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Performance degradation in TPC-H Q18  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] Performance degradation in TPC-H Q18
Re: [HACKERS] Performance degradation in TPC-H Q18
List pgsql-hackers
On Sat, Mar 4, 2017 at 5:56 AM, Andres Freund <andres@anarazel.de> wrote:
> attached is a patch to address this problem, and the one reported by
> Dilip.  I ran a lot of TPC-H and other benchmarks, and so far this
> addresses all the performance issues, often being noticeably faster than
> with the dynahash code.
>
> Comments?

I'm still not convinced that raising the fillfactor like this is going
to hold up in testing, but I don't mind you committing it and we'll
see what happens.  If it is possible to survive with a fillfactor that
high, it certainly has some advantages, and it's obviously more likely
to work out with the other logic you've added.  I think we need a lot
more people playing with this to know whether any given approach is
right, and I think getting something committed will help more than any
amount of theoretical discussion.

I think DEBUG1 is far too high for something that could occur with
some frequency on a busy system; I'm fairly strongly of the opinion
that you ought to downgrade that by a couple of levels, say to DEBUG3
or so.

> On 2017-03-03 11:23:00 +0530, Kuntal Ghosh wrote:
>> On Fri, Mar 3, 2017 at 8:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > On Fri, Mar 3, 2017 at 1:22 AM, Andres Freund <andres@anarazel.de> wrote:
>> >> the resulting hash-values aren't actually meaningfully influenced by the
>> >> IV. Because we just xor with the IV, most hash-value that without the IV
>> >> would have fallen into a single hash-bucket, fall into a single
>> >> hash-bucket afterwards as well; just somewhere else in the hash-range.
>> >
>> > Wow, OK.  I had kind of assumed (without looking) that setting the
>> > hash IV did something a little more useful than that.  Maybe we should
>> > do something like struct blah { int iv; int hv; }; newhv =
>> > hash_any(&blah, sizeof(blah)).
>
> The hash invocations are already noticeable performancewise, so I'm a
> bit hesitant to go there.  I'd rather introduce a decent 'hash_combine'
> function or such.

Yes, that might be better.  I wasn't too sure the approach I proposed
would actually do a sufficiently-good job mixing it the bits from the
IV anyway.  It's important to keep in mind that the values we're using
as IVs aren't necessarily going to be uniformly distributed in any
meaningful way.  They're just PIDs, so you might only have 1-3 bits of
difference between one value and another within the same parallel
query.  If you don't do something fairly aggressive to make that
change perturb the final hash value, it probably won't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] DROP SUBSCRIPTION and ROLLBACK
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] log_autovacuum_min_duration doesn't log VACUUMs