Re: [HACKERS] Performance degradation in TPC-H Q18 - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: [HACKERS] Performance degradation in TPC-H Q18
Date:
Msg-id: CA+Tgmob43_EpJRT_xAizrZ6sBTwKqHSZQZJ330X2XsORcxNmfQ@mail.gmail.com
In response to: Re: [HACKERS] Performance degradation in TPC-H Q18 (Andres Freund <andres@anarazel.de>)
Responses: Re: [HACKERS] Performance degradation in TPC-H Q18
           Re: [HACKERS] Performance degradation in TPC-H Q18
List: pgsql-hackers
On Sat, Mar 4, 2017 at 5:56 AM, Andres Freund <andres@anarazel.de> wrote:
> attached is a patch to address this problem, and the one reported by
> Dilip. I ran a lot of TPC-H and other benchmarks, and so far this
> addresses all the performance issues, often being noticeably faster than
> with the dynahash code.
>
> Comments?

I'm still not convinced that raising the fillfactor like this is going
to hold up in testing, but I don't mind you committing it and we'll see
what happens. If it is possible to survive with a fillfactor that high,
it certainly has some advantages, and it's obviously more likely to work
out with the other logic you've added. I think we need a lot more people
playing with this to know whether any given approach is right, and I
think getting something committed will help more than any amount of
theoretical discussion.

I think DEBUG1 is far too high for something that could occur with some
frequency on a busy system; I'm fairly strongly of the opinion that you
ought to downgrade that by a couple of levels, say to DEBUG3 or so.

> On 2017-03-03 11:23:00 +0530, Kuntal Ghosh wrote:
>> On Fri, Mar 3, 2017 at 8:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > On Fri, Mar 3, 2017 at 1:22 AM, Andres Freund <andres@anarazel.de> wrote:
>> >> the resulting hash-values aren't actually meaningfully influenced by the
>> >> IV. Because we just xor with the IV, most hash-values that without the IV
>> >> would have fallen into a single hash-bucket fall into a single
>> >> hash-bucket afterwards as well; just somewhere else in the hash-range.
>> >
>> > Wow, OK. I had kind of assumed (without looking) that setting the
>> > hash IV did something a little more useful than that. Maybe we should
>> > do something like struct blah { int iv; int hv; }; newhv =
>> > hash_any(&blah, sizeof(blah)).
>
> The hash invocations are already noticeable performancewise, so I'm a
> bit hesitant to go there. I'd rather introduce a decent 'hash_combine'
> function or such.

Yes, that might be better. I wasn't too sure the approach I proposed
would actually do a sufficiently good job of mixing in the bits from the
IV anyway. It's important to keep in mind that the values we're using as
IVs aren't necessarily going to be uniformly distributed in any
meaningful way. They're just PIDs, so you might only have 1-3 bits of
difference between one value and another within the same parallel query.
If you don't do something fairly aggressive to make that change perturb
the final hash value, it probably won't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
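To make the failure mode discussed above concrete, here is a minimal standalone
sketch in C. It is not PostgreSQL code: the IV values are made-up worker PIDs,
and Murmur3's 32-bit finalizer stands in for hash_any() or a future
hash_combine(). The point it illustrates is that XORing an IV into an
already-computed hash value can only relabel buckets, because two hash values
that agree in their low-order bits still agree after XORing in the same IV;
feeding the XORed value through something with real avalanche behavior is what
actually reshuffles bucket assignments.

#include <stdint.h>
#include <stdio.h>

/*
 * Murmur3's 32-bit finalizer, used here only as a stand-in for "some
 * mixing step with good avalanche"; it is not PostgreSQL's hash_any()
 * nor any proposed hash_combine() implementation.
 */
static uint32_t
mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}

int
main(void)
{
    uint32_t hv = 0x12345678;   /* an already-computed hash value for some key */
    uint32_t iv_a = 4021;       /* hypothetical worker PIDs used as IVs; */
    uint32_t iv_b = 4023;       /* they differ in only one bit */

    /*
     * Plain XOR: the two results differ only in the bit where the IVs
     * differ.  Keys that would share a bucket without the IV still share
     * one with it; the bucket just moves.
     */
    printf("xor only: %08x vs %08x\n", hv ^ iv_a, hv ^ iv_b);

    /*
     * XOR followed by a finalizer with avalanche: the one-bit IV
     * difference now typically flips about half of the output bits,
     * so bucket assignments really do get reshuffled.
     */
    printf("mixed:    %08x vs %08x\n", mix32(hv ^ iv_a), mix32(hv ^ iv_b));
    return 0;
}

Compiling and running this prints two XOR results that differ in a single bit,
while the mixed results should differ widely; that is the kind of "fairly
aggressive" perturbation the last paragraph of the mail is asking for.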