Re: [HACKERS] Performance degradation in TPC-H Q18 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [HACKERS] Performance degradation in TPC-H Q18 |
Date | |
Msg-id | 20170306203200.kczd7xldxirsbgwl@alap3.anarazel.de |
In response to | Re: [HACKERS] Performance degradation in TPC-H Q18 (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: [HACKERS] Performance degradation in TPC-H Q18 |
List | pgsql-hackers |
On 2017-03-04 11:09:40 +0530, Robert Haas wrote:
> On Sat, Mar 4, 2017 at 5:56 AM, Andres Freund <andres@anarazel.de> wrote:
> > attached is a patch to address this problem, and the one reported by
> > Dilip. I ran a lot of TPC-H and other benchmarks, and so far this
> > addresses all the performance issues, often being noticeably faster than
> > with the dynahash code.
> >
> > Comments?
>
> I'm still not convinced that raising the fillfactor like this is going
> to hold up in testing, but I don't mind you committing it and we'll
> see what happens.

I didn't see anything in testing, but I agree that it's debatable. But
I'd rather commit it now, when we all know it's new code. Raising it in
a new release will be a lot harder.

> I think DEBUG1 is far too high for something that could occur with
> some frequency on a busy system; I'm fairly strongly of the opinion
> that you ought to downgrade that by a couple of levels, say to DEBUG3
> or so.

I actually planned to remove it entirely, before committing. It was more
left in for testers to be able to see when the code triggers.

> > On 2017-03-03 11:23:00 +0530, Kuntal Ghosh wrote:
> >> On Fri, Mar 3, 2017 at 8:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> > On Fri, Mar 3, 2017 at 1:22 AM, Andres Freund <andres@anarazel.de> wrote:
> >> >> the resulting hash-values aren't actually meaningfully influenced by the
> >> >> IV. Because we just xor with the IV, most hash-value that without the IV
> >> >> would have fallen into a single hash-bucket, fall into a single
> >> >> hash-bucket afterwards as well; just somewhere else in the hash-range.
> >> >
> >> > Wow, OK. I had kind of assumed (without looking) that setting the
> >> > hash IV did something a little more useful than that. Maybe we should
> >> > do something like struct blah { int iv; int hv; }; newhv =
> >> > hash_any(&blah, sizeof(blah)).
> >
> > The hash invocations are already noticeable performancewise, so I'm a
> > bit hesitant to go there. I'd rather introduce a decent 'hash_combine'
> > function or such.
>
> Yes, that might be better. I wasn't too sure the approach I proposed
> would actually do a sufficiently-good job mixing it the bits from the
> IV anyway. It's important to keep in mind that the values we're using
> as IVs aren't necessarily going to be uniformly distributed in any
> meaningful way. They're just PIDs, so you might only have 1-3 bits of
> difference between one value and another within the same parallel
> query. If you don't do something fairly aggressive to make that
> change perturb the final hash value, it probably won't.

FWIW, I played with some better mixing, and it helps a bit with
accurately sized hashtables and multiple columns.

What's however more interesting is that a better mixed IV and/or better
iteration now *slightly* *hurts* performance with grossly misestimated
sizes, because resizing has to copy more rows... Not what I predicted.

Greetings,

Andres Freund
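
The xor observation quoted above can be checked with a small C sketch. Everything here is illustrative and assumed rather than taken from the patch: the names bucket_of, perturb_xor and hash_combine_sketch are hypothetical, the table is assumed to have a power-of-two number of buckets, and the combine step is the widely used Boost-style formula, which may or may not be what a 'hash_combine' in PostgreSQL would look like. The argument order (hash first, IV second) is also just one possible choice.

```c
#include <stdint.h>

/* Hypothetical helper: bucket index in a table whose size is a power of two. */
static inline uint32_t
bucket_of(uint32_t hash, uint32_t nbuckets)
{
    return hash & (nbuckets - 1);
}

/*
 * Perturbation as described in the thread: xor the hash with the IV.
 * Two hashes with equal low bits still have equal low bits afterwards,
 * so keys that shared a bucket keep sharing one; only the bucket moves.
 */
static inline uint32_t
perturb_xor(uint32_t hash, uint32_t iv)
{
    return hash ^ iv;
}

/*
 * Assumed alternative: a Boost-style combine. The shifts and the addition
 * let higher hash bits and carries reach the low bits, so keys that shared
 * a bucket before can end up in different buckets.
 */
static inline uint32_t
hash_combine_sketch(uint32_t hash, uint32_t iv)
{
    hash ^= iv + UINT32_C(0x9e3779b9) + (hash << 6) + (hash >> 2);
    return hash;
}
```

With 16 buckets and an IV of 0x3, the hashes 0x05 and 0x15 both start in bucket 5. After perturb_xor they both land in bucket 6, so the collision survives. After hash_combine_sketch they land in buckets 8 and 4. Whether this amount of mixing is sufficient for IVs that differ in only a few bits is the open question in the thread; the sketch does not settle it.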