Re: PATCH: postpone building buckets to the end of Hash (in HashJoin) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PATCH: postpone building buckets to the end of Hash (in HashJoin)
Date
Msg-id 569EAF53.30202@2ndquadrant.com
In response to Re: PATCH: postpone building buckets to the end of Hash (in HashJoin)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: PATCH: postpone building buckets to the end of Hash (in HashJoin)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

On 01/19/2016 08:34 PM, Robert Haas wrote:
> On Mon, Jan 18, 2016 at 10:57 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>>>> If this doesn't regress performance in the case where the number of
>>>> buckets is estimated accurately to begin with, then I think this is
>>>> a great idea. Can you supply some performance tests results for that
>>>> case, and maybe some of the other cases also?
>>>
>>> I don't see how it could regress performance, and the benchmarks I've
>>> done confirm that. I'll do more thorough benchmarking and post the
>>> results here, but not now as this patch is in 2016-01 CF and I want to
>>> put all my time into reviewing patches from the open commitfest.
>>
>> I've finally got to do more thorough benchmarks, and surprisingly there
>> seems to be a minor regression.
> ...
>>
>> So even if we entirely skipped the bucket build, we would not recover the
>> difference. Not sure what's going on here.
>>
>> I've also done some profiling using perf, but I haven't really found
>> anything suspicious there.
>>
>> Any ideas what might be the cause of this?
>
> My guess is that memory locality is not as good with this patch.  Consider this:
>
> copyTuple->next = hashtable->buckets[bucketno];
>
> We've only just populated copytuple via memcpy, so that cache line is
> presumably in whatever place cache lines go when they are really hot.
> We basically make one pass over the allocated space for the hash
> table, filling in the hash tuples and linking things into bucket
> chains.  OTOH, with the patch, we don't do that on the first pass,
> just writing the tuples.  Then when we come back through we still have
> to do that part:
>
> hashTuple->next = hashtable->buckets[bucketno];
>
> By the time we're doing this, especially at larger work_mem settings,
> we've probably pushed those cache lines out of the faster levels of
> the CPU cache, and we have to pull them back in again, and that takes
> time.   If the tuples are small, we'll dirty every cache line twice;
> if they're larger, we might not dirty every cache line on the second
> pass, but just some fraction of them.
>
> I could be totally off base, but this is why I was concerned about the patch.

I can totally see why this would slow down the BuildBuckets function, 
but I don't quite see why it should make the other code significantly 
slower. Yet BuildBuckets takes just ~25ms, while the total duration 
increases by ~200ms (for the 1x10 dataset).

I do understand that calling BuildBuckets may affect the code executed 
after it, as it keeps other data in the cache, but I would not expect ~175ms.

Yet another thing is that BuildBuckets accesses the tuples in a mostly 
sequential way, so I'd expect prefetching to take care of that.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


