
From: Tomas Vondra <tv@fuzzy.cz>
Subject: Re: tweaking NTUP_PER_BUCKET
Msg-id: 53C2DECC.2010408@fuzzy.cz
In response to: Re: tweaking NTUP_PER_BUCKET (Tomas Vondra <tv@fuzzy.cz>)
List: pgsql-hackers
On 11.7.2014 19:25, Tomas Vondra wrote:
> 2) walking through the tuples sequentially
> ------------------------------------------
>
> The other option is not to walk the tuples through buckets, but by
> walking through the chunks - we know the tuples are stored as
> HashJoinTuple/MinimalTuple, so it shouldn't be that difficult. And by
> walking the tuples in the order they're stored, we may just rearrange
> the tuples in already allocated memory. And maybe it could be even
> faster than the current code, because of the sequential access to the
> memory (as opposed to the random access through buckets) and CPU caches.
> So I like this approach - it's simple, clean and shouldn't add any
> overhead (memory or CPU).

So, attached is a patch that implements this. It's not very complicated,
and the resulting code is surprisingly readable (IMHO comparable to the
previous code).

Basics of the patch:

(1) adds a HashChunkData structure - a buffer used for dense allocation
    of tuples, 16kB by default (see the sketch after this list)

(2) adds 'chunks_batch' into HashJoinTableData, which is a linked list
    of HashChunkData

(3) the allocation itself is done in chunk_alloc - in short, it either
    allocates the tuple in the current chunk, or adds a new chunk when
    the current one does not have enough free space

(4) ExecHashTableInsert calls chunk_alloc (instead of the regular
    MemoryContextAlloc)

(5) the main change is in ExecHashIncreaseNumBatches, which can no
    longer pfree individual tuples (that does not work with dense
    allocation), so it reads the tuples from the chunks directly and
    simply rebuilds the buckets from scratch, as sketched below (thanks
    to this we only need one more chunk of memory, not a complete copy,
    and we don't need to duplicate the buckets at all)
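
To make the chunk layout concrete, here's a minimal sketch of what the
structure and the allocator might look like. The names, the 16kB
constant and the chunk_alloc signature follow the description above,
but the details are my assumptions, not necessarily what the attached
patch does (e.g. this ignores tuples larger than the chunk size, which
a real implementation has to handle separately):

    /* one densely-packed buffer of tuples; chunks form a linked list */
    #define HASH_CHUNK_SIZE (16 * 1024)     /* 16kB, as described above */

    typedef struct HashChunkData
    {
        struct HashChunkData *next;     /* next chunk in chunks_batch */
        Size    maxlen;                 /* size of the data[] buffer */
        Size    used;                   /* bytes already handed out */
        int     ntuples;                /* tuples stored in this chunk */
        char    data[FLEXIBLE_ARRAY_MEMBER];
    } HashChunkData;

    /*
     * Allocate 'size' bytes, starting a new chunk if the current one
     * does not have enough free space (assumes size <= HASH_CHUNK_SIZE).
     */
    static void *
    chunk_alloc(HashJoinTable hashtable, Size size)
    {
        HashChunkData *chunk = hashtable->chunks_batch;
        char       *ptr;

        size = MAXALIGN(size);

        if (chunk == NULL || (chunk->maxlen - chunk->used) < size)
        {
            chunk = (HashChunkData *) MemoryContextAlloc(
                        hashtable->batchCxt,
                        offsetof(HashChunkData, data) + HASH_CHUNK_SIZE);
            chunk->maxlen = HASH_CHUNK_SIZE;
            chunk->used = 0;
            chunk->ntuples = 0;
            chunk->next = hashtable->chunks_batch;
            hashtable->chunks_batch = chunk;
        }

        ptr = chunk->data + chunk->used;
        chunk->used += size;
        chunk->ntuples += 1;

        return ptr;
    }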
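
And a sketch of the chunk walk in ExecHashIncreaseNumBatches from (5),
reinserting each stored tuple into the rebuilt buckets or dumping it to
the proper batch file. Again, this is just an illustration of the idea,
and the actual loop in the patch may differ in the details:

    HashChunkData *oldchunks = hashtable->chunks_batch;

    /* tuples we keep get copied into freshly allocated chunks */
    hashtable->chunks_batch = NULL;

    /* rebuild the buckets from scratch */
    memset(hashtable->buckets, 0,
           hashtable->nbuckets * sizeof(HashJoinTuple));

    while (oldchunks != NULL)
    {
        HashChunkData *nextchunk = oldchunks->next;
        Size        idx = 0;

        /* walk the tuples in the order they're stored in the chunk */
        while (idx < oldchunks->used)
        {
            HashJoinTuple hashTuple = (HashJoinTuple) (oldchunks->data + idx);
            MinimalTuple tuple = HJTUPLE_MINTUPLE(hashTuple);
            Size        tuplesize = HJTUPLE_OVERHEAD + tuple->t_len;
            int         bucketno;
            int         batchno;

            ExecHashGetBucketAndBatch(hashtable, hashTuple->hashvalue,
                                      &bucketno, &batchno);

            if (batchno == hashtable->curbatch)
            {
                /* still current batch: copy into new chunks, re-bucket */
                HashJoinTuple copy = (HashJoinTuple)
                    chunk_alloc(hashtable, tuplesize);

                memcpy(copy, hashTuple, tuplesize);
                copy->next = hashtable->buckets[bucketno];
                hashtable->buckets[bucketno] = copy;
            }
            else
            {
                /* belongs to a later batch: write it to the batch file */
                ExecHashJoinSaveTuple(tuple, hashTuple->hashvalue,
                                      &hashtable->innerBatchFile[batchno]);
            }

            idx += MAXALIGN(tuplesize);
        }

        /* everything in this chunk was copied or saved, so free it */
        pfree(oldchunks);
        oldchunks = nextchunk;
    }

This is where the memory savings come from - at any point we hold at
most one extra chunk, instead of a complete second copy of the tuples.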

From the tests I've done so far this seems to work perfectly - it still
saves the memory overhead, and I've seen no difference in performance.

The current patch only implements this for tuples in the main hash
table, not for the skew buckets. I plan to do that, but it will require
separate chunks for each skew bucket (so that a bucket can be removed
without messing with all of the others). The chunks for skew buckets
should be smaller (I'm thinking ~4kB), because there will be more of
them, but OTOH those buckets hold frequent values, so the chunks should
not remain mostly empty.

But let's see if the current patch misses something important first.


regards
Tomas

