Re: tweaking NTUP_PER_BUCKET - Mailing list pgsql-hackers

From Robert Haas
Subject Re: tweaking NTUP_PER_BUCKET
Date
Msg-id CA+TgmoZjpGaqT11o1+tSa6yKoMpn4GkGfnEVKjbeiej1E_FTsA@mail.gmail.com
In response to Re: tweaking NTUP_PER_BUCKET  ("Tomas Vondra" <tv@fuzzy.cz>)
Responses Re: tweaking NTUP_PER_BUCKET  ("Tomas Vondra" <tv@fuzzy.cz>)
List pgsql-hackers
On Tue, Jul 8, 2014 at 9:35 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
> On 8 July 2014, 14:49, Robert Haas wrote:
>> On Wed, Jul 2, 2014 at 8:13 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>>> I propose dynamic increase of the nbuckets (up to NTUP_PER_BUCKET=1)
>>> once the table is built and there's free space in work_mem. The patch
>>> mentioned above makes implementing this possible / rather simple.
>>
>> Another idea would be to start with NTUP_PER_BUCKET=1 and then, if we
>> run out of memory, increase NTUP_PER_BUCKET.  I'd like to think that
>> the common case is that work_mem will be large enough that everything
>> fits; and if you do it that way, then you save yourself the trouble of
>> rehashing later, which as you point out might lose if there are only a
>> few probes.  If it turns out that you run short of memory, you can
>> merge pairs of buckets up to three times, effectively doubling
>> NTUP_PER_BUCKET each time.
>
> Maybe. I'm not against setting NTUP_PER_BUCKET=1, but with large outer
> relations it may be way cheaper to use higher NTUP_PER_BUCKET values
> instead of increasing the number of batches (resulting in repeated scans
> of the outer table). I think it's important (but difficult) to keep these
> things somehow balanced.

Right, I think that's clear.  I'm just pointing out that you get to
decide: you can either start with a larger NTUP_PER_BUCKET and then
reduce it if you have enough memory left, or you can start with a smaller
NTUP_PER_BUCKET and then increase it if you run short of memory.

>> Yet another idea is to stick with your scheme, but do the dynamic
>> bucket splits lazily.  Set a flag on each bucket indicating whether or
>> not it needs a lazy split.  When someone actually probes the hash
>> table, if the flag is set for a particular bucket, move any tuples
>> that don't belong as we scan the bucket.  If we reach the end of the
>> bucket chain, clear the flag.
>
> I don't think a lazy scheme makes much sense here. There are two clearly
> separated phases - loading the table and search.
>
> Also, it seems to me it can't work as described. Say we build a table and
> then find out we need a table 4x the size. If I understand your proposal
> correctly, you'd just resize buckets array, copy the existing buckets
> (first 1/4 buckets) and set 'needs_split' flags. Correct?
>
> Well, the problem is that while scanning a bucket you already need to know
> which tuples belong there. But with the lazy approach, the tuples might
> still be in the old (not yet split) buckets. So you'd have to search the
> current bucket and all buckets that were not split yet. Or did I miss
> something?

If you have, say, buckets 0..63 and you decide to change it to buckets
0..255, then any tuple that isn't in bucket N has to be in bucket N %
64.  More generally, any time you expand or contract by a power of two
it's pretty easy to figure out where tuples have to go.

But even if you do something that's not a power of two, like say a 10x
compaction of the buckets, there's still only two buckets that can
contain any particular hash value.  If the hash value is H, the old
number of buckets is O, and the new number of buckets is N, then the
tuple has got to be in bucket H % O or bucket H % N.  If bucket H % O
doesn't have the needs-split flag set, then it must be in H % N (if it
exists at all).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


