Re: tweaking NTUP_PER_BUCKET - Mailing list pgsql-hackers

From Robert Haas
Subject Re: tweaking NTUP_PER_BUCKET
Date
Msg-id CA+TgmobMzPsA-N04=yX2jtMHeb3tGYw5HoZOE+Aq8PTHLoHYLw@mail.gmail.com
Whole thread Raw
In response to tweaking NTUP_PER_BUCKET  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: tweaking NTUP_PER_BUCKET
List pgsql-hackers
On Wed, Jul 2, 2014 at 8:13 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> I propose dynamic increase of the nbuckets (up to NTUP_PER_BUCKET=1)
> once the table is built and there's free space in work_mem. The patch
> mentioned above makes implementing this possible / rather simple.

Another idea would be to start with NTUP_PER_BUCKET=1 and then, if we
run out of memory, increase NTUP_PER_BUCKET.  I'd like to think that
the common case is that work_mem will be large enough that everything
fits; and if you do it that way, then you save yourself the trouble of
rehashing later, which as you point out might lose if there are only a
few probes.  If it turns out that you run short of memory, you can
merge pairs of buckets up to three times, effectively doubling
NTUP_PER_BUCKET each time.

Yet another idea is to stick with your scheme, but do the dynamic
bucket splits lazily.  Set a flag on each bucket indicating whether or
not it needs a lazy split.  When someone actually probes the hash
table, if the flag is set for a particular bucket, move any tuples
that don't belong as we scan the bucket.  If we reach the end of the
bucket chain, clear the flag.

Not sure either of these are better (though I kind of like the first
one) but I thought I'd throw them out there...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: "MauMau"
Date:
Subject: Re: [bug fix or improvement?] Correctly place DLLs for ECPG apps in bin folder
Next
From: "Tomas Vondra"
Date:
Subject: Re: tweaking NTUP_PER_BUCKET