Re: Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller
Date
Msg-id CA+TgmoYYXP34wwA7sQWLeTCaTfB5+NBAbZ6QoSpAu7W+46Xqfw@mail.gmail.com
In response to Re: Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Tue, Jun 10, 2014 at 1:43 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Jun 10, 2014 at 5:17 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> The problem case is when you have 1 batch and the increased memory
>> consumption causes you to switch to 2 batches.  That's expensive.  It
>> seems clear based on previous testing that *on the average*
>> NTUP_PER_BUCKET = 1 will be better, but in the case where it causes an
>> increase in the number of batches it will be much worse - particularly
>> because the only way we ever increase the number of batches is to
>> double it, which is almost always going to be a huge loss.
>
> Is there a reason we don't do hybrid hashing, where if 80% fits in memory
> then we write out only the 20% that doesn't? And then when probing the table
> with the other input, the 80% that land in in-memory buckets get handled
> immediately, and only the 20% that land in the on-disk buckets get written
> for the next step?

We have an optimization that is a little bit like that.  The "skew"
hash join stuff tries to (essentially) ensure that the most common
values (MCVs) end up in the first batch.

But more could probably be done.  For example, suppose we have 256
buckets.  If the hash table overflows work_mem, we could write the
contents of *one bucket* out to disk, rather than (as we currently do)
half of the table.  If we overflow again, we write another bucket.
When the number of buckets written reaches half the total, we split
all of the remaining buckets so that all 256 slots are once again
active.  Repeat as needed.
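A rough sketch of what that could look like, with entirely made-up names
(HashBucketState, spill_bucket, and so on are hypothetical, and the real
hash join code is organized quite differently).  Evicting the largest
resident bucket first is just one possible policy:

/*
 * Illustrative sketch only: when the in-memory hash table exceeds
 * work_mem, evict one bucket at a time to a temp file instead of
 * doubling nbatch and writing out half of the table.
 */
#include <stdbool.h>
#include <stddef.h>

#define NBUCKETS 256

typedef struct
{
    size_t  bytes;      /* memory currently used by this bucket's tuples */
    bool    on_disk;    /* has this bucket been written to a temp file?  */
} HashBucketState;

/* Hypothetical helper: write one bucket's tuples out and free the memory. */
static void
spill_bucket(HashBucketState *b)
{
    /* ... dump tuples to this bucket's temp file ... */
    b->on_disk = true;
    b->bytes = 0;
}

/* Hypothetical helper: rehash resident tuples over all 256 slots again. */
static void
resplit_buckets(HashBucketState buckets[NBUCKETS], int *nspilled)
{
    /* ... redistribute the remaining in-memory tuples; reset bookkeeping ... */
    (void) buckets;
    *nspilled = 0;
}

static void
shrink_to_work_mem(HashBucketState buckets[NBUCKETS],
                   size_t *mem_used, size_t work_mem_bytes, int *nspilled)
{
    while (*mem_used > work_mem_bytes)
    {
        /* Evict the largest bucket that is still in memory. */
        int victim = -1;

        for (int i = 0; i < NBUCKETS; i++)
            if (!buckets[i].on_disk &&
                (victim < 0 || buckets[i].bytes > buckets[victim].bytes))
                victim = i;

        if (victim < 0)
            break;                      /* everything already spilled */

        *mem_used -= buckets[victim].bytes;
        spill_bucket(&buckets[victim]);

        /* Once half the buckets are on disk, re-split the rest. */
        if (++(*nspilled) >= NBUCKETS / 2)
            resplit_buckets(buckets, nspilled);
    }
}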

If something like that worked out, it would drastically reduce the
penalty for slightly overrunning work_mem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


