Re: PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching
Date:
Msg-id: CA+TgmobEapmVBgH6LOafBEzZJkAk4pgj-OXwnT58xkoccetabg@mail.gmail.com
In response to: Re: PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching (Tomas Vondra <tv@fuzzy.cz>)
Responses: Re: PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching (Tomas Vondra <tv@fuzzy.cz>)
           Re: PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On Thu, Dec 11, 2014 at 5:46 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>>> The idea was that if we could increase the load a bit (e.g. using 2
>>> tuples per bucket instead of 1), we will still use a single batch in
>>> some cases (when we miss the work_mem threshold by just a bit). The
>>> lookups will be slower, but we'll save the I/O.
>>
>> Yeah.  That seems like a valid theory, but your test results so far
>> seem to indicate that it's not working out like that - which I find
>> quite surprising, but, I mean, it is what it is, right?
>
> Not exactly. My tests show that as long as the outer table batches fit
> into page cache, increasing the load factor results in worse performance
> than batching.
>
> When the outer table is "sufficiently small", the batching is faster.
>
> Regarding the "sufficiently small" - considering today's hardware, we're
> probably talking about gigabytes. On machines with significant memory
> pressure (forcing the temporary files to disk), it might be much lower,
> of course. It also depends on kernel settings (e.g.
> dirty_bytes/dirty_background_bytes).
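
To put rough numbers on that trade-off, here is a minimal standalone sketch.
It is not the actual ExecChooseHashTableSize() logic, and the figures and
names are invented for illustration: the inner tuples take the same space
regardless of the load factor, so allowing more tuples per bucket only
recovers the bucket-pointer array, which is why it can only absorb a small
overrun of work_mem.

/*
 * Standalone sketch of the estimate discussed above.  This is NOT the
 * actual ExecChooseHashTableSize() code; the figures and names are
 * invented for illustration.  Tuple storage is unaffected by the load
 * factor; only the bucket-pointer array shrinks.
 */
#include <stdio.h>

static size_t
sketch_hash_bytes(size_t ntuples, size_t tuple_width, int ntup_per_bucket)
{
    size_t nbuckets = (ntuples + ntup_per_bucket - 1) / ntup_per_bucket;

    /* tuple storage + one pointer per bucket (8 bytes on 64-bit builds) */
    return ntuples * tuple_width + nbuckets * sizeof(void *);
}

int
main(void)
{
    size_t work_mem = 256UL * 1024 * 1024;   /* 256MB, just as an example */
    size_t ntuples = 3000000;
    size_t tuple_width = 84;

    for (int load = 1; load <= 4; load *= 2)
    {
        size_t bytes = sketch_hash_bytes(ntuples, tuple_width, load);

        printf("%d tuple(s) per bucket: %zu bytes -> %s\n",
               load, bytes,
               bytes <= work_mem ? "single batch" : "needs batching");
    }
    return 0;
}

With these made-up figures, one tuple per bucket misses the 256MB limit by
roughly 8MB, while two tuples per bucket fit back under it, which is exactly
the case where the question is whether the slower lookups beat writing and
re-reading the outer batches.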

Well, this is sort of one of the problems with work_mem.  When we
switch to a tape sort, or a tape-based materialize, we're probably far
from out of memory.  But trying to set work_mem to the amount of
memory we have can easily result in a memory overrun if a load spike
causes lots of sessions to actually use that much memory at once.
So we have to
set work_mem conservatively, but then the costing doesn't really come
out right.  We could add some more costing parameters to try to model
this, but it's not obvious how to get it right.
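
For a rough sense of why the setting has to be conservative, a
back-of-the-envelope example.  The figures are made up, and the per-query
node count is only an assumption, since each sort or hash node in a plan can
use its own work_mem allowance:

#include <stdio.h>

int
main(void)
{
    /* Illustrative figures only, not a recommendation. */
    long work_mem_mb = 256;      /* what one big hash join would like */
    int  backends = 100;         /* concurrent queries during a spike */
    int  nodes_per_query = 2;    /* sorts/hashes per plan, assumed */

    long worst_case_mb = work_mem_mb * backends * nodes_per_query;

    printf("worst case: %ld MB (~%ld GB)\n",
           worst_case_mb, worst_case_mb / 1024);
    return 0;
}

With those numbers the worst case is about 50GB, so a value that would be
perfectly safe for one session's hash join has to be dialed way back, and
the costing then works from a limit that understates what is actually
available.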

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


