Re: PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching
Msg-id 646916973.4801168.1418220141970.JavaMail.yahoo@jws10090.mail.ne1.yahoo.com
In response to PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching  (Tomas Vondra <tv@fuzzy.cz>)
List pgsql-hackers
Tomas Vondra <tv@fuzzy.cz> wrote:

> back when we were discussing the hashjoin patches (now committed),
> Robert proposed that maybe it'd be a good idea to sometimes increase the
> number of tuples per bucket instead of batching.
>
> That is, when initially sizing the hash table - if a hash table with
> enough buckets to satisfy the NTUP_PER_BUCKET load factor does not fit
> into work_mem, try a somewhat higher load factor before starting to batch.
>
> The attached patch is an initial attempt to implement this - it's a bit
> rough around the edges, but hopefully enough to judge the benefit of the idea.
>
> The default load factor is 1. The patch tries to degrade this to 2, 4 or
> 8 in an attempt to fit the hash table into work_mem. If that doesn't
> work, it starts batching with the default load factor. If batching
> becomes necessary while the hash join is running, the load factor is
> switched back to the default (once we're batching, there's no point in
> keeping the slower hash table).
>
> The patch also modifies the EXPLAIN output to show the load factor.
>
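For readers following along, the fallback described above amounts to
something like the sketch below (identifiers and the memory accounting
here are simplified and hypothetical, not the actual patch): pick the
smallest load factor in {1, 2, 4, 8} at which the table fits, and fall
back to batching otherwise.

    #include <math.h>
    #include <stddef.h>

    /*
     * Simplified sketch only -- not the patch itself.  Given an estimated
     * tuple count and width and a memory budget, return the smallest
     * tuples-per-bucket value in {1, 2, 4, 8} whose hash table fits,
     * or 0 to signal that we must batch at the default load factor.
     */
    static int
    choose_load_factor(double ntuples, size_t tuple_width, size_t mem_budget)
    {
        int     ntup_per_bucket;

        for (ntup_per_bucket = 1; ntup_per_bucket <= 8; ntup_per_bucket *= 2)
        {
            size_t  nbuckets = (size_t) ceil(ntuples / ntup_per_bucket);
            size_t  space = nbuckets * sizeof(void *) +
                            (size_t) ntuples * tuple_width;

            if (space <= mem_budget)
                return ntup_per_bucket;     /* fits at this load factor */
        }
        return 0;                           /* doesn't fit even at 8: batch */
    }
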
> The testing I've done so far is rather disappointing, though.
>
>     create table a as select i from generate_series(1,1000000) s(i);
>     create table b as select i from generate_series(1,1000000) s(i);
>
>     analyze a;
>     analyze b;
>
>     select a.i, b.i from a join b on (a.i = b.i);
>
>     work_mem  batches    tuples per bucket      duration
>     -----------------------------------------------------
>     64 MB            1                    1        585 ms
>     46 MB            1                    2        639 ms
>     43 MB            1                    4        794 ms
>     40 MB            1                    8      1075 ms
>     39 MB            2                    1        623 ms
>
> So, even increasing the load factor to 2 is slower than batching.

Right, I saw the same thing.

I tried pretty hard to create a case where increasing the load
factor from 1 to 2 was faster than going to a second batch, and was
unable to do so. I hypothesized that the second batch was cached
by the OS, and that flipping the data in and out of the OS cache was
faster than chasing through the links. I expect that if you have a
hash table just large enough to barely exceed your machine's ability
to cache it, you *might* see a benefit in hash table access from
increasing the load factor. I think it would be incredibly hard to
identify those cases accurately, and every time a hash table was
used there would be a cost to trying to figure that out. I just
can't see this being a win.
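
To make the "chasing through the links" cost concrete, here is a much
simplified illustration (not PostgreSQL's actual executor code) of
probing a chained hash table: doubling the load factor roughly doubles
the number of pointer hops per probe, and each hop is a potential
cache miss.

    #include <stddef.h>

    /* Simplified illustration only -- not the executor's hash table. */
    typedef struct HashNode
    {
        struct HashNode *next;          /* next entry in the same bucket */
        unsigned int     hashvalue;     /* full hash of the stored tuple */
        /* tuple payload would follow here */
    } HashNode;

    static HashNode *
    probe_bucket(HashNode **buckets, size_t nbuckets, unsigned int hashvalue)
    {
        HashNode *node;

        /* Walk the bucket's chain; longer chains mean more hops per probe. */
        for (node = buckets[hashvalue % nbuckets]; node != NULL; node = node->next)
        {
            if (node->hashvalue == hashvalue)
                return node;            /* candidate; caller still compares keys */
        }
        return NULL;
    }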

> I'm not sure what the best way forward is - clearly, for cases that fit
> into RAM (temp files in the page cache), batching is faster. For tables
> large enough to cause a lot of I/O, it may make a difference - but I'm
> not sure how to identify those cases.
>
> So ISTM implementing this requires a reliable way to identify which case
> we're dealing with - whether the outer table is large enough (prefer
> increasing the load factor) or not (prefer batching). Until then, keeping
> the current simple/predictable approach is probably better.
>
> Of course, this also depends on additional variables (e.g. is the system
> memory-stressed?).

All that is on-target, but my take-away is that increasing the load
factor to avoid a second batch was an interesting idea that turns
out not to be a good one. If someone can craft a reproducible test
case that demonstrates a win for increasing the load factor, without
significant cost in the cases where it isn't a win, I might change
my opinion; but count me as a skeptic.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


