Re: PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching - Mailing list pgsql-hackers
From:           Kevin Grittner
Subject:        Re: PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching
Date:
Msg-id:         646916973.4801168.1418220141970.JavaMail.yahoo@jws10090.mail.ne1.yahoo.com
In response to: PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching (Tomas Vondra <tv@fuzzy.cz>)
List:           pgsql-hackers
Tomas Vondra <tv@fuzzy.cz> wrote:

> back when we were discussing the hashjoin patches (now committed),
> Robert proposed that maybe it'd be a good idea to sometimes increase
> the number of tuples per bucket instead of batching.
>
> That is, while initially sizing the hash table - if the hash table
> with enough buckets to satisfy the NTUP_PER_BUCKET load factor does
> not fit into work_mem, try a bit higher load factor before starting
> to batch.
>
> Attached patch is an initial attempt to implement this - it's a bit
> rough around the edges, but hopefully enough to judge the benefit of
> this.
>
> The default load factor is 1. The patch tries to degrade this to 2, 4
> or 8 in an attempt to fit the hash table into work_mem. If that
> doesn't work, it starts batching with the default load factor. If
> batching is required while the hashjoin is running, the load factor
> is switched back to the default (if we're batching, there's no point
> in keeping the slower hash table).
>
> The patch also modifies the explain output to show the load factor.
>
> The testing I've done so far is rather disappointing, though.
>
>     create table a as select i from generate_series(1,1000000) s(i);
>     create table b as select i from generate_series(1,1000000) s(i);
>
>     analyze a;
>     analyze b;
>
>     select a.i, b.i from a join b on (a.i = b.i);
>
>     work_mem   batches   tuples per bucket   duration
>     ------------------------------------------------
>     64 MB      1         1                    585 ms
>     46 MB      1         2                    639 ms
>     43 MB      1         4                    794 ms
>     40 MB      1         8                   1075 ms
>     39 MB      2         1                    623 ms
>
> So, even increasing the load factor to 2 is slower than batching.

Right, I saw the same thing.  I tried pretty hard to create a case
where increasing the load factor from 1 to 2 was faster than going to
a second batch, and was unable to do so.  I hypothesized that the
second batch was cached by the OS and flipping the data in and out of
the OS cache was faster than chasing through the links.
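[For readers following along, the heuristic Tomas describes above can be
sketched roughly as follows. This is an illustrative model only, not the
patch code: the per-tuple and per-bucket byte costs and the function name
are made-up assumptions, and the real executor accounts for memory quite
differently.]

```python
# Hypothetical sketch of the sizing heuristic described above: before
# spilling to a second batch, try load factors 2, 4, and 8 to see
# whether the hash table fits in work_mem.  The overhead constants are
# assumptions for illustration, not PostgreSQL's actual accounting.

TUPLE_OVERHEAD = 48    # assumed per-tuple memory cost, bytes
BUCKET_OVERHEAD = 8    # assumed per-bucket pointer cost, bytes

def choose_load_factor(ntuples, work_mem_bytes, default_ntup_per_bucket=1):
    """Return (ntup_per_bucket, must_batch) for a single-batch attempt."""
    for ntup_per_bucket in (default_ntup_per_bucket, 2, 4, 8):
        nbuckets = max(1, ntuples // ntup_per_bucket)
        needed = ntuples * TUPLE_OVERHEAD + nbuckets * BUCKET_OVERHEAD
        if needed <= work_mem_bytes:
            return ntup_per_bucket, False
    # Nothing fit even at load factor 8: fall back to batching at the
    # default load factor, since a denser (slower) table buys nothing
    # once we are batching anyway.
    return default_ntup_per_bucket, True
```

Under these toy constants, a million tuples fit at load factor 1 in
64 MB, need load factor 2 at 50 MB, and force batching at 40 MB - the
same general shape as the benchmark above.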
I expect that if you have a hash table large enough to barely exceed
your machine's ability to cache it, you *might* see a benefit in hash
table access from increasing the load factor.  I think it would be
incredibly hard to accurately identify those cases, and every time a
hash table was used there would be a cost to trying to figure it out.
I just can't see this being a win.

> I'm not sure what's the best way forward - clearly, for cases that
> fit into RAM (temp files in page cache), batching is faster.  For
> tables large enough to cause a lot of I/O it may make a difference -
> but I'm not sure how to identify these cases.
>
> So ISTM implementing this requires a reliable way to identify which
> case we're dealing with - whether the outer table is large enough
> (prefer increasing the load factor) or not (prefer batching).  Until
> then, keeping the current simple/predictable approach is probably
> better.
>
> Of course, this also depends on additional variables (e.g. is the
> system memory-stressed?).

All that is on-target, but my take-away is that increasing the load
factor to avoid a second batch was an interesting idea that turns out
not to really be a good one.  If someone can craft a reproducible test
case that demonstrates a win for increasing the load factor without
significant cost for the cases where it isn't a win, I might change my
opinion; but count me as a skeptic.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company