
From Tomas Vondra
Subject PATCH: hashjoin - gracefully increasing NTUP_PER_BUCKET instead of batching
Date
Msg-id 5483C4AC.5060107@fuzzy.cz
Hi,

Back when we were discussing the hashjoin patches (now committed),
Robert proposed that it might sometimes be a good idea to increase the
number of tuples per bucket instead of batching.

That is, when initially sizing the hash table: if a hash table with
enough buckets to satisfy the NTUP_PER_BUCKET load factor does not fit
into work_mem, try a somewhat higher load factor before resorting to
batching.

The attached patch is an initial attempt to implement this - it's a bit
rough around the edges, but hopefully good enough to judge the benefit.

The default load factor is 1. The patch tries to degrade this to 2, 4 or
8 in an attempt to fit the hash table into work_mem. If even that doesn't
work, it starts batching with the default load factor. If batching
becomes necessary while the hashjoin is running, the load factor is
switched back to the default (once we're batching, there's no point in
keeping the slower hash table).
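
To illustrate the decision, here's a minimal standalone sketch of that
sizing logic (this is not the actual patch - the function names, the
simplified memory formula and the 40B tuple width are just assumptions
for the example):

    /*
     * Standalone sketch of the load-factor degradation described above.
     * Not the actual patch: hash_table_bytes(), choose_ntup_per_bucket()
     * and the memory formula are made up for illustration.
     */
    #include <stdio.h>

    #define NTUP_PER_BUCKET     1   /* default load factor */
    #define MAX_NTUP_PER_BUCKET 8   /* degrade at most to 8 tuples/bucket */

    /* rough estimate: bucket array (one pointer each) plus tuple storage */
    static double
    hash_table_bytes(double ntuples, double tuple_width, int ntup_per_bucket)
    {
        double nbuckets = ntuples / ntup_per_bucket;

        return nbuckets * sizeof(void *) + ntuples * tuple_width;
    }

    /*
     * Try load factors 1, 2, 4 and 8 in turn; return the first one that
     * fits into work_mem, or 0 to signal "batch with the default".
     */
    static int
    choose_ntup_per_bucket(double ntuples, double tuple_width,
                           double work_mem_bytes)
    {
        int ntup;

        for (ntup = NTUP_PER_BUCKET; ntup <= MAX_NTUP_PER_BUCKET; ntup *= 2)
        {
            if (hash_table_bytes(ntuples, tuple_width, ntup) <= work_mem_bytes)
                return ntup;
        }

        return 0;
    }

    int
    main(void)
    {
        /* 1M tuples, assumed ~40B each, 43MB of work_mem */
        int ntup = choose_ntup_per_bucket(1000000, 40.0,
                                          43.0 * 1024 * 1024);

        if (ntup > 0)
            printf("single batch, %d tuple(s) per bucket\n", ntup);
        else
            printf("batching with the default load factor\n");

        return 0;
    }

The real code of course works with the planner's row and width
estimates, and also has to handle the case where batching kicks in
later at run time.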

The patch also modifies the EXPLAIN output to show the load factor.

The results of the testing I've done so far are rather disappointing,
though.


    create table a as select i from generate_series(1,1000000) s(i);
    create table b as select i from generate_series(1,1000000) s(i);

    analyze a;
    analyze b;

    select a.i, b.i from a join b on (a.i = b.i);

    work_mem   batches    tuples per bucket      duration
    -----------------------------------------------------
    64 MB            1                    1        585 ms
    46 MB            1                    2        639 ms
    43 MB            1                    4        794 ms
    40 MB            1                    8       1075 ms
    39 MB            2                    1        623 ms

So, even increasing the load factor to 2 is slower than batching.

Of course, with other examples the results may differ. For example,
with a much larger outer table:

    create table a as select mod(i,1000000) i
                        from generate_series(1,10000000) s(i);
    analyze a;

    work_mem   batches    tuples per bucket      duration
    -----------------------------------------------------
    64 MB            1                    1       3904 ms
    46 MB            1                    2       4692 ms
    43 MB            1                    4       6499 ms
    40 MB            1                    8       9264 ms
    39 MB            2                    1       4054 ms

The same pattern. Of course, for "huge" outer tables it will make a
difference. For example, on a ~8GB outer table (on a machine with 8GB of
RAM), batching causes enough I/O to make it slower than ntup=2 (50s
vs. 80s, for example).

I'm not sure what the best way forward is - clearly, for cases that fit
into RAM (where the temp files stay in the page cache), batching is
faster. For tables large enough to cause a lot of I/O, a higher load
factor may make a difference - but I'm not sure how to identify these
cases.

So ISTM implementing this requires a reliable way to identify which case
we're dealing with - whether the outer table is large enough (prefer
increasing the load factor) or not (prefer batching). Until then,
keeping the current simple/predictable approach is probably better.

Of course, this also depends on additional variables (e.g. is the system
memory-stressed?).

regards
Tomas

