Re: queries with DISTINCT / GROUP BY giving different plans - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: queries with DISTINCT / GROUP BY giving different plans
Msg-id 5213ADB1.3040807@fuzzy.cz
In response to Re: queries with DISTINCT / GROUP BY giving different plans  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: queries with DISTINCT / GROUP BY giving different plans
List pgsql-performance
On 20.8.2013 18:24, Tom Lane wrote:
> Tomas Vondra <tv@fuzzy.cz> writes:
>> I've managed to get the data to a different machine, and I've spent
>> some time on debugging it.
>
> Great, thanks for looking into it!
>
>> It seems that the difference is in evaluating hashentrysize [
>> choose_hashed_distinct omits hash_agg_entry_size() ] but the
>> hashentrysize is 24 (choose_hashed_distinct) or 56
>> (choose_hashed_grouping). As a result, _distinct evaluates the
>> work_mem condition as false while _grouping evaluates it as true
>> (and thus returns false).
>
> Hah.
>
>> Now, the difference between 24 and 56 is caused by
>> hash_agg_entry_size. It's called with numAggs=0 but returns 32. I'm
>> wondering if it should return 0 in such cases, i.e. something like
>> this:
>
> No, I don't think so.  I'm pretty sure the reason
> choose_hashed_distinct is like that is that I subconsciously assumed
> hash_agg_entry_size would produce zero for numAggs = 0; but in fact
> it does not and should not, because there's still some overhead for
> the per-group hash entry whether or not there's any aggregates.  So
> the right fix is that choose_hashed_distinct should add
> hash_agg_entry_size(0) onto its hashentrysize estimate.
>
> A separate issue is that the use of numAggs-1 in
> hash_agg_entry_size's calculations seems a bit risky if numAggs can
> be zero - I'm not sure we can rely on compilers to get that right.
> I'm inclined to replace that with use of offsetof.  Likewise in
> build_hash_table.
>
>> I've tested that after this both queries use HashAggregate (which
>> is the right choice), but I haven't done any extensive checking so
>> maybe I'm missing something.
>
> It might be the preferable choice in this example, but you're looking
> at an edge case.  If you want the thing to be using a hash aggregate
> for this size of problem, you should increase work_mem.

Hmmm. I think the main 'issue' here is that the two queries behave quite
differently even though it seems like they should do the same thing
(well, I understand they're not strictly equivalent).

We're already using work_mem='800MB' so there's not much room to
increase it. Actually, this is probably the main reason we haven't seen
this issue more often: the other datasets are smaller (though that won't
last long, given their steady growth).

A complete explain analyze for the HashAggregate plan is available here:
http://explain.depesz.com/s/jCO The estimates seem to be pretty exact,
except for the very last step:

 HashAggregate  (cost=1399795.00..1604305.06 rows=20451006 width=8)
                (actual time=13985.580..14106.708 rows=355600 loops=1)

So the estimate is ~60x higher than the actual value. That still
happens to work for choose_hashed_distinct (because it uses a much
lower value for hashentrysize), but for choose_hashed_grouping it
pushes the estimated hash table size above the work_mem threshold.

But then again, the actual number of rows is much lower than the
estimate, so the amount of memory actually used stays well within
work_mem and does not cause any OOM trouble.

So I don't think increasing work_mem is a good long-term solution here,
because the main problem is the estimate. Another sign that I should
probably start working on the multi-column indexes I've been planning
for a long time ...

Anyway, I still don't understand why the same logic around
hash_agg_entry_size shouldn't apply to choose_hashed_grouping as well.
It would make it slower in this particular corner case, but wouldn't it
be more correct?

Tomas

