Re: Macro customizable hashtable / bitmapscan & aggregation perf - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: Macro customizable hashtable / bitmapscan & aggregation perf
Msg-id: dbefd9cf-bfd8-2bcc-0feb-e8e972aacf9b@2ndquadrant.com
In response to: Re: Macro customizable hashtable / bitmapscan & aggregation perf (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On 10/11/2016 05:56 PM, Andres Freund wrote:
> On 2016-10-11 04:29:31 +0200, Tomas Vondra wrote:
>> On 10/11/2016 04:07 AM, Andres Freund wrote:
>>> On 2016-10-10 17:46:22 -0700, Andres Freund wrote:
>>>>> TPC-DS (tpcds.ods)
>>>>> ------------------
>>>>>
>>>>> In this case, I'd say the results are less convincing. There are quite a few
>>>>> queries that got slower by ~10%, which is well above the noise - for example
>>>>> queries 22 and 67. There are of course queries that got ~10% faster, and in
>>>>> total the patched version executed more queries (so overall the result is
>>>>> slightly positive, but not significantly so).
>>>>
>>>> That's interesting. I wonder whether that's plan changes just due to the
>>>> changing memory estimates, or what's causing that. I'll look into it.
>>>
>>> Hm. Based on an initial look those queries aren't planned with any of
>>> the affected codepaths.  Could this primarily be a question of
>>> randomness? Would it perhaps make sense to run the tests in a comparable
>>> order? Looking at tpcds.py and the output files, it seems that the query
>>> order differs between the runs, which can easily explain bigger
>>> differences than the above. For me a scale=1 run creates a database of
>>> approximately 4.5GB, thus with shared_buffers=1GB execution order is
>>> likely to have a significant performance impact.
>>>
>>
>> Yes, I see similar plans (no bitmap index scans or hash aggregates). But the
>> difference is there, even when running the query alone (so it's not merely
>> due to the randomized ordering).
>
>> I wonder whether this is again due to compiler moving stuff around.
>
> It looks like that. I looked through a significant set of plans where
> there were time differences (generated on my machine), and none of them
> used bitmap scans or hashed grouping to any significant degree.  Comparing
> profiles in those cases usually shows a picture like:
>     24.98%   +0.32%  postgres            [.] slot_deform_tuple
>     16.58%   -0.05%  postgres            [.] ExecMakeFunctionResultNoSets
>     12.41%   -0.01%  postgres            [.] slot_getattr
>      6.10%   +0.39%  postgres            [.] heap_getnext
>      4.41%   -0.37%  postgres            [.] ExecQual
>      3.08%   +0.12%  postgres            [.] ExecEvalScalarVarFast
>      2.85%   -0.11%  postgres            [.] check_stack_depth
>      2.48%   +0.42%  postgres            [.] ExecEvalConst
>      2.44%   -0.33%  postgres            [.] heapgetpage
>      2.34%   +0.11%  postgres            [.] ExecScan
>      2.14%   -0.20%  postgres            [.] ExecStoreTuple
>
> I.e. pretty random performance changes. This indeed looks like binary
> layout changes.  Looking at these plans and at profiles spanning a run
> of all queries shows that bitmap scans and hash aggregations, while
> present, account for a very small amount of time in total. So tpc-ds
> doesn't look particularly interesting for evaluating these patches - but
> very interesting for my slot deforming and qual evaluation patches.
>

Meh, that's annoying. Anyway, the reason why many of the TPC-DS queries 
can't really benefit from the patch is that a lot of them use grouping 
sets, and we only have a sort-based implementation for those.
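To illustrate (a minimal sketch on a made-up table, not one of the 
TPC-DS tables): a GROUPING SETS aggregate is currently always planned 
on top of a sort, so these queries never hit the hashing paths the 
patch touches:

    -- hypothetical table; any grouping-sets query behaves the same
    EXPLAIN (COSTS OFF)
    SELECT store, item, sum(amount)
      FROM daily_sales
     GROUP BY GROUPING SETS ((store), (item), ());

    -- plan shape (roughly): GroupAggregate over Sort, no HashAggregate
    --   GroupAggregate
    --     Group Key: store
    --     Group Key: ()
    --     Sort Key: item
    --       Group Key: item
    --     ->  Sort
    --           Sort Key: store
    --           ->  Seq Scan on daily_sales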

I wonder whether the TPC-H differences are also affected by this ...

> Btw, query_14.sql as generated by your templates (in pgperffarm) doesn't
> seem to work here. And I never had the patience to run query_1.sql to
> completion... Looks like we could very well use some planner
> improvements here.
>

Yeah, planner improvements would be great ;-)

Query 14 needs an explicit cast to text, otherwise you get something 
like a "can't cast unknown to text" error. Not sure whether that's an 
expected issue or not, but the cast fixes it. I haven't pushed that fix 
to the pgperffarm repo yet, along with several other tooling fixes.
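For reference, it's the usual problem with string literals keeping the 
"unknown" type inside a subquery; a minimal sketch of the error and the 
fix (a made-up example, not the actual query_14.sql text):

    -- a string literal in a subquery keeps type "unknown", and the
    -- outer query can then fail to coerce the column to text:
    SELECT upper(channel)
      FROM (SELECT 'store' AS channel) s;
    -- ERROR:  failed to find conversion function from unknown to text

    -- an explicit cast pins the column type and the query works:
    SELECT upper(channel)
      FROM (SELECT 'store'::text AS channel) s;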

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


