Re: Macro customizable hashtable / bitmapscan & aggregation perf - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Macro customizable hashtable / bitmapscan & aggregation perf |
Date | |
Msg-id | 20161011155625.kexqknqvls5yii3b@alap3.anarazel.de |
In response to | Re: Macro customizable hashtable / bitmapscan & aggregation perf (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses | Re: Macro customizable hashtable / bitmapscan & aggregation perf |
List | pgsql-hackers |
On 2016-10-11 04:29:31 +0200, Tomas Vondra wrote:
> On 10/11/2016 04:07 AM, Andres Freund wrote:
> > On 2016-10-10 17:46:22 -0700, Andres Freund wrote:
> > > > TPC-DS (tpcds.ods)
> > > > ------------------
> > > >
> > > > In this case, I'd say the results are less convincing. There are quite a few
> > > > queries that got slower by ~10%, which is well above - for example queries
> > > > 22 and 67. There are of course queries that got ~10% faster, and in total
> > > > the patched version executed more queries (so overall the result is slightly
> > > > positive, but not significantly).
> > >
> > > That's interesting. I wonder whether that's plan changes just due to the
> > > changing memory estimates, or what's causing that. I'll look into it.
> >
> > Hm. Based on an initial look those queries aren't planned with any of
> > the affected codepaths. Could this primarily be a question of
> > randomness? Would it perhaps make sense to run the tests in a comparable
> > order? Looking at tpcds.py and the output files, it seems that the query
> > order differs between the runs, which can easily explain bigger
> > differences than the above. For me a scale=1 run creates a database of
> > approximately 4.5GB, thus with shared_buffers=1GB execution order is
> > likely to have a significant performance impact.
>
> Yes, I see similar plans (no bitmap index scans or hash aggregates). But the
> difference is there, even when running the query alone (so it's not merely
> due to the randomized ordering).
>
> I wonder whether this is again due to compiler moving stuff around.

It looks like that. I looked through a significant set of plans where
there were time differences (generated on my machine), and none of them
used bitmap scans or hash groupings to any significant degree.

Comparing profiles in those cases usually shows a picture like:

    24.98%  +0.32%  postgres  [.] slot_deform_tuple
    16.58%  -0.05%  postgres  [.] ExecMakeFunctionResultNoSets
    12.41%  -0.01%  postgres  [.] slot_getattr
     6.10%  +0.39%  postgres  [.] heap_getnext
     4.41%  -0.37%  postgres  [.] ExecQual
     3.08%  +0.12%  postgres  [.] ExecEvalScalarVarFast
     2.85%  -0.11%  postgres  [.] check_stack_depth
     2.48%  +0.42%  postgres  [.] ExecEvalConst
     2.44%  -0.33%  postgres  [.] heapgetpage
     2.34%  +0.11%  postgres  [.] ExecScan
     2.14%  -0.20%  postgres  [.] ExecStoreTuple

I.e. pretty random performance changes. This indeed looks like binary
layout changes.

Looking at these plans, and at profiles spanning a run of all queries,
shows that bitmap scans and hash aggregations, while present, account
for a very small amount of the total time. So TPC-DS doesn't look
particularly interesting for evaluating these patches - but it looks
very interesting for my slot deforming and qual evaluation patches.

Btw, query_14.sql as generated by your templates (in pgperffarm)
doesn't seem to work here. And I never had the patience to run
query_1.sql to completion... Looks like we could very well use some
planner improvements here.

Greetings,

Andres Freund
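To make the "comparable order" suggestion above concrete, here is a minimal sketch of a fixed-order benchmark driver. This is hypothetical code, not the actual tpcds.py from pgperffarm: it assumes psycopg2 as the driver, query files named query_1.sql through query_99.sql in the working directory, and a single SELECT statement per file. Seeding the shuffle means every run, against both the master and the patched build, executes the queries in the same order, so buffer-cache state stays comparable between runs.

```python
# Hypothetical sketch of a fixed-order TPC-DS driver (not the real
# tpcds.py). With a ~4.5GB scale=1 database and shared_buffers=1GB,
# which relations are cached depends heavily on execution order, so a
# seeded (i.e. repeatable) order removes that source of noise.
import random
import time

import psycopg2  # assumed driver; the actual harness may use another

QUERIES = ["query_%d.sql" % i for i in range(1, 100)]  # templates 1..99


def run_queries(dsn, seed=42):
    # Same seed => same shuffled order on every run and every build.
    order = list(QUERIES)
    random.Random(seed).shuffle(order)

    conn = psycopg2.connect(dsn)
    conn.autocommit = True
    with conn.cursor() as cur:
        for fname in order:
            with open(fname) as f:
                sql = f.read()
            t0 = time.monotonic()
            cur.execute(sql)   # assumes one SELECT per file
            cur.fetchall()     # drain the result set
            print("%s: %.1f ms" % (fname, 1000 * (time.monotonic() - t0)))
    conn.close()


if __name__ == "__main__":
    run_queries("dbname=tpcds")
```

Running this once against each build and diffing the printed per-query timings would help separate genuine plan or execution regressions from ordering and caching noise.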