On Tue, Sep 01, 2020 at 03:09:14PM -0700, Peter Geoghegan wrote:
>On Tue, Sep 1, 2020 at 2:09 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>> >* Instead of changing the order directly, now patch creates another patch with
>> > modifier order of clauses. It does so for the normal sort as well as for
>> > incremental sort. The whole thing is done in two steps: first it finds a
>> > potentially better ordering taking into account number of groups, widths and
>> > comparison costs; afterwards this information is used to produce a cost
>> > estimation. This is implemented via a separate create_reordered_sort_path to
>> > not introduce too many changes, I couldn't find any better place.
>> >
>>
>> I haven't tested the patch with any queries, but I agree this seems like
>> the right approach in general.
>
>If we're creating a new sort path anyway, then perhaps we can also
>change the collation -- it might be possible to "reduce" it to the "C"
>collation without breaking queries.
>
>This is admittedly pretty hard to do well. It could definitely work
>out when we have to do a sort anyway -- a sort with high cardinality
>abbreviated keys will be very fast (though we can't use abbreviated
>keys with libc collations right now). OTOH, it would be quite
>counterproductive if we were hoping to get an incremental sort that
>used some available index that happens to use the default collation
>(which is not the C collation in cases where this optimization is
>expected to help).
>
Even if reducing collations like this was possible (I have no idea how
tricky it is, my knowledge of collations is pretty minimal and from what
I know I'm not dying to learn more), I suggest we consider that out of
scope for this particular patch.
There are multiple open issues already - deciding which pathkeys are
interesting, reasonable costing, etc. Once those issues are solved, we
can consider tweaking collations as an additional optimizations.
Or maybe we can consider it entirely separately, i.e. why would it
matter if we re-order the GROUP BY keys? The collation reduction can
just as well help even if we use the same pathkeys.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services