Re: Group by reordering optimization - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Group by reordering optimization
Date	September 2, 2020 16:12:01
Msg-id	20200902161201.ogkjrpf232a7htn3@development
In response to	Re: Group by reordering optimization (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: POC: GROUP BY optimization
List	pgsql-hackers

On Tue, Sep 01, 2020 at 03:09:14PM -0700, Peter Geoghegan wrote:
>On Tue, Sep 1, 2020 at 2:09 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>> >* Instead of changing the order directly, now patch creates another path with
>> >  modified order of clauses. It does so for the normal sort as well as for
>> >  incremental sort. The whole thing is done in two steps: first it finds a
>> >  potentially better ordering taking into account number of groups, widths and
>> >  comparison costs; afterwards this information is used to produce a cost
>> >  estimation. This is implemented via a separate create_reordered_sort_path to
>> >  not introduce too many changes, I couldn't find any better place.
>> >
>>
>> I haven't tested the patch with any queries, but I agree this seems like
>> the right approach in general.
>
>If we're creating a new sort path anyway, then perhaps we can also
>change the collation -- it might be possible to "reduce" it to the "C"
>collation without breaking queries.
>
>This is admittedly pretty hard to do well. It could definitely work
>out when we have to do a sort anyway -- a sort with high cardinality
>abbreviated keys will be very fast (though we can't use abbreviated
>keys with libc collations right now). OTOH, it would be quite
>counterproductive if we were hoping to get an incremental sort that
>used some available index that happens to use the default collation
>(which is not the C collation in cases where this optimization is
>expected to help).
>

Even if reducing collations like this was possible (I have no idea how
tricky it is, my knowledge of collations is pretty minimal and from what
I know I'm not dying to learn more), I suggest we consider that out of
scope for this particular patch.

There are multiple open issues already - deciding which pathkeys are
interesting, reasonable costing, etc. Once those issues are solved, we
can consider tweaking collations as an additional optimization.
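
To illustrate why the order of GROUP BY keys matters for costing at all,
here is a minimal sketch in Python. It is not taken from the patch and is
not its cost model: the function names, the row layout and the cardinalities
(2 vs. roughly 1000 distinct values) are assumptions made up for the
example. It only counts how many single-column comparisons a sort performs
when the low-cardinality or the high-cardinality column comes first.

```python
import random
from functools import cmp_to_key


def count_column_comparisons(rows, key_order):
    """Sort rows by the given column order; return the number of
    single-column comparisons the sort had to perform."""
    comparisons = 0

    def compare(left, right):
        nonlocal comparisons
        for col in key_order:
            comparisons += 1
            if left[col] != right[col]:
                return -1 if left[col] < right[col] else 1
        return 0  # all key columns are equal

    sorted(rows, key=cmp_to_key(compare))
    return comparisons


def make_rows(n=2000, seed=0):
    """Hypothetical data: column 0 has 2 distinct values,
    column 1 has about 1000 distinct values."""
    rng = random.Random(seed)
    return [(rng.randrange(2), rng.randrange(1000)) for _ in range(n)]


rows = make_rows()
low_cardinality_first = count_column_comparisons(rows, (0, 1))
high_cardinality_first = count_column_comparisons(rows, (1, 0))
print(low_cardinality_first, high_cardinality_first)
```

With the high-cardinality column first, most tuple comparisons are decided
by the first column, so fewer column comparisons are needed overall. A real
cost estimate would also have to weigh column widths and per-type comparison
costs, which this sketch ignores.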

Or maybe we can consider it entirely separately, i.e. why would it
matter if we re-order the GROUP BY keys? The collation reduction can
just as well help even if we use the same pathkeys.
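
A small Python sketch of why this could be independent of key order: grouping
only needs equal values to end up adjacent, not any particular linguistic
order. The `linguistic_key` function below is a hypothetical stand-in for a
deterministic non-C collation (it is not how PostgreSQL or libc compare
strings), and `c_key` stands in for the "C" collation by comparing raw bytes.
The sketch assumes equality means byte-wise equality, which would not hold
for nondeterministic collations.

```python
from itertools import groupby


def linguistic_key(s):
    # Hypothetical stand-in for a deterministic non-C collation:
    # case-insensitive order, ties broken by the raw bytes.
    return (s.casefold(), s.encode("utf-8"))


def c_key(s):
    # Stand-in for the "C" collation: plain byte order.
    return s.encode("utf-8")


def group_counts(values, sort_key):
    """Sort-based grouping: sort, then count runs of adjacent equal values."""
    ordered = sorted(values, key=sort_key)
    return {value: len(list(group)) for value, group in groupby(ordered)}


values = ["b", "A", "a", "B", "a", "é", "e", "A", "b"]
linguistic_groups = group_counts(values, linguistic_key)
c_groups = group_counts(values, c_key)

# The two sort orders differ, but the resulting groups are the same.
print(sorted(values, key=linguistic_key) != sorted(values, key=c_key))
print(linguistic_groups == c_groups)
```

The output order of the groups differs between the two keys, so a reduction
like this would only be safe where nothing downstream depends on the
original collation's ordering.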


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
