On 30/06/18 03:03, Tomas Vondra wrote:
> On 06/29/2018 04:51 PM, Teodor Sigaev wrote:
>>
>>>> I tried to attack the cost_sort() issues and hope on that basis we
>>>> can solve problems with 0002 patch and improve incremental sort patch.
>>>>
>>>
>>> OK, will do. Thanks for working on this!
>>
>> I hope we now have a better cost_sort(). The obvious approach is to
>> try all combinations of pathkeys in get_cheapest_group_keys_order()
>> and choose the cheapest one by cost_sort().
>
>> But that requires N! operations, which could be very expensive for a
>> large number of pathkeys, and it doesn't solve the issue with
>> user-knows-what-he-does pathkeys.
>
> Not sure. There are N! combinations, but this seems like a good
> candidate for backtracking [1]. You don't have to enumerate and
> evaluate all N! combinations, just construct one and then abandon
> whole classes of combinations as soon as they get more expensive than
> the currently best one. That's possible thanks to the additive nature
> of the comparison costing: appending a column to the sort key can only
> make it more expensive. My guess is this will make this a non-issue.
>
> [1] https://en.wikipedia.org/wiki/Backtracking
>
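To illustrate, here is a minimal Python sketch of that pruned search. The
cost function is a stand-in for cost_sort() (it just takes the running
cost of a prefix plus the increment for one more column); the names are
made up for the example, not taken from the patch:

```python
def best_order(columns, col_cost):
    """Branch-and-bound over column orderings.

    col_cost(prefix, col) returns the incremental cost of appending
    `col` to the already-chosen `prefix`.  Because the total is a sum
    of such increments, the cost of a partial ordering only grows as
    columns are appended, so any prefix already costlier than the best
    complete ordering can be abandoned without enumerating its N!
    completions.
    """
    best = {"cost": float("inf"), "order": None}

    def extend(prefix, remaining, cost):
        # Prune: appending more columns can only increase the cost.
        if cost >= best["cost"]:
            return
        if not remaining:
            best["cost"], best["order"] = cost, prefix
            return
        for col in remaining:
            extend(prefix + [col], remaining - {col},
                   cost + col_cost(prefix, col))

    extend([], frozenset(columns), 0.0)
    return best["order"], best["cost"]
```

With a toy cost where expensive columns should sort first, the search
finds the optimum while skipping most of the N! orderings.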
>>
>> We could suggest an order of pathkeys as the patch does now, and if
>> the cost_sort() estimate is less than 80% (arbitrarily chosen) of the
>> cost of the user-suggested pathkeys, use ours, else the user's.
>>
>
> I really despise such arbitrary thresholds. I'd much rather use a more
> reliable heuristic by default, even if it gets it wrong in some cases
> (which it will, but that's natural).
>
> regards
>
Additionally, how about putting an upper limit on the number of
combinations to check, fairly large by default?
If that limit is exceeded, we could consider checking a few more
orderings selected at random from those not yet checked, to avoid any
bias caused by cutting off a systematic search partway through. This
might prove important when N! is fairly large.
Cheers,
Gavin