Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: POC: GROUP BY optimization
Msg-id 20200514235220.xewrrwjvatxzn3g6@development
In response to Re: POC: GROUP BY optimization  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: POC: GROUP BY optimization  (Dmitry Dolgov <9erthalion6@gmail.com>)
Re: POC: GROUP BY optimization  (Pavel Borisov <pashkin.elfe@gmail.com>)
List pgsql-hackers
Hi,

I wonder if anyone has plans to try again with this optimization in the
v14 cycle? The patches no longer apply thanks to the incremental sort
patch, but I suppose fixing that should not be extremely hard.

The 2020-07 CF is still a couple of weeks away, but it'd be good to know if
there are any plans to revive this. I'm willing to spend some time on
reviewing / testing this, etc.


I've only quickly skimmed the old thread, but IIRC there were two main
challenges in getting the optimization right:


1) deciding which orderings are interesting / worth additional work

I think we need to consider these orderings, in addition to the one
specified in GROUP BY:

a) as specified in ORDER BY (if different from the GROUP BY ordering)

b) one with cheapest sort on unsorted input (depending on number of
distinct values, cost of comparisons, etc.)

c) one with cheapest sort on partially sorted input (essentially what we
do with the incremental sort paths, but matching the pathkeys in a more
elaborate way)
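
To make the list above a bit more concrete, here is a rough sketch of
how the candidate orderings might be enumerated. This is only an
illustration in Python, not code from the patch: the function names
(reorder_by_prefix, candidate_orderings) and the representation of keys
as plain strings are assumptions made for the example, and it covers
only the ORDER BY and partially-sorted-input candidates. The cost-based
candidate (cheapest sort on unsorted input) is left out here.

```python
def reorder_by_prefix(group_keys, preferred):
    """Reorder GROUP BY keys so the leading run of `preferred` comes first.

    Only the prefix of `preferred` made up of GROUP BY keys is usable;
    the first key outside the GROUP BY list ends the match.
    """
    prefix = []
    for key in preferred:
        if key in prefix:
            continue  # a repeated key adds nothing to the ordering
        if key not in group_keys:
            break
        prefix.append(key)
    rest = [k for k in group_keys if k not in prefix]
    return prefix + rest


def candidate_orderings(group_keys, order_by=(), input_sorted_on=()):
    """Return distinct GROUP BY key orderings worth costing (hypothetical)."""
    candidates = [list(group_keys)]  # the ordering as written in GROUP BY
    for preferred in (order_by, input_sorted_on):
        cand = reorder_by_prefix(group_keys, preferred)
        if cand not in candidates:
            candidates.append(cand)
    return candidates


if __name__ == "__main__":
    for ordering in candidate_orderings(
        ["a", "b", "c"], order_by=["c", "a"], input_sorted_on=["b", "x"]
    ):
        print(ordering)
```

Matching only a leading prefix is the simplest possible rule; the "more
elaborate" pathkey matching mentioned above would presumably go beyond
it.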


2) costing the alternative orderings

I think we've already discussed various ways to leverage as much
available info as possible (extended stats, MCVs, ...) but I think the
patch only does some of it.
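
As a toy illustration of why the number of distinct values and the
comparison cost both matter when costing an ordering, the sketch below
estimates the expected cost of comparing two tuples under a given column
order. It is not the patch's cost model: it assumes the columns are
independent and uniformly distributed, so two random tuples tie on a
column with probability 1/ndistinct, which is exactly the kind of
assumption that extended stats and MCVs could improve on. The function
names and the stats layout are made up for the example.

```python
from itertools import permutations


def expected_comparison_cost(ordering, stats):
    """Expected cost of comparing two random tuples, column by column.

    stats maps column name -> (ndistinct, cmp_cost). A later column is
    only compared when all earlier columns tie, which (under the
    uniform/independent assumption) happens with probability
    1/ndistinct per earlier column.
    """
    cost = 0.0
    p_reach = 1.0  # probability that the comparison reaches this column
    for col in ordering:
        ndistinct, cmp_cost = stats[col]
        cost += p_reach * cmp_cost
        p_reach /= ndistinct
    return cost


def cheapest_ordering(stats):
    """Brute-force search over all column orders (fine for a few keys)."""
    return min(
        permutations(stats),
        key=lambda order: expected_comparison_cost(order, stats),
    )


if __name__ == "__main__":
    stats = {
        "flag": (2, 1.0),     # few distinct values, cheap comparison
        "id": (1000, 1.0),    # many distinct values, cheap comparison
        "name": (500, 5.0),   # many distinct values, expensive comparison
    }
    print(cheapest_ordering(stats))
```

With these made-up numbers the high-cardinality, cheap-to-compare column
ends up first, because it resolves most comparisons before the expensive
column is ever touched.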


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


