POC: GROUP BY optimization - Mailing list pgsql-hackers

From Teodor Sigaev
Subject POC: GROUP BY optimization
Date
Msg-id 7c79e6a5-8597-74e8-0671-1c39d124c9d6@sigaev.ru
Whole thread Raw
Responses Re: POC: GROUP BY optimization
List pgsql-hackers
Hi!

As far as I known, columns in GROUP BY could be reordered without loss of 
correctness. But reorder could give a benefits in performance. Suggested patch 
implements that in several ways:

1) Reorder GROUP BY columns by number of unique values in descent order. Simple 
example shows a performance difference by two times (see 
opt_group_by_demostrator.sql, on my notebook first two queries demonstrate 
that). The idea is: if first column is unique then sort comparator will never 
compute difference of following columns.

2) Reorder GROUP BY columns to match existing pathkeys which are result of index 
scan or ORDER BY clause. It prevents to add sort node - suppose, it's a clear win.

3) Patch makes suggestion to use index by GROUP BY clause, but unlike to ORDER 
BY or merge join case it doesn't pay attention to actual order of columns 
because of 2)

Patch doesn't change any cost estimation computation, although 1) could take an 
advantage of it.

Some issues/problems/notices:

1) I didn't play around GROUPING SETS at all. As I can see, current coding 
prevents any optimization around it and, suppose, it should be addressed  to 
another patch

2) I'm not completely satisfied with counting of number of groups per column, it 
looks ugly without refactoring estimate_num_groups(). At least, now it could be 
called multiple times for each column. May be, this number should be added to 
PathKey structure?

3) EquivalenceClass->ec_sortref. If planner faced with column first time in 
WHERE clause it doesn't fill target reference field because it is unknown yet. 
Patch looks for accordance of PathKey (group_pathkeys) and SortGroupClause 
(groupClause) and fails in this case. So, get_eclass_for_sort_expr() now updates 
ec_sortref field if it's not initialized yet.

4) Algorithms to reorder columns is proportional to N^2 where N is number of 
columns, but I hope it isn't a problem because number of GROUP BY columns isn't 
very big usually.




-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: pg_config.h.win32 missing a set of flags from pg_config.h.in addedin v11 development
Next
From: Alvaro Herrera
Date:
Subject: Re: found xmin from before relfrozenxid on pg_catalog.pg_authid