Re: BUG #15001: planner cann't distinguish composite index? - Mailing list pgsql-bugs
From | Amit Kapila |
---|---|
Subject | Re: BUG #15001: planner cann't distinguish composite index? |
Date | |
Msg-id | CAA4eK1Kg1ffscayD+dENWEzajW5fFBX=YhkD4KvLOjsHVKGweA@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #15001: planner cann't distinguish composite index? (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: BUG #15001: planner cann't distinguish composite index?
|
List | pgsql-bugs |
On Thu, Jan 18, 2018 at 3:14 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > > On 01/10/2018 04:35 AM, Amit Kapila wrote: >> On Tue, Jan 9, 2018 at 4:55 PM, PG Bug reporting form >> <noreply@postgresql.org> wrote: >>> The following bug has been logged on the website: >>> >>> Bug reference: 15001 >>> Logged by: Zhou Digoal >>> Email address: digoal@126.com >>> PostgreSQL version: 10.1 >>> Operating system: centos 7.x x64 >>> Description: >>> >>> ``` >>> postgres=# create table tbl(c1 int, c2 int, c3 int); >>> CREATE TABLE >>> postgres=# create index idx_tbl on tbl (c1,c2); >>> CREATE INDEX >>> postgres=# insert into tbl select random()*100, random()*10 from >>> generate_series(1,10000000); >>> INSERT 0 10000000 >>> >>> postgres=# explain select c1,c2 , count(*) from tbl group by c2,c1; >>> QUERY PLAN >>> >>> -------------------------------------------------------------------------------------------------------------------- >>> Finalize GroupAggregate (cost=176259.41..176337.18 rows=1111 width=16) >>> Group Key: c2, c1 >>> -> Sort (cost=176259.41..176276.08 rows=6666 width=16) >>> Sort Key: c2, c1 >>> -> Gather (cost=175158.32..175836.03 rows=6666 width=16) >>> Workers Planned: 6 >>> -> Partial HashAggregate (cost=174158.32..174169.43 >>> rows=1111 width=16) >>> Group Key: c2, c1 >>> -> Parallel Index Only Scan using idx_tbl on tbl >>> (cost=0.43..161658.26 rows=1666675 width=8) >>> (9 rows) >>> >>> postgres=# explain select c1,c2 , count(*) from tbl group by c1,c2; >>> QUERY PLAN >>> >>> -------------------------------------------------------------------------------------------------------------- >>> Finalize GroupAggregate (cost=1000.53..176040.80 rows=1111 width=16) >>> Group Key: c1, c2 >>> -> Gather Merge (cost=1000.53..175979.69 rows=6666 width=16) >>> Workers Planned: 6 >>> -> Partial GroupAggregate (cost=0.43..174169.43 rows=1111 >>> width=16) >>> Group Key: c1, c2 >>> -> Parallel Index Only Scan using idx_tbl on tbl >>> (cost=0.43..161658.26 rows=1666675 width=8) >>> (7 rows) >>> ``` >>> >>> i need to set enable_sort=off, so planner can choose the same planner with >>> c1,c2 and c2,c1 group by. >>> >>> ``` >>> postgres=# set enable_sort=off; >>> SET >>> postgres=# explain select c1,c2 , count(*) from tbl group by c2,c1; >>> QUERY PLAN >>> >>> -------------------------------------------------------------------------------------------------------------- >>> Finalize HashAggregate (cost=175886.03..175897.14 rows=1111 width=16) >>> Group Key: c2, c1 >>> -> Gather (cost=175158.32..175836.03 rows=6666 width=16) >>> Workers Planned: 6 >>> -> Partial HashAggregate (cost=174158.32..174169.43 rows=1111 >>> width=16) >>> Group Key: c2, c1 >>> -> Parallel Index Only Scan using idx_tbl on tbl >>> (cost=0.43..161658.26 rows=1666675 width=8) >>> (7 rows) >>> ``` >>> >> >> I don't think the plan chosen is same. If you notice, for c1,c2 the >> plan chosen is Finalize GroupAggregate -> Gather Merge whereas for >> c2,c1 (with sort off) it is Finalize HashAggregate -> Gather. There is >> a lot of difference between both the plans. The output by Gather >> Merge is sorted, so you can directly use GroupAggregate whereas the >> output of Gather is unsorted, so the final plan is HashAggregate. >> >> Now, here one can wonder why the planner hasn't chosen the path >> without Sort for c2,c1 even when enable_sort=on as the cost of that >> plan is less. If you see the cost difference of plans with sort >> (total_cost - 176337.18) and without sort (total_cost - 175897.14), it >> is marginal and planner thinks that they are fuzzily same. The same >> is true for startup costs as well. Now, if both have same costs >> (fuzzily), it gives preference to the sorted path. >> > > The original report is quite unclear - it only shows some plans but does > not explain why it's a bug or what's the expected behavior. > > The fuzzy comparison of costs certainly explains at least some of it, > but my feeling is it's likely related to the fact that group by has to > match a composite index including the order of columns. > > That is, an index defined on (c1,c2) is unusable for (GROUP BY c2,c1). > Which is why GROUP BY c1,c2 uses GroupAggregate, while GROUP BY c2,c1 > uses HashAggregate. > Sure, but it can use sort node for c2,c1 to make use of GroupAggregate if such a plan is cheap and that is what has happened in the first case of this report. I have tried to explain the reason why planner has chosen the specific plan in each of the three cases. I guess the OP might have confused between second and third plan which appears to be somewhat similar but are actually quite different. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-bugs by date: