Home > mailing lists

Re: BUG #15001: planner cann't distinguish composite index? - Mailing list pgsql-bugs

From	Amit Kapila
Subject	Re: BUG #15001: planner cann't distinguish composite index?
Date	January 20, 2018 09:15:27
Msg-id	CAA4eK1Kg1ffscayD+dENWEzajW5fFBX=YhkD4KvLOjsHVKGweA@mail.gmail.com Whole thread
In response to	Re: BUG #15001: planner cann't distinguish composite index? (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: BUG #15001: planner cann't distinguish composite index?
List	pgsql-bugs

Tree view

On Thu, Jan 18, 2018 at 3:14 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
>
> On 01/10/2018 04:35 AM, Amit Kapila wrote:
>> On Tue, Jan 9, 2018 at 4:55 PM, PG Bug reporting form
>> <noreply@postgresql.org> wrote:
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:      15001
>>> Logged by:          Zhou Digoal
>>> Email address:      digoal@126.com
>>> PostgreSQL version: 10.1
>>> Operating system:   centos 7.x x64
>>> Description:
>>>
>>> ```
>>> postgres=# create table tbl(c1 int, c2 int, c3 int);
>>> CREATE TABLE
>>> postgres=# create index idx_tbl on tbl (c1,c2);
>>> CREATE INDEX
>>> postgres=# insert into tbl select random()*100, random()*10 from
>>> generate_series(1,10000000);
>>> INSERT 0 10000000
>>>
>>> postgres=# explain select c1,c2 , count(*) from tbl group by c2,c1;
>>>                                                      QUERY PLAN
>>>
>>>
--------------------------------------------------------------------------------------------------------------------
>>>  Finalize GroupAggregate  (cost=176259.41..176337.18 rows=1111 width=16)
>>>    Group Key: c2, c1
>>>    ->  Sort  (cost=176259.41..176276.08 rows=6666 width=16)
>>>          Sort Key: c2, c1
>>>          ->  Gather  (cost=175158.32..175836.03 rows=6666 width=16)
>>>                Workers Planned: 6
>>>                ->  Partial HashAggregate  (cost=174158.32..174169.43
>>> rows=1111 width=16)
>>>                      Group Key: c2, c1
>>>                      ->  Parallel Index Only Scan using idx_tbl on tbl
>>> (cost=0.43..161658.26 rows=1666675 width=8)
>>> (9 rows)
>>>
>>> postgres=# explain select c1,c2 , count(*) from tbl group by c1,c2;
>>>                                                   QUERY PLAN
>>>
>>> --------------------------------------------------------------------------------------------------------------
>>>  Finalize GroupAggregate  (cost=1000.53..176040.80 rows=1111 width=16)
>>>    Group Key: c1, c2
>>>    ->  Gather Merge  (cost=1000.53..175979.69 rows=6666 width=16)
>>>          Workers Planned: 6
>>>          ->  Partial GroupAggregate  (cost=0.43..174169.43 rows=1111
>>> width=16)
>>>                Group Key: c1, c2
>>>                ->  Parallel Index Only Scan using idx_tbl on tbl
>>> (cost=0.43..161658.26 rows=1666675 width=8)
>>> (7 rows)
>>> ```
>>>
>>> i need to set enable_sort=off, so planner can choose the same planner with
>>> c1,c2 and c2,c1 group by.
>>>
>>> ```
>>> postgres=# set enable_sort=off;
>>> SET
>>> postgres=# explain select c1,c2 , count(*) from tbl group by c2,c1;
>>>                                                   QUERY PLAN
>>>
>>> --------------------------------------------------------------------------------------------------------------
>>>  Finalize HashAggregate  (cost=175886.03..175897.14 rows=1111 width=16)
>>>    Group Key: c2, c1
>>>    ->  Gather  (cost=175158.32..175836.03 rows=6666 width=16)
>>>          Workers Planned: 6
>>>          ->  Partial HashAggregate  (cost=174158.32..174169.43 rows=1111
>>> width=16)
>>>                Group Key: c2, c1
>>>                ->  Parallel Index Only Scan using idx_tbl on tbl
>>> (cost=0.43..161658.26 rows=1666675 width=8)
>>> (7 rows)
>>> ```
>>>
>>
>> I don't think the plan chosen is same.  If you notice, for c1,c2 the
>> plan chosen is Finalize GroupAggregate -> Gather Merge whereas for
>> c2,c1 (with sort off) it is Finalize HashAggregate -> Gather. There is
>> a lot of difference between both the plans.  The output by Gather
>> Merge is sorted, so you can directly use GroupAggregate whereas the
>> output of Gather is unsorted, so the final plan is HashAggregate.
>>
>> Now, here one can wonder why the planner hasn't chosen the path
>> without Sort for c2,c1 even when enable_sort=on as the cost of that
>> plan is less.  If you see the cost difference of plans with sort
>> (total_cost - 176337.18) and without sort (total_cost - 175897.14), it
>> is marginal and planner thinks that they are fuzzily same.  The same
>> is true for startup costs as well.  Now, if both have same costs
>> (fuzzily), it gives preference to the sorted path.
>>
>
> The original report is quite unclear - it only shows some plans but does
> not explain why it's a bug or what's the expected behavior.
>
> The fuzzy comparison of costs certainly explains at least some of it,
> but my feeling is it's likely related to the fact that group by has to
> match a composite index including the order of columns.
>
> That is, an index defined on (c1,c2) is unusable for (GROUP BY c2,c1).
> Which is why GROUP BY c1,c2 uses GroupAggregate, while GROUP BY c2,c1
> uses HashAggregate.
>

Sure, but it can use sort node for c2,c1 to make use of GroupAggregate
if such a plan is cheap and that is what has happened in the first
case of this report.  I have tried to explain the reason why planner
has chosen the specific plan in each of the three cases.  I guess the
OP might have confused between second and third plan which appears to
be somewhat similar but are actually quite different.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-bugs by date:

From: Wallace Baggaley
Date: 20 January 2018, 03:32:06
Subject: Re: BUG #15021: Postgres crashes unexpectedly

From: Francisco Olarte
Date: 20 January 2018, 12:51:10
Subject: Re: BUG #15021: Postgres crashes unexpectedly

Re: BUG #15001: planner cann't distinguish composite index? - Mailing list pgsql-bugs

Previous

Next