Re: [HACKERS] Print correct startup cost for the group aggregate. - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: [HACKERS] Print correct startup cost for the group aggregate.
Date	March 4, 2017 12:20:10
Msg-id	CA+TgmoawVv-NA-TUx7mmFLigJQNJ3R3bRCgB-01hohJfBedKtQ@mail.gmail.com
In response to	Re: [HACKERS] Print correct startup cost for the group aggregate. (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses	Re: [HACKERS] Print correct startup cost for the group aggregate.
List	pgsql-hackers

On Thu, Mar 2, 2017 at 6:48 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Thu, Mar 2, 2017 at 6:06 PM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
>> While reading through cost_agg() I found that the startup cost for the
>> group aggregate is not correctly assigned. Because of this, EXPLAIN
>> does not print the correct startup cost.
>>
>> Without patch:
>>
>> postgres=# explain select aid, sum(abalance) from pgbench_accounts where
>> filler like '%foo%' group by aid;
>>                                      QUERY PLAN
>> -------------------------------------------------------------------------------------
>>  GroupAggregate  (cost=81634.33..85102.04 rows=198155 width=12)
>>    Group Key: aid
>>    ->  Sort  (cost=81634.33..82129.72 rows=198155 width=8)
>>          Sort Key: aid
>>          ->  Seq Scan on pgbench_accounts  (cost=0.00..61487.89 rows=198155 width=8)
>>                Filter: (filler ~~ '%foo%'::text)
>> (6 rows)
>>
>> With patch:
>>
>> postgres=# explain select aid, sum(abalance) from pgbench_accounts where
>> filler like '%foo%' group by aid;
>>                                      QUERY PLAN
>> -------------------------------------------------------------------------------------
>>  GroupAggregate  (cost=82129.72..85102.04 rows=198155 width=12)
>>    Group Key: aid
>>    ->  Sort  (cost=81634.33..82129.72 rows=198155 width=8)
>>          Sort Key: aid
>>          ->  Seq Scan on pgbench_accounts  (cost=0.00..61487.89 rows=198155 width=8)
>>                Filter: (filler ~~ '%foo%'::text)
>> (6 rows)
>>
>
> The reason why startup_cost = input_startup_cost and not
> input_total_cost for aggregation by sorting is that we don't need the
> whole input before the Group/Agg plan can produce the first row. But I
> think setting startup_cost = input_startup_cost is also not exactly
> correct. Before the plan can produce one row, it has to transition
> through all the rows belonging to the group to which the first row
> belongs. On average it has to scan (total number of rows)/(number of
> groups) rows before producing the first aggregated row. startup_cost
> will be input_startup_cost + cost to scan (total number of
> rows)/(number of groups) rows + cost of transitioning over that many
> rows. Total cost = startup_cost + cost of scanning and transitioning
> through the remaining input rows.

While that idea has some merit, I think it's inconsistent with current
practice.  cost_seqscan(), for example, doesn't include the cost of
reading the first page in the startup cost, even though that certainly
must be done before returning the first row.  I think there have been
previous discussions of switching over to the practice for which you
are advocating here, but my impression (without researching) is that
the current practice is more like what Rushabh did.
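
For illustration, the quoted proposal can be put into numbers. This is
a minimal sketch, not PostgreSQL's cost_agg(): the function name, the
even spread of the input's run cost over its rows, and the
trans_cost_per_row value are assumptions made for the example. The cost
and row figures are taken from the EXPLAIN output quoted above.

```python
def proposed_group_agg_startup_cost(input_startup_cost, input_total_cost,
                                    input_rows, num_groups,
                                    trans_cost_per_row):
    """Startup cost under the quoted proposal (illustrative only)."""
    # Average number of input rows belonging to the first group.
    rows_in_first_group = input_rows / num_groups
    # Assumption: the input's run cost is spread evenly over its rows.
    input_cost_per_row = (input_total_cost - input_startup_cost) / input_rows
    return (input_startup_cost
            + rows_in_first_group * input_cost_per_row
            + rows_in_first_group * trans_cost_per_row)


# Sort node figures from the EXPLAIN output quoted above.
SORT_STARTUP, SORT_TOTAL, ROWS = 81634.33, 82129.72, 198155
GROUPS = 198155       # GroupAggregate row estimate from the same plan
TRANS_COST = 0.0025   # hypothetical per-row transition cost

proposed = proposed_group_agg_startup_cost(
    SORT_STARTUP, SORT_TOTAL, ROWS, GROUPS, TRANS_COST)

# GroupAggregate startup cost as printed without and with the patch.
print(f"without patch: {SORT_STARTUP:.2f}")
print(f"with patch:    {SORT_TOTAL:.2f}")
print(f"proposal:      {proposed:.4f}")
```

With one row per group, as estimated here, the proposed value sits just
above the Sort startup cost. With a single group it would instead equal
the Sort total cost plus the transition cost for every input row.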

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
