Thread: [Devel 9.2] Index-only scan in count aggregation

[Devel 9.2] Index-only scan in count aggregation

From
Emanuel Calvo
Date:
Hi guys,

I'm one of the nightly sources of 9.2devel. I was trying some simple
queries and I realized something:

stuff=# explain (analyze true, costs true, buffers true, timing true,
verbose true) select count(i) from lot_of_values;                                                               QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=213496.00..213496.01 rows=1 width=4) (actual
 
time=60400.788..60400.791 rows=1 loops=1)  Output: count(i)  Buffers: shared hit=2400 read=86096  ->  Seq Scan on
public.lot_of_values (cost=0.00..188496.00
 
rows=10000000 width=4) (actual time=0.371..32227.791 rows=10000000
loops=1)        Output: i, t1, r1, r2, r3, d1        Buffers: shared hit=2400 read=86096Total runtime: 60402.460 ms
(7 rows)

stuff=# set enable_seqscan=off;
SET
stuff=# explain (analyze true, costs true, buffers true, timing true,
verbose true) select count(i) from lot_of_values;
         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=351292.03..351292.04 rows=1 width=4) (actual
 
time=63278.472..63278.475 rows=1 loops=1)  Output: count(i)  Buffers: shared hit=1 read=110379  ->  Index Only Scan
usinglot_of_values_pkey on
 
public.lot_of_values  (cost=0.00..326292.03 rows=10000000 width=4)
(actual time=42.028..35217.460 rows=10000000 loops=1)        Output: i        Heap Fetches: 10000000        Buffers:
sharedhit=1 read=110379Total runtime: 63278.720 ms
 
(8 rows)


I know, still development. Just wanna know if there will be an
improvement for this in the next patches or the idea is to maintain
this behaviour.

Cheers and thanks for the amazing work you all had done!

-- 
--
Emanuel Calvo


Re: [Devel 9.2] Index-only scan in count aggregation

From
Robert Haas
Date:
On Wed, Apr 18, 2012 at 12:13 PM, Emanuel Calvo <postgres.arg@gmail.com> wrote:
> Hi guys,
>
> I'm one of the nightly sources of 9.2devel. I was trying some simple
> queries and I realized something:
>
> stuff=# explain (analyze true, costs true, buffers true, timing true,
> verbose true) select count(i) from lot_of_values;
>                                                                QUERY
> PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=213496.00..213496.01 rows=1 width=4) (actual
> time=60400.788..60400.791 rows=1 loops=1)
>   Output: count(i)
>   Buffers: shared hit=2400 read=86096
>   ->  Seq Scan on public.lot_of_values  (cost=0.00..188496.00
> rows=10000000 width=4) (actual time=0.371..32227.791 rows=10000000
> loops=1)
>         Output: i, t1, r1, r2, r3, d1
>         Buffers: shared hit=2400 read=86096
>  Total runtime: 60402.460 ms
> (7 rows)
>
> stuff=# set enable_seqscan=off;
> SET
> stuff=# explain (analyze true, costs true, buffers true, timing true,
> verbose true) select count(i) from lot_of_values;
>
>          QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=351292.03..351292.04 rows=1 width=4) (actual
> time=63278.472..63278.475 rows=1 loops=1)
>   Output: count(i)
>   Buffers: shared hit=1 read=110379
>   ->  Index Only Scan using lot_of_values_pkey on
> public.lot_of_values  (cost=0.00..326292.03 rows=10000000 width=4)
> (actual time=42.028..35217.460 rows=10000000 loops=1)
>         Output: i
>         Heap Fetches: 10000000
>         Buffers: shared hit=1 read=110379
>  Total runtime: 63278.720 ms
> (8 rows)
>
>
> I know, still development. Just wanna know if there will be an
> improvement for this in the next patches or the idea is to maintain
> this behaviour.
>
> Cheers and thanks for the amazing work you all had done!

I'm not sure what you're unhappy about.  It seems that the query
planner picked the fastest plan (a sequential scan) and then when you
disabled that it picked the second-fastest plan (an index-only scan).

The index-only scan would have a chance of beating the sequential scan
if the table had been recently vacuumed, but not in the case where
every row is going to require a heap fetch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [Devel 9.2] Index-only scan in count aggregation

From
Emanuel Calvo
Date:
El día 18 de abril de 2012 18:17, Robert Haas <robertmhaas@gmail.com> escribió:
> On Wed, Apr 18, 2012 at 12:13 PM, Emanuel Calvo <postgres.arg@gmail.com> wrote:
>> Hi guys,
>>
>> I'm one of the nightly sources of 9.2devel. I was trying some simple
>> queries and I realized something:
>>
>> stuff=# explain (analyze true, costs true, buffers true, timing true,
>> verbose true) select count(i) from lot_of_values;
>>                                                                QUERY
>> PLAN
>>
-------------------------------------------------------------------------------------------------------------------------------------------
>>  Aggregate  (cost=213496.00..213496.01 rows=1 width=4) (actual
>> time=60400.788..60400.791 rows=1 loops=1)
>>   Output: count(i)
>>   Buffers: shared hit=2400 read=86096
>>   ->  Seq Scan on public.lot_of_values  (cost=0.00..188496.00
>> rows=10000000 width=4) (actual time=0.371..32227.791 rows=10000000
>> loops=1)
>>         Output: i, t1, r1, r2, r3, d1
>>         Buffers: shared hit=2400 read=86096
>>  Total runtime: 60402.460 ms
>> (7 rows)
>>
>> stuff=# set enable_seqscan=off;
>> SET
>> stuff=# explain (analyze true, costs true, buffers true, timing true,
>> verbose true) select count(i) from lot_of_values;
>>
>>          QUERY PLAN
>>
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Aggregate  (cost=351292.03..351292.04 rows=1 width=4) (actual
>> time=63278.472..63278.475 rows=1 loops=1)
>>   Output: count(i)
>>   Buffers: shared hit=1 read=110379
>>   ->  Index Only Scan using lot_of_values_pkey on
>> public.lot_of_values  (cost=0.00..326292.03 rows=10000000 width=4)
>> (actual time=42.028..35217.460 rows=10000000 loops=1)
>>         Output: i
>>         Heap Fetches: 10000000
>>         Buffers: shared hit=1 read=110379
>>  Total runtime: 63278.720 ms
>> (8 rows)
>>
>>
>> I know, still development. Just wanna know if there will be an
>> improvement for this in the next patches or the idea is to maintain
>> this behaviour.
>>
>> Cheers and thanks for the amazing work you all had done!
>
> I'm not sure what you're unhappy about.  It seems that the query
> planner picked the fastest plan (a sequential scan) and then when you
> disabled that it picked the second-fastest plan (an index-only scan).
>
> The index-only scan would have a chance of beating the sequential scan
> if the table had been recently vacuumed, but not in the case where
> every row is going to require a heap fetch.
>

Oh, I see now. Honestly, I thought it wasn't necessary to make a heap
fetch. The table
doesn't have any modifications, but with the vacuum the cost changed.

Checking the source code, I saw what you are talking about:
 /*                * We can skip the heap fetch if the TID references a
heap page on                * which all tuples are known visible to everybody.
In any case,                * we'll use the index tuple not the heap tuple as the
data source.                */

Thanks for the information!


--
--
Emanuel Calvo


Re: [Devel 9.2] Index-only scan in count aggregation

From
Robert Haas
Date:
On Wed, Apr 18, 2012 at 12:40 PM, Emanuel Calvo <postgres.arg@gmail.com> wrote:
>> I'm not sure what you're unhappy about.  It seems that the query
>> planner picked the fastest plan (a sequential scan) and then when you
>> disabled that it picked the second-fastest plan (an index-only scan).
>>
>> The index-only scan would have a chance of beating the sequential scan
>> if the table had been recently vacuumed, but not in the case where
>> every row is going to require a heap fetch.
>
> Oh, I see now. Honestly, I thought it wasn't necessary to make a heap
> fetch. The table
> doesn't have any modifications, but with the vacuum the cost changed.

Ah, I see.  Yeah, I think you're not going to be the first person to
not realize that, especially since we haven't changed the rules for
autovacuuming, and therefore you can't count on autovac to correct the
problem for you.  :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [Devel 9.2] Index-only scan in count aggregation

From
Robert Haas
Date:
On Wed, Apr 18, 2012 at 12:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Apr 18, 2012 at 12:40 PM, Emanuel Calvo <postgres.arg@gmail.com> wrote:
>>> I'm not sure what you're unhappy about.  It seems that the query
>>> planner picked the fastest plan (a sequential scan) and then when you
>>> disabled that it picked the second-fastest plan (an index-only scan).
>>>
>>> The index-only scan would have a chance of beating the sequential scan
>>> if the table had been recently vacuumed, but not in the case where
>>> every row is going to require a heap fetch.
>>
>> Oh, I see now. Honestly, I thought it wasn't necessary to make a heap
>> fetch. The table
>> doesn't have any modifications, but with the vacuum the cost changed.
>
> Ah, I see.  Yeah, I think you're not going to be the first person to
> not realize that, especially since we haven't changed the rules for
> autovacuuming, and therefore you can't count on autovac to correct the
> problem for you.  :-(

And by "the first person" I of course meant "the last person".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company