Thread: [Devel 9.2] Index-only scan in count aggregation
Hi guys, I'm one of the nightly sources of 9.2devel. I was trying some simple queries and I realized something: stuff=# explain (analyze true, costs true, buffers true, timing true, verbose true) select count(i) from lot_of_values; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=213496.00..213496.01 rows=1 width=4) (actual time=60400.788..60400.791 rows=1 loops=1) Output: count(i) Buffers: shared hit=2400 read=86096 -> Seq Scan on public.lot_of_values (cost=0.00..188496.00 rows=10000000 width=4) (actual time=0.371..32227.791 rows=10000000 loops=1) Output: i, t1, r1, r2, r3, d1 Buffers: shared hit=2400 read=86096Total runtime: 60402.460 ms (7 rows) stuff=# set enable_seqscan=off; SET stuff=# explain (analyze true, costs true, buffers true, timing true, verbose true) select count(i) from lot_of_values; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=351292.03..351292.04 rows=1 width=4) (actual time=63278.472..63278.475 rows=1 loops=1) Output: count(i) Buffers: shared hit=1 read=110379 -> Index Only Scan usinglot_of_values_pkey on public.lot_of_values (cost=0.00..326292.03 rows=10000000 width=4) (actual time=42.028..35217.460 rows=10000000 loops=1) Output: i Heap Fetches: 10000000 Buffers: sharedhit=1 read=110379Total runtime: 63278.720 ms (8 rows) I know, still development. Just wanna know if there will be an improvement for this in the next patches or the idea is to maintain this behaviour. Cheers and thanks for the amazing work you all had done! -- -- Emanuel Calvo
On Wed, Apr 18, 2012 at 12:13 PM, Emanuel Calvo <postgres.arg@gmail.com> wrote: > Hi guys, > > I'm one of the nightly sources of 9.2devel. I was trying some simple > queries and I realized something: > > stuff=# explain (analyze true, costs true, buffers true, timing true, > verbose true) select count(i) from lot_of_values; > QUERY > PLAN > ------------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=213496.00..213496.01 rows=1 width=4) (actual > time=60400.788..60400.791 rows=1 loops=1) > Output: count(i) > Buffers: shared hit=2400 read=86096 > -> Seq Scan on public.lot_of_values (cost=0.00..188496.00 > rows=10000000 width=4) (actual time=0.371..32227.791 rows=10000000 > loops=1) > Output: i, t1, r1, r2, r3, d1 > Buffers: shared hit=2400 read=86096 > Total runtime: 60402.460 ms > (7 rows) > > stuff=# set enable_seqscan=off; > SET > stuff=# explain (analyze true, costs true, buffers true, timing true, > verbose true) select count(i) from lot_of_values; > > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=351292.03..351292.04 rows=1 width=4) (actual > time=63278.472..63278.475 rows=1 loops=1) > Output: count(i) > Buffers: shared hit=1 read=110379 > -> Index Only Scan using lot_of_values_pkey on > public.lot_of_values (cost=0.00..326292.03 rows=10000000 width=4) > (actual time=42.028..35217.460 rows=10000000 loops=1) > Output: i > Heap Fetches: 10000000 > Buffers: shared hit=1 read=110379 > Total runtime: 63278.720 ms > (8 rows) > > > I know, still development. Just wanna know if there will be an > improvement for this in the next patches or the idea is to maintain > this behaviour. > > Cheers and thanks for the amazing work you all had done! I'm not sure what you're unhappy about. It seems that the query planner picked the fastest plan (a sequential scan) and then when you disabled that it picked the second-fastest plan (an index-only scan). The index-only scan would have a chance of beating the sequential scan if the table had been recently vacuumed, but not in the case where every row is going to require a heap fetch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
El día 18 de abril de 2012 18:17, Robert Haas <robertmhaas@gmail.com> escribió: > On Wed, Apr 18, 2012 at 12:13 PM, Emanuel Calvo <postgres.arg@gmail.com> wrote: >> Hi guys, >> >> I'm one of the nightly sources of 9.2devel. I was trying some simple >> queries and I realized something: >> >> stuff=# explain (analyze true, costs true, buffers true, timing true, >> verbose true) select count(i) from lot_of_values; >> QUERY >> PLAN >> ------------------------------------------------------------------------------------------------------------------------------------------- >> Aggregate (cost=213496.00..213496.01 rows=1 width=4) (actual >> time=60400.788..60400.791 rows=1 loops=1) >> Output: count(i) >> Buffers: shared hit=2400 read=86096 >> -> Seq Scan on public.lot_of_values (cost=0.00..188496.00 >> rows=10000000 width=4) (actual time=0.371..32227.791 rows=10000000 >> loops=1) >> Output: i, t1, r1, r2, r3, d1 >> Buffers: shared hit=2400 read=86096 >> Total runtime: 60402.460 ms >> (7 rows) >> >> stuff=# set enable_seqscan=off; >> SET >> stuff=# explain (analyze true, costs true, buffers true, timing true, >> verbose true) select count(i) from lot_of_values; >> >> QUERY PLAN >> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> Aggregate (cost=351292.03..351292.04 rows=1 width=4) (actual >> time=63278.472..63278.475 rows=1 loops=1) >> Output: count(i) >> Buffers: shared hit=1 read=110379 >> -> Index Only Scan using lot_of_values_pkey on >> public.lot_of_values (cost=0.00..326292.03 rows=10000000 width=4) >> (actual time=42.028..35217.460 rows=10000000 loops=1) >> Output: i >> Heap Fetches: 10000000 >> Buffers: shared hit=1 read=110379 >> Total runtime: 63278.720 ms >> (8 rows) >> >> >> I know, still development. Just wanna know if there will be an >> improvement for this in the next patches or the idea is to maintain >> this behaviour. >> >> Cheers and thanks for the amazing work you all had done! > > I'm not sure what you're unhappy about. It seems that the query > planner picked the fastest plan (a sequential scan) and then when you > disabled that it picked the second-fastest plan (an index-only scan). > > The index-only scan would have a chance of beating the sequential scan > if the table had been recently vacuumed, but not in the case where > every row is going to require a heap fetch. > Oh, I see now. Honestly, I thought it wasn't necessary to make a heap fetch. The table doesn't have any modifications, but with the vacuum the cost changed. Checking the source code, I saw what you are talking about: /* * We can skip the heap fetch if the TID references a heap page on * which all tuples are known visible to everybody. In any case, * we'll use the index tuple not the heap tuple as the data source. */ Thanks for the information! -- -- Emanuel Calvo
On Wed, Apr 18, 2012 at 12:40 PM, Emanuel Calvo <postgres.arg@gmail.com> wrote: >> I'm not sure what you're unhappy about. It seems that the query >> planner picked the fastest plan (a sequential scan) and then when you >> disabled that it picked the second-fastest plan (an index-only scan). >> >> The index-only scan would have a chance of beating the sequential scan >> if the table had been recently vacuumed, but not in the case where >> every row is going to require a heap fetch. > > Oh, I see now. Honestly, I thought it wasn't necessary to make a heap > fetch. The table > doesn't have any modifications, but with the vacuum the cost changed. Ah, I see. Yeah, I think you're not going to be the first person to not realize that, especially since we haven't changed the rules for autovacuuming, and therefore you can't count on autovac to correct the problem for you. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Apr 18, 2012 at 12:52 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Apr 18, 2012 at 12:40 PM, Emanuel Calvo <postgres.arg@gmail.com> wrote: >>> I'm not sure what you're unhappy about. It seems that the query >>> planner picked the fastest plan (a sequential scan) and then when you >>> disabled that it picked the second-fastest plan (an index-only scan). >>> >>> The index-only scan would have a chance of beating the sequential scan >>> if the table had been recently vacuumed, but not in the case where >>> every row is going to require a heap fetch. >> >> Oh, I see now. Honestly, I thought it wasn't necessary to make a heap >> fetch. The table >> doesn't have any modifications, but with the vacuum the cost changed. > > Ah, I see. Yeah, I think you're not going to be the first person to > not realize that, especially since we haven't changed the rules for > autovacuuming, and therefore you can't count on autovac to correct the > problem for you. :-( And by "the first person" I of course meant "the last person". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company