Thread: VACUUM ANALYZE downgrades performance
Hi all, On v7.4.5 I noticed downgrade in the planner, namely favoring sequential scan over index scan. The proof: create table a ( a integer); create index aidx on a(a); explain analyze select * from a where a = 0; -- Index Scan using aidx on a (cost=0.00..17.07 rows=5 width=4) (actual -- time=0.029..0.029 rows=0 loops=1) -- Index Cond: (a = 0) vacuum analyze; explain analyze select * from a where a = 0; -- Seq Scan on a (cost=0.00..0.00 rows=1 width=4) (actual time=0.009..0.009 -- rows=0 loops=1) -- Filter: (a = 0) I do realize that there might be reasons why this happens over an empty table, but what is way worse that when the table starts actually to fill, the seq scan is still there, and the index is simply not used. How that could be so ...mmm... shortsighted, and what is more important, how to avoid this? I hope the answer is not 'run vacuum analyze each 5 seconds'. -- Sincerely, Dmitry Karasik --- catpipe Systems ApS *BSD solutions, consulting, development www.catpipe.net +45 7021 0050
On 30 Nov 2004 14:30:37 +0100, Dmitry Karasik <dmitry@karasik.eu.org> wrote: > > Hi all, > > On v7.4.5 I noticed downgrade in the planner, namely favoring > sequential scan over index scan. The proof: > > create table a ( a integer); > create index aidx on a(a); > explain analyze select * from a where a = 0; > -- Index Scan using aidx on a (cost=0.00..17.07 rows=5 width=4) (actual > -- time=0.029..0.029 rows=0 loops=1) > -- Index Cond: (a = 0) > vacuum analyze; > explain analyze select * from a where a = 0; > -- Seq Scan on a (cost=0.00..0.00 rows=1 width=4) (actual time=0.009..0.009 > -- rows=0 loops=1) > -- Filter: (a = 0) Looks to me like the seq scan is a better plan. The "actual time" went down. > > I do realize that there might be reasons why this happens over an empty > table, but what is way worse that when the table starts actually to fill, > the seq scan is still there, and the index is simply not used. How > that could be so ...mmm... shortsighted, and what is more important, > how to avoid this? I hope the answer is not 'run vacuum analyze each 5 seconds'. > See this thread (http://archives.postgresql.org/pgsql-hackers/2004-11/msg00985.php and http://archives.postgresql.org/pgsql-hackers/2004-11/msg01080.php) for an ongoing discussion of the issue. -- Mike Rylander mrylander@gmail.com GPLS -- PINES Development Database Developer http://open-ils.org
On 11/30/2004 7:30 AM Dmitry Karasik said:: >Hi all, > >On v7.4.5 I noticed downgrade in the planner, namely favoring >sequential scan over index scan. The proof: > > create table a ( a integer); > create index aidx on a(a); > explain analyze select * from a where a = 0; > -- Index Scan using aidx on a (cost=0.00..17.07 rows=5 width=4) (actual > -- time=0.029..0.029 rows=0 loops=1) > -- Index Cond: (a = 0) > vacuum analyze; > explain analyze select * from a where a = 0; > -- Seq Scan on a (cost=0.00..0.00 rows=1 width=4) (actual time=0.009..0.009 > -- rows=0 loops=1) > -- Filter: (a = 0) > >I do realize that there might be reasons why this happens over an empty >table, but what is way worse that when the table starts actually to fill, >the seq scan is still there, and the index is simply not used. How >that could be so ...mmm... shortsighted, and what is more important, >how to avoid this? I hope the answer is not 'run vacuum analyze each 5 seconds'. > > > Look at the ACTUAL TIME. It dropped from 0.029ms (using the index scan) to 0.009ms (using a sequential scan.) Index scans are not always faster, and the planner/optimizer knows this. VACUUM ANALYZE is best run when a large proportion of data has been updated/loaded or in the off hours to refresh the statistics on large datasets.
Hi Thomas! Thomas> Look at the ACTUAL TIME. It dropped from 0.029ms (using the index Thomas> scan) to 0.009ms (using a sequential scan.) Thomas> Index scans are not always faster, and the planner/optimizer knows Thomas> this. VACUUM ANALYZE is best run when a large proportion of data Thomas> has been updated/loaded or in the off hours to refresh the Thomas> statistics on large datasets. While I agree that generally this is true, look how stupid this behavior looks in this particular case: A developer creates a table and index, knowing that the table will be large and will be intensively used. An admin runs 'VACUUM ANALYZE' when table is occasionally empty, and next, say, 1 day, until another 'VACUUM ANALYZE' starts, the index is simply not used! Sure you don't suppose to run 'VACUUM ANALYZE' every 5 minutes as a solution, right? I'm not sure if there's ever such thing like planner hints, such as, "yes, we were switched from index back to seqscan, but this switch is only valid until table has less than X records", but it sounds as a reasonable solution. Well anyway, here's the scenario that cannot be fought neither by SQL programming nor by administrative guidelines, at least as I see it. And yes, I looked on the actual time, but somehow am not moved by how fast postgresql can seqscan an empty table, really. I believe there's something wrong if decisions based on a table when it is empty, are suddenly applied when it is full. -- Sincerely, Dmitry Karasik --- catpipe Systems ApS *BSD solutions, consulting, development www.catpipe.net +45 7021 0050
On Thu, 2004-12-02 at 17:07 +0100, Dmitry Karasik wrote: > Hi Thomas! > > Thomas> Look at the ACTUAL TIME. It dropped from 0.029ms (using the index > Thomas> scan) to 0.009ms (using a sequential scan.) > > Thomas> Index scans are not always faster, and the planner/optimizer knows > Thomas> this. VACUUM ANALYZE is best run when a large proportion of data > Thomas> has been updated/loaded or in the off hours to refresh the > Thomas> statistics on large datasets. > > While I agree that generally this is true, look how stupid this > behavior looks in this particular case: A developer creates a table > and index, knowing that the table will be large and will be intensively > used. An admin runs 'VACUUM ANALYZE' when table is occasionally empty, > and next, say, 1 day, until another 'VACUUM ANALYZE' starts, the index > is simply not used! Sure you don't suppose to run 'VACUUM ANALYZE' every > 5 minutes as a solution, right? You might want to try this on the next 8.0 beta to come out, or against CVS. Tom recently applied some changes which should mitigate this situation. --
On Thu, Dec 02, 2004 at 05:07:17PM +0100, Dmitry Karasik wrote: > While I agree that generally this is true, look how stupid this > behavior looks in this particular case: A developer creates a table > and index, knowing that the table will be large and will be intensively > used. An admin runs 'VACUUM ANALYZE' when table is occasionally empty, > and next, say, 1 day, until another 'VACUUM ANALYZE' starts, the index > is simply not used! Sure you don't suppose to run 'VACUUM ANALYZE' every > 5 minutes as a solution, right? No, you run autovacuum, which automatically re-analyzes at approximately the right time. /* Steinar */ -- Homepage: http://www.sesse.net/
On Thursday 02 Dec 2004 9:37 pm, Dmitry Karasik wrote: > Hi Thomas! > > Thomas> Look at the ACTUAL TIME. It dropped from 0.029ms (using the index > Thomas> scan) to 0.009ms (using a sequential scan.) > > Thomas> Index scans are not always faster, and the planner/optimizer knows > Thomas> this. VACUUM ANALYZE is best run when a large proportion of data > Thomas> has been updated/loaded or in the off hours to refresh the > Thomas> statistics on large datasets. > > While I agree that generally this is true, look how stupid this > behavior looks in this particular case: A developer creates a table > and index, knowing that the table will be large and will be intensively > used. An admin runs 'VACUUM ANALYZE' when table is occasionally empty, > and next, say, 1 day, until another 'VACUUM ANALYZE' starts, the index > is simply not used! Sure you don't suppose to run 'VACUUM ANALYZE' every > 5 minutes as a solution, right? Why not? If the updates are frequent enough, that is *the* solution. But you could always use autovacuum daemon in most case. HTH Shridhar
Hi Rod! Thomas> Index scans are not always faster, and the planner/optimizer knows Thomas> this. VACUUM ANALYZE is best run when a large proportion of data Thomas> has been updated/loaded or in the off hours to refresh the Thomas> statistics on large datasets. >> While I agree that generally this is true, look how stupid this >> behavior looks in this particular case: A developer creates a table and >> index, knowing that the table will be large and will be intensively >> used. An admin runs 'VACUUM ANALYZE' when table is occasionally empty, >> and next, say, 1 day, until another 'VACUUM ANALYZE' starts, the index >> is simply not used! Sure you don't suppose to run 'VACUUM ANALYZE' >> every 5 minutes as a solution, right? Rod> You might want to try this on the next 8.0 beta to come out, or Rod> against CVS. Tom recently applied some changes which should mitigate Rod> this situation. But this would affect only VACUUM, and not ANALYZE, right? -- Sincerely, Dmitry Karasik --- catpipe Systems ApS *BSD solutions, consulting, development www.catpipe.net +45 7021 0050