Thread: VACUUM ANALYZE downgrades performance

VACUUM ANALYZE downgrades performance

From: Dmitry Karasik
Hi all,

On v7.4.5 I noticed a regression in the planner, namely favoring a
sequential scan over an index scan. The proof:

   create table a ( a integer);
   create index aidx on a(a);
   explain analyze select * from a where a = 0;
   -- Index Scan using aidx on a  (cost=0.00..17.07 rows=5 width=4) (actual
   --   time=0.029..0.029 rows=0 loops=1)
   -- Index Cond: (a = 0)
   vacuum analyze;
   explain analyze select * from a where a = 0;
   -- Seq Scan on a (cost=0.00..0.00 rows=1 width=4) (actual time=0.009..0.009
   --   rows=0 loops=1)
   -- Filter: (a = 0)

I do realize that there might be reasons why this happens on an empty
table, but what is far worse is that when the table actually starts to
fill, the seq scan is still chosen and the index is simply not used.
How can that be so ...mmm... shortsighted, and, more importantly, how
do I avoid this? I hope the answer is not 'run vacuum analyze every 5
seconds'.
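
To spell out the scenario, here is a rough sketch (generate_series()
exists only from 8.0 on; on 7.4 substitute any bulk INSERT or COPY, and
the plan output is abbreviated):

   -- statistics were taken while the table was empty (as above);
   -- now the table fills up during normal operation:
   insert into a select * from generate_series(1, 100000);
   explain analyze select * from a where a = 0;
   -- Seq Scan on a ...           (planner still works from the stale "0 rows" stats)
   analyze a;                     -- refresh the statistics for this table only
   explain analyze select * from a where a = 0;
   -- Index Scan using aidx ...   (planner sees the real table size again)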

--
Sincerely,
    Dmitry Karasik

---
catpipe Systems ApS
*BSD solutions, consulting, development
www.catpipe.net
+45 7021 0050

Re: VACUUM ANALYZE downgrades performance

From: Mike Rylander
On 30 Nov 2004 14:30:37 +0100, Dmitry Karasik <dmitry@karasik.eu.org> wrote:
>
> Hi all,
>
> On v7.4.5 I noticed a regression in the planner, namely favoring a
> sequential scan over an index scan. The proof:
>
>    create table a ( a integer);
>    create index aidx on a(a);
>    explain analyze select * from a where a = 0;
>    -- Index Scan using aidx on a  (cost=0.00..17.07 rows=5 width=4) (actual
>    --   time=0.029..0.029 rows=0 loops=1)
>    -- Index Cond: (a = 0)
>    vacuum analyze;
>    explain analyze select * from a where a = 0;
>    -- Seq Scan on a (cost=0.00..0.00 rows=1 width=4) (actual time=0.009..0.009
>    --   rows=0 loops=1)
>    -- Filter: (a = 0)

Looks to me like the seq scan is a better plan.  The "actual time" went down.

>
> I do realize that there might be reasons why this happens on an empty
> table, but what is far worse is that when the table actually starts to
> fill, the seq scan is still chosen and the index is simply not used.
> How can that be so ...mmm... shortsighted, and, more importantly, how
> do I avoid this? I hope the answer is not 'run vacuum analyze every 5
> seconds'.
>

See this thread
(http://archives.postgresql.org/pgsql-hackers/2004-11/msg00985.php and
http://archives.postgresql.org/pgsql-hackers/2004-11/msg01080.php) for
an ongoing discussion of the issue.


--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

Re: VACUUM ANALYZE downgrades performance

From: Thomas Swan
On 11/30/2004 7:30 AM, Dmitry Karasik said:

>Hi all,
>
>On v7.4.5 I noticed a regression in the planner, namely favoring a
>sequential scan over an index scan. The proof:
>
>   create table a ( a integer);
>   create index aidx on a(a);
>   explain analyze select * from a where a = 0;
>   -- Index Scan using aidx on a  (cost=0.00..17.07 rows=5 width=4) (actual
>   --   time=0.029..0.029 rows=0 loops=1)
>   -- Index Cond: (a = 0)
>   vacuum analyze;
>   explain analyze select * from a where a = 0;
>   -- Seq Scan on a (cost=0.00..0.00 rows=1 width=4) (actual time=0.009..0.009
>   --   rows=0 loops=1)
>   -- Filter: (a = 0)
>
>I do realize that there might be reasons why this happens on an empty
>table, but what is far worse is that when the table actually starts to
>fill, the seq scan is still chosen and the index is simply not used.
>How can that be so ...mmm... shortsighted, and, more importantly, how
>do I avoid this? I hope the answer is not 'run vacuum analyze every 5
>seconds'.
>
>
>
Look at the ACTUAL TIME.  It dropped from 0.029 ms (using the index
scan) to 0.009 ms (using a sequential scan).

Index scans are not always faster, and the planner/optimizer knows
this.  VACUUM ANALYZE is best run when a large proportion of data has
been updated/loaded or in the off hours to refresh the statistics on
large datasets.
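
For example, after a bulk load you can refresh the statistics for just
the affected table instead of the whole database (table name taken from
the example above; the load step is only a placeholder):

   -- ... bulk load a large number of rows into "a" (COPY, INSERT ... SELECT, etc.)
   vacuum analyze a;                       -- vacuum and re-analyze this table only
   explain select * from a where a = 0;    -- the planner now sees the real row count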





Re: VACUUM ANALYZE downgrades performance

From: Dmitry Karasik
    Hi Thomas!

 Thomas> Look at the ACTUAL TIME.  It dropped from 0.029ms (using the index
 Thomas> scan) to 0.009ms (using a sequential scan.)

 Thomas> Index scans are not always faster, and the planner/optimizer knows
 Thomas> this.  VACUUM ANALYZE is best run when a large proportion of data
 Thomas> has been updated/loaded or in the off hours to refresh the
 Thomas> statistics on large datasets.

While I agree that this is generally true, look how stupid this
behavior is in this particular case: a developer creates a table and
an index, knowing that the table will be large and intensively used.
An admin runs 'VACUUM ANALYZE' while the table happens to be empty,
and then for, say, the next day, until another 'VACUUM ANALYZE' runs,
the index is simply not used! Surely you don't propose running
'VACUUM ANALYZE' every 5 minutes as a solution, right?

I'm not sure whether there is such a thing as planner hints, something
like "yes, we switched from the index scan back to a seqscan, but this
choice is only valid while the table has fewer than X records", but it
sounds like a reasonable solution.
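
The closest existing knob I am aware of is the per-session
enable_seqscan setting, which is a blunt instrument rather than a real
hint; a sketch:

   set enable_seqscan = off;   -- makes sequential scans look very expensive to the planner
   explain select * from a where a = 0;
   -- the index scan should be chosen even while the statistics are stale
   set enable_seqscan = on;    -- restore the default for the rest of the session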

Anyway, as I see it, this is a scenario that can be fought neither by
SQL programming nor by administrative guidelines. And yes, I did look
at the actual time, but somehow I am not impressed by how fast
PostgreSQL can seqscan an empty table. I believe there is something
wrong when decisions made while a table is empty are still applied
when it is full.
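
For illustration, the numbers the planner bases that decision on live
in pg_class and only change when a VACUUM or ANALYZE runs:

   select relname, relpages, reltuples
     from pg_class
    where relname = 'a';
   -- after the ANALYZE of the empty table both stay at 0,
   -- no matter how many rows are inserted afterwards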

--
Sincerely,
    Dmitry Karasik

---
catpipe Systems ApS
*BSD solutions, consulting, development
www.catpipe.net
+45 7021 0050

Re: VACUUM ANALYZE downgrades performance

From: Rod Taylor
On Thu, 2004-12-02 at 17:07 +0100, Dmitry Karasik wrote:
>     Hi Thomas!
>
>  Thomas> Look at the ACTUAL TIME.  It dropped from 0.029ms (using the index
>  Thomas> scan) to 0.009ms (using a sequential scan.)
>
>  Thomas> Index scans are not always faster, and the planner/optimizer knows
>  Thomas> this.  VACUUM ANALYZE is best run when a large proportion of data
>  Thomas> has been updated/loaded or in the off hours to refresh the
>  Thomas> statistics on large datasets.
>
> While I agree that this is generally true, look how stupid this
> behavior is in this particular case: a developer creates a table and
> an index, knowing that the table will be large and intensively used.
> An admin runs 'VACUUM ANALYZE' while the table happens to be empty,
> and then for, say, the next day, until another 'VACUUM ANALYZE' runs,
> the index is simply not used! Surely you don't propose running
> 'VACUUM ANALYZE' every 5 minutes as a solution, right?

You might want to try this on the next 8.0 beta to come out, or against
CVS. Tom recently applied some changes which should mitigate this
situation.

--


Re: VACUUM ANALYZE downgrades performance

From: "Steinar H. Gunderson"
On Thu, Dec 02, 2004 at 05:07:17PM +0100, Dmitry Karasik wrote:
> While I agree that this is generally true, look how stupid this
> behavior is in this particular case: a developer creates a table and
> an index, knowing that the table will be large and intensively used.
> An admin runs 'VACUUM ANALYZE' while the table happens to be empty,
> and then for, say, the next day, until another 'VACUUM ANALYZE' runs,
> the index is simply not used! Surely you don't propose running
> 'VACUUM ANALYZE' every 5 minutes as a solution, right?

No, you run autovacuum, which automatically re-analyzes at approximately the
right time.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: VACUUM ANALYZE downgrades performance

From: Shridhar Daithankar
On Thursday 02 Dec 2004 9:37 pm, Dmitry Karasik wrote:
>     Hi Thomas!
>
>  Thomas> Look at the ACTUAL TIME.  It dropped from 0.029ms (using the index
>  Thomas> scan) to 0.009ms (using a sequential scan.)
>
>  Thomas> Index scans are not always faster, and the planner/optimizer knows
>  Thomas> this.  VACUUM ANALYZE is best run when a large proportion of data
>  Thomas> has been updated/loaded or in the off hours to refresh the
>  Thomas> statistics on large datasets.
>
> While I agree that this is generally true, look how stupid this
> behavior is in this particular case: a developer creates a table and
> an index, knowing that the table will be large and intensively used.
> An admin runs 'VACUUM ANALYZE' while the table happens to be empty,
> and then for, say, the next day, until another 'VACUUM ANALYZE' runs,
> the index is simply not used! Surely you don't propose running
> 'VACUUM ANALYZE' every 5 minutes as a solution, right?

Why not? If the updates are frequent enough, that is *the* solution.

But you could always use the autovacuum daemon in most cases.

HTH

 Shridhar

Re: VACUUM ANALYZE downgrades performance

From: Dmitry Karasik
    Hi Rod!

 Thomas> Index scans are not always faster, and the planner/optimizer knows
 Thomas> this.  VACUUM ANALYZE is best run when a large proportion of data
 Thomas> has been updated/loaded or in the off hours to refresh the
 Thomas> statistics on large datasets.
 >> While I agree that this is generally true, look how stupid this
 >> behavior is in this particular case: a developer creates a table and
 >> an index, knowing that the table will be large and intensively used.
 >> An admin runs 'VACUUM ANALYZE' while the table happens to be empty,
 >> and then for, say, the next day, until another 'VACUUM ANALYZE' runs,
 >> the index is simply not used! Surely you don't propose running
 >> 'VACUUM ANALYZE' every 5 minutes as a solution, right?

 Rod> You might want to try this on the next 8.0 beta to come out, or
 Rod> against CVS. Tom recently applied some changes which should mitigate
 Rod> this situation.

But this would affect only VACUUM, and not ANALYZE, right?

--
Sincerely,
    Dmitry Karasik

---
catpipe Systems ApS
*BSD solutions, consulting, development
www.catpipe.net
+45 7021 0050