Performance improves only after repeated VACUUM/ANALYZE - Mailing list pgsql-performance

From Carlo Stonebanks
Subject Performance improves only after repeated VACUUM/ANALYZE
Date
Msg-id B57531DF0F844B86962E9A22852F27F6@serenity
In response to Re: Query works when kludged, but would prefer "best practice" solution  ("Dave Dutcher" <dave@tridecap.com>)
List pgsql-performance
My client "publishes" an "edition" of their DB from his production site to
his hosted web/db server. This is done by FTPing a backup of the DB to his
hosting provider.

Immediately after a "publication" (restore to web/db server) we immediately
run VACUUM ANALYZE to make sure the statistics and row estimates are
correct.
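
For context, the post-publication step is roughly the following; the table
name below is a placeholder, not one of the real tables:

  -- run against the freshly restored database on the web/db server
  VACUUM ANALYZE;                  -- what we run today, across the whole DB
  -- per-table form, if we ever need to target just the large tables:
  VACUUM ANALYZE some_big_table;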

The problem is that after this initial VACUUM ANALYZE, the row estimates in
query plans are off by more than two orders of magnitude. For example, a
disastrous plan was created because the planner estimated 4K rows when the
query in fact returned 980K rows.
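
To illustrate the kind of discrepancy I mean (the query below is only a
stand-in for the real one, but the row counts are the ones we saw):

  EXPLAIN ANALYZE
  SELECT * FROM some_table WHERE some_col = 42;
  -- the plan comes back with something like
  --   ... (cost=... rows=4000 ...) (actual time=... rows=980000 loops=1)
  -- i.e. the planner expects ~4K rows but the scan actually returns 980K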

Sometimes, a day or two later, the plans return to "normal" and the row
estimates are closer to realistic values. Guessing that some background
process might be correcting the row estimates over time, I ran ANALYZE on the
DB myself, and sure enough the row estimates corrected themselves. The
puzzling thing is that there have been no writes of any sort to the data, so
there is no reason for the stats to have changed.
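
One way I can think of to check whether something re-analyzed the tables
behind my back is the stats collector view (assuming row-level stats are
enabled; the last_* columns are new in 8.2, if I read the docs right):

  SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
   ORDER BY relname;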

I believe a VACUUM may not be necessary for a newly restored DB, but I
assumed that VACUUM ANALYZE and a plain ANALYZE have the same net effect on
the statistics. Am I wrong?

If I am not wrong (i.e. VACUUM ANALYZE and ANALYZE should produce the same
results), why would performance on a DB that has seen no transactional
activity improve only after the SECOND try?
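
In other words, on a freshly restored DB with no write activity I would
expect these two to leave the planner statistics in the same state, and
either way the per-column stats should show up in pg_stats (the schema name
here is just an example):

  VACUUM ANALYZE;   -- what we run right after the restore
  ANALYZE;          -- statistics only, no vacuuming
  -- sanity check that per-column statistics were actually gathered:
  SELECT tablename, attname, n_distinct, null_frac
    FROM pg_stats
   WHERE schemaname = 'public'
   LIMIT 10;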

PG 8.2.4 on Red Hat Linux, 1 GB RAM, SCSI RAID 1

Carlo


