Re: How to Find Cause of Long Vacuum Times - NOOB Question - Mailing list pgsql-performance

From Steinar H. Gunderson
Subject Re: How to Find Cause of Long Vacuum Times - NOOB Question
Date
Msg-id 20070505234014.GA15522@uio.no
Whole thread Raw
In response to How to Find Cause of Long Vacuum Times - NOOB Question  (Yudhvir Singh Sidhu <ysidhu@gmail.com>)
Responses Re: How to Find Cause of Long Vacuum Times - NOOB Question  (Yudhvir Singh Sidhu <ysidhu@gmail.com>)
List pgsql-performance
On Sat, May 05, 2007 at 03:57:25PM -0700, Yudhvir Singh Sidhu wrote:
> Situation:  huge amounts of adds and deletes daily. Running daily vacuums

If you have huge amounts of adds and deletes, you might want to vacuum more
often; optionally, look into autovacuum.

> Problem:  Vacuum times jump up from 45 minutes, or 1:30 minutes to 6+
> hours overnight, once every 1 to 3 months.

You might want to check your FSM settings. Take a look at the output of
VACUUM VERBOSE and see how the results stack up against your FSM settings.
Optionally, you could do a VACUUM FULL to clear the bloat, but this will lock
the tables and is not recommended on a regular basis.

> I know my indexes are getting fragmented and my tables are getting
> fragmented.

This sounds like a case of table bloat, ie. vacuuming too seldom and/or too
low FSM settings.

> I also know that some of my btree indexes are not being used in queries.

This is a separate problem, usually; if you need help with a specific query,
post query and the EXPLAIN ANALYZE output here. (Note that using an index is
not always a win; Postgres' planner knows about this and tries to figure out
when it is a win and when it is not.)

> I also know that using "UNIQUE" in a query makes PG ignore any index.

Do you mean DISTINCT? There are known problems with SELECT DISTINCT, but I'm
not sure how it could make Postgres start ignoring an index. Again, it's a
separate problem.

> I am looking for the cause of this. Recently I have been looking at
> EXPLAIN and ANALYZE.

This is a good beginning. :-)

> 1.  Running EXPLAIN on a query tells me how my query SHOULD run and
> running ANALYZE tells me how it DOES run. Is that correct?

Nearly. EXPLAIN tells you how the plan Postgres has chosen, with estimates on
the costs of each step. EXPLAIN ANALYZE (just plain "ANALYZE" is a different
command, which updates the planner's statistics) does the same, but also runs
the query and shows the time each step ended up taking. (Note that the
units of the estimates and the timings are different, so you can't compare
them directly.)

> 2.  If (1) is true, then a difference between the two means my query
> plan is messed up and running ANALYZE on a table-level will somehow
> rebuild the plan. Is that correct?

Again, sort of right, but not entirely. ANALYZE updates the planner's
statistics. Having good statistics is very useful for the planner in
selecting the plan that actually ends up being the best.

> 3.  If (2) is correct, then  running ANALYZE on a nightly basis before
> running vacuum will keep vacuum times down. Is that correct?

No, ANALYZE will only update planner statistics, which has nothing to do with
vacuum times. On the other hand, it might help with some of your queries.

/* Steinar */
--
Homepage: http://www.sesse.net/

pgsql-performance by date:

Previous
From: Yudhvir Singh Sidhu
Date:
Subject: How to Find Cause of Long Vacuum Times - NOOB Question
Next
From: Yudhvir Singh Sidhu
Date:
Subject: Re: How to Find Cause of Long Vacuum Times - NOOB Question