Re: PostgreSQL 8.2.3 VACUUM Timings/Performance - Mailing list pgsql-performance

From Bruce McAlister
Subject Re: PostgreSQL 8.2.3 VACUUM Timings/Performance
Date
Msg-id esrdsb$1klg$1@news.hub.org
Whole thread Raw
In response to PostgreSQL 8.2.3 VACUUM Timings/Performance  ("Bruce McAlister" <bruce.mcalister@blueface.ie>)
List pgsql-performance
Hi Tom,

Thanks for the suggestion. It's been a while since I replied to this as I
had to go and do some further investigation of the docs with regards the
autovacuum daemons configuration. According to the documentation, the
formula's for the vacuum and analyze are as follows:

Vacuum
    vacuum threshold = vacuum base threshold + vacuum scale factor * number
of tuples
Analyze
    analyze threshold = analyze base threshold + analyze scale factor *
number of tuples

My current settings for autovacuum are as follows:

# - Cost-Based Vacuum Delay -

vacuum_cost_delay         = 200                 # 0-1000 milliseconds
vacuum_cost_page_hit    = 1                     # 0-10000 credits
vacuum_cost_page_miss = 10                   # 0-10000 credits
vacuum_cost_page_dirty = 20                   # 0-10000 credits
vacuum_cost_limit           = 200                 # 0-10000 credits

#---------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#---------------------------------------------------------------------------

autovacuum                                   = on                      #
enable autovacuum subprocess?

      # 'on' requires stats_start_collector

      # and stats_row_level to also be on
autovacuum_naptime                     = 1min                   # time
between autovacuum runs
autovacuum_vacuum_threshold     = 500                     # min # of tuple
updates before

      # vacuum
autovacuum_analyze_threshold      = 250                    # min # of tuple
updates before

      # analyze
autovacuum_vacuum_scale_factor = 0.2                     # fraction of rel
size before

      # vacuum
autovacuum_analyze_scale_factor = 0.1                     # fraction of rel
size before

      # analyze
autovacuum_freeze_max_age       = 200000000         # maximum XID age before
forced vacuum

      # (change requires restart)
autovacuum_vacuum_cost_delay  = -1                       # default vacuum
cost delay for

      # autovacuum, -1 means use

      # vacuum_cost_delay
autovacuum_vacuum_cost_limit    = -1                       # default vacuum
cost limit for

     # autovacuum, -1 means use

     # vacuum_cost_limit

Thus to make the autovacuum more aggressive I am thinking along the lines of
changing the following parameters:

autovacuum_vacuum_threshold     = 250
autovacuum_analyze_threshold     = 125

The documentation also mentions that when the autovacuum runs it selects a
single database to process on that run. This means that the particular table
that we are interrested in will only be vacuumed once every 17 minutes,
assuming we have 18 databases and the selection process is sequential
through the database list.

From my understanding of the documentation, the only way to work around this
issue is to manually update the system catalog table pg_autovacuum and set
the pg_autovacuum.enabled field to false to skip the autovacuum on tables
that dont require such frequent vacuums. If I do enable this feature, and I
manually run a vacuumdb from the command line against that particular
disabled table, will the vacuum still process the table? I'm assuming too,
that the best tables to disable autovacuum on will be ones with a minimal
amount of update/delete queries run against it. For example, if we have a
table that only has inserts applied to it, it is safe to assume that that
table can safely be ignored by autovacuum.

Do you have any other suggestions as to which tables generally can be
excluded from the autovacuum based on the usage patterns?
Can you see anything with respect to my new autovacuum parameters that may
cause issue's and are there any other parameters that you suggest I need to
change to make the autovacuum daemon more aggressive?

PS: Currently we have the Cluster command running on the sipaccounts table
as the vacuum full is taking too long. It would be nice though to have some
piece of mind that the cluster command is mvcc safe, as Heikki and Aidan
have mentioned that it is not and may break things in our environment, I'm a
little afraid of running with the cluster command, and should possibly go
back to the vacuum full :/

Thanks all for any and all suggestions/comments.

Thanks
Bruce


"Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
news:19106.1173111068@sss.pgh.pa.us...
> "Bruce McAlister" <bruce.mcalister@blueface.ie> writes:
>> [1] AutoVacuum runs during the day over the entire PostgreSQL cluster,
>
> Good, but evidently you need to make it more aggressive.
>
>> [2] A Vacuum Full Verbose is run during our least busy period (generally
>> 03:30) against the Database,
>
>> [3] A Re-Index on the table is performed,
>
>> [4] A Cluster on the table is performed against the most used index,
>
>> [5] A Vacuum Analyze Verbose is run against the database.
>
> That is enormous overkill.  Steps 2 and 3 are a 100% waste of time if
> you are going to cluster in step 4.  Just do the CLUSTER and then
> ANALYZE (or VACUUM ANALYZE if you really must, but the value is marginal).
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>



pgsql-performance by date:

Previous
From: "James Mansion"
Date:
Subject: Re: compact flash disks?
Next
From: Axel Rau
Date:
Subject: Re: configuring new server / many slow disks?