Re: AutoVacuum and growing transaction XID's - Mailing list pgsql-general
From | github kran |
---|---|
Subject | Re: AutoVacuum and growing transaction XID's |
Date | |
Msg-id | CACaZr5R2JXPbWLXEP39Tjh5uvJnC0FFPNVaaUW+wiaCAfP5QgQ@mail.gmail.com Whole thread Raw |
In response to | Re: AutoVacuum and growing transaction XID's (David Rowley <dgrowleyml@gmail.com>) |
List | pgsql-general |
Thanks David for your replies.
On Thu, May 7, 2020 at 11:01 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 8 May 2020 at 09:18, github kran <githubkran@gmail.com> wrote:
> 1) We haven't changed anything related to autovacuum except a work_mem parameter which was increased to 4 GB which I believe is not related to autovacuum
It might want to look into increasing vacuum_cost_limit to something
well above 200 or dropping autovacuum_vacuum_cost_delay down from 20
to something much lower. However, you say you've not changed the
autovacuum settings, but you've also said:
> 1) I see there are 8 Vacuum workers ( Not sure what changed) running in the background and the concern I have is all of these vacuum processes are running with wrap around and while they are running
- Yes I said it was originally 3 but I noticed the work_mem parameter was changed few weeks back to 4 GB and then from that day onwards there is an increasing trend of the MaxUsedTransactionIds from 200 Million to 347 million ( It's growing day by day from last 2 -3 weeks)
- Do you think there could be a formula on how the workers could have increased based on this increase in WORK_MEM controlled by database ?.
The default is 3, so if you have 8 then the settings are non-standard.
It might be good to supply the output of:
SELECT name,setting from pg_Settings where name like '%vacuum%';
Output of vacuum
name | setting | min_val | max_val | boot_val | reset_val |
autovacuum | on | null | null | on | on |
autovacuum_analyze_scale_factor | 0.02 | 0 | 100 | 0.1 | 0.02 |
autovacuum_analyze_threshold | 50 | 0 | 2147483647 | 50 | 50 |
autovacuum_freeze_max_age | 200000000 | 100000 | 2000000000 | 200000000 | 200000000 |
autovacuum_max_workers | 8 | 1 | 262143 | 3 | 8 |
autovacuum_multixact_freeze_max_age | 400000000 | 10000 | 2000000000 | 400000000 | 400000000 |
autovacuum_naptime | 5 | 1 | 2147483 | 60 | 5 |
autovacuum_vacuum_cost_delay | 5 | -1 | 100 | 20 | 5 |
autovacuum_vacuum_cost_limit | -1 | -1 | 10000 | -1 | -1 |
autovacuum_vacuum_scale_factor | 0.05 | 0 | 100 | 0.2 | 0.05 |
autovacuum_vacuum_threshold | 50 | 0 | 2147483647 | 50 | 50 |
autovacuum_work_mem | -1 | -1 | 2147483647 | -1 | -1 |
You should know that the default speed that autovacuum runs at is
quite slow in 9.6. If you end up with all your autovacuum workers tied
up with anti-wraparound vacuums then other tables are likely to get
neglected and that could lead to stale stats or bloated tables. Best
to aim to get auto-vacuum running faster or aim to perform some manual
vacuums of tables that are over their max freeze age during an
off-peak period to make use of the lower load during those times.
Start with tables in pg_class with the largest age(relfrozenxid).
You'll still likely want to look at the speed autovacuum runs at
either way.
Please be aware that the first time a new cluster crosses the
autovacuum_freeze_max_age threshold can be a bit of a pain point as it
can mean that many tables require auto-vacuum activity all at once.
The impact of this is compounded if you have many tables that never
receive an UPDATE/DELETE as auto-vacuum, in 9.6, does not visit those
tables for any other reason. After the first time, the relfrozenxids
of tables tend to be more staggered so their vacuum freeze
requirements are also more staggered and that tends to cause fewer
problems.
The current situation I have is the auto vacuum kicked with 8 tables with each of those tied to each worker and it's running very slow in 9.6 as you mentioned
i observed VACUUM on those 8 tables is running from last 15 hrs and other process are running for 1 hr+ and others for few minutes for different tables.
Finally I would wait for your reply to see what could be done for this VACUUM and growing TXIDs values.
- Do you think I should consider changing back the work_mem back to 4 MB what it was originally ?
- Can I apply your recommendations on a production instance directly or you prefer me to apply initially in other environment before applying on Prod ?
- Also like I said I want to clean up few unused tables OR MANUAL VACUUM but current system doesn't allow me to do it considering these factors.
- I will try to run VACUUM Manually during off peak hrs , Can I STOP the Manual VACUUM process if its take more than 10 minutes or what is the allowed time in mins I can have it running ?.
David
pgsql-general by date: