Thread: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums""
hi,
in our test enviroment, if one database's have major update operations, autovacuum does not work and cause major performance degradation.
if found this issue may be resolved by revert this Skip redundant anti-wraparound vacuums · postgres/postgres@2aa6e33 (github.com) commit.
after fetch some disccusion about this revert, i have some question as follow:
1. i understand that anti-wraparound and no-aggressive autovacuum will be skipped for shared catalog tables, but why this can trigger autovacuum does not work for others tables ?
2. "this could cause autovacuum to lock down", this lock down implict that autovacuum can make a dead lock problem ?
3. how to reproduce this lock down or autovacuum invalid issue, must be cluster enviroment ?
so is there any body know these issuse or commits can give me some suggestion about my confusion.
Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums""
From
Tom Lane
Date:
jiye <jiye_sw@126.com> writes: > in our test enviroment, if one database's have major update operations, autovacuum does not work and cause major performancedegradation. > if found this issue may be resolved by revert this Skip redundant anti-wraparound vacuums · postgres/postgres@2aa6e33 (github.com)commit. Please provide a self-contained test case illustrating this report. regards, tom lane
we can not get determinate test case as this issue reproduce only once, and currently autovaccum can works as we using vacuum freeze for each tables of each database.
our client's application is real online bank business, and have serveral customer database, do a majority of update opertaion as result trigger some table dead_tup_ratio nealy 100%, but can not find any autovacuum process work for a very long time before we do vacuum freeze manally.
and out autovacuum params as follow:
---- Replied Message ----
From | Tom Lane<tgl@sss.pgh.pa.us> |
Date | 6/5/2023 11:37 |
To | jiye<jiye_sw@126.com> |
Cc | pgsql-hackers@lists.postgresql.org<pgsql-hackers@lists.postgresql.org> |
Subject | Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums"" |
in our test enviroment, if one database's have major update operations, autovacuum does not work and cause major performance degradation.
if found this issue may be resolved by revert this Skip redundant anti-wraparound vacuums · postgres/postgres@2aa6e33 (github.com) commit.
Please provide a self-contained test case illustrating this report.
regards, tom lane
Attachment
i attach our another case, many tables have over 200 million xid age, and these relation can not be autovacuum for long time until we freeze them manually.
---- Replied Message ----
From | jiye<jiye_sw@126.com> |
Date | 6/5/2023 13:50 |
To | tgl@sss.pgh.pa.us<tgl@sss.pgh.pa.us> |
Cc | pgsql-hackers@lists.postgresql.org<pgsql-hackers@lists.postgresql.org> |
Subject | Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums"" |
we can not get determinate test case as this issue reproduce only once, and currently autovaccum can works as we using vacuum freeze for each tables of each database.
our client's application is real online bank business, and have serveral customer database, do a majority of update opertaion as result trigger some table dead_tup_ratio nealy 100%, but can not find any autovacuum process work for a very long time before we do vacuum freeze manally.
and out autovacuum params as follow:
---- Replied Message ----
From | Tom Lane<tgl@sss.pgh.pa.us> |
Date | 6/5/2023 11:37 |
To | jiye<jiye_sw@126.com> |
Cc | pgsql-hackers@lists.postgresql.org<pgsql-hackers@lists.postgresql.org> |
Subject | Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums"" |
in our test enviroment, if one database's have major update operations, autovacuum does not work and cause major performance degradation.
if found this issue may be resolved by revert this Skip redundant anti-wraparound vacuums · postgres/postgres@2aa6e33 (github.com) commit.
Please provide a self-contained test case illustrating this report.
regards, tom lane
Attachment
Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums""
From
Robert Haas
Date:
On Mon, Jun 5, 2023 at 1:50 AM jiye <jiye_sw@126.com> wrote:
we can not get determinate test case as this issue reproduce only once, and currently autovaccum can works as we using vacuum freeze for each tables of each database.our client's application is real online bank business, and have serveral customer database, do a majority of update opertaion as result trigger some table dead_tup_ratio nealy 100%, but can not find any autovacuum process work for a very long time before we do vacuum freeze manally.
I tend to doubt that this is caused by the commit you're blaming, because that commit purports to skip autovacuum operations only if some other vacuum has already done the work. Here you are saying that you see no autovacuum tasks at all.
The screenshot that you posted of XID ages exceeding 200 million is not evidence of a problem. It's pretty normal for some table XID ages to temporarily exceed autovacuum_freeze_max_age, especially if you have a lot of tables with about the same XID age, as seems to be the case here. When a table's XID age reaches autovacuum_freeze_max_age, the system will start trying harder to reduce the XID age, but that process isn't instantaneous.
On the other hand, your statement that you have very high numbers of dead tuples *is* evidence of a problem. It's very likely caused by vacuum not running aggressively enough. Remember that autovacuum is limited by the number of workers (autovacuum_max_workers) but even more importantly by the cost delay system. It's *extremely* common to need to raise vacuum_cost_limit on large or busy database systems, often by large multiples (e.g. 10x or more).
I'd strongly suggest that you carefully monitor how many autovacuum processes are running and what they are doing. If I were a betting man, I'd bet that you'd find that in the situation where you had this problem, the number of running processes was always 3 -- which is the configured maximum -- and if you looked at the wait event in pg_stat_activity I bet you would see VacuumDelay showing up a lot. If so, raise vacuum_cost_limit considerably and over time the problem should get better. It won't be instantaneous.
Or maybe I'm wrong and you'd see something else, but whatever you did see would probably give a hint as to what the problem here is.
Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums""
From
Julien Rouhaud
Date:
On Tue, Jun 06, 2023 at 03:30:02PM -0400, Robert Haas wrote: > On Mon, Jun 5, 2023 at 1:50 AM jiye <jiye_sw@126.com> wrote: > > > we can not get determinate test case as this issue reproduce only once, > > and currently autovaccum can works as we using vacuum freeze for each > > tables of each database. > > > > our client's application is real online bank business, and have serveral > > customer database, do a majority of update opertaion as result trigger > > some table dead_tup_ratio nealy 100%, but can not find any autovacuum > > process work for a very long time before we do vacuum freeze manally. > > > > I tend to doubt that this is caused by the commit you're blaming, because > that commit purports to skip autovacuum operations only if some other > vacuum has already done the work. Here you are saying that you see no > autovacuum tasks at all. I'm a bit confused about what commit is actually being discussed here. Is it commit 2aa6e331ead7f3ad080561495ad4bd3bc7cd8913? FTR this commit was indeed problematic and eventually reverted in 12.3 (3ec8576a02b2b06aa214c8f3c2c3303c8a67639f), as it was leading to exactly the problem described here (autovacuum kept triggering the same jobs that were silently ignored, leading to absolutely no visible activity from a user point of view).
actually out test instance include 2aa6e331ead7f3ad080561495ad4bd3bc7cd8913 this commit, not yet reverted this commit.
---- Replied Message ----
From | Julien Rouhaud<rjuju123@gmail.com> |
Date | 6/7/2023 14:00 |
To | Robert Haas<robertmhaas@gmail.com> |
Cc | jiye<jiye_sw@126.com> , tgl@sss.pgh.pa.us<tgl@sss.pgh.pa.us> , pgsql-hackers@lists.postgresql.org<pgsql-hackers@lists.postgresql.org> |
Subject | Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums"" |
On Mon, Jun 5, 2023 at 1:50 AM jiye <jiye_sw@126.com> wrote:we can not get determinate test case as this issue reproduce only once,
and currently autovaccum can works as we using vacuum freeze for each
tables of each database.
our client's application is real online bank business, and have serveral
customer database, do a majority of update opertaion as result trigger
some table dead_tup_ratio nealy 100%, but can not find any autovacuum
process work for a very long time before we do vacuum freeze manally.
I tend to doubt that this is caused by the commit you're blaming, because
that commit purports to skip autovacuum operations only if some other
vacuum has already done the work. Here you are saying that you see no
autovacuum tasks at all.
I'm a bit confused about what commit is actually being discussed here.
Is it commit 2aa6e331ead7f3ad080561495ad4bd3bc7cd8913? FTR this commit was
indeed problematic and eventually reverted in 12.3
(3ec8576a02b2b06aa214c8f3c2c3303c8a67639f), as it was leading to exactly the
problem described here (autovacuum kept triggering the same jobs that were
silently ignored, leading to absolutely no visible activity from a user point
of view).
Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums""
From
Julien Rouhaud
Date:
On Wed, Jun 07, 2023 at 03:12:44PM +0800, jiye wrote: > actually out test instance include 2aa6e331ead7f3ad080561495ad4bd3bc7cd8913 > this commit, not yet reverted this commit. Are you saying that you're doing tests relying on a version that's missing about 3 years of security and bug fixes? You should definitely update to the latest minor version (currently 12.15) and keep applying all minor versions as they get released.
we will update all commits with latest version certaintly, but we must confirm that this issue is same with it currently we
can not confirm this issue can be fixed by revert 2aa6e331ead7f3ad080561495ad4bd3bc7cd8913 this commit,
so i just query about how this commit can trigger autovacuum lock down or does not work.
---- Replied Message ----
From | Julien Rouhaud<rjuju123@gmail.com> |
Date | 6/7/2023 15:36 |
To | jiye<jiye_sw@126.com> |
Cc | robertmhaas@gmail.com<robertmhaas@gmail.com> , tgl@sss.pgh.pa.us<tgl@sss.pgh.pa.us> , pgsql-hackers@lists.postgresql.org<pgsql-hackers@lists.postgresql.org> |
Subject | Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums"" |
actually out test instance include 2aa6e331ead7f3ad080561495ad4bd3bc7cd8913
this commit, not yet reverted this commit.
Are you saying that you're doing tests relying on a version that's missing
about 3 years of security and bug fixes? You should definitely update to the
latest minor version (currently 12.15) and keep applying all minor versions as
they get released.
Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums""
From
Julien Rouhaud
Date:
On Wed, Jun 07, 2023 at 03:42:25PM +0800, jiye wrote: > we will update all commits with latest version certaintly, but we must > confirm that this issue is same with it currently we can not confirm this > issue can be fixed by revert 2aa6e331ead7f3ad080561495ad4bd3bc7cd8913 this > commit, so i just query about how this commit can trigger autovacuum lock > down or does not work. The revert commit contains a description of the problem and a link to the discussion and analysis that led to that revert.