Thread: Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Alvaro Herrera
Date:
On 2018-Sep-13, Michael Paquier wrote: > Improve autovacuum logging for aggressive and anti-wraparound runs > > A log message was being generated when log_min_duration is reached for > autovacuum on a given relation to indicate if it was an aggressive run, > and missed the point of mentioning if it is doing an anti-wrapround > run. The log message generated is improved so as one, both or no extra > details are added depending on the option set. Hmm, can a for-wraparound vacuum really not be aggressive? I think one of those four cases is really dead code. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Michael Paquier
Date:
On Fri, Sep 14, 2018 at 12:35:54PM -0300, Alvaro Herrera wrote: > On 2018-Sep-13, Michael Paquier wrote: >> Improve autovacuum logging for aggressive and anti-wraparound runs >> >> A log message was being generated when log_min_duration is reached for >> autovacuum on a given relation to indicate if it was an aggressive run, >> and missed the point of mentioning if it is doing an anti-wrapround >> run. The log message generated is improved so as one, both or no extra >> details are added depending on the option set. > > Hmm, can a for-wraparound vacuum really not be aggressive? I think one > of those four cases is really dead code. Sure, at the same time it is a no-brainer to keep the code as is, and seeing log messages where (!aggressive && wraparound) would be an indication of surrounding bugs, no? There have been issues in this area in the past. -- Michael
Attachment
On Fri, Sep 14, 2018 at 11:35 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > On 2018-Sep-13, Michael Paquier wrote: >> Improve autovacuum logging for aggressive and anti-wraparound runs >> >> A log message was being generated when log_min_duration is reached for >> autovacuum on a given relation to indicate if it was an aggressive run, >> and missed the point of mentioning if it is doing an anti-wrapround >> run. The log message generated is improved so as one, both or no extra >> details are added depending on the option set. > > Hmm, can a for-wraparound vacuum really not be aggressive? I think one > of those four cases is really dead code. My first question was whether TWO of them were dead code ... isn't an aggressive vacuum to prevent wraparound, and a vacuum to prevent wraparound aggressive? I can't figure out what this is giving us that we didn't have before. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru
From
Sergei Kornilov
Date:
Hello, Robert > My first question was whether TWO of them were dead code ... isn't an > aggressive vacuum to prevent wraparound, and a vacuum to prevent > wraparound aggressive? Maybe i am wrong, aggressive autovacuum was your commit. Message split was in b55509332f50f998b6e8b3830a51c5b9d8f666aa Aggressive autovacuum was in fd31cd265138019dcccc9b5fe53043670898bc9f If aggressive really is wraparound without difference - i think we need refactor this code, it is difficult have two differentflags for same purpose. But as far i can see it is possible have aggressive non-wraparound vacuum. One important difference - regular and aggressiveregular can be canceled by backend,.wraparound autovacuum can not. (by checking PROC_VACUUM_FOR_WRAPAROUND in src/backend/storage/lmgr/proc.c) regards, Sergei
Hi, On 2018-09-21 20:38:16 +0300, Sergei Kornilov wrote: > > My first question was whether TWO of them were dead code ... isn't an > > aggressive vacuum to prevent wraparound, and a vacuum to prevent > > wraparound aggressive? > Maybe i am wrong, aggressive autovacuum was your commit. > Message split was in b55509332f50f998b6e8b3830a51c5b9d8f666aa > Aggressive autovacuum was in fd31cd265138019dcccc9b5fe53043670898bc9f > > If aggressive really is wraparound without difference - i think we need refactor this code, it is difficult have two differentflags for same purpose. > > But as far i can see it is possible have aggressive non-wraparound vacuum. One important difference - regular and aggressiveregular can be canceled by backend,.wraparound autovacuum can not. (by checking PROC_VACUUM_FOR_WRAPAROUND in src/backend/storage/lmgr/proc.c) Yes, without checking the code, they should be different. Aggressive is controlled by vacuum_freeze_table_age whereas anti-wrap is controlled by autovacuum_freeze_max_age (but also implies aggressive). Greetings, Andres Freund
> On Sep 21, 2018, at 12:43 PM, Andres Freund <andres@anarazel.de> wrote: > >> But as far i can see it is possible have aggressive non-wraparound vacuum. One important difference - regular and aggressiveregular can be canceled by backend,.wraparound autovacuum can not. (by checking PROC_VACUUM_FOR_WRAPAROUND in src/backend/storage/lmgr/proc.c) > > Yes, without checking the code, they should be different. Aggressive is > controlled by vacuum_freeze_table_age whereas anti-wrap is controlled by > autovacuum_freeze_max_age (but also implies aggressive). Right, except that by the time you get into the vacuum code itself nothing should really care about that difference. AFAICT,the only thing is_wraparound is being used for is to set MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND, whichprevents the deadlock detector from killing an autovac process that’s trying to prevent a wraparound. I think it’d beclearer to remove is_wraparound and move the check from vacuum_rel() into lazy_vacuum_rel() (which is where the limitsfor HeapTupleSatisfiesVacuum get determined). Something like the attached.
Attachment
On 2018-09-24 18:25:46 +0000, Nasby, Jim wrote: > > > On Sep 21, 2018, at 12:43 PM, Andres Freund <andres@anarazel.de> wrote: > > > >> But as far i can see it is possible have aggressive non-wraparound vacuum. One important difference - regular and aggressiveregular can be canceled by backend,.wraparound autovacuum can not. (by checking PROC_VACUUM_FOR_WRAPAROUND in src/backend/storage/lmgr/proc.c) > > > > Yes, without checking the code, they should be different. Aggressive is > > controlled by vacuum_freeze_table_age whereas anti-wrap is controlled by > > autovacuum_freeze_max_age (but also implies aggressive). > > Right, except that by the time you get into the vacuum code itself nothing should really care about that difference. AFAICT,the only thing is_wraparound is being used for is to set MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND, whichprevents the deadlock detector from killing an autovac process that’s trying to prevent a wraparound. I think it’d beclearer to remove is_wraparound and move the check from vacuum_rel() into lazy_vacuum_rel() (which is where the limitsfor HeapTupleSatisfiesVacuum get determined). Something like the attached. I'm very doubtful this is an improvement. Especially with the upcoming pluggable storage work making vacuumlazy.c heap specific, while vacuum.c stays generic. The concept of something like PROC_VACUUM_FOR_WRAPAROUND, should imo not be pushed down that much (even if criteria for it might). Greetings, Andres Freund
> On Sep 24, 2018, at 1:29 PM, Andres Freund <andres@anarazel.de> wrote: > > I'm very doubtful this is an improvement. Especially with the upcoming > pluggable storage work making vacuumlazy.c heap specific, while vacuum.c > stays generic. The concept of something like > PROC_VACUUM_FOR_WRAPAROUND, should imo not be pushed down that much > (even if criteria for it might). That’s already a problem since vacuum logging is spread all over while autovac logging is not. Perhaps there needs to besome sort of vacuum_log() function that immediately provides output for manual vacuums, but aggregates output for autovac.AFAIK that’s the only real reason for autovac logging being a special case today.
Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru
From
Sergei Kornilov
Date:
Hi > An autovacuum can't be just aggressive; it's either anti-wraparound or normal. But autovacuum _can_ be aggressive and not anti-wraparound. I build current master and can see 3 different line types: 2018-09-24 23:47:31.500 MSK 27939 @ from [vxid:4/272032 txid:0] [] LOG: automatic aggressive vacuum of table "postgres.public.foo":index scans: 0 2018-09-24 23:49:27.892 MSK 28333 @ from [vxid:4/284297 txid:0] [] LOG: automatic aggressive vacuum to prevent wraparoundof table "postgres.public.foo": index scans: 0 2018-09-24 23:49:29.093 MSK 28337 @ from [vxid:4/284412 txid:0] [] LOG: automatic vacuum of table "postgres.public.foo":index scans: 0 regards, Sergei
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Alvaro Herrera
Date:
On 2018-Sep-24, Sergei Kornilov wrote: > Hi > > > An autovacuum can't be just aggressive; it's either anti-wraparound or normal. > But autovacuum _can_ be aggressive and not anti-wraparound. > I build current master and can see 3 different line types: > 2018-09-24 23:47:31.500 MSK 27939 @ from [vxid:4/272032 txid:0] [] LOG: automatic aggressive vacuum of table "postgres.public.foo":index scans: 0 > 2018-09-24 23:49:27.892 MSK 28333 @ from [vxid:4/284297 txid:0] [] LOG: automatic aggressive vacuum to prevent wraparoundof table "postgres.public.foo": index scans: 0 > 2018-09-24 23:49:29.093 MSK 28337 @ from [vxid:4/284412 txid:0] [] LOG: automatic vacuum of table "postgres.public.foo":index scans: 0 Exactly. It cannot be anti-wraparound and not aggressive, which is the line type not shown. "Aggressive" means it scans all pages; "anti-wraparound" means it does not let itself be cancelled because of another process waiting for a lock on the table. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Masahiko Sawada
Date:
On Tue, Sep 25, 2018 at 6:11 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2018-Sep-24, Sergei Kornilov wrote: > > > Hi > > > > > An autovacuum can't be just aggressive; it's either anti-wraparound or normal. > > But autovacuum _can_ be aggressive and not anti-wraparound. > > I build current master and can see 3 different line types: > > 2018-09-24 23:47:31.500 MSK 27939 @ from [vxid:4/272032 txid:0] [] LOG: automatic aggressive vacuum of table "postgres.public.foo":index scans: 0 > > 2018-09-24 23:49:27.892 MSK 28333 @ from [vxid:4/284297 txid:0] [] LOG: automatic aggressive vacuum to prevent wraparoundof table "postgres.public.foo": index scans: 0 > > 2018-09-24 23:49:29.093 MSK 28337 @ from [vxid:4/284412 txid:0] [] LOG: automatic vacuum of table "postgres.public.foo":index scans: 0 > > Exactly. > > It cannot be anti-wraparound and not aggressive, which is the line type > not shown. > > "Aggressive" means it scans all pages; "anti-wraparound" means it does > not let itself be cancelled because of another process waiting for a > lock on the table. > I agree. Can we fix this simply by the attached patch? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Michael Paquier
Date:
On Fri, Sep 28, 2018 at 01:53:14PM +0900, Masahiko Sawada wrote: > I agree. Can we fix this simply by the attached patch? Thanks for sending a patch. + /* autovacuum cannot be anti-wraparound and not aggressive vacuum */ + Assert(aggressive); + msgfmt = _("automatic aggressive vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n"); While adding this comment in lazy_vacuum_rel() is adapted, I think that we ought to make the assertion check more aggressive by not having it depend on if log_min_duration is set or not. So why not moving that to a place a bit higher, where aggressive gets defined? -- Michael
Attachment
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Masahiko Sawada
Date:
On Tue, Oct 2, 2018 at 9:11 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Sep 28, 2018 at 01:53:14PM +0900, Masahiko Sawada wrote: > > I agree. Can we fix this simply by the attached patch? > > Thanks for sending a patch. > > + /* autovacuum cannot be anti-wraparound and not aggressive vacuum */ > + Assert(aggressive); > + msgfmt = _("automatic aggressive vacuum to prevent wraparound of table \"%s.%s.%s\": index scans: %d\n"); > > While adding this comment in lazy_vacuum_rel() is adapted, I think that > we ought to make the assertion check more aggressive by not having it > depend on if log_min_duration is set or not. So why not moving that to > a place a bit higher, where aggressive gets defined? Since there is no place where checks params->is_wraparound we will have to do something like; if (params->is_wraparound) Assert(aggressive); Or you meant the following? Assert(params->is_wraparound ? aggressive : true); I'm not sure both styles would be appropriate style in the postgres code so I would rather add elog(ERROR) instead. Thought? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Michael Paquier
Date:
On Tue, Oct 02, 2018 at 01:18:01PM +0900, Masahiko Sawada wrote: > I'm not sure both styles would be appropriate style in the postgres > code so I would rather add elog(ERROR) instead. Thought? My brain is rather fried for the rest of the day... But we could just be looking at using USE_ASSERT_CHECKING. Thoughts from other are welcome. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > My brain is rather fried for the rest of the day... But we could just > be looking at using USE_ASSERT_CHECKING. Thoughts from other are > welcome. I'd go with folding the condition into a plain Assert. Then it's obvious that no code is added in a non-assert build. I can see that some cases might be so complicated that that isn't nice, but this case doesn't seem to qualify. regards, tom lane
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Masahiko Sawada
Date:
On Tue, Oct 2, 2018 at 11:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Michael Paquier <michael@paquier.xyz> writes: > > My brain is rather fried for the rest of the day... But we could just > > be looking at using USE_ASSERT_CHECKING. Thoughts from other are > > welcome. > > I'd go with folding the condition into a plain Assert. Then it's > obvious that no code is added in a non-assert build. I can see that > some cases might be so complicated that that isn't nice, but this > case doesn't seem to qualify. > Thank you for the comment. Attached the updated patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Michael Paquier
Date:
On Wed, Oct 03, 2018 at 04:37:29PM +0900, Masahiko Sawada wrote: > Thank you for the comment. Attached the updated patch. Thanks for the patch. This looks clean to me at quick glance, not tested though.. -- Michael
Attachment
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Michael Paquier
Date:
On Wed, Oct 03, 2018 at 04:37:29PM +0900, Masahiko Sawada wrote: > Thank you for the comment. Attached the updated patch. So, I have come back to this stuff, and finished with the attached instead, so as the assertion is in a single place. I find that clearer. The comments have also been improved. Thoughts? -- Michael
Attachment
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Masahiko Sawada
Date:
On Fri, Oct 5, 2018 at 12:16 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Oct 03, 2018 at 04:37:29PM +0900, Masahiko Sawada wrote: > > Thank you for the comment. Attached the updated patch. > > So, I have come back to this stuff, and finished with the attached > instead, so as the assertion is in a single place. I find that > clearer. The comments have also been improved. Thoughts? Thank you! The patch looks good to me. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Michael Paquier
Date:
On Fri, Oct 05, 2018 at 12:16:03PM +0900, Michael Paquier wrote: > So, I have come back to this stuff, and finished with the attached > instead, so as the assertion is in a single place. I find that > clearer. The comments have also been improved. Thoughts? And so... I have been looking at committing this thing, and while testing in-depth I have been able to trigger a case where an autovacuum has been able to be not aggressive but anti-wraparound, which is exactly what should not be possible, no? I have simply created an instance with autovacuum_freeze_max_age = 200000, then ran pgbench with autovacuum_freeze_table_age=200000 set for each table, and also ran installcheck-world in parallel. This has been able to trigger the assertion pretty quickly. Here is the stack trace: #2 0x000055e1a12ef87b in ExceptionalCondition (conditionName=0x55e1a14b45c8 "!((params->is_wraparound && aggressive) || !params->is_wraparound)", errorType=0x55e1a14b459d "FailedAssertion", fileName=0x55e1a14b4590 "vacuumlazy.c", lineNumber=254) at assert.c:54 #3 0x000055e1a0f6c6ad in lazy_vacuum_rel (onerel=0x7f163e7ea710, options=65, params=0x55e1a2eeba70, bstrategy=0x55e1a2f5c1a0) at vacuumlazy.c:253 #4 0x000055e1a0f6c217 in vacuum_rel (relid=1260, relation=0x55e1a2f5d748, options=65, params=0x55e1a2eeba70) at vacuum.c:1714 #5 0x000055e1a0f6a3ac in vacuum (options=65, relations=0x55e1a2f2f050, params=0x55e1a2eeba70, bstrategy=0x55e1a2f5c1a0, isTopLevel=true) at vacuum.c:340 #6 0x000055e1a10c1ddd in autovacuum_do_vac_analyze (tab=0x55e1a2eeba68, bstrategy=0x55e1a2f5c1a0) at autovacuum.c:3121 #7 0x000055e1a10c0e19 in do_autovacuum () at autovacuum.c:2476 $2 = {spcNode = 1664, dbNode = 0, relNode = 1260} (gdb) p onerel->rd_node.relNode $3 = 1260 (gdb) p params->is_wraparound $4 = true (gdb) p aggressive $5 = false I still have the core file, the binaries and the data folder, so it's not a problem to dig into it. -- Michael
Attachment
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Kyotaro HORIGUCHI
Date:
At Fri, 5 Oct 2018 15:35:04 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20181005063504.GB14664@paquier.xyz> > On Fri, Oct 05, 2018 at 12:16:03PM +0900, Michael Paquier wrote: > > So, I have come back to this stuff, and finished with the attached > > instead, so as the assertion is in a single place. I find that > > clearer. The comments have also been improved. Thoughts? > > And so... I have been looking at committing this thing, and while > testing in-depth I have been able to trigger a case where an autovacuum > has been able to be not aggressive but anti-wraparound, which is exactly > what should not be possible, no? I have simply created an instance with > autovacuum_freeze_max_age = 200000, then ran pgbench with > autovacuum_freeze_table_age=200000 set for each table, and also ran > installcheck-world in parallel. This has been able to trigger the > assertion pretty quickly. I investigated it and in short, it can happen. It is a kind of race consdition between two autovacuum processes. do_autovacuum() looks into pg_class (using a snapshot) and vacuum_set_xid_limits() looks into relcache. If concurrent vacuum happens and one has finished the relation, another gets relcache invalidation and relfrozenxid is updated. If this happens between do_autovacuum() and vacuum_set_xid_limits(), the latter sees newer relfrozenxid than the former. The problem happens when it moves by more than 5% of autovacuum_freeze_max_age. If lazy_vacuum_rel() sees the situation, the relation is already aggressively vacuumed by a cocurrent worker. We can just ingore the state safely but also we know that the vacuum is useless. 1. Just allow the case there (and add comment). Causes redundant anti-wraparound vacuum. 2. Skip the relation by the condition. I think that we can safely skip the relation in the case. (attached) 3. Ensure that do_autovacuum always sees the same relfrozenxid with vacuum_set_xid_limits(). 4. Prevent concurrent acuuming of the same relation rigorously, somehow. Thoughts? regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c index 8996d366e9..436d573454 100644 --- a/src/backend/commands/vacuumlazy.c +++ b/src/backend/commands/vacuumlazy.c @@ -249,6 +249,21 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params, if (options & VACOPT_DISABLE_PAGE_SKIPPING) aggressive = true; + /* + * When is_wraparound, relfrozenxid is old enough and aggressive must be + * true here. However, if another concurrent vacuum process has processed + * this relation, relfrozenxid in relcache can be moved forward enough so + * that aggressive is calculated as false here. This relation no longer + * needs to be vacuumed. Bail out. + */ + if (params->is_wraparound && !aggressive) + { + elog(DEBUG1, "relation %d has been vacuumd ocncurrently, skip", + onerel->rd_id); + return; + } + + vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats)); vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Andrew Dunstan
Date:
On 10/9/18 5:15 AM, Kyotaro HORIGUCHI wrote: > At Fri, 5 Oct 2018 15:35:04 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20181005063504.GB14664@paquier.xyz> >> On Fri, Oct 05, 2018 at 12:16:03PM +0900, Michael Paquier wrote: >>> So, I have come back to this stuff, and finished with the attached >>> instead, so as the assertion is in a single place. I find that >>> clearer. The comments have also been improved. Thoughts? >> And so... I have been looking at committing this thing, and while >> testing in-depth I have been able to trigger a case where an autovacuum >> has been able to be not aggressive but anti-wraparound, which is exactly >> what should not be possible, no? I have simply created an instance with >> autovacuum_freeze_max_age = 200000, then ran pgbench with >> autovacuum_freeze_table_age=200000 set for each table, and also ran >> installcheck-world in parallel. This has been able to trigger the >> assertion pretty quickly. > I investigated it and in short, it can happen. > > It is a kind of race consdition between two autovacuum > processes. do_autovacuum() looks into pg_class (using a snapshot) > and vacuum_set_xid_limits() looks into relcache. If concurrent > vacuum happens and one has finished the relation, another gets > relcache invalidation and relfrozenxid is updated. If this > happens between do_autovacuum() and vacuum_set_xid_limits(), the > latter sees newer relfrozenxid than the former. The problem > happens when it moves by more than 5% of > autovacuum_freeze_max_age. > > If lazy_vacuum_rel() sees the situation, the relation is already > aggressively vacuumed by a cocurrent worker. We can just ingore > the state safely but also we know that the vacuum is useless. > > 1. Just allow the case there (and add comment). > Causes redundant anti-wraparound vacuum. > > 2. Skip the relation by the condition. > > I think that we can safely skip the relation in the > case. (attached) > > 3. Ensure that do_autovacuum always sees the same relfrozenxid > with vacuum_set_xid_limits(). > > 4. Prevent concurrent acuuming of the same relation rigorously, > somehow. > > Thoughts? > I notice that this seems never to have been acted on. I think we should apply this and remove the (confusing) message setting for the case we'll now be avoiding. If not we should at least comment there that this is a case we only expect to see in pathological cases. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Michael Paquier
Date:
On Fri, Mar 08, 2019 at 05:05:53PM -0500, Andrew Dunstan wrote: > I notice that this seems never to have been acted on. I think we should > apply this and remove the (confusing) message setting for the case we'll > now be avoiding. If not we should at least comment there that this is a > case we only expect to see in pathological cases. Sorry for dropping the ball, I would have assumed that Robert would handle it as he is at the origin of the introduction of the aggressive option via fd31cd26. + elog(DEBUG1, "relation %d has been vacuumd ocncurrently, skip", The proposed patch has two typos in two words. I am adding an open item about that. I think I could commit the patch, but I need to study it a bit more first. -- Michael
Attachment
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Michael Paquier
Date:
On Sat, Mar 09, 2019 at 10:15:37AM +0900, Michael Paquier wrote: > I am adding an open item about that. I think I could commit the > patch, but I need to study it a bit more first. So, coming back to this thread, and studying the problem again, it looks that the diagnostic that a non-aggressive, anti-wraparound vacuum could be triggered because the worker sees trouble in the force because of some activity happening in parallel. Hence, if we face this case, it looks right to skip the vacuum for this relation. Attached is an updated patch with a better error message, more comments, and the removal of the anti-wraparound non-aggressive log which was added in 28a8fa9. The error message could be better I guess. Suggestions are welcome. Thoughts? -- Michael
Attachment
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Andrew Dunstan
Date:
On 3/29/19 7:51 AM, Michael Paquier wrote: > On Sat, Mar 09, 2019 at 10:15:37AM +0900, Michael Paquier wrote: >> I am adding an open item about that. I think I could commit the >> patch, but I need to study it a bit more first. > So, coming back to this thread, and studying the problem again, it > looks that the diagnostic that a non-aggressive, anti-wraparound > vacuum could be triggered because the worker sees trouble in the > force because of some activity happening in parallel. Hence, if we > face this case, it looks right to skip the vacuum for this relation. > > Attached is an updated patch with a better error message, more > comments, and the removal of the anti-wraparound non-aggressive log > which was added in 28a8fa9. The error message could be better I > guess. Suggestions are welcome. > > Thoughts? + (errmsg_internal("found vacuum to prevent wraparound of table \"%s.%s.%s\" to be not aggressive, so skipping", This might convey something to hackers, but I doubt it will convey much to regular users. Perhaps something like "skipping redundant anti-wraparound vacuum of table ..." would be better. The comment is also a bit too tentative. Perhaps something like this would do: Normally the relfrozenxid for an anti-wraparound vacuum will be old enough to force an aggressive vacuum. However, a concurrent vacuum might have already done this work that the relfroxzenxid in relcache has been updated. If that happens this vacuum is redundant, so skip it. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Michael Paquier
Date:
On Fri, Mar 29, 2019 at 09:11:47AM -0400, Andrew Dunstan wrote: > + (errmsg_internal("found vacuum to prevent wraparound of > table \"%s.%s.%s\" to be not aggressive, so skipping", > > This might convey something to hackers, but I doubt it will convey much > to regular users. Perhaps something like "skipping redundant > anti-wraparound vacuum of table ..." would be better. "skipping redundant" is much better. > The comment is also a bit too tentative. Perhaps something like this > would do: > > Normally the relfrozenxid for an anti-wraparound vacuum will be old > enough to force an aggressive vacuum. However, a concurrent vacuum > might have already done this work that the relfroxzenxid in relcache > has been updated. If that happens this vacuum is redundant, so skip it. That works for me. -- Michael
Attachment
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Alvaro Herrera
Date:
On 2019-Mar-29, Michael Paquier wrote: > On Fri, Mar 29, 2019 at 09:11:47AM -0400, Andrew Dunstan wrote: > > + (errmsg_internal("found vacuum to prevent wraparound of > > table \"%s.%s.%s\" to be not aggressive, so skipping", > > > > This might convey something to hackers, but I doubt it will convey much > > to regular users. Perhaps something like "skipping redundant > > anti-wraparound vacuum of table ..." would be better. > > "skipping redundant" is much better. Yeah, that looks good to me too. I wonder if we really need it as LOG though; we don't say anything for actions unless they take more than the min duration, so why say something for a no-op that takes almost no time? Maybe make it DEBUG1. > > The comment is also a bit too tentative. Perhaps something like this > > would do: > > > > Normally the relfrozenxid for an anti-wraparound vacuum will be old > > enough to force an aggressive vacuum. However, a concurrent vacuum > > might have already done this work that the relfroxzenxid in relcache > > has been updated. If that happens this vacuum is redundant, so skip it. > > That works for me. s/relfroxzenxid/relfrozenxid/ -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Michael Paquier
Date:
On Fri, Mar 29, 2019 at 11:22:55AM -0300, Alvaro Herrera wrote: > Yeah, that looks good to me too. I wonder if we really need it as LOG > though; we don't say anything for actions unless they take more than the > min duration, so why say something for a no-op that takes almost no time? > Maybe make it DEBUG1. I think that this does not justify a WARNING, as that's harmless for the user even if we use WARNING for other skips (see vacuum_is_relation_owner). However DEBUG1 is also too low in my opinion as this log can be used as an indicator that autovacuum is too much aggressive because there are too many workers for example. I have seen that matter in some CPU-bound environments. I won't fight hard if the consensus is to use DEBUG1 though. So, more opinions? Andrew perhaps? > s/relfroxzenxid/relfrozenxid/ Sure. -- Michael
Attachment
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Andrew Dunstan
Date:
On 3/29/19 9:08 PM, Michael Paquier wrote: > On Fri, Mar 29, 2019 at 11:22:55AM -0300, Alvaro Herrera wrote: >> Yeah, that looks good to me too. I wonder if we really need it as LOG >> though; we don't say anything for actions unless they take more than the >> min duration, so why say something for a no-op that takes almost no time? >> Maybe make it DEBUG1. > I think that this does not justify a WARNING, as that's harmless for > the user even if we use WARNING for other skips (see > vacuum_is_relation_owner). However DEBUG1 is also too low in my > opinion as this log can be used as an indicator that autovacuum is too > much aggressive because there are too many workers for example. I > have seen that matter in some CPU-bound environments. I won't fight > hard if the consensus is to use DEBUG1 though. So, more opinions? > Andrew perhaps? > > It's really just a matter of housekeeping as I see it, so probably DEBUG1 is right. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Michael Paquier
Date:
On Sat, Mar 30, 2019 at 08:16:21AM -0400, Andrew Dunstan wrote: > It's really just a matter of housekeeping as I see it, so probably > DEBUG1 is right. Okay, I'll use that then. -- Michael
Attachment
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Michael Paquier
Date:
On Sat, Mar 30, 2019 at 10:12:33PM +0900, Michael Paquier wrote: > Okay, I'll use that then. And finally committed. I have changed the debug1 message so as "to prevent wraparound" is used instead of "anti-wraparound". I have noticed something which was also missing from all the patches proposed on this thread: a reset of the progress state was not done, and we need to call pgstat_progress_end_command() in order to do that. -- Michael
Attachment
Hi, On 2019-03-29 20:51:38 +0900, Michael Paquier wrote: > So, coming back to this thread, and studying the problem again, it > looks that the diagnostic that a non-aggressive, anti-wraparound > vacuum could be triggered because the worker sees trouble in the > force because of some activity happening in parallel. Hence, if we > face this case, it looks right to skip the vacuum for this relation. > > Attached is an updated patch with a better error message, more > comments, and the removal of the anti-wraparound non-aggressive log > which was added in 28a8fa9. The error message could be better I > guess. Suggestions are welcome. > diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c > index 5c554f9465..82be8c81f3 100644 > --- a/src/backend/access/heap/vacuumlazy.c > +++ b/src/backend/access/heap/vacuumlazy.c > @@ -248,6 +248,25 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, > if (params->options & VACOPT_DISABLE_PAGE_SKIPPING) > aggressive = true; > > + /* > + * When running an anti-wraparound vacuum, we expect relfrozenxid to be > + * old enough so as aggressive is always set. If this is not the case, > + * it could be possible that another concurrent vacuum process has done > + * the work for this relation so that relfrozenxid in relcache has > + * already been moved forward enough, causing this vacuum run to be > + * non-aggressive. If that happens, note that this relation no longer > + * needs to be vacuumed, so just skip it. > + */ > + if (params->is_wraparound && !aggressive) > + { > + ereport(LOG, > + (errmsg_internal("found vacuum to prevent wraparound of table \"%s.%s.%s\" to be not aggressive, so skipping", > + get_database_name(MyDatabaseId), > + get_namespace_name(RelationGetNamespace(onerel)), > + RelationGetRelationName(onerel)))); > + return; > + } > + Which valid scenario can lead to this? Neither the comment, nor commit message explain it. Unless you're thinking of scenarios where autovacuum and manual vacuum are mixed, I don't really see valid reasons? Normally autovacuum's locking + the table_recheck_autovac() check should prevent problematic scenarios. I do see a few scenarios that can trigger this - but they all more or less are bugs. It doesn't strike me as a good idea to work around such bugs by silently neutering heap_vacuum_rel(). The likelihood of that temporarily covering up more severe problems seems significant - they're likely to then later bite you with a cluster shutdown. Greetings, Andres Freund
Re: pgsql: Improve autovacuum logging for aggressive andanti-wraparound ru
From
Michael Paquier
Date:
On Mon, Mar 23, 2020 at 06:41:50PM -0700, Andres Freund wrote: > Which valid scenario can lead to this? Neither the comment, nor commit > message explain it. The commit message mentions that concurrent autovacuum jobs can lead to the creation of non-aggressive and anti-wraparound jobs, which have no sense because an aggressive and anti-wraparound job was already done in parallel with a different worker, and that this was possible because of inconsistent relcache lookups across concurrent jobs. This was mentioned upthread. > Unless you're thinking of scenarios where autovacuum > and manual vacuum are mixed, I don't really see valid reasons? Normally > autovacuum's locking + the table_recheck_autovac() check should prevent > problematic scenarios. > > I do see a few scenarios that can trigger this - but they all more or > less are bugs. Hmm. OK. > It doesn't strike me as a good idea to work around such bugs by silently > neutering heap_vacuum_rel(). The likelihood of that temporarily covering > up more severe problems seems significant - they're likely to then later > bite you with a cluster shutdown. Saying that, I have been thinking about this one for a couple of days now and it seems to me that this is a factor contributing to what we are seeing in [1], and I agree that this is just an incorrect approach that makes easier to trigger the real underlying issues, while table_recheck_autovac() ought to be the only code path doing the skip job. Note that I have failed to reproduce the behavior of the other thread though, always finishing with a non-aggressive anti-wraparound skipped because of an aggressive and anti-wraparound job happened just before in parallel, and autovacuum was always able to continue triggering new jobs, keeping the relfrozenxid age at bay. So I would like to first revert that part, to have a cleaner state to work on the underlying issues. A pure revert means also adding back the log message for non-aggressive and anti-wraparound jobs that should never exist, which should be replaced by an assertion once all the holes are fixed. What do you think? [1]: https://www.postgresql.org/message-id/CAE39h23RTX1jkYjWc5tccv34HwwraizaCUxOmdQdPM+Zt5-2Qg@mail.gmail.com -- Michael