Re: Proposal: Log inability to lock pages during vacuum - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Proposal: Log inability to lock pages during vacuum
Date
Msg-id 54861816.3020502@BlueTreble.com
In response to Re: Proposal: Log inability to lock pages during vacuum  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Proposal: Log inability to lock pages during vacuum  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 12/7/14, 6:16 PM, Simon Riggs wrote:
> On 20 October 2014 at 10:57, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>
>> Currently, a non-freeze vacuum will punt on any page it can't get a cleanup
>> lock on, with no retry. Presumably this should be a rare occurrence, but I
>> think it's bad that we just assume that and won't warn the user if something
>> bad is going on.
>
> (I'm having email problems, so I can't see later mails on this thread,
> so replying here.)
>
> Logging patch looks fine, but I would rather not add a line of text
> for each VACUUM, just in case this is non-zero. I think we should add
> that log line only if the blocks skipped > 0.

I thought about doing that, but I'm loath to duplicate a rather large ereport call. Happy to make the change if that's the consensus though.

> What I'm more interested in is what you plan to do with the
> information once we get it?
>
> The assumption that skipping blocks is something bad is strange. I
> added it because VACUUM could and did regularly hang on busy tables,
> which resulted in bloat because other blocks that needed cleaning
> didn't get any attention.
>
> Which is better, spend time obsessively trying to vacuum particular
> blocks, or to spend the time on other blocks that are in need of
> cleaning and are available to be cleaned?
>
> Which is better, have autovacuum or system wide vacuum progress on to
> other tables that need cleaning, or spend lots of effort retrying?
>
> How do we know what is the best next action?
>
> I'd really want to see some analysis of those things before we spend
> even more cycles on this.

That's the entire point of logging this information. There is an underlying assumption that we won't actually skip many pages, but there's no data to back that up, nor is there currently any way to get that data.

My hope is that the logging shows that there isn't anything more that needs to be done here. If this is something that causes problems, at least now DBAs will be aware of it and hopefully we'll be able to identify specific problem scenarios and find a solution.



BTW, my initial proposal[1] was strictly logging. The only difference was raising it to a warning if a significant portion of the table was skipped. I only investigated retrying locks at the suggestion of others. I never intended this to become a big time sink.

[1]:
"Currently, a non-freeze vacuum will punt on any page it can't get a cleanup lock on, with no retry. Presumably this should be a rare occurrence, but I think it's bad that we just assume that and won't warn the user if something bad is going on.

"My thought is that if we skip any pages elog(LOG) how many we skipped. If we skip more than 1% of the pages we visited (not relpages) then elog(WARNING) instead."
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


