Re: Berserk Autovacuum (let's save next Mandrill) - Mailing list pgsql-hackers

From: Darafei "Komяpa" Praliaskouski
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date:
Msg-id: CAC8Q8tLiFPDVU3X35WMSzysTU3NCpT7F9_=a3E55ucmaK27njQ@mail.gmail.com
In response to: Re: Berserk Autovacuum (let's save next Mandrill) (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses: Re: Berserk Autovacuum (let's save next Mandrill)
List: pgsql-hackers
On Fri, Mar 13, 2020 at 3:19 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Fri, 2020-03-13 at 09:10 +1300, David Rowley wrote:
> > So you're suggesting we drive the insert-vacuums from existing
> > scale_factor and threshold?  What about the 1 billion row table
> > example above?
>
> I am still not 100% certain if that is really realistic.
> Transactions that insert only a single row are probably the
> exception in large insert-only tables.
>
> But I think that we probably always can find a case where any given
> parameter setting is not so great, so in order to get ahead
> let's decide on something that is not outright stupid.
> Changing the defaults later is always an option.
>
> So the three options are:
>
> 1. introduce no new parameters and trigger autovacuum if the number
>    of inserts exceeds the regular vacuum threshold.
>
> 2. introduce the new parameters with high base threshold and zero scale factor.

Both of these look good to me. 1 is the approach in my initial patch
sketch, 2 is the approach taken by Laurenz.
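For reference, a rough sketch of where option 1's trigger point lands
for a given table, reusing the existing formula from the docs (the
table name is just a placeholder):

    -- dead-tuple trigger point that option 1 would also apply to inserts
    SELECT current_setting('autovacuum_vacuum_threshold')::bigint
           + current_setting('autovacuum_vacuum_scale_factor')::float8
             * c.reltuples AS trigger_point
    FROM pg_class c
    WHERE c.relname = 'my_table';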
The values I think in when considering vacuum are "how many megabytes
of the table aren't frozen/visible" (since that's what translates into
processing time, given the I/O limits of the storage) and "how many
pages aren't vacuumed yet".
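
A sketch of how to eyeball the first number (assuming the default 8 kB
block size; 'my_table' is a placeholder):

    -- megabytes of the table not yet marked all-visible by vacuum
    SELECT (relpages - relallvisible)::bigint * 8192 / (1024 * 1024)
           AS mb_not_all_visible
    FROM pg_class
    WHERE relname = 'my_table';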

The threshold in Laurenz's patch was good enough for my taste - it's
basically "vacuum after every gigabyte", and that's exactly what we
implemented when working around this issue manually. There's a good
chance that the latest gigabyte is still in RAM, so vacuum will be
super fast on it; reading a gigabyte of data is not a showstopper in
most contemporary physical and cloud environments I can think of. If
reading a gigabyte is already a problem, then wraparound is a
guaranteed disaster.
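
A sketch of the shape such a manual workaround can take - the watermark
table and the 10M-row cutoff (roughly a gigabyte of narrow rows) are
illustrative, not our exact script, and updating the watermark after
each run is left out:

    -- remember the insert counter at the time of the last manual vacuum
    CREATE TABLE IF NOT EXISTS vacuum_watermark (
        relid               oid PRIMARY KEY,
        n_tup_ins_at_vacuum bigint NOT NULL
    );

    -- emit VACUUM for every table that gained ~10M rows since then
    SELECT format('VACUUM %I.%I;', s.schemaname, s.relname)
    FROM pg_stat_user_tables s
    LEFT JOIN vacuum_watermark w ON w.relid = s.relid
    WHERE s.n_tup_ins - coalesce(w.n_tup_ins_at_vacuum, 0) > 10000000;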

Regarding index-only scans, this threshold seems good enough too.
There's a good chance the last gigabyte is already in RAM, and the
earlier data was processed by the previous vacuum. Anyway - with this
patch, Index Only Scan actually starts working :)

I'd vote for 2, with a note to "rip it all out later and redesign the
scale factor and threshold system into something more easily
graspable". Whoever needs to disable the new behavior for some reason
will have a knob then, and the patch is already laid out.
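
For instance, opting a single table out could look like this (assuming
the reloption mirrors the GUC name in Laurenz's patch and that -1 means
"disabled" - both are assumptions on my side, and the table name is a
placeholder):

    -- hypothetical per-table opt-out of insert-triggered autovacuum
    ALTER TABLE append_only_log
        SET (autovacuum_vacuum_insert_threshold = -1);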

> 3. introduce the new parameters with low base threshold and high scale factor.

This looks bad to me. "The bigger the table, the longer we wait" is
exactly the wrong property for something designed to prevent issues
with big tables.
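
Putting illustrative numbers on it (a low threshold of 1000 and a high
scale factor of 0.2, picked only for the sake of the example):

    -- a 1-billion-row table accumulates ~200M unvacuumed inserts
    -- before autovacuum wakes up for it
    SELECT 1000 + 0.2 * 1000000000 AS inserts_before_autovacuum;
    -- => 200001000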

> I think all three are viable.
> If nobody else wants to weigh in, throw a coin.
>
> Yours,
> Laurenz Albe
>


-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


