Re: Trigger more frequent autovacuums of heavy insert tables - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Trigger more frequent autovacuums of heavy insert tables
Date
Msg-id Z8H4Os_Rtl7o5mD2@nathan
Whole thread Raw
In response to Re: Trigger more frequent autovacuums of heavy insert tables  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers
On Wed, Feb 26, 2025 at 04:48:20PM -0500, Melanie Plageman wrote:
> Makes sense. Thanks Robert and Nathan. Attached v11 changes the docs
> wording and is rebased.

0001 LGTM.

>          <para>
> -         Specifies a fraction of the table size to add to
> -         <varname>autovacuum_vacuum_insert_threshold</varname>
> -         when deciding whether to trigger a <command>VACUUM</command>.
> -         The default is <literal>0.2</literal> (20% of table size).
> -         This parameter can only be set in the <filename>postgresql.conf</filename>
> -         file or on the server command line;
> -         but the setting can be overridden for individual tables by
> -         changing table storage parameters.
> +        Specifies a fraction of the active (unfrozen) table size to add to
> +        <varname>autovacuum_vacuum_insert_threshold</varname>
> +        when deciding whether to trigger a <command>VACUUM</command>.
> +        The default is <literal>0.2</literal> (20% of active table size).
> +        This parameter can only be set in the <filename>postgresql.conf</filename>
> +        file or on the server command line;
> +        but the setting can be overridden for individual tables by
> +        changing table storage parameters.
>          </para>

nitpick: There might be an unintentional indentation change here.

I'm wondering about the use of the word "active," too.  While it's
qualified by the "(unfrozen)" after it, I'm worried it might not be
descriptive enough.  For example, I might consider a frozen page that's in
the buffer cache and is being read by queries to be "active."  And it
doesn't seem clear to me that it's referring to unfrozen pages and not
unfrozen tuples.  Perhaps we should say something like "a fraction of the
unfrozen pages in the table to add...".

> +        /*
> +         * If we have data for relallfrozen, calculate the unfrozen percentage
> +         * of the table to modify insert scale factor. This helps us decide
> +         * whether or not to vacuum an insert-heavy table based on the number
> +         * of inserts to the "active" part of the table.
> +         */
> +        if (relpages > 0 && relallfrozen > 0)

So, if we don't have this data, we just use reltuples, which is the
existing behavior and should trigger vacuums less aggressively than if we
_did_ have the data.  That seems like the correct choice to me.

> +            /*
> +             * It could be the stats were updated manually and relallfrozen >
> +             * relpages. Clamp relallfrozen to relpages to avoid nonsensical
> +             * calculations.
> +             */
> +            relallfrozen = Min(relallfrozen, relpages);
> +            pcnt_unfrozen = 1 - ((float4) relallfrozen / relpages);

Makes sense.

-- 
nathan



pgsql-hackers by date:

Previous
From: Matthias van de Meent
Date:
Subject: Re: Incorrect result of bitmap heap scan.
Next
From: Florents Tselai
Date:
Subject: Re: jsonb_strip_nulls with arrays?