Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
Date
Msg-id 20230406172056.gnp7ohxs4bykrn5z@liskov
Whole thread Raw
In response to Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
List pgsql-hackers
On Thu, Apr 06, 2023 at 11:34:44PM +1200, David Rowley wrote:
>  second On Thu, 6 Apr 2023 at 14:14, Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> >
> > Attached is v14 which adds back in tests for the BUFFER_USAGE_LIMIT
> > option.
> 
> I've spent quite a bit of time looking at this since you sent it. I've
> also made quite a few changes, mostly cosmetic ones, but there are a
> few things below which are more fundamental.
> 
> 1. I don't really think we need to support VACUUM (BUFFER_USAGE_LIMIT
> -1);  It's just the same as VACUUM;  Removing that makes the
> documentation more simple.

Agreed.
 
> 2. I don't think we really need to allow vacuum_buffer_usage_limit =
> -1.  I think we can just set this to 256 and leave it.  If we allow -1
> then we need to document what -1 means. The more I think about it, the
> more strange it seems to allow -1. I can't quite imagine work_mem = -1
> means 4MB. Why 4MB?

Agreed.
 
> 3. There was a bug in GetAccessStrategyBufferCount() (I renamed it to
> that from StrategyGetBufferCount()) that didn't handle NULL input. The
> problem was that if you set vacuum_buffer_usage_limit = 0 then did a
> parallel vacuum that GetAccessStrategyWithSize() would return NULL due
> to the 0 buffer input, but GetAccessStrategyBufferCount() couldn't
> handle NULL.  I've adjusted GetAccessStrategyBufferCount() just to
> return 0 on NULL input.

Good catch.
 
> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index bcc49aec45..c421da348d 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -2001,6 +2001,35 @@ include_dir 'conf.d'
>        </listitem>
>       </varlistentry>
>  
> +     <varlistentry id="guc-vacuum-buffer-usage-limit" xreflabel="vacuum_buffer_usage_limit">
> +      <term>
> +       <varname>vacuum_buffer_usage_limit</varname> (<type>integer</type>)
> +       <indexterm>
> +        <primary><varname>vacuum_buffer_usage_limit</varname> configuration parameter</primary>
> +       </indexterm>
> +      </term>
> +      <listitem>
> +       <para>
> +        Specifies the size of <varname>shared_buffers</varname> to be reused
> +        for each backend participating in a given invocation of
> +        <command>VACUUM</command> or <command>ANALYZE</command> or in
> +        autovacuum.  This size is converted to the number of shared buffers
> +        which will be reused as part of a
> +        <glossterm linkend="glossary-buffer-access-strategy">Buffer Access Strategy</glossterm>.
> +        A setting of <literal>0</literal> will allow the operation to use any
> +        number of <varname>shared_buffers</varname>.  The maximum size is
> +        <literal>16 GB</literal> and the minimum size is
> +        <literal>128 KB</literal>.  If the specified size would exceed 1/8 the
> +        size of <varname>shared_buffers</varname>, it is silently capped to
> +        that value.  The default value is <literal>-1</literal>.  If this

This still says that the default value is -1.

> diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
> index b6d30b5764..0b02d9faef 100644
> --- a/doc/src/sgml/ref/vacuum.sgml
> +++ b/doc/src/sgml/ref/vacuum.sgml
> @@ -345,6 +346,24 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
>      </listitem>
>     </varlistentry>
>  
> +   <varlistentry>
> +    <term><literal>BUFFER_USAGE_LIMIT</literal></term>
> +    <listitem>
> +     <para>
> +      Specifies the
> +      <glossterm linkend="glossary-buffer-access-strategy">Buffer Access Strategy</glossterm>
> +      ring buffer size for <command>VACUUM</command>.  This size is used to
> +      calculate the number of shared buffers which will be reused as part of
> +      this strategy.  <literal>0</literal> disables use of a
> +      <literal>Buffer Access Strategy</literal>.  If <option>ANALYZE</option>
> +      is also specified, the <option>BUFFER_USAGE_LIMIT</option> value is used
> +      for both the vacuum and analyze stages.  This option can't be used with
> +      the <option>FULL</option> option except if <option>ANALYZE</option> is
> +      also specified.
> +     </para>
> +    </listitem>
> +   </varlistentry>

I noticed you seemed to have removed the reference to the GUC
vacuum_buffer_usage_limit here. Was that intentional?
We may not need to mention "falling back" as I did before, however, the
GUC docs mention max/min values and such, which might be useful.

> diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
> index ea1d8960f4..e92738c7f0 100644
> --- a/src/backend/commands/vacuum.c
> +++ b/src/backend/commands/vacuum.c
> @@ -56,6 +56,7 @@
>  #include "utils/acl.h"
>  #include "utils/fmgroids.h"
>  #include "utils/guc.h"
> +#include "utils/guc_hooks.h"
>  #include "utils/memutils.h"
>  #include "utils/pg_rusage.h"
>  #include "utils/snapmgr.h"
> @@ -95,6 +96,30 @@ static VacOptValue get_vacoptval_from_boolean(DefElem *def);
>  static bool vac_tid_reaped(ItemPointer itemptr, void *state);
>  static int    vac_cmp_itemptr(const void *left, const void *right);
>  
> +/*
> + * GUC check function to ensure GUC value specified is within the allowable
> + * range.
> + */
> +bool
> +check_vacuum_buffer_usage_limit(int *newval, void **extra,
> +                                GucSource source)
> +{
> +    /* Allow specifying the default or disabling Buffer Access Strategy */
> +    if (*newval == -1 || *newval == 0)
> +        return true;

This should not check for -1 since that isn't the default anymore.
It should only need to check for 0 I think?

> +    /* Value upper and lower hard limits are inclusive */
> +    if (*newval >= MIN_BAS_VAC_RING_SIZE_KB &&
> +        *newval <= MAX_BAS_VAC_RING_SIZE_KB)
> +        return true;
> +
> +    /* Value does not fall within any allowable range */
> +    GUC_check_errdetail("\"vacuum_buffer_usage_limit\" must be -1, 0 or between %d KB and %d KB",
> +                        MIN_BAS_VAC_RING_SIZE_KB, MAX_BAS_VAC_RING_SIZE_KB);

Also remove -1 here.

>   * Primary entry point for manual VACUUM and ANALYZE commands
>   *
> @@ -114,6 +139,8 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
>      bool        disable_page_skipping = false;
>      bool        process_main = true;
>      bool        process_toast = true;
> +    /* by default use buffer access strategy with default size */
> +    int            ring_size = -1;

We need to update this comment to something like, "use an invalid value
for ring_size" so it is clear whether or not the BUFFER_USAGE_LIMIT was
specified when making the access strategy later". Also, I think just
removing the comment would be okay, because this is the normal behavior
for initializing values, I think.

> @@ -240,6 +309,17 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
>                  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                   errmsg("VACUUM FULL cannot be performed in parallel")));
>  
> +    /*
> +     * BUFFER_USAGE_LIMIT does nothing for VACUUM (FULL) so just raise an
> +     * ERROR for that case.  VACUUM (FULL, ANALYZE) does make use of it, so
> +     * we'll permit that.
> +     */
> +    if ((params.options & VACOPT_FULL) && !(params.options & VACOPT_ANALYZE) &&
> +        ring_size > -1)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                 errmsg("BUFFER_USAGE_LIMIT cannot be specified for VACUUM FULL")));
> +
>      /*
>       * Make sure VACOPT_ANALYZE is specified if any column lists are present.
>       */
> @@ -341,7 +421,20 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
>  
>          MemoryContext old_context = MemoryContextSwitchTo(vac_context);
>  
> -        bstrategy = GetAccessStrategy(BAS_VACUUM);

Is it worth moving this assert up above when we do the "sanity checking"
for VACUUM FULL with BUFFER_USAGE_LIMIT?

> +        Assert(ring_size >= -1);

> +        /*
> +         * If BUFFER_USAGE_LIMIT was specified by the VACUUM command, that
> +         * overrides the value of VacuumBufferUsageLimit.  Otherwise, use
> +         * VacuumBufferUsageLimit to define the size, which might be 0.  We
> +         * expliot that calling GetAccessStrategyWithSize with a zero size

s/expliot/exploit

I might rephrase the last sentence(s). Since it overrides it, I think it
is clear that if it is not specified, then the thing it overrides is
used. Then you could phrase the whole thing like:

 "If BUFFER_USAGE_LIMIT was specified by the VACUUM or ANALYZE command,
 it overrides the value of VacuumBufferUsageLimit.  Either value may be
 0, in which case GetAccessStrategyWithSize() will return NULL, which is
 the expected behavior."

> +         * returns NULL.
> +         */
> +        if (ring_size > -1)
> +            bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, ring_size);
> +        else
> +            bstrategy = GetAccessStrategyWithSize(BAS_VACUUM, VacuumBufferUsageLimit);
> +
>          MemoryContextSwitchTo(old_context);
>      }

> diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
> index c1e911b1b3..1b5f779384 100644
> --- a/src/backend/postmaster/autovacuum.c
> +++ b/src/backend/postmaster/autovacuum.c
> @@ -2287,11 +2287,21 @@ do_autovacuum(void)
>      /*
> -     * Create a buffer access strategy object for VACUUM to use.  We want to
> -     * use the same one across all the vacuum operations we perform, since the
> -     * point is for VACUUM not to blow out the shared cache.
> +     * Optionally, create a buffer access strategy object for VACUUM to use.
> +     * When creating one, we want the same one across all tables being
> +     * vacuumed this helps prevent autovacuum from blowing out shared buffers.

"When creating one" is a bit awkward. I would say something like "Use
the same BufferAccessStrategy object for all tables VACUUMed by this
worker to prevent autovacuum from blowing out shared buffers."

> diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
> index f122709fbe..710b05cbc5 100644
> --- a/src/backend/storage/buffer/freelist.c
> +++ b/src/backend/storage/buffer/freelist.c
> +/*
> + * GetAccessStrategyWithSize -- create a BufferAccessStrategy object with a
> + *        number of buffers equivalent to the passed in size.
> + *
> + * If the given ring size is 0, no BufferAccessStrategy will be created and
> + * the function will return NULL.  The ring size may not be negative.
> + */
> +BufferAccessStrategy
> +GetAccessStrategyWithSize(BufferAccessStrategyType btype, int ring_size_kb)
> +{
> +    int            ring_buffers;
> +    BufferAccessStrategy strategy;
> +
> +    Assert(ring_size_kb >= 0);
> +
> +    /* Figure out how many buffers ring_size_kb is */
> +    ring_buffers = ring_size_kb / (BLCKSZ / 1024);

Is there any BLCKSZ that could end up rounding down to 0 and resulting
in a divide by 0?

> diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
> index 1b1d814254..011ec18015 100644
> --- a/src/backend/utils/init/globals.c
> +++ b/src/backend/utils/init/globals.c
> @@ -139,7 +139,10 @@ int            max_worker_processes = 8;
>  int            max_parallel_workers = 8;
>  int            MaxBackends = 0;
>  
> -int            VacuumCostPageHit = 1;    /* GUC parameters for vacuum */
> +/* GUC parameters for vacuum */
> +int            VacuumBufferUsageLimit = 256;

So, I know we agreed to make it camel cased, but I couldn't help
mentioning the discussion over in [1] in which Sawada-san says:

> In vacuum.c, we use snake case for GUC parameters and camel case for
> other global variables

Our variable doesn't have a corresponding global that is not a GUC, and
the current pattern is hardly consistent. But, I know we are discussing
following this convention, so I thought I would mention it.

> diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
> index 06a86f9ac1..d4f9ff8077 100644
> --- a/src/include/miscadmin.h
> +++ b/src/include/miscadmin.h
> @@ -263,6 +263,18 @@ extern PGDLLIMPORT double hash_mem_multiplier;
>  extern PGDLLIMPORT int maintenance_work_mem;
>  extern PGDLLIMPORT int max_parallel_maintenance_workers;
>  

GUC name mentioned in comment is inconsistent with current GUC name.

> +/*
> + * Upper and lower hard limits for the buffer access strategy ring size
> + * specified by the vacuum_buffer_usage_limit GUC and BUFFER_USAGE_LIMIT
> + * option to VACUUM and ANALYZE.
> + */
> +#define MIN_BAS_VAC_RING_SIZE_KB 128
> +#define MAX_BAS_VAC_RING_SIZE_KB (16 * 1024 * 1024)

> diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
> index a1fad43657..d23e1a8ced 100644
> --- a/src/test/regress/sql/vacuum.sql
> +++ b/src/test/regress/sql/vacuum.sql
> @@ -272,6 +272,18 @@ SELECT t.relfilenode = :toast_filenode AS is_same_toast_filenode
>    FROM pg_class c, pg_class t
>    WHERE c.reltoastrelid = t.oid AND c.relname = 'vac_option_tab';
>  
> +-- BUFFER_USAGE_LIMIT option
> +VACUUM (BUFFER_USAGE_LIMIT '512 kB') vac_option_tab;

Is it worth adding a VACUUM (BUFFER_USAGE_LIMIT 0) vac_option_tab test?

- Melanie

[1] https://www.postgresql.org/message-id/CAD21AoC5aDwARiqsL%2BKwHqnN7phub9AaMkbGkJ9aUCeETx8esw%40mail.gmail.com



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: monitoring usage count distribution
Next
From: Tom Lane
Date:
Subject: Re: psql \watch 2nd argument: iteration count