Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
Date
Msg-id ZBYDTrD1kyGg+HkS@telsasoft.com
Whole thread Raw
In response to Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers
> Subject: [PATCH v4 3/3] add vacuum[db] option to specify ring_size and guc

> +        Specifies the ring buffer size to be used for a given invocation of
> +        <command>VACUUM</command> or instance of autovacuum. This size is
> +        converted to a number of shared buffers which will be reused as part of

I'd say "specifies the size of shared_buffers to be reused as .."

> +        a <literal>Buffer Access Strategy</literal>. <literal>0</literal> will
> +        disable use of a <literal>Buffer Access Strategy</literal>.
> +        <literal>-1</literal> will set the size to a default of <literal>256
> +        kB</literal>. The maximum ring buffer size is <literal>16 GB</literal>.
> +        Though you may set <varname>vacuum_buffer_usage_limit</varname> below
> +        <literal>128 kB</literal>, it will be clamped to <literal>128
> +        kB</literal> at runtime. The default value is <literal>-1</literal>.
> +        This parameter can be set at any time.

GUC docs usually also say something like 
"If this value is specified without units, it is taken as .."

> +      is used to calculate a number of shared buffers which will be reused as

*the* number?

> +      <command>VACUUM</command>. The analyze stage and parallel vacuum workers
> +      do not use this size.

I think what you mean is that vacuum's heap scan stage uses the
strategy, but the index scan/cleanup phases doesn't?

> +        The size in kB of the ring buffer used for vacuuming. This size is used
> +        to calculate a number of shared buffers which will be reused as part of

*the* number

> +++ b/doc/src/sgml/ref/vacuumdb.sgml

The docs here duplicate the sql-vacuum docs.  It seems better to refer
to the vacuum page for details, like --parallel does.


Unrelated: it would be nice if the client-side options were documented
separately from the server-side options.  Especially due to --jobs and
--parallel.

> +            if (!parse_int(vac_buffer_size, &result, GUC_UNIT_KB, NULL))
> +            {
> +                ereport(ERROR,
> +                        (errcode(ERRCODE_SYNTAX_ERROR),
> +                            errmsg("buffer_usage_limit for a vacuum must be between -1 and %d. %s is invalid.",
> +                                    MAX_BAS_RING_SIZE_KB, vac_buffer_size),
> +                            parser_errposition(pstate, opt->location)));
> +            }
> +
> +            /* check for out-of-bounds */
> +            if (result < -1 || result > MAX_BAS_RING_SIZE_KB)
> +            {
> +                ereport(ERROR,
> +                        (errcode(ERRCODE_SYNTAX_ERROR),
> +                            errmsg("buffer_usage_limit for a vacuum must be between -1 and %d",
> +                                    MAX_BAS_RING_SIZE_KB),
> +                            parser_errposition(pstate, opt->location)));
> +            }

I think these checks could be collapsed into a single ereport().

if !parse_int() || (result < -1 || result > MAX_BAS_RINGSIZE_KB):
    ereport(ERROR,
        errcode(ERRCODE_SYNTAX_ERROR),
        errmsg("buffer_usage_limit for a vacuum must be an integer between -1 and %d",
            MAX_BAS_RING_SIZE_KB),

There was a recent, similar, and unrelated suggestion here:
https://www.postgresql.org/message-id/20230314.135859.260879647537075548.horikyota.ntt%40gmail.com

> +#vacuum_buffer_usage_limit = -1 # size of vacuum buffer access strategy ring.
> +                # -1 to use default,
> +                # 0 to disable vacuum buffer access strategy and use shared buffers

I think it's confusing to say "and use shared buffers", since
"strategies" also use shared_buffers.  It seems better to remove those 4
words.

> @@ -550,6 +563,13 @@ vacuum_one_database(ConnParams *cparams,
>          pg_fatal("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
>                   "--parallel", "13");
>  
> +    // TODO: this is a problem: if the user specifies this option with -1 in a
> +    // version before 16, it will not produce an error message. it also won't
> +    // do anything, but that still doesn't seem right.

Actually, that seems fine to me.  If someone installs v16 vacuumdb, they
can run it against old servers and specify the option as -1 without it
failing with an error.  I don't know if anyone will find that useful,
but it doesn't seem unreasonable.

I still think adding something to the glossary would be good.

Buffer Access Strategy:
A circular/ring buffer used for reading or writing data pages from/to
the operating system.  Ring buffers are used for sequential scans of
large tables, VACUUM, COPY FROM, CREATE TABLE AS SELECT, ALTER TABLE,
and CLUSTER.  By using only a limited portion of >shared_buffers<, the
ring buffer avoids avoids evicting large amounts of data whenever a
backend performs bulk I/O operations.  Use of a ring buffer also forces
the backend to write out its own dirty pages, rather than leaving them
behind to be cleaned up by other backends.

If there's a larger section added than a glossary entry, the text could
be promoted from src/backend/storage/buffer/README to doc/.

-- 
Justin



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: generate_series for timestamptz and time zone problem
Next
From: Joseph Koshakow
Date:
Subject: Re: Infinite Interval