Re: Changing the state of data checksums in a running cluster - Mailing list pgsql-hackers

From: Heikki Linnakangas
Subject: Re: Changing the state of data checksums in a running cluster
Msg-id: d4265f6d-ae6a-4a52-999a-092a883d9f46@iki.fi
In response to: Re: Changing the state of data checksums in a running cluster (Daniel Gustafsson <daniel@yesql.se>)
List: pgsql-hackers
It'd be good to print a LOG message when the checksums state changes, to 
have a trail in the log of when checksums were enabled/disabled. 
Something like:

LOG:  enabling checksums was requested, starting checksum calculation
...
LOG:  checksums calculation finished, checksums are now enabled


On 02/04/2026 02:01, Daniel Gustafsson wrote:
> +        if (result == DATACHECKSUMSWORKER_FAILED)
> +        {
> +            /*
> +             * Disable checksums on cluster, because we failed one of the
> +             * databases and this is an all or nothing process.
> +             */
> +            SetDataChecksumsOff();
> +            ereport(ERROR,
> +                    errcode(ERRCODE_INSUFFICIENT_RESOURCES),
> +                    errmsg("data checksums failed to get enabled in all databases, aborting"),
> +                    errhint("The server log might have more information on the cause of the error."));
> +        }

This got me thinking: what happens if the data checksums launcher 
encounters some other error, for example if you SIGTERM it? The system 
is left in 'inprogress-on' state, but because the launcher is gone it 
will never finish, and 'pg_stat_progress_data_checksums' will be empty.

Perhaps launcher_exit() should call SetDataChecksumsOff()?

>         /*
>          * TODO: how to really handle the worker still running when the
>          * launcher exits?
>          */
>         if (DataChecksumsWorkerShmem->worker_running)
>             ereport(LOG,
>                     errmsg("data checksums launcher exiting while worker is still running"));

That TODO should be addressed somehow.

> +        /*
> +         * As of now we only update the block counter for main forks in order
> +         * to not cause too frequent calls. TODO: investigate whether we
> +         * should do it more frequent?
> +         */
> +        if (forkNum == MAIN_FORKNUM)
> +            pgstat_progress_update_param(PROGRESS_DATACHECKSUMS_BLOCKS_DONE,
> +                                         (blknum + 1));

We're updating it for every block in the main fork, but not at all for 
other forks? What a bizarre way of avoiding too frequent calls :-). I 
think you could just call this on every page; 
pgstat_progress_update_param() is supposed to be very fast. For 
comparison, e.g. index build calls it for every tuple.

> +/*
> + * Configuration of conditions which must match when absorbing a procsignal
> + * barrier during data checksum enable/disable operations.  A single function
> + * is used for absorbing all barriers, and the set of conditions to use is
> + * looked up in the checksum_barriers struct.  The struct member for the target
> + * state defines which state the backend must currently be in, and which it
> + * must not be in.
> + *
> + * The reason for this explicit checking is to ensure that processing cannot
> + * be started such that it breaks the assumptions of the state machine.
> + *
> + * MAX_BARRIER_CONDITIONS must match largest number of sets in barrier_eq and
> + * barrier_ne in the below checksum_barriers definition.
> + */
> +#define MAX_BARRIER_CONDITIONS 2
> +typedef struct ChecksumBarrierCondition
> +{
> +    /* The target state of the barrier */
> +    int            target;
> +    /* A set of states in which at least one MUST match the current state */
> +    int            barrier_eq[MAX_BARRIER_CONDITIONS];
> +    /* The number of elements in the barrier_eq set */
> +    int            barrier_eq_sz;
> +    /* A set of states which all MUST NOT match the current state */
> +    int            barrier_ne[MAX_BARRIER_CONDITIONS];
> +    /* The number of elements in the barrier_ne set */
> +    int            barrier_ne_sz;
> +} ChecksumBarrierCondition;
> +
> +static const ChecksumBarrierCondition checksum_barriers[4] =
> +{
> +    /*
> +     * When disabling checksums, either inprogress state is Ok but checksums
> +     * must not be in the enabled state.
> +     */
> +    {
> +        .target = PG_DATA_CHECKSUM_OFF,
> +        .barrier_eq = {PG_DATA_CHECKSUM_INPROGRESS_ON, PG_DATA_CHECKSUM_INPROGRESS_OFF},
> +        .barrier_eq_sz = 2,
> +        .barrier_ne = {PG_DATA_CHECKSUM_VERSION},
> +        .barrier_ne_sz = 1
> +    },
> +    /* When enabling the current state must be inprogress-on */
> +    {
> +        .target = PG_DATA_CHECKSUM_VERSION,
> +        .barrier_eq = {PG_DATA_CHECKSUM_INPROGRESS_ON},
> +        .barrier_eq_sz = 1,
> +        {0}, 0
> +    },
> +
> +    /*
> +     * When moving to inprogress-on the current state cannot enabled, but when
> +     * moving to inprogress-off the current state must be enabled.
> +     */
> +    {
> +        .target = PG_DATA_CHECKSUM_INPROGRESS_ON,
> +        {0}, 0,
> +        .barrier_ne = {PG_DATA_CHECKSUM_VERSION},
> +        .barrier_ne_sz = 1
> +    },
> +    {
> +        .target = PG_DATA_CHECKSUM_INPROGRESS_OFF,
> +        .barrier_eq = {PG_DATA_CHECKSUM_VERSION},
> +        .barrier_eq_sz = 1,
> +        {0}, 0
> +    },
> +};

I find this to still be a pretty complicated and unclear way of 
representing the allowed transitions. There are only 16 possible 
transitions, and only 6 of them are allowed. How about listing the 
allowed ones directly:

/* Allowed transitions: from, to */
{
     /*
      * Disabling checksums: If checksums are currently enabled,
      * disabling must go through the 'inprogress-off' state.
      */
     {PG_DATA_CHECKSUM_VERSION, PG_DATA_CHECKSUM_INPROGRESS_OFF},
     {PG_DATA_CHECKSUM_INPROGRESS_OFF, PG_DATA_CHECKSUM_OFF},

     /*
      * If checksums are in the process of being enabled, but are
      * not yet being verified, we can abort by going back to 'off'
      * state.
      */
     {PG_DATA_CHECKSUM_INPROGRESS_ON, PG_DATA_CHECKSUM_OFF},

     /*
      * Enabling checksums must normally go through the 'inprogress-on'
      * state.
      */
     {PG_DATA_CHECKSUM_OFF, PG_DATA_CHECKSUM_INPROGRESS_ON},
     {PG_DATA_CHECKSUM_INPROGRESS_ON, PG_DATA_CHECKSUM_VERSION},

     /*
      * If checksums are being disabled but all backends are still
      * computing checksums, we can go straight back to 'on'
      */
     {PG_DATA_CHECKSUM_INPROGRESS_OFF, PG_DATA_CHECKSUM_VERSION},
}

> +/*
> + * Signaling between backends calling pg_enable/disable_data_checksums, the
> + * checksums launcher process, and the checksums worker process.
> + *
> + * This struct is protected by DataChecksumsWorkerLock
> + */
> +typedef struct DataChecksumsWorkerShmemStruct
> +{
> +    /*
> +     * These are set by pg_{enable|disable|verify}_data_checksums, to tell the
> +     * launcher what the target state is.
> +     */
> +    DataChecksumsWorkerOperation launch_operation;
> +    int            launch_cost_delay;
> +    int            launch_cost_limit;

The naming feels a little weird with this struct. It's called 
"DataChecksumsWorkerShmemStruct", but it's also accessed by the backends 
calling pg_enable/disable_data_checksums(). And 
"DataChecksumsWorkerOperation" is not accessed by workers at all. Or I 
guess the "operation" global variable is used in 
DataChecksumsWorkerMain(), but it's always set to ENABLE_DATACHECKSUMS 
in the worker. Do you need the "operation" global variable at all?

> +void
> +SetDataChecksumsOn(void)
> +{
> +    uint64        barrier;
> +
> +    Assert(ControlFile != NULL);
> +
> +    SpinLockAcquire(&XLogCtl->info_lck);
> +
> +    /*
> +     * The only allowed state transition to "on" is from "inprogress-on" since
> +     * that state ensures that all pages will have data checksums written.
> +     */
> +    if (XLogCtl->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_ON)
> +    {
> +        SpinLockRelease(&XLogCtl->info_lck);
> +        elog(PANIC, "checksums not in \"inprogress-on\" mode");
> +    }
> +
> +    SpinLockRelease(&XLogCtl->info_lck);

The PANIC seems a little harsh; you haven't done anything destructive 
here. It's unexpected for this to be called in any other state, so this 
is a "can't happen" scenario, but I don't think we usually PANIC on those.

> +       <para>
> +        If <parameter>cost_delay</parameter> and <parameter>cost_limit</parameter> are
> +        specified, the process is throttled using the same principles as
> +        <link linkend="runtime-config-resource-vacuum-cost">Cost-based Vacuum Delay</link>.
> +       </para>

Ugh, yet another place where we expose the "cost delay/limit" throttling 
mechanism. I agree it's good to be consistent here and use the same 
method we use for vacuum; I just wish we had something more user-friendly...


Grammar / spelling:

> + * state will also be set of "off".

> +     * When moving to inprogress-on the current state cannot enabled, but when

> +     * If a worker process currently running?  This is set by the worker

> +     * These are set by pg_{enable|disable|verify}_data_checksums, to tell the

there is no "pg_verify_data_checksums" function.

"calcuated" in commit message

- Heikki



