Re: pg_stat_io for the startup process - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: pg_stat_io for the startup process
Date
Msg-id 20230508210638.qv6k7gui6mnfoxuu@liskov
Whole thread Raw
In response to Re: pg_stat_io for the startup process  (Melih Mutlu <m.melihmutlu@gmail.com>)
List pgsql-hackers
On Wed, May 03, 2023 at 04:11:33PM +0300, Melih Mutlu wrote:
> Andres Freund <andres@anarazel.de>, 27 Nis 2023 Per, 19:27 tarihinde şunu yazdı:
> >  #ifdef WAL_DEBUG
> > >                       if (XLOG_DEBUG ||
> > >                               (record->xl_rmid == RM_XACT_ID &&
> > trace_recovery_messages <= DEBUG2) ||
> > > @@ -3617,6 +3621,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr,
> > bool randAccess,
> > >                                               /* Do background tasks
> > that might benefit us later. */
> > >
> >  KnownAssignedTransactionIdsIdleMaintenance();
> > >
> > > +                                             /*
> > > +                                              * Need to disable flush
> > timeout to avoid unnecessary
> > > +                                              * wakeups. Enable it
> > again after a WAL record is read
> > > +                                              * in PerformWalRecovery.
> > > +                                              */
> > > +
> >  disable_startup_stat_flush_timeout();
> > > +
> > >                                               (void)
> > WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch,
> > >
> >       WL_LATCH_SET | WL_TIMEOUT |
> > >
> >       WL_EXIT_ON_PM_DEATH,
> >
> > I think always disabling the timer here isn't quite right - we want to
> > wake up
> > *once* in WaitForWALToBecomeAvailable(), otherwise we'll not submit pending
> > stats before waiting - potentially for a long time - for WAL. One way
> > would be
> > just explicitly report before the WaitLatch().
> >
> > Another approach, I think better, would be to not use
> > enable_timeout_every(),
> > and to explicitly rearm the timer in HandleStartupProcInterrupts(). When
> > called from WaitForWALToBecomeAvailable(), we'd not rearm, and instead do
> > so
> > at the end of WaitForWALToBecomeAvailable().  That way we also wouldn't
> > repeatedly fire between calls to HandleStartupProcInterrupts().
> >
> 
> Attached patch is probably not doing what you asked. IIUC
> HandleStartupProcInterrupts should arm the timer but also shouldn't arm it
> if it's called from  WaitForWALToBecomeAvailable. But the timer should be
> armed again at the very end of WaitForWALToBecomeAvailable. I've been
> thinking about how to do this properly. Should HandleStartupProcInterrupts
> take a parameter to decide whether the timer needs to be armed? Or need to
> add an additional global flag to rearm the timer? Any thoughts?

I had the same question about how to determine whether or not to rearm.

> From 9be7360e49db424c45c53e85efe8a4f5359b5244 Mon Sep 17 00:00:00 2001
> From: Melih Mutlu <m.melihmutlu@gmail.com>
> Date: Wed, 26 Apr 2023 18:21:32 +0300
> Subject: [PATCH v2] Add timeout to flush stats during startup's main replay
>  loop
> diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
> index efc2580536..842394bc8f 100644
> --- a/src/backend/postmaster/startup.c
> +++ b/src/backend/postmaster/startup.c
> @@ -72,6 +72,9 @@ static TimestampTz startup_progress_phase_start_time;
>   */
>  static volatile sig_atomic_t startup_progress_timer_expired = false;
>  
> +/* Indicates whether flushing stats is needed. */
> +static volatile sig_atomic_t idle_stats_update_pending = false;
> +
>  /*
>   * Time between progress updates for long-running startup operations.
>   */
> @@ -206,6 +209,18 @@ HandleStartupProcInterrupts(void)
>      /* Perform logging of memory contexts of this process */
>      if (LogMemoryContextPending)
>          ProcessLogMemoryContextInterrupt();
> +
> +    if (idle_stats_update_pending)
> +    {
> +        /* It's time to report wal stats. */

If we want dbstats to be updated, we'll probably have to call
pgstat_report_stat() here and fix the timestamp issue Horiguchi-san
points out upthread. Perhaps you could review those changes and consider
adding those as preliminary patches before this in a set.

I think you will then need to handle the issue he mentions with partial
flushes coming from pgstat_report_stat() and remembering to try and
flush stats again in case of a partial flush. Though, perhaps we can
just pass force=true.

> +        pgstat_report_wal(true);
> +        idle_stats_update_pending = false;
> +    }

Good that you thought to check if the timeout was already active.

> +    else if (!get_timeout_active(IDLE_STATS_UPDATE_TIMEOUT))
> +    {
> +        /* Set the next timeout. */
> +        enable_idle_stats_update_timeout();
> +    }
>  }
>  
>  
> @@ -385,3 +400,22 @@ has_startup_progress_timeout_expired(long *secs, int *usecs)
>  
>      return true;
>  }
> +
> +/* Set a flag indicating that it's time to flush wal stats. */
> +void
> +idle_stats_update_timeout_handler(void)
> +{
> +    idle_stats_update_pending = true;

Is WakeupRecovery() needed when the timer goes off and the startup
process is waiting on a latch? Does this happen in
WaitForWalToBecomeAvailable()?

> +    WakeupRecovery();
> +}
> +
> +/* Enable the timeout set for wal stat flush. */
> +void
> +enable_idle_stats_update_timeout(void)
> +{
> +    TimestampTz fin_time;
> +
> +    fin_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
> +                                           PGSTAT_MIN_INTERVAL);

It is a shame we have to end up calling GetCurrentTimestamp() since we
are using enable_timeout_at(). Couldn't we use enable_timeout_after()?

> +    enable_timeout_at(IDLE_STATS_UPDATE_TIMEOUT, fin_time);
> +}

- Melanie



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: base backup vs. concurrent truncation
Next
From: Andres Freund
Date:
Subject: Re: pg_stat_io for the startup process