Re: pg_stat_io for the startup process - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: pg_stat_io for the startup process |
Date | |
Msg-id | 20230508210638.qv6k7gui6mnfoxuu@liskov Whole thread Raw |
In response to | Re: pg_stat_io for the startup process (Melih Mutlu <m.melihmutlu@gmail.com>) |
List | pgsql-hackers |
On Wed, May 03, 2023 at 04:11:33PM +0300, Melih Mutlu wrote: > Andres Freund <andres@anarazel.de>, 27 Nis 2023 Per, 19:27 tarihinde şunu yazdı: > > #ifdef WAL_DEBUG > > > if (XLOG_DEBUG || > > > (record->xl_rmid == RM_XACT_ID && > > trace_recovery_messages <= DEBUG2) || > > > @@ -3617,6 +3621,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, > > bool randAccess, > > > /* Do background tasks > > that might benefit us later. */ > > > > > KnownAssignedTransactionIdsIdleMaintenance(); > > > > > > + /* > > > + * Need to disable flush > > timeout to avoid unnecessary > > > + * wakeups. Enable it > > again after a WAL record is read > > > + * in PerformWalRecovery. > > > + */ > > > + > > disable_startup_stat_flush_timeout(); > > > + > > > (void) > > WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch, > > > > > WL_LATCH_SET | WL_TIMEOUT | > > > > > WL_EXIT_ON_PM_DEATH, > > > > I think always disabling the timer here isn't quite right - we want to > > wake up > > *once* in WaitForWALToBecomeAvailable(), otherwise we'll not submit pending > > stats before waiting - potentially for a long time - for WAL. One way > > would be > > just explicitly report before the WaitLatch(). > > > > Another approach, I think better, would be to not use > > enable_timeout_every(), > > and to explicitly rearm the timer in HandleStartupProcInterrupts(). When > > called from WaitForWALToBecomeAvailable(), we'd not rearm, and instead do > > so > > at the end of WaitForWALToBecomeAvailable(). That way we also wouldn't > > repeatedly fire between calls to HandleStartupProcInterrupts(). > > > > Attached patch is probably not doing what you asked. IIUC > HandleStartupProcInterrupts should arm the timer but also shouldn't arm it > if it's called from WaitForWALToBecomeAvailable. But the timer should be > armed again at the very end of WaitForWALToBecomeAvailable. I've been > thinking about how to do this properly. Should HandleStartupProcInterrupts > take a parameter to decide whether the timer needs to be armed? Or need to > add an additional global flag to rearm the timer? Any thoughts? I had the same question about how to determine whether or not to rearm. > From 9be7360e49db424c45c53e85efe8a4f5359b5244 Mon Sep 17 00:00:00 2001 > From: Melih Mutlu <m.melihmutlu@gmail.com> > Date: Wed, 26 Apr 2023 18:21:32 +0300 > Subject: [PATCH v2] Add timeout to flush stats during startup's main replay > loop > diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c > index efc2580536..842394bc8f 100644 > --- a/src/backend/postmaster/startup.c > +++ b/src/backend/postmaster/startup.c > @@ -72,6 +72,9 @@ static TimestampTz startup_progress_phase_start_time; > */ > static volatile sig_atomic_t startup_progress_timer_expired = false; > > +/* Indicates whether flushing stats is needed. */ > +static volatile sig_atomic_t idle_stats_update_pending = false; > + > /* > * Time between progress updates for long-running startup operations. > */ > @@ -206,6 +209,18 @@ HandleStartupProcInterrupts(void) > /* Perform logging of memory contexts of this process */ > if (LogMemoryContextPending) > ProcessLogMemoryContextInterrupt(); > + > + if (idle_stats_update_pending) > + { > + /* It's time to report wal stats. */ If we want dbstats to be updated, we'll probably have to call pgstat_report_stat() here and fix the timestamp issue Horiguchi-san points out upthread. Perhaps you could review those changes and consider adding those as preliminary patches before this in a set. I think you will then need to handle the issue he mentions with partial flushes coming from pgstat_report_stat() and remembering to try and flush stats again in case of a partial flush. Though, perhaps we can just pass force=true. > + pgstat_report_wal(true); > + idle_stats_update_pending = false; > + } Good that you thought to check if the timeout was already active. > + else if (!get_timeout_active(IDLE_STATS_UPDATE_TIMEOUT)) > + { > + /* Set the next timeout. */ > + enable_idle_stats_update_timeout(); > + } > } > > > @@ -385,3 +400,22 @@ has_startup_progress_timeout_expired(long *secs, int *usecs) > > return true; > } > + > +/* Set a flag indicating that it's time to flush wal stats. */ > +void > +idle_stats_update_timeout_handler(void) > +{ > + idle_stats_update_pending = true; Is WakeupRecovery() needed when the timer goes off and the startup process is waiting on a latch? Does this happen in WaitForWalToBecomeAvailable()? > + WakeupRecovery(); > +} > + > +/* Enable the timeout set for wal stat flush. */ > +void > +enable_idle_stats_update_timeout(void) > +{ > + TimestampTz fin_time; > + > + fin_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), > + PGSTAT_MIN_INTERVAL); It is a shame we have to end up calling GetCurrentTimestamp() since we are using enable_timeout_at(). Couldn't we use enable_timeout_after()? > + enable_timeout_at(IDLE_STATS_UPDATE_TIMEOUT, fin_time); > +} - Melanie
pgsql-hackers by date: