Re: Fetching timeline during recovery - Mailing list pgsql-hackers
From | Jehan-Guillaume de Rorthais |
---|---|
Subject | Re: Fetching timeline during recovery |
Date | |
Msg-id | 20200211195110.0751aa6b@firost Whole thread Raw |
In response to | Re: Fetching timeline during recovery (Michael Paquier <michael@paquier.xyz>) |
List | pgsql-hackers |
On Fri, 31 Jan 2020 15:12:30 +0900 Michael Paquier <michael@paquier.xyz> wrote: > On Thu, Jan 23, 2020 at 05:54:08PM +0100, Jehan-Guillaume de Rorthais wrote: > > Please find the new version of the patch in attachment. > > To be honest, I find the concept of this patch confusing. > pg_stat_wal_receiver is just a one-one mapping with the shared memory > state of the WAL receiver itself and show data *if and only if* a WAL > receiver is running and iff it is ready to display any data, so I'd > rather not change its nature If you are talking about the pg_stat_wal_receiver view, I don't have a strong opinion on this anyway as I vote 0 when discussing it. My current patch doesn't alter its nature. > and it has nothing to do with the state of WAL being applied by the startup > process. Indeed, I was feeling this was a bad design to add these columns, as stated in my last mail. So I withdraw this. > So this gets a -1 from me. OK. [...] > Isn't what you are looking for here a different system view which maps > directly to XLogCtl so as you can retrieve the status of the applied > WAL at recovery anytime My main objective is received LSN/TLI. This is kept by WalRcv for streaming. That's why pg_stat_wal_receiver was the very good place for my need. But again, you are right, I shouldn't have add the replied bits to it. > say pg_stat_recovery? I finally dig this path. I was in the hope we could find something simpler and lighter, but other solutions we studied so far (thanks all for your time) were all discarded [1]. A new pg_stat_get_recovery() view might be useful for various monitoring purpose. After poking around in the code, it seems the patch would be bigger than previous solutions, so I prefer discussing the specs first. At a first glance, I would imagine the following columns as a minimal patch: * source: stream, archive or pg_wal * write/flush/replayed LSN * write/flush/replayed TLI This has already some heavy impact in the code. Source might be taken from xlog.c:currentSource, so it should probably be included in XLogCtl to be accessible from any backend. As replayed LSN/TLI comes from XLogCtl too, we might probably need a new dedicated function to gather these fields plus currentSource under the same info_lck. Next, write lsn/tli is not accessible from WalRcv, only flush. So either we do not include it, or we would probably need to replace WalRcv->receivedUpto with existing LogstreamResult. Next, there's no stats about wal shipping recovery. Restoring a WAL from archive do not increment anything about write/flush LSN/TLI. I wonder if both wal_receiver stats and WAL shipping stats might be merged together in the same refactored structure in shmem, as they might share a fair number of field together? This would be pretty invasive in the code, but I feel it's heavier to add another new struct in shmem just to track WAL shipping stats whereas WalRcv already exists there. Now, I think the following additional field might be useful for monitoring. But as this is out my my original scope, I prefer discussing how useful this might be: * start_time: start time of the current source * restored_count: total number of WAL restored. We might want to split this counter to track each method individually. * last_received_time: last time we received something from the current source * last_fail_time: last failure time, whatever the source Thanks for reading up to here! Regards, [1] even if I still hope the pg_stat_get_wal_receiver might still gather some more positive vote :)
pgsql-hackers by date: