Re: Fetching timeline during recovery - Mailing list pgsql-hackers

From Jehan-Guillaume de Rorthais
Subject Re: Fetching timeline during recovery
Date
Msg-id 20190724143327.1ca2bb9a@firost
Whole thread Raw
In response to Re: Fetching timeline during recovery  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Fetching timeline during recovery  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
List pgsql-hackers
Hello Michael,

On Wed, 24 Jul 2019 09:49:05 +0900
Michael Paquier <michael@paquier.xyz> wrote:

> On Tue, Jul 23, 2019 at 06:05:18PM +0200, Jehan-Guillaume de Rorthais wrote:
> > Please, find in attachment a first trivial patch to support
> > pg_walfile_name() and pg_walfile_name_offset() on a standby.
> > Previous restriction on this functions seems related to ThisTimeLineID not
> > being safe on standby. This patch is fetching the timeline from
> > WalRcv->receivedTLI using GetWalRcvWriteRecPtr(). As far as I understand,
> > this is updated each time some data are flushed to the WAL.  
[...]
> Your patch does not count for the case of archive recovery, where
> there is no WAL receiver, and as the shared memory state of the WAL
> receiver is not set 0 would be set.

Indeed. I tested this topic with the following query and was fine with the
NULL result:

  select pg_walfile_name(pg_last_wal_receive_lsn());

I was fine with this result because my use case requires replication anyway. A
NULL result would mean that the node never streamed from the old primary since
its last startup, so a failover should ignore it anyway.

However, NULL just comes from pg_last_wal_receive_lsn() here. The following
query result is wrong:

  > select pg_walfile_name('0/1')
  000000000000000000000000

I fixed that. See patch 0001-v2-* in attachement


> The replay timeline is something we could use here instead via
> GetXLogReplayRecPtr(). CreateRestartPoint actually takes the latest WAL
> receiver or replayed point for its end LSN position, whichever is newer.

I did consider GetXLogReplayRecPtr() or even XLogCtl->replayEndTLI (which is
updated right before the replay). However, both depend on read activity on the
standby. That's why I picked WalRcv->receivedTLI which is updated whatever the
reading activity on the standby.

> > Last, I plan to produce an extension to support this on older release. Is
> > it something that could be integrated in official source tree during a minor
> > release or should I publish it on eg. pgxn?  
> 
> Unfortunately no.  This is a behavior change so it cannot find its way
> into back branches.

Yes, my patch is a behavior change. But here, I was yalking about an
extension, not the core itself, to support this feature in older releases.

> The WAL receiver state is in shared memory and published, so that's easy
> enough to get.  We don't do that for XLogCtl unfortunately.

Both are in shared memory, but WalRcv have a public function to get its
receivedTLI member.

XLogCtl has nothing in public to expose its ThisTimeLineID member. However, from
a module, I'm able to fetch it using:

  XLogCtl = ShmemInitStruct("XLOG Ctl", XLOGShmemSize(), &found);
  SpinLockAcquire(&XLogCtl->info_lck);
  tl = XLogCtl->ThisTimeLineID;
  SpinLockRelease(&XLogCtl->info_lck);

As the "XLOG Ctl" index entry already exists in shmem, ShmemInitStruct returns
the correct structure from there. Not sure this was supposed to be used this
way though...Adding a public function might be cleaner, but it will not help
for older releases.

> I think that there are arguments for being more flexible with it, and perhaps
> have a system-level view to be able to look at some of its fields.

Great idea. I'll give it a try to keep the discussion on.

> There is also a downside with get_controlfile(), which is that it
> fetches directly the data from the on-disk pg_control, and
> post-recovery this only gets updated at the first checkpoint.

Indeed, that's why I started this patch and thread.

Thanks,

Attachment

pgsql-hackers by date:

Previous
From: Chapman Flack
Date:
Subject: Re: benchmarking Flex practices
Next
From: Brendan Jurd
Date:
Subject: Re: Issue in to_timestamp/to_date while handling the quoted literal string