Re: Fetching timeline during recovery - Mailing list pgsql-hackers

From Jehan-Guillaume de Rorthais
Subject Re: Fetching timeline during recovery
Date
Msg-id 20190906170634.089d1dce@firost
Whole thread Raw
In response to Re: Fetching timeline during recovery  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Fetching timeline during recovery
List pgsql-hackers
On Wed, 4 Sep 2019 00:32:03 +0900
Fujii Masao <masao.fujii@gmail.com> wrote:

> On Mon, Jul 29, 2019 at 7:26 PM Jehan-Guillaume de Rorthais
> <jgdr@dalibo.com> wrote:
> >
> > On Fri, 26 Jul 2019 18:22:25 +0200
> > Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
> >  
> > > On Fri, 26 Jul 2019 10:02:58 +0200
> > > Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
[...]
> > > Please, find in attachment a new version of the patch. It now creates two
> > > new fonctions:
> > >
> > >   pg_current_wal_tl()
> > >   pg_last_wal_received_tl()  
> >
> > I just found I forgot to use PG_RETURN_INT32 in pg_last_wal_received_tl().
> > Please find the corrected patch in attachment:
> > 0001-v3-Add-functions-to-get-timeline.patch  
> 
> Thanks for the patch! Here are some comments from me.

Thank you for your review!

Please, find in attachment the v4 of the patch:
0001-v4-Add-functions-to-get-timeline.patch

Answers bellow.

> You need to write the documentation explaining the functions
> that you're thinking to add.

Done.

> +/*
> + * Returns the current timeline on a production cluster
> + */
> +Datum
> +pg_current_wal_tl(PG_FUNCTION_ARGS)
> 
> The timeline ID that this function returns seems almost
> the same as pg_control_checkpoint().timeline_id,
> when the server is in production. So I'm not sure
> if it's worth adding that new function.

pg_control_checkpoint().timeline_id is read from the controldata file on disk
which is asynchronously updated with the real status of the local cluster.
Right after a promotion, fetching the TL from pg_control_checkpoint() is wrong
and can cause race conditions on client side.

This is the main reason I am working on this patch.

> + currentTL = GetCurrentTimeLine();
> +
> + PG_RETURN_INT32(currentTL);
> 
> Is GetCurrentTimeLine() really necessary? Seems ThisTimeLineID can be
> returned directly since it indicates the current timeline ID in production.

Indeed. I might have over-focused on memory state. ThisTimeLineID seems to be
updated soon enough during the promotion, in fact, even before
XLogCtl->ThisTimeLineID:

        if (ArchiveRecoveryRequested)
        {
        [...]
                ThisTimeLineID = findNewestTimeLine(recoveryTargetTLI) + 1;
        [...]
        }

        /* Save the selected TimeLineID in shared memory, too */
        XLogCtl->ThisTimeLineID = ThisTimeLineID;

> +pg_last_wal_received_tl(PG_FUNCTION_ARGS)
> +{
> + TimeLineID lastReceivedTL;
> + WalRcvData *walrcv = WalRcv;
> +
> + SpinLockAcquire(&walrcv->mutex);
> + lastReceivedTL = walrcv->receivedTLI;
> + SpinLockRelease(&walrcv->mutex);
> 
> I think that it's smarter to use GetWalRcvWriteRecPtr() to
> get the last received TLI, like pg_last_wal_receive_lsn() does.

I has been hesitant between the current implementation and using
GetWalRcvWriteRecPtr(). I choose the current implementation to avoid unnecessary
operations during the spinlock and make it as fast as possible.

However, maybe I'm scratching nothing or just dust here in comparison to
calling GetWalRcvWriteRecPtr() and avoiding minor code duplication.

Being hesitant, v4 of the patch use GetWalRcvWriteRecPtr() as suggested.

> The timeline ID that this function returns is the same as
> pg_stat_wal_receiver.received_tli while walreceiver is running.
> But when walreceiver is not running, pg_stat_wal_receiver returns
> no record, and pg_last_wal_received_tl() would be useful to
> get the timeline only in this case. Is this my understanding right?

Exactly.
 
> > Also, TimeLineID is declared as a uint32. So why do we use
> > PG_RETURN_INT32/Int32GetDatum to return a timeline and not PG_RETURN_UINT32?
> > See eg. in pg_stat_get_wal_receiver().  
> 
> pg_stat_wal_receiver.received_tli is declared as integer.

Oh, right. Thank you.

Thanks,

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera from 2ndQuadrant
Date:
Subject: Re: Unix-domain socket support on Windows
Next
From: Robert Haas
Date:
Subject: Re: add a MAC check for TRUNCATE