Re: ThisTimeLineID can be used uninitialized - Mailing list pgsql-hackers

From Andres Freund
Subject Re: ThisTimeLineID can be used uninitialized
Date
Msg-id 20211019204359.aakuvtk7tjari6to@alap3.anarazel.de
In response to ThisTimeLineID can be used uninitialized  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: ThisTimeLineID can be used uninitialized  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Re: ThisTimeLineID can be used uninitialized  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2021-10-19 15:13:04 -0400, Robert Haas wrote:
> This is a followup to
> http://postgr.es/m/CA+TgmoZ5A26C6OxKApafyuy_sx0VG6VXdD_Q6aSEzsvrPHDwzw@mail.gmail.com.
> I'm suspicious of the following code in CreateReplicationSlot:
> 
>     /* setup state for WalSndSegmentOpen */
>     sendTimeLineIsHistoric = false;
>     sendTimeLine = ThisTimeLineID;
> 
> The first thing that's odd about this is that if this is physical
> replication, it's apparently dead code, because AFAICT sendTimeLine
> will not be used for anything in that case.

It's quite confusing. It's *really* not helped by physical replication using,
but not really using, an xlogreader to keep state. That state presumably isn't
actually used during a physical CreateReplicationSlot(), but it is referenced
by a comment :/


> But I don't know if it matters. We call CreateInitDecodingContext()
> with sendTimeLine and ThisTimeLineID still zero; it doesn't call any
> callbacks. Then we call DecodingContextFindStartpoint() with
> sendTimeLine still 0 and the first callback that gets invoked is
> logical_read_xlog_page. At this point sendTimeLine = 0 and
> ThisTimeLineID = 0. That calls XLogReadDetermineTimeline() which
> resets ThisTimeLineID to the correct value of 2, but when we get back
> to logical_read_xlog_page, we still manage to call WALRead with a
> timeline of 0 because state->seg.ws_tli is still 0. Then WALRead
> eventually does call WalSndOpen, which unconditionally propagates
> sendTimeLine into the TLI pointer that is passed to it. So now
> state->seg.ws_tli also ends up being 2. So I guess maybe nothing bad
> happens? But it sure seems strange that the code would apparently work
> just as well as it does today with the following patch:
> 
> diff --git a/src/backend/replication/walsender.c
> b/src/backend/replication/walsender.c
> index b811a5c0ef..44fd598519 100644
> --- a/src/backend/replication/walsender.c
> +++ b/src/backend/replication/walsender.c
> @@ -945,7 +945,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
> 
>      /* setup state for WalSndSegmentOpen */
>      sendTimeLineIsHistoric = false;
> -    sendTimeLine = ThisTimeLineID;
> +    sendTimeLine = rand() % 10;
> 
>      if (cmd->kind == REPLICATION_KIND_PHYSICAL)
>      {

Istm we should introduce an InvalidTimeLineID, and explicitly initialize
sendTimeLine to that, and assert that it's valid / invalid in a bunch of
places?
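
To make that concrete, here is a rough standalone sketch of what such a
sentinel could look like. It is not a patch against the tree. The value 0 for
InvalidTimeLineID is an assumption (real timelines start at 1), and
TimeLineIDIsValid(), set_send_timeline() and get_send_timeline_for_open() are
hypothetical names used only for illustration. Plain assert() stands in for
the backend's Assert().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Timeline IDs are 32-bit unsigned integers. */
typedef uint32_t TimeLineID;

/* Assumption: 0 is free to use as a sentinel, since real timelines start at 1. */
#define InvalidTimeLineID ((TimeLineID) 0)

/* Hypothetical helper macro, named after the existing XLogRecPtrIsInvalid style. */
#define TimeLineIDIsValid(tli) ((tli) != InvalidTimeLineID)

/* Stand-in for the walsender's file-level state, explicitly initialized. */
static TimeLineID sendTimeLine = InvalidTimeLineID;

/* Hypothetical setter: called once the timeline to read from is known. */
static void
set_send_timeline(TimeLineID tli)
{
    assert(TimeLineIDIsValid(tli));
    sendTimeLine = tli;
}

/* Hypothetical accessor: called where a WAL segment is about to be opened. */
static TimeLineID
get_send_timeline_for_open(void)
{
    /* Trips if nobody set the timeline before the first segment open. */
    assert(TimeLineIDIsValid(sendTimeLine));
    return sendTimeLine;
}
```

With something along these lines, a code path that reaches the segment-open
step while sendTimeLine is still unset would fail an assertion instead of
quietly carrying a timeline of 0 around until something else overwrites it.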

Greetings,

Andres Freund


