Re: ThisTimeLineID is used uninitialized in basebackup.c, too - Mailing list pgsql-hackers

From Robert Haas
Subject Re: ThisTimeLineID is used uninitialized in basebackup.c, too
Date
Msg-id CA+TgmoYMqQs3tsqx=1NvWB1nppVO9KkXS3j0Nc71wHV8sNyr+g@mail.gmail.com
Whole thread Raw
In response to Re: ThisTimeLineID is used uninitialized in basebackup.c, too  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Thu, Oct 28, 2021 at 9:27 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> The first hunk is useless but seems good for sanity. The second hunk
> looks good. (I think we can call CheckXLogRemoved before collecting
> file names but it would be another thing.) The third and fifth hunks
> look fine.

Thanks for your careful review and analysis, with which I agree.
Instead of using starttli and endtli in the first hunk, we could
instead use an arbitrary constant, like 1, or 42. We could even go
completely nuts and write something like (TimeLineID) NBuffers. After
all, if the value doesn't matter, we can use anything. But I find
using the TLI values that are associated with the LSNs to be more
pleasing, and it seems like you agree, so let's do that.

> The fourth hunk is somewhat dubious. The TLI for a new segment is not
> necessarily equal to the skipped segments unless starttli ==
> endtli. So we could change the message like "could not find WAL file
> for segment %X" or such but also I don't oppose to using the tli of
> the new segment as an approximate.

Right. I think that users might be confused if we just tell them the
segment number; they might have a hard time knowing what that really
means. Normally all the TLI values here will be the same, so picking
any one of them and generating a WAL file name based on that will be
more helpful that a segment number from which they have to generate
the WAL file name themselves.

Really, when you stop to think about it, this logic is totally
ridiculous. Imagine that startsegno = 10, endsegno = 15, starttli = 4,
and endtli = 4. The code as written would be happy if it found segment
10 on timeline 4, segment 11 on timeline 3, segment 12 on timeline 2,
segment 13 on timeline 1, and segment 14 on timeline 525600, which,
needless to say, wouldn't work at all if you tried to use the backup.
We only get by with this logic because in practice such things don't
happen. I'd be happy to see someone rewrite this to perform better
checking, but as long as we're making the rather dubious assumption
that timelines somehow don't matter, I think changing the error
message would make things more confusing rather than less.

> As a whole, it looks fine to me.

Great. Thank you.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Nitin Jadhav
Date:
Subject: Re: when the startup process doesn't (logging startup delays)
Next
From: Robert Haas
Date:
Subject: Re: refactoring basebackup.c