On 06/19/2017 10:30 AM, Andres Freund wrote:
> Greg Burek from Heroku (CCed) reported a weird issue on IM, that was
> weird enough to be interesting. What he'd observed was that he promoted
> some PITR standby, and early clones of that node work, but later clones
> did not, failing to read some segment.
>
> The problem turns out to be the following: [explanation]
Good detective work!
> The minimal fix here is presumably not to use XLByteToPrevSeg() in
> RemoveXlogFile(), but XLByteToSeg(). I don't quite see what purpose it
> serves here - I don't think it's ever needed.
Agreed, I don't see a reason for it either.
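
To spell out the difference between the two, here is a minimal standalone
sketch. The helper names and the fixed 16 MB segment size are assumptions
made for illustration only; they mirror what the XLByteToSeg() and
XLByteToPrevSeg() macros compute, but this is not the actual source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16 MB WAL segments (the default build setting). */
#define SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/* Like XLByteToSeg(): number of the segment containing byte position ptr. */
static uint64_t
byte_to_seg(uint64_t ptr)
{
    return ptr / SEG_SIZE;
}

/* Like XLByteToPrevSeg(): number of the segment containing byte ptr - 1. */
static uint64_t
byte_to_prev_seg(uint64_t ptr)
{
    assert(ptr > 0);            /* ptr - 1 would wrap around at zero */
    return (ptr - 1) / SEG_SIZE;
}
```

The two only disagree when ptr sits exactly on a segment boundary: there
the "prev" variant returns the segment that just ended, while the plain
variant returns the one that is about to start.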
> There seems to be a larger question here though: Why does
> XLogFileReadAnyTLI() probe all timelines even if they weren't a parent
> at that period? That seems like a bad idea, especially in more
> complicated scenarios where some precursor timeline might live for
> longer than it was a parent? ISTM XLogFileReadAnyTLI() should check
> which timeline a segment ought to come from, based on the history?
Yeah. I've had that thought for years as well, but there has never been
any pressing reason to bite the bullet and rewrite it, so I haven't
gotten around to it.
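
Roughly, the lookup could work like the sketch below. Everything in it is
hypothetical: the HistEntry struct, the tli_for_ptr() name, and the
half-open [begin, end) ranges are simplifications for illustration, not
the existing timeline history code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified history entry: timeline tli owns [begin, end). */
typedef struct
{
    uint32_t    tli;
    uint64_t    begin;
    uint64_t    end;            /* UINT64_MAX for the current timeline */
} HistEntry;

/*
 * Return the timeline that owns WAL position ptr according to the
 * history, or 0 if no entry covers it.
 */
static uint32_t
tli_for_ptr(const HistEntry *hist, size_t n, uint64_t ptr)
{
    for (size_t i = 0; i < n; i++)
    {
        if (ptr >= hist[i].begin && ptr < hist[i].end)
            return hist[i].tli;
    }
    return 0;
}
```

With something like this, the reader would open the file for the one
timeline the history points at instead of probing every known timeline.
A segment that contains the switch point can legitimately exist on both
the old and the new timeline, so a real implementation would have to
handle that case explicitly.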
- Heikki