Timeline following is a bit tangled... - Mailing list pgsql-hackers

From Craig Ringer
Subject Timeline following is a bit tangled...
Msg-id CAMsr+YH83PKiZY3iqnu6YRboUJAU3k+sRMe=X6p+5VLHxf9fFg@mail.gmail.com
List pgsql-hackers
Hi all

As part of the work I did on timeline following for logical decoding I mapped out the various code paths relating to timeline following in Pg.


It's surprisingly complex (to me), with completely separate logic for each path. Redo has one way to decide when to switch timelines and which WAL segment to read from. The walsender has two: one for physical replication and one (with a small overlap) for logical replication. Logical replication over the SQL interface now has yet another, which overlaps mostly but not entirely with the logical walsender one.

One thing that makes the code very hard to follow (IMO) is that the xlogreader is totally timeline agnostic. The xlogreader's callers decide which timeline to read from and when to switch timelines. The actual WAL segment to read from is determined by the read page callback that the xlogreader invokes. The callback figures out the timeline by looking "around" the xlogreader at global state in xlog.c (for redo) or walsender.c (for the physical and logical walsenders).
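To make the shape of that concrete, here is a toy sketch, not actual PostgreSQL code. The names `ToyReader`, `ToyPage`, `toy_global_read_tli` and the callback signature are all invented for illustration. The reader struct carries no timeline at all, so the callback has to get it from file-level state that the caller sets behind the reader's back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t WalPtr;        /* stand-in for XLogRecPtr */
typedef uint32_t TimelineId;    /* stand-in for TimeLineID */

/* What the toy callback reports back: which timeline it chose to read from. */
typedef struct ToyPage
{
    TimelineId  tli;
    WalPtr      start;
} ToyPage;

typedef struct ToyReader ToyReader;
typedef void (*ToyPageReadFn) (ToyReader *reader, WalPtr page_ptr, ToyPage *out);

/* The reader itself is timeline agnostic: note there is no TLI field here. */
struct ToyReader
{
    ToyPageReadFn read_page;
};

/* Hypothetical stand-in for the global state kept in xlog.c or walsender.c. */
static TimelineId toy_global_read_tli = 1;

/* The callback looks "around" the reader at the global to pick a timeline. */
static void
toy_global_state_read_page(ToyReader *reader, WalPtr page_ptr, ToyPage *out)
{
    (void) reader;              /* the reader has nothing useful to offer */
    out->tli = toy_global_read_tli;
    out->start = page_ptr;
}

/* The reader only knows a position; the timeline is decided elsewhere. */
void
toy_read_page(ToyReader *reader, WalPtr page_ptr, ToyPage *out)
{
    assert(reader != NULL && reader->read_page != NULL && out != NULL);
    reader->read_page(reader, page_ptr, out);
}
```

Reading the same position twice can hit different timelines depending on what the caller did to the global in between, and nothing in the reader's interface shows that.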

Each place has its own logic for things like the early timeline switch required to ensure that we read from a segment that's actually locally present, since the same segment on older timelines won't be present or will have been renamed to .partial.
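A minimal sketch of that early switch, assuming the rule is "if the position being read falls in the segment that contains the switchpoint (or a later one), use the newer timeline's file". The function names and the `WalPtr`/`TimelineId` typedefs are hypothetical stand-ins. The file name layout (timeline ID, then the segment number split into two fields, each as 8 hex digits) mirrors the usual WAL segment naming.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t WalPtr;        /* stand-in for XLogRecPtr */
typedef uint32_t TimelineId;    /* stand-in for TimeLineID */

#define INVALID_WAL_PTR ((WalPtr) 0)

/*
 * Pick the timeline whose segment file should be opened for read_ptr.
 * switch_point is where curr_tli ends and next_tli begins, or
 * INVALID_WAL_PTR if curr_tli is still the latest timeline.
 */
TimelineId
timeline_for_segment(WalPtr read_ptr, TimelineId curr_tli,
                     WalPtr switch_point, TimelineId next_tli,
                     uint64_t seg_size)
{
    assert(seg_size > 0);

    /*
     * Switch early: the segment containing the switchpoint only exists in
     * full under the new timeline, even for positions before the switchpoint.
     */
    if (switch_point != INVALID_WAL_PTR &&
        read_ptr / seg_size >= switch_point / seg_size)
        return next_tli;

    return curr_tli;
}

/* Format a WAL segment file name from a timeline and a segment number. */
int
wal_segment_file_name(char *buf, size_t buflen, TimelineId tli,
                      uint64_t segno, uint64_t seg_size)
{
    uint64_t    segs_per_id = UINT64_C(0x100000000) / seg_size;

    return snprintf(buf, buflen, "%08X%08X%08X",
                    (unsigned) tli,
                    (unsigned) (segno / segs_per_id),
                    (unsigned) (segno % segs_per_id));
}
```

With 16MB segments and a switch from timeline 1 to 2 partway through segment 3, a read anywhere in segment 3 resolves to timeline 2's copy of that segment, while a read in segment 2 still resolves to timeline 1.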

I'd like to reduce the duplication here and try to make it a bit easier to follow. If doing so doesn't seem worth the (undeniable) risks of messing with redo then I'll just leave it untouched; I don't feel all that strongly about it.


Because physical rep doesn't use the xlogreader, it doesn't make sense to just add timeline following to the xlogreader directly. It has to be separate, usable by both physical rep and the xlogreader. I think it'd be reasonable to have them share one state struct and one function: each calls the function before reading a page to update the timeline to read from if needed, then its page read callback uses that timeline. The struct can keep track of the next timeline, the TLI switchpoint, whether the timeline became historical since the last page was read, a copy of the latest timeline history, and so on. It should probably live in timeline.c.
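Roughly what such a shared struct and function might look like, as a standalone sketch rather than a patch. Every name here (`TimelineFollowState`, `timeline_follow_update`, the field names) is made up for illustration, the history is a simplified array rather than the real timeline history list, and the "became historical" flag is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t WalPtr;        /* stand-in for XLogRecPtr */
typedef uint32_t TimelineId;    /* stand-in for TimeLineID */

#define INVALID_WAL_PTR ((WalPtr) 0)

/* One entry of a simplified timeline history, ordered oldest first. */
typedef struct TimelineHistoryItem
{
    TimelineId  tli;
    WalPtr      end;            /* switchpoint to the next timeline, or
                                 * INVALID_WAL_PTR if this is the latest */
} TimelineHistoryItem;

/* State shared by every reader, instead of per-caller globals. */
typedef struct TimelineFollowState
{
    const TimelineHistoryItem *history; /* copy of the timeline history */
    size_t      nhistory;
    uint64_t    seg_size;       /* WAL segment size in bytes */
    TimelineId  read_tli;       /* timeline the next page read should use */
    WalPtr      switch_point;   /* where read_tli ends, if it is historical */
    TimelineId  next_tli;       /* timeline after switch_point, else 0 */
} TimelineFollowState;

/*
 * Call before reading the page at page_ptr.  Picks the oldest timeline that
 * still owns the whole segment containing page_ptr, so the switch happens
 * as soon as we enter the segment that contains the switchpoint.
 */
void
timeline_follow_update(TimelineFollowState *state, WalPtr page_ptr)
{
    size_t      i;

    assert(state != NULL && state->nhistory > 0 && state->seg_size > 0);

    /* Stop at the first usable entry; fall through to the newest one. */
    for (i = 0; i < state->nhistory - 1; i++)
    {
        WalPtr      end = state->history[i].end;

        if (end == INVALID_WAL_PTR ||
            page_ptr / state->seg_size < end / state->seg_size)
            break;
    }

    state->read_tli = state->history[i].tli;
    state->switch_point = state->history[i].end;
    state->next_tli = (state->switch_point != INVALID_WAL_PTR &&
                       i + 1 < state->nhistory)
        ? state->history[i + 1].tli
        : 0;
}
```

Both the physical walsender's read loop and an xlogreader page read callback could then call `timeline_follow_update()` on their own state and open the segment for `read_tli`, rather than each consulting different globals.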

I'm not especially thrilled with the code I wrote for logical decoding timeline following there, and I'm actually more inclined to base that logic on the physical walsender's code, which is IMO the clearest we currently have. Extract it, move its state globals into a struct, generalize it.

Sound not completely insane?

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
