Re: ThisTimeLineID can be used uninitialized - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: ThisTimeLineID can be used uninitialized |
Date | |
Msg-id | CA+TgmobSWzacEs+r6C-7DrOPDHoDar4i9gzxB3SCBr5qjnLmVQ@mail.gmail.com Whole thread Raw |
In response to | Re: ThisTimeLineID can be used uninitialized (Andres Freund <andres@anarazel.de>) |
Responses |
Re: ThisTimeLineID can be used uninitialized
|
List | pgsql-hackers |
On Tue, Oct 19, 2021 at 4:44 PM Andres Freund <andres@anarazel.de> wrote: > It's quite confusing. It's *really* not helped by physical replication using > but not really using an xlogreader to keep state. Which presumably isn't > actually used during a physical CreateReplicationSlot(), but is referenced by > a comment :/ I can't figure out what you're referring to here. I can't find CreateReplicationSlot() using an xlogreader in the logical replication case, or a comment that refers to it doing so. > Istm we should introduce an InvalidTimeLineID, and explicitly initialize > sendTimeLine to that, and assert that it's valid / invalid in a bunch of > places? I think the correct fix for this particular problem is just to delete the initialization, as in the attached patch. I spent a long time studying this today and eventually convinced myself that there's just no way these initializations can ever do anything (details in proposed commit message). While it is important that we do not access the global variable when it's uninitialized, here there is no reason to access it in the first place. Regarding the more general problem, I think we should consider (1) reducing the number of places that access ThisTimeLineID directly, preferring to add TimeLineID arguments to functions and pass the relevant timeline value around explicitly and then (2) changing all of the remaining accesses to ThisTimeLineID to function calls instead, e.g. by inventing a function GetCurrentTimeLineID(). Once we do that, I think this kind of problem just goes away. On the one hand, GetCurrentTimeLineID() could assert that the value is valid before returning it, and then we would have centralized checking that we're not using a bogus value. But, there's no reason to stop there. If all the callers are using this function rather than accessing the global variable directly, then the function can just initialize the value from shared memory as required! Or it can forget about having a local copy stored in a global variable and just always read the current value from shared memory! With a little thought, I think this approach can avoid this sort of unfortunate coding: if (!RecoveryInProgress()) read_upto = GetFlushRecPtr(); else read_upto = GetXLogReplayRecPtr(&ThisTimeLineID); tli = ThisTimeLineID; What is going on here? Well, if we're not still in recovery, then the call to RecoveryInProgress() will initialize ThisTimeLineID as a side effect, and after that it can't change. If we are still in recovery then GetXLogReplayRecPtr()'s will update the global variable as a side effect on every trip through the function. Either way, read_upto is the end of WAL in the way that's relevant to whichever operating mode is current. But imagine that we could code this in a way that didn't depend on global variables getting updated as a side effect. For example: if (!RecoveryInProgress()) read_upto = GetFlushRecPtr(); else read_upto = GetXLogReplayRecPtr(); currTLI = GetCurrentTimeLineID(); Or perhaps: if (!RecoveryInProgress()) read_upto = GetFlushRecPtr(&currTLI); else read_upto = GetXLogReplayRecPtr(&currTLI); My point here is that the current idiom only makes sense if you realize that RecoveryInProgress() has a side effect of updating ThisTimeLineID, and on the other hand that the only reason we're using ThisTimeLineID instead of a local variable here is that that's what RecoveryInProgress() updates. It's just two mutually-reinforcing bad decisions. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: