I previously reported[1] that CreateReplicationSlot() was accessing
the global variable ThisTimeLineID when it might not be initialized.
Today I realized that the code in basebackup.c has the same problem.
perform_base_backup() blithely uses the variable six times without
doing anything at all to initialize it. In practice, it's always going
to be initialized to some non-zero value, because pg_basebackup is
always going to execute IDENTIFY_SYSTEM before it executes
BASE_BACKUP, and that's going to set ThisTimeLineID as a side effect.
But, if you hack pg_basebackup to not call IDENTIFY_SYSTEM before
calling BASE_BACKUP, then you can reach this code with ThisTimeLineID
== 0 using pg_basebackup -Xfetch. Even if you don't do that, you're
only guaranteed that ThisTimeLineID is initialized to something, not
that it's initialized to the correct thing. The timeline on the
standby can change any time.
Fortunately, this has no serious consequences AFAICT. Heikki wrote
"I'd rather not worry about timelines here" and wrote the algorithm in
such a way that the only thing that happens if you feed a wrong TLI
through the logic is that the wrong TLI might be used to construct a
WAL file name for use in an error report. You won't get an error when
things should have worked, or the other way around, but if you get a
complaint about a segment file, it might include the wrong TLI in the
filename. I think possibly this logic should be rewritten to be fully
timeline-aware, but that's a bit of a project and this isn't really
broken, so what I'd like to do for right now is change it to use
non-bogus TLIs that we have available in local variables rather than a
possibly-nonsensical value from a global variable. Patch for that
attached.
--
Robert Haas
EDB: http://www.enterprisedb.com
[1] http://postgr.es/m/CA+TgmoZC2aYe9eRXcUa_c_XBBMVLps9LZL_K5Oo7M9Xhco2KhA@mail.gmail.com