Hi all,
While reviewing some code around pg_lsn_in() I came across a couple of
(potential?) issues:
1.
Commit 21f428eb moves lsn conversion functionality from pg_lsn_in() to a new
function pg_lsn_in_internal(). It takes two parameters the lsn string and a
pointer to a boolean (*have_error) to indicate if there was an error while
converting string format to XLogRecPtr.
pg_lsn_in_internal() only sets the *have_error to 'true' if there is an error,
but leaves it for the caller to make sure it was passed by initializing as
'false'. Currently it is only getting called from pg_lsn_in() and timestamptz_in()
where it has been taken care that the flag is set to false before making the
call. But I think in general it opens the door for unpredictable bugs if
pg_lsn_in_internal() gets called from other locations in future (if need
maybe) and by mistake, it just checks the return value of the flag without
setting it to false before making a call.
I am attaching a patch that makes sure that *have_error is set to false in
pg_lsn_in_internal() itself, rather than being caller dependent.
Also, I think there might be callers who may not care if there had been an error
while converting and just ok to use InvalidXLogRecPtr against return value, and
may pass just a NULL boolean pointer to this function, but for now, I have left
that untouched. Maybe just adding an Assert would improve the situation for
time being.
I have attached a patch (fix_have_error_flag.patch) to take care of above.
2.
I happened to peep in test case pg_lsn.sql, and I started exploring the macros
around lsn.
Following macros:
{code}
/*
* Zero is used indicate an invalid pointer. Bootstrap skips the first possible
* WAL segment, initializing the first WAL page at WAL segment size, so no XLOG
* record can begin at zero.
*/
#define InvalidXLogRecPtr 0
#define XLogRecPtrIsInvalid(r) ((r) == InvalidXLogRecPtr)
{code}
IIUC, in the comment above we clearly want to mark 0 as an invalid lsn (also
further IIUC the comment states - lsn would start from (walSegSize + 1)). Given
this, should not it be invalid to allow "0/0" as the value of type pg_lsn, or
for that matter any number < walSegSize?
There is a test scenario in test case pg_lsn.sql which tests insertion of "0/0"
in a table having a pg_lsn column. I think this is contradictory to the comment.
I am not sure of thought behind this and might be wrong while making the above
assumption. But, I tried to look around a bit in hackers emails and could not
locate any related discussion.
I have attached a patch (mark_lsn_0_invalid.patch) that makes above changes.
Thoughts?
Regards,
Jeevan Ladhe