Thread: Strange error in new 003_cic_2pc.pl test
tadarida just reported this [1]: 2021-11-11 14:40:40.040 CET [29620:388] 023_cic_2pc.pl ERROR: could not read two-phase state from WAL at 0/1716C68 That error message is remarkably unhelpful, and I think unnecessarily so. Why the heck isn't XlogReadTwoPhaseData passing on the error string that it just got from XLogReadRecord? Now, XLogReadRecord's API spec does say it might return NULL errormsg if "the page read callback already reported the error". But there's no sign of that in tadarida's log. Anyway, I have in mind to do ereport(ERROR, (errcode_for_file_access(), - errmsg("could not read two-phase state from WAL at %X/%X", - LSN_FORMAT_ARGS(lsn)))); + errmsg("could not read two-phase state from WAL at %X/%X: %s", + LSN_FORMAT_ARGS(lsn), + errormsg ? errormsg : "(no details available)"))); But first, I'm going to look around for other XLogReadRecord callers that might be equally slipshod. (Identifying the actual underlying bug, if there is one, is a different problem. This test being so new, I'd not be surprised if it is showing us a real-but-intermittent issue. But perhaps fixing this error report will yield more info.) regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tadarida&dt=2021-11-11%2013%3A29%3A58
> tadarida just reported this [1]: > > 2021-11-11 14:40:40.040 CET [29620:388] 023_cic_2pc.pl ERROR: could not read two-phase state from WAL at 0/1716C68 > > That error message is remarkably unhelpful, and I think unnecessarily so. > Why the heck isn't XlogReadTwoPhaseData passing on the error string > that it just got from XLogReadRecord? > > Now, XLogReadRecord's API spec does say it might return NULL errormsg > if "the page read callback already reported the error". But there's > no sign of that in tadarida's log. Anyway, I have in mind to do > > ereport(ERROR, > (errcode_for_file_access(), > - errmsg("could not read two-phase state from WAL at %X/%X", > - LSN_FORMAT_ARGS(lsn)))); > + errmsg("could not read two-phase state from WAL at %X/%X: %s", > + LSN_FORMAT_ARGS(lsn), > + errormsg ? errormsg : "(no details available)"))); > Noah recently proposed the same [0]. But responders like me started to complain about other places without error reportingaround xlog reading. Probably I had to be more clear on the matter. Error logging certainly worth doing, at least this. Thanks! Best regards, Andrey Borodin. [0] https://www.postgresql.org/message-id/flat/20211107013157.GB790288%40rfd.leadboat.com
Andrey Borodin <x4mmm@yandex-team.ru> writes: >> That error message is remarkably unhelpful, and I think unnecessarily so. >> Why the heck isn't XlogReadTwoPhaseData passing on the error string >> that it just got from XLogReadRecord? > Noah recently proposed the same [0]. But responders like me started to complain about other places without error reportingaround xlog reading. Ah, I missed that thread. Now that we've seen another instance of the same failure, I agree it's urgent to get something committed. Noah's patch matches the style of some other callers, so let's go with that. (And yeah, why are some of the other callers ignoring errormsg? But that can wait.) regards, tom lane