Thread: Two noncritical bugs of pg_waldump
Hello. pg_waldump complains at the end in any case. I noticed that the LSN it shows in the finish message is incorrect. (I faintly thought that I posted about this but I didn't find it..) > pg_waldump: fatal: error in WAL record at 0/15073F8: invalid record length at 0/1507470: wanted 24, got 0 xlogreader found the error at the record begins at 1507470, but pg_waldump tells that error happens at 15073F8, which is actually the beginning of the last sound record. If I give an empty file to the tool it complains as the follows. > pg_waldump: fatal: could not read file "hoge": No such file or directory No, the file exists. The cause is it reads uninitialized errno to detect errors from the system call. read(2) is defined to set errno always when it returns -1 and doesn't otherwise. Thus it seems to me that it is better to check that the return value is less than zero than to clear errno before the call to read(). regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 773f14f01c4a5cfd334d0a80778aa819c55c1cb7 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Thu, 27 Jan 2022 09:47:10 +0900 Subject: [PATCH] Fix errornious messages of pg_waldump Fix pg_waldump to give the LSN where read error happens instead of the LSN of the last record successfully read. And fix to give the correct error message for zero-sized blocks instead of a bogus system-call error message. --- src/bin/pg_waldump/pg_waldump.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index a6251e1a96..58dc4044b5 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -222,15 +222,12 @@ search_directory(const char *directory, const char *fname) WalSegSz), fname, WalSegSz); } + else if (r < 0) + fatal_error("could not read file \"%s\": %m", + fname); else - { - if (errno != 0) - fatal_error("could not read file \"%s\": %m", - fname); - else - fatal_error("could not read file \"%s\": read %d of %d", - fname, r, XLOG_BLCKSZ); - } + fatal_error("could not read file \"%s\": read %d of %d", + fname, r, XLOG_BLCKSZ); close(fd); return true; } @@ -1177,7 +1174,7 @@ main(int argc, char **argv) if (errormsg) fatal_error("error in WAL record at %X/%X: %s", - LSN_FORMAT_ARGS(xlogreader_state->ReadRecPtr), + LSN_FORMAT_ARGS(xlogreader_state->EndRecPtr), errormsg); XLogReaderFree(xlogreader_state); -- 2.27.0
On Thu, Jan 27, 2022 at 10:07:38AM +0900, Kyotaro Horiguchi wrote: > pg_waldump complains at the end in any case. I noticed that the LSN > it shows in the finish message is incorrect. (I faintly thought that > I posted about this but I didn't find it..) Is this thread [0] what you are remembering? [0] https://postgr.es/m/2B4510B2-3D70-4990-BFE3-0FE64041C08A%40amazon.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com/
At Wed, 26 Jan 2022 17:25:14 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in > On Thu, Jan 27, 2022 at 10:07:38AM +0900, Kyotaro Horiguchi wrote: > > pg_waldump complains at the end in any case. I noticed that the LSN > > it shows in the finish message is incorrect. (I faintly thought that > > I posted about this but I didn't find it..) > > Is this thread [0] what you are remembering? > > [0] https://postgr.es/m/2B4510B2-3D70-4990-BFE3-0FE64041C08A%40amazon.com Maybe exactly. I rememberd the discussion. So the issue there is neither EndRecPtr and ReadRecPtr always points to the current read LSN. The first proposal from Nathen was to use currRecPtr but it was a private member. But after discussion, it seems to me it is (at least now) exactly what we need here so if we ofccially exposed the member the problem would be blown away. Do you want to conitnue that? Otherwise, I think we need to add a comment there about the known issue. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
(this is off-topic) At Thu, 27 Jan 2022 13:23:06 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Wed, 26 Jan 2022 17:25:14 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in > > On Thu, Jan 27, 2022 at 10:07:38AM +0900, Kyotaro Horiguchi wrote: > > > pg_waldump complains at the end in any case. I noticed that the LSN > > > it shows in the finish message is incorrect. (I faintly thought that > > > I posted about this but I didn't find it..) > > > > Is this thread [0] what you are remembering? > > > > [0] https://postgr.es/m/2B4510B2-3D70-4990-BFE3-0FE64041C08A%40amazon.com > > Maybe exactly. I rememberd the discussion. > > So the issue there is neither EndRecPtr and ReadRecPtr always points > to the current read LSN. The first proposal from Nathen was to use Mmm. Sorry for my fat finger, Nathan. > currRecPtr but it was a private member. But after discussion, it > seems to me it is (at least now) exactly what we need here so if we > ofccially exposed the member the problem would be blown away. Do you > want to conitnue that? > > Otherwise, I think we need to add a comment there about the known > issue. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Jan 27, 2022 at 01:23:06PM +0900, Kyotaro Horiguchi wrote: > So the issue there is neither EndRecPtr and ReadRecPtr always points > to the current read LSN. The first proposal from Nathen was to use > currRecPtr but it was a private member. But after discussion, it > seems to me it is (at least now) exactly what we need here so if we > ofccially exposed the member the problem would be blown away. Do you > want to conitnue that? Presumably there is a good reason for keeping it private, but if not, maybe that is a reasonable approach to consider. I don't think we wanted to proceed with the approaches discussed on the old thread, anyway. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com/
On Thu, Jan 27, 2022 at 01:27:36PM +0900, Kyotaro Horiguchi wrote: > At Thu, 27 Jan 2022 13:23:06 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in >> So the issue there is neither EndRecPtr and ReadRecPtr always points >> to the current read LSN. The first proposal from Nathen was to use > > Mmm. Sorry for my fat finger, Nathan. It's okay. :) -- Nathan Bossart Amazon Web Services: https://aws.amazon.com/
Hmm.. At Thu, 27 Jan 2022 10:07:38 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > pg_waldump complains at the end in any case. I noticed that the LSN > it shows in the finish message is incorrect. (I faintly thought that > I posted about this but I didn't find it..) > > > pg_waldump: fatal: error in WAL record at 0/15073F8: invalid record length at 0/1507470: wanted 24, got 0 > > xlogreader found the error at the record begins at 1507470, but > pg_waldump tells that error happens at 15073F8, which is actually the > beginning of the last sound record. It is arguable, but the following is indisputable. > If I give an empty file to the tool it complains as the follows. > > > pg_waldump: fatal: could not read file "hoge": No such file or directory > > No, the file exists. The cause is it reads uninitialized errno to > detect errors from the system call. read(2) is defined to set errno > always when it returns -1 and doesn't otherwise. Thus it seems to me > that it is better to check that the return value is less than zero > than to clear errno before the call to read(). So I post a patch contains only the indisputable part. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From f43cf938b0686dc406a1c81863a6415de1190b40 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Mon, 14 Feb 2022 18:11:19 +0900 Subject: [PATCH v1] Fix incorrect error handling of pg_waldump search_directory of pg_waldump.c puts a wrong assumption about the return value from read(2) and results in a bogus error message. It should check (r < 0) instead of (r != XLOG_BLCKSZ) to correctly detect errors that %m work correctly on. --- src/bin/pg_waldump/pg_waldump.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index a6251e1a96..2340dc247b 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -222,15 +222,12 @@ search_directory(const char *directory, const char *fname) WalSegSz), fname, WalSegSz); } + else if (r < 0) + fatal_error("could not read file \"%s\": %m", + fname); else - { - if (errno != 0) - fatal_error("could not read file \"%s\": %m", - fname); - else - fatal_error("could not read file \"%s\": read %d of %d", - fname, r, XLOG_BLCKSZ); - } + fatal_error("could not read file \"%s\": read %d of %d", + fname, r, XLOG_BLCKSZ); close(fd); return true; } -- 2.27.0
Hi, On 2022-02-14 18:18:47 +0900, Kyotaro Horiguchi wrote: > > If I give an empty file to the tool it complains as the follows. > > > > > pg_waldump: fatal: could not read file "hoge": No such file or directory > > > > No, the file exists. The cause is it reads uninitialized errno to > > detect errors from the system call. read(2) is defined to set errno > > always when it returns -1 and doesn't otherwise. Thus it seems to me > > that it is better to check that the return value is less than zero > > than to clear errno before the call to read(). > > So I post a patch contains only the indisputable part. Thanks for the report and fix. Pushed. This was surprisingly painful, all but one branch had conflicts... Greetings, Andres Freund
At Fri, 25 Feb 2022 10:48:47 -0800, Andres Freund <andres@anarazel.de> wrote in > Hi, > > On 2022-02-14 18:18:47 +0900, Kyotaro Horiguchi wrote: > > > If I give an empty file to the tool it complains as the follows. > > > > > > > pg_waldump: fatal: could not read file "hoge": No such file or directory > > > > > > No, the file exists. The cause is it reads uninitialized errno to > > > detect errors from the system call. read(2) is defined to set errno > > > always when it returns -1 and doesn't otherwise. Thus it seems to me > > > that it is better to check that the return value is less than zero > > > than to clear errno before the call to read(). > > > > So I post a patch contains only the indisputable part. > > Thanks for the report and fix. Pushed. This was surprisingly painful, all but > one branch had conflicts... Ah, I didn't expect that this is committed so quickly. I should have created patches for all versions. Anyway thanks for committing this! regards. -- Kyotaro Horiguchi NTT Open Source Software Center