Re: detailed error message of pg_waldump - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: detailed error message of pg_waldump
Msg-id CAD21AoDRP2KpQ1BuOG7BjgZhMX+7ksTghp9TdNjM_jU25QgXEA@mail.gmail.com
In response to Re: detailed error message of pg_waldump  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
On Wed, Jun 16, 2021 at 5:36 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> Thanks!
>
> At Wed, 16 Jun 2021 16:52:11 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
> > On Fri, Jun 4, 2021 at 5:35 PM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > >
> > > In a very common operation of accidentally specifying a recycled
> > > segment, pg_waldump often returns the following obscure message.
> > >
> > > $ pg_waldump 00000001000000000000002D
> > > pg_waldump: fatal: could not find a valid record after 0/2D000000
> > >
> > > The more detailed message is generated internally and we can use it.
> > > That looks like the following.
> > >
> > > $ pg_waldump 00000001000000000000002D
> > > pg_waldump: fatal: unexpected pageaddr 0/24000000 in log segment 00000001000000000000002D, offset 0
> > >
> > > Is it worth doing?
> >
> > Perhaps we need both? The current message describes where the error
> > happened and the message internally generated describes the details.
> > It seems to me that both are useful. For example, if we find an error
> > during XLogReadRecord(), we show both as follows:
> >
> >    if (errormsg)
> >        fatal_error("error in WAL record at %X/%X: %s",
> >                    LSN_FORMAT_ARGS(xlogreader_state->ReadRecPtr),
> >                    errormsg);
>
> Yeah, I thought that it might be a bit verbose and lengthy, but actually
> we have another place that does that. One more point is whether we
> have a case where first_record is invalid but errormsg is NULL
> there. WALDumpReadPage immediately exits so we should always have a
> message in that case according to the comment in ReadRecord.
>
> > * We only end up here without a message when XLogPageRead()
> > * failed - in that case we already logged something. In
> > * StandbyMode that only happens if we have been triggered, so we
> > * shouldn't loop anymore in that case.
>
> So that can be an assertion.
>
> Now the message looks like this.
>
> $ pg_waldump /home/horiguti/data/data_work/pg_wal/000000020000000000000010
> pg_waldump: fatal: could not find a valid record after 0/0: unexpected pageaddr 0/9000000 in log segment 000000020000000000000010, offset 0
>

Thank you for updating the patch!

+ *
+ * The returned pointer (or *errormsg) points to an internal buffer that's
+ * valid until the next call to XLogFindNextRecord or XLogReadRecord.
  */

The comment on XLogReadRecord() also has a similar description. Should
we update it as well?
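
To illustrate why the lifetime note in that comment matters, here is a minimal standalone sketch. It is not PostgreSQL code: `Reader`, `reader_find_next()`, `reader_read()`, and `save_errormsg()` are hypothetical stand-ins, and the message strings are made up. It only assumes what the comment says, namely that both calls report errors through one internal buffer, so a caller that wants to keep the first message past the next call has to copy it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a reader state that owns one error buffer. */
typedef struct Reader
{
	char		errormsg_buf[128];
} Reader;

/*
 * Hypothetical stand-in for a "find next record" call.  It always fails
 * and reports the reason through the reader's internal buffer.
 */
static int
reader_find_next(Reader *r, const char **errormsg)
{
	snprintf(r->errormsg_buf, sizeof(r->errormsg_buf),
			 "unexpected pageaddr in log segment, offset 0");
	*errormsg = r->errormsg_buf;
	return 0;					/* 0 means no valid record was found */
}

/*
 * Hypothetical stand-in for a "read record" call.  It reuses the same
 * buffer, so it overwrites whatever the previous call reported.
 */
static int
reader_read(Reader *r, const char **errormsg)
{
	snprintf(r->errormsg_buf, sizeof(r->errormsg_buf),
			 "invalid record length");
	*errormsg = r->errormsg_buf;
	return 0;
}

/*
 * Copy a message that must outlive the next reader call.  The assertion
 * mirrors the point made upthread that a failure should always come with
 * a message.  The caller frees the result.
 */
static char *
save_errormsg(const char *errormsg)
{
	size_t		len;
	char	   *copy;

	assert(errormsg != NULL);
	len = strlen(errormsg) + 1;
	copy = malloc(len);
	if (copy != NULL)
		memcpy(copy, errormsg, len);
	return copy;
}
```

A caller that calls `reader_find_next()` and then `reader_read()` gets the same pointer back from both, and it now holds the second message; only a copy made with something like `save_errormsg()` in between still holds the first one.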

BTW is this patch registered to the current commitfest? I could not find it.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/