Thread: pg_walinspect: ReadNextXLogRecord's first_record argument
Hi, I was looking at the code for pg_walinspect today and I think I may have found a bug (or else I'm confused about how this all works, which is also possible). ReadNextXLogRecord() takes an argument first_record of type XLogRecPtr which is used only for error reporting purposes: if we fail to read the next record for a reason other than end-of-WAL, we complain that we couldn't read the WAL at the LSN specified by first_record. ReadNextXLogRecord() has three callers. In pg_get_wal_record_info(), we're just reading a single record, and the LSN passed as first_record is the LSN at which that record starts. Cool. But in the other two callers, GetWALRecordsInfo() and GetWalStats(), we're reading multiple records, and first_record is always passed as the LSN of the first record. That's logical enough given the name of the argument, but the effect of it seems to be that an error while reading any of the records will be reported using the LSN of the first record, which does not seem right. By contrast, pg_rewind's extractPageMap() reports the error using xlogreader->EndRecPtr. I think that's correct. The toplevel xlogreader function that we're calling here is XLogReadRecord(), which in turn calls XLogNextRecord(), which has this comment: /* * state->EndRecPtr is expected to have been set by the last call to * XLogBeginRead() or XLogNextRecord(), and is the location of the * error. */ Thoughts? -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Aug 16, 2022 at 10:04 PM Robert Haas <robertmhaas@gmail.com> wrote: > > Hi, > > I was looking at the code for pg_walinspect today and I think I may > have found a bug (or else I'm confused about how this all works, which > is also possible). ReadNextXLogRecord() takes an argument first_record > of type XLogRecPtr which is used only for error reporting purposes: if > we fail to read the next record for a reason other than end-of-WAL, we > complain that we couldn't read the WAL at the LSN specified by > first_record. > > ReadNextXLogRecord() has three callers. In pg_get_wal_record_info(), > we're just reading a single record, and the LSN passed as first_record > is the LSN at which that record starts. Cool. But in the other two > callers, GetWALRecordsInfo() and GetWalStats(), we're reading multiple > records, and first_record is always passed as the LSN of the first > record. That's logical enough given the name of the argument, but the > effect of it seems to be that an error while reading any of the > records will be reported using the LSN of the first record, which does > not seem right. Indeed. Thanks a lot for finding it. > By contrast, pg_rewind's extractPageMap() reports the error using > xlogreader->EndRecPtr. I think that's correct. The toplevel xlogreader > function that we're calling here is XLogReadRecord(), which in turn > calls XLogNextRecord(), which has this comment: > > /* > * state->EndRecPtr is expected to have been set by the last call to > * XLogBeginRead() or XLogNextRecord(), and is the location of the > * error. > */ > > Thoughts? Agreed. Here's a patch (for V15 as well) fixing this bug, please review. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Attachment
On Wed, Aug 17, 2022 at 12:41 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > Agreed. > > Here's a patch (for V15 as well) fixing this bug, please review. Couldn't you simplify this further by removing the lsn argument from GetWALRecordInfo and using record->ReadRecPtr instead? Then InitXLogReaderState's second argument could be XLogRecPtr instead of XLogRecPtr *. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Aug 17, 2022 at 8:52 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Aug 17, 2022 at 12:41 AM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > Agreed. > > > > Here's a patch (for V15 as well) fixing this bug, please review. > > Couldn't you simplify this further by removing the lsn argument from > GetWALRecordInfo and using record->ReadRecPtr instead? Then > InitXLogReaderState's second argument could be XLogRecPtr instead of > XLogRecPtr *. Done. XLogFindNextRecord() stores the first valid record in EndRecPtr and the ReadRecPtr is set to InvalidXLogRecPtr by calling XLogBeginRead(). And XLogNextRecord() sets ReadRecPtr which we can use. PSA v2 patches. -- Bharath Rupireddy RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
Attachment
On Wed, Aug 17, 2022 at 12:29 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > PSA v2 patches. These look OK to me. -- Robert Haas EDB: http://www.enterprisedb.com