Re: Bug in MultiXact replay compat logic for older minor version after crash-recovery - Mailing list pgsql-hackers
| From | Kirill Reshke |
|---|---|
| Subject | Re: Bug in MultiXact replay compat logic for older minor version after crash-recovery |
| Date | |
| Msg-id | CALdSSPh2tfxcZOgdeGdjBwdnUBEY+8iG7unuJMEoruWToUMYwA@mail.gmail.com Whole thread Raw |
| In response to | Re: Bug in MultiXact replay compat logic for older minor version after crash-recovery (Kirill Reshke <reshkekirill@gmail.com>) |
| List | pgsql-hackers |
On Sun, 22 Mar 2026 at 19:15, Kirill Reshke <reshkekirill@gmail.com> wrote: > > Hi! > I can see that the back-branches commit was included into master [0]. > I think this is good. > > On Sun, 22 Mar 2026 at 16:10, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > > > On 20/03/2026 19:05, Andrey Borodin wrote: > > >> On 20 Mar 2026, at 18:14, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > >> > > >> Zeroing the page again is dangerous because the CREATE_ID records can be out of order. The page might already containsome later multixids, and zeroing will overwrite them. > > > > > > I see only cases when it's not a problem: we zeroed page, did not flush it, thus did not extend the file, crashed,tested FS, zeroed page once more, overwrote again by replaying WAL, no big deal. > > > We should never zero a page with offsets, that will not be replayed by WAL. > > > > I think we're in agreement, but I want to verify because this is > > important to get right. I was replying to this: > > > > > If we are sure buffers have no this page we can detect it via FS. > > > Otherwise... nothing bad can happen, actually. We might get false positive and zero the page once more. > > > > My point is that if we rely on SimpleLruDoesPhysicalPageExist(), and it > > ever returns false even though we had already initialized the page, you > > can lose data. It's *not* ok to zero a page again that was zeroed > > earlier already, because we might have already written some real data on it. > > +1. Even if we manage to compose a "fix" that zeroes a page more than > once, this "fix" will be non-future-profing and we will corrupt the > database if anything goes even slightly wrong. > > > Let's consider this wal stream, generated with old minor version: > > > > ZERO_PAGE:2048 -> CREATE_ID:2048 -> CREATE_ID:2049 -> CREATE_ID:2047 > > > > 2048 is the first multixid on the page. When WAL replay gets to the > > CREATE_ID:2047 record, it will enter the backwards-compatibility > > codepath and needs to determine if the page containing the next mxid > > (2048) already exists. > > > > In this WAL sequence, the page already exist because the ZERO_PAGE > > record was replayed earlier. But if we just call > > SimpleLruDoesPhysicalPageExist(), it will return 'false' because the > > page was not flushed to disk yet. If we believe that and zero the page > > again, we will lose data (the offset for mxid 2049). > > > > The opposite cannot happen: if SimpleLruDoesPhysicalPageExist() returns > > true, then it does really exist. > > > > So indeed we can only trust SimpleLruDoesPhysicalPageExist() if we are > > sure that the page is not sitting in the buffers. > > +1 > > > Attached is a new version. I updated the comment to explain that. > > > > I also added another safety measure: before calling > > SimpleLruDoesPhysicalPageExist(), flush all the SLRU buffers. That way, > > SimpleLruDoesPhysicalPageExist() should definitely return the correct > > answer. That shouldn't be necessary because the check with > > last_initialized_offsets_page should cover all the cases where a page > > that extended the file is sitting in the buffers, but better safe than > > sorry. > > > > - Heikki > > I played with v2 and was unable to fool it into corrupting db. So v2 > looks good to me. > > > [0] https://git.postgresql.org/cgit/postgresql.git/commit/?id=516310ed4dba89bd300242df0d56b4782f33ed4d > > -- > Best regards, > Kirill Reshke Also, in commit message: > the backwards compatibility logic to tolerate WAL generated by older minor versions Let's define older as pre-789d65364c to be exact? -- Best regards, Kirill Reshke
pgsql-hackers by date: