Re: Two issues leading to discrepancies in FSM data on the standby server - Mailing list pgsql-hackers
| From | Melanie Plageman |
|---|---|
| Subject | Re: Two issues leading to discrepancies in FSM data on the standby server |
| Date | |
| Msg-id | CAAKRu_ZNAsUv9TPJ0bxxfTj-KyTMVidqWUhsfjKDmUrA4DRSkQ@mail.gmail.com Whole thread |
| In response to | Re: Two issues leading to discrepancies in FSM data on the standby server (Alexander Korotkov <aekorotkov@gmail.com>) |
| Responses |
Re: Two issues leading to discrepancies in FSM data on the standby server
|
| List | pgsql-hackers |
On Wed, Apr 22, 2026 at 8:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > On Tue, Apr 21, 2026 at 5:42 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > > > I also think that usage of MarkBufferDirty() here is safe. If I > > > understood correctly. > > > 1) When wal_log_hints = on, should be completely safe. Even if we > > > have torn page after the crash, during recovery FPI from the primary > > > should come first. > > > > I think FPIs from primary don't really matter here, since we are only > > talking about MarkBufferDirty() in XLogRecordPageWithFreespace(). If > > we change it to MarkBufferDirty() on the standby and the machine > > crashes mid-write leading to a checksum error, we'll just zero it out > > -- which is really your point 2. While FPIs from the primary will > > overwrite the standby's FSM page, they don't provide torn-page > > protection for modifications made by the standby as you could read the > > page between the torn write and replaying any FPI from the primary. > > It's probably not so important in this context, but I'd like to verify > my thoughts further. My idea is that standby's changes of FSM are > mirroring primary's changes of FSM, even that FSM changes don't have > own WAL-records and being decoded from other WAL records. Thus, if > some FSM page on primary gets changed then primary emits FPI for the > first change after checkpoint. The standby restartpoints are > synchronized with primary's checkpoints, and the FSM changes mirrors > FSM changes on primary. Standby should also have its first change of > FSM page after the restartpoint covered by FPI received from primary. > So, the consistency of FSM pages should be guaranteed in the similar > way to every other WAL-logged pages, except FSM pages are not directly > WAL-logged, but got their changes decoded from main fork WAL-records. > > The weak point I see in the reasoning above is the assumption that FSM > changes on standby fully mirrors FSM changes on primary. I didn't > really check this invariant. But other than that, could you please, > re-check this thoughts and let me know what do you think? Yea, I don't think they are really being mirrored. The conditions for updating the FSM in normal operation vs in recovery are different. When doing an INSERT, we can look for freespace on heap page 5, find it doesn't have enough and then update the freespace map to reflect that. Then, let's say we eventually find space for the INSERTed tuple on heap page 10000. In normal operation, we update the FSM on rejection, not when we find freespace. So we do not update the FSM for heap page 10000 to reflect how much freespace it now has remaining. Later, when replaying the INSERT, if the heuristic conditions are met (action == BLK_NEEDS_REDO && freespace < BLCKSZ / 5) for freespace on heap block 10000, we may then update the FSM for page 10000. The local changes to the FSM page don't mark the buffer dirty and thus aren't safe from loss if the page is evicted and definitely not safe from being wiped out by a future FPI since they don't update the FSM page LSN. > > It is a bug that is causing overly optimistic FSM > > numbers, but it's not a correctness issue like wrong results/data > > corruption etc. So, I think you could make an argument either way > > about fixing it. > > It has user-visible effect of increased insertion time after replica > promotion. I think this is quite a reason to backpatch. Right, since it's not invasive, I'm fine with backpatching. > > I don't know how much of Alexey's reported issue was this vs > > PageGetFreeSpace() in heap_xlog_visible(). The MarkBufferDirty() > > change is easy to fix, so it probably makes sense to do so. I haven't > > investigated more about the PageGetFreeSpace() issue. > > Makes sense. I suggest Alexey could clarify this. Would be good also if he could add a 19 open item for the PageGetFreeSpace() thing (in the issues affecting back branches section) or register a patch for next commitfest (as a bug), so we don't forget about it. - Melanie
pgsql-hackers by date: