Home > mailing lists

Re: Two issues leading to discrepancies in FSM data on the standby server - Mailing list pgsql-hackers

From	Melanie Plageman
Subject	Re: Two issues leading to discrepancies in FSM data on the standby server
Date	April 22 17:15:23
Msg-id	CAAKRu_ZNAsUv9TPJ0bxxfTj-KyTMVidqWUhsfjKDmUrA4DRSkQ@mail.gmail.com Whole thread
In response to	Re: Two issues leading to discrepancies in FSM data on the standby server (Alexander Korotkov <aekorotkov@gmail.com>)
Responses	Re: Two issues leading to discrepancies in FSM data on the standby server
List	pgsql-hackers

Tree view

On Wed, Apr 22, 2026 at 8:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Tue, Apr 21, 2026 at 5:42 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
>
> > > I also think that usage of MarkBufferDirty() here is safe.  If I
> > > understood correctly.
> > > 1) When wal_log_hints = on, should be completely safe.  Even if we
> > > have torn page after the crash, during recovery FPI from the primary
> > > should come first.
> >
> > I think FPIs from primary don't really matter here, since we are only
> > talking about MarkBufferDirty() in XLogRecordPageWithFreespace(). If
> > we change it to MarkBufferDirty() on the standby and the machine
> > crashes mid-write leading to a checksum error, we'll just zero it out
> > -- which is really your point 2. While FPIs from the primary will
> > overwrite the standby's FSM page, they don't provide torn-page
> > protection for modifications made by the standby as you could read the
> > page between the torn write and replaying any FPI from the primary.
>
> It's probably not so important in this context, but I'd like to verify
> my thoughts further.  My idea is that standby's changes of FSM are
> mirroring primary's changes of FSM, even that FSM changes don't have
> own WAL-records and being decoded from other WAL records.  Thus, if
> some FSM page on primary gets changed then primary emits FPI for the
> first change after checkpoint.  The standby restartpoints are
> synchronized with primary's checkpoints, and the FSM changes mirrors
> FSM changes on primary.  Standby should also have its first change of
> FSM page after the restartpoint covered by FPI received from primary.
> So, the consistency of FSM pages should be guaranteed in the similar
> way to every other WAL-logged pages, except FSM pages are not directly
> WAL-logged, but got their changes decoded from main fork WAL-records.
>
> The weak point I see in the reasoning above is the assumption that FSM
> changes on standby fully mirrors FSM changes on primary.  I didn't
> really check this invariant.  But other than that, could you please,
> re-check this thoughts and let me know what do you think?

Yea, I don't think they are really being mirrored. The conditions for
updating the FSM in normal operation vs in recovery are different.
When doing an INSERT, we can look for freespace on heap page 5, find
it doesn't have enough and then update the freespace map to reflect
that. Then, let's say we eventually find space for the INSERTed tuple
on heap page 10000. In normal operation, we update the FSM on
rejection, not when we find freespace. So we do not update the FSM for
heap page 10000 to reflect how much freespace it now has remaining.
Later, when replaying the INSERT, if the heuristic conditions are met
(action == BLK_NEEDS_REDO && freespace < BLCKSZ / 5) for freespace on
heap block 10000, we may then update the FSM for page 10000. The local
changes to the FSM page don't mark the buffer dirty and thus aren't
safe from loss if the page is evicted and definitely not safe from
being wiped out by a future FPI since they don't update the FSM page
LSN.

> > It is a bug that is causing overly optimistic FSM
> > numbers, but it's not a correctness issue like wrong results/data
> > corruption etc. So, I think you could make an argument either way
> > about fixing it.
>
> It has user-visible effect of increased insertion time after replica
> promotion.  I think this is quite a reason to backpatch.

Right, since it's not invasive, I'm fine with backpatching.

> > I don't know how much of Alexey's reported issue was this vs
> > PageGetFreeSpace() in heap_xlog_visible(). The MarkBufferDirty()
> > change is easy to fix, so it probably makes sense to do so. I haven't
> > investigated more about the PageGetFreeSpace() issue.
>
> Makes sense.  I suggest Alexey could clarify this.

Would be good also if he could add a 19 open item for the
PageGetFreeSpace() thing (in the issues affecting back branches
section) or register a patch for next commitfest (as a bug), so we
don't forget about it.

- Melanie

pgsql-hackers by date:

From: Yura Sokolov
Date: 22 April, 16:13:21
Subject: Re: PoC: Add condition variable support to WaitEventSetWait()

From: Fabrízio de Royes Mello
Date: 22 April, 17:21:56
Subject: Re: RFC: pg_stat_logmsg

Re: Two issues leading to discrepancies in FSM data on the standby server - Mailing list pgsql-hackers

Previous

Next