Re: BUG #18241: PushTransaction may cause Standby to execute ItemIdMarkDead - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #18241: PushTransaction may cause Standby to execute ItemIdMarkDead
Date
Msg-id ZXm8AGdeOR5feHDo@paquier.xyz
Whole thread Raw
In response to Re: BUG #18241: PushTransaction may cause Standby to execute ItemIdMarkDead  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-bugs
On Tue, Dec 12, 2023 at 06:11:44PM +0100, Alvaro Herrera wrote:
> What a horrible bug.  Thanks for pushing the fix.  It looks OK to me:
> nowhere else do we create a TransactionStateData that would need the
> initialization.

Yep.

> I wonder if this can cause data corruption, if you happen to have a
> broken standby that improperly kills some index entry and is later
> promoted.  Maybe this bug explains these mysterious cases where an index
> seems to lack a pointer to some heap tuple for no known reason --
> especially notorious when you try to REINDEX a unique index and that
> turns out not to work due to duplicates.

AFAIK, this only impacts the visibility of the data retrieved in
transactions began when recovery was still running when doing an index
scan, and, even if the HOT chains would get inconsistent, we'd still
clean up them one the xmin gets updated on the standby because there
are no snapshots requiring it.  REINDEX after promotion does not looks
like an issue to me as we'd take a lock or wait for the necessary
snapshots to be gone in the concurrent case.  Or perhaps my
imagination with the HOT chains is too limited?

> I would be happier if the order of initialization of the struct member
> followed the order to the struct, with comments to note the missing
> members.  That might make it more visible to future patches that would
> add more members.  Something like this:

That crossed my mind, but it is not like we have other code paths
where we assume some fields to be initialized based on a palloc0() or
similar, so I don't see why we'd need that for this case.

> Also, the way this function checks for overflow is strange.  Why
> increment then check, leading to a forced free and decrement, instead of
> just testing and throwing error right away?  It's less code:

Saying that, agreed that this suggestion is an improvement compared to
what's on HEAD.  So +1 from me to change this code as you are
suggesting.
--
Michael

Attachment

pgsql-bugs by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Is it possible to add support for PostgreSQL-15 and newer versions in omnipitr?
Next
From: Michael Paquier
Date:
Subject: Re: BUG #18246: pgstathashindex() attempts to read invalid file for hash index attached to partitioned table