Hi,
On 2022-03-10 19:07:21 -0800, Andres Freund wrote:
> This suggests that we should assert that we have a suitable snapshot in
> pg_detoast_datum*, so that we don't need to have an externally toasted datum
> to notice such bugs. The reason I put it in init_toast_snapshot() was that the
> bug leading to the assert was from toast_delete_datum(). But even there it'd
> be better to put the assert to before the early return?
I added an assert like that, but it'd be a lot of stuff to clean up to make it
work :(. There's some easy to ignore cases like bootstrap (safe),
PostgresInit() (probably not). But after that there's just countless unsafe
accesses. Including the one Peter just mentioned in do_autovacuum(). And of
course the one triggering this report.
In a lot of places we fetch a tuple (using a snapshot of course) and then
later after the snapshot is released, take actions that cause datums to be
detoasted (if they were toasted).  Afaict that's plainly not safe - nothing
prevents the toasted datum to be pruned away and replaced with something else,
potentially of a different datatype.
It kinda works for relations using catalog snapshots, because we do hold the
catalog snapshot for longer - but only until the next invalidations are
processed.
I wonder what a debugging mode forcing everything toastable to be toasted
would bring to light *shudder*.
Greetings,
Andres Freund