Re: BUG #18055: logical decoding core on AllocateSnapshotBuilder() - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #18055: logical decoding core on AllocateSnapshotBuilder()
Date
Msg-id 20230820210545.plmlk3rnxxgkokkj@awork3.anarazel.de
Whole thread Raw
In response to RE: BUG #18055: logical decoding core on AllocateSnapshotBuilder()  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses Re: BUG #18055: logical decoding core on AllocateSnapshotBuilder()  (Masahiko Sawada <sawada.mshk@gmail.com>)
Re: BUG #18055: logical decoding core on AllocateSnapshotBuilder()  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-bugs
Hi,

On 2023-08-18 04:21:53 +0000, Zhijie Hou (Fujitsu) wrote:
> From ee1dfccc0306812c356c84bbd7e2558f27d7d348 Mon Sep 17 00:00:00 2001
> From: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
> Date: Thu, 17 Aug 2023 19:29:34 +0800
> Subject: [PATCH v4] cleanup decoding context in error cases
>
> Some of the management functions for logical decoding didn't clean up the
> decoding context when an error occurs during decoding. This can
> result in accessing unfreed resources and assert failure the next time we call
> these functions. Fix it by cleaning up the decoding context and slots in error
> cases.

I don't think this is the right fix - at all. We shouldn't run arbitrary code
after a failure, which we do by calling the shutdown callback. It's not going
to expect to be called in a context where you're not allowed to do very much.

Consider what would happen if FreeDecodingContext() or such failed - we'd not
have executed InvalidateSystemCaches(), and therefore would continue with
corrupted system caches.

There's also no need to free memory here, that's already done via the memory
context cleanup of the "surrounding" memory context (i.e. CurrentMemoryContext
when StartupDecodingContext() is called)


I think the real problem is that 272248a0c added InitialRunningXacts a global
variable. If it just lived in in struct SnapBuild, this whole thing wouldn't
be an issue? The commit justified this choice with avoiding an ABI breakage -
but isn't that bogus? The struct is private to snapbuild.c. It doesn't live on
disk (that's SnapBuildOnDisk). There's no ABI to preserve?

Greetings,

Andres Freund



pgsql-bugs by date:

Previous
From: David Rowley
Date:
Subject: Re: BUG #18060: Left joining rows using random() function in join condition doesn't work as expected.
Next
From: James Inform
Date:
Subject: Re: BUG #18060: Left joining rows using random() function in join condition doesn't work as expected.