Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o - Mailing list pgsql-committers

From Andres Freund
Subject Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
Date
Msg-id 20210807172556.rda5m7jjhk7ktukl@alap3.anarazel.de
Whole thread Raw
In response to Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-committers
Hi,

On 2021-08-07 13:06:47 -0400, Tom Lane wrote:
> The bigger picture here is that anyplace anybody ever wants to add stats
> collection in suddenly becomes a "must run before shmem disconnection"
> module.  I think that way madness lies --- we can't have the entire
> backend shut down before shmem disconnection.  So I feel like there's
> a serious design problem here, and it's not confined to temp files,
> or at least it won't stay confined to temp files.  (There may indeed
> be other problems already, that we just haven't had the good luck to
> have buildfarm timing vagaries expose to us.)

I think more often it should not end up as "must run before shmem
disconnection" as a whole, but should be split into a portion running at that
point.


> Maybe the solution is to acknowledge that we might lose some events
> during backend shutdown, and redefine the behavior as "we ignore
> event reports after pgstat shutdown", not "we assert that there never
> can be any such reports".

I think it's fine to make such calls, but that it ought to reside in the stats
emitting modules. Only it can decide whether needing to emit stats during
shutdown is a rare edge case or a commonly expected path. E.g. the case of
parallel worker shutdown sending stats too late is something common enough to
be problematic, so I don't want to make it too hard to detect such cases.


> > I'm somewhat inclined to split InitFileAccess() into two by separating out
> > InitTemporaryFileAccess() or such. InitFileAccess() would continue to happen
> > early and register a proc exit hook that errors out when there's a temp file
> > (as a backstop for non cassert builds). The new InitTemporaryFileAccess()
> > would happen a bit later and schedule CleanupTempFiles() to happen via
> > before_shmem_access(). And we add a Assert(!proc_exit_inprogress) to the
> > routines for opening a temp file.
>
> Maybe that would work, but after you multiply it by a bunch of different
> scenarios in different modules, it's going to get less and less attractive.

I'm not quite convinced. Even if we had a nicer ordering / layering of
subsystems, we'd still have to deal with subsystems that don't fit nicely
because they have conflicting needs. And we'd still need detection of use of
subsystems that already have not been initialized / are already shut down.

One example of that is imo fd.c being a very low level module that might be
needed during shutdown of other modules but which also currently depends on
the stats subsystem for temp file management. Which doesn't really make sense,
because stats very well could depend on fd.c routines.

I think needing to split the current fd.c subsystem into a lower-level (file
access) and a higher level (temporary file management) module is precisely
what a better designed module layering system will *force* us to do.


> > I've bitten by all this often enough to be motivated to propose
> > something. However I want to get the basics of the shared memory stats stuff
> > in first - it's a pain to keep it upated, and we'll need to find and solve all
> > of the issues it has anyway, even if we go for a redesign of module / startup
> > / shutdown layering.
>
> Fair.  But I suggest that the first cut should look more like what
> I suggest above, ie just be willing to lose events during shutdown.
> The downsides of that are not so enormous that we should be willing
> to undertake major klugery to avoid it before we've even got a
> semi-working system.

I think that's more likely to hide bugs unfortunately. Consider fa91d4c91f2 -
I might not have found that if we had just ignored "too late" pgstats activity
in pgstats.c or fd.c, and that's not an edge case.

Greetings,

Andres Freund



pgsql-committers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
Next
From: Tom Lane
Date:
Subject: pgsql: Really fix the ambiguity in REFRESH MATERIALIZED VIEW CONCURRENT