Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o - Mailing list pgsql-committers

From Andres Freund
Subject Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
Date
Msg-id 20210807210349.bby5ta2xrbnte6ht@alap3.anarazel.de
In response to Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-committers
Hi,

On 2021-08-07 15:12:38 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2021-08-07 13:37:16 -0400, Tom Lane wrote:
> >> Depends what you want to define as a bug.  What I am not happy about
> >> is the prospect of random assertion failures for the next six months
> >> while you finish redesigning half of the system.  The rest of us
> >> have work we want to get done, too.  I don't object to the idea of
> >> making no-lost-events an end goal, but we are clearly not ready
> >> for that today.
> 
> > I don't know what to do about that. How would we even find these cases if they
> > aren't hit during regression tests on my machine (nor on a lot of others)?
> 
> The regression tests really aren't that helpful for testing the problem
> scenario here, which basically is SIGTERM'ing a query-in-progress.
> I'm rather surprised that the buildfarm managed to exercise that at all.

They're also not that helpful because this problem is likely unreachable for
any tempfile other than the one in InitializeBackupManifest(). Nearly all,
possibly all, of the other tempfiles are cleaned up via transaction and/or
resowner cleanup.


I wonder if we should do something about WalSndResourceCleanup() not being
reached for FATALs? At the very least, a comment in WalSndResourceCleanup()
noting that fact seems like a good idea.

It seems like it could eventually be a problem that the resowners added in
0d8c9c1210c4 aren't ever cleaned up in case of a FATAL error. Most resowner
cleanup actions are also backstopped with some form of on-exit hook, but I
don't think all of them are; buffer pins, for example, aren't.

I guess I should start a thread about this on -hackers...


> You might try setting up a test scaffold that runs the core regression
> tests and SIGINT's the postmaster, or alternatively SIGTERM's some
> individual session, at random times partway through.  Obviously this
> will make the regression tests report failure, but what to look for
> is if anything dumps core on the way out.

Worth trying.
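
A rough sketch of what such a scaffold could look like, in Python and
POSIX-only. Everything here is an assumption made for illustration: the names
run_and_interrupt and classify_exit are hypothetical, and the child is a
generic stand-in command rather than a real regression run. An actual setup
would signal the postmaster or an individual backend instead of a direct child
process. The part that matters is the classification step: a shutdown caused
by the signal we sent is expected, while death by SIGSEGV or SIGABRT (an
assertion failure) is what to look for.

```python
import random
import signal
import subprocess
import sys
import time

# Signals that usually mean a crash (core dump or assertion failure),
# as opposed to a shutdown that was asked for.
CRASH_SIGNALS = {
    int(signal.SIGSEGV),
    int(signal.SIGABRT),
    int(signal.SIGBUS),
    int(signal.SIGILL),
    int(signal.SIGFPE),
}


def classify_exit(returncode):
    """Map a subprocess return code to 'clean', 'error', 'signaled' or 'crashed'.

    On POSIX, subprocess reports death by signal N as return code -N.
    """
    if returncode == 0:
        return "clean"
    if returncode > 0:
        return "error"
    return "crashed" if -returncode in CRASH_SIGNALS else "signaled"


def run_and_interrupt(cmd, sig=signal.SIGTERM, max_delay=1.0, rng=random):
    """Start cmd, send it sig after a random delay, and classify how it exited."""
    proc = subprocess.Popen(cmd)
    # Pick a random point partway through the run.
    time.sleep(rng.uniform(0.0, max_delay))
    if proc.poll() is None:
        # Still running, so interrupt it.
        proc.send_signal(sig)
    proc.wait(timeout=30)
    return classify_exit(proc.returncode)


if __name__ == "__main__":
    # Stand-in workload: a child that just sleeps for a while.
    child = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_and_interrupt(child, max_delay=0.5))
```

Run in a loop with different delays, only the "crashed" results would need a
closer look; "signaled" and "error" are the expected outcome of interrupting
work in progress.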

Greetings,

Andres Freund


