Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o - Mailing list pgsql-committers

From Andres Freund
Subject Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
Date
Msg-id 20210807044952.orwuf7hua2wytx4g@alap3.anarazel.de
Whole thread Raw
In response to Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o  (Andres Freund <andres@anarazel.de>)
List pgsql-committers
Hi,

On 2021-08-06 23:22:36 -0400, Tom Lane wrote:
> I wrote:
> > Guessing the common factor is "macOS", but that's just a guess.
> > I can poke into it tomorrow.
>
> I did try it real quick on my Mac laptop, and that fails too.
> Here's a more accurate backtrace, in case that helps.

Thanks!

I now managed to reproduce it on a CI system (after some initial
failed attempts due to pg_upgrade tests failing due to path length issues):

https://api.cirrus-ci.com/v1/artifact/task/4863568911794176/log/src/bin/pg_basebackup/tmp_check/log/010_pg_basebackup_main.log

2021-08-06 21:18:40.096 PDT [22031] 010_pg_basebackup.pl LOG:  received replication command: BASE_BACKUP LABEL
'pg_basebackupbase backup' PROGRESS   NOWAIT    MANIFEST 'yes' 
 
2021-08-06 21:18:40.096 PDT [22031] 010_pg_basebackup.pl STATEMENT:  BASE_BACKUP LABEL 'pg_basebackup base backup'
PROGRESS  NOWAIT    MANIFEST 'yes' 
 
2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl LOG:  could not send data to client: Broken pipe
2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl STATEMENT:  BASE_BACKUP LABEL 'pg_basebackup base backup'
PROGRESS  NOWAIT    MANIFEST 'yes' 
 
2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl FATAL:  connection to client lost
2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl STATEMENT:  BASE_BACKUP LABEL 'pg_basebackup base backup'
PROGRESS  NOWAIT    MANIFEST 'yes' 
 
TRAP: FailedAssertion("pgstat_is_initialized && !pgstat_is_shutdown", File: "pgstat.c", Line: 4810, PID: 22031)

vs what I locally get:
2021-08-06 20:52:06.163 PDT [1397252] 010_pg_basebackup.pl LOG:  received replication command: BASE_BACKUP LABEL
'pg_basebackupbase backup' PROGRESS   NOWAIT    MANIFEST 'yes' 
 
2021-08-06 20:52:06.163 PDT [1397252] 010_pg_basebackup.pl STATEMENT:  BASE_BACKUP LABEL 'pg_basebackup base backup'
PROGRESS  NOWAIT    MANIFEST 'yes' 
 
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl LOG:  could not send data to client: Broken pipe
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl STATEMENT:  BASE_BACKUP LABEL 'pg_basebackup base backup'
PROGRESS  NOWAIT    MANIFEST 'yes' 
 
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl ERROR:  base backup could not send data, aborting backup
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl STATEMENT:  BASE_BACKUP LABEL 'pg_basebackup base backup'
PROGRESS  NOWAIT    MANIFEST 'yes' 
 
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl FATAL:  connection to client lost

Note that OSX version doesn't have the "base backup could not send data,
aborting backup" message. I guess that is what leads to the difference in
behaviour:

The temp file is created by InitializeBackupManifest(). In the !OSX case, we
first abort via an ERROR, which triggers the cleanup via
WalSndResourceCleanup(). On OSX however, we immediately error out with FATAL
for some reason (timing? network buffering differences?), which will never
reach WalSndErrorCleanup(). Therefore the temp file only gets deleted during
proc_exit(), which triggers the issue...

Not yet really sure what the best way to deal with this is. Presumably this
issue would be fixed if AtProcExit_Files()/CleanupTempFiles() were scheduled
via before_shmem_exit(). And perhaps it's not too off to schedule
CleanupTempFiles() there - but it doesn't quite seem entirely right either.

I'd kinda like to avoid having to overhaul the process exit infrastructure as
a prerequisite to getting the shared memory stats patch in :(.

Greetings,

Andres Freund



pgsql-committers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
Next
From: Andres Freund
Date:
Subject: Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o