Re: Crash in new pgstats code - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: Crash in new pgstats code
Date	April 16, 2022 21:36:33
Msg-id	20220416213633.4gfzputl3wbla55p@alap3.anarazel.de
In response to	Re: Crash in new pgstats code (Andres Freund <andres@anarazel.de>)
Responses	Re: Crash in new pgstats code
List	pgsql-hackers

Hi,

On 2022-04-16 12:13:09 -0700, Andres Freund wrote:
> What confuses me so far is what already had generated stats before
> reaching pgstat_reset_after_failure() (so that the bug could even be hit
> in t/025_stuck_on_old_timeline.pl).

I see part of a problem - in archiver stats. Even in 14 (and presumably
before), we do work that can generate archiver stats
(e.g. ReadCheckpointRecord()) before pgstat_reset_all().  It's not the
end of the world, but doesn't seem great.

But since archiver stats are fixed-numbered stats (and thus not in the
hash table), they'd not trigger the backtrace we saw here.

One thing that's interesting is that the failing tests have:
2022-04-15 12:07:48.828 UTC [675922][walreceiver][:0] FATAL:  could not link file "pg_wal/xlogtemp.675922" to "pg_wal/00000002.history": File exists

which I haven't seen locally. Looks like we have some race between the
startup process and walreceiver? That seems not great.  I'm a bit
confused that walreceiver and archiving are both active at the same time
in the first place - that doesn't seem right as things are set up
currently.
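
To make the suspected failure mode concrete, here is a minimal sketch of
how two writers can collide when each installs the same file via a
temporary file plus a hard link. It is not PostgreSQL's code: the
function names, the writer ids, and the idea that both processes try to
install the same history file are assumptions for illustration only. The
one thing taken from the log above is that a link onto an existing name
fails with "File exists".

```python
import os
import tempfile


def install_history_file(wal_dir, writer_id, target_name, payload):
    """Write payload to a per-writer temp file, then hard-link it into place.

    Hypothetical helper: returns True if this writer installed the target,
    False if the target name already existed (EEXIST).
    """
    tmp_path = os.path.join(wal_dir, f"xlogtemp.{writer_id}")
    target_path = os.path.join(wal_dir, target_name)

    with open(tmp_path, "wb") as f:
        f.write(payload)

    try:
        # A hard link never overwrites: it fails if target_path exists.
        os.link(tmp_path, target_path)
        return True
    except FileExistsError:
        # The other writer won the race; a FATAL here would match the log.
        return False
    finally:
        # The temp name is no longer needed either way.
        os.unlink(tmp_path)


def demo():
    """Run two hypothetical writers back to back against one directory."""
    with tempfile.TemporaryDirectory() as wal_dir:
        first = install_history_file(wal_dir, 1, "00000002.history", b"a")
        second = install_history_file(wal_dir, 2, "00000002.history", b"b")
        return first, second
```

Calling demo() runs the two writers sequentially, which is enough to show
that whichever one comes second sees the error; in a real race the order
simply depends on scheduling.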

Greetings,

Andres Freund
