Re: Why is lorikeet so unstable in v14 branch only? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Why is lorikeet so unstable in v14 branch only?
Msg-id 174838.1648332620@sss.pgh.pa.us
In response to Re: Why is lorikeet so unstable in v14 branch only?  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Why is lorikeet so unstable in v14 branch only?
List pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> writes:
> Yes it seems like a bug, but hard to diagnose. It seemed like a bug back
> in May:  see
> <https://postgr.es/m/4baee39d-0ebe-8327-7878-5bc11c95effa@dunslane.net>

Ah, right, but that link is busted.  Here's the correct link:

https://www.postgresql.org/message-id/flat/e6f1fb3e-1e08-0188-9c71-2b5b894571de%40dunslane.net

> I vaguely theorize about a buffer overrun somewhere that scribbles on
> the stack.

I said in the earlier thread:

> A platform-specific problem in get_ps_display() seems plausible
> enough.  The apparent connection to a concurrent VACUUM FULL seems
> pretty hard to explain that way ... but maybe that's a mirage.

but your one stack trace showed a crash while trying to lock pg_class for
ScanPgRelation, which would potentially have blocked because of the VACUUM,
and blocking would result in a process title change unless
update_process_title is disabled.  So now I feel like "something rotten in
ps_status.c" is a theory that can fit the available facts.
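A minimal C sketch of the mechanism described above, assuming a simplified fixed-size title buffer. When a backend blocks on a lock and process-title updates are enabled, the current title is fetched and rewritten with " waiting" appended. The names sketch_set_ps_display, sketch_get_ps_display, sketch_mark_waiting and PS_BUF_SIZE are hypothetical; this is not the ps_status.c or lock.c code. It only shows where a wrong length reported by the getter would let the copy scribble past the end of a buffer, which is the kind of overrun theorized above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the process-title buffer; the real buffer
 * and its size are managed inside ps_status.c. */
#define PS_BUF_SIZE 64

static char ps_buffer[PS_BUF_SIZE];
static size_t ps_len;

/* Hypothetical simplified setter: copy the title, truncating to fit. */
static void
sketch_set_ps_display(const char *activity)
{
    size_t n = strlen(activity);

    if (n >= PS_BUF_SIZE)
        n = PS_BUF_SIZE - 1;
    memcpy(ps_buffer, activity, n);
    ps_buffer[n] = '\0';
    ps_len = n;
}

/* Hypothetical simplified getter: returns the title plus its length.
 * Callers trust this length when sizing their copies. */
static const char *
sketch_get_ps_display(size_t *displen)
{
    assert(displen != NULL);
    *displen = ps_len;
    return ps_buffer;
}

/* Append " waiting" to the current title, as a lock wait does when
 * title updates are enabled.  The copy is sized from the reported
 * length, so if that length were wrong, memcpy could overrun scratch;
 * this sketch checks the bound and refuses instead. */
static int
sketch_mark_waiting(char *scratch, size_t scratch_size)
{
    static const char suffix[] = " waiting";
    size_t len;
    const char *old = sketch_get_ps_display(&len);

    if (len + sizeof(suffix) > scratch_size)
        return -1;              /* would not fit: refuse rather than overrun */
    memcpy(scratch, old, len);
    memcpy(scratch + len, suffix, sizeof(suffix));  /* copies the '\0' too */
    sketch_set_ps_display(scratch);
    return 0;
}
```

Usage: after sketch_set_ps_display("postgres: SELECT"), calling sketch_mark_waiting with a 64-byte scratch buffer leaves the title as "postgres: SELECT waiting", while an 8-byte scratch buffer is rejected with -1 instead of being overrun.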

> If I understand correctly that you're only seeing this in v13 and
> HEAD, then it seems like bf68b79e5 (Refactor ps_status.c API)
> deserves a hard look.

I still stand by this opinion.  Can you verify which of the ps_status.c
code paths gets used on this build?
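As a hypothetical first step toward answering that question, a tiny probe compiled with the same toolchain can report which platform macros are defined. This is not ps_status.c's actual selection logic, which also depends on configure results such as whether setproctitle() is available; platform_probe and report_platform are made-up names for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical probe: name the first of a few platform macros that this
 * compiler defines.  Not a copy of the PS_USE_* selection in ps_status.c. */
static const char *
platform_probe(void)
{
#if defined(__CYGWIN__)
    return "__CYGWIN__";    /* checked first so a Cygwin build is reported as such */
#elif defined(WIN32) || defined(_WIN32)
    return "WIN32";
#elif defined(__linux__)
    return "__linux__";
#elif defined(__APPLE__)
    return "__APPLE__";
#else
    return "other";
#endif
}

/* Print the probe result so it can be compared against the build logs. */
static void
report_platform(void)
{
    const char *name = platform_probe();

    assert(name != NULL);
    printf("platform macro: %s\n", name);
}
```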

            regards, tom lane


