Re: Add last failed connection error message to pg_stat_wal_receiver - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Add last failed connection error message to pg_stat_wal_receiver
Date
Msg-id CALj2ACWd8RH5wNZScT5Wfyre918+GU7h-bVZF1XHm9NmgqnzBw@mail.gmail.com
Whole thread Raw
In response to Re: Add last failed connection error message to pg_stat_wal_receiver  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Thu, Aug 18, 2022 at 9:01 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> PANIC is not something you'd care about as the system would go down as
> and shared memory would be reset (right?) even if restart_on_crash is
> enabled.  Perhaps it would help here to use something like a macro to
> catch and save the error, in a style similar to what's in hba.c for
> example, which is the closest example I can think of, even if on ERROR
> we don't really care about the error string anyway as there is nothing
> to report back to the SQL views used for the HBA/ident files.
>
> FATAL may prove to be tricky though, because I'd expect the error to
> be saved in shared memory in this case.  This is particularly critical
> as this takes the WAL receiver process down, actually.

Hm, we can use error callbacks or pg try/catch blocks to save the
error message into walreceiver shared memory.

> Anyway, outside the potential scope of the proposal, there are more
> things that I find strange with the code:
> - Why isn't the string reset when the WAL receiver is starting up?
> That surely is not OK to keep a past state not referring to what
> actually happens with a receiver currently running.

I agree that it's not a good way to show some past failure state when
things are fine currently. Would naming the column name as
last_connectivity_error or something better and describing it in the
docs clearly help here?

Otherwise, we can have another simple function that just returns the
last connection failure of walreceiver and if required PID.

> - pg_stat_wal_receiver (system view) reports no rows if pid is NULL,
> which would be the state stored in shared memory after a connection.
> This means that one would never be able to see last_conn_error except
> when calling directly the SQL function pg_stat_get_wal_receiver().
>
> One could say that we should report a row for this view all the time,
> but this creates a compatibility breakage: existing application
> assuming something like (one row <=> WAL receiver running) could
> break.

-1.

We can think of having a separate infrastructure for reporting all
backend or process specific errors similar to pg_stat_activity and
pg_stat_get_activity, but that needs some shared memory and all of
that - IMO, it's an overkill.

I'm fine to withdraw this thread, if none of the above thoughts is
sensible enough to pursue further.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Andrey Lepikhov
Date:
Subject: Re: document the need to analyze partitioned tables
Next
From: Michael Paquier
Date:
Subject: Re: Query Jumbling for CALL and SET utility statements