Re: RFC: pg_stat_logmsg - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: RFC: pg_stat_logmsg
Date
Msg-id Zpb9gBjB2FfiD8jO@paquier.xyz
Whole thread Raw
In response to Re: RFC: pg_stat_logmsg  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: RFC: pg_stat_logmsg
List pgsql-hackers
On Wed, Jul 17, 2024 at 12:14:36AM +0200, Tomas Vondra wrote:
> I noticed this patch hasn't moved since September 2023, so I wonder
> what's the main blocker / what is needed to move this?

+ /* Location of permanent stats file (valid when database is shut down) */
+ #define PGLM_DUMP_FILE    PGSTAT_STAT_PERMANENT_DIRECTORY
"/pg_stat_logmsg.stat

Perhaps this does not count as a valid reason, but does it really make
sense to implement things this way, knowing that this could be changed
to rely on a potential pluggable pgstats?  I mean this one I've
proposed:
https://www.postgresql.org/message-id/Zmqm9j5EO0I4W8dx%40paquier.xyz

One potential implementation is to stick that to be fixed-numbered,
because there is a maximum cap to the number of entries proposed by
the patch, while keeping the whole in memory.

+ logmsg_store(ErrorData *edata, Size *logmsg_offset,
+              int *logmsg_len, int *gc_count)

The patch shares a lot of perks with pg_stat_statements that don't
scale well.  I'm wondering if it is a good idea to duplicate these
properties in a second, different, module, like the external file can
could be written out on a periodic basis depending on the workload.
I am not saying that the other thread is a magic solution for
everything (not looked yet at how this would plug with the cap in
entries that pg_stat_statements wants), just one option on the table.

> As for the code, I wonder if the instability of line numbers could be a
> problem - these can change (a little bit) between minor releases, so
> after an upgrade we'll read the dump file with line numbers from the old
> release, and then start adding entries with new line numbers. Do we need
> to handle this in some way?

Indeed.  Perhaps a PostgreSQL version number assigned to each entry to
know from which binary an entry comes from?  This would cost a couple
of extra bytes for each entry still that would be the best information
possible to match that with a correct code tree.  If it comes to that,
even getting down to a commit SHA1 could be useful to provide the
lowest level of granularity.  Another thing would be to give up on the
line number, stick to the uniqueness in the stats entries with the
errcode and the file name, but that won't help for things like
tablecmds.c.

> This might be partially solved by eviction of entries from the old
> release - we apply decay, so after a while their usage will be 0. But
> what if there's no pressure for space, we'll not actually evict them.
> And it'll be confusing to have a mix of old/new line numbers.

Once we know that these stats not going to be relevant anymore as of a
minor upgrade flow, resetting them could be the move that makes the
most sense, leaving the reset to the provider doing the upgrades,
while taking a snapshot of the past data before the reset?  I find the
whole problem tricky to define, TBH.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: RFC: pg_stat_logmsg
Next
From: Michael Paquier
Date:
Subject: Re: Windows perl/tcl requirement documentation