[Proposal] Adding callback support for custom statistics kinds - Mailing list pgsql-hackers

From Sami Imseih
Subject [Proposal] Adding callback support for custom statistics kinds
Date
Msg-id CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com
List pgsql-hackers
Hi,

I'd like to propose $SUBJECT to serialize additional per-entry data beyond
the standard statistics entries. Currently, custom statistics kinds can store
their standard entry data in the main "pgstat.stat" file, but there is no
mechanism for extensions to persist extra data stored in the entry. A common
use case is an extension that registers a custom kind and, besides standard
counters, needs to track variable-length data referenced by a dsa_pointer.

This proposal adds optional "to_serialized_extra" and
"from_serialized_extra" callbacks to "PgStat_KindInfo" that allow custom kinds
to write extra data to, and read it back from, a separate per-kind file
(pgstat.<kind>.stat). The callbacks give extensions direct access to the
file pointer so they can read and write
data in any format, while the core "pgstat" infrastructure manages
opening, closing, renaming, and cleanup, just as it does with "pgstat.stat".
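
To make the division of labour concrete, here is a minimal standalone sketch
in plain C. It is not the patch: "DemoEntry", "DemoKindInfo", the callback
signatures, and every "demo_" name are assumptions made for illustration, and
the real callbacks are members of "PgStat_KindInfo" with signatures defined by
the attached patch. The sketch only shows the idea that the "core" side owns
the file while the kind's callbacks see nothing but the file pointer and pick
their own on-disk format (here, a length-prefixed string).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a stats entry with variable-length extra data. */
typedef struct DemoEntry
{
    long        calls;          /* a "standard" counter */
    char       *query_text;     /* stand-in for data behind a dsa_pointer */
} DemoEntry;

/* Hypothetical stand-in for the two proposed PgStat_KindInfo members. */
typedef struct DemoKindInfo
{
    bool        (*to_serialized_extra) (const DemoEntry *entry, FILE *fp);
    bool        (*from_serialized_extra) (DemoEntry *entry, FILE *fp);
} DemoKindInfo;

/* Kind side: write the text as a length followed by the bytes. */
static bool
demo_to_extra(const DemoEntry *entry, FILE *fp)
{
    size_t      len;

    assert(entry->query_text != NULL);
    len = strlen(entry->query_text);
    return fwrite(&len, sizeof(len), 1, fp) == 1 &&
        fwrite(entry->query_text, 1, len, fp) == len;
}

/* Kind side: read the length, then allocate and read the bytes. */
static bool
demo_from_extra(DemoEntry *entry, FILE *fp)
{
    size_t      len;

    if (fread(&len, sizeof(len), 1, fp) != 1)
        return false;
    entry->query_text = malloc(len + 1);
    if (entry->query_text == NULL)
        return false;
    if (fread(entry->query_text, 1, len, fp) != len)
    {
        free(entry->query_text);
        entry->query_text = NULL;
        return false;
    }
    entry->query_text[len] = '\0';
    return true;
}

static const DemoKindInfo demo_kind = {demo_to_extra, demo_from_extra};

/* "Core" side: opens and closes the file; callbacks only get the FILE *. */
static bool
demo_roundtrip(const char *text, char *out, size_t outlen)
{
    DemoEntry   src = {42, (char *) text};
    DemoEntry   dst = {0, NULL};
    FILE       *fp = tmpfile();
    bool        ok;

    if (fp == NULL)
        return false;
    ok = demo_kind.to_serialized_extra(&src, fp);
    rewind(fp);
    ok = ok && demo_kind.from_serialized_extra(&dst, fp);
    fclose(fp);
    if (ok)
    {
        snprintf(out, outlen, "%s", dst.query_text);
        free(dst.query_text);
    }
    return ok;
}
```

In this sketch a call such as demo_roundtrip("SELECT 1", buf, sizeof(buf))
writes the text through one callback and restores it through the other. A
temporary file stands in for pgstat.<kind>.stat, and malloc stands in for a
DSA allocation.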

A concrete use case is pg_stat_statements. If it were to use custom
stats kinds to track statement counters, it could also track query text
stored in DSA. The callbacks allow saving the query text referenced by the
dsa_pointer and restoring it after a clean shutdown. Since DSA
(and more specifically DSM) cannot be attached by the postmaster, an
extension cannot use "on_shmem_exit" or "shmem_startup_hook"
to serialize or restore this data. This is why pgstat handles
serialization during checkpointer shutdown and startup, allowing a single
backend to manage it safely.

I considered adding hooks to the existing pgstat code paths
(pgstat_before_server_shutdown, pgstat_discard_stats, and
pgstat_restore_stats), but that felt too unrestricted. Using per-kind
callbacks provides more control.

There are already "to_serialized_name" and "from_serialized_name"
callbacks used to store and read entries by "name" instead of
"PgStat_HashKey", currently used by replication slot stats. Those
remain unchanged, as they serve a separate purpose.

Other design points:

1. Filenames use "pgstat.<kind>.stat" based on the numeric kind ID.
This avoids requiring extensions to provide names and prevents issues
with spaces or special characters.

2. Both callbacks must be registered together. Serializing without
deserializing would leave orphaned files behind, and I cannot think of a
reason to allow this.

3. "write_chunk", "read_chunk", "write_chunk_s", and
"read_chunk_s" are renamed to "pgstat_write_chunk", etc., and
moved to "pgstat_internal.h" so extensions can use them without
re-implementing these functions.

4. These callbacks are valid only for custom, variable-numbered statistics
kinds. Custom fixed kinds may not benefit, but could be considered in the
future.
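
The first three design points can be sketched in a few lines of standalone C.
Again, this is only an illustration under stated assumptions: the "demo_"
functions, the callback type, and the kind ID 25 used below are hypothetical,
and the helpers are simplified stand-ins rather than the renamed
"pgstat_write_chunk" family from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type; the real signatures are defined by the patch. */
typedef bool (*demo_extra_cb) (void *entry, FILE *fp);

/* Point 1: derive "pgstat.<kind>.stat" from the numeric kind ID alone. */
static int
demo_extra_filename(char *buf, size_t buflen, unsigned int kind)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "pgstat.%u.stat", kind);
}

/* Point 2: accept a kind only if both callbacks are set, or neither is. */
static bool
demo_callbacks_valid(demo_extra_cb to_extra, demo_extra_cb from_extra)
{
    return (to_extra == NULL) == (from_extra == NULL);
}

/* Point 3: stand-ins for shared chunk helpers, thin fwrite/fread wrappers. */
static bool
demo_write_chunk(FILE *fp, const void *ptr, size_t len)
{
    return fwrite(ptr, 1, len, fp) == len;
}

static bool
demo_read_chunk(FILE *fp, void *ptr, size_t len)
{
    return fread(ptr, 1, len, fp) == len;
}

/* Convenience forms that take the size from the pointed-to object. */
#define demo_write_chunk_s(fp, ptr) demo_write_chunk(fp, ptr, sizeof(*(ptr)))
#define demo_read_chunk_s(fp, ptr)  demo_read_chunk(fp, ptr, sizeof(*(ptr)))

/* Dummy callback used only to exercise the registration check. */
static bool
demo_noop_cb(void *entry, FILE *fp)
{
    (void) entry;
    (void) fp;
    return true;
}

/* Write one fixed-size value to a temporary file and read it back. */
static long
demo_chunk_roundtrip(long value)
{
    long        result = -1;
    FILE       *fp = tmpfile();

    if (fp == NULL)
        return -1;
    if (demo_write_chunk_s(fp, &value))
    {
        rewind(fp);
        if (!demo_read_chunk_s(fp, &result))
            result = -1;
    }
    fclose(fp);
    return result;
}
```

With these definitions, demo_extra_filename(buf, sizeof(buf), 25) produces
"pgstat.25.stat", and demo_callbacks_valid(demo_noop_cb, NULL) is false, which
is the case design point 2 rejects at registration time.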

Attached 0001 is the proposed change, still in POC form. The second patch
adds tests in "injection_points" to demonstrate the proposal and is not
necessarily intended for commit.

Looking forward to your feedback!


--

Sami Imseih
Amazon Web Services (AWS)
