Re: [Proposal] Adding callback support for custom statistics kinds - Mailing list pgsql-hackers

From Chao Li
Subject Re: [Proposal] Adding callback support for custom statistics kinds
Date
Msg-id 5EC13E39-B41E-464A-B382-0ACEB18A15BB@gmail.com
Whole thread Raw
In response to Re: [Proposal] Adding callback support for custom statistics kinds  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers

> On Dec 3, 2025, at 13:54, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Dec 03, 2025 at 01:41:44PM +0800, Chao Li wrote:
>> Thanks for the patch, I do think the feature will be useful. After reading the patch, I got a concern on the design:
>>
>> This patch provides callbacks that requests (also allows) custom
>> extensions to write stat files on their own behalf, which I think
>> it’s unsafe. The problems coming out to my head includes:
>>
>> * An extension can write to any where on the storage, that what
>> * about it writes to /tmp and the files are deleted by other process
>> * or by a user manually incidentally?
>
> I mean, just don't do that.  It's up to the extension developer to
> decide what is safe or not, within the scope of the data folder.
>
>> * pgstat has a pattern of writing files like: writing to tmp file
>> * first, then durable_rename(), how to ensure extensions to do the
>> * same pattern? Without this pattern, how to ensure reliability of
>> * stat files?
>
> Extension code would be responsible for ensuring that.
>
>> * In the current path, pgstat performs its own write, then call
>> * callbacks. What about if a callback fails? Will that leave pgstat
>> * in a stale state?
>
> For the write state, end_extra_stats() would take care of that.  It
> depends on what kind of errors you would need to deal with, but as
> proposed the patch would offer the same level of protection for the
> writes of the stats, where we'd check for an error on the fd saved by
> an extension for an extra file.
>
> I think that you have a fair point about the stats read path though,
> shouldn't we make the callback from_serialized_extra_stats() return a
> state to be able to trigger a full-scale cleanup, at least?
>
>> * As extensions own file creation and deletion, in some case, staled
>> * file might be left on storage, who will be responsible for
>> * cleaning up them?
>
> The extension should be able to handle that, I guess.

Yes, they of course can do, but that’s out of pgstat’s control. How can we ensure that?

>
>> Given the goal of the feature is to allow extensions to serialize
>> custom data, the callback should just return serialized/deserialized
>> data, maybe together with some metadata, then pgstat should be
>> responsible for writing the data. In other words, IMO, pgstat should
>> always own stat files.
>
> That's where my view of the matter differs, actually, pushing down the
> responsibility into the extension code itself.  A key argument,
> mentioned upthread, is that the file paths could depend on the stats
> entry *keys*, which may not be known in advance when beginning the
> flush of the stats.  Think about per-database file stats, or just
> some per-object file stats, for example, which is an option that would
> matter so as we do not bloat the main pgstats file.
> --
> Michael

If we push down the responsibility into the extension code, then all extensions that want to enjoy the callbacks have
tohandle the same complexities of dealing with stat files, which sounds big duplicate efforts. 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Newly created replication slot may be invalidated by checkpoint
Next
From: Richard Guo
Date:
Subject: Re: Some optimizations for COALESCE expressions during constant folding