Re: Comments on Custom RMGRs - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Comments on Custom RMGRs
Date
Msg-id 3f732849-0f60-4a64-be84-ee41e239d560@iki.fi
Whole thread Raw
In response to Re: Comments on Custom RMGRs  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Comments on Custom RMGRs
List pgsql-hackers
On 27/05/2024 21:20, Michael Paquier wrote:
> On Fri, May 17, 2024 at 04:25:15PM -0400, Robert Haas wrote:
>> On Fri, May 17, 2024 at 4:20 PM Jeff Davis <pgsql@j-davis.com> wrote:
>>> Regarding this particular change: the checkpointing hook seems more
>>> like a table AM feature, so I agree with you that we should have a good
>>> idea how a real table AM might use this, rather than only
>>> pg_stat_statements.
>>
>> I would even be OK with a pg_stat_statements example that is fully
>> working and fully explained. I just don't want to have no example at
>> all. The original proposal has been changed twice because of
>> complaints that the hook wasn't quite useful enough, but I think that
>> only proves that v3 is closer to being useful than v1. If v1 is 40% of
>> the way to useful and v3 is 120% of the way to useful, wonderful! But
>> if v1 is 20% of the way to being useful and v3 is 60% of the way to
>> being useful, it's not time to commit anything yet. I don't know which
>> is the case, and I think if someone wants this to be committed, they
>> need to explain clearly why it's the first and not the second.
> 
> Please note that I've been studying ways to have pg_stat_statements
> being plugged in directly with the shared pgstat APIs to get it backed
> by a dshash to give more flexibility and scaling, giving a way for
> extensions to register their own stats kind.  In this case, the flush
> of the stats would be controlled with a callback in the stats
> registered by the extensions, conflicting with what's proposed here.
> pg_stat_statements is all about stats, at the end.  I don't want this
> argument to act as a barrier if a checkpoint hook is an accepted
> consensus here,  but a checkpoint hook used for this code path is not
> the most intuitive solution I can think of in the long-term.

On the topic of concrete uses for this API: We have a bunch of built-in 
resource managers that could be refactored to use this API. 
CheckPointGuts currently looks like this:

> /*
>  * Flush all data in shared memory to disk, and fsync
>  *
>  * This is the common code shared between regular checkpoints and
>  * recovery restartpoints.
>  */
> static void
> CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
> {
>     CheckPointRelationMap();
>     CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
>     CheckPointSnapBuild();
>     CheckPointLogicalRewriteHeap();
>     CheckPointReplicationOrigin();
> 
>     /* Write out all dirty data in SLRUs and the main buffer pool */
>     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
>     CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
>     CheckPointCLOG();
>     CheckPointCommitTs();
>     CheckPointSUBTRANS();
>     CheckPointMultiXact();
>     CheckPointPredicate();
> 
>     RmgrCheckpoint(flags, RMGR_CHECKPOINT_BEFORE_BUFFERS);
> 
>     CheckPointBuffers(flags);
> 
>     RmgrCheckpoint(flags, RMGR_CHECKPOINT_AFTER_BUFFERS);
> 
>     /* Perform all queued up fsyncs */
>     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_SYNC_START();
>     CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
>     ProcessSyncRequests();
>     CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
>     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
> 
>     /* We deliberately delay 2PC checkpointing as long as possible */
>     CheckPointTwoPhase(checkPointRedo);
> }

Of these calls, CheckPointCLOG would be natural as the rmgr_callback of 
the clog rmgr. Similarly for CheckPointMultiXact and maybe a few others.


However, let's look at the pg_stat_statements patch (pgss_001.v1.patch):

It's now writing a new WAL record for every call to pgss_store(), 
turning even simple queries into WAL-logged operations. That's a 
non-starter. It will also not work on a standby. This needs to be 
redesigned so that the data is updated in memory, and written to disk 
and/or WAL-logged only periodically. Perhaps at checkpoints, but you 
could do it more frequently too.

I'm not convinced that the stats should be WAL-logged. Do you want them 
to be replicated and included in backups? Maybe, but maybe not. It's 
certainly a change to how it currently works.

If we don't WAL-log the stats, we don't really need a custom RMGR for 
it. We just need a checkpoint hook to flush the stats to disk, but we 
don't need a registered RMGR ID for it.

So, I got a feeling that adding this to the rmgr interface is not quite 
right. The rmgr callbacks are for things that run when WAL is 
*replayed*, while checkpoints are related to how WAL is generated. Let's 
design this as an independent hook, separate from rmgrs.


Another data point: In Neon, we actually had to add a little code to 
checkpoints, to WAL-log some exta data. That was a quick hack and might 
not be the right design in the first place, but these hooks would not 
have been useful for our purposes. We wanted to write a new WAL record 
at shutdown, and in CheckPointGuts(), it's already too late for that. It 
needs to be done earlier, before starting to the shutdown checkpoint. 
Similar to LogStandbySnapshot(), except that LogStandbySnapshot() is not 
called at shutdown like we wanted to. For a table AM, the point of a 
checkpoint hook is probably to fsync() data that is managed outside of 
the normal buffer manager and CheckPointGuts() is the right place for 
that, but other extensions might want to hook into checkpoints for other 
reasons.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: xid_wraparound tests intermittent failure.
Next
From: David Christensen
Date:
Subject: Re: [PATCH] GROUP BY ALL