Re: SLRUs in the main buffer pool, redux - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: SLRUs in the main buffer pool, redux |
Date | |
Msg-id | 128709bc-992c-b57a-7174-098433b7faa4@iki.fi |
In response to | Re: SLRUs in the main buffer pool, redux (Heikki Linnakangas <hlinnaka@iki.fi>) |
List | pgsql-hackers |
On 25/07/2022 09:54, Heikki Linnakangas wrote:
> I'll write a separate post with my thoughts on the high-level design of
> this, ...

This patch represents each SLRU as a relation. The CLOG is one relation, pg_subtrans is another relation, and so forth. The SLRU relations use a different SMGR implementation, which is implemented in slru.c.

As you know, I'd like to make the SMGR implementation replaceable by extensions. We need that for Neon, and I'd imagine it to be useful for many other things, too, like compression, encryption, or restoring data from a backup on-demand. I'd like all file operations to go through the smgr API as much as possible, so that an extension can intercept SLRU file operations too. If we introduce another internal SMGR implementation, then an extension would need to replace both implementations separately. I'd prefer to use the current md.c implementation for SLRUs too, instead.

Thus I propose:

Let's represent each SLRU *segment* as a separate relation, giving each SLRU segment a separate relNumber. Then we can use md.c for SLRUs, too.

Dropping an SLRU segment can be done by calling smgrunlink(). You won't need to deal with missing segments in md.c, because each individual SLRU file is a complete file, with no holes. Dropping buffers for one SLRU segment can be done with DropRelationBuffers(), instead of introducing the new DiscardBuffer() function. You can let md.c handle the caching of the file descriptors; you won't need to reimplement that with 'slru_file_segment'.

SLRUs won't need the segmentation into 1 GB segments that md.c does, because each SLRU file is just 256 kB in size. That's OK. (BTW, I propose that we bump the SLRU segment size up to a whopping 1 MB or even more, while we're at it. But one step at a time.)

SLRUs also won't need the concept of relation forks. That's fine, we can just use MAIN_FORKNUM.

Related to that, I'm somewhat bothered by the way that SMgrRelation currently bundles all the relation forks together.
A comment in smgr.h says:

> smgr.c maintains a table of SMgrRelation objects, which are essentially
> cached file handles.

But when we introduced relation forks, that got a bit muddled. Each SMgrRelation object is now a file handle for a bunch of related relation forks, and each fork is a separate file that can be created and truncated separately.

That means that an SMGR implementation, like md.c, needs to track the file handles for each fork. I think things would be clearer if we unbundled the forks at the SMGR level, so that we would have a separate SMgrRelation struct for each fork. And let's rename it to SMgrFile to make the role clearer. I think that would reduce the confusion when we start using it for SLRUs; an SLRU is not a relation, after all. md.c would still segment each logical file into 1 GB segments, but it would not need to deal with forks.

Attached is a draft patch to refactor it that way, and a refactored version of your SLRU patch over that.

The relation cache now needs to hold a separate reference to the SMgrFile of each fork of a relation. And smgr cache invalidation still works at relation granularity. Doing it per SMgrFile would be cleaner in smgr.c, but in practice all the forks of a relation are unlinked and truncated together, so sending a separate invalidation event for each SMgrFile would increase the cache invalidation traffic.

In passing, I moved the DropRelationBuffers() calls from smgr.c to the callers. smgr.c doesn't otherwise make any effort to keep the buffer manager in sync with the state on-disk; that responsibility is normally with the code that *uses* the smgr functions, so I think that's more logical.

The first patch currently causes the '018_wal_optimize.pl' test to fail. I guess I messed up something in the relation truncation code, but I haven't investigated it yet. I wanted to post this to get comments on the design, before spending more time on that.

What do you think?

- Heikki
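
For illustration only (this is not code from the attached patches): a minimal C sketch of how a page number within an SLRU could map onto a per-segment file under the proposal above. It assumes the default 8 kB block size and 32 pages per segment, which gives the 256 kB files mentioned. The struct, field, and function names are hypothetical, and so is the choice to derive the per-segment identifier directly from the segment number; the message does not say how relNumbers would be assigned.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed defaults: 8 kB pages, 32 pages per SLRU segment (256 kB files). */
#define SKETCH_BLCKSZ             8192
#define SKETCH_PAGES_PER_SEGMENT  32

/*
 * Hypothetical key for one smgr-managed file: a single fork of a relation,
 * or a single SLRU segment. This is NOT the SMgrFile struct from the patch.
 */
typedef struct SketchFileKey
{
    uint32_t slru_id;     /* hypothetical: which SLRU (CLOG, pg_subtrans, ...) */
    uint32_t rel_number;  /* hypothetical: one distinct value per segment */
    int      forknum;     /* always 0 here, standing in for MAIN_FORKNUM */
} SketchFileKey;

/* Where a given SLRU page lives: which file, and where inside it. */
typedef struct SketchPageLocation
{
    SketchFileKey file;
    uint32_t      block_in_file;  /* block number within the segment file */
    uint64_t      byte_offset;    /* byte offset within the segment file */
} SketchPageLocation;

/*
 * Map (SLRU, page number) to a per-segment file and an offset inside it.
 * Assumption: the segment number is used as the per-segment identifier.
 */
static SketchPageLocation
sketch_locate_slru_page(uint32_t slru_id, uint64_t pageno)
{
    SketchPageLocation loc;

    loc.file.slru_id = slru_id;
    loc.file.rel_number = (uint32_t) (pageno / SKETCH_PAGES_PER_SEGMENT);
    loc.file.forknum = 0;

    loc.block_in_file = (uint32_t) (pageno % SKETCH_PAGES_PER_SEGMENT);
    assert(loc.block_in_file < SKETCH_PAGES_PER_SEGMENT);

    /* Each segment is a complete small file, so no 1 GB splitting applies. */
    loc.byte_offset = (uint64_t) loc.block_in_file * SKETCH_BLCKSZ;
    return loc;
}
```

With these assumptions, every page of a segment resolves to the same file key, so dropping that segment amounts to unlinking one file and discarding the buffers that belong to that one key.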