On Tue, Oct 24, 2023 at 4:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
> But within mdtruncate(), we've got more than one problem, I think.
> mdnblocks() is a problem because of the reason that you mention, but
> register_dirty_segment() doesn't look totally safe either, because it
> can call RegisterSyncRequest() which, in a standalone backend, can
> call RememberSyncRequest().
But that is already enkludgified thusly:
/*
* XXX: The checkpointer needs to add entries to the pending ops table
* when absorbing fsync requests. That is done within a critical
* section, which isn't usually allowed, but we make an exception. It
* means that there's a theoretical possibility that you run out of
* memory while absorbing fsync requests, which leads to a PANIC.
* Fortunately the hash table is small so that's unlikely to happen in
* practice.
*/
pendingOpsCxt = AllocSetContextCreate(TopMemoryContext,
"Pending ops context",
ALLOCSET_DEFAULT_SIZES);
MemoryContextAllowInCriticalSection(pendingOpsCxt, true);
> In general, it seems like it would be a lot nicer if we were doing a
> lot less stuff inside the critical section here. So I think you're
> right that we need some refactoring. Maybe smgr_prepare_truncate() and
> smgr_execute_truncate() or something like that. I wonder if we could
> actually register the dirty segment in the "prepare" phase - is it bad
> if we register a dirty segment before actually dirtying it? And maybe
> even CacheInvalidateSmgr() could be done at that stage? It seems
> pretty clear that dropping the dirty buffers and actually truncating
> the relation on disk need to happen after we've entered the critical
> section, because if we fail after doing the former, we've thrown away
> dirty data in anticipation of performing an operation that didn't
> happen, and if we fail when attempting the latter, primaries and
> standbys diverge and the originally-reported bug on this thread
> happens. But we'd like to move as much other stuff as we can out of
> that critical section.
Hmm, yeah it seems like that direction would be a nice improvement, as
long as we are sure that the fsync request can't be processed too
soon.