On Fri, Oct 20, 2023 at 1:17 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > I take that back: there is a palloc() under RelationTruncate(). DNLGTM.
>
> Ooops, I should have quoted to the 0002 patch there.
>
> Hmm. We could teach md.c to do all its path manipulation in
> MAXPGPATH-sized output buffers but that still wouldn't be enough
> because it might decide to allocate more space for the array of
> segments, and I'm not even sure where in PostgreSQL we guarantee that
> everything that could appear here would fit in that. But that gives
> me an idea. It feels like a bit of a dirty hack, which could perhaps
> be made less dirty with some kind of function that includes the word
> 'ensure' in its name, but I think we can make md.c promise not to
> allocate anything before the next CFI with something like this:
Ugh. AFAICS the problems are confined to smgrtruncate() and within
that to the callback to smgr_truncate. As far as the rest of
smgrtruncate() is concerned, it seems like DropRelationBuffers() and
CacheInvalidateSmgr() are safe enough in a critical section, and the
other code directly inside RelationTruncate() looks OK, too.
But within mdtruncate(), we've got more than one problem, I think.
mdnblocks() is a problem because of the reason that you mention, but
register_dirty_segment() doesn't look totally safe either, because it
can call RegisterSyncRequest() which, in a standalone backend, can
call RememberSyncRequest().
In general, it seems like it would be a lot nicer if we were doing a
lot less stuff inside the critical section here. So I think you're
right that we need some refactoring. Maybe smgr_prepare_truncate() and
smgr_execute_truncate() or something like that. I wonder if we could
actually register the dirty segment in the "prepare" phase - is it bad
if we register a dirty segment before actually dirtying it? And maybe
even CacheInvalidateSmgr() could be done at that stage? It seems
pretty clear that dropping the dirty buffers and actually truncating
the relation on disk need to happen after we've entered the critical
section, because if we fail after doing the former, we've thrown away
dirty data in anticipation of performing an operation that didn't
happen, and if we fail when attempting the latter, primaries and
standbys diverge and the originally-reported bug on this thread
happens. But we'd like to move as much other stuff as we can out of
that critical section.
--
Robert Haas
EDB: http://www.enterprisedb.com