On Fri, Apr 1, 2022 at 2:04 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> The v1-0003 patch introduced smgropen_cond() to avoid the problem of
> IssuePendingWritebacks(), which does desynchronised smgropen() calls
> and could open files after the barrier but just before they are
> unlinked. Makes sense, but...
>
> 1. For that to actually work, we'd better call smgrcloseall() when
> PROCSIGNAL_BARRIER_SMGRRELEASE is handled; currently it only calls
> closeAllVds(). Here's a patch for that. But...
What if we instead changed our approach to fixing
IssuePendingWritebacks()? Design sketch:
1. We add a new flag, bool smgr_zonk, to SmgrRelationData. Initially it's false.
2. Receipt of PROCSIGNAL_BARRIER_SMGRRELEASE sets smgr_zonk to true.
3. If you want, you can set it back to false any time you access the
smgr while holding a relation lock.
4. IssuePendingWritebacks() uses smgropen_cond(), as today, but that
function now refuses either to create a new smgr or to return one
where smgr_zonk is true.
I think this would be sufficient to guarantee that we never try to
write back to a relfilenode if we haven't had a lock on it since the
last PROCSIGNAL_BARRIER_SMGRRELEASE, which I think is the invariant we
need here?
My thinking here is that it's a lot easier to get away with marking
the content of a data structure invalid than it is to get away with
invalidating pointers to that data structure. If we can tinker with
the design so that the SMgrRelationData doesn't actually go away but
just gets a little bit more thoroughly invalidated, we could avoid
having to audit the entire code base for code that keeps smgr handles
around longer than would be possible with the design you demonstrate
here.
--
Robert Haas
EDB: http://www.enterprisedb.com