On 2016-04-11 13:04:48 -0400, Robert Haas wrote:
> You're right, but I think that's more because I didn't say it
> correctly than because you haven't done something novel.
Could be.
> DROP and
> relation truncation know about shared buffers, and they go clear
> blocks that might be affected from it as part of the truncate
> operation, which means that no other backend will see them after they
> are gone. The lock makes sure that no other references can be added
> while we're busy removing any that are already there. So I think that
> there is currently an invariant that any block we are attempting to
> access should actually still exist.
Note that we're not actually accessing any blocks, we're just opening a
segment to get the associated file descriptor.
> It sounds like these references are sticking around in backend-private
> memory, which means they are neither protected by locks nor able to be
> cleared out on drop or truncate. I think that's a new thing, and a
> bit scary.
True. But how would you batch flush requests in a sorted manner
otherwise, without re-opening the file descriptors? And that's
pretty essential for performance.
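To make that concrete, here's a toy sketch (Linux-only, and not the
actual md.c code; PendingWriteback, pw_cmp() and flush_pending() are
names I'm making up for the example) of why the queued requests want to
hang on to their segment fds: we accumulate requests, sort them by
(fd, offset), and then issue the sync_file_range() hints in one pass:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct PendingWriteback
{
    int     fd;         /* cached segment file descriptor */
    off_t   offset;     /* start of dirty range */
    off_t   nbytes;     /* length of dirty range */
} PendingWriteback;

static int
pw_cmp(const void *a, const void *b)
{
    const PendingWriteback *pa = a;
    const PendingWriteback *pb = b;

    if (pa->fd != pb->fd)
        return (pa->fd < pb->fd) ? -1 : 1;
    if (pa->offset != pb->offset)
        return (pa->offset < pb->offset) ? -1 : 1;
    return 0;
}

/* Issue all accumulated hints in sorted order, then forget them. */
static void
flush_pending(PendingWriteback *reqs, int *nreqs)
{
    qsort(reqs, *nreqs, sizeof(PendingWriteback), pw_cmp);

    for (int i = 0; i < *nreqs; i++)
    {
        /*
         * Purely advisory: if the fd has gone stale (e.g. the relation
         * was dropped since the request was queued), log and move on
         * rather than erroring out.
         */
        if (sync_file_range(reqs[i].fd, reqs[i].offset, reqs[i].nbytes,
                            SYNC_FILE_RANGE_WRITE) < 0)
            perror("sync_file_range (ignored)");
    }
    *nreqs = 0;
}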
I can think of a number of relatively easy ways to address this:
1) Just zap (or issue?) all pending flush requests when getting an smgrinval/smgrclosenode
2) Do 1), but filter for the closed relnode
3) Actually handle the case of the last open segment not being RELSEG_SIZE
properly in _mdfd_getseg(), the way mdnblocks() already does.
I'm kind of inclined to do both 3) and 1).
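E.g., continuing the toy sketch from above (drop_pending_for_fd() is
made up, nothing like it exists in md.c), 2) could be as simple as
filtering the queue by the invalidated segment's fd, and 1) is just the
degenerate case of resetting the queue entirely:

static void
drop_pending_for_fd(PendingWriteback *reqs, int *nreqs, int closed_fd)
{
    int     keep = 0;

    for (int i = 0; i < *nreqs; i++)
    {
        /* 2): forget queued hints referencing the closed segment */
        if (reqs[i].fd == closed_fd)
            continue;
        reqs[keep++] = reqs[i];
    }
    *nreqs = keep;

    /* 1) would simply be *nreqs = 0, optionally after issuing the rest. */
}

Since the requests are only advisory, it shouldn't matter for
correctness whether we issue or just forget them at that point.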
> The possibly-saving grace here, I suppose, is that the references
> we're worried about are just being used to issue hints to the
> operating system.
Indeed.
> So I guess if we send a hint on a wrong block or
> skip sending a hint altogether because of some failure, no harm done,
> as long as we don't error out.
Which the writeback code is careful not to do; afaics it's just the
"already open segment" issue making problems here.
- Andres