From: Thomas Munro
Subject: Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes
Msg-id: CA+hUKGKOofj-F2bROOxkLTU4XUjztbizVtyu7YBHx5te9tvvyg@mail.gmail.com
In response to: Robert Haas <robertmhaas@gmail.com>
List: pgsql-hackers
On Wed, Apr 6, 2022 at 5:07 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Apr 4, 2022 at 10:20 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > The checkpointer never takes heavyweight locks, so the opportunity
> > > you're describing can't arise.
> >
> > <thinks harder> Hmm, oh, you probably meant the buffer interlocking
> > in SyncOneBuffer().  It's true that my most recent patch throws away
> > more requests than it could, by doing the level check at the end of
> > the loop over all buffers instead of adding some kind of
> > DropPendingWritebacks() in the barrier handler.  I guess I could find
> > a way to improve that, basically checking the level more often instead
> > of at the end, but I don't know if it's worth it; we're still throwing
> > out an arbitrary percentage of writeback requests.
>
> Doesn't every backend have its own set of pending writebacks?
> BufferAlloc() calls
> ScheduleBufferTagForWriteback(&BackendWritebackContext, ...)?

Yeah.  Alright, for my exaggeration above, s/can't arise/probably won't often arise/, since when regular backends do this it's for random dirty buffers, not necessarily stuff they hold a relation lock on.

I think you are right that if we find ourselves calling smgrwrite() on another buffer for that relfilenode a bit later, then it's safe to "unzonk" the relation.  In the worst case, IssuePendingWritebacks() might finish up calling sync_file_range() on a file that we originally wrote to for generation 1 of that relfilenode, even though we now have a file descriptor for generation 2 of that relfilenode that was reopened by a later smgrwrite().  That would be acceptable, because a bogus sync_file_range() call would be harmless.  The key point here is that we absolutely must avoid re-opening files without careful interlocking, because that could lead to later *writes* for generation 2 going to the defunct generation 1 file that we opened just as it was being unlinked.
(For completeness: we know of another way for smgrwrite() to write to the wrong generation of a recycled relfilenode, but that's hopefully very unlikely and will hopefully be addressed by adding more bits and killing off OID-wraparound[1].  In this thread we're concerned only with these weird "explicit OID recycling" cases we're trying to fix with PROCSIGNAL_BARRIER_SMGRRELEASE sledgehammer-based cache invalidation.)

My new attempt, attached, is closer to what Andres proposed, except at the level of md.c segment objects instead of the level of SMgrRelation objects.  This avoids the dangling SMgrRelation pointer problem discussed earlier, and is also a little like your "zonk" idea, except instead of a zonk flag, the SMgrRelation object is reset to a state where it knows nothing at all except its relfilenode.  Any access through smgrXXX() functions *except* smgrwriteback() will rebuild the state, just as you would clear the hypothetical zonk flag.

So, to summarise the new patch that I'm attaching to this email as 0001:

1.  When PROCSIGNAL_BARRIER_SMGRRELEASE is handled, we call smgrreleaseall(), which calls smgrrelease() on all SMgrRelation objects, telling them to close all their files.

2.  All SMgrRelation objects are still valid, and any pointers to them that were acquired before a CFI() and then used in an smgrXXX() call afterwards will still work (the example I know of being RelationCopyStorage()[2]).

3.  mdwriteback() internally uses a new "behavior" flag EXTENSION_DONT_OPEN when getting its hands on the internal segment object, which says that we should just give up immediately if we don't already have the file open (concretely: if PROCSIGNAL_BARRIER_SMGRRELEASE came along between our recent smgrwrite() call and the writeback machinery's smgrwriteback() call, it'll do nothing at all).
Someone might say that it's weird that smgrwriteback() has that behaviour internally, and we might eventually want to make it more explicit by adding a "behavior" argument to the function itself, so that the caller controls it.  It didn't seem worth it for now, though; the whole thing is a hint to the operating system anyway.

However, it seems that I have something wrong, because CI is failing on Windows; I ran out of time for looking into that today, but wanted to post what I have so far since I know we have an open item or two to close here ASAP...

Patches 0002-0004 are Andres's, with minor tweaks:

v3-0002-Fix-old-fd-issues-using-global-barriers-everywher.patch:

It seems useless to even keep the macro USE_BARRIER_SMGRRELEASE if we're going to define it always, so I removed it.  I guess it's useful to be able to disable that logic easily to see that the assertion in the other patch fails, but you can do that by commenting out a line in ProcessBarrierSmgrRelease().  I'm still slightly confused about whether we need an *extra* global barrier in DROP TABLESPACE, not just if destroy_tablespace_directory() failed.

v3-0003-WIP-test-for-file-reuse-dangers-around-database-a.patch:

Was missing:

-EXTRA_INSTALL=contrib/test_decoding
+EXTRA_INSTALL=contrib/test_decoding contrib/pg_prewarm

v3-0004-WIP-AssertFileNotDeleted-fd.patch:

+ * XXX: Figure out which operating systems this works on.

Do we want this to be wrapped in some kind of macro that vanishes into thin air unless you enable it with USE_UNLINKED_FILE_CHECKING or something?  Then we wouldn't care so much if it doesn't work on some unusual system.

Bikeshed mode: I would prefer "not unlinked", since that has a more specific meaning than "not deleted".
[1] https://www.postgresql.org/message-id/flat/CA%2BTgmobM5FN5x0u3tSpoNvk_TZPFCdbcHxsXCoY1ytn1dXROvg%40mail.gmail.com#1070c79256f2330ec52f063cdbe2add0
[2] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BBfaZ2puQDYV6h2oSWO2QW21_JOXZpha65gWRcmGNCZA%40mail.gmail.com#5deeb3d8adae4daf0da1f09e509eef56