Thread: Extending SMgrRelation lifetimes
Hi, SMgrRelationData objects don't currently have a defined lifetime, so it's hard to know when the result of smgropen() might become a dangling pointer. This has caused a few bugs in the past, and the usual fix is to just call smgropen() more often and not hold onto pointers. If you're doing that frequently enough, the hash table lookups can show up in profiles. I'm interested in this topic for more than just micro-optimisations, though: in order to be able to batch/merge smgr operations, I'd like to be able to collect them in data structures that survive more than just a few lines of code. (Examples to follow in later emails). The simplest idea seems to be to tie object lifetime to transactions using the existing AtEOXact_SMgr() mechanism. In recovery, the obvious corresponding time would be the commit/abort record that destroys the storage. This could be achieved by extending smgrrelease(). That was a solution to the same problem in a narrower context: we didn't want CFIs to randomly free SMgrRelations, but we needed to be able to force-close fds in other backends, to fix various edge cases. The new idea is to overload smgrrelease(it) so that it also clears the owner, which means that AtEOXact_SMgr() will eventually smgrclose(it), unless it is re-owned by a relation before then. That choice stems from the complete lack of information available via sinval in the case of an overflow. We must (1) close all descriptors because any file might have been unlinked, (2) keep all pointers valid and yet (3) not leak dropped smgr objects forever. In this patch, smgrreleaseall() achieves those goals. Proof-of-concept patch attached. Are there holes in this scheme? Better ideas welcome. In terms of spelling, another option would be to change the behaviour of smgrclose() to work as described, ie it would call smgrrelease() and then also disown, so we don't have to change most of those callers, and then add a new function smgrdestroy() for the few places that truly need it. Or something like that. Other completely different ideas I've bounced around with various hackers and decided against: references counts, "holder" objects that can be an "owner" (like Relation, but when you don't have a Relation) but can re-open on demand. Seemed needlessly complicated. While studying this I noticed a minor thinko in smgrrelease() in 15+16, so here's a fix for that also. I haven't figured out a sequence that makes anything bad happen, but we should really invalidate smgr_targblock when a relfilenode is reused, since it might point past the end. This becomes more obvious once smgrrelease() is used for truncation, as proposed here.
Attachment
On 14/08/2023 05:41, Thomas Munro wrote: > The new idea is to overload smgrrelease(it) so that it also clears the > owner, which means that AtEOXact_SMgr() will eventually smgrclose(it), > unless it is re-owned by a relation before then. That choice stems > from the complete lack of information available via sinval in the case > of an overflow. We must (1) close all descriptors because any file > might have been unlinked, (2) keep all pointers valid and yet (3) not > leak dropped smgr objects forever. In this patch, smgrreleaseall() > achieves those goals. Makes sense. Some of the smgrclose() calls could perhaps still be smgrclose(). If you can guarantee that no-one is holding the SMgrRelation, it's still OK to call smgrclose(). There's little gain from closing earlier, though. > Proof-of-concept patch attached. Are there holes in this scheme? > Better ideas welcome. In terms of spelling, another option would be > to change the behaviour of smgrclose() to work as described, ie it > would call smgrrelease() and then also disown, so we don't have to > change most of those callers, and then add a new function > smgrdestroy() for the few places that truly need it. Or something > like that. If you change smgrclose() to do what smgrrelease() does now, then it will apply automatically to extensions. If an extension is currently using smgropen()/smgrclose() correctly, this patch alone won't make it wrong, so it's not very critical for extensions to adopt the change. However, if after this we consider it OK to hold a pointer to SMgrRelation for longer, and start doing that in the backend, then extensions need to be adapted too. > While studying this I noticed a minor thinko in smgrrelease() in > 15+16, so here's a fix for that also. I haven't figured out a > sequence that makes anything bad happen, but we should really > invalidate smgr_targblock when a relfilenode is reused, since it might > point past the end. This becomes more obvious once smgrrelease() is > used for truncation, as proposed here. +1. You can move the smgr_targblock clearing out of the loop, though. -- Heikki Linnakangas Neon (https://neon.tech)
On Wed, Aug 16, 2023 at 4:11 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Makes sense. Thanks for looking! > If you change smgrclose() to do what smgrrelease() does now, then it > will apply automatically to extensions. > > If an extension is currently using smgropen()/smgrclose() correctly, > this patch alone won't make it wrong, so it's not very critical for > extensions to adopt the change. However, if after this we consider it OK > to hold a pointer to SMgrRelation for longer, and start doing that in > the backend, then extensions need to be adapted too. Yeah, that sounds quite compelling. Let's try that way: * smgrrelease() is removed * smgrclose() now releases resources, but doesn't destroy until EOX * smgrdestroy() now frees memory, and should rarely be used Still WIP while I think about edge cases, but so far I think this is the better option. > > While studying this I noticed a minor thinko in smgrrelease() in > > 15+16, so here's a fix for that also. I haven't figured out a > > sequence that makes anything bad happen, but we should really > > invalidate smgr_targblock when a relfilenode is reused, since it might > > point past the end. This becomes more obvious once smgrrelease() is > > used for truncation, as proposed here. > > +1. You can move the smgr_targblock clearing out of the loop, though. Right, thanks. Pushed.
Attachment
On Thu, Aug 17, 2023 at 1:11 AM Thomas Munro <thomas.munro@gmail.com> wrote: > Still WIP while I think about edge cases, but so far I think this is > the better option. I think this direction makes a lot of sense. The lack of a defined lifetime for SMgrRelation objects makes correct programming difficult, makes efficient programming difficult, and doesn't really have any advantages. I know this is just a WIP patch but for the final version I think it would make sense to try to do a bit more work on the comments. For instance: - In src/include/utils/rel.h, instead of just deleting that comment, how about documenting the new object lifetime? Or maybe that comment belongs elsewhere, but I think it would definitely good to spell it out very explicitly at some suitable place. - When we change smgrcloseall() to smgrdestroyall(), maybe it's worth spelling out why destroying is needed and not just closing. For example, the second hunk in bgwriter.c includes a comment that says "After any checkpoint, close all smgr files. This is so we won't hang onto smgr references to deleted files indefinitely." But maybe it should say something like "After any checkpoint, close all smgr files and destroy the associated smgr objects. This guarantees that we close the actual file descriptors, that we close the File objects as managed by fd.c, and that we also destroy the smgr objects. We don't want any of these resources to stick around indefinitely after a relation file has been deleted." - Maybe it's worth adding comments around some of the smgrclose[all] calls to mentioning that in those cases we want the file descriptors (and File objects?) to get closed but don't want to invalidate pointers. But I'm less sure that this is necessary. I don't want to have a million comments everywhere, just enough that someone looking at this stuff in the future can orient themselves about what's going on without too much difficulty. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Aug 18, 2023 at 2:30 AM Robert Haas <robertmhaas@gmail.com> wrote: > I think this direction makes a lot of sense. The lack of a defined > lifetime for SMgrRelation objects makes correct programming difficult, > makes efficient programming difficult, and doesn't really have any > advantages. Thanks for looking! > I know this is just a WIP patch but for the final version > I think it would make sense to try to do a bit more work on the > comments. For instance: > > - In src/include/utils/rel.h, instead of just deleting that comment, > how about documenting the new object lifetime? Or maybe that comment > belongs elsewhere, but I think it would definitely good to spell it > out very explicitly at some suitable place. Right, let's one find one good place. I think smgropen() would be best. > - When we change smgrcloseall() to smgrdestroyall(), maybe it's worth > spelling out why destroying is needed and not just closing. For > example, the second hunk in bgwriter.c includes a comment that says > "After any checkpoint, close all smgr files. This is so we won't hang > onto smgr references to deleted files indefinitely." But maybe it > should say something like "After any checkpoint, close all smgr files > and destroy the associated smgr objects. This guarantees that we close > the actual file descriptors, that we close the File objects as managed > by fd.c, and that we also destroy the smgr objects. We don't want any > of these resources to stick around indefinitely after a relation file > has been deleted." There are several similar comments. I think they can be divided into two categories: 1. The error-path ones, that we should now just delete along with the code they describe, because the "various strange errors" should have been fixed comprehensively by PROCSIGNAL_BARRIER_SMGRRELEASE. Here is a separate patch to do that. 2. The per-checkpoint ones that still make sense to avoid unbounded resource usage. Here is a new attempt at explaining: /* - * After any checkpoint, close all smgr files. This is so we - * won't hang onto smgr references to deleted files indefinitely. + * After any checkpoint, free all smgr objects. Otherwise we + * would never do so for dropped relations, as the checkpointer + * does not process shared invalidation messages or call + * AtEOXact_SMgr(). */ - smgrcloseall(); + smgrdestroyall(); > - Maybe it's worth adding comments around some of the smgrclose[all] > calls to mentioning that in those cases we want the file descriptors > (and File objects?) to get closed but don't want to invalidate > pointers. But I'm less sure that this is necessary. I don't want to > have a million comments everywhere, just enough that someone looking > at this stuff in the future can orient themselves about what's going > on without too much difficulty. I covered that with the following comment for smgrclose(): + * The object remains valid, but is moved to the unknown list where it will + * be destroyed by AtEOXact_SMgr(). It may be re-owned if it is accessed by a + * relation before then.
Attachment
I think that if you believe 0001 to be correct you should go ahead and commit it sooner rather than later. If you're wrong and something weird starts happening we'll then have a chance to notice that before too much other stuff gets changed on top of this and confuses the matter. On Wed, Aug 23, 2023 at 12:55 AM Thomas Munro <thomas.munro@gmail.com> wrote: > Right, let's one find one good place. I think smgropen() would be best. I think it would be a good idea to give this comment a bit more oomph. In lieu of this: + * This does not attempt to actually open the underlying files. The returned + * object remains valid at least until AtEOXact_SMgr() is called, or until + * smgrdestroy() is called in non-transactional backends. I would leave the existing "This does not attempt to actually open the underlying files." comment as a separate comment, and add something like this as a new paragraph: In versions of PostgreSQL prior to 17, this function returned an object with no defined lifetime. Now, however, the object remains valid for the lifetime of the transaction, up to the point where AtEOXact_SMgr() is called, making it much easier for callers to know for how long they can hold on to a pointer to the returned object. If this function is called outside of a transaction, the object remains valid until smgrdestroy() or smgrdestroyall() is called. Background processes that use smgr but not transactions typically do this once per checkpoint cycle. Apart from that, the main thing that is bothering me is that the justification for redefining smgrclose() to do what smgrrelease() now does isn't really spelled out anywhere. You mentioned some reasons and Heikki mentioned the benefit to extensions, but I feel like somebody should be able to understand the reasoning clearly from reading the commit message or the comments in the patch, rather than having to visit the mailing list discussion, and I'm not sure we're there yet. I feel like I understood why we were doing this and was convinced it was a good idea at some point, but now the reasoning has gone out of my head and I can't recover it. If somebody does smgropen() .. stuff ... smgrclose() as in heapam_relation_copy_data() or index_copy_data(), this change has the effect of making the SmgrRelation remain valid until eoxact whereas before it would have been destroyed instantly. Is that what we want? Presumably yes, or this patch wouldn't be shaped like this, but I don't know why we want that... Another thing that seems a bit puzzling is how this is intended to, or does, interact with the ownership mechanism. Intuitively, I feel like a Relation owns an SMgrRelation *because* the SMgrRelation has no defined lifetime. But I think that's not quite right. I guess the ownership mechanism doesn't guarantee anything about the lifetime of the object, just that the pointer in the Relation won't hang around any longer than the object to which it's pointing. So does that mean that we're free to redefine the object lifetime to be pretty much anything we want and that mechanism doesn't really care? Huh, interesting. -- Robert Haas EDB: http://www.enterprisedb.com
I spent some more time digging into this, experimenting with different approaches. Came up with pretty significant changes; see below: On 18/09/2023 18:19, Robert Haas wrote: > I think that if you believe 0001 to be correct you should go ahead and > commit it sooner rather than later. If you're wrong and something > weird starts happening we'll then have a chance to notice that before > too much other stuff gets changed on top of this and confuses the > matter. +1 > On Wed, Aug 23, 2023 at 12:55 AM Thomas Munro <thomas.munro@gmail.com> wrote: >> Right, let's one find one good place. I think smgropen() would be best. > > I think it would be a good idea to give this comment a bit more oomph. > In lieu of this: > > + * This does not attempt to actually open the underlying files. The returned > + * object remains valid at least until AtEOXact_SMgr() is called, or until > + * smgrdestroy() is called in non-transactional backends. > > I would leave the existing "This does not attempt to actually open the > underlying files." comment as a separate comment, and add something > like this as a new paragraph: > > In versions of PostgreSQL prior to 17, this function returned an > object with no defined lifetime. Now, however, the object remains > valid for the lifetime of the transaction, up to the point where > AtEOXact_SMgr() is called, making it much easier for callers to know > for how long they can hold on to a pointer to the returned object. If > this function is called outside of a transaction, the object remains > valid until smgrdestroy() or smgrdestroyall() is called. Background > processes that use smgr but not transactions typically do this once > per checkpoint cycle. +1 > Apart from that, the main thing that is bothering me is that the > justification for redefining smgrclose() to do what smgrrelease() now > does isn't really spelled out anywhere. You mentioned some reasons and > Heikki mentioned the benefit to extensions, but I feel like somebody > should be able to understand the reasoning clearly from reading the > commit message or the comments in the patch, rather than having to > visit the mailing list discussion, and I'm not sure we're there yet. I > feel like I understood why we were doing this and was convinced it was > a good idea at some point, but now the reasoning has gone out of my > head and I can't recover it. If somebody does smgropen() .. stuff ... > smgrclose() as in heapam_relation_copy_data() or index_copy_data(), > this change has the effect of making the SmgrRelation remain valid > until eoxact whereas before it would have been destroyed instantly. Is > that what we want? Presumably yes, or this patch wouldn't be shaped > like this, but I don't know why we want that... Fair. I tried to address that by adding an overview comment at top of smgr.c, explaining how this stuff work. I hope that helps. > Another thing that seems a bit puzzling is how this is intended to, or > does, interact with the ownership mechanism. Intuitively, I feel like > a Relation owns an SMgrRelation *because* the SMgrRelation has no > defined lifetime. But I think that's not quite right. I guess the > ownership mechanism doesn't guarantee anything about the lifetime of > the object, just that the pointer in the Relation won't hang around > any longer than the object to which it's pointing. So does that mean > that we're free to redefine the object lifetime to be pretty much > anything we want and that mechanism doesn't really care? Huh, > interesting. Yeah that owner mechanism is weird. It guarantees that the pointer to the SMgrRelation is cleared when the SMgrRelation is destroyed. But it also prevents the SMgrRelation from being destroyed at end of transaction. That's how it is in 'master' too. But with this patch, we don't normally call smgrdestroy() on an SMgrRelation that came from the relation cache. We do call smgrdestroyall() in the aux processes, but they don't have a relcache. So the real effect of setting the owner now is to prevent the SMgrRelation from being destroyed at end of transaction; the mechanism of clearing the pointer is unused. I found two exceptions to that, though, by adding some extra assertions and running the regression tests: 1. The smgrdestroyall() in a single-user backend in RequestCheckpoint(). It destroys SMgrRelations belonging to relcache entries, and the owner mechanism clears the pointers from the relcache. I think smgrcloseall(), or doing nothing, would actually be more appropriate there. 2. A funny case with foreign tables: ANALYZE on a foreign table calls visibilitymap_count(). A foreign table has no visibility map so it returns 0, but before doing so it calls RelationGetSmgr on the foreign table, which has 0/0/0 rellocator. That creates an SMgrRelation for 0/0/0, and sets the foreign table's relcache entry as its owner. If you then call ANALYZE on another foreign table, it also calls RelationGetSmgr with 0/0/0 rellocator, returning the same SMgrRelation entry, and changes its owner to the new relcache entry. That doesn't make much sense and it's pretty accidental that it works at all, so attached is a patch to avoid calling visibilitymap_count() on foreign tables. I propose that we replace the single owner with a "pin counter". One SMgrRelation can have more than one pin on it, and the guarantee is that as long as the pin counter is non-zero, the SMgrRelation cannot be destroyed and the pointer remains valid. We don't really need the capability for more than one pin at the moment (the regression tests pass with an extra assertion that pincount <= 1 after fixing the foreign table issue), but it seems more straightforward than tracking an owner. Here's another reason to do that: I noticed this at the end of swap_relation_files(): > /* > * Close both relcache entries' smgr links. We need this kluge because > * both links will be invalidated during upcoming CommandCounterIncrement. > * Whichever of the rels is the second to be cleared will have a dangling > * reference to the other's smgr entry. Rather than trying to avoid this > * by ordering operations just so, it's easiest to close the links first. > * (Fortunately, since one of the entries is local in our transaction, > * it's sufficient to clear out our own relcache this way; the problem > * cannot arise for other backends when they see our update on the > * non-transient relation.) > * > * Caution: the placement of this step interacts with the decision to > * handle toast rels by recursion. When we are trying to rebuild pg_class > * itself, the smgr close on pg_class must happen after all accesses in > * this function. > */ > RelationCloseSmgrByOid(r1); > RelationCloseSmgrByOid(r2); If RelationCloseSmgrByOid() merely closes the underlying file descriptor but doesn't destroy the SMgrRelation object - as it does with these patches - I think we reintroduce the dangling reference problem that the comment mentions. But if we allow the same SMgrRelation to be pinned by two different relcache entries, the problem goes away and we can remove that kluge. I think we're missing test coverage for that though. I commented out those calls in 'master' and ran the regression tests, but got no failures. I don't fully understand the problem anyway. Or does it not exist anymore? Is there a moment where we have two relcache entries pointing to the same SMgrRelation? I don't see it. In any case, with a pin counter mechanism, I believe it would be fine. Summary of the changes to the attached main patch: * Added an overview comment at top of smgr.c * Added the comment Robert suggested to smgropen() * Replaced the single owner with a pin count and smgrpin() / smgrunpin() functions. smgrdestroyall() now only destroys unpinned entries * Removed that kluge from swap_relation_files(). It should not be needed anymore with the pin counter. * Changed a few places in bufmgr.c where we called RelationGetSmgr() on every smgr call to keep the SMgrRelation in a local variable. That was not safe before, but is now. I don't think we should go on a spree to change all callers - RelationGetSmgr() is still cheap - but in these few places it seems worthwhile. * I kept the separate smgrclose() and smgrrelease() functions. I know I suggested to just change smgrclose() to do what smgrrelease() did, but on second thoughts keeping them separate seems nicer. However, sgmgrclose() just calls smgrrelease() now, so the distinction is just pro forma. The idea is that if you call smgrclose(), you hint that you don't need that SMgrRelation pointer anymore, although there might be other pointers to the same object and they stay valid. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
On 29/11/2023 14:41, Heikki Linnakangas wrote: > 2. A funny case with foreign tables: ANALYZE on a foreign table calls > visibilitymap_count(). A foreign table has no visibility map so it > returns 0, but before doing so it calls RelationGetSmgr on the foreign > table, which has 0/0/0 rellocator. That creates an SMgrRelation for > 0/0/0, and sets the foreign table's relcache entry as its owner. If you > then call ANALYZE on another foreign table, it also calls > RelationGetSmgr with 0/0/0 rellocator, returning the same SMgrRelation > entry, and changes its owner to the new relcache entry. That doesn't > make much sense and it's pretty accidental that it works at all, so > attached is a patch to avoid calling visibilitymap_count() on foreign > tables. This patch seems uncontroversial and independent of the others, so I committed it. -- Heikki Linnakangas Neon (https://neon.tech)
On Wed, Nov 29, 2023 at 1:42 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > I spent some more time digging into this, experimenting with different > approaches. Came up with pretty significant changes; see below: Hi Heikki, I think this approach is good. As I wrote in the first email, I had briefly considered reference counting, but at the time I figured there wasn't much point if it's only ever going to be 0 or 1, so I was trying to find the smallest change. But as you explained, there is already an interesting case where it goes to 2, and modelling it that way removes a weird hack, so it's a net improvement over the unusual 'owner' concept. +1 for your version. Are there any further tidying or other improvements you want to make? Typos in comments: s/desroyed/destroyed/ s/receiveing/receiving/
On 31/01/2024 10:54, Thomas Munro wrote: > On Wed, Nov 29, 2023 at 1:42 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> I spent some more time digging into this, experimenting with different >> approaches. Came up with pretty significant changes; see below: > > Hi Heikki, > > I think this approach is good. As I wrote in the first email, I had > briefly considered reference counting, but at the time I figured there > wasn't much point if it's only ever going to be 0 or 1, so I was > trying to find the smallest change. But as you explained, there is > already an interesting case where it goes to 2, and modelling it that > way removes a weird hack, so it's a net improvement over the unusual > 'owner' concept. +1 for your version. Are there any further tidying > or other improvements you want to make? Ok, no, this is good to go then. I'll rebase, fix the typos, run the regression tests again, and push this shortly. Thanks! -- Heikki Linnakangas Neon (https://neon.tech)