Thread: Extending SMgrRelation lifetimes

Extending SMgrRelation lifetimes

From

Thomas Munro

Date:

14 August 2023, 02:41:56

Hi,

SMgrRelationData objects don't currently have a defined lifetime, so
it's hard to know when the result of smgropen() might become a
dangling pointer.  This has caused a few bugs in the past, and the
usual fix is to just call smgropen() more often and not hold onto
pointers.  If you're doing that frequently enough, the hash table
lookups can show up in profiles.  I'm interested in this topic for
more than just micro-optimisations, though: in order to be able to
batch/merge smgr operations, I'd like to be able to collect them in
data structures that survive more than just a few lines of code.
(Examples to follow in later emails).

The simplest idea seems to be to tie object lifetime to transactions
using the existing AtEOXact_SMgr() mechanism.  In recovery, the
obvious corresponding time would be the commit/abort record that
destroys the storage.

This could be achieved by extending smgrrelease().  That was a
solution to the same problem in a narrower context: we didn't want
CFIs to randomly free SMgrRelations, but we needed to be able to
force-close fds in other backends, to fix various edge cases.

The new idea is to overload smgrrelease(it) so that it also clears the
owner, which means that AtEOXact_SMgr() will eventually smgrclose(it),
unless it is re-owned by a relation before then.  That choice stems
from the complete lack of information available via sinval in the case
of an overflow.  We must (1) close all descriptors because any file
might have been unlinked, (2) keep all pointers valid and yet (3) not
leak dropped smgr objects forever.  In this patch, smgrreleaseall()
achieves those goals.

Proof-of-concept patch attached.  Are there holes in this scheme?
Better ideas welcome.  In terms of spelling, another option would be
to change the behaviour of smgrclose() to work as described, ie it
would call smgrrelease() and then also disown, so we don't have to
change most of those callers, and then add a new function
smgrdestroy() for the few places that truly need it.  Or something
like that.

Other completely different ideas I've bounced around with various
hackers and decided against: references counts, "holder" objects that
can be an "owner" (like Relation, but when you don't have a Relation)
but can re-open on demand.  Seemed needlessly complicated.

While studying this I noticed a minor thinko in smgrrelease() in
15+16, so here's a fix for that also.  I haven't figured out a
sequence that makes anything bad happen, but we should really
invalidate smgr_targblock when a relfilenode is reused, since it might
point past the end.  This becomes more obvious once smgrrelease() is
used for truncation, as proposed here.

Attachment

Re: Extending SMgrRelation lifetimes

From

Heikki Linnakangas

Date:

15 August 2023, 16:11:38

On 14/08/2023 05:41, Thomas Munro wrote:
> The new idea is to overload smgrrelease(it) so that it also clears the
> owner, which means that AtEOXact_SMgr() will eventually smgrclose(it),
> unless it is re-owned by a relation before then.  That choice stems
> from the complete lack of information available via sinval in the case
> of an overflow.  We must (1) close all descriptors because any file
> might have been unlinked, (2) keep all pointers valid and yet (3) not
> leak dropped smgr objects forever.  In this patch, smgrreleaseall()
> achieves those goals.

Makes sense.

Some of the smgrclose() calls could perhaps still be smgrclose(). If you 
can guarantee that no-one is holding the SMgrRelation, it's still OK to 
call smgrclose(). There's little gain from closing earlier, though.

> Proof-of-concept patch attached.  Are there holes in this scheme?
> Better ideas welcome.  In terms of spelling, another option would be
> to change the behaviour of smgrclose() to work as described, ie it
> would call smgrrelease() and then also disown, so we don't have to
> change most of those callers, and then add a new function
> smgrdestroy() for the few places that truly need it.  Or something
> like that.

If you change smgrclose() to do what smgrrelease() does now, then it 
will apply automatically to extensions.

If an extension is currently using smgropen()/smgrclose() correctly, 
this patch alone won't make it wrong, so it's not very critical for 
extensions to adopt the change. However, if after this we consider it OK 
to hold a pointer to SMgrRelation for longer, and start doing that in 
the backend, then extensions need to be adapted too.

> While studying this I noticed a minor thinko in smgrrelease() in
> 15+16, so here's a fix for that also.  I haven't figured out a
> sequence that makes anything bad happen, but we should really
> invalidate smgr_targblock when a relfilenode is reused, since it might
> point past the end.  This becomes more obvious once smgrrelease() is
> used for truncation, as proposed here.

+1. You can move the smgr_targblock clearing out of the loop, though.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Extending SMgrRelation lifetimes

From

Thomas Munro

Date:

17 August 2023, 05:10:18

On Wed, Aug 16, 2023 at 4:11 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Makes sense.

Thanks for looking!

> If you change smgrclose() to do what smgrrelease() does now, then it
> will apply automatically to extensions.
>
> If an extension is currently using smgropen()/smgrclose() correctly,
> this patch alone won't make it wrong, so it's not very critical for
> extensions to adopt the change. However, if after this we consider it OK
> to hold a pointer to SMgrRelation for longer, and start doing that in
> the backend, then extensions need to be adapted too.

Yeah, that sounds quite compelling.  Let's try that way:

 * smgrrelease() is removed
 * smgrclose() now releases resources, but doesn't destroy until EOX
 * smgrdestroy() now frees memory, and should rarely be used

Still WIP while I think about edge cases, but so far I think this is
the better option.

> > While studying this I noticed a minor thinko in smgrrelease() in
> > 15+16, so here's a fix for that also.  I haven't figured out a
> > sequence that makes anything bad happen, but we should really
> > invalidate smgr_targblock when a relfilenode is reused, since it might
> > point past the end.  This becomes more obvious once smgrrelease() is
> > used for truncation, as proposed here.
>
> +1. You can move the smgr_targblock clearing out of the loop, though.

Right, thanks.  Pushed.

Attachment

v2-0001-Give-SMgrRelation-pointers-a-well-defined-lifetim.patch

Re: Extending SMgrRelation lifetimes

From

Robert Haas

Date:

17 August 2023, 14:30:17

On Thu, Aug 17, 2023 at 1:11 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Still WIP while I think about edge cases, but so far I think this is
> the better option.

I think this direction makes a lot of sense. The lack of a defined
lifetime for SMgrRelation objects makes correct programming difficult,
makes efficient programming difficult, and doesn't really have any
advantages. I know this is just a WIP patch but for the final version
I think it would make sense to try to do a bit more work on the
comments. For instance:

- In src/include/utils/rel.h, instead of just deleting that comment,
how about documenting the new object lifetime? Or maybe that comment
belongs elsewhere, but I think it would definitely good to spell it
out very explicitly at some suitable place.

- When we change smgrcloseall() to smgrdestroyall(), maybe it's worth
spelling out why destroying is needed and not just closing. For
example, the second hunk in bgwriter.c includes a comment that says
"After any checkpoint, close all smgr files. This is so we won't hang
onto smgr references to deleted files indefinitely." But maybe it
should say something like "After any checkpoint, close all smgr files
and destroy the associated smgr objects. This guarantees that we close
the actual file descriptors, that we close the File objects as managed
by fd.c, and that we also destroy the smgr objects. We don't want any
of these resources to stick around indefinitely after a relation file
has been deleted."

- Maybe it's worth adding comments around some of the smgrclose[all]
calls to mentioning that in those cases we want the file descriptors
(and File objects?) to get closed but don't want to invalidate
pointers. But I'm less sure that this is necessary. I don't want to
have a million comments everywhere, just enough that someone looking
at this stuff in the future can orient themselves about what's going
on without too much difficulty.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Extending SMgrRelation lifetimes

From

Thomas Munro

Date:

23 August 2023, 04:54:49

On Fri, Aug 18, 2023 at 2:30 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I think this direction makes a lot of sense. The lack of a defined
> lifetime for SMgrRelation objects makes correct programming difficult,
> makes efficient programming difficult, and doesn't really have any
> advantages.

Thanks for looking!

> I know this is just a WIP patch but for the final version
> I think it would make sense to try to do a bit more work on the
> comments. For instance:
>
> - In src/include/utils/rel.h, instead of just deleting that comment,
> how about documenting the new object lifetime? Or maybe that comment
> belongs elsewhere, but I think it would definitely good to spell it
> out very explicitly at some suitable place.

Right, let's one find one good place.  I think smgropen() would be best.

> - When we change smgrcloseall() to smgrdestroyall(), maybe it's worth
> spelling out why destroying is needed and not just closing. For
> example, the second hunk in bgwriter.c includes a comment that says
> "After any checkpoint, close all smgr files. This is so we won't hang
> onto smgr references to deleted files indefinitely." But maybe it
> should say something like "After any checkpoint, close all smgr files
> and destroy the associated smgr objects. This guarantees that we close
> the actual file descriptors, that we close the File objects as managed
> by fd.c, and that we also destroy the smgr objects. We don't want any
> of these resources to stick around indefinitely after a relation file
> has been deleted."

There are several similar comments.  I think they can be divided into
two categories:

1.  The error-path ones, that we should now just delete along with the
code they describe, because the "various strange errors" should have
been fixed comprehensively by PROCSIGNAL_BARRIER_SMGRRELEASE.  Here is
a separate patch to do that.

2.  The per-checkpoint ones that still make sense to avoid unbounded
resource usage.  Here is a new attempt at explaining:

                        /*
-                        * After any checkpoint, close all smgr files.
This is so we
-                        * won't hang onto smgr references to deleted
files indefinitely.
+                        * After any checkpoint, free all smgr
objects.  Otherwise we
+                        * would never do so for dropped relations, as
the checkpointer
+                        * does not process shared invalidation messages or call
+                        * AtEOXact_SMgr().
                         */
-                       smgrcloseall();
+                       smgrdestroyall();

> - Maybe it's worth adding comments around some of the smgrclose[all]
> calls to mentioning that in those cases we want the file descriptors
> (and File objects?) to get closed but don't want to invalidate
> pointers. But I'm less sure that this is necessary. I don't want to
> have a million comments everywhere, just enough that someone looking
> at this stuff in the future can orient themselves about what's going
> on without too much difficulty.

I covered that with the following comment for smgrclose():

+ * The object remains valid, but is moved to the unknown list where it will
+ * be destroyed by AtEOXact_SMgr().  It may be re-owned if it is accessed by a
+ * relation before then.

Attachment

Re: Extending SMgrRelation lifetimes

From

Robert Haas

Date:

18 September 2023, 15:19:36

I think that if you believe 0001 to be correct you should go ahead and
commit it sooner rather than later. If you're wrong and something
weird starts happening we'll then have a chance to notice that before
too much other stuff gets changed on top of this and confuses the
matter.

On Wed, Aug 23, 2023 at 12:55 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> Right, let's one find one good place.  I think smgropen() would be best.

I think it would be a good idea to give this comment a bit more oomph.
In lieu of this:

+ * This does not attempt to actually open the underlying files.  The returned
+ * object remains valid at least until AtEOXact_SMgr() is called, or until
+ * smgrdestroy() is called in non-transactional backends.

I would leave the existing "This does not attempt to actually open the
underlying files." comment as a separate comment, and add something
like this as a new paragraph:

In versions of PostgreSQL prior to 17, this function returned an
object with no defined lifetime. Now, however, the object remains
valid for the lifetime of the transaction, up to the point where
AtEOXact_SMgr() is called, making it much easier for callers to know
for how long they can hold on to a pointer to the returned object. If
this function is called outside of a transaction, the object remains
valid until smgrdestroy() or smgrdestroyall() is called. Background
processes that use smgr but not transactions typically do this once
per checkpoint cycle.

Apart from that, the main thing that is bothering me is that the
justification for redefining smgrclose() to do what smgrrelease() now
does isn't really spelled out anywhere. You mentioned some reasons and
Heikki mentioned the benefit to extensions, but I feel like somebody
should be able to understand the reasoning clearly from reading the
commit message or the comments in the patch, rather than having to
visit the mailing list discussion, and I'm not sure we're there yet. I
feel like I understood why we were doing this and was convinced it was
a good idea at some point, but now the reasoning has gone out of my
head and I can't recover it. If somebody does smgropen() .. stuff ...
smgrclose() as in heapam_relation_copy_data() or index_copy_data(),
this change has the effect of making the SmgrRelation remain valid
until eoxact whereas before it would have been destroyed instantly. Is
that what we want? Presumably yes, or this patch wouldn't be shaped
like this, but I don't know why we want that...

Another thing that seems a bit puzzling is how this is intended to, or
does, interact with the ownership mechanism. Intuitively, I feel like
a Relation owns an SMgrRelation *because* the SMgrRelation has no
defined lifetime. But I think that's not quite right. I guess the
ownership mechanism doesn't guarantee anything about the lifetime of
the object, just that the pointer in the Relation won't hang around
any longer than the object to which it's pointing. So does that mean
that we're free to redefine the object lifetime to be pretty much
anything we want and that mechanism doesn't really care? Huh,
interesting.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Extending SMgrRelation lifetimes

From

Heikki Linnakangas

Date:

29 November 2023, 12:41:57

I spent some more time digging into this, experimenting with different 
approaches. Came up with pretty significant changes; see below:

On 18/09/2023 18:19, Robert Haas wrote:
> I think that if you believe 0001 to be correct you should go ahead and
> commit it sooner rather than later. If you're wrong and something
> weird starts happening we'll then have a chance to notice that before
> too much other stuff gets changed on top of this and confuses the
> matter.

+1

> On Wed, Aug 23, 2023 at 12:55 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Right, let's one find one good place.  I think smgropen() would be best.
> 
> I think it would be a good idea to give this comment a bit more oomph.
> In lieu of this:
> 
> + * This does not attempt to actually open the underlying files.  The returned
> + * object remains valid at least until AtEOXact_SMgr() is called, or until
> + * smgrdestroy() is called in non-transactional backends.
> 
> I would leave the existing "This does not attempt to actually open the
> underlying files." comment as a separate comment, and add something
> like this as a new paragraph:
> 
> In versions of PostgreSQL prior to 17, this function returned an
> object with no defined lifetime. Now, however, the object remains
> valid for the lifetime of the transaction, up to the point where
> AtEOXact_SMgr() is called, making it much easier for callers to know
> for how long they can hold on to a pointer to the returned object. If
> this function is called outside of a transaction, the object remains
> valid until smgrdestroy() or smgrdestroyall() is called. Background
> processes that use smgr but not transactions typically do this once
> per checkpoint cycle.

+1

> Apart from that, the main thing that is bothering me is that the
> justification for redefining smgrclose() to do what smgrrelease() now
> does isn't really spelled out anywhere. You mentioned some reasons and
> Heikki mentioned the benefit to extensions, but I feel like somebody
> should be able to understand the reasoning clearly from reading the
> commit message or the comments in the patch, rather than having to
> visit the mailing list discussion, and I'm not sure we're there yet. I
> feel like I understood why we were doing this and was convinced it was
> a good idea at some point, but now the reasoning has gone out of my
> head and I can't recover it. If somebody does smgropen() .. stuff ...
> smgrclose() as in heapam_relation_copy_data() or index_copy_data(),
> this change has the effect of making the SmgrRelation remain valid
> until eoxact whereas before it would have been destroyed instantly. Is
> that what we want? Presumably yes, or this patch wouldn't be shaped
> like this, but I don't know why we want that...

Fair. I tried to address that by adding an overview comment at top of 
smgr.c, explaining how this stuff work. I hope that helps.

> Another thing that seems a bit puzzling is how this is intended to, or
> does, interact with the ownership mechanism. Intuitively, I feel like
> a Relation owns an SMgrRelation *because* the SMgrRelation has no
> defined lifetime. But I think that's not quite right. I guess the
> ownership mechanism doesn't guarantee anything about the lifetime of
> the object, just that the pointer in the Relation won't hang around
> any longer than the object to which it's pointing. So does that mean
> that we're free to redefine the object lifetime to be pretty much
> anything we want and that mechanism doesn't really care? Huh,
> interesting.

Yeah that owner mechanism is weird. It guarantees that the pointer to 
the SMgrRelation is cleared when the SMgrRelation is destroyed. But it 
also prevents the SMgrRelation from being destroyed at end of 
transaction. That's how it is in 'master' too.

But with this patch, we don't normally call smgrdestroy() on an 
SMgrRelation that came from the relation cache. We do call 
smgrdestroyall() in the aux processes, but they don't have a relcache. 
So the real effect of setting the owner now is to prevent the 
SMgrRelation from being destroyed at end of transaction; the mechanism 
of clearing the pointer is unused.

I found two exceptions to that, though, by adding some extra assertions 
and running the regression tests:

1. The smgrdestroyall() in a single-user backend in RequestCheckpoint(). 
It destroys SMgrRelations belonging to relcache entries, and the owner 
mechanism clears the pointers from the relcache. I think smgrcloseall(), 
or doing nothing, would actually be more appropriate there.

2. A funny case with foreign tables: ANALYZE on a foreign table calls 
visibilitymap_count(). A foreign table has no visibility map so it 
returns 0, but before doing so it calls RelationGetSmgr on the foreign 
table, which has 0/0/0 rellocator. That creates an SMgrRelation for 
0/0/0, and sets the foreign table's relcache entry as its owner. If you 
then call ANALYZE on another foreign table, it also calls 
RelationGetSmgr with 0/0/0 rellocator, returning the same SMgrRelation 
entry, and changes its owner to the new relcache entry. That doesn't 
make much sense and it's pretty accidental that it works at all, so 
attached is a patch to avoid calling visibilitymap_count() on foreign 
tables.

I propose that we replace the single owner with a "pin counter". One 
SMgrRelation can have more than one pin on it, and the guarantee is that 
as long as the pin counter is non-zero, the SMgrRelation cannot be 
destroyed and the pointer remains valid. We don't really need the 
capability for more than one pin at the moment (the regression tests 
pass with an extra assertion that pincount <= 1 after fixing the foreign 
table issue), but it seems more straightforward than tracking an owner.

Here's another reason to do that: I noticed this at the end of 
swap_relation_files():

>     /*
>      * Close both relcache entries' smgr links.  We need this kluge because
>      * both links will be invalidated during upcoming CommandCounterIncrement.
>      * Whichever of the rels is the second to be cleared will have a dangling
>      * reference to the other's smgr entry.  Rather than trying to avoid this
>      * by ordering operations just so, it's easiest to close the links first.
>      * (Fortunately, since one of the entries is local in our transaction,
>      * it's sufficient to clear out our own relcache this way; the problem
>      * cannot arise for other backends when they see our update on the
>      * non-transient relation.)
>      *
>      * Caution: the placement of this step interacts with the decision to
>      * handle toast rels by recursion.  When we are trying to rebuild pg_class
>      * itself, the smgr close on pg_class must happen after all accesses in
>      * this function.
>      */
>     RelationCloseSmgrByOid(r1);
>     RelationCloseSmgrByOid(r2);

If RelationCloseSmgrByOid() merely closes the underlying file descriptor 
but doesn't destroy the SMgrRelation object - as it does with these 
patches - I think we reintroduce the dangling reference problem that the 
comment mentions. But if we allow the same SMgrRelation to be pinned by 
two different relcache entries, the problem goes away and we can remove 
that kluge.

I think we're missing test coverage for that though. I commented out 
those calls in 'master' and ran the regression tests, but got no 
failures. I don't fully understand the problem anyway. Or does it not 
exist anymore? Is there a moment where we have two relcache entries 
pointing to the same SMgrRelation? I don't see it. In any case, with a 
pin counter mechanism, I believe it would be fine.

Summary of the changes to the attached main patch:

* Added an overview comment at top of smgr.c

* Added the comment Robert suggested to smgropen()

* Replaced the single owner with a pin count and smgrpin() / smgrunpin() 
functions. smgrdestroyall() now only destroys unpinned entries

* Removed that kluge from swap_relation_files(). It should not be needed 
anymore with the pin counter.

* Changed a few places in bufmgr.c where we called RelationGetSmgr() on 
every smgr call to keep the SMgrRelation in a local variable. That was
not safe before, but is now. I don't think we should go on a spree to 
change all callers - RelationGetSmgr() is still cheap - but in these few 
places it seems worthwhile.

* I kept the separate smgrclose() and smgrrelease() functions. I know I 
suggested to just change smgrclose() to do what smgrrelease() did, but 
on second thoughts keeping them separate seems nicer. However, 
sgmgrclose() just calls smgrrelease() now, so the distinction is just 
pro forma. The idea is that if you call smgrclose(), you hint that you 
don't need that SMgrRelation pointer anymore, although there might be 
other pointers to the same object and they stay valid.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

Re: Extending SMgrRelation lifetimes

From

Heikki Linnakangas

Date:

08 December 2023, 07:20:26

On 29/11/2023 14:41, Heikki Linnakangas wrote:
> 2. A funny case with foreign tables: ANALYZE on a foreign table calls
> visibilitymap_count(). A foreign table has no visibility map so it
> returns 0, but before doing so it calls RelationGetSmgr on the foreign
> table, which has 0/0/0 rellocator. That creates an SMgrRelation for
> 0/0/0, and sets the foreign table's relcache entry as its owner. If you
> then call ANALYZE on another foreign table, it also calls
> RelationGetSmgr with 0/0/0 rellocator, returning the same SMgrRelation
> entry, and changes its owner to the new relcache entry. That doesn't
> make much sense and it's pretty accidental that it works at all, so
> attached is a patch to avoid calling visibilitymap_count() on foreign
> tables.

This patch seems uncontroversial and independent of the others, so I 
committed it.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Extending SMgrRelation lifetimes

From

Thomas Munro

Date:

31 January 2024, 08:54:58

On Wed, Nov 29, 2023 at 1:42 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I spent some more time digging into this, experimenting with different
> approaches. Came up with pretty significant changes; see below:

Hi Heikki,

I think this approach is good.  As I wrote in the first email, I had
briefly considered reference counting, but at the time I figured there
wasn't much point if it's only ever going to be 0 or 1, so I was
trying to find the smallest change.  But as you explained, there is
already an interesting case where it goes to 2, and modelling it that
way removes a weird hack, so it's a net improvement over the unusual
'owner' concept.  +1 for your version.  Are there any further tidying
or other improvements you want to make?

Typos in comments:

s/desroyed/destroyed/
s/receiveing/receiving/

Re: Extending SMgrRelation lifetimes

From

Heikki Linnakangas

Date:

31 January 2024, 10:37:22

On 31/01/2024 10:54, Thomas Munro wrote:
> On Wed, Nov 29, 2023 at 1:42 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I spent some more time digging into this, experimenting with different
>> approaches. Came up with pretty significant changes; see below:
> 
> Hi Heikki,
> 
> I think this approach is good.  As I wrote in the first email, I had
> briefly considered reference counting, but at the time I figured there
> wasn't much point if it's only ever going to be 0 or 1, so I was
> trying to find the smallest change.  But as you explained, there is
> already an interesting case where it goes to 2, and modelling it that
> way removes a weird hack, so it's a net improvement over the unusual
> 'owner' concept.  +1 for your version.  Are there any further tidying
> or other improvements you want to make?

Ok, no, this is good to go then. I'll rebase, fix the typos, run the 
regression tests again, and push this shortly. Thanks!

-- 
Heikki Linnakangas
Neon (https://neon.tech)