Re: Extending SMgrRelation lifetimes - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Extending SMgrRelation lifetimes
Date
Msg-id 621a52fd-3cd8-4f5d-a561-d510b853bbaf@iki.fi
Whole thread Raw
In response to Re: Extending SMgrRelation lifetimes  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Extending SMgrRelation lifetimes
Re: Extending SMgrRelation lifetimes
List pgsql-hackers
I spent some more time digging into this, experimenting with different 
approaches. Came up with pretty significant changes; see below:

On 18/09/2023 18:19, Robert Haas wrote:
> I think that if you believe 0001 to be correct you should go ahead and
> commit it sooner rather than later. If you're wrong and something
> weird starts happening we'll then have a chance to notice that before
> too much other stuff gets changed on top of this and confuses the
> matter.

+1

> On Wed, Aug 23, 2023 at 12:55 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Right, let's one find one good place.  I think smgropen() would be best.
> 
> I think it would be a good idea to give this comment a bit more oomph.
> In lieu of this:
> 
> + * This does not attempt to actually open the underlying files.  The returned
> + * object remains valid at least until AtEOXact_SMgr() is called, or until
> + * smgrdestroy() is called in non-transactional backends.
> 
> I would leave the existing "This does not attempt to actually open the
> underlying files." comment as a separate comment, and add something
> like this as a new paragraph:
> 
> In versions of PostgreSQL prior to 17, this function returned an
> object with no defined lifetime. Now, however, the object remains
> valid for the lifetime of the transaction, up to the point where
> AtEOXact_SMgr() is called, making it much easier for callers to know
> for how long they can hold on to a pointer to the returned object. If
> this function is called outside of a transaction, the object remains
> valid until smgrdestroy() or smgrdestroyall() is called. Background
> processes that use smgr but not transactions typically do this once
> per checkpoint cycle.

+1

> Apart from that, the main thing that is bothering me is that the
> justification for redefining smgrclose() to do what smgrrelease() now
> does isn't really spelled out anywhere. You mentioned some reasons and
> Heikki mentioned the benefit to extensions, but I feel like somebody
> should be able to understand the reasoning clearly from reading the
> commit message or the comments in the patch, rather than having to
> visit the mailing list discussion, and I'm not sure we're there yet. I
> feel like I understood why we were doing this and was convinced it was
> a good idea at some point, but now the reasoning has gone out of my
> head and I can't recover it. If somebody does smgropen() .. stuff ...
> smgrclose() as in heapam_relation_copy_data() or index_copy_data(),
> this change has the effect of making the SmgrRelation remain valid
> until eoxact whereas before it would have been destroyed instantly. Is
> that what we want? Presumably yes, or this patch wouldn't be shaped
> like this, but I don't know why we want that...

Fair. I tried to address that by adding an overview comment at top of 
smgr.c, explaining how this stuff work. I hope that helps.

> Another thing that seems a bit puzzling is how this is intended to, or
> does, interact with the ownership mechanism. Intuitively, I feel like
> a Relation owns an SMgrRelation *because* the SMgrRelation has no
> defined lifetime. But I think that's not quite right. I guess the
> ownership mechanism doesn't guarantee anything about the lifetime of
> the object, just that the pointer in the Relation won't hang around
> any longer than the object to which it's pointing. So does that mean
> that we're free to redefine the object lifetime to be pretty much
> anything we want and that mechanism doesn't really care? Huh,
> interesting.

Yeah that owner mechanism is weird. It guarantees that the pointer to 
the SMgrRelation is cleared when the SMgrRelation is destroyed. But it 
also prevents the SMgrRelation from being destroyed at end of 
transaction. That's how it is in 'master' too.

But with this patch, we don't normally call smgrdestroy() on an 
SMgrRelation that came from the relation cache. We do call 
smgrdestroyall() in the aux processes, but they don't have a relcache. 
So the real effect of setting the owner now is to prevent the 
SMgrRelation from being destroyed at end of transaction; the mechanism 
of clearing the pointer is unused.

I found two exceptions to that, though, by adding some extra assertions 
and running the regression tests:

1. The smgrdestroyall() in a single-user backend in RequestCheckpoint(). 
It destroys SMgrRelations belonging to relcache entries, and the owner 
mechanism clears the pointers from the relcache. I think smgrcloseall(), 
or doing nothing, would actually be more appropriate there.

2. A funny case with foreign tables: ANALYZE on a foreign table calls 
visibilitymap_count(). A foreign table has no visibility map so it 
returns 0, but before doing so it calls RelationGetSmgr on the foreign 
table, which has 0/0/0 rellocator. That creates an SMgrRelation for 
0/0/0, and sets the foreign table's relcache entry as its owner. If you 
then call ANALYZE on another foreign table, it also calls 
RelationGetSmgr with 0/0/0 rellocator, returning the same SMgrRelation 
entry, and changes its owner to the new relcache entry. That doesn't 
make much sense and it's pretty accidental that it works at all, so 
attached is a patch to avoid calling visibilitymap_count() on foreign 
tables.

I propose that we replace the single owner with a "pin counter". One 
SMgrRelation can have more than one pin on it, and the guarantee is that 
as long as the pin counter is non-zero, the SMgrRelation cannot be 
destroyed and the pointer remains valid. We don't really need the 
capability for more than one pin at the moment (the regression tests 
pass with an extra assertion that pincount <= 1 after fixing the foreign 
table issue), but it seems more straightforward than tracking an owner.

Here's another reason to do that: I noticed this at the end of 
swap_relation_files():

>     /*
>      * Close both relcache entries' smgr links.  We need this kluge because
>      * both links will be invalidated during upcoming CommandCounterIncrement.
>      * Whichever of the rels is the second to be cleared will have a dangling
>      * reference to the other's smgr entry.  Rather than trying to avoid this
>      * by ordering operations just so, it's easiest to close the links first.
>      * (Fortunately, since one of the entries is local in our transaction,
>      * it's sufficient to clear out our own relcache this way; the problem
>      * cannot arise for other backends when they see our update on the
>      * non-transient relation.)
>      *
>      * Caution: the placement of this step interacts with the decision to
>      * handle toast rels by recursion.  When we are trying to rebuild pg_class
>      * itself, the smgr close on pg_class must happen after all accesses in
>      * this function.
>      */
>     RelationCloseSmgrByOid(r1);
>     RelationCloseSmgrByOid(r2);

If RelationCloseSmgrByOid() merely closes the underlying file descriptor 
but doesn't destroy the SMgrRelation object - as it does with these 
patches - I think we reintroduce the dangling reference problem that the 
comment mentions. But if we allow the same SMgrRelation to be pinned by 
two different relcache entries, the problem goes away and we can remove 
that kluge.

I think we're missing test coverage for that though. I commented out 
those calls in 'master' and ran the regression tests, but got no 
failures. I don't fully understand the problem anyway. Or does it not 
exist anymore? Is there a moment where we have two relcache entries 
pointing to the same SMgrRelation? I don't see it. In any case, with a 
pin counter mechanism, I believe it would be fine.


Summary of the changes to the attached main patch:

* Added an overview comment at top of smgr.c

* Added the comment Robert suggested to smgropen()

* Replaced the single owner with a pin count and smgrpin() / smgrunpin() 
functions. smgrdestroyall() now only destroys unpinned entries

* Removed that kluge from swap_relation_files(). It should not be needed 
anymore with the pin counter.

* Changed a few places in bufmgr.c where we called RelationGetSmgr() on 
every smgr call to keep the SMgrRelation in a local variable. That was
not safe before, but is now. I don't think we should go on a spree to 
change all callers - RelationGetSmgr() is still cheap - but in these few 
places it seems worthwhile.

* I kept the separate smgrclose() and smgrrelease() functions. I know I 
suggested to just change smgrclose() to do what smgrrelease() did, but 
on second thoughts keeping them separate seems nicer. However, 
sgmgrclose() just calls smgrrelease() now, so the distinction is just 
pro forma. The idea is that if you call smgrclose(), you hint that you 
don't need that SMgrRelation pointer anymore, although there might be 
other pointers to the same object and they stay valid.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: remaining sql/json patches
Next
From: Tomas Vondra
Date:
Subject: Re: logical decoding and replication of sequences, take 2