Re: ResourceOwner refactoring - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: ResourceOwner refactoring
Date
Msg-id 7d9fa5cf-07e3-4691-b40a-802048a0b998@iki.fi
Whole thread Raw
In response to Re: ResourceOwner refactoring  (Alexander Lakhin <exclusion@gmail.com>)
Responses Re: ResourceOwner refactoring
List pgsql-hackers
Thanks for the testing!

On 01/06/2024 11:00, Alexander Lakhin wrote:
> Hello Heikki,
> 
> I've stumbled upon yet another anomaly introduced with b8bff07da.
> 
> Please try the following script:
> SELECT 'init' FROM pg_create_logical_replication_slot('slot',
>     'test_decoding');
> CREATE TABLE tbl(t text);
> BEGIN;
> TRUNCATE table tbl;
> 
> SELECT data FROM pg_logical_slot_get_changes('slot', NULL, NULL,
>    'include-xids', '0', 'skip-empty-xacts', '1', 'stream-changes', '1');
> 
> On b8bff07da (and the current HEAD), it ends up with:
> ERROR:  ResourceOwnerEnlarge called after release started
> 
> But on the previous commit, I get no error:
>    data
> ------
> (0 rows)

This is another case of the issue I tried to fix in commit b8bff07da 
with this:

> --- b/src/backend/access/transam/xact.c
> +++ a/src/backend/access/transam/xact.c
> @@ -5279,7 +5279,20 @@
>  
>          AtEOSubXact_RelationCshould be ache(false, s->subTransactionId,
>                                    s->parent->subTransactionId);
> +
> +
> +        /*
> +         * AtEOSubXact_Inval sometimes needs to temporarily bump the refcount
> +         * on the relcache entries that it processes.  We cannot use the
> +         * subtransaction's resource owner anymore, because we've already
> +         * started releasing it.  But we can use the parent resource owner.
> +         */
> +        CurrentResourceOwner = s->parent->curTransactionOwner;
> +
>          AtEOSubXact_Inval(false);
> +
> +        CurrentResourceOwner = s->curTransactionOwner;
> +
>          ResourceOwnerRelease(s->curTransactionOwner,
>                               RESOURCE_RELEASE_LOCKS,
>                               false, false);

AtEOSubXact_Inval calls LocalExecuteInvalidationMessage(), which 
ultimately calls RelationFlushRelation(). Your test case reaches 
ReorderBufferExecuteInvalidations(), which also calls 
LocalExecuteInvalidationMessage(), in an already aborted subtransaction.

Looking closer at relcache.c, I think we need to make 
RelationFlushRelation() safe to call without a valid resource owner. 
There are already IsTransactionState() checks in 
RelationClearRelation(), and a comment noting that it does not touch 
CurrentResourceOwner. So we should make sure RelationFlushRelation() 
doesn't touch it either, and revert the above change to 
AbortTransaction(). I didn't feel very good about it in the first place, 
and now I see what the proper fix is.

A straightforward fix is to modify RelationFlushRelation() so that if 
!IsTransactionState(), it just marks the entry as invalid instead of 
calling RelationClearRelation(). That's what RelationClearRelation() 
would do anyway, if it didn't hit the assertion first.

RelationClearRelation() is complicated. Depending on the circumstances, 
it removes the relcache entry, rebuilds it, marks it as invalid, or 
performs a partial reload of a nailed relation. So before going for the 
straightforward fix, I'm going to see if I can refactor 
RelationClearRelation() to be less complicated.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: POC: GROUP BY optimization
Next
From: Michael Paquier
Date:
Subject: Re: Test to dump and restore objects left behind by regression