Thread: Re: Speed up TransactionIdIsCurrentTransactionId() with lots of subxacts

Re: Speed up TransactionIdIsCurrentTransactionId() with lots of subxacts

From
Bertrand Drouvot
Date:
Hi,

On Fri, Oct 11, 2024 at 12:21:12AM +0300, Heikki Linnakangas wrote:
> > +   98.52%    98.45%  postgres  postgres           [.]
> > TransactionIdIsCurrentTransactionId
> In this scenario with lots of active subtransactions,
> TransactionIdIsCurrentTranactionId effectively degenerates into a linear
> search over a linked list, as it traverses each level in the
> CurrentTransactionState stack.

Agree.

> The patch

Thanks for the patch!

> Instead of having a separate childXids array on each transaction level, we
> can maintain one large array of XIDs that includes the XIDs of all committed
> and in-progress subtransactions. On each nesting level, remember the offset
> within that array, so that all XIDs belonging to that nesting level or its
> parents are above that offset.
> When a subtransaction commits, you don't need
> to copy XIDs to the parent, you just adjust the parent's offset into the
> array, to include the child's XIDs.

Yeah, makes sense. And in case of subtransaction rollback, that's fine because
all the subtransactions down to this one can be/are "discarded".

> Patch attached,

A few random comments:

=== 1

+  parentOffset--;
+  parents[parentOffset]->totalXids = totalXids;

what about "parents[--parentOffset]->totalXids = totalXids;" instead?

That would remove the ability for someone to insert code between the decrement
and the array access.

=== 2

+  newsize = 32;
+  CurrentTransactionIds = MemoryContextAlloc(TopTransactionContext,
+                                             32 * sizeof(TransactionId));

what about defining a macro for "32" (as it is used in multiple places in
xact.c)?

=== 3

-       /* Copy data into output area. */
+               /*
+                * In CurrentTransactoinIds,

s/CurrentTransactoinIds/CurrentTransactionIds/

4 ===

 int
 xactGetCommittedChildren(TransactionId **ptr)
 {
-       TransactionState s = CurrentTransactionState;
+       return GetCommittedChildren(CurrentTransactionState, ptr);

Worth to move this part of the comment on top of xactGetCommittedChildren(),

"
If there are no subxacts, *ptr is set to NULL.
"

on top of GetCommittedChildren() instead?

5 ===

 static void
 AtSubAbort_childXids(void)
 {
-       TransactionState s = CurrentTransactionState;
-
-       /*
-        * We keep the child-XID arrays in TopTransactionContext (see
-        * AtSubCommit_childXids).  This means we'd better free the array
-        * explicitly at abort to avoid leakage.
-        */
-       if (s->childXids != NULL)
-               pfree(s->childXids);
-       s->childXids = NULL;
-       s->nChildXids = 0;
-       s->maxChildXids = 0;
+       /* nothing to do */

Yeah that works but then some CurrentTransactionIds[] elements do not reflect
the "reality" anymore (until the top level transaction finishes, or until a
new subtransaction is created). 

Could'nt that be an issue from a code maintability point of view?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Speed up TransactionIdIsCurrentTransactionId() with lots of subxacts

From
Heikki Linnakangas
Date:
On 13/11/2024 18:08, Bertrand Drouvot wrote:
> === 1
> 
> +  parentOffset--;
> +  parents[parentOffset]->totalXids = totalXids;
> 
> what about "parents[--parentOffset]->totalXids = totalXids;" instead?
> 
> That would remove the ability for someone to insert code between the decrement
> and the array access.

IMHO it's good as it is. There's no particular harm if someone inserts 
code there, is there?

I wonder if we should add a "child" pointer to TransactionStateData 
though. That would make it unnecessary to build the temporary array 
here, and it could be used to avoid the recursion in 
ShowTransactionStateRec().

> === 2
> 
> +  newsize = 32;
> +  CurrentTransactionIds = MemoryContextAlloc(TopTransactionContext,
> +                                             32 * sizeof(TransactionId));
> 
> what about defining a macro for "32" (as it is used in multiple places in
> xact.c)?

The only other place is in AtStart_Memory(), but there it's used for a 
different thing. I think a constant is fine here. It'd probably be good 
to change "32 * sizeof(TransactionId)" to "newsize * 
sizeof(TransactionId)" though.

> === 3
> 
> -       /* Copy data into output area. */
> +               /*
> +                * In CurrentTransactoinIds,
> 
> s/CurrentTransactoinIds/CurrentTransactionIds/

thanks!

> 4 ===
> 
>   int
>   xactGetCommittedChildren(TransactionId **ptr)
>   {
> -       TransactionState s = CurrentTransactionState;
> +       return GetCommittedChildren(CurrentTransactionState, ptr);
> 
> Worth to move this part of the comment on top of xactGetCommittedChildren(),
> 
> "
> If there are no subxacts, *ptr is set to NULL.
> "
> 
> on top of GetCommittedChildren() instead?

I don't see why. The interface is described in 
xactGetCommittedChildren() now, and GetCommittedChildren() just notes 
that it's an internal version of xactGetCommittedChildren(). It seems 
clear to me that you should look at xactGetCommittedChildren() for more 
information.

> 5 ===
> 
>   static void
>   AtSubAbort_childXids(void)
>   {
> -       TransactionState s = CurrentTransactionState;
> -
> -       /*
> -        * We keep the child-XID arrays in TopTransactionContext (see
> -        * AtSubCommit_childXids).  This means we'd better free the array
> -        * explicitly at abort to avoid leakage.
> -        */
> -       if (s->childXids != NULL)
> -               pfree(s->childXids);
> -       s->childXids = NULL;
> -       s->nChildXids = 0;
> -       s->maxChildXids = 0;
> +       /* nothing to do */
> 
> Yeah that works but then some CurrentTransactionIds[] elements do not reflect
> the "reality" anymore (until the top level transaction finishes, or until a
> new subtransaction is created).
> 
> Could'nt that be an issue from a code maintability point of view?

Hmm, I don't know. The subtransaction is in the process of being 
aborted. Whether you consider its XID to still belong to the 
subtransaction until it's fully aborted is a matter of taste. If you 
consider that they still belong to the subtransaction until the 
subtransaction's TransactionState is free'd, it makes sense to leave 
them alone here.

Before this patch, it made sense to reset all those fields when the 
childXids array was free'd, but with this patch, the elements in 
CurrentTransactionIds[] are still valid until the TransactionState is 
free'd.

What would be the alternative? I don't think it makes sense to go out of 
our way to clear the elements in CurrentTransactionIds[]. We could set 
them to InvalidXid to make it clear they're not valid anymore, but no 
one is supposed to look at elements beyond totalXids anyway. We don't do 
such clearing at the end of top transaction either.

Thanks for the review!

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Speed up TransactionIdIsCurrentTransactionId() with lots of subxacts

From
Bertrand Drouvot
Date:
Hi,

On Thu, Nov 14, 2024 at 12:16:52AM +0200, Heikki Linnakangas wrote:
> On 13/11/2024 18:08, Bertrand Drouvot wrote:
> > === 1
> > 
> > +  parentOffset--;
> > +  parents[parentOffset]->totalXids = totalXids;
> > 
> > what about "parents[--parentOffset]->totalXids = totalXids;" instead?
> > 
> > That would remove the ability for someone to insert code between the decrement
> > and the array access.
> 
> IMHO it's good as it is. There's no particular harm if someone inserts code
> there, is there?

I think it all depends what the inserted code expects as parentOffset value.
I thought it could be better to just use a single line but this is probably just
a matter of taste afterall.

> I wonder if we should add a "child" pointer to TransactionStateData though.
> That would make it unnecessary to build the temporary array here, and it
> could be used to avoid the recursion in ShowTransactionStateRec().

Yeah, that's a good idea and that would also provide a more natural representation
of transactions relationships IMHO. I don't see cons doing so, that would not
add extra complexity, just the need to maintain both parent and child pointers
which should be simple enough. I think that would make the code simpler actually.

> > === 2
> > 
> > +  newsize = 32;
> > +  CurrentTransactionIds = MemoryContextAlloc(TopTransactionContext,
> > +                                             32 * sizeof(TransactionId));
> > 
> > what about defining a macro for "32" (as it is used in multiple places in
> > xact.c)?
> 
> The only other place is in AtStart_Memory(), but there it's used for a
> different thing. I think a constant is fine here. It'd probably be good to
> change "32 * sizeof(TransactionId)" to "newsize * sizeof(TransactionId)"
> though.

Yeah, +1 for using "newsize * sizeof(TransactionId)": that was the main idea
behind the define proposal (change the constant value at only one place if
needed).

> > 4 ===
> > 
> >   int
> >   xactGetCommittedChildren(TransactionId **ptr)
> >   {
> > -       TransactionState s = CurrentTransactionState;
> > +       return GetCommittedChildren(CurrentTransactionState, ptr);
> > 
> > Worth to move this part of the comment on top of xactGetCommittedChildren(),
> > 
> > "
> > If there are no subxacts, *ptr is set to NULL.
> > "
> > 
> > on top of GetCommittedChildren() instead?
> 
> I don't see why. The interface is described in xactGetCommittedChildren()
> now, and GetCommittedChildren() just notes that it's an internal version of
> xactGetCommittedChildren(). It seems clear to me that you should look at
> xactGetCommittedChildren() for more information.

Yeah, OTOH GetCommittedChildren() is the main reason of 
"If there are no subxacts, *ptr is set to NULL": so maybe instead of moving the
comment, just duplicate it. That's probably a Nit though. 

> > 5 ===
> > 
> >   static void
> >   AtSubAbort_childXids(void)
> >   {
> > -       TransactionState s = CurrentTransactionState;
> > -
> > -       /*
> > -        * We keep the child-XID arrays in TopTransactionContext (see
> > -        * AtSubCommit_childXids).  This means we'd better free the array
> > -        * explicitly at abort to avoid leakage.
> > -        */
> > -       if (s->childXids != NULL)
> > -               pfree(s->childXids);
> > -       s->childXids = NULL;
> > -       s->nChildXids = 0;
> > -       s->maxChildXids = 0;
> > +       /* nothing to do */
> > 
> > Yeah that works but then some CurrentTransactionIds[] elements do not reflect
> > the "reality" anymore (until the top level transaction finishes, or until a
> > new subtransaction is created).
> > 
> > Could'nt that be an issue from a code maintability point of view?
> 
> Hmm, I don't know. The subtransaction is in the process of being aborted.
> Whether you consider its XID to still belong to the subtransaction until
> it's fully aborted is a matter of taste. If you consider that they still
> belong to the subtransaction until the subtransaction's TransactionState is
> free'd, it makes sense to leave them alone here.

Agree.

> Before this patch, it made sense to reset all those fields when the
> childXids array was free'd, but with this patch, the elements in
> CurrentTransactionIds[] are still valid until the TransactionState is
> free'd.

Yes, there is nothing wrong with the patch, it works.

> What would be the alternative? I don't think it makes sense to go out of our
> way to clear the elements in CurrentTransactionIds[]. We could set them to
> InvalidXid to make it clear they're not valid anymore,

Yeah, that's what I had in mind. It's probably not worth the extra code but
maybe it's worth a comment?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com