Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Date
Msg-id 860245d7-481d-c651-562c-7b65d92e1397@oss.nttdata.com
In response to Re: Transactions involving multiple postgres foreign servers, take 2  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers

On 2021/03/17 12:03, Masahiko Sawada wrote:
> I've attached the updated version patch set.

Thanks for updating the patches! I'm resuming my review of 2PC because I'd like
to use this feature in PG15.


I think the logic the transaction resolver uses to resolve and remove fdwxact
entries, outlined below, needs to be fixed.

1. check if pending fdwxact entries exist

HoldInDoubtFdwXacts() checks whether there are entries matching conditions such
as InvalidBackendId, and then collects their indexes in the fdwxacts array. The
fdwXactLock is released at the end of this phase.

2. resolve and remove the entries held in the 1st phase

ResolveFdwXacts() resolves each held fdwxact entry using those indexes. At the
end of resolution, the transaction resolver removes the entry from the fdwxacts
array via remove_fdwact().

The entry is removed as follows. Because removal works by index and moves the
last active entry into the freed slot, the indexes obtained in the 1st phase are
no longer meaningful once any entry has been removed.

/* Remove the entry from active array */
FdwXactCtl->num_fdwxacts--;
FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
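To make the index problem concrete, here is a minimal C sketch of that
swap-with-last removal. FdwXactSketch, FdwXactCtlSketch, MAX_FDWXACTS and
remove_at() are hypothetical, simplified stand-ins rather than the patch's
definitions; only the two-line removal idiom is taken from the code above.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the patch's shared state. */
#define MAX_FDWXACTS 8

typedef struct FdwXactSketch
{
    int id;                     /* identifies the entry in this demo */
} FdwXactSketch;

typedef struct FdwXactCtlSketch
{
    int num_fdwxacts;           /* number of active entries */
    FdwXactSketch *fdwxacts[MAX_FDWXACTS];
} FdwXactCtlSketch;

/*
 * Same idiom as the quoted code: shrink the active count, then move the
 * last active entry into slot i. Any index recorded earlier for that last
 * entry now points past the active part of the array.
 */
static void
remove_at(FdwXactCtlSketch *ctl, int i)
{
    assert(i >= 0 && i < ctl->num_fdwxacts);
    ctl->num_fdwxacts--;
    ctl->fdwxacts[i] = ctl->fdwxacts[ctl->num_fdwxacts];
}
```

For example, calling remove_at(&ctl, 0) on [a, b, c] leaves [c, b], so an index
of 2 recorded for c before the removal is stale.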

This can cause the resolver to resolve unexpected fdwxact entries and trigger
the following assertion failure, which is how I noticed the problem. For
example, a backend can insert a new fdwxact entry into the slot the resolver has
just freed, and the resolver then accesses that new entry, which does not need
to be resolved yet, because it still uses the indexes checked in the 1st phase.

Assert(fdwxact->locking_backend == MyBackendId);
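The interleaving above can be replayed single-threaded in a small C sketch. The
types, INVALID_BACKEND_ID (mirroring PostgreSQL's InvalidBackendId, which is -1),
the backend ids, and the helpers hold_in_doubt(), remove_at() and insert_entry()
are hypothetical simplifications of HoldInDoubtFdwXacts(), remove_fdwact() and
entry insertion. Locking is left out because the steps run in a fixed order.

```c
#include <assert.h>

#define INVALID_BACKEND_ID (-1)  /* mirrors InvalidBackendId */
#define MAX_FDWXACTS 8

typedef struct FdwXactSketch
{
    int id;
    int locking_backend;        /* INVALID_BACKEND_ID means "in doubt" here */
} FdwXactSketch;

typedef struct
{
    int num_fdwxacts;
    FdwXactSketch *fdwxacts[MAX_FDWXACTS];
} FdwXactCtlSketch;

/* 1st phase (done under the lock in the patch): hold in-doubt entries and
 * remember their array indexes. Returns how many were held. */
static int
hold_in_doubt(FdwXactCtlSketch *ctl, int my_backend, int *indexes)
{
    int n = 0;

    for (int i = 0; i < ctl->num_fdwxacts; i++)
    {
        if (ctl->fdwxacts[i]->locking_backend == INVALID_BACKEND_ID)
        {
            ctl->fdwxacts[i]->locking_backend = my_backend;
            indexes[n++] = i;
        }
    }
    return n;
}

/* Swap-with-last removal, as in the quoted code. */
static void
remove_at(FdwXactCtlSketch *ctl, int i)
{
    ctl->num_fdwxacts--;
    ctl->fdwxacts[i] = ctl->fdwxacts[ctl->num_fdwxacts];
}

/* Another backend appends a new entry into the first free slot. */
static void
insert_entry(FdwXactCtlSketch *ctl, FdwXactSketch *x)
{
    ctl->fdwxacts[ctl->num_fdwxacts++] = x;
}

/* Replays the race; returns the id of the entry the resolver reaches via
 * its second held index, and reports that entry's owner in *owner_out. */
static int
replay_stale_index_race(int my_backend, int *owner_out)
{
    FdwXactSketch a = {1, INVALID_BACKEND_ID};  /* in doubt */
    FdwXactSketch b = {2, 7};                   /* owned by backend 7 */
    FdwXactSketch c = {3, INVALID_BACKEND_ID};  /* in doubt */
    FdwXactSketch d = {4, 9};                   /* inserted later by backend 9 */
    FdwXactCtlSketch ctl = {3, {&a, &b, &c}};
    int held[MAX_FDWXACTS];

    /* 1st phase: held = {0, 2}; the lock is then released. */
    int n = hold_in_doubt(&ctl, my_backend, held);
    assert(n == 2);

    /* 2nd phase: resolve a and remove it; c moves into slot 0. */
    remove_at(&ctl, held[0]);

    /* Meanwhile a backend fills the freed slot 2 with d. */
    insert_entry(&ctl, &d);

    /* The stale index 2 now reaches d instead of c. */
    FdwXactSketch *next = ctl.fdwxacts[held[1]];
    *owner_out = next->locking_backend;
    return next->id;
}
```

Here the resolver (backend 5, say) reaches d, whose locking_backend is 9, which
is exactly the situation in which the Assert above fails.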



The simple solution is to hold fdwXactLock in exclusive mode all the time, from
the beginning of the 1st phase to the end of the 2nd phase. But I'm worried that
the performance impact would be too big...

I came up with two solutions, although there may be better ones.

A. remove the resolved entries all at once after resolution of all held entries
has finished

With this approach, we don't need to hold the exclusive lock for a long time.
But it has other problems: pg_remove_foreign_xact() can still remove entries in
the meantime, and we need to handle resolution failures.

I think we can solve the first problem by introducing a new lock, something like
a "removing lock", so that only processes holding it can remove entries. The
performance impact is limited since this lock does not block insertion of
fdwxact entries. The second problem can be solved using a try-catch block.
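Option A could look roughly like the following C sketch. It assumes the held
entries are remembered by pointer rather than by index, and that (as proposed
above) no other process can remove them until the final pass, which is what the
"removing lock" would guarantee. The types, the resolvable flag, and the helpers
resolve_one(), remove_entry() and resolve_then_remove() are hypothetical; a
false return from resolve_one() stands in for an error caught by the try-catch
block.

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_BACKEND_ID (-1)
#define MAX_FDWXACTS 8

typedef struct FdwXactSketch
{
    int id;
    int locking_backend;
    bool resolvable;            /* hypothetical: simulates resolution outcome */
} FdwXactSketch;

typedef struct
{
    int num_fdwxacts;
    FdwXactSketch *fdwxacts[MAX_FDWXACTS];
} FdwXactCtlSketch;

/* Hypothetical stand-in for resolving on the foreign server. */
static bool
resolve_one(FdwXactSketch *x)
{
    return x->resolvable;
}

/* Remove by identity, so earlier swap-removals cannot make us hit the
 * wrong slot. Returns false if the entry is no longer active. */
static bool
remove_entry(FdwXactCtlSketch *ctl, const FdwXactSketch *x)
{
    for (int i = 0; i < ctl->num_fdwxacts; i++)
    {
        if (ctl->fdwxacts[i] == x)
        {
            ctl->num_fdwxacts--;
            ctl->fdwxacts[i] = ctl->fdwxacts[ctl->num_fdwxacts];
            return true;
        }
    }
    return false;
}

/* Option A: resolve every held entry first, then remove the resolved ones
 * in a single pass. Returns the number of entries removed. */
static int
resolve_then_remove(FdwXactCtlSketch *ctl, FdwXactSketch **held, int nheld)
{
    bool done[MAX_FDWXACTS];
    int removed = 0;

    /* Resolution runs without the exclusive lock. */
    for (int i = 0; i < nheld; i++)
        done[i] = resolve_one(held[i]);

    /* In the real code the exclusive lock would be taken around this pass. */
    for (int i = 0; i < nheld; i++)
    {
        if (!done[i])
            held[i]->locking_backend = INVALID_BACKEND_ID;  /* retry later */
        else if (remove_entry(ctl, held[i]))
            removed++;
    }
    return removed;
}
```

Because removal searches for the entry itself, it does not matter that
removing a earlier moved another entry into a's slot.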


B. merge the 1st and 2nd phases

Currently, the resolver resolves the held entries together, which is why it is
difficult to remove them safely. So checking, resolving, and removing each entry
one at a time seems to solve the problem. I think this is better since it is
simpler than A. If I'm missing something, please let me know.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION


