Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Transactions involving multiple postgres foreign servers, take 2 |
Date | |
Msg-id | CAD21AoDH6zaUFJLyntZ_aiA-ij_NvqyMox+f7sg1TygMUJOSDQ@mail.gmail.com Whole thread Raw |
In response to | Re: Transactions involving multiple postgres foreign servers, take 2 (Masahiro Ikeda <ikedamsh@oss.nttdata.com>) |
List | pgsql-hackers |
On Tue, Apr 27, 2021 at 10:03 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote: > > > > On 2021/03/17 12:03, Masahiko Sawada wrote: > > I've attached the updated version patch set. > > Thanks for updating the patches! I'm now restarting to review of 2PC because > I'd like to use this feature in PG15. Thank you for reviewing the patch! Much appreciated. > > > I think the following logic of resolving and removing the fdwxact entries > by the transaction resolver needs to be fixed. > > 1. check if pending fdwxact entries exist > > HoldInDoubtFdwXacts() checks if there are entries which the condition is > InvalidBackendId and so on. After that it gets the indexes of the fdwxacts > array. The fdwXactLock is released at the end of this phase. > > 2. resolve and remove the entries held in 1th phase. > > ResolveFdwXacts() resloves the status per each fdwxact entry using the > indexes. The end of resolving, the transaction resolver remove the entry in > fdwxacts array via remove_fdwact(). > > The way to remove the entry is the following. Since to control using the > index, the indexes of getting in the 1st phase are meaningless anymore. > > /* Remove the entry from active array */ > FdwXactCtl->num_fdwxacts--; > FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts]; > > This seems to lead resolving the unexpected fdwxacts and it can occur the > following assertion error. That's why I noticed. For example, there is the > case which a backend inserts new fdwxact entry in the free space, which the > resolver removed the entry right before, and the resolver accesses the new > entry which doesn't need to resolve yet because it use the indexes checked in > 1st phase. > > Assert(fdwxact->lockeing_backend == MyBackendId); Good point. I agree with your analysis. > > > > The simple solution is that to get fdwXactLock exclusive all the time from the > begining of 1st phase to the finishing of 2nd phase. But, I worried that the > performance impact became too big... > > I came up with two solutions although there may be better solutions. > > A. to remove resolved entries at once after resolution for all held entries is > finished > > If so, we don't need to take the exclusive lock for a long time. But, this > have other problems, which pg_remove_foreign_xact() can still remove entries > and we need to handle the fail of resolving. > > I wondered that we can solve the first problem to introduce a new lock like > "removing lock" and only the processes which hold the lock can remove the > entries. The performance impact is limited since the insertion the fdwxact > entries is not blocked by this lock. And second problem can be solved using > try-catch sentence. > > > B. to merge 1st and 2nd phase > > Now, the resolver resolves the entries together. That's the reason why it's > difficult to remove the entries. So, it seems to solve the problem to execute > checking, resolving and removing per each entry. I think it's better since > this is simpler than A. If I'm missing something, please let me know. It seems to me that solution B would be simpler and better. I'll try to fix this issue by using solution B and rebase the patch. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
pgsql-hackers by date: