Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers
From | Masahiro Ikeda |
---|---|
Subject | Re: Transactions involving multiple postgres foreign servers, take 2 |
Date | |
Msg-id | 860245d7-481d-c651-562c-7b65d92e1397@oss.nttdata.com Whole thread Raw |
In response to | Re: Transactions involving multiple postgres foreign servers, take 2 (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses |
Re: Transactions involving multiple postgres foreign servers, take 2
|
List | pgsql-hackers |
On 2021/03/17 12:03, Masahiko Sawada wrote: > I've attached the updated version patch set. Thanks for updating the patches! I'm now restarting to review of 2PC because I'd like to use this feature in PG15. I think the following logic of resolving and removing the fdwxact entries by the transaction resolver needs to be fixed. 1. check if pending fdwxact entries exist HoldInDoubtFdwXacts() checks if there are entries which the condition is InvalidBackendId and so on. After that it gets the indexes of the fdwxacts array. The fdwXactLock is released at the end of this phase. 2. resolve and remove the entries held in 1th phase. ResolveFdwXacts() resloves the status per each fdwxact entry using the indexes. The end of resolving, the transaction resolver remove the entry in fdwxacts array via remove_fdwact(). The way to remove the entry is the following. Since to control using the index, the indexes of getting in the 1st phase are meaningless anymore. /* Remove the entry from active array */ FdwXactCtl->num_fdwxacts--; FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts]; This seems to lead resolving the unexpected fdwxacts and it can occur the following assertion error. That's why I noticed. For example, there is the case which a backend inserts new fdwxact entry in the free space, which the resolver removed the entry right before, and the resolver accesses the new entry which doesn't need to resolve yet because it use the indexes checked in 1st phase. Assert(fdwxact->locking_backend == MyBackendId); The simple solution is that to get fdwXactLock exclusive all the time from the begining of 1st phase to the finishing of 2nd phase. But, I worried that the performance impact became too big... I came up with two solutions although there may be better solutions. A. to remove resolved entries at once after resolution for all held entries is finished If so, we don't need to take the exclusive lock for a long time. But, this have other problems, which pg_remove_foreign_xact() can still remove entries and we need to handle the fail of resolving. I wondered that we can solve the first problem to introduce a new lock like "removing lock" and only the processes which hold the lock can remove the entries. The performance impact is limited since the insertion the fdwxact entries is not blocked by this lock. And second problem can be solved using try-catch sentence. B. to merge 1st and 2nd phase Now, the resolver resolves the entries together. That's the reason why it's difficult to remove the entries. So, it seems to solve the problem to execute checking, resolving and removing per each entry. I think it's better since this is simpler than A. If I'm missing something, please let me know. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
pgsql-hackers by date: