Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Date
Msg-id CAD21AoDH6zaUFJLyntZ_aiA-ij_NvqyMox+f7sg1TygMUJOSDQ@mail.gmail.com
Whole thread Raw
In response to Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
List pgsql-hackers
On Tue, Apr 27, 2021 at 10:03 AM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:
>
>
>
> On 2021/03/17 12:03, Masahiko Sawada wrote:
> > I've attached the updated version patch set.
>
> Thanks for updating the patches! I'm now restarting to review of 2PC because
> I'd like to use this feature in PG15.

Thank you for reviewing the patch! Much appreciated.

>
>
> I think the following logic of resolving and removing the fdwxact entries
> by the transaction resolver needs to be fixed.
>
> 1. check if pending fdwxact entries exist
>
> HoldInDoubtFdwXacts() checks if there are entries which the condition is
> InvalidBackendId and so on. After that it gets the indexes of the fdwxacts
> array. The fdwXactLock is released at the end of this phase.
>
> 2. resolve and remove the entries held in 1th phase.
>
> ResolveFdwXacts() resloves the status per each fdwxact entry using the
> indexes. The end of resolving, the transaction resolver remove the entry in
> fdwxacts array via remove_fdwact().
>
> The way to remove the entry is the following. Since to control using the
> index, the indexes of getting in the 1st phase are meaningless anymore.
>
> /* Remove the entry from active array */
> FdwXactCtl->num_fdwxacts--;
> FdwXactCtl->fdwxacts[i] = FdwXactCtl->fdwxacts[FdwXactCtl->num_fdwxacts];
>
> This seems to lead resolving the unexpected fdwxacts and it can occur the
> following assertion error. That's why I noticed. For example, there is the
> case which a backend inserts new fdwxact entry in the free space, which the
> resolver removed the entry right before, and the resolver accesses the new
> entry which doesn't need to resolve yet because it use the indexes checked in
> 1st phase.
>
> Assert(fdwxact->lockeing_backend == MyBackendId);

Good point. I agree with your analysis.

>
>
>
> The simple solution is that to get fdwXactLock exclusive all the time from the
> begining of 1st phase to the finishing of 2nd phase. But, I worried that the
> performance impact became too big...
>
> I came up with two solutions although there may be better solutions.
>
> A. to remove resolved entries at once after resolution for all held entries is
> finished
>
> If so, we don't need to take the exclusive lock for a long time. But, this
> have other problems, which pg_remove_foreign_xact() can still remove entries
> and we need to handle the fail of resolving.
>
> I wondered that we can solve the first problem to introduce a new lock like
> "removing lock" and only the processes which hold the lock can remove the
> entries. The performance impact is limited since the insertion the fdwxact
> entries is not blocked by this lock. And second problem can be solved using
> try-catch sentence.
>
>
> B. to merge 1st and 2nd phase
>
> Now, the resolver resolves the entries together. That's the reason why it's
> difficult to remove the entries. So, it seems to solve the problem to execute
> checking, resolving and removing per each entry. I think it's better since
> this is simpler than A. If I'm missing something, please let me know.

It seems to me that solution B would be simpler and better. I'll try
to fix this issue by using solution B and rebase the patch.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Granting control of SUSET gucs to non-superusers
Next
From: Peter Geoghegan
Date:
Subject: Re: RFE: Make statistics robust for unplanned events