Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Msg-id CAD21AoBaTc8M7D1iTvBxrfjQw8B3AgFTnjfWcXPUhgu4T6K8jw@mail.gmail.com
In response to Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
On Fri, Jun 25, 2021 at 9:53 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> Hi Jamison-san, sawada-san,
>
> Thanks for testing!
>
> FWIW, I tested using pgbench with the "--rate=" option to check whether the
> server can execute transactions with stable throughput. As Sawada-san said,
> the latest patch resolves the second phase of 2PC asynchronously, so
> it's difficult to keep the throughput stable without the "--rate=" option.
>
> I also wondered what to do when the error happens, because increasing
> "max_prepared_foreign_transaction" doesn't help. Since overload can
> trigger the error, would it be better to add that case to the HINT message?
>
> BTW, if Sawada-san has already developed running the resolver processes in
> parallel, why not measure the performance improvement? Although Robert-san,
> Tunakawa-san and others are discussing which architecture is best, one
> discussion point is that there is a performance risk in adopting the
> asynchronous approach. If we have promising solutions, I think we can move
> the discussion forward.

Yeah, if we can asynchronously resolve the distributed transactions
without worrying about the max_prepared_foreign_transaction error, that
would be good. But we will need synchronous resolution at some point, so
I think we at least need to discuss it now.

I've attached a new version of the patch that incorporates the comments
I've received so far from Fujii-san and Ikeda-san. We launch one resolver
process per foreign server, committing prepared foreign transactions on
the foreign servers in parallel. To get better performance with the
current architecture, we could have multiple resolver processes per
foreign server, but that seems hard to tune in practice. Perhaps it would
be better to simply have a pool of resolver processes and assign a
resolver process to the resolution of one distributed transaction at a
time? That way, we would need as many resolver processes as there are
concurrent backends using 2PC.

> In my understanding, there are three improvement ideas. The first is to make
> the resolver processes run in parallel. The second is to send "COMMIT/ABORT
> PREPARED" to the remote servers in bulk. The third is to stop syncing the WAL
> in remove_fdwxact() after resolution is done, which I addressed in the mail
> sent on June 3rd at 13:56. Since the third idea has not been discussed yet,
> I may be misunderstanding something.

Yes, those optimizations are promising. On the other hand, they could
introduce complexity into the code and APIs, and I'd like to keep the
first version simple. I think we need to discuss them at this stage but
can leave the implementation of both parallel execution and batch
execution as future improvements.

As for the third idea, I think the implementation is wrong: it removes
the state file and then flushes the WAL record. These should be performed
in the reverse order; otherwise, an FdwXactState entry could be left on
the standby if the server crashes between the two steps. I might be
missing something, though.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
