Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Msg-id CAD21AoD5cFwSPAqHSdK+DzzKXX2BhOwZytd7OVSsAvahosY6zg@mail.gmail.com
In response to Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
Responses Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiro Ikeda <ikedamsh@oss.nttdata.com>)
RE: Transactions involving multiple postgres foreign servers, take 2  ("k.jamison@fujitsu.com" <k.jamison@fujitsu.com>)
List pgsql-hackers
On Fri, May 21, 2021 at 12:45 PM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:
>
>
>
> On 2021/05/21 10:39, Masahiko Sawada wrote:
> > On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
> >>
> >>
> >> On 2021/05/11 13:37, Masahiko Sawada wrote:
> >>> I've attached the updated patches that incorporated comments from
> >>> Zhihong and Ikeda-san.
> >>
> >> Thanks for updating the patches!
> >>
> >>
> >> I have other comments including trivial things.
> >>
> >>
> >> a. about "foreign_transaction_resolver_timeout" parameter
> >>
> >> Now, the default value of "foreign_transaction_resolver_timeout" is 60 secs.
> >> Is there a reason for that? Although the following is a minor case, it may
> >> confuse some users.
> >>
> >> An example case:
> >>
> >> 1. A client executes a transaction with 2PC while the resolver is processing
> >> FdwXactResolverProcessInDoubtXacts().
> >>
> >> 2. Resolution of the 1st transaction has to wait until other 2PC
> >> transactions are executed or the timeout expires.
> >>
> >> 3. If the client checks the result of the 1st transaction, it has to wait
> >> until resolution is finished for atomic visibility (although that depends on
> >> how atomic visibility is realized). The client may have to wait for up to
> >> foreign_transaction_resolver_timeout. Users may think it has stalled.
> >>
> >> A situation like this can be observed after testing with pgbench: some
> >> unresolved transactions remain after benchmarking.
> >>
> >> I assume that this default value follows wal_sender, archiver, and so on.
> >> But I think this parameter is more like "commit_delay". If so, 60 seconds
> >> seems to be a large value.
> >
> > IIUC, in this situation foreign transaction resolution is the
> > bottleneck and doesn’t keep up with incoming resolution requests. But
> > how does foreign_transaction_resolver_timeout relate to this situation?
> > foreign_transaction_resolver_timeout controls when to terminate a
> > resolver process that doesn't have any foreign transactions to
> > resolve. So if we set it to several milliseconds, resolver processes
> > are terminated immediately after each resolution, imposing the cost of
> > launching resolver processes on the next resolution.
>
> Thanks for your comments!
>
> No, this situation is unrelated to whether foreign transaction resolution is
> a bottleneck. This issue may happen when the workload has very few foreign
> transactions.
>
> If a new foreign transaction arrives while the transaction resolver is
> processing resolutions via FdwXactResolverProcessInDoubtXacts(), that foreign
> transaction waits until the next round of resolution starts. If no further
> foreign transaction arrives, it has to wait until the timeout before its
> resolution starts. That is the situation I meant.

Thanks for your explanation. I think that in this case we should set
the latch of the resolver after preparing all foreign transactions so
that the resolver processes those transactions without sleeping.
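
To illustrate the idea, here is a toy model in Python rather than
PostgreSQL code. A threading.Event stands in for the resolver's latch,
and resolver_loop and time_to_resolve are hypothetical names invented
for this sketch. It only shows that a sleeping worker picks up new work
immediately when the producer sets the latch, and otherwise only after
its sleep times out.

```python
import threading
import time
from collections import deque


def resolver_loop(pending, latch, resolved, stop, nap):
    """Toy resolver: sleep on the latch (at most `nap` seconds), then drain."""
    while not stop.is_set():
        latch.wait(timeout=nap)   # returns early if the latch is set
        latch.clear()
        while pending:
            resolved.append(pending.popleft())


def time_to_resolve(set_latch, nap):
    """Seconds one newly prepared transaction waits before being picked up."""
    pending, resolved = deque(), []
    latch, stop = threading.Event(), threading.Event()
    worker = threading.Thread(
        target=resolver_loop, args=(pending, latch, resolved, stop, nap)
    )
    worker.start()
    time.sleep(0.05)              # let the worker go to sleep first
    start = time.monotonic()
    pending.append("fx1")         # the "prepared" foreign transaction
    if set_latch:
        latch.set()               # wake the resolver right away
    while not resolved:
        time.sleep(0.005)
    waited = time.monotonic() - start
    stop.set()
    latch.set()                   # let the worker notice `stop` and exit
    worker.join()
    return waited
```

Calling time_to_resolve(True, nap=1.0) should return almost at once,
while time_to_resolve(False, nap=1.0) takes close to the full nap.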

>
> Thanks for letting me know the side effect of setting the resolution timeout
> to several milliseconds. I agree. But why is termination needed? Is there a
> possibility of stalling like walsender?

The purpose of this timeout is to terminate resolvers that have been
idle for a long time. The resolver processes don't necessarily need to
keep running all the time for every database. On the other hand,
launching a resolver process per commit would be costly. So we keep
resolver processes running for at least
foreign_transaction_resolver_timeout.
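
The trade-off can be sketched with a small Python model. It is not
taken from the patch: count_resolver_launches is a hypothetical helper,
and it assumes that resolution itself takes no time and that a resolver
exits once it has been idle for `timeout` seconds.

```python
def count_resolver_launches(commit_times, timeout):
    """Count how often a resolver must be launched for the given commits.

    commit_times: times (seconds) at which distributed commits arrive.
    timeout: idle period after which the running resolver exits.
    """
    launches = 0
    exit_at = None  # when the current resolver would give up, if any
    for t in sorted(commit_times):
        if exit_at is None or t >= exit_at:
            launches += 1          # no live resolver: pay the launch cost
        exit_at = t + timeout      # activity restarts the idle timer
    return launches


commits = [0, 1, 2, 30, 31, 200]
print(count_resolver_launches(commits, 0.001))  # one launch per commit
print(count_resolver_launches(commits, 60))     # only the long gap relaunches
```

With a timeout of a few milliseconds every commit in this example pays
for a launch, whereas a 60 second timeout relaunches only after the
long idle gap.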

>
>
> >>
> >>
> >> b. about performance bottleneck (just sharing my simple benchmark results)
> >>
> >> The resolver process can easily become a performance bottleneck, although I
> >> think some users want this feature even if the performance is not so good.
> >>
> >> I tested with a very simple workload on my laptop.
> >>
> >> The test conditions are
> >> * two remote foreign partitions, and one transaction inserts an entry in
> >> each partition.
> >> * local connection only. If network latency were higher, the performance
> >> would be worse.
> >> * pgbench with 8 clients.
> >>
> >> The test results are the following. The performance with 2PC is only 10%
> >> of the performance without 2PC.
> >>
> >> * with foreign_twophase_commit = required
> >> -> If the load exceeds 10 TPS, the number of unresolved foreign transactions
> >> keeps increasing and the test stops with the warning "Increase
> >> max_prepared_foreign_transactions".
> >
> > What was the value of max_prepared_foreign_transactions?
>
> Now, I tested with 200.
>
> If each resolution finished very quickly, I thought that would be enough
> because 8 clients x 2 partitions = 16, though... But it's difficult to know
> a stable value.

While resolving one distributed transaction, the resolver needs both
one round trip and an fsync of a WAL record for each foreign
transaction. Since the client doesn’t wait for the distributed
transaction to be resolved, the resolver process can easily become a
bottleneck given there are 8 clients.

If foreign transactions were resolved synchronously, 16 would suffice.
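
As a back-of-the-envelope sketch of this arithmetic (Python, with
hypothetical helper names; the resolver throughput of 20 entries per
second is an assumed number, not a measurement from this thread):

```python
def sync_slots_needed(clients, participants):
    """Upper bound on in-flight entries if each client waits for resolution."""
    return clients * participants


def seconds_until_slots_exhausted(tps, participants, resolve_rate, max_slots):
    """Time until unresolved entries reach max_slots with async resolution.

    tps: distributed transactions committed per second.
    participants: foreign transactions per distributed transaction.
    resolve_rate: foreign transactions the resolver finishes per second.
    Returns None when the resolver keeps up and the backlog does not grow.
    """
    growth = tps * participants - resolve_rate
    if growth <= 0:
        return None
    return max_slots / growth


print(sync_slots_needed(8, 2))                        # 8 clients x 2 partitions
print(seconds_until_slots_exhausted(20, 2, 20, 200))  # assumed rates
```

In this model, any sustained load above the resolver's throughput fills
max_prepared_foreign_transactions eventually, no matter how large it is
set; a larger value only delays the warning.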

>
>
> > To speed up the foreign transaction resolution, some ideas have been
> > discussed. As another idea, how about launching resolvers for each
> > foreign server? That way, we resolve foreign transactions on each
> > foreign server in parallel. If foreign transactions are concentrated
> > on a particular server, we can have multiple resolvers for that one
> > foreign server. It doesn’t change the fact that all foreign
> > transaction resolutions are processed by resolver processes.
>
> Awesome! There seems to be another advantage: even if a foreign server is
> temporarily busy or stopped due to failover, the other foreign servers'
> transactions can still be resolved.

Yes. We also might need to be careful about the order of foreign
transaction resolution. I think we need to resolve foreign
transactions in arrival order at least within a foreign server.
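
A minimal Python sketch of that ordering constraint, using made-up
names (queue_by_server, resolve_available) and plain in-memory queues
rather than anything from the patch: requests are split into one FIFO
queue per foreign server, so each server's transactions are resolved in
arrival order while an unavailable server does not hold up the others.

```python
from collections import defaultdict, deque


def queue_by_server(requests):
    """Split (server, xact_id) requests into per-server FIFO queues."""
    queues = defaultdict(deque)
    for server, xact_id in requests:
        queues[server].append(xact_id)   # arrival order kept per server
    return queues


def resolve_available(queues, unavailable=()):
    """Drain every queue whose server is reachable, oldest entry first."""
    resolved = []
    for server, queue in queues.items():
        if server in unavailable:
            continue                     # leave its backlog untouched
        while queue:
            resolved.append((server, queue.popleft()))
    return resolved


requests = [("s1", 101), ("s2", 101), ("s1", 102), ("s2", 102), ("s1", 103)]
queues = queue_by_server(requests)
print(resolve_available(queues, unavailable={"s2"}))
print(list(queues["s2"]))                # still pending, still in order
```

Having several resolvers for one busy server would need extra care to
keep this per-server order, which the sketch does not attempt.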

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


