Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Date
Msg-id c043e14f-2c63-786f-9284-fdf8d6760835@oss.nttdata.com
In response to Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses RE: Transactions involving multiple postgres foreign servers, take 2  ("r.takahashi_2@fujitsu.com" <r.takahashi_2@fujitsu.com>)
Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers

On 2021/06/30 10:05, Masahiko Sawada wrote:
> On Fri, Jun 25, 2021 at 9:53 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>>
>> Hi Jamison-san, Sawada-san,
>>
>> Thanks for testing!
>>
>> FWIW, I tested using pgbench with the "--rate=" option to confirm that
>> the server can execute transactions at a stable throughput. As Sawada-san
>> said, the latest patch resolves the second phase of 2PC asynchronously,
>> so it's difficult to keep the throughput stable without the "--rate=" option.
>>
>> I also wondered what to do when the error happens, because increasing
>> "max_prepared_foreign_transactions" doesn't help. Since overload can also
>> trigger the error, would it be better to add that case to the HINT message?
>>
>> BTW, if Sawada-san has already developed a way to run the resolver
>> processes in parallel, why not measure the performance improvement?
>> Although Robert-san, Tsunakawa-san, and others are discussing which
>> architecture is best, one discussion point is that there is a performance
>> risk in adopting an asynchronous approach. If we have promising solutions,
>> I think we can move the discussion forward.
> 
> Yeah, if we can asynchronously resolve the distributed transactions
> without worrying about the max_prepared_foreign_transactions error, it
> would be good. But we will need synchronous resolution at some point,
> so I think we at least need to discuss it at this stage.
> 
> I've attached the new version patch that incorporates the comments
> from Fujii-san and Ikeda-san I've got so far. We launch a resolver
> process per foreign server, committing prepared foreign transactions
> on foreign servers in parallel. To get better performance with the
> current architecture, we could have multiple resolver processes per
> foreign server, but that seems hard to tune in practice. Perhaps it
> would be better to simply have a pool of resolver processes and assign
> a resolver process to the resolution of one distributed transaction at
> a time? That way, we would need to launch as many resolver processes
> as there are concurrent backends using 2PC.

Thanks for updating the patches.

I have tested on my local laptop; the summary is as follows.

(1) The latest patch (v37) improves throughput by 1.5x compared to v36.

Although I expected a 2.0x improvement, because the workload is such that each
transaction accesses two remote servers, I think the reason is that the disk
is the bottleneck and I couldn't prepare a separate disk for each PostgreSQL
server. If I could, I think throughput would improve by about 2.0x.


(2) With the latest patch (v37), the throughput with foreign_twophase_commit =
required is about 36% of the throughput with foreign_twophase_commit =
disabled.

Although the throughput is improved, the absolute performance is not good;
that may be the fate of 2PC. I think the reason is that the number of WAL
writes increases considerably, and disk writes on my laptop are the
bottleneck. I would like to see results from richer environments if someone
can test there.


(3) The latest patch (v37) has no overhead when foreign_twophase_commit =
disabled. On the contrary, the performance improved by 3%, though that may be
within the margin of error.



The test details are as follows.

# condition

* 1 coordinator and 3 foreign servers

* 4 instances share one SSD disk.

* one transaction queries two different foreign servers.

``` fxact_update.pgbench
\set id random(1, 1000000)

\set partnum 3
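-- pick two different partitions (each lives on a different foreign server)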
\set p1 random(1, :partnum)
\set p2 ((:p1 + 1) % :partnum) + 1

BEGIN;
UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
COMMIT;
```

* pgbench generates the load. I increased ${RATE} little by little until the
"maximum number of foreign transactions reached" error happened.

```
pgbench -f fxact_update.pgbench -R ${RATE} -c 8 -j 8 -T 180
```

* parameters
max_prepared_transactions = 100
max_prepared_foreign_transactions = 200
max_foreign_transaction_resolvers = 4


# test source code patterns

1. 2PC patches (v36) based on 6d0eb385 (foreign_twophase_commit = required).
2. 2PC patches (v37) based on 2595e039 (foreign_twophase_commit = required).
3. 2PC patches (v37) based on 2595e039 (foreign_twophase_commit = disabled).
4. 2595e039 without the 2PC patches (v37).


# results

1. tps = 241.8000
   latency average = 10.413 ms

2. tps = 359.017519 (1.5x compared to 1; about 36% of 3)
   latency average = 15.427 ms

3. tps = 987.372220 (1.03x compared to 4)
   latency average = 8.102 ms

4. tps = 955.984574
   latency average = 8.368 ms

The disk is the bottleneck in my environment because disk utilization is
almost 100% in every pattern. If a separate disk could be prepared for each
instance, I think we could expect further performance improvements.


>> In my understanding, there are three improvement ideas. The first is to
>> make the resolver processes run in parallel. The second is to send
>> "COMMIT/ABORT PREPARED" to remote servers in bulk. The third is to stop
>> syncing the WAL in remove_fdwxact() after resolution is done, which I
>> addressed in the mail sent on June 3rd at 13:56. Since the third idea has
>> not been discussed yet, I may be misunderstanding something.
> 
> Yes, those optimizations are promising. On the other hand, they could
> introduce complexity to the code and APIs. I'd like to keep the first
> version simple. I think we need to discuss them at this stage but can
> leave the implementation of both parallel execution and batch
> execution as future improvements.

OK, I agree.
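
Just to illustrate why the first two ideas look promising, here is a minimal
standalone libpq sketch of resolving one distributed transaction across two
participants (the connection strings, the gid "fx_100_1_2", and the two-server
setup are all hypothetical, and the actual patch uses background resolver
workers, not a client like this). Because COMMIT PREPARED is sent
asynchronously to every server before any result is awaited, the total wait is
roughly one round trip instead of one round trip per server.

``` resolve_sketch.c
/*
 * Illustrative sketch only: resolve one distributed transaction by
 * sending COMMIT PREPARED to every participant in parallel using
 * libpq's asynchronous API.  Build with: cc resolve_sketch.c -lpq
 */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	const char *conninfo[] = {
		"host=fs1 dbname=postgres",	/* hypothetical foreign server 1 */
		"host=fs2 dbname=postgres"	/* hypothetical foreign server 2 */
	};
	const char *gid = "fx_100_1_2";	/* hypothetical prepared-xact id */
	PGconn	   *conn[2];
	char		sql[128];
	int			i;

	snprintf(sql, sizeof(sql), "COMMIT PREPARED '%s'", gid);

	/* Phase 1: fire the command on every server without waiting. */
	for (i = 0; i < 2; i++)
	{
		conn[i] = PQconnectdb(conninfo[i]);
		if (PQstatus(conn[i]) != CONNECTION_OK ||
			!PQsendQuery(conn[i], sql))
		{
			fprintf(stderr, "server %d: %s", i, PQerrorMessage(conn[i]));
			return 1;
		}
	}

	/* Phase 2: collect the results; the remote commits overlap. */
	for (i = 0; i < 2; i++)
	{
		PGresult   *res;

		while ((res = PQgetResult(conn[i])) != NULL)
		{
			if (PQresultStatus(res) != PGRES_COMMAND_OK)
				fprintf(stderr, "server %d: %s", i, PQerrorMessage(conn[i]));
			PQclear(res);
		}
		PQfinish(conn[i]);
	}
	return 0;
}
```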


> For the third idea, I think the implementation was wrong; it removes
> the state file and then flushes the WAL record. I think these should be
> performed in the reverse order. Otherwise, the FdwXactState entry could
> be left on the standby if the server crashes between them. I might be
> missing something though.

Oh, I see. I think you're right, though what you meant is that it should
flush the WAL records first and then remove the state file. If "COMMIT/ABORT
PREPARED" statements are executed in bulk, it seems enough to sync the WAL
only once and then remove all the related state files.
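
To make the ordering concrete, here is a generic crash-safety sketch (plain
POSIX file operations, not the patch's code; the file names are made up). If
the process crashes between the two steps, the leftover state file is simply
resolved again during recovery, which is safe; the reverse order could forget
an in-doubt transaction.

``` ordering_sketch.c
/*
 * Generic sketch of the crash-safe ordering.  Step 1 must be durable
 * before step 2 runs; a crash in between only leaves a stale state
 * file behind, which recovery can resolve again.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	const char *walfile = "wal_sketch.log";			/* stand-in for WAL */
	const char *statefile = "fdwxact_0001.state";	/* stand-in state file */
	const char *record = "FDWXACT_REMOVE fx_0001\n";
	int			fd;

	/* create a dummy state file so there is something to remove */
	fd = open(statefile, O_CREAT | O_WRONLY, 0600);
	close(fd);

	/* Step 1: append the removal record and flush it to disk. */
	fd = open(walfile, O_CREAT | O_WRONLY | O_APPEND, 0600);
	if (fd < 0 || write(fd, record, strlen(record)) < 0 || fsync(fd) < 0)
	{
		perror("wal write");
		return 1;
	}
	close(fd);

	/* Step 2: only now is it safe to remove the state file. */
	if (unlink(statefile) < 0)
		perror("unlink");
	return 0;
}
```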


BTW, I tested building the binary with -O2 and got the following warning,
which needs to be fixed.

```
fdwxact.c: In function 'PrepareAllFdwXacts':
fdwxact.c:897:13: warning: 'flush_lsn' may be used uninitialized in this
function [-Wmaybe-uninitialized]
  897 |  canceled = SyncRepWaitForLSN(flush_lsn, false);
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
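
I haven't checked why the compiler cannot prove the assignment, but presumably
initializing the variable is enough. The following is only a guess at the
shape of the fix, using the usual InvalidXLogRecPtr idiom from core:

```
/* in PrepareAllFdwXacts(): start from a defined value */
XLogRecPtr	flush_lsn = InvalidXLogRecPtr;

/* ... prepare each foreign transaction, setting flush_lsn ... */

/* wait for sync rep only if there is actually an LSN to wait for */
if (!XLogRecPtrIsInvalid(flush_lsn))
	canceled = SyncRepWaitForLSN(flush_lsn, false);
```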

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION


