Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers
From:           Masahiro Ikeda
Subject:        Re: Transactions involving multiple postgres foreign servers, take 2
Date:
Msg-id:         fd82311b-eb19-feb4-e901-efd0c199c0fe@oss.nttdata.com
In response to: Re: Transactions involving multiple postgres foreign servers, take 2 (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses:      Re: Transactions involving multiple postgres foreign servers, take 2
List:           pgsql-hackers
On 2021/06/24 22:27, Masahiko Sawada wrote:
> On Thu, Jun 24, 2021 at 9:46 PM k.jamison@fujitsu.com
> <k.jamison@fujitsu.com> wrote:
>>
>> Hi Sawada-san,
>>
>> I also tried to play a bit with the latest patches, similar to Ikeda-san,
>> and with the foreign 2PC parameter enabled/required.
>
> Thank you for testing the patch!
>
>>>>>> b. about the performance bottleneck (just sharing my simple benchmark results)
>>>>>>
>>>>>> The resolver process can easily become a performance bottleneck, although
>>>>>> I think some users want this feature even if the performance is not so good.
>>>>>>
>>>>>> I tested with a very simple workload on my laptop.
>>>>>>
>>>>>> The test conditions were:
>>>>>> * two remote foreign partitions; each transaction inserts an
>>>>>>   entry into each partition.
>>>>>> * local connections only. If NW latency were higher, the
>>>>>>   performance would be worse.
>>>>>> * pgbench with 8 clients.
>>>>>>
>>>>>> The test results are the following. The throughput with 2PC is only
>>>>>> 10% of the throughput without 2PC.
>>>>>>
>>>>>> * with foreign_twophase_commit = required
>>>>>>   -> Under a load of more than 10 TPS, the number of unresolved foreign
>>>>>>      transactions keeps increasing, and the run stops with the warning
>>>>>>      "Increase max_prepared_foreign_transactions".
>>>>>
>>>>> What was the value of max_prepared_foreign_transactions?
>>>>
>>>> I tested with 200.
>>>>
>>>> If each resolution finishes very quickly, I thought that would be enough,
>>>> because 8 clients x 2 partitions = 16, though... It's difficult to know
>>>> what a stable value is.
>>>
>>> To resolve one distributed transaction, the resolver needs both one
>>> round trip and one WAL-record fsync per foreign transaction.
>>> Since the client doesn't wait for the distributed transaction to be
>>> resolved, the resolver process can easily become the bottleneck given
>>> there are 8 clients.
>>>
>>> If foreign transactions were resolved synchronously, 16 would suffice.
>>
>> I tested the V36 patches on my 16-core machine.
>> I set up two foreign servers (F1, F2).
>> F1 has the addressbook table.
>> F2 has the pgbench tables (scale factor = 1).
>> There is also one coordinator server (coor) where I created user mappings
>> to access the foreign servers.
>> I executed the benchmark measurement on the coordinator.
>> My custom scripts are set up in a way that queries from the coordinator
>> have to access the two foreign servers.
>>
>> Coordinator:
>>   max_prepared_foreign_transactions = 200
>>   max_foreign_transaction_resolvers = 1
>>   foreign_twophase_commit = required
>>
>> Other external servers 1 & 2 (F1 & F2):
>>   max_prepared_transactions = 100
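For reference, a setup of the kind described above can be built with
postgres_fdw along these lines; all object names, hosts, ports, and the
user mapping below are illustrative assumptions, not taken from the
original test:

    -- On the coordinator (coor). F1 and F2 are assumed to listen on
    -- localhost:5433 and localhost:5434.
    CREATE EXTENSION postgres_fdw;

    CREATE SERVER f1 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'localhost', port '5433', dbname 'postgres');
    CREATE SERVER f2 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'localhost', port '5434', dbname 'postgres');

    CREATE USER MAPPING FOR CURRENT_USER SERVER f1 OPTIONS (user 'postgres');
    CREATE USER MAPPING FOR CURRENT_USER SERVER f2 OPTIONS (user 'postgres');

    -- addressbook lives on F1; the pgbench tables live on F2.
    CREATE FOREIGN TABLE addressbook (id int, name text, age int) SERVER f1;
    IMPORT FOREIGN SCHEMA public LIMIT TO (pgbench_accounts)
        FROM SERVER f2 INTO public;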
>> [select.sql]
>> \set int random(1, 100000)
>> BEGIN;
>> SELECT ad.name, ad.age, ac.abalance
>> FROM addressbook ad, pgbench_accounts ac
>> WHERE ad.id = :int AND ad.id = ac.aid;
>> COMMIT;
>>
>> I then executed:
>>   pgbench -r -c 2 -j 2 -T 60 -f select.sql coor
>>
>> While there were no problems with 1-2 clients, I started having problems
>> when running the benchmark with more than 3 clients:
>>
>>   pgbench -r -c 4 -j 4 -T 60 -f select.sql coor
>>
>> I got the following error on the coordinator:
>>
>>   [95396] ERROR: could not prepare transaction on server F2 with ID fx_151455979_1216200_16422
>>   [95396] STATEMENT: COMMIT;
>>   WARNING: there is no transaction in progress
>>   pgbench: error: client 1 script 0 aborted in command 3 query 0: ERROR: could not prepare transaction on server F2 with ID fx_151455979_1216200_16422
>>
>> Here's the log on foreign server 2 (F2) matching the above error:
>>
>>   <F2> LOG: statement: PREPARE TRANSACTION 'fx_151455979_1216200_16422'
>>   <F2> ERROR: maximum number of prepared transactions reached
>>   <F2> HINT: Increase max_prepared_transactions (currently 100).
>>   <F2> STATEMENT: PREPARE TRANSACTION 'fx_151455979_1216200_16422'
>>
>> So I increased max_prepared_transactions on F1 and F2 from 100 to 200.
>> Then I got the error:
>>
>>   [146926] ERROR: maximum number of foreign transactions reached
>>   [146926] HINT: Increase max_prepared_foreign_transactions: "200".
>>
>> So I increased max_prepared_foreign_transactions to 300, and got the same
>> error telling me to increase max_prepared_transactions on the foreign servers.
>>
>> I just can't find the right tuning values for this.
>> It seems that we always run out of memory in FdwXactState insert_fdwxact
>> with multiple concurrent connections during PREPARE TRANSACTION.
>> I encountered this only with the SELECT benchmark; I had no problems with
>> multiple connections in my custom UPDATE and INSERT benchmarks when I
>> tested with up to 30 clients.
>>
>> Would the following possibly solve this bottleneck problem?
>
> With the following idea the performance will get better, but the problem
> will not be completely solved. The results shared by you and Ikeda-san
> come from the fact that with the patch we asynchronously commit the
> foreign prepared transactions (i.e., asynchronously perform the second
> phase of 2PC), not from the architecture itself. As I mentioned before, I
> intentionally removed the synchronous committing of foreign prepared
> transactions from the patch set, since we still need to have a discussion
> of that part. Therefore, with this version of the patch, the backend
> returns OK to the client right after the local transaction commits,
> neither committing the foreign prepared transactions by itself nor
> waiting for them to be committed by the resolver process. As long as the
> backend doesn't wait for foreign prepared transactions to be committed
> and there is a limit on the number of foreign prepared transactions that
> can be held, that limit can be reached whenever committing foreign
> prepared transactions cannot keep up.
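For concreteness, the flow Sawada-san describes above corresponds roughly
to the following sequence, sketched here with the transaction identifier
seen in the logs; this is only an illustration of the described behavior,
not the patch's actual code:

    -- Phase one: before committing locally, the backend runs this on each
    -- foreign server involved in the transaction:
    PREPARE TRANSACTION 'fx_151455979_1216200_16422';

    -- The local transaction then commits and the client immediately gets OK.

    -- Phase two: the resolver process later runs this on each foreign
    -- server; per the explanation above, each resolution costs one round
    -- trip plus one WAL-record fsync:
    COMMIT PREPARED 'fx_151455979_1216200_16422';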
Hi Jamison-san, Sawada-san,

Thanks for testing!

FWIW, I tested using pgbench with the "--rate=" option to check whether the
server can execute transactions at a stable throughput. As Sawada-san said,
the latest patch resolves the second phase of 2PC asynchronously, so it's
difficult to keep throughput stable without the "--rate=" option.

I also worried about what to do when the error happens, because increasing
"max_prepared_foreign_transactions" doesn't help. Since overload can also
trigger the error, might it be better to add that case to the HINT message?

BTW, if Sawada-san has already developed running the resolver processes in
parallel, why don't you measure the performance improvement? Although
Robert-san, Tsunakawa-san and others are discussing which architecture is
best, one discussion point is that there is a performance risk in adopting
the asynchronous approach. If we have promising solutions, I think we can
move the discussion forward.

In my understanding, there are three improvement ideas. The first is to make
the resolver processes run in parallel. The second is to send "COMMIT/ABORT
PREPARED" to remote servers in bulk. The third is to stop syncing the WAL in
remove_fdwxact() after resolving is done, which I addressed in the mail sent
on June 3rd, 13:56. Since the third idea has not been discussed yet, I may
be misunderstanding something.

--
Masahiro Ikeda
NTT DATA CORPORATION