Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Date
Msg-id 615b8799-51b9-7fbf-8013-efd171321e21@oss.nttdata.com
Whole thread Raw
In response to Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Transactions involving multiple postgres foreign servers, take 2
List pgsql-hackers

On 2021/05/25 21:59, Masahiko Sawada wrote:
> On Fri, May 21, 2021 at 5:48 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>>
>> On 2021/05/21 13:45, Masahiko Sawada wrote:
>>>
>>> Yes. We also might need to be careful about the order of foreign
>>> transaction resolution. I think we need to resolve foreign> transactions in arrival order at least within a foreign
server.
>>
>> I agree it's better.
>>
>> (Although this is my interest...)
>> Is it necessary? Although this idea seems to be for atomic visibility,
>> 2PC can't realize that as you know. So, I wondered that.
> 
> I think it's for fairness. If a foreign transaction arrived earlier
> gets put off so often for other foreign transactions arrived later due
> to its index in FdwXactCtl->xacts, it’s not understandable for users
> and not fair. I think it’s better to handle foreign transactions in
> FIFO manner (although this problem exists even in the current code).

OK, thanks.


On 2021/05/21 12:45, Masahiro Ikeda wrote:
> On 2021/05/21 10:39, Masahiko Sawada wrote:
>> On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com>
wrote:
>> How much is the performance without those 2PC patches and with the
>> same workload? i.e., how fast is the current postgres_fdw that uses
>> XactCallback?
>
> OK, I'll test.

The test results are followings. But, I couldn't confirm the performance
improvements of 2PC patches though I may need to be changed the test condition.

[condition]
* 1 coordinator and 3 foreign servers
* There are two custom scripts which access different two foreign servers per
transaction

``` fxact_select.pgbench
BEGIN;
SELECT * FROM part:p1 WHERE id = :id;
SELECT * FROM part:p2 WHERE id = :id;
COMMIT;
```

``` fxact_update.pgbench
BEGIN;
UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
COMMIT;
```

[results]

I have tested three times.
Performance difference seems to be within the range of errors.

# 6d0eb38557 with 2pc patches(v36) and foreign_twophase_commit = disable
- fxact_update.pgbench
72.3, 74.9, 77.5  TPS  => avg 74.9 TPS
110.5, 106.8, 103.2  ms => avg 106.8 ms

- fxact_select.pgbench
1767.6, 1737.1, 1717.4 TPS  => avg 1740.7 TPS
4.5, 4.6, 4.7 ms => avg 4.6ms

# 6d0eb38557 without 2pc patches
- fxact_update.pgbench
76.5, 70.6, 69.5 TPS => avg 72.2 TPS
104.534 + 113.244 + 115.097 => avg 111.0 ms

-fxact_select.pgbench
1810.2, 1748.3, 1737.2 TPS => avg 1765.2 TPS
4.2, 4.6, 4.6 ms=>  4.5 ms





# About the bottleneck of the resolver process

I investigated the performance bottleneck of the resolver process using perf.
The main bottleneck is the following functions.

1st. 42.8% routine->CommitForeignTransaction()
2nd. 31.5% remove_fdwxact()
3rd. 10.16% CommitTransaction()

1st and 3rd problems can be solved by parallelizing resolver processes per
remote servers. But, I wondered that the idea, which backends call also
"COMMIT/ABORT PREPARED" and the resolver process only takes changes of
resolving in-doubt foreign transactions, is better. In many cases, I think
that the number of connections is much greater than the number of remote
servers. If so, the parallelization is not enough.

So, I think the idea which backends execute "PREPARED COMMIT" synchronously is
better. The citus has the 2PC feature and backends send "PREPARED COMMIT" in
the extension. So, this idea is not bad.

Although resolving asynchronously has the performance benefit, we can't take
advantage because the resolver process can be bottleneck easily now.


2nd remove_fdwxact() syncs the WAL, which indicates the foreign transaction
entry is removed. Is it necessary to sync momentarily?

To remove syncing leads the time of recovery phase may be longer because some
fdxact entries need to "COMMIT/ABORT PREPARED" again. But I think the effect
is limited.


# About other trivial comments.

* Is it better to call pgstat_send_wal() in the resolver process?

* Is it better to specify that only one resolver process can be launched in on
database on the descrpition of "max_foreign_transaction_resolvers"?

* Is it intentional that removing and inserting new lines in foreigncmds.c?

* Is it better that "max_prepared_foreign_transactions=%d" is after
"max_prepared_xacts=%d" in xlogdesc.c?

* Is "fdwxact_queue" unnecessary now?

* Is the following " + sizeof(FdwXactResolver)" unnecessary?

#define SizeOfFdwXactResolverCtlData \
    (offsetof(FdwXactResolverCtlData, resolvers) + sizeof(FdwXactResolver))

Although MultiXactStateData considered the backendIds start from 1 indexed,
the resolvers start from 0 indexed. Sorry, if my understanding is wrong.

* s/transaciton/transaction/

* s/foreign_xact_resolution_retry_interval since last
resolver/foreign_xact_resolution_retry_interval since last resolver was/

* Don't we need the debug log in the following in postgres.c like logical
launcher shutdown?

    else if (IsFdwXactLauncher())
    {
        /*
        * The foreign transaction launcher can be stopped at any time.
        * Use exit status 1 so the background worker is restarted.
        */
        proc_exit(1);
    }

* Is pg_stop_foreign_xact_resolver(PG_FUNCTION_ARGS) not documented?

* Is it better from "when arrived a requested by backend process." to
"when a request by backend process is arrived."?


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Race condition in recovery?
Next
From: "Joel Jacobson"
Date:
Subject: Re: security_definer_search_path GUC