Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From Masahiro Ikeda
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Msg-id 8fddc794e9bf8f3153627e622c23e992@oss.nttdata.com
In response to Re: Transactions involving multiple postgres foreign servers, take 2  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
List pgsql-hackers
On 2020-07-17 15:55, Masahiko Sawada wrote:
> On Fri, 17 Jul 2020 at 11:06, Masahiro Ikeda <ikedamsh@oss.nttdata.com> 
> wrote:
>> 
>> On 2020-07-16 13:16, Masahiko Sawada wrote:
>> > On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <ikedamsh@oss.nttdata.com>
>> > wrote:
>> >>
>> >> > I've attached the latest version patches. I've incorporated the review
>> >> > comments I got so far and improved locking strategy.
>> >>
>> >> I want to ask a question about streaming replication with 2PC.
>> >> Are you going to support 2PC with streaming replication?
>> >>
>> >> I tried streaming replication using the v23 patches.
>> >> I confirmed that 2PC works with streaming replication,
>> >> where there are primary and standby coordinators.
>> >>
>> >> However, as I understand it, the WAL records of "PREPARE" and
>> >> "COMMIT/ABORT PREPARED" are not replicated to the standby server
>> >> synchronously.
>> >>
>> >> If this is right, unresolved transactions can occur.
>> >>
>> >> For example,
>> >>
>> >> 1. PREPARE is executed.
>> >> 2. The primary crashes before the WAL related to PREPARE is
>> >>    replicated to the standby server.
>> >> 3. The standby server is promoted, but it can't execute "ABORT PREPARED".
>> >>
>> >> In the above case, the remote server is left with an unresolved transaction.
>> >> Can we solve this problem so that 2PC works with synchronous replication?
>> >>
>> >> However, I think some users use asynchronous replication for performance.
>> >> Should we document the limitation, or find another solution?
>> >>
>> >
>> > IIUC with synchronous replication, we can guarantee that WAL records
>> > are written on both the primary and the replicas by the time the client
>> > gets an acknowledgment of commit. We don't replicate each WAL record
>> > generated during a transaction one by one synchronously. In the case you
>> > described, the client will get an error due to the server crash.
>> > Therefore I think the user cannot expect that the WAL records generated
>> > so far have been replicated. The same issue could also happen when the
>> > user executes PREPARE TRANSACTION and the server crashes.
>> 
>> Thanks! I hadn't noticed that the behavior is the same when a user
>> executes PREPARE TRANSACTION.
>> 
>> IIUC, there is a difference between (1) PREPARE TRANSACTION and
>> (2) 2PC: whether the client can know that the server crashed and
>> what the global transaction id is.
>> 
>> If (1) PREPARE TRANSACTION fails, it's OK for the client to execute
>> the same command again, because if the remote server has already
>> prepared the transaction, the command will be ignored.
>> 
>> But if (2) 2PC fails because the coordinator crashes, the client can't
>> know which operations should be done.
>> 
>> If the old coordinator had already executed PREPARE, there are some
>> transactions that should be rolled back with ABORT PREPARED.
>> But if the PREPARE WAL record was not sent to the standby, the new
>> coordinator can't execute ABORT PREPARED.
>> Nor can the client know which remote servers have prepared
>> transactions that should be aborted.
>> 
>> Even if the client could know that, only the old coordinator knows
>> the global transaction id.
>> So only the database administrator can analyze the old coordinator's
>> log and then execute the appropriate commands manually, right?
> 
> I think that's right. In the case of a coordinator crash, the user
> can look for orphaned foreign prepared transactions by checking the
> 'identifier' column of pg_foreign_xacts on the new standby server and
> the prepared transactions on the remote servers.

I think there is a case where we can't find orphaned foreign prepared
transactions in the pg_foreign_xacts view on the new standby server.
This could confuse users and database administrators.

If the primary coordinator crashes after preparing a foreign transaction
but before sending the XLOG_FDWXACT_INSERT records to the standby server,
the standby server can't restore the status of those transactions, and
the pg_foreign_xacts view doesn't show the prepared foreign transactions.

Sending XLOG_FDWXACT_INSERT records asynchronously leads to this problem.
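To make the failure concrete, here is a toy model. It is a minimal sketch, not code from the patch: `RemoteServer`, `Coordinator`, `prepare_foreign`, `find_orphans` and the identifiers are all hypothetical, and asynchronous WAL shipping is reduced to remembering how many records the standby has received. It shows that a foreign transaction prepared on a remote server, whose XLOG_FDWXACT_INSERT record never reached the standby, is absent from what the promoted standby would show in pg_foreign_xacts, so it only turns up by comparing against the remote servers' prepared transactions.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteServer:
    """Hypothetical foreign server; `prepared` stands in for its pg_prepared_xacts."""
    name: str
    prepared: set = field(default_factory=set)


@dataclass
class Coordinator:
    """Hypothetical primary coordinator whose WAL is shipped asynchronously."""
    wal: list = field(default_factory=list)  # records written locally
    shipped: int = 0                          # number of records the standby has received

    def prepare_foreign(self, server: RemoteServer, gid: str) -> None:
        # PREPARE TRANSACTION on the remote plus the local XLOG_FDWXACT_INSERT record.
        # Their relative order does not matter here; what matters is whether the
        # record reaches the standby before a crash.
        server.prepared.add(gid)
        self.wal.append(("XLOG_FDWXACT_INSERT", server.name, gid))

    def ship_wal(self) -> None:
        # Asynchronous replication: the standby catches up only when this runs.
        self.shipped = len(self.wal)

    def standby_foreign_xacts(self) -> set:
        """Entries the promoted standby could rebuild from the records it received."""
        return {(srv, gid) for kind, srv, gid in self.wal[: self.shipped]
                if kind == "XLOG_FDWXACT_INSERT"}


def find_orphans(servers, known):
    """Remote prepared transactions that the promoted standby has no entry for."""
    return {(s.name, gid) for s in servers for gid in s.prepared} - known


remote = [RemoteServer("fs1"), RemoteServer("fs2")]
coord = Coordinator()
coord.prepare_foreign(remote[0], "fx_1000_fs1")  # made-up identifiers
coord.ship_wal()                                 # this record reaches the standby
coord.prepare_foreign(remote[1], "fx_1000_fs2")
# The primary crashes here: the second XLOG_FDWXACT_INSERT was never shipped.
known = coord.standby_foreign_xacts()
print(sorted(known))                 # [('fs1', 'fx_1000_fs1')]
print(find_orphans(remote, known))   # {('fs2', 'fx_1000_fs2')}
```

In a real cluster the remote pg_prepared_xacts can also contain prepared transactions unrelated to this coordinator, so an actual check would have to filter them somehow (for example by the identifier format); the sketch ignores that.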

>> > To prevent this
>> > issue, I think we would need to send each WAL record synchronously, but
>> > I'm not sure that's reasonable behavior, and as long as we write WAL
>> > locally and then send it to replicas, we would need a smart mechanism
>> > to prevent this situation.
>> 
>> I agree. Sending each 2PC WAL record synchronously would have a large
>> performance impact.
>> At the least, we need to document the limitation and how to handle this
>> situation.
> 
> Ok. I'll add it.

Thanks a lot.

>> > Related to Ikeda-san's point, I realized that with the
>> > current patch the backend waits for synchronous replication and then
>> > waits for foreign transaction resolution. But it should be reversed.
>> > Otherwise, it could lead to data loss even when the client got an
>> > acknowledgment of commit. Also, when the user is using both atomic
>> > commit and synchronous replication and wants to cancel waiting, he/she
>> > will need to press Ctrl-C twice with the current patch, which also
>> > should be fixed.
>> 
>> I'm sorry, but I don't understand.
>> 
>> As I understand it, if the COMMIT WAL record is replicated to the
>> standby synchronously, the standby server can resolve the transaction
>> after crash recovery once it is promoted.
>> 
>> If reversed, there are situations in which atomic commit can't be
>> guaranteed.
>> Suppose some foreign transaction resolutions succeed but others fail
>> (and the COMMIT WAL record is not replicated). Then the standby must
>> execute ABORT PREPARED, because the COMMIT WAL record was not
>> replicated.
>> This means that some foreign transactions are committed with COMMIT
>> PREPARED by the primary coordinator, while others may be rolled back
>> with ABORT PREPARED by the secondary coordinator.
> 
> You're right. Thank you for pointing out!
> 
> If the coordinator crashes after the client gets acknowledgment of the
> successful commit of the transaction but before sending
> XLOG_FDWXACT_REMOVE records to the replicas, the FdwXact entries are
> left on the replicas even after failover. But since we require FDWs to
> tolerate the error about an undefined prepared transaction in
> COMMIT/ROLLBACK PREPARED, it won't be a critical problem.

I agree. It's OK for the primary coordinator to send
XLOG_FDWXACT_REMOVE records asynchronously.
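The ordering question and the leftover FdwXact entries can be checked with a small simulation. This is a hypothetical sketch, not code from the patch: `Remote`, `resolve`, `run`, the identifier and the exception class are made up, a single flag stands in for "the COMMIT record has been synchronously replicated", and XLOG_FDWXACT_REMOVE records are assumed never to reach the standby before the crash (the worst case for asynchronous removal). Crashing after every possible step shows that waiting for synchronous replication before resolving keeps the remote outcomes uniform, while resolving first can leave one server committed and another aborted. The tolerant `resolve` is what makes the stale entries on the promoted standby harmless.

```python
class UndefinedPreparedXact(Exception):
    """Stand-in for the remote error when a gid is no longer prepared."""


class Remote:
    """Hypothetical foreign server holding one prepared transaction."""

    def __init__(self, name):
        self.name = name
        self.prepared = set()
        self.outcome = None  # "commit" or "abort" once resolved

    def finish_prepared(self, gid, action):
        # Models COMMIT PREPARED / ROLLBACK PREPARED on the remote server.
        if gid not in self.prepared:
            raise UndefinedPreparedXact(gid)
        self.prepared.remove(gid)
        self.outcome = action


def resolve(remote, gid, action):
    """Resolve one foreign transaction, tolerating an already-resolved gid."""
    try:
        remote.finish_prepared(gid, action)
    except UndefinedPreparedXact:
        pass  # resolved before the crash; the stale FdwXact entry is harmless


def run(order, crash_after):
    """Commit a 2PC transaction, crash after `crash_after` steps, fail over.

    Returns the set of outcomes on the remote servers after failover."""
    remotes = [Remote("fs1"), Remote("fs2")]
    gid = "fx_42"  # made-up identifier
    for r in remotes:
        r.prepared.add(gid)
    # Assume every XLOG_FDWXACT_INSERT reached the standby, and that no
    # XLOG_FDWXACT_REMOVE does before the crash (worst case for async removal).
    standby = {"fdwxacts": {r.name for r in remotes}, "commit": False}

    wait_sync_rep = [lambda: standby.update(commit=True)]  # COMMIT record is on the standby
    resolutions = [lambda r=r: resolve(r, gid, "commit") for r in remotes]
    if order == "sync_first":
        steps = wait_sync_rep + resolutions
    else:  # "resolve_first"
        steps = resolutions + wait_sync_rep
    for step in steps[:crash_after]:
        step()

    # Failover: the promoted standby resolves every FdwXact entry it still has,
    # committing only if it has the COMMIT record.
    action = "commit" if standby["commit"] else "abort"
    for r in remotes:
        if r.name in standby["fdwxacts"]:
            resolve(r, gid, action)
    return {r.outcome for r in remotes}


for order in ("sync_first", "resolve_first"):
    for crash in range(4):
        print(order, crash, sorted(run(order, crash)))
```

With `sync_first`, every crash point leaves a single outcome on both servers; with `resolve_first` and `crash_after=1`, fs1 ends up committed and fs2 aborted, which is the mixed state described above.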

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION