Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers
From | Masahiro Ikeda |
---|---|
Subject | Re: Transactions involving multiple postgres foreign servers, take 2 |
Date | |
Msg-id | 8fddc794e9bf8f3153627e622c23e992@oss.nttdata.com |
In response to | Re: Transactions involving multiple postgres foreign servers, take 2 (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>) |
List | pgsql-hackers |
On 2020-07-17 15:55, Masahiko Sawada wrote:
> On Fri, 17 Jul 2020 at 11:06, Masahiro Ikeda <ikedamsh@oss.nttdata.com>
> wrote:
>>
>> On 2020-07-16 13:16, Masahiko Sawada wrote:
>> > On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <ikedamsh@oss.nttdata.com>
>> > wrote:
>> >>
>> >> > I've attached the latest version of the patches. I've incorporated
>> >> > the review comments I got so far and improved the locking strategy.
>> >>
>> >> I want to ask a question about streaming replication with 2PC.
>> >> Are you going to support 2PC with streaming replication?
>> >>
>> >> I tried streaming replication using the v23 patches, with a primary
>> >> and a standby coordinator, and confirmed that 2PC works.
>> >>
>> >> But, in my understanding, the WAL records for "PREPARE" and
>> >> "COMMIT/ABORT PREPARED" can't be replicated to the standby server
>> >> synchronously.
>> >>
>> >> If this is right, an unresolved transaction can occur.
>> >>
>> >> For example,
>> >>
>> >> 1. PREPARE is done
>> >> 2. the primary crashes before the WAL related to PREPARE is
>> >>    replicated to the standby server
>> >> 3. the standby server is promoted // but it can't execute "ABORT PREPARED"
>> >>
>> >> In the above case, the remote server has an unresolved transaction.
>> >> Can we solve this problem to support synchronous replication?
>> >>
>> >> But I think some users use asynchronous replication for performance.
>> >> Do we need to document the limitation or find another solution?
>> >>
>> >
>> > IIUC with synchronous replication, we can guarantee that WAL records
>> > are written on both the primary and the replicas when the client gets
>> > an acknowledgment of commit. We don't replicate each WAL record
>> > generated during a transaction one by one in sync. In the case you
>> > described, the client will get an error due to the server crash.
>> > Therefore I think the user cannot expect that the WAL records generated
>> > so far have been replicated. The same issue could also happen when the
>> > user executes PREPARE TRANSACTION and the server crashes.
>>
>> Thanks! I hadn't noticed that the behavior is the same when a user
>> executes PREPARE TRANSACTION.
>>
>> IIUC there is a difference between (1) PREPARE TRANSACTION and
>> (2) 2PC: whether the client can know when the server crashed and
>> what its global transaction id is.
>>
>> If (1) PREPARE TRANSACTION fails, it's OK for the client to execute the
>> same command again, because if the remote server is already prepared,
>> the command will be ignored.
>>
>> But if (2) 2PC fails because the coordinator crashes, the client can't
>> know which operations should be done.
>>
>> If the old coordinator already executed PREPARE, there are some
>> transactions which should be ABORT PREPARED.
>> But if the PREPARE WAL is not sent to the standby, the new coordinator
>> can't execute ABORT PREPARED.
>> And the client can't know which remote servers have prepared
>> transactions that should be aborted, either.
>>
>> Even if the client could know that, only the old coordinator knows its
>> global transaction id.
>> Only the database administrator can analyze the old coordinator's log
>> and then execute the appropriate commands manually, right?
>
> I think that's right. In the case of a coordinator crash, the user
> can look for orphaned foreign prepared transactions by checking the
> 'identifier' column of pg_foreign_xacts on the new standby server and
> the prepared transactions on the remote servers.

I think there is a case where we can't find orphaned foreign prepared
transactions in the pg_foreign_xacts view on the new standby server,
which would confuse users and database administrators.

If the primary coordinator crashes after preparing a foreign transaction
but before sending the XLOG_FDWXACT_INSERT records to the standby server,
the standby server can't restore the transaction status, and the
pg_foreign_xacts view doesn't show the prepared foreign transactions.
Sending XLOG_FDWXACT_INSERT records asynchronously leads to this problem.

>> > To prevent this issue, I think we would need to send each WAL record
>> > in sync, but I'm not sure that's reasonable behavior; and as long as we
>> > write WAL locally and then send it to replicas, we would need a smart
>> > mechanism to prevent this situation.
>>
>> I agree. Sending each 2PC WAL record in sync would have a large
>> performance impact.
>> At least, we need to document the limitation and how to handle this
>> situation.
>
> Ok. I'll add it.

Thanks a lot.

>> > Related to Ikeda-san's point, I realized that with the current patch
>> > the backend waits for synchronous replication and then waits for
>> > foreign transaction resolution. But it should be reversed.
>> > Otherwise, it could lead to data loss even when the client got an
>> > acknowledgment of commit. Also, when the user is using both atomic
>> > commit and synchronous replication and wants to cancel waiting, he/she
>> > will need to press Ctrl-C twice with the current patch, which also
>> > should be fixed.
>>
>> I'm sorry, but I don't understand.
>>
>> In my understanding, if the COMMIT WAL record is replicated to the
>> standby in sync, the standby server can resolve the transaction after
>> crash recovery, once it is promoted.
>>
>> If reversed, there are some situations in which atomic commit can't be
>> guaranteed. If some foreign transaction resolutions succeed but others
>> fail (and the COMMIT WAL record is not replicated), the standby must
>> execute ABORT PREPARED because the COMMIT WAL record is not replicated.
>> This means that some foreign transactions are committed with COMMIT
>> PREPARED by the primary coordinator, while other foreign transactions
>> can be aborted with ABORT PREPARED by the secondary coordinator.
>
> You're right. Thank you for pointing that out!
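The ordering argument above can be made concrete with a toy Python model. This is a sketch under stated assumptions, not code from the patch: every name in it (`Remote`, `commit_on_primary`, `failover`, `simulate`, and the foreign servers `fs1`, `fs2`, `fs3`) is hypothetical. It assumes all PREPARE and FdwXact records already reached the standby, that the primary crashes partway through issuing COMMIT PREPARED, and that the promoted standby commits or aborts the remaining prepared transactions based only on whether it received the COMMIT record.

```python
from dataclasses import dataclass

# Toy model with hypothetical names; NOT the patch's implementation.


class CoordinatorCrash(Exception):
    """Models the primary coordinator crashing partway through commit."""


@dataclass
class Remote:
    name: str
    state: str = "prepared"  # every participant is PREPAREd before the commit phase


def commit_on_primary(remotes: list[Remote], sync_rep_first: bool, crash_after: int) -> bool:
    """Run the commit phase on the primary and return whether the standby
    received the COMMIT WAL record before the crash.

    sync_rep_first=True:  wait for synchronous replication, then resolve.
    sync_rep_first=False: resolve foreign transactions, then wait for sync rep.
    crash_after: how many COMMIT PREPARED calls succeed before the crash.
    """
    standby_has_commit = False
    steps = ["sync_rep", "resolve"] if sync_rep_first else ["resolve", "sync_rep"]
    try:
        for step in steps:
            if step == "sync_rep":
                standby_has_commit = True  # COMMIT record is now on the standby
            else:
                for i, remote in enumerate(remotes):
                    if i == crash_after:
                        raise CoordinatorCrash()
                    remote.state = "committed"  # COMMIT PREPARED succeeded
    except CoordinatorCrash:
        pass  # later steps never ran on the primary
    return standby_has_commit


def failover(remotes: list[Remote], standby_has_commit: bool) -> None:
    """The promoted standby resolves whatever is still prepared."""
    for remote in remotes:
        if remote.state == "prepared":
            remote.state = "committed" if standby_has_commit else "aborted"


def simulate(sync_rep_first: bool, crash_after: int = 1) -> dict[str, str]:
    remotes = [Remote(name) for name in ("fs1", "fs2", "fs3")]
    standby_has_commit = commit_on_primary(remotes, sync_rep_first, crash_after)
    failover(remotes, standby_has_commit)
    return {r.name: r.state for r in remotes}


print(simulate(sync_rep_first=True))   # order used by the current patch
print(simulate(sync_rep_first=False))  # reversed order
```

Under these assumptions, `simulate(sync_rep_first=False)` leaves fs1 committed while fs2 and fs3 are aborted, which is the mixed outcome described above, whereas `simulate(sync_rep_first=True)` ends with all three committed. In this model, only waiting for synchronous replication of the COMMIT record before resolving the foreign transactions keeps the outcome atomic.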
>
> If the coordinator crashes after the client gets acknowledgment of the
> successful commit of the transaction but before sending the
> XLOG_FDWXACT_REMOVE record to the replicas, the FdwXact entries are
> left on the replicas even after failover. But since we require FDWs to
> tolerate the error of undefined prepared transactions in
> COMMIT/ROLLBACK PREPARED, it won't be a critical problem.

I agree. It's OK for the primary coordinator to send XLOG_FDWXACT_REMOVE
records asynchronously.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION