Re: Global snapshots - Mailing list pgsql-hackers
From | Alexey Kondratov |
---|---|
Subject | Re: Global snapshots |
Date | |
Msg-id | 53b1586317ae98ecd8c3383f2c9e7c16@postgrespro.ru Whole thread Raw |
In response to | Re: Global snapshots (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Global snapshots
Re: Global snapshots |
List | pgsql-hackers |
On 2020-09-08 14:48, Fujii Masao wrote: > On 2020/09/08 19:36, Alexey Kondratov wrote: >> On 2020-09-08 05:49, Fujii Masao wrote: >>> On 2020/09/05 3:31, Alexey Kondratov wrote: >>>> >>>> Attached is a patch, which implements a plain 2PC in the >>>> postgres_fdw and adds a GUC 'postgres_fdw.use_twophase'. Also it >>>> solves these errors handling issues above and tries to add proper >>>> comments everywhere. I think, that 0003 should be rebased on the top >>>> of it, or it could be a first patch in the set, since it may be used >>>> independently. What do you think? >>> >>> Thanks for the patch! >>> >>> Sawada-san was proposing another 2PC patch at [1]. Do you have any >>> thoughts >>> about pros and cons between your patch and Sawada-san's? >>> >>> [1] >>> https://www.postgresql.org/message-id/CA+fd4k4z6_B1ETEvQamwQhu4RX7XsrN5ORL7OhJ4B5B6sW-RgQ@mail.gmail.com >> >> Thank you for the link! >> >> After a quick look on the Sawada-san's patch set I think that there >> are two major differences: > > Thanks for sharing your thought! As far as I read your patch quickly, > I basically agree with your this view. > > >> >> 1. There is a built-in foreign xacts resolver in the [1], which should >> be much more convenient from the end-user perspective. It involves >> huge in-core changes and additional complexity that is of course worth >> of. >> >> However, it's still not clear for me that it is possible to resolve >> all foreign prepared xacts on the Postgres' own side with a 100% >> guarantee. Imagine a situation when the coordinator node is actually a >> HA cluster group (primary + sync + async replica) and it failed just >> after PREPARE stage of after local COMMIT. In that case all foreign >> xacts will be left in the prepared state. After failover process >> complete synchronous replica will become a new primary. Would it have >> all required info to properly resolve orphan prepared xacts? > > IIUC, yes, the information required for automatic resolution is > WAL-logged and the standby tries to resolve those orphan transactions > from WAL after the failover. But Sawada-san's patch provides > the special function for manual resolution, so there may be some cases > where manual resolution is necessary. > I've found a note about manual resolution in the v25 0002: +After that we prepare all foreign transactions by calling +PrepareForeignTransaction() API. If we failed on any of them we change to +rollback, therefore at this time some participants might be prepared whereas +some are not prepared. The former foreign transactions need to be resolved +using pg_resolve_foreign_xact() manually and the latter ends transaction +in one-phase by calling RollbackForeignTransaction() API. but it's not yet clear for me. > > Implementing 2PC feature only inside postgres_fdw seems to cause > another issue; COMMIT PREPARED is issued to the remote servers > after marking the local transaction as committed > (i.e., ProcArrayEndTransaction()). > According to the Sawada-san's v25 0002 the logic is pretty much the same there: +2. Pre-Commit phase (1st phase of two-phase commit) +3. Commit locally +Once we've prepared all of them, commit the transaction locally. +4. Post-Commit Phase (2nd phase of two-phase commit) Brief look at the code confirms this scheme. IIUC, AtEOXact_FdwXact / FdwXactParticipantEndTransaction happens after ProcArrayEndTransaction() in the CommitTransaction(). Thus, I don't see many difference between these approach and CallXactCallbacks() usage regarding this point. > Is this safe? This issue happens > because COMMIT PREPARED is issued via > CallXactCallbacks(XACT_EVENT_COMMIT) and that CallXactCallbacks() > is called after ProcArrayEndTransaction(). > Once the transaction is committed locally any ERROR (or higher level message) will be escalated to PANIC. And I do see possible ERROR level messages in the postgresCommitForeignTransaction() for example: + if (PQresultStatus(res) != PGRES_COMMAND_OK) + ereport(ERROR, (errmsg("could not commit transaction on server %s", + frstate->server->servername))); I don't think that it's very convenient to get a PANIC every time we failed to commit one of the prepared foreign xacts, since it could be not so rare in the distributed system. That's why I tried to get rid of possible ERRORs as far as possible in my proposed patch. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
pgsql-hackers by date: