Re: Global snapshots - Mailing list pgsql-hackers

From Alexey Kondratov
Subject Re: Global snapshots
Date
Msg-id 53b1586317ae98ecd8c3383f2c9e7c16@postgrespro.ru
Whole thread Raw
In response to Re: Global snapshots  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: Global snapshots  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Re: Global snapshots  (Fujii Masao <masao.fujii@oss.nttdata.com>)
List pgsql-hackers
On 2020-09-08 14:48, Fujii Masao wrote:
> On 2020/09/08 19:36, Alexey Kondratov wrote:
>> On 2020-09-08 05:49, Fujii Masao wrote:
>>> On 2020/09/05 3:31, Alexey Kondratov wrote:
>>>> 
>>>> Attached is a patch, which implements a plain 2PC in the 
>>>> postgres_fdw and adds a GUC 'postgres_fdw.use_twophase'. Also it 
>>>> solves these errors handling issues above and tries to add proper 
>>>> comments everywhere. I think, that 0003 should be rebased on the top 
>>>> of it, or it could be a first patch in the set, since it may be used 
>>>> independently. What do you think?
>>> 
>>> Thanks for the patch!
>>> 
>>> Sawada-san was proposing another 2PC patch at [1]. Do you have any 
>>> thoughts
>>> about pros and cons between your patch and Sawada-san's?
>>> 
>>> [1]
>>> https://www.postgresql.org/message-id/CA+fd4k4z6_B1ETEvQamwQhu4RX7XsrN5ORL7OhJ4B5B6sW-RgQ@mail.gmail.com
>> 
>> Thank you for the link!
>> 
>> After a quick look on the Sawada-san's patch set I think that there 
>> are two major differences:
> 
> Thanks for sharing your thought! As far as I read your patch quickly,
> I basically agree with your this view.
> 
> 
>> 
>> 1. There is a built-in foreign xacts resolver in the [1], which should 
>> be much more convenient from the end-user perspective. It involves 
>> huge in-core changes and additional complexity that is of course worth 
>> of.
>> 
>> However, it's still not clear for me that it is possible to resolve 
>> all foreign prepared xacts on the Postgres' own side with a 100% 
>> guarantee. Imagine a situation when the coordinator node is actually a 
>> HA cluster group (primary + sync + async replica) and it failed just 
>> after PREPARE stage of after local COMMIT. In that case all foreign 
>> xacts will be left in the prepared state. After failover process 
>> complete synchronous replica will become a new primary. Would it have 
>> all required info to properly resolve orphan prepared xacts?
> 
> IIUC, yes, the information required for automatic resolution is
> WAL-logged and the standby tries to resolve those orphan transactions
> from WAL after the failover. But Sawada-san's patch provides
> the special function for manual resolution, so there may be some cases
> where manual resolution is necessary.
> 

I've found a note about manual resolution in the v25 0002:

+After that we prepare all foreign transactions by calling
+PrepareForeignTransaction() API. If we failed on any of them we change 
to
+rollback, therefore at this time some participants might be prepared 
whereas
+some are not prepared. The former foreign transactions need to be 
resolved
+using pg_resolve_foreign_xact() manually and the latter ends 
transaction
+in one-phase by calling RollbackForeignTransaction() API.

but it's not yet clear for me.

> 
> Implementing 2PC feature only inside postgres_fdw seems to cause
> another issue; COMMIT PREPARED is issued to the remote servers
> after marking the local transaction as committed
> (i.e., ProcArrayEndTransaction()).
> 

According to the Sawada-san's v25 0002 the logic is pretty much the same 
there:

+2. Pre-Commit phase (1st phase of two-phase commit)

+3. Commit locally
+Once we've prepared all of them, commit the transaction locally.

+4. Post-Commit Phase (2nd phase of two-phase commit)

Brief look at the code confirms this scheme. IIUC, AtEOXact_FdwXact / 
FdwXactParticipantEndTransaction happens after ProcArrayEndTransaction() 
in the CommitTransaction(). Thus, I don't see many difference between 
these approach and CallXactCallbacks() usage regarding this point.

> Is this safe? This issue happens
> because COMMIT PREPARED is issued via
> CallXactCallbacks(XACT_EVENT_COMMIT) and that CallXactCallbacks()
> is called after ProcArrayEndTransaction().
> 

Once the transaction is committed locally any ERROR (or higher level 
message) will be escalated to PANIC. And I do see possible ERROR level 
messages in the postgresCommitForeignTransaction() for example:

+    if (PQresultStatus(res) != PGRES_COMMAND_OK)
+        ereport(ERROR, (errmsg("could not commit transaction on server %s",
+                               frstate->server->servername)));

I don't think that it's very convenient to get a PANIC every time we 
failed to commit one of the prepared foreign xacts, since it could be 
not so rare in the distributed system. That's why I tried to get rid of 
possible ERRORs as far as possible in my proposed patch.


Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Optimising compactify_tuples()
Next
From: Bharath Rupireddy
Date:
Subject: Re: Logical Replication - detail message with names of missing columns