Re: Global snapshots - Mailing list pgsql-hackers

From Alexey Kondratov
Subject Re: Global snapshots
Date
Msg-id 36887e82ad3f6c5254de71382a672c61@postgrespro.ru
Whole thread Raw
In response to Re: Global snapshots  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
List pgsql-hackers
On 2020-09-09 08:35, Masahiko Sawada wrote:
> On Wed, 9 Sep 2020 at 02:00, Alexey Kondratov
> <a.kondratov@postgrespro.ru> wrote:
>> 
>> On 2020-09-08 14:48, Fujii Masao wrote:
>> >
>> > IIUC, yes, the information required for automatic resolution is
>> > WAL-logged and the standby tries to resolve those orphan transactions
>> > from WAL after the failover. But Sawada-san's patch provides
>> > the special function for manual resolution, so there may be some cases
>> > where manual resolution is necessary.
>> >
>> 
>> I've found a note about manual resolution in the v25 0002:
>> 
>> +After that we prepare all foreign transactions by calling
>> +PrepareForeignTransaction() API. If we failed on any of them we 
>> change
>> to
>> +rollback, therefore at this time some participants might be prepared
>> whereas
>> +some are not prepared. The former foreign transactions need to be
>> resolved
>> +using pg_resolve_foreign_xact() manually and the latter ends
>> transaction
>> +in one-phase by calling RollbackForeignTransaction() API.
>> 
>> but it's not yet clear for me.
> 
> Sorry, the above description in README is out of date. In the v25
> patch, it's true that if a backend fails to prepare a transaction on a
> foreign server, it’s possible that some foreign transactions are
> prepared whereas others are not. But at the end of the transaction
> after changing to rollback, the process does rollback (or rollback
> prepared) all of them. So the use case of pg_resolve_foreign_xact() is
> to resolve orphaned foreign prepared transactions or to resolve a
> foreign transaction that is not resolved for some reasons, bugs etc.
> 

OK, thank you for the explanation!

>> 
>> Once the transaction is committed locally any ERROR (or higher level
>> message) will be escalated to PANIC.
> 
> I think this is true only inside the critical section and it's not
> necessarily true for all errors happening after the local commit,
> right?
> 

It's not actually related to critical section errors escalation. Any 
error in the backend after the local commit and 
ProcArrayEndTransaction() will try to abort the current transaction and 
do RecordTransactionAbort(), but it's too late to do so and PANIC will 
be risen:

    /*
     * Check that we haven't aborted halfway through 
RecordTransactionCommit.
     */
    if (TransactionIdDidCommit(xid))
        elog(PANIC, "cannot abort transaction %u, it was already committed",
             xid);

At least that's how I understand it.


>> And I do see possible ERROR level
>> messages in the postgresCommitForeignTransaction() for example:
>> 
>> +       if (PQresultStatus(res) != PGRES_COMMAND_OK)
>> +               ereport(ERROR, (errmsg("could not commit transaction 
>> on server %s",
>> +                                                          
>> frstate->server->servername)));
>> 
>> I don't think that it's very convenient to get a PANIC every time we
>> failed to commit one of the prepared foreign xacts, since it could be
>> not so rare in the distributed system. That's why I tried to get rid 
>> of
>> possible ERRORs as far as possible in my proposed patch.
>> 
> 
> In my patch, the second phase of 2PC is executed only by the resolver
> process. Therefore, even if an error would happen during committing a
> foreign prepared transaction, we just need to relaunch the resolver
> process and trying again. During that, the backend process will be
> just waiting. If a backend process raises an error after the local
> commit, the client will see transaction failure despite the local
> transaction having been committed. An error could happen even by
> palloc. So the patch uses a background worker to commit prepared
> foreign transactions, not by backend itself.
> 

Yes, if it's a background process, then it seems to be safe.

BTW, it seems that I've chosen a wrong thread for posting my patch and 
staring a discussion :) Activity from this thread moved to [1] and you 
solution with built-in resolver is discussed [2]. I'll try to take a 
look on v25 closely and write to [2] instead.


[1] 
https://www.postgresql.org/message-id/2020081009525213277261%40highgo.ca

[2] 
https://www.postgresql.org/message-id/CAExHW5uBy9QwjdSO4j82WC4aeW-Q4n2ouoZ1z70o%3D8Vb0skqYQ%40mail.gmail.com

Regards
-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: file_fdw vs relative paths
Next
From: Heikki Linnakangas
Date:
Subject: Re: Yet another fast GiST build