Re: [HACKERS] Transactions involving multiple postgres foreign servers - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Transactions involving multiple postgres foreign servers
Date
Msg-id CAD21AoCUyLrfqx--F7aAHF48ax3M3CY2so1QtWxH9jL47eJT8Q@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Transactions involving multiple postgres foreign servers  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Transactions involving multiple postgres foreign servers  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Thank you for the comment.

On Fri, May 11, 2018 at 3:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Feb 27, 2018 at 2:21 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I might be missing your point. As for API breaking, this patch doesn't
>> break any existing FDWs. All new APIs I proposed are dedicated to 2PC.
>> In other words, FDWs that work today can continue working after this
>> patch gets committed, but if FDWs want to support atomic commit then
>> they should be updated to use new APIs. The reason why the calling of
>> FdwXactRegisterForeignServer is necessary is that the core code
>> controls the foreign servers that involved with the transaction but
>> the whether each foreign server uses 2PC option (two_phase_commit) is
>> known only on FDW code. We can eliminate the necessity of calling
>> FdwXactRegisterForeignServer by moving 2PC option from fdw-level to
>> server-level in order not to enforce calling the registering function
>> on FDWs. If we did that, the user can use the 2PC option as a
>> server-level option.
>
> Well, FdwXactRegisterForeignServer has a "bool two_phase_commit"
> argument.  If it only needs to be called by FDWs that support 2PC,
> then that argument is unnecessary.  If it needs to be called by all
> FDWs, then it breaks existing FDWs that don't call it.
>

I understood now. For now since FdwXactRegisterForeignServer needs to
be called by only FDWs that support 2PC, I will remove the argument.

>>>> But this is now responsible by FDW. I should change it to resolver
>>>> side. That is, FDW can raise error in ordinarily way but core code
>>>> should catch and process it.
>>>
>>> I don't understand exactly what you mean here.
>>
>> Hmm I think I got confused. My understanding is that logging a
>> complaint in commit case and not doing that in abort case if prepared
>> transaction doesn't exist is a core code's job. An API contract here
>> is that FDW raise an error with ERRCODE_UNDEFINED_OBJECT error code if
>> there is no such transaction. Since it's an expected case in abort
>> case for the fdwxact manager, the core code can regard the error as
>> not actual problem.
>
> In general, it's not safe to catch an error and continue unless you
> protect the code that throws the error by a sub-transaction.  That
> means we shouldn't expect the FDW to throw an error when the prepared
> transaction isn't found and then just have the core code ignore the
> error.  Instead the FDW should return normally and, if the core code
> needs to know whether the prepared transaction was found, then the FDW
> should indicate this through a return value, not an ERROR.
>
>> Or do you mean that FDWs should not raise an error if there is the
>> prepared transaction, and then core code doesn't need to check
>> sqlstate in case of error?
>
> Right.  As noted above, that's unsafe, so we shouldn't do it.

Thank you. I will think the API contract again based on your suggestion.

>
>>>> You're right. Perhaps we can deal with it by PrescanFdwXacts until
>>>> reach consistent point, and then have vac_update_datfrozenxid check
>>>> local xids of un-resolved fdwxact to determine the new datfrozenxid.
>>>> Since the local xids of un-resolved fdwxacts would not be relevant
>>>> with vacuuming, we don't need to include it to snapshot and
>>>> GetOldestXmin etc. Also we hint to resolve fdwxact when near
>>>> wraparound.
>>>
>>> I agree with you about snapshots, but I'm not sure about GetOldestXmin.
>>
>> Hmm, although I've thought concern in case where we don't consider
>> local xids of un-resolved fdwxact in GetOldestXmin, I could not find
>> problem. Could you share your concern if you have? I'll try to find a
>> possibility based on it.
>
> I don't remember exactly what I was thinking when I wrote that, but I
> think the point is that GetOldestXmin() does a bunch of things other
> than control the threshold for VACUUM, and we'd need to study them all
> and look for problems.  For example, it won't do for an XID to get
> reused while it's still associated with an unresolved fdwxact.  We
> therefore certainly need such XIDs to hold back the cluster-wide
> threshold for clog truncation in some manner -- and maybe that
> involves GetOldestXmin().  Or maybe not.  But anyway the point,
> broadly considered, is that GetOldestXmin() is used in various ways,
> and I don't know if we've thought through all of the consequences in
> regard to this new feature.

Okay, I'll have GetOldestXmin() consider the oldest local xid of
un-resolved fdwxact as well in the next version patch for more safety,
while considering more efficient ways.

>
> Can I ask what your time frame is for updating this patch?
> Considering how much work appears to remain, if you want to get this
> committed to v12, it would be best to get started as early as
> possible.
>

I'll post an updated patch by PGCon at the latest, hopefully in the next week.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Next
From: Masahiko Sawada
Date:
Subject: Re: Global snapshots