Re: [HACKERS] Transactions involving multiple postgres foreign servers - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Transactions involving multiple postgres foreign servers
Date
Msg-id CA+TgmobskLpmM7dsx_Ri-RCYzgJ9bJuNy_16PwzDHsuXGGVg9A@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Transactions involving multiple postgres foreign servers  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [HACKERS] Transactions involving multiple postgres foreign servers  (Masahiko Sawada <sawada.mshk@gmail.com>)
Re: [HACKERS] Transactions involving multiple postgres foreign servers  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
On Tue, Feb 27, 2018 at 2:21 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I might be missing your point. As for API breaking, this patch doesn't
> break any existing FDWs. All new APIs I proposed are dedicated to 2PC.
> In other words, FDWs that work today can continue working after this
> patch gets committed, but if FDWs want to support atomic commit then
> they should be updated to use new APIs. The reason why the calling of
> FdwXactRegisterForeignServer is necessary is that the core code
> controls the foreign servers that involved with the transaction but
> the whether each foreign server uses 2PC option (two_phase_commit) is
> known only on FDW code. We can eliminate the necessity of calling
> FdwXactRegisterForeignServer by moving 2PC option from fdw-level to
> server-level in order not to enforce calling the registering function
> on FDWs. If we did that, the user can use the 2PC option as a
> server-level option.

Well, FdwXactRegisterForeignServer has a "bool two_phase_commit"
argument.  If it only needs to be called by FDWs that support 2PC,
then that argument is unnecessary.  If it needs to be called by all
FDWs, then it breaks existing FDWs that don't call it.

>>> But this is now responsible by FDW. I should change it to resolver
>>> side. That is, FDW can raise error in ordinarily way but core code
>>> should catch and process it.
>>
>> I don't understand exactly what you mean here.
>
> Hmm I think I got confused. My understanding is that logging a
> complaint in commit case and not doing that in abort case if prepared
> transaction doesn't exist is a core code's job. An API contract here
> is that FDW raise an error with ERRCODE_UNDEFINED_OBJECT error code if
> there is no such transaction. Since it's an expected case in abort
> case for the fdwxact manager, the core code can regard the error as
> not actual problem.

In general, it's not safe to catch an error and continue unless you
protect the code that throws the error by a sub-transaction.  That
means we shouldn't expect the FDW to throw an error when the prepared
transaction isn't found and then just have the core code ignore the
error.  Instead the FDW should return normally and, if the core code
needs to know whether the prepared transaction was found, then the FDW
should indicate this through a return value, not an ERROR.

> Or do you mean that FDWs should not raise an error if there is the
> prepared transaction, and then core code doesn't need to check
> sqlstate in case of error?

Right.  As noted above, that's unsafe, so we shouldn't do it.

>>> You're right. Perhaps we can deal with it by PrescanFdwXacts until
>>> reach consistent point, and then have vac_update_datfrozenxid check
>>> local xids of un-resolved fdwxact to determine the new datfrozenxid.
>>> Since the local xids of un-resolved fdwxacts would not be relevant
>>> with vacuuming, we don't need to include it to snapshot and
>>> GetOldestXmin etc. Also we hint to resolve fdwxact when near
>>> wraparound.
>>
>> I agree with you about snapshots, but I'm not sure about GetOldestXmin.
>
> Hmm, although I've thought concern in case where we don't consider
> local xids of un-resolved fdwxact in GetOldestXmin, I could not find
> problem. Could you share your concern if you have? I'll try to find a
> possibility based on it.

I don't remember exactly what I was thinking when I wrote that, but I
think the point is that GetOldestXmin() does a bunch of things other
than control the threshold for VACUUM, and we'd need to study them all
and look for problems.  For example, it won't do for an XID to get
reused while it's still associated with an unresolved fdwxact.  We
therefore certainly need such XIDs to hold back the cluster-wide
threshold for clog truncation in some manner -- and maybe that
involves GetOldestXmin().  Or maybe not.  But anyway the point,
broadly considered, is that GetOldestXmin() is used in various ways,
and I don't know if we've thought through all of the consequences in
regard to this new feature.

Can I ask what your time frame is for updating this patch?
Considering how much work appears to remain, if you want to get this
committed to v12, it would be best to get started as early as
possible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Should we add GUCs to allow partition pruning to be disabled?
Next
From: Robert Haas
Date:
Subject: Re: pg_ugprade test failure on data set with column with defaultvalue with type bit/varbit