Re: Two proposed modifications to the PostgreSQL FDW - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Two proposed modifications to the PostgreSQL FDW
Date
Msg-id CAD21AoDy1oYsksEFy_vJhf6hZQyfVQfRu+r6JeW=1NMA64mbyQ@mail.gmail.com
In response to Re: Two proposed modifications to the PostgreSQL FDW  (Chris Travers <chris.travers@adjust.com>)
Responses Re: Two proposed modifications to the PostgreSQL FDW
List pgsql-hackers
On Wed, Aug 22, 2018 at 1:20 PM Chris Travers <chris.travers@adjust.com> wrote:
>
>
>
> On Wed, Aug 22, 2018 at 3:12 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> On Tue, Aug 21, 2018 at 5:36 PM Chris Travers <chris.travers@adjust.com> wrote:
>> >
>> >
>> >
>> > On Tue, Aug 21, 2018 at 8:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >>
>> >> On Tue, Aug 21, 2018 at 1:47 AM Chris Travers <chris.travers@adjust.com> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Aug 20, 2018 at 4:41 PM Andres Freund <andres@anarazel.de> wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> On 2018-08-20 16:28:01 +0200, Chris Travers wrote:
>> >> >> > 2.  TWOPHASECOMMIT=[off|on] option
>> >> >>
>> >> >> > The second major issue that I see with PostgreSQL's foreign database
>> >> >> > wrappers is the fact that there is no two phase commit which means that a
>> >> >> > single transaction writing to a group of tables has no expectation that all
>> >> >> > backends will commit or rollback together.  With this patch an option would
>> >> >> > be applied to foreign tables such that they could be set to use two phase
>> >> >> > commit.  When this is done, the first write to each backend would register a
>> >> >> > connection with a global transaction handler, and pre-commit and commit
>> >> >> > hooks would be set up to properly process these.
>> >> >> >
>> >> >> > On precommit a per-global-transaction file would be opened in the data
>> >> >> > directory and prepare statements logged to the file.  On error, we simply
>> >> >> > roll back our local transaction.
>> >> >> >
>> >> >> > On the commit hook, we go through and start to commit the remote global
>> >> >> > transactions.  At this point we make a best effort but track whether or not
>> >> >> > we were successful on all.  If successful on all, we delete the file.  If
>> >> >> > unsuccessful we fire a background worker which re-reads the file and is
>> >> >> > responsible for cleanup.  If global transactions persist, a SQL
>> >> >> > administration function will be made available to restart the cleanup
>> >> >> > process.  On rollback, we do like commit but we roll back all transactions
>> >> >> > in the set.  The file has enough information to determine whether we should
>> >> >> > be committing or rolling back on cleanup.
>> >> >> >
>> >> >> > I would like to push these both for Pg 12.  Is there any feedback on the
>> >> >> > concepts and the problems first
>> >> >>
>> >>
>> >> Thank you for the proposal. I agree that it's a major problem that
>> >> postgres_fdw (or PostgreSQL core API) doesn't support two-phase
>> >> commit.
>> >>
>> >> >> There's been *substantial* work on this. You should at least read the
>> >> >> discussion & coordinate with the relevant developers.
>> >> >
>> >> >
>> >> > I suppose I should forward this to them directly also.
>> >> >
>> >> > Yeah.   Also the transaction manager code for this I wrote while helping with a proof of concept for this copy-to-remote extension.
>> >> >
>> >> > There are a few big differences in implementation with the patches you mention and the disagreement was part of why I thought about going this direction.
>> >> >
>> >> > First, discussion of differences in implementation:
>> >> >
>> >> > 1.  I treat the local and remote transactions symmetrically and I make no assumptions about what might happen between prepare and an attempted local commit.
>> >> >    prepare goes into the precommit hook
>> >> >    commit goes into the commit hook and we never raise errors if it fails (because you cannot rollback at that point). Instead a warning is raised and cleanup commences.
>> >> >    rollback goes into the rollback hook and we never raise errors if it fails (because you are already rolling back).
>> >> >
>> >> > 2.  By treating this as a property of a table rather than a property of a foreign data wrapper or a server, we can better prevent prepared transactions where they have not been enabled.
>> >> >    This also ensures that we know whether we are guaranteeing two phase commit or not by looking at the table.
>> >> >
>> >> > 3.  By making this opt-in it avoids a lot of problems with regards to incorrect configuration etc since if the DBA says "use two phase commit" and failed to enable prepared transactions on the other side...
>> >> >
>> >> > On to failure modes:
>> >> >  1.  It's possible that under high load too many foreign transactions are prepared and things start rolling back instead of committing.  Oh well....
>> >> >  2.  In the event that a foreign server goes away between prepare and commit, we continue to retry via the background worker.  The background worker is very pessimistic and checks every remote system for the named transaction.
>> >>
>> >> If some participant servers fail during COMMIT PREPARED, will the
>> >> client get a "committed"? or an "aborted"? If the client gets
>> >> "aborted", that's not correct because the local changes are already
>> >> committed at that point.
>> >
>> >
>> > Ok so let's discuss this in more detail here.
>> >
>> > You have basically 6 states a TPC global transaction can be in.
>> > 1.  We haven't gotten to the point of trying to commit (BEGIN)
>> > 2.  We are trying to commit (PREPARE)
>> > 3.  We have committed to committing all transactions (COMMIT)
>> > 4.  We have committed to rolling back all transactions (ROLLBACK)
>> > 5.  We have successfully committed OR rolled back all transactions (COMPLETE)
>> > 6.  We tried to commit or rollback all transactions and got some errors (INCOMPLETE)
>> >
>> > During COMMIT PREPARED we cannot raise errors to PostgreSQL.  We have already committed to committing and therefore the only way forward is to fix the problem.
>>
>> Agreed. I wrote the case where the client gets an "aborted" but it
>> should not happen.
>
>
> It is possible an administrator could log in and roll back the prepared transaction but that's beyond the scope of any possible patch.
>>
>>
>> >
>> >>
>> >> On the other hand, if the client gets
>> >> "committed" it might break the current user semantics because the
>> >> subsequent reads may not be able to see its own committed writes.
>> >
>> >
>> > Actually it is worse than that and this is why automatic attempted recovery is an absolute requirement.  If you cannot commit prepared, then you have a prepared statement that is stuck on the remote side.  This sets auto vacuum horizons and some other nastiness.  So we have to note, move on, and try to fix.
>>
>> Yeah, in my patch the background worker will keep trying to fix it if that occurs.
>
>
> The two things I would suggest is that rather than auto-detecting (if I understand your patch correctly) whether prepared transactions are possible on the other system, making it an option to the foreign server or foreign table. Otherwise one might enable prepared transactions for one set of operations on one database and have it automatically cause headaches in another context.

Yeah, currently it's an option for foreign servers. The patch adds a
new option "two_phase_commit" to postgres_fdw.

>
> The other thing I wasn't quite sure about on your patch was what happens if, say, someone trips over a power cord while the background worker is trying to commit things, whether the information is available on the initiating server when it comes back, whether a DBA has to go out and try to figure out what needs to be committed remotely, and how this would be done.  If you could explain that process, that would be helpful to me.
>
> (In my approach these would be recorded on the master and an SQL function could re-initiate the background worker.)

My approach is almost the same as yours. In detail: in the pre-commit
phase we write a WAL record for each participant server before
preparing the transactions on the remote sides. The WAL record contains
information about the participating foreign servers (foreign server
OID, database OID, etc.) and the global transaction identifier. Even if
the plug is pulled while we are trying to commit, we can recover the
incomplete global transactions and their participant information from
WAL. After recovery, users need to execute the SQL function to resolve
the incomplete global transactions. The function can determine whether
each remote transaction should be committed or rolled back by checking
the CLOG. Does my answer make sense?
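
As a rough illustration of the recovery step described above (a sketch
only, not code from the patch): the resolver function decides the fate
of each remote prepared transaction from the status of the local
transaction. The names Participant, LocalStatus and resolve_in_doubt are
hypothetical, and a plain dict stands in for the CLOG lookup.

```python
from dataclasses import dataclass
from enum import Enum


class LocalStatus(Enum):
    COMMITTED = "committed"
    ABORTED = "aborted"
    IN_PROGRESS = "in_progress"


@dataclass(frozen=True)
class Participant:
    # Fields mirror what the mail says is WAL-logged; the names are hypothetical.
    server_oid: int
    database_oid: int
    gid: str         # global transaction identifier
    local_xid: int   # local transaction that drove this global transaction


def resolve_in_doubt(participants, clog):
    """Return (participant, action) pairs for participants that can be resolved.

    `clog` is a dict standing in for the commit log: local xid -> LocalStatus.
    """
    actions = []
    for p in participants:
        status = clog[p.local_xid]
        if status is LocalStatus.COMMITTED:
            actions.append((p, "COMMIT PREPARED"))
        elif status is LocalStatus.ABORTED:
            actions.append((p, "ROLLBACK PREPARED"))
        # IN_PROGRESS: the local outcome is not decided yet, so leave it alone.
    return actions
```

The point of the sketch is that no extra per-participant outcome needs to
be stored: once the participant list survives a crash, the local commit
status is enough to pick the action.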

>>
>>
>> >
>> > Moreover since COMMIT PREPARED occurs during the commit hook, not the precommit hook, it is too late to roll back the local transaction.  We cannot raise errors since this causes a conflict in the commit status of the local transaction. So when we commit the local transaction we commit to committing all prepared transactions as soon as possible. Note some changes need to be made to make this usable in the FDW context, so what I am hoping is that the dialog helps impact the discussion and options going forward.
>> >
>> >>
>> >> Also
>> >> since we don't want to wait for COMMIT PREPARED to complete we need to
>> >> consider that users could cancel the query anytime. To not break the
>> >> current semantics we cannot raise error during 2nd phase of two-phase
>> >> commit but it's not realistic because even the palloc() can raise an
>> >> error.
>> >
>> >
>> > We don't palloc.  All memory used here is on the stack.  I do allow for dramatic precondition checks to cause errors but those should never happen absent some other programmer doing something dramatically unsafe anyway.  For example if you try to double-commit a transaction set.....
>>
>> Sorry, palloc() is just an example. I'm not sure all FDWs can
>> implement all callbacks for two-phase commit without codes that could
>> emit errors.
>
>
> Yeah, but if you are in the commit hook and someone emits an error, that's wrong because that then tries to rollback an already committed transaction and the backend rightfully panics.  In fact I should probably strip out the precondition check errors there and issue a warning.  It might sometimes happen when something goes seriously wrong on a system level, but....

In my patch, since the commit hook is performed by the background
worker and not by the backends, it's no problem if someone emits an
error in the commit hook. After the backend has prepared the
transactions on all remote sides, it enqueues itself on the wait queue.
The background worker picks up the global transaction waiting to be
completed and commits the prepared transactions on all remote sides.
After completing the global transaction, the background worker dequeues
it.
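
The hand-off described above can be modelled roughly as follows. This is
a toy sketch with threads standing in for the backend and the background
worker, not the patch's shared-memory implementation. GlobalTransaction,
resolver_loop and the commit_prepared callable are hypothetical names.

```python
import queue
import threading


class GlobalTransaction:
    def __init__(self, gid, servers):
        self.gid = gid
        self.servers = servers
        self.done = threading.Event()  # set once every server is resolved
        self.failed_attempts = 0


def resolver_loop(wait_queue, commit_prepared):
    """Background worker: take waiting global transactions and finish them.

    `commit_prepared(server, gid)` is supplied by the caller and may raise.
    """
    while True:
        gxact = wait_queue.get()
        if gxact is None:  # sentinel: shut the worker down
            break
        remaining = list(gxact.servers)
        while remaining:
            try:
                commit_prepared(remaining[0], gxact.gid)
                remaining.pop(0)
            except Exception:
                # The error stays inside the worker; the waiting backend
                # never sees it. A real worker would back off before retrying.
                gxact.failed_attempts += 1
        gxact.done.set()  # wake the backend waiting on this transaction
```

A backend in this model would put its GlobalTransaction on the queue
after PREPARE and then block on `done.wait()`, which corresponds to the
client waiting until the 2nd phase completes or the query is cancelled.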

>>
>>
>> >
>> > There is a possibility of system errors if one can no longer write to the file log but at this point as long as we have logged the phase change to commit we are able to recover later.
>> >
>> > So in the event where one sees an error here one continues on to the next transaction in the global transaction set and tries to commit it, etc. until it runs through the entire set of prepared transactions.  Then if there were any errors it fires off a background worker which re-reads the log file and goes out to the various foreign servers, checks to see if there is a prepared transaction, and if so commits it.  If the transaction set state was in rollback, it tries to roll it back instead.  If this errors, it sleeps for a second and then loops through those which errored and retries until all are complete.
>>
>> Yeah, the patch has the similar functionality.
>>
>> > The other thing is we record whether we are committing or rolling back the transaction when we hit the commit or rollback hook.   This is critical because we can imagine a world where the Oracle FDW supports similar semantics.  In that case everything works and is not ordering dependent.  I.e. we can prepare our transactions.  Oracle can try and fail, and rollback, and we rollback all the transactions everywhere.  And all we have to know was we got to the precommit hook and then we rolled back.
>>
>> In my patch the global transaction manager manages each status of
>> foreign servers participating in global transactions with WAL logging.
>> The fate of transaction on foreign server will be determined according
>> to the state of local transaction and their status. WAL logging is
>> important because not only in term of speedup but also supporting
>> streaming replication.
>
>
> So you are optimizing for large numbers of prepared transactions or at a high rate?

I haven't done much optimization in the patch, as this is the first
implementation. Once the basic feature is committed I will do that.

>
> Also does the background worker get fired again on recovery as needed?

No. I added a new SQL function to fix global transactions. We need to
execute that function manually after recovery.

>>
>>
>> >>
>> >> The design the patch chose is making backends do only PREPARE and wait
>> >> for the background worker to complete COMMIT PREPARED. In this design
>> >> the clients get a "committed" only either when successful in commit on
>> >> all participants or when they cancel the query explicitly. In other
>> >> words, the client will wait for completion of 2nd phase of two-phase
>> >> commit forever unless it cancels.
>> >
>> >
>> > In this approach we make a best effort to commit or rollback (as appropriate in the state of the global transaction) *all* remote transactions during global commit or global rollback.  It is not guaranteed but it avoids breaking semantics as much as we can.  Also the background worker here does not need to attach to shared memory since the log has everything required.   COMMIT PREPARED ought to be a fast operation unless there are network problems but those can affect prepare as well.
>> >
>> > Also imagine a case where you are writing to three dbs.  One is on Oracle, one on DB2, and one on PostgreSQL.  You successfully prepare your transaction.  DB2 successfully prepares, and then the Oracle db errors for some reason (maybe a deferred constraint).  Does the background worker have enough information to know to roll back your transaction on the remote side?
>>
>> I think that what the background worker needs to know to rollback
>> remote transactions is how to rollback and what to rollback. How to
>> rollback is defined in each FDW.
>
>
> Agreed.  And naturally same with commits.
>
> My assumption is that each foreign data wrapper would have to set its own precommit/commit hook callbacks.  I think your patch extends the fdw structure to try to ensure these are done automatically?

Yes. The patch adds new FDW APIs for atomic commit, such as prepare,
commit, rollback, and resolve (the 2nd phase of 2PC). FDW developers
who want their FDW to support atomic commit need to define these APIs
and call the registration function when the transaction starts. If the
FDW of a registered foreign server doesn't support the atomic commit
API, the transaction emits an error.
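
To make the registration rule concrete, here is a minimal model of it.
It is a sketch under assumptions, not the patch's C API: the four
callback names come from the description above, while
TransactionParticipants, AtomicCommitUnsupported and the duck-typed
check are hypothetical.

```python
REQUIRED_CALLBACKS = ("prepare", "commit", "rollback", "resolve")


class AtomicCommitUnsupported(Exception):
    """Raised when a registered server's FDW lacks an atomic commit callback."""


class TransactionParticipants:
    """Tracks the foreign servers taking part in the current transaction."""

    def __init__(self):
        self.servers = {}

    def register(self, server_name, fdw):
        # Reject the server up front if any required callback is missing,
        # so the transaction errors out before any remote PREPARE is sent.
        missing = [
            name for name in REQUIRED_CALLBACKS
            if not callable(getattr(fdw, name, None))
        ]
        if missing:
            raise AtomicCommitUnsupported(
                f"{server_name}: missing callbacks {missing}"
            )
        self.servers[server_name] = fdw
```

Checking at registration time rather than at commit time means a
transaction touching an unsupported server fails while it can still be
rolled back normally.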

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

