Re: Transactions involving multiple postgres foreign servers - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Transactions involving multiple postgres foreign servers
Date
Msg-id CAFjFpRdFzqxH4p5du1yQQ_JAVkPVN=y1Gf+iBodUgLD6TevtnQ@mail.gmail.com
Whole thread Raw
In response to Re: Transactions involving multiple postgres foreign servers  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
Added to 2015-06 commitfest to attract some reviews and comments.

On Tue, Feb 17, 2015 at 2:56 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
Hi All,

Here are the steps and infrastructure for achieving atomic commits across multiple foreign servers. I have tried to address most of the concerns raised in this mail thread before. Let me know, if I have left something. Attached is a WIP patch implementing the same for postgres_fdw. I have tried to make it FDW-independent.

A. Steps during transaction processing
------------------------------------------------

1. When an FDW connects to a foreign server and starts a transaction, it registers that server with a boolean flag indicating whether that server is capable of participating in a two phase commit. In the patch this is implemented using function RegisterXactForeignServer(), which raises an error, thus aborting the transaction, if there is at least one foreign server incapable of 2PC in a multiserver transaction. This error thrown as early as possible. If all the foreign servers involved in the transaction are capable of 2PC, the function just updates the information. As of now, in the patch the function is in the form of a stub.

Whether a foreign server is capable of 2PC, can be
a. FDW level decision e.g. file_fdw as of now, is incapable of 2PC but it can build the capabilities which can be used for all the servers using file_fdw
b. a decision based on server version type etc. thus FDW can decide that by looking at the server properties for each server
c. a user decision where the FDW can allow a user to specify it in the form of CREATE/ALTER SERVER option. Implemented in the patch.

For a transaction involving only a single foreign server, the current code remains unaltered as two phase commit is not needed. Rest of the discussion pertains to a transaction involving more than one foreign servers.
At the commit or abort time, the FDW receives call backs with the appropriate events. FDW then takes following actions on each event.

2. On XACT_EVENT_PRE_COMMIT event, the FDW coins one prepared transaction id per foreign server involved and saves it along with xid, dbid, foreign server id and user mapping and foreign transaction status = PREPARING in-memory. The prepared transaction id can be anything represented as byte string. Same information is flushed to the disk to survive crashes. This is implemented in the patch as prepare_foreign_xact(). Persistent and in-memory storages and their usages are discussed later in the mail. FDW then prepares the transaction on the foreign server. If this step is successful, the foreign transaction status is changed to PREPARED. If the step is unsuccessful, the local transaction is aborted and each FDW will receive XACT_EVENT_ABORT (discussed later). The updates to the foreign transaction status need not be flushed to the disk, as they can be inferred from the status of local transaction.

3. If the local transaction is committed, the FDW callback will get XACT_EVENT_COMMIT event. Foreign transaction status is changed to COMMITTING. FDW tries to commit the foreign transaction with the prepared transaction id. If the commit is successful, the foreign transaction entry is removed. If the commit is unsuccessful because of local/foreign server crash or network failure, the foreign prepared transaction resolver takes care of the committing it at later point of time.

4. If the local transaction is aborted, the FDW callback will get XACT_EVENT_ABORT event. At this point, the FDW may or may not have prepared a transaction on foreign server as per step 1 above. If it has not prepared the transaction, it simply aborts the transaction on foreign server; a server crash or network failure doesn't alter the ultimate result in this case. If FDW has prepared the foreign transaction, it updates the foreign transaction status as ABORTING and tries to rollback the prepared transaction. If the rollback is successful, the foreign transaction entry is removed. If the rollback is not successful, the foreign prepared transaction resolver takes care of aborting it at later point of time.

B. Foreign prepared transaction resolver
---------------------------------------------------
In the patch this is implemented as a built-in function pg_fdw_resolve(). Ideally the functionality should be run by a background worker process frequently.

The resolver looks at each entry and invokes the FDW routine to resolve the transaction. The FDW routine returns boolean status: true if the prepared transaction was resolved (committed/aborted), false otherwise.
The resolution is as follows -
1. If foreign transaction status is COMMITTING or ABORTING, commits or aborts the prepared transaction resp through the FDW routine. If the transaction is successfully resolved, it removes the foreign transaction entry.
2. Else, it checks if the local transaction was committed or aborted, it update the foreign transaction status accordingly and takes the action according to above step 1.
3. The resolver doesn't touch entries created by in-progress local transactions.

If server/backend crashes after it has registered the foreign transaction entry (during step A.1), we will be left with a prepared transaction id, which was never prepared on the foreign server. Similarly the server/backend crashes after it has resolved the foreign prepared transaction but before removing the entry, same situation can arise. FDW should detect these situations, when foreign server complains about non-existing prepared transaction ids and consider such foreign transactions as resolved.

After looking at all the entries the resolver flushes the entries to the disk, so as to retain the latest status across shutdown and crash.

C. Other methods and infrastructure
------------------------------------------------
1. Method to show the current foreign transaction entries (in progress or waiting to be resolved). Implemented as function pg_fdw_xact() in the patch.
2. Method to drop foreign transaction entries in case they are resolved by user/DBA themselves. Not implemented in the patch.
3. Method to prevent altering or dropping foreign server and user mapping used to prepare the foreign transaction till the later gets resolved. Not implemented in the patch. While altering or dropping the foreign server or user mapping, that portion of the code needs to check if there exists an foreign transaction entry depending upon the foreign server or user mapping and should error out.
4. The information about the xid needs to be available till it is decided whether to commit or abort the foreign transaction and that decision is persisted. That should put some constraint on the xid wraparound or oldest active transaction. Not implemented in the patch.
5. Method to propagate the foreign transaction information to the slave.

D. Persistent and in-memory storage considerations
--------------------------------------------------------------------
I considered following options for persistent storage
1. in-memory table and file(s) - The foreign transaction entries are saved and manipulated in shared memory. They are written to file whenever persistence is necessary e.g. while registering the foreign transaction in step A.2. Requirements C.1, C.2 need some SQL interface in the form of built-in functions or SQL commands.

The patch implements the in-memory foreign transaction table as a fixed size array of foreign transaction entries (similar to prepared transaction entries in twophase.c). This puts a restriction on number of foreign prepared transactions that need to be maintained at a time. We need separate locks to syncronize the access to the shared memory; the patch uses only a single LW lock. There is restriction on the length of prepared transaction id (or prepared transaction information saved by FDW to be general), since everything is being saved in fixed size memory. We may be able to overcome that restriction by writing this information to separate files (one file per foreign prepared transaction). We need to take the same route as 2PC for C.5.

2. New catalog - This method takes out the need to have separate method for C1, C5 and even C2, also the synchronization will be taken care of by row locks, there will be no limit on the number of foreign transactions as well as the size of foreign prepared transaction information. But big problem with this approach is that, the changes to the catalogs are atomic with the local transaction. If a foreign prepared transaction can not be aborted while the local transaction is rolled back, that entry needs to retained. But since the local transaction is aborting the corresponding catalog entry would become invisible and thus unavailable to the resolver (alas! we do not have autonomous transaction support). We may be able to overcome this, by simulating autonomous transaction through a background worker (which can also act as a resolver). But the amount of communication and synchronization, might affect the performance.

A mixed approach where the backend shifts the entries from storage in approach 1 to catalog, thus lifting the constraints on size is possible, but is very complicated.

Any other ideas to use catalog table as the persistent storage here? Does anybody think, catalog table is a viable option?

3. WAL records - Since the algorithm follows "write ahead of action", WAL seems to be a possible way to persist the foreign transaction entries. But WAL records can not be used for repeated scan as is required by the foreign transaction resolver. Also, replaying WALs is controlled by checkpoint, so not all WALs are replayed. If a checkpoint happens after a foreign prepared transaction remains resolved, corresponding WALs will never be replayed, thus causing the foreign prepared transaction to remain unresolved forever without a clue. So, WALs alone don't seem to be a fit here.

The algorithms rely on the FDWs to take right steps to the large extent, rather than controlling each step explicitly. It expects the FDWs to take the right steps for each event and call the right functions to manipulate foreign transaction entries. It does not ensure the correctness of these steps, by say examining the foreign transaction entries in response to each event or by making the callback return the information and manipulate the entries within the core. I am willing to go the stricter but more intrusive route if the others also think that way. Otherwise, the current approach is less intrusive and I am fine with that too.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

pgsql-hackers by date:

Previous
From: Sreerama Manoj
Date:
Subject: How to create virtual indexes on postgres
Next
From: Andres Freund
Date:
Subject: Re: Partitioning WIP patch