Thread: Transactions involving multiple postgres foreign servers

Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

01 January 2015, 10:47:53

Hi All,

While looking at the patch for supporting inheritance on foreign tables, I noticed that if a transaction makes changes to more than one foreign server, the current implementation in postgres_fdw doesn't ensure that either all of them roll back or all of them commit their changes. In other words, some of them may commit their changes while others roll back theirs.

PFA a patch which uses two-phase commit (2PC) to solve this problem. In pgfdw_xact_callback(), at the XACT_EVENT_PRE_COMMIT event, it prepares the transaction on all the foreign PostgreSQL servers, and at the XACT_EVENT_COMMIT or XACT_EVENT_ABORT event it commits or aborts those prepared transactions, respectively.

The logic to craft the prepared transaction IDs is rudimentary, and I am open to suggestions on it. I have the following goals in mind for crafting the transaction IDs:

1. Minimize the chances of crafting a transaction id which would conflict with a concurrent transaction id on that foreign server.

2. Because of a limitation described later, DBA/user should be able to identify the server which originated a remote transaction.

More can be found in comments above function pgfdw_get_prep_xact_id() in the patch.
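
To make the two goals concrete, here is a minimal sketch of one way such an identifier could be assembled. It is not the scheme used by pgfdw_get_prep_xact_id() in the patch, which this thread does not show. The `px` prefix, the field order, and the names `make_gid` and `origin_of` are all hypothetical. The only PostgreSQL fact relied on is that a prepared transaction identifier must be shorter than 200 bytes.

```python
import re

MAX_GID_BYTES = 200  # PostgreSQL requires the identifier to be < 200 bytes


def make_gid(origin, local_xid, server_oid, backend_pid):
    """Build a prepared-transaction identifier (hypothetical format).

    Goal 1: local XID + backend PID + foreign server OID make a clash with
    a concurrent transaction unlikely.
    Goal 2: the origin name lets a DBA see which server started it.
    """
    # Keep the field separator out of the origin name so parsing stays simple.
    if not re.fullmatch(r"[A-Za-z0-9.-]+", origin):
        raise ValueError("origin may contain only letters, digits, '.' and '-'")
    gid = f"px_{origin}_{local_xid}_{server_oid}_{backend_pid}"
    if len(gid.encode("utf-8")) >= MAX_GID_BYTES:
        raise ValueError("identifier too long for a prepared transaction")
    return gid


def origin_of(gid):
    """Recover the originating server name from an identifier."""
    parts = gid.split("_")
    if len(parts) != 5 or parts[0] != "px":
        raise ValueError("not an identifier produced by make_gid")
    return parts[1]
```

A DBA looking at a dangling prepared transaction on a foreign server could then map its identifier back to the originating server with `origin_of`.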

Limitations
---------------

1. After a transaction has been prepared on a foreign server, if the connection to that server is lost before the transaction is rolled back or committed there, the transaction remains in the prepared state forever. Manual intervention would be needed to clean up such a transaction (hence goal 2 above). Automating this cleanup would require significant changes to the transaction manager, so I have left it out of this patch for now. If required, I can work on that part in this patch itself.

2. 2PC is needed only when more than one foreign server is involved in a transaction. Transactions on a single foreign server are handled well right now. So, ideally, the code should detect whether more than one foreign server is involved in the transaction and only then use 2PC. But I couldn't find a way to detect that without changing the transaction manager.

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

pg_fdw_transact.patch

Re: Transactions involving multiple postgres foreign servers

From

Tom Lane

Date:

02 January 2015, 20:46:10

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
> While looking at the patch for supporting inheritance on foreign tables, I
> noticed that if a transaction makes changes to more than one foreign
> server, the current implementation in postgres_fdw doesn't make sure that
> either all of them rollback or all of them commit their changes, IOW there
> is a possibility that some of them commit their changes while others
> rollback theirs.

> PFA patch which uses 2PC to solve this problem. In pgfdw_xact_callback() at
> XACT_EVENT_PRE_COMMIT event, it prepares the transaction at all the
> foreign postgresql servers and at XACT_EVENT_COMMIT or XACT_EVENT_ABORT
> event it commits or aborts those transactions resp.

TBH, I think this is a pretty awful idea.

In the first place, this does little to improve the actual reliability
of a commit occurring across multiple foreign servers; and in the second
place it creates a bunch of brand new failure modes, many of which would
require manual DBA cleanup.

The core of the problem is that this doesn't have anything to do with
2PC as it's commonly understood: for that, you need a genuine external
transaction manager that is aware of all the servers involved in a
transaction, and has its own persistent state (or at least a way to
reconstruct its own state by examining the per-server states).
This patch is not that; in particular it treats the local transaction
asymmetrically from the remote ones, which doesn't seem like a great
idea --- ie, the local transaction could still abort after committing
all the remote ones, leaving you no better off in terms of cross-server
consistency.

As far as failure modes go, one basic reason why this cannot work as
presented is that the remote servers may not even have prepared
transaction support enabled (in fact max_prepared_transactions = 0
is the default in all supported PG versions).  So this would absolutely
have to be a not-on-by-default option.  But the bigger issue is that
leaving it to the DBA to clean up after failures is not a production
grade solution, *especially* not for prepared transactions, which are
performance killers if not closed out promptly.  So I can't imagine
anyone wanting to turn this on without a more robust answer than that.

Basically I think what you'd need for this to be a credible patch would be
for it to work by changing the behavior only in the PREPARE TRANSACTION
path: rather than punting as we do now, prepare the remote transactions,
and report their server identities and gids to an external transaction
manager, which would then be responsible for issuing the actual commits
(along with the actual commit of the local transaction).  I have no idea
whether it's feasible to do that without having to assume a particular
2PC transaction manager API/implementation.

It'd be interesting to hear from people who are using 2PC in production
to find out if this would solve any real-world problems for them, and
what the details of the TM interface would need to look like to make it
work in practice.

In short, you can't force 2PC technology on people who aren't using it
already; while for those who are using it already, this isn't nearly
good enough as-is.
        regards, tom lane

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

05 January 2015, 18:25:28

On Fri, Jan 2, 2015 at 3:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In short, you can't force 2PC technology on people who aren't using it
> already; while for those who are using it already, this isn't nearly
> good enough as-is.

I was involved in some internal discussions related to this patch, so
I have some opinions on it.  The long-term, high-level goal here is to
facilitate sharding.  If we've got a bunch of PostgreSQL servers
interlinked via postgres_fdw, it should be possible to perform
transactions on the cluster in such a way that transactions are just
as atomic, consistent, isolated, and durable as they would be with
just one server.  As far as I know, there is no way to achieve this
goal through the use of an external transaction manager, because even
if that external transaction manager guarantees, for every
transaction, that the transaction either commits on all nodes or rolls
back on all nodes, there's no way for it to guarantee that other
transactions won't see some intermediate state where the commit has
been completed on some nodes but not others.  To get that, you need
some sort of integration that reaches down to the way snapshots are taken.
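
A toy illustration of that gap, using two in-memory "nodes" as stand-ins for servers. Both commits eventually succeed, yet a reader that looks between them sees a state that no single-server transaction would expose. The class and function names are made up for this sketch and do not model real snapshots.

```python
class Node:
    """One server holding a single account balance."""

    def __init__(self, balance):
        self.committed = balance  # what other transactions can see
        self.prepared = None      # pending value of a prepared transaction

    def prepare(self, new_balance):
        self.prepared = new_balance

    def commit_prepared(self):
        self.committed, self.prepared = self.prepared, None


def transfer_with_observer(a, b, amount, observe):
    """Move amount from a to b with 2PC; call observe() between the commits."""
    a.prepare(a.committed - amount)
    b.prepare(b.committed + amount)
    a.commit_prepared()
    seen = observe()  # a concurrent reader running at the worst moment
    b.commit_prepared()
    return seen


a, b = Node(100), Node(0)


def total():
    return a.committed + b.committed


seen_midway = transfer_with_observer(a, b, 30, total)
```

Here `seen_midway` is 70 even though the total is 100 both before and after the transfer: the commit is atomic in outcome, but not in visibility.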

I think, though, that it might be worthwhile to first solve the
simpler problem of figuring out how to ensure that a transaction
commits everywhere or rolls back everywhere, even if intermediate
states might still be transiently visible.   I don't think this patch,
as currently designed, is equal to that challenge, because
XACT_EVENT_PRE_COMMIT fires before the transaction is certain to
commit - PreCommit_CheckForSerializationFailure or PreCommit_Notify
could still error out.  We could have a hook that fires after that,
but that doesn't solve the problem if a user of that hook can itself
throw an error.  Even if part of the API contract is that it's not
allowed to do so, the actual attempt to commit the change on the
remote side can fail due to - e.g. - a network interruption, and
that's got to be dealt with somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Tom Lane

Date:

05 January 2015, 19:47:30

Robert Haas <robertmhaas@gmail.com> writes:
> I was involved in some internal discussions related to this patch, so
> I have some opinions on it.  The long-term, high-level goal here is to
> facilitate sharding.  If we've got a bunch of PostgreSQL servers
> interlinked via postgres_fdw, it should be possible to perform
> transactions on the cluster in such a way that transactions are just
> as atomic, consistent, isolated, and durable as they would be with
> just one server.  As far as I know, there is no way to achieve this
> goal through the use of an external transaction manager, because even
> if that external transaction manager guarantees, for every
> transaction, that the transaction either commits on all nodes or rolls
> back on all nodes, there's no way for it to guarantee that other
> transactions won't see some intermediate state where the commit has
> been completed on some nodes but not others.  To get that, you need
> some sort of integration that reaches down to the way snapshots are taken.

That's a laudable goal, but I would bet that nothing built on the FDW
infrastructure will ever get there.  Certainly the proposed patch doesn't
look like it moves us very far towards that set of goalposts.

> I think, though, that it might be worthwhile to first solve the
> simpler problem of figuring out how to ensure that a transaction
> commits everywhere or rolls back everywhere, even if intermediate
> states might still be transiently visible.

Perhaps.  I suspect that it might still be a dead end if the ultimate
goal is cross-system atomic commit ... but likely it would teach us
some useful things anyway.
        regards, tom lane

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

05 January 2015, 20:02:27

On Mon, Jan 5, 2015 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> That's a laudable goal, but I would bet that nothing built on the FDW
> infrastructure will ever get there.

Why?

It would be surprising to me if, given that we have gone to some pains
to create a system that allows cross-system queries, and hopefully
eventually pushdown of quals, joins, and aggregates, we then made
sharding work in some completely different way that reuses none of
that infrastructure.  But maybe I am looking at this the wrong way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Tom Lane

Date:

05 January 2015, 20:23:14

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jan 5, 2015 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That's a laudable goal, but I would bet that nothing built on the FDW
>> infrastructure will ever get there.

> Why?

> It would be surprising to me if, given that we have gone to some pains
> to create a system that allows cross-system queries, and hopefully
> eventually pushdown of quals, joins, and aggregates, we then made
> sharding work in some completely different way that reuses none of
> that infrastructure.  But maybe I am looking at this the wrong way.

Well, we intentionally didn't couple the FDW stuff closely into
transaction commit, because of the thought that the "far end" would not
necessarily have Postgres-like transactional behavior, and even if it did
there would be about zero chance of having atomic commit with a
non-Postgres remote server.  postgres_fdw is a seriously bad starting
point as far as that goes, because it encourages one to make assumptions
that can't possibly work for any other wrapper.

I think the idea I sketched upthread of supporting an external transaction
manager might be worth pursuing, in that it would potentially lead to
having at least an approximation of atomic commit across heterogeneous
servers.

Independently of that, I think what you are talking about would be better
addressed outside the constraints of the FDW mechanism.  That's not to say
that we couldn't possibly make postgres_fdw use some additional non-FDW
infrastructure to manage commits; just that solving this in terms of the
FDW infrastructure seems wrongheaded to me.
        regards, tom lane

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

06 January 2015, 18:25:31

On Mon, Jan 5, 2015 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, we intentionally didn't couple the FDW stuff closely into
> transaction commit, because of the thought that the "far end" would not
> necessarily have Postgres-like transactional behavior, and even if it did
> there would be about zero chance of having atomic commit with a
> non-Postgres remote server.  postgres_fdw is a seriously bad starting
> point as far as that goes, because it encourages one to make assumptions
> that can't possibly work for any other wrapper.

Atomic commit is something that can potentially be supported by many
different FDWs, as long as the thing on the other end supports 2PC.
If you're talking to Oracle or DB2 or SQL Server, and it supports 2PC,
then you can PREPARE the transaction and then go back and COMMIT the
transaction once it's committed locally.  Getting a cluster-wide
*snapshot* is probably a PostgreSQL-only thing requiring much deeper
integration, but I think it would be sensible to leave that as a
future project and solve the simpler problem first.

> I think the idea I sketched upthread of supporting an external transaction
> manager might be worth pursuing, in that it would potentially lead to
> having at least an approximation of atomic commit across heterogeneous
> servers.

An important threshold question here is whether we want to rely on an
external transaction manager, or build one into PostgreSQL.  As far as
this particular project goes, there's nothing that can't be done
inside PostgreSQL.  You need a durable registry of which transactions
you prepared on which servers, and which XIDs they correlate to.  If
you have that, then you can use background workers or similar to go
retry commits or rollbacks of prepared transactions until it works,
even if there's been a local crash meanwhile.
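
A minimal sketch of that idea, under stated assumptions: the registry is a JSON file rewritten atomically, each entry already records the decision rather than a local XID to look up, and `send` is a hypothetical callable that delivers one SQL command to a named foreign server and raises ConnectionError on failure. COMMIT PREPARED and ROLLBACK PREPARED are the real PostgreSQL commands; everything else is illustrative.

```python
import json
import os


class Registry:
    """Durable list of prepared remote transactions awaiting resolution."""

    def __init__(self, path):
        self.path = path
        self.entries = []
        if os.path.exists(path):
            # After a local crash, pick up whatever was still unresolved.
            with open(path) as f:
                self.entries = json.load(f)

    def _flush(self):
        # Write to a temporary file, fsync, then rename over the old file.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.entries, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def record(self, server, gid, decision):
        """Remember that gid on server must be committed or rolled back."""
        self.entries.append({"server": server, "gid": gid, "decision": decision})
        self._flush()

    def resolve_pending(self, send):
        """Try every entry once; keep those whose server is unreachable."""
        remaining = []
        for entry in self.entries:
            verb = "COMMIT" if entry["decision"] == "commit" else "ROLLBACK"
            try:
                send(entry["server"], f"{verb} PREPARED '{entry['gid']}'")
            except ConnectionError:
                remaining.append(entry)
        self.entries = remaining
        self._flush()
        return len(remaining)
```

A background worker would call `resolve_pending` in a loop with a delay until it returns 0. A real implementation would also have to tolerate the case where the remote side has already resolved the transaction.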

Alternatively, you could rely on an external transaction manager to do
all that stuff.  I don't have a clear sense of what that would entail,
or how it might be better or worse than rolling our own.  I suspect,
though, that it might amount to little more than adding a middle man.
I mean, a third-party transaction manager isn't going to automatically
know how to commit a transaction prepared on some foreign server using
some foreign data wrapper.  It's going to have to be taught that if
postgres_fdw leaves a transaction in-medias-res on server OID 1234,
you've got to connect to the target machine using that foreign
server's connection parameters, speak libpq, and issue the appropriate
COMMIT PREPARED command.  And similarly, you're going to need to
arrange to notify it before preparing that transaction so that it
knows that it needs to request the COMMIT or ABORT later on.  Once
you've got all of that infrastructure in place, what are you
really gaining over just doing it in PostgreSQL (or, say, a contrib
module thereto)?

(I'm also concerned that an external transaction manager might need
the PostgreSQL client to be aware of it, whereas what we'd really like
here is for the client to just speak PostgreSQL and be happy that its
commits no longer end up half-done.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

07 January 2015, 07:16:07

On Mon, Jan 5, 2015 at 11:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> On Fri, Jan 2, 2015 at 3:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In short, you can't force 2PC technology on people who aren't using it
>> already; while for those who are using it already, this isn't nearly
>> good enough as-is.

> I was involved in some internal discussions related to this patch, so
> I have some opinions on it. The long-term, high-level goal here is to
> facilitate sharding. If we've got a bunch of PostgreSQL servers
> interlinked via postgres_fdw, it should be possible to perform
> transactions on the cluster in such a way that transactions are just
> as atomic, consistent, isolated, and durable as they would be with
> just one server. As far as I know, there is no way to achieve this
> goal through the use of an external transaction manager, because even
> if that external transaction manager guarantees, for every
> transaction, that the transaction either commits on all nodes or rolls
> back on all nodes, there's no way for it to guarantee that other
> transactions won't see some intermediate state where the commit has
> been completed on some nodes but not others. To get that, you need
> some sort of integration that reaches down to the way snapshots are taken.

> I think, though, that it might be worthwhile to first solve the
> simpler problem of figuring out how to ensure that a transaction
> commits everywhere or rolls back everywhere, even if intermediate
> states might still be transiently visible.

Agreed.

> I don't think this patch,
> as currently designed, is equal to that challenge, because
> XACT_EVENT_PRE_COMMIT fires before the transaction is certain to
> commit - PreCommit_CheckForSerializationFailure or PreCommit_Notify
> could still error out. We could have a hook that fires after that,
> but that doesn't solve the problem if a user of that hook can itself
> throw an error. Even if part of the API contract is that it's not
> allowed to do so, the actual attempt to commit the change on the
> remote side can fail due to - e.g. - a network interruption, and
> that's got to be dealt with somehow.

Tom mentioned:

> in particular it treats the local transaction
> asymmetrically from the remote ones, which doesn't seem like a great
> idea --- ie, the local transaction could still abort after committing
> all the remote ones, leaving you no better off in terms of cross-server
> consistency.

You have given a specific example of this case. So, let me dry-run through CommitTransaction() with my patch applied.

    CallXactCallbacks(XACT_EVENT_PRE_COMMIT);

While processing this event, postgres_fdw's callback pgfdw_xact_callback() sends PREPARE TRANSACTION to all the foreign servers involved. Each server reports success or failure. If even one of them fails, the local transaction is aborted along with all the prepared transactions. Only if all the foreign servers succeed do we proceed further.

    PreCommit_CheckForSerializationFailure();

    PreCommit_Notify();

If any of these functions throws an error (as you mentioned above), the local transaction is aborted, and so are the remote prepared transactions. Note that at this point we have committed neither the local transaction (that happens below) nor the remote transactions, which are in the PREPAREd state. Since all the transactions, local as well as remote, are aborted in case of an error, the data is still consistent. If these steps succeed, we proceed.

    /* Prevent cancel/die interrupt while cleaning up */
    HOLD_INTERRUPTS();

    /* Commit updates to the relation map --- do this as late as possible */
    AtEOXact_RelationMap(true);

    /*
     * set the current transaction state information appropriately during
     * commit processing
     */
    s->state = TRANS_COMMIT;

    /*
     * Here is where we really truly commit.
     */
    latestXid = RecordTransactionCommit();

    TRACE_POSTGRESQL_TRANSACTION_COMMIT(MyProc->lxid);

    /*
     * Let others know about no transaction in progress by me. Note that this
     * must be done _before_ releasing locks we hold and _after_
     * RecordTransactionCommit.
     */
    ProcArrayEndTransaction(MyProc, latestXid);

At this point the local transaction is committed, while the remote transactions are still in the PREPAREd state. If any server (including the local one) crashes or a link fails here, the remote transactions are left dangling in the PREPAREd state, and manual cleanup will be required.

    CallXactCallbacks(XACT_EVENT_COMMIT);

The postgres_fdw callback pgfdw_xact_callback() commits the PREPAREd transactions by sending COMMIT PREPARED to each remote server (my patch). So I don't see why my patch would cause inconsistencies. It can cause dangling PREPAREd transactions, and I have already acknowledged that fact.
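
The walk-through above can be condensed into a toy model. This is only a sketch of the ordering described in this message, not the patch itself: `Remote`, `commit_transaction`, and the `fail_at` step names are invented for illustration, and real failures can of course strike at many more points.

```python
class Remote:
    """Stand-in for one foreign server's view of the transaction."""

    def __init__(self, name):
        self.name = name
        self.state = "active"  # active -> prepared -> committed | aborted


def commit_transaction(remotes, fail_at=None):
    """Walk the commit sequence; fail_at injects a failure at a named step.

    Returns the outcome of the local transaction.
    """
    # XACT_EVENT_PRE_COMMIT: PREPARE TRANSACTION on every foreign server.
    for r in remotes:
        r.state = "prepared"

    # PreCommit_CheckForSerializationFailure() / PreCommit_Notify() may error.
    if fail_at == "pre_commit_checks":
        # XACT_EVENT_ABORT: roll back everything that was prepared.
        for r in remotes:
            r.state = "aborted"
        return "aborted"

    # RecordTransactionCommit(): the local transaction is now committed.
    if fail_at == "after_local_commit":
        # Crash or link failure: nothing resolves the prepared transactions.
        return "committed"

    # XACT_EVENT_COMMIT: COMMIT PREPARED on every foreign server.
    for r in remotes:
        r.state = "committed"
    return "committed"


def run(fail_at=None):
    """Run one transaction over two servers and report the final states."""
    remotes = [Remote("s1"), Remote("s2")]
    return commit_transaction(remotes, fail_at), [r.state for r in remotes]
```

Calling `run("after_local_commit")` shows the acknowledged weak spot: the local side reports a commit while both remote sides are still merely prepared.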

Am I missing something?

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

07 January 2015, 09:25:03

On Sat, Jan 3, 2015 at 2:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
>> While looking at the patch for supporting inheritance on foreign tables, I
>> noticed that if a transaction makes changes to more than one foreign
>> server, the current implementation in postgres_fdw doesn't make sure that
>> either all of them rollback or all of them commit their changes, IOW there
>> is a possibility that some of them commit their changes while others
>> rollback theirs.

>> PFA patch which uses 2PC to solve this problem. In pgfdw_xact_callback() at
>> XACT_EVENT_PRE_COMMIT event, it prepares the transaction at all the
>> foreign postgresql servers and at XACT_EVENT_COMMIT or XACT_EVENT_ABORT
>> event it commits or aborts those transactions resp.

> TBH, I think this is a pretty awful idea.

> In the first place, this does little to improve the actual reliability
> of a commit occurring across multiple foreign servers; and in the second
> place it creates a bunch of brand new failure modes, many of which would
> require manual DBA cleanup.

> The core of the problem is that this doesn't have anything to do with
> 2PC as it's commonly understood: for that, you need a genuine external
> transaction manager that is aware of all the servers involved in a
> transaction, and has its own persistent state (or at least a way to
> reconstruct its own state by examining the per-server states).
> This patch is not that; in particular it treats the local transaction
> asymmetrically from the remote ones, which doesn't seem like a great
> idea --- ie, the local transaction could still abort after committing
> all the remote ones, leaving you no better off in terms of cross-server
> consistency.

> As far as failure modes go, one basic reason why this cannot work as
> presented is that the remote servers may not even have prepared
> transaction support enabled (in fact max_prepared_transactions = 0
> is the default in all supported PG versions). So this would absolutely
> have to be a not-on-by-default option.

Agreed. We can have a per-foreign-server option that says whether the corresponding server can participate in 2PC. A transaction spanning multiple foreign servers, at least one of which is not capable of participating in 2PC, will need to be aborted.
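
As a rough sketch of how such a check might look: the option name `two_phase_commit`, its `on`/`off` values, and the function name below are all hypothetical, and postgres_fdw is not claimed to offer such an option. Server options are modelled as plain dictionaries.

```python
def check_two_phase_capable(servers):
    """Raise if a multi-server transaction includes a server not flagged for 2PC.

    servers maps a foreign server name to its options; the
    "two_phase_commit" option name is hypothetical.
    """
    if len(servers) < 2:
        return  # a single foreign server does not need 2PC
    incapable = sorted(
        name
        for name, opts in servers.items()
        if opts.get("two_phase_commit", "off") != "on"
    )
    if incapable:
        raise RuntimeError(
            "cannot commit atomically; servers without 2PC: " + ", ".join(incapable)
        )
```

Defaulting the option to `off` matches the point that this would have to be a not-on-by-default feature.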

> But the bigger issue is that
> leaving it to the DBA to clean up after failures is not a production
> grade solution, *especially* not for prepared transactions, which are
> performance killers if not closed out promptly. So I can't imagine
> anyone wanting to turn this on without a more robust answer than that.

I purposefully left that outside this patch, since it involves significant changes in core. If that's necessary for the first cut, I will work on it.

> Basically I think what you'd need for this to be a credible patch would be
> for it to work by changing the behavior only in the PREPARE TRANSACTION
> path: rather than punting as we do now, prepare the remote transactions,
> and report their server identities and gids to an external transaction
> manager, which would then be responsible for issuing the actual commits
> (along with the actual commit of the local transaction). I have no idea
> whether it's feasible to do that without having to assume a particular
> 2PC transaction manager API/implementation.

I doubt a TM would expect a bunch of GIDs in response to a PREPARE TRANSACTION command. Per X/Open, xa_prepare() returns an integer value specifying whether the PREPARE succeeded or not, along with some piggybacked statuses.

In the context of foreign tables in an inheritance tree, a single DML statement can span multiple foreign servers. All such DML statements would then need to be handled by an external TM. An external TM or application may not know exactly which foreign servers are going to be affected by a DML statement. Users may not want to set up an external TM in such cases. Instead, they would expect PostgreSQL to manage such DML statements and transactions all by itself.

As Robert has suggested in his responses, it would be better to enable PostgreSQL to manage distributed transactions itself.

> It'd be interesting to hear from people who are using 2PC in production
> to find out if this would solve any real-world problems for them, and
> what the details of the TM interface would need to look like to make it
> work in practice.

> In short, you can't force 2PC technology on people who aren't using it
> already; while for those who are using it already, this isn't nearly
> good enough as-is.
>
> regards, tom lane

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

07 January 2015, 09:28:55

On Tue, Jan 6, 2015 at 11:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> On Mon, Jan 5, 2015 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, we intentionally didn't couple the FDW stuff closely into
>> transaction commit, because of the thought that the "far end" would not
>> necessarily have Postgres-like transactional behavior, and even if it did
>> there would be about zero chance of having atomic commit with a
>> non-Postgres remote server. postgres_fdw is a seriously bad starting
>> point as far as that goes, because it encourages one to make assumptions
>> that can't possibly work for any other wrapper.

> Atomic commit is something that can potentially be supported by many
> different FDWs, as long as the thing on the other end supports 2PC.
> If you're talking to Oracle or DB2 or SQL Server, and it supports 2PC,
> then you can PREPARE the transaction and then go back and COMMIT the
> transaction once it's committed locally.

> Getting a cluster-wide
> *snapshot* is probably a PostgreSQL-only thing requiring much deeper
> integration, but I think it would be sensible to leave that as a
> future project and solve the simpler problem first.

>> I think the idea I sketched upthread of supporting an external transaction
>> manager might be worth pursuing, in that it would potentially lead to
>> having at least an approximation of atomic commit across heterogeneous
>> servers.

> An important threshold question here is whether we want to rely on an
> external transaction manager, or build one into PostgreSQL. As far as
> this particular project goes, there's nothing that can't be done
> inside PostgreSQL. You need a durable registry of which transactions
> you prepared on which servers, and which XIDs they correlate to. If
> you have that, then you can use background workers or similar to go
> retry commits or rollbacks of prepared transactions until it works,
> even if there's been a local crash meanwhile.

> Alternatively, you could rely on an external transaction manager to do
> all that stuff. I don't have a clear sense of what that would entail,
> or how it might be better or worse than rolling our own. I suspect,
> though, that it might amount to little more than adding a middle man.
> I mean, a third-party transaction manager isn't going to automatically
> know how to commit a transaction prepared on some foreign server using
> some foreign data wrapper. It's going to have to be taught that if
> postgres_fdw leaves a transaction in-medias-res on server OID 1234,
> you've got to connect to the target machine using that foreign
> server's connection parameters, speak libpq, and issue the appropriate
> COMMIT PREPARED command. And similarly, you're going to need to
> arrange to notify it before preparing that transaction so that it
> knows that it needs to request the COMMIT or ABORT later on. Once
> you've got all of that infrastructure in place, what are you
> really gaining over just doing it in PostgreSQL (or, say, a contrib
> module thereto)?

Thanks, Robert, for giving a high-level view of the system needed for PostgreSQL to act as a transaction manager by itself. Agreed completely.

> (I'm also concerned that an external transaction manager might need
> the PostgreSQL client to be aware of it, whereas what we'd really like
> here is for the client to just speak PostgreSQL and be happy that its
> commits no longer end up half-done.)

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Kevin Grittner

Date:

07 January 2015, 16:21:36

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

> I don't see why my patch would cause inconsistencies. It can
> cause dangling PREPAREd transactions and I have already
> acknowledged that fact.
>
> Am I missing something?

To me that is the big problem.  Where I have run into ad hoc
distributed transaction managers it has usually been because a
crash left prepared transactions dangling, without cleaning them up
when the transaction manager was restarted.  This tends to wreak
havoc one way or another.

If we are going to include a distributed transaction manager with
PostgreSQL, it *must* persist enough information about the
transaction ID and where it is used in a way that will survive a
subsequent crash before beginning the PREPARE on any of the
systems.  After all nodes are PREPAREd it must flag that persisted
data to indicate that it is now at a point where ROLLBACK is no
longer an option.  Only then can it start committing the prepared
transactions.  After the last node is committed it can clear this
information.  On start-up the distributed transaction manager must
check for any distributed transactions left "in progress" and
commit or rollback based on the preceding; doing retries
indefinitely until it succeeds or is told to stop.
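
The sequence described above can be sketched as a small state log plus a recovery rule. This is a simplified model under stated assumptions: the `log` dictionary stands in for storage that survives a crash, one identifier is reused as the GID on every node, and all function names are invented. COMMIT PREPARED and ROLLBACK PREPARED are the real PostgreSQL commands.

```python
from enum import Enum


class Phase(Enum):
    PREPARING = 1   # persisted before the first PREPARE is sent
    COMMITTING = 2  # persisted once every node has prepared; no rollback now


def begin(log, txid, nodes):
    """Step 1: record the transaction and its nodes before any PREPARE."""
    log[txid] = {"phase": Phase.PREPARING, "nodes": list(nodes)}


def all_prepared(log, txid):
    """Step 2: every node has prepared; from here on only commit is allowed."""
    log[txid]["phase"] = Phase.COMMITTING


def finished(log, txid):
    """Step 3: the last node has committed; forget the transaction."""
    del log[txid]


def recovery_actions(log):
    """On start-up, decide what to send for each transaction left in progress."""
    actions = []
    for txid, rec in sorted(log.items()):
        verb = "COMMIT" if rec["phase"] is Phase.COMMITTING else "ROLLBACK"
        for node in rec["nodes"]:
            actions.append((node, f"{verb} PREPARED '{txid}'"))
    return actions
```

The recovery side would retry these actions until they succeed, and it would have to treat "no such prepared transaction" as success, since a crash may have happened before a node prepared or after it had already been resolved.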

Doing this incompletely (i.e., not identifying and correctly
handling the various failure modes) is IMO far worse than not
attempting it.  If we could build in something that did this
completely and well, that would be a cool selling point; but let's
not gloss over the difficulties.  We must recognize how big a
problem it would be to include a low-quality implementation.

Also, as previously mentioned, it must behave in some reasonable
way if a database is not configured to support 2PC, especially
since 2PC is off by default in PostgreSQL.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

08 January 2015, 08:07:37

On Wed, Jan 7, 2015 at 9:50 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

> Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

>> I don't see why my patch would cause inconsistencies. It can
>> cause dangling PREPAREd transactions and I have already
>> acknowledged that fact.
>>
>> Am I missing something?

> To me that is the big problem. Where I have run into ad hoc
> distributed transaction managers it has usually been because a
> crash left prepared transactions dangling, without cleaning them up
> when the transaction manager was restarted. This tends to wreak
> havoc one way or another.

> If we are going to include a distributed transaction manager with
> PostgreSQL, it *must* persist enough information about the
> transaction ID and where it is used in a way that will survive a
> subsequent crash before beginning the PREPARE on any of the
> systems.

Thanks a lot. I hadn't thought of this.

After all nodes are PREPAREd it must flag that persisted
data to indicate that it is now at a point where ROLLBACK is no
longer an option. Only then can it start committing the prepared
transactions. After the last node is committed it can clear this
information. On start-up the distributed transaction manager must
check for any distributed transactions left "in progress" and
commit or rollback based on the preceding; doing retries
indefinitely until it succeeds or is told to stop.

Agreed.

Doing this incompletely (i.e., not identifying and correctly
handling the various failure modes) is IMO far worse than not
attempting it. If we could build in something that did this
completely and well, that would be a cool selling point; but let's
not gloss over the difficulties. We must recognize how big a
problem it would be to include a low-quality implementation.

Also, as previously mentioned, it must behave in some reasonable
way if a database is not configured to support 2PC, especially
since 2PC is off by default in PostgreSQL.

I described one possibility in my reply to Tom's mail. Let me repeat it here.

We can have a per-foreign-server option which says whether the corresponding server is able to participate in 2PC. A transaction spanning multiple foreign servers, with at least one of them not capable of participating in 2PC, will be aborted.

Will that work?

If a user incorrectly flags a foreign server as capable of 2PC, I expect the corresponding FDW would raise an error (either because PREPARE fails or because the FDW doesn't handle that case) and the transaction will be aborted anyway.
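
A minimal Python sketch of that check follows, assuming the server options are available as plain dictionaries. The option name "two_phase_commit" and the exception class are hypothetical; the proposal above does not name either.

```python
class TwoPhaseNotSupported(Exception):
    """Raised to abort a transaction that cannot be committed atomically."""


def check_two_phase_capable(servers):
    """Abort unless every server in a multi-server transaction can do 2PC.

    `servers` maps a foreign server name to its options dict. The option
    name "two_phase_commit" is an assumption made for this sketch.
    """
    if len(servers) < 2:
        return  # a single foreign server does not need 2PC in this sketch
    incapable = sorted(
        name for name, options in servers.items()
        if options.get("two_phase_commit") != "on"
    )
    if incapable:
        raise TwoPhaseNotSupported(
            "servers cannot participate in 2PC: " + ", ".join(incapable)
        )
```

A server that is flagged as capable but is not would pass this check and fail later at PREPARE, which is the second case described above.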

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Kevin Grittner

Date:

08 January 2015, 13:33:14

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Jan 7, 2015 at 9:50 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

>> Also, as previously mentioned, it must behave in some reasonable
>> way if a database is not configured to support 2PC, especially
>> since 2PC is off by default in PostgreSQL.

> We can have a per foreign server option, which says whether the
> corresponding server is able to participate in 2PC. A transaction
> spanning multiple foreign server with at least one of them not
> capable of participating in 2PC will be aborted.
>
> Will that work?
>
> In case a user flags a foreign server as capable to 2PC
> incorrectly, I expect the corresponding FDW would raise error
> (either because PREPARE fails or FDW doesn't handle that case)
> and the transaction will be aborted anyway.

That sounds like one way to handle it.  I'm not clear on how you
plan to determine whether 2PC is required for a transaction.
(Apologies if it was previously mentioned and I've forgotten it.)

I don't mean to suggest that these problems are insurmountable; I
just think that people often underestimate the difficulty of
writing a distributed transaction manager and don't always
recognize the problems that it will cause if all of the failure
modes are not considered and handled.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

08 January 2015, 14:54:28

On Wed, Jan 7, 2015 at 11:20 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> If we are going to include a distributed transaction manager with
> PostgreSQL, it *must* persist enough information about the
> transaction ID and where it is used in a way that will survive a
> subsequent crash before beginning the PREPARE on any of the
> systems.  After all nodes are PREPAREd it must flag that persisted
> data to indicate that it is now at a point where ROLLBACK is no
> longer an option.  Only then can it start committing the prepared
> transactions.  After the last node is committed it can clear this
> information.  On start-up the distributed transaction manager must
> check for any distributed transactions left "in progress" and
> commit or rollback based on the preceding; doing retries
> indefinitely until it succeeds or is told to stop.

I think one key question here is whether all of this should be handled
in PostgreSQL core or whether some of it should be handled in other
ways.  Is the goal to make postgres_fdw (and FDWs for other databases
that support 2PC) to persist enough information that someone *could*
write a transaction manager for PostgreSQL, or is the goal to actually
write that transaction manager?

Just figuring out how to persist the necessary information is a
non-trivial problem by itself.  You might think that you could just
insert a row into a local table saying, hey, I'm about to prepare a
transaction remotely, but of course that doesn't work: if you then go
on to PREPARE before writing and flushing the local commit record,
then a crash before that's done leaves a dangling prepared transaction
on the remote node.  You might think to write the record, then after
writing and flushing the local commit record do the PREPARE.  But you
can't do that either, because now if the PREPARE fails you've already
committed locally.

I guess what you need to do is something like:

1. Write and flush a WAL record indicating an intent to prepare, with
a list of foreign server OIDs and GUIDs.
2. Prepare the remote transaction on each node.  If any of those
operations fail, roll back any prepared nodes and error out.
3. Commit locally (i.e. RecordTransactionCommit, writing and flushing WAL).
4. Try to commit the remote transactions.
5. Write a WAL record indicating that you committed the remote transactions OK.

If you fail after step 1, you can straighten things out by looking at
the status of the transaction: if the transaction committed, any
transactions we intended-to-prepare need to be checked.  If they are
still prepared, we need to commit them or roll them back according to
what happened to our XID.
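
The five steps and the recovery rule above can be sketched in Python roughly as follows. This is a toy model under stated assumptions: the WAL is a plain list, the remote nodes are in-memory objects, and the record names, the RemoteNode class, and the "fdw_<xid>" identifier format are all invented for illustration.

```python
class RemoteNode:
    """In-memory stand-in for a foreign server (hypothetical)."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.state = "active"  # active -> prepared -> committed / rolled_back

    def prepare(self, gid):
        if self.fail_prepare:
            raise RuntimeError(f"PREPARE failed on {self.name}")
        self.state = "prepared"

    def commit_prepared(self, gid):
        self.state = "committed"

    def rollback_prepared(self, gid):
        self.state = "rolled_back"


def commit_distributed(wal, xid, nodes):
    gid = f"fdw_{xid}"  # assumed identifier format
    # 1. Record the intent to prepare, with the participating servers.
    wal.append(("intent_to_prepare", xid, [n.name for n in nodes], gid))
    # 2. Prepare each node; on failure roll back the prepared ones and error out.
    prepared = []
    try:
        for node in nodes:
            node.prepare(gid)
            prepared.append(node)
    except RuntimeError:
        for node in prepared:
            node.rollback_prepared(gid)
        raise
    # 3. Commit locally.
    wal.append(("commit", xid))
    # 4. Try to commit the remote transactions.
    for node in nodes:
        node.commit_prepared(gid)
    # 5. Record that the remote commits completed.
    wal.append(("remote_commits_done", xid))


def recover(wal, xid, nodes):
    """Resolve still-prepared remotes according to what happened to our XID."""
    records = [r for r in wal if r[1] == xid]
    kinds = {r[0] for r in records}
    if "intent_to_prepare" not in kinds or "remote_commits_done" in kinds:
        return
    gid = next(r[3] for r in records if r[0] == "intent_to_prepare")
    for node in nodes:
        if node.state == "prepared":
            if "commit" in kinds:
                node.commit_prepared(gid)
            else:
                node.rollback_prepared(gid)
```

The model ignores unreachable nodes during recovery, which is exactly the case where a real implementation would have to keep retrying.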

(Andres is talking in my other ear suggesting that we ought to reuse
the 2PC infrastructure to do all this.  I'm not convinced that's a
good idea, but I'll let him present his own ideas here if he wants to
rather than trying to explain them myself.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Kevin Grittner

Date:

08 January 2015, 15:19:32

Robert Haas <robertmhaas@gmail.com> wrote:

> Andres is talking in my other ear suggesting that we ought to
> reuse the 2PC infrastructure to do all this.

If you mean that the primary transaction and all FDWs in the
transaction must use 2PC, that is what I was saying, although
apparently not clearly enough.  All nodes *including the local one*
must be prepared and committed with data about the nodes saved
safely off somewhere that it can be read in the event of a failure
of any of the nodes *including the local one*.  Without that, I see
this whole approach as a train wreck just waiting to happen.

I'm not really clear on the mechanism that is being proposed for
doing this, but one way would be to have the PREPARE of the local
transaction be requested explicitly and to have that cause all FDWs
participating in the transaction to also be prepared.  (That might
be what Andres meant; I don't know.)  That doesn't strike me as the
only possible mechanism to drive this, but it might well be the
simplest and cleanest.  The trickiest bit might be to find a good
way to persist the distributed transaction information in a way
that survives the failure of the main transaction -- or even the
abrupt loss of the machine it's running on.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

08 January 2015, 17:32:05

On Thu, Jan 8, 2015 at 10:19 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> Andres is talking in my other ear suggesting that we ought to
>> reuse the 2PC infrastructure to do all this.
>
> If you mean that the primary transaction and all FDWs in the
> transaction must use 2PC, that is what I was saying, although
> apparently not clearly enough.  All nodes *including the local one*
> must be prepared and committed with data about the nodes saved
> safely off somewhere that it can be read in the event of a failure
> of any of the nodes *including the local one*.  Without that, I see
> this whole approach as a train wreck just waiting to happen.

Clearly, all the nodes other than the local one need to use 2PC.  I am
unconvinced that the local node must write a 2PC state file only to
turn around and remove it again almost immediately thereafter.

> I'm not really clear on the mechanism that is being proposed for
> doing this, but one way would be to have the PREPARE of the local
> transaction be requested explicitly and to have that cause all FDWs
> participating in the transaction to also be prepared.  (That might
> be what Andres meant; I don't know.)

We want this to be client-transparent, so that the client just says
COMMIT and everything Just Works.

> That doesn't strike me as the
> only possible mechanism to drive this, but it might well be the
> simplest and cleanest.  The trickiest bit might be to find a good
> way to persist the distributed transaction information in a way
> that survives the failure of the main transaction -- or even the
> abrupt loss of the machine it's running on.

I'd be willing to punt on surviving a loss of the entire machine.  But
I'd like to be able to survive an abrupt reboot.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Kevin Grittner

Date:

08 January 2015, 18:01:09

Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jan 8, 2015 at 10:19 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> Robert Haas <robertmhaas@gmail.com> wrote:
>>> Andres is talking in my other ear suggesting that we ought to
>>> reuse the 2PC infrastructure to do all this.
>>
>> If you mean that the primary transaction and all FDWs in the
>> transaction must use 2PC, that is what I was saying, although
>> apparently not clearly enough.  All nodes *including the local one*
>> must be prepared and committed with data about the nodes saved
>> safely off somewhere that it can be read in the event of a failure
>> of any of the nodes *including the local one*.  Without that, I see
>> this whole approach as a train wreck just waiting to happen.
>
> Clearly, all the nodes other than the local one need to use 2PC.  I am
> unconvinced that the local node must write a 2PC state file only to
> turn around and remove it again almost immediately thereafter.

The key point is that the distributed transaction data must be
flagged as needing to commit rather than roll back between the
prepare phase and the final commit.  If you try to avoid the
PREPARE, flagging, COMMIT PREPARED sequence by building the
flagging of the distributed transaction metadata into the COMMIT
process, you still have the problem of what to do on crash
recovery.  You really need to use 2PC to keep that clean, I think.

>> I'm not really clear on the mechanism that is being proposed for
>> doing this, but one way would be to have the PREPARE of the local
>> transaction be requested explicitly and to have that cause all FDWs
>> participating in the transaction to also be prepared.  (That might
>> be what Andres meant; I don't know.)
>
> We want this to be client-transparent, so that the client just says
> COMMIT and everything Just Works.

What about the case where one or more nodes don't support 2PC?
Do we silently make the choice, without the client really knowing?

>> That doesn't strike me as the
>> only possible mechanism to drive this, but it might well be the
>> simplest and cleanest.  The trickiest bit might be to find a good
>> way to persist the distributed transaction information in a way
>> that survives the failure of the main transaction -- or even the
>> abrupt loss of the machine it's running on.
>
> I'd be willing to punt on surviving a loss of the entire machine.  But
> I'd like to be able to survive an abrupt reboot.

As long as people are aware that there is an urgent need to find
and fix all data stores to which clusters on the failed machine
were connected via FDW when there is a hard machine failure, I
guess it is OK.  In essence we just document it and declare it to
be somebody else's problem.  In general I would expect a
distributed transaction manager to behave well in the face of any
single-machine failure, but if there is one aspect of a
full-featured distributed transaction manager we could give up, I
guess that would be it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

09 January 2015, 06:27:58

On Thu, Jan 8, 2015 at 7:02 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Jan 7, 2015 at 9:50 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

>> Also, as previously mentioned, it must behave in some reasonable
>> way if a database is not configured to support 2PC, especially
>> since 2PC is off by default in PostgreSQL.

> We can have a per foreign server option, which says whether the
> corresponding server is able to participate in 2PC. A transaction
> spanning multiple foreign server with at least one of them not
> capable of participating in 2PC will be aborted.
>
> Will that work?
>
> In case a user flags a foreign server as capable to 2PC
> incorrectly, I expect the corresponding FDW would raise error
> (either because PREPARE fails or FDW doesn't handle that case)
> and the transaction will be aborted anyway.

That sounds like one way to handle it. I'm not clear on how you
plan to determine whether 2PC is required for a transaction.
(Apologies if it was previously mentioned and I've forgotten it.)

Any transaction involving more than one server (including the local one, I guess) will require 2PC. A transaction may modify and access a remote database but not the local one. In such a case, the state of the local transaction doesn't matter once the remote transaction is committed or rolled back.
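
Stated as code, the rule is a simple participant count. This is a minimal Python sketch; the function name and its two arguments are assumptions, and how the transaction manager would actually learn which servers are involved is left open.

```python
def requires_two_phase(foreign_servers_involved, local_involved):
    """Return True when more than one server takes part in the transaction.

    `foreign_servers_involved` is an iterable of foreign server names and
    `local_involved` says whether the local database takes part as well.
    """
    participants = len(set(foreign_servers_involved)) + (1 if local_involved else 0)
    return participants > 1
```

For example, a transaction that involves only one remote database and not the local one counts a single participant, so it would not need 2PC under this rule.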

I don't mean to suggest that these problems are insurmountable; I
just think that people often underestimate the difficulty of
writing a distributed transaction manager and don't always
recognize the problems that it will cause if all of the failure
modes are not considered and handled.

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

09 January 2015, 06:49:33

On Thu, Jan 8, 2015 at 8:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jan 7, 2015 at 11:20 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
> If we are going to include a distributed transaction manager with
> PostgreSQL, it *must* persist enough information about the
> transaction ID and where it is used in a way that will survive a
> subsequent crash before beginning the PREPARE on any of the
> systems. After all nodes are PREPAREd it must flag that persisted
> data to indicate that it is now at a point where ROLLBACK is no
> longer an option. Only then can it start committing the prepared
> transactions. After the last node is committed it can clear this
> information. On start-up the distributed transaction manager must
> check for any distributed transactions left "in progress" and
> commit or rollback based on the preceding; doing retries
> indefinitely until it succeeds or is told to stop.

I think one key question here is whether all of this should be handled
in PostgreSQL core or whether some of it should be handled in other
ways. Is the goal to make postgres_fdw (and FDWs for other databases
that support 2PC) to persist enough information that someone *could*
write a transaction manager for PostgreSQL, or is the goal to actually
write that transaction manager?

Just figuring out how to persist the necessary information is a
non-trivial problem by itself. You might think that you could just
insert a row into a local table saying, hey, I'm about to prepare a
transaction remotely, but of course that doesn't work: if you then go
on to PREPARE before writing and flushing the local commit record,
then a crash before that's done leaves a dangling prepared transaction
on the remote node. You might think to write the record, then after
writing and flushing the local commit record do the PREPARE. But you
can't do that either, because now if the PREPARE fails you've already
committed locally.

I guess what you need to do is something like:

1. Write and flush a WAL record indicating an intent to prepare, with
a list of foreign server OIDs and GUIDs.
2. Prepare the remote transaction on each node. If any of those
operations fail, roll back any prepared nodes and error out.
3. Commit locally (i.e. RecordTransactionCommit, writing and flushing WAL).
4. Try to commit the remote transactions.
5. Write a WAL record indicating that you committed the remote transactions OK.

If you fail after step 1, you can straighten things out by looking at
the status of the transaction: if the transaction committed, any
transactions we intended-to-prepare need to be checked. If they are
still prepared, we need to commit them or roll them back according to
what happened to our XID.

When you want to straighten things out and commit, the foreign server may not be available. As Kevin pointed out above, we need to keep retrying to resolve the PREPAREd transactions on the foreign server (commit or roll back, based on the status of the local transaction) until they are resolved. So we will have to persist the information somewhere other than the WAL, or keep the WAL around even after the corresponding local transaction has been committed or aborted. I don't think the latter is a good idea, since it will affect replication and VACUUM, especially because it is going to affect the oldest transaction in WAL.
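
A minimal Python sketch of such a retry loop is below. The `pending` set stands in for whatever persistent store ends up holding the unresolved transactions, and `send` and `local_committed` are hypothetical callbacks. Only COMMIT PREPARED and ROLLBACK PREPARED are real PostgreSQL commands here.

```python
import time


def resolve_pending(pending, local_committed, send, max_rounds=None, delay=0.0):
    """Keep resolving prepared foreign transactions until none are left.

    pending         -- set of (server, gid) pairs still in the prepared state
    local_committed -- callable gid -> bool, the fate of the local transaction
    send            -- callable (server, sql); raises ConnectionError if the
                       server is unreachable
    max_rounds      -- optional bound, i.e. being "told to stop"
    """
    rounds = 0
    while pending and (max_rounds is None or rounds < max_rounds):
        for server, gid in sorted(pending):
            verb = "COMMIT" if local_committed(gid) else "ROLLBACK"
            try:
                # Naive quoting; a real implementation must escape the gid.
                send(server, f"{verb} PREPARED '{gid}'")
            except ConnectionError:
                continue  # server is down, try again in the next round
            pending.discard((server, gid))
        rounds += 1
        if pending:
            time.sleep(delay)
    return pending
```

Whatever remains in the returned set is still dangling and needs either more retries or manual intervention.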

That's where Andres's suggestion might help.

(Andres is talking in my other ear suggesting that we ought to reuse
the 2PC infrastructure to do all this. I'm not convinced that's a
good idea, but I'll let him present his own ideas here if he wants to
rather than trying to explain them myself.)

We can persist the information about distributed transactions (especially those which require 2PC) in a way similar to how the 2PC infrastructure persists state in the pg_twophase directory. I am still investigating whether we can re-use the existing 2PC infrastructure. My initial reaction is no, since 2PC persists information about the local transaction, including locked objects and WALs (?), in the pg_twophase directory, which is not required for a distributed transaction. But the rest of the mechanism, such as the manner of processing the records during normal processing and recovery, looks very useful.

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Jim Nasby

Date:

10 January 2015, 00:02:54

On 1/8/15, 12:00 PM, Kevin Grittner wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Jan 8, 2015 at 10:19 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
>>> Robert Haas <robertmhaas@gmail.com> wrote:
>>>> Andres is talking in my other ear suggesting that we ought to
>>>> reuse the 2PC infrastructure to do all this.
>>>
>>> If you mean that the primary transaction and all FDWs in the
>>> transaction must use 2PC, that is what I was saying, although
>>> apparently not clearly enough.  All nodes *including the local one*
>>> must be prepared and committed with data about the nodes saved
>>> safely off somewhere that it can be read in the event of a failure
>>> of any of the nodes *including the local one*.  Without that, I see
>>> this whole approach as a train wreck just waiting to happen.
>>
>> Clearly, all the nodes other than the local one need to use 2PC.  I am
>> unconvinced that the local node must write a 2PC state file only to
>> turn around and remove it again almost immediately thereafter.
>
> The key point is that the distributed transaction data must be
> flagged as needing to commit rather than roll back between the
> prepare phase and the final commit.  If you try to avoid the
> PREPARE, flagging, COMMIT PREPARED sequence by building the
> flagging of the distributed transaction metadata into the COMMIT
> process, you still have the problem of what to do on crash
> recovery.  You really need to use 2PC to keep that clean, I think.

If we had an independent transaction coordinator then I agree with you Kevin. I think Robert is proposing that if we
are controlling one of the nodes that's participating as well as coordinating the overall transaction that we can take
some shortcuts. AIUI a PREPARE means you are completely ready to commit. In essence you're just waiting to write and
fsync the commit message. That is in fact the state that a coordinating PG node would be in by the time everyone else
has done their prepare. So from that standpoint we're OK.

Now, as soon as ANY of the nodes commit, our coordinating node MUST be able to commit as well! That would require it to
have a real prepared transaction of its own created. However, as long as there is zero chance of any other prepared
transactions committing before our local transaction, that step isn't actually needed. Our local transaction will either
commit or abort, and that will determine what needs to happen on all other nodes.

I'm ignoring the question of how the local node needs to store info about the other nodes in case of a crash, but
AFAICT you could reliably recover manually from what I just described.

I think the question is: are we OK with "going under the skirt" in this fashion? Presumably it would provide better
performance, whereas forcing ourselves to eat our own 2PC dogfood would presumably make it easier for someone to plug in
an external coordinator instead of using our own. I think there's also a lot to be said for getting a partial
implementation of this available today (requiring manual recovery), so long as it's not in core.

BTW, I found https://www.cs.rutgers.edu/~pxk/417/notes/content/transactions.html a useful read, specifically the 2PC
portion.

>>> I'm not really clear on the mechanism that is being proposed for
>>> doing this, but one way would be to have the PREPARE of the local
>>> transaction be requested explicitly and to have that cause all FDWs
>>> participating in the transaction to also be prepared.  (That might
>>> be what Andres meant; I don't know.)
>>
>> We want this to be client-transparent, so that the client just says
>> COMMIT and everything Just Works.
>
> What about the case where one or more nodes doesn't support 2PC.
> Do we silently make the choice, without the client really knowing?

We abort. (Unless we want to have a running_with_scissors GUC...)

>>> That doesn't strike me as the
>>> only possible mechanism to drive this, but it might well be the
>>> simplest and cleanest.  The trickiest bit might be to find a good
>>> way to persist the distributed transaction information in a way
>>> that survives the failure of the main transaction -- or even the
>>> abrupt loss of the machine it's running on.
>>
>> I'd be willing to punt on surviving a loss of the entire machine.  But
>> I'd like to be able to survive an abrupt reboot.
>
> As long as people are aware that there is an urgent need to find
> and fix all data stores to which clusters on the failed machine
> were connected via FDW when there is a hard machine failure, I
> guess it is OK.  In essence we just document it and declare it to
> be somebody else's problem.  In general I would expect a
> distributed transaction manager to behave well in the face of any
> single-machine failure, but if there is one aspect of a
> full-featured distributed transaction manager we could give up, I
> guess that would be it.

ISTM that one option here would be to "simply" write and sync WAL record(s) of all externally prepared transactions.
That would be enough for a hot standby to find all the other servers and tell them to either commit or abort, based on
whether our local transaction committed or aborted. If you wanted, you could even have the standby be responsible for
telling all the other participants to commit...

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: Transactions involving multiple postgres foreign servers

From

Michael Paquier

Date:

10 January 2015, 13:11:17

On Sat, Jan 10, 2015 at 9:02 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 1/8/15, 12:00 PM, Kevin Grittner wrote:
>> The key point is that the distributed transaction data must be
>> flagged as needing to commit rather than roll back between the
>> prepare phase and the final commit.  If you try to avoid the
>> PREPARE, flagging, COMMIT PREPARED sequence by building the
>> flagging of the distributed transaction metadata into the COMMIT
>> process, you still have the problem of what to do on crash
>> recovery.  You really need to use 2PC to keep that clean, I think.

Yes, 2PC is needed as long as two or more nodes perform write
operations within a transaction.

> If we had an independent transaction coordinator then I agree with you
> Kevin. I think Robert is proposing that if we are controlling one of the
> nodes that's participating as well as coordinating the overall transaction
> that we can take some shortcuts. AIUI a PREPARE means you are completely
> ready to commit. In essence you're just waiting to write and fsync the
> commit message. That is in fact the state that a coordinating PG node would
> be in by the time everyone else has done their prepare. So from that
> standpoint we're OK.
>
> Now, as soon as ANY of the nodes commit, our coordinating node MUST be able
> to commit as well! That would require it to have a real prepared transaction
> of it's own created. However, as long as there is zero chance of any other
> prepared transactions committing before our local transaction, that step
> isn't actually needed. Our local transaction will either commit or abort,
> and that will determine what needs to happen on all other nodes.

It is a property of 2PC to ensure that a prepared transaction will
commit. Now, once it is confirmed on the coordinator that all the
remote nodes have successfully PREPAREd, the coordinator issues COMMIT
PREPARED to each node. What do you do if some nodes report ABORT
PREPARED while other nodes report COMMIT PREPARED? Do you abort the
transaction on coordinator, commit it or FATAL? This leaves the cluster
in an inconsistent state, meaning that some consistent cluster-wide
recovery point is needed as well (Postgres-XC and XL have introduced
the concept of barriers for such problems, stuff created first by
Pavan Deolasee).
-- 
Michael

Re: Transactions involving multiple postgres foreign servers

From

Jim Nasby

Date:

11 January 2015, 01:37:32

On 1/10/15, 7:11 AM, Michael Paquier wrote:
>> If we had an independent transaction coordinator then I agree with you
>> Kevin. I think Robert is proposing that if we are controlling one of the
>> nodes that's participating as well as coordinating the overall transaction
>> that we can take some shortcuts. AIUI a PREPARE means you are completely
>> ready to commit. In essence you're just waiting to write and fsync the
>> commit message. That is in fact the state that a coordinating PG node would
>> be in by the time everyone else has done their prepare. So from that
>> standpoint we're OK.
>>
>> Now, as soon as ANY of the nodes commit, our coordinating node MUST be able
>> to commit as well! That would require it to have a real prepared transaction
>> of it's own created. However, as long as there is zero chance of any other
>> prepared transactions committing before our local transaction, that step
>> isn't actually needed. Our local transaction will either commit or abort,
>> and that will determine what needs to happen on all other nodes.
> It is a property of 2PC to ensure that a prepared transaction will
> commit. Now, once it is confirmed on the coordinator that all the
> remote nodes have successfully PREPAREd, the coordinator issues COMMIT
> PREPARED to each node. What do you do if some nodes report ABORT
> PREPARED while other nodes report COMMIT PREPARED? Do you abort the
> transaction on coordinator, commit it or FATAL? This lets the cluster
> in an inconsistent state, meaning that some consistent cluster-wide
> recovery point is needed as well (Postgres-XC and XL have introduced
> the concept of barriers for such problems, stuff created first by
> Pavan Deolassee).

My understanding is that once you get a successful PREPARE that should mean that it's basically impossible for the
transaction to fail to commit. If that's not the case, I fail to see how you can get any decent level of sanity out of
this...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

Re: Transactions involving multiple postgres foreign servers

From

Michael Paquier

Date:

11 January 2015, 08:36:20

On Sun, Jan 11, 2015 at 10:37 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 1/10/15, 7:11 AM, Michael Paquier wrote:
>>>
>>> If we had an independent transaction coordinator then I agree with you
>>> Kevin. I think Robert is proposing that if we are controlling one of the
>>> nodes that's participating as well as coordinating the overall transaction
>>> that we can take some shortcuts. AIUI a PREPARE means you are completely
>>> ready to commit. In essence you're just waiting to write and fsync the
>>> commit message. That is in fact the state that a coordinating PG node would
>>> be in by the time everyone else has done their prepare. So from that
>>> standpoint we're OK.
>>>
>>> Now, as soon as ANY of the nodes commit, our coordinating node MUST be able
>>> to commit as well! That would require it to have a real prepared transaction
>>> of it's own created. However, as long as there is zero chance of any other
>>> prepared transactions committing before our local transaction, that step
>>> isn't actually needed. Our local transaction will either commit or abort,
>>> and that will determine what needs to happen on all other nodes.
>>
>> It is a property of 2PC to ensure that a prepared transaction will
>> commit. Now, once it is confirmed on the coordinator that all the
>> remote nodes have successfully PREPAREd, the coordinator issues COMMIT
>> PREPARED to each node. What do you do if some nodes report ABORT
>> PREPARED while other nodes report COMMIT PREPARED? Do you abort the
>> transaction on coordinator, commit it or FATAL? This lets the cluster
>> in an inconsistent state, meaning that some consistent cluster-wide
>> recovery point is needed as well (Postgres-XC and XL have introduced
>> the concept of barriers for such problems, stuff created first by
>> Pavan Deolassee).
>
>
> My understanding is that once you get a successful PREPARE that should mean
> that it's basically impossible for the transaction to fail to commit. If
> that's not the case, I fail to see how you can get any decent level of
> sanity out of this...

When giving the responsibility for a group of COMMIT PREPARED to a set
of nodes in a network, a couple of problems could show up, of the
split-brain type for example. There could also be failures at the
hardware level, so you would need a mechanism ensuring that WAL is
consistent among all the nodes, for example by adding a common restore
point on all the nodes with XLOG_RESTORE_POINT once PREPARE has
completed successfully. That's a reason why I think that the local
Coordinator should use 2PC as well, to ensure a consistency point once
all the remote nodes have successfully PREPAREd, and a reason why
things can get complicated for either the DBA or the upper application
in charge of ensuring the DB consistency even in case of critical
failures.
-- 
Michael

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

14 January 2015, 04:10:46

On Thu, Jan 8, 2015 at 1:00 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
>> Clearly, all the nodes other than the local one need to use 2PC.  I am
>> unconvinced that the local node must write a 2PC state file only to
>> turn around and remove it again almost immediately thereafter.
>
> The key point is that the distributed transaction data must be
> flagged as needing to commit rather than roll back between the
> prepare phase and the final commit.  If you try to avoid the
> PREPARE, flagging, COMMIT PREPARED sequence by building the
> flagging of the distributed transaction metadata into the COMMIT
> process, you still have the problem of what to do on crash
> recovery.  You really need to use 2PC to keep that clean, I think.

I don't really follow this.  You need to write a list of the
transactions that you're going to prepare to stable storage before
preparing any of them.  And then you need to write something to stable
storage when you've definitively determined that you're going to
commit.  But we have no current mechanism for the first thing (so
reusing 2PC doesn't help) and we already have the second thing (it's
the commit record itself).
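
To make the two writes concrete, here is a minimal Python sketch, offered as an illustration only and not as anything that exists in PostgreSQL. `StableLog`, the record kinds "intent" and "commit", and the prepared-transaction ids are all hypothetical names. It shows how crash recovery could decide the fate of every remote prepared transaction from just those two kinds of records.

```python
from dataclasses import dataclass, field


@dataclass
class StableLog:
    """Stand-in for durable storage; a real system would fsync each append."""
    records: list = field(default_factory=list)

    def append(self, kind, xid, payload=None):
        self.records.append((kind, xid, payload))


def recover(log):
    """Map each remote prepared-transaction id to the action recovery takes."""
    committed = {xid for kind, xid, _ in log.records if kind == "commit"}
    actions = {}
    for kind, xid, gids in log.records:
        if kind != "intent":
            continue
        # The local commit record is the decision; its absence means abort.
        verdict = "COMMIT PREPARED" if xid in committed else "ROLLBACK PREPARED"
        for gid in gids:
            actions[gid] = verdict
    return actions


log = StableLog()
# Transaction 100: intent logged, remotes prepared, local commit record written.
log.append("intent", 100, ["fx_100_s1", "fx_100_s2"])
log.append("commit", 100)
# Transaction 101: crashed after logging its intent, before any commit record.
log.append("intent", 101, ["fx_101_s1"])

decisions = recover(log)
```

In this sketch an id listed in an intent record may never have been prepared remotely, so recovery has to tolerate a "no such prepared transaction" answer from the remote side.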

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

14 January 2015, 04:28:41

On Sun, Jan 11, 2015 at 3:36 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
>> My understanding is that once you get a successful PREPARE that should mean
>> that it's basically impossible for the transaction to fail to commit. If
>> that's not the case, I fail to see how you can get any decent level of
>> sanity out of this...
> When giving the responsibility of a group of COMMIT PREPARED to a set
> of nodes in a network, there could be a couple of problems showing up,
> of the type split-brain for example.

I think this is just confusing the issue.  When a machine reports that
a transaction is successfully prepared, any future COMMIT PREPARED
operation *must* succeed.  If it doesn't, the machine has broken its
promises, and that's not OK.  Period.  It doesn't matter whether
that's due to split-brain or sunspots or Oscar Wilde having bad
breath.  If you say that it's prepared, then you're not allowed to
change your mind later and say that it can't be committed.  If you do,
then you have a broken 2PC implementation and, as Jim says, all bets
are off.

Now of course nothing is certain in life except death and taxes.  If
you PREPARE a transaction, and then go into the data directory and
corrupt the 2PC state file using dd, and then try to commit it, it
might fail.  But no system can survive that sort of thing, whether 2PC
is involved or not; in such extraordinary situations, of course
operator intervention will be required.  But in a more normal
situation where you just have a failover, if the failover causes your
prepared transaction to come unprepared, that means your failover
mechanism is broken.  If you're using synchronous replication, this
shouldn't happen.

> There could be as well failures
> at hardware-level, so you would need a mechanism ensuring that WAL is
> consistent among all the nodes, with for example the addition of a
> common restore point on all the nodes once PREPARE is successfully
> done with for example XLOG_RESTORE_POINT. That's a reason why I think
> that the local Coordinator should use 2PC as well, to ensure a
> consistency point once all the remote nodes have successfully
> PREPAREd, and a reason why things can get complicated for either the
> DBA or the upper application in charge of ensuring the DB consistency
> even in case of critical failures.

It's up to the DBA to decide whether they care about surviving
complete loss of a node while having 2PC still work.  If they do, they
should use sync rep, and they should be fine -- the machine on which
the transaction is prepared shouldn't acknowledge the PREPARE as
having succeeded until the WAL is safely on disk on the standby.  Most
probably don't, though; that's a big performance penalty.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

17 February 2015, 09:26:30

Hi All,

Here are the steps and infrastructure for achieving atomic commits across multiple foreign servers. I have tried to address most of the concerns raised earlier in this mail thread. Let me know if I have left something out. Attached is a WIP patch implementing the same for postgres_fdw. I have tried to make it FDW-independent.

A. Steps during transaction processing
------------------------------------------------

1. When an FDW connects to a foreign server and starts a transaction, it registers that server with a boolean flag indicating whether that server is capable of participating in a two phase commit. In the patch this is implemented using function RegisterXactForeignServer(), which raises an error, thus aborting the transaction, if there is at least one foreign server incapable of 2PC in a multiserver transaction. This error is thrown as early as possible. If all the foreign servers involved in the transaction are capable of 2PC, the function just updates the information. As of now, in the patch the function is in the form of a stub.

Whether a foreign server is capable of 2PC, can be
a. FDW level decision e.g. file_fdw as of now, is incapable of 2PC but it can build the capabilities which can be used for all the servers using file_fdw

b. a decision based on server version type etc. thus FDW can decide that by looking at the server properties for each server

c. a user decision where the FDW can allow a user to specify it in the form of CREATE/ALTER SERVER option. Implemented in the patch.

For a transaction involving only a single foreign server, the current code remains unaltered as two phase commit is not needed. The rest of the discussion pertains to transactions involving more than one foreign server.

At commit or abort time, the FDW receives callbacks with the appropriate events. The FDW then takes the following actions on each event.

2. On XACT_EVENT_PRE_COMMIT event, the FDW coins one prepared transaction id per foreign server involved and saves it along with xid, dbid, foreign server id and user mapping and foreign transaction status = PREPARING in-memory. The prepared transaction id can be anything represented as byte string. Same information is flushed to the disk to survive crashes. This is implemented in the patch as prepare_foreign_xact(). Persistent and in-memory storages and their usages are discussed later in the mail. FDW then prepares the transaction on the foreign server. If this step is successful, the foreign transaction status is changed to PREPARED. If the step is unsuccessful, the local transaction is aborted and each FDW will receive XACT_EVENT_ABORT (discussed later). The updates to the foreign transaction status need not be flushed to the disk, as they can be inferred from the status of local transaction.

3. If the local transaction is committed, the FDW callback will get XACT_EVENT_COMMIT event. Foreign transaction status is changed to COMMITTING. FDW tries to commit the foreign transaction with the prepared transaction id. If the commit is successful, the foreign transaction entry is removed. If the commit is unsuccessful because of local/foreign server crash or network failure, the foreign prepared transaction resolver takes care of committing it at a later point in time.

4. If the local transaction is aborted, the FDW callback will get XACT_EVENT_ABORT event. At this point, the FDW may or may not have prepared a transaction on foreign server as per step 2 above. If it has not prepared the transaction, it simply aborts the transaction on foreign server; a server crash or network failure doesn't alter the ultimate result in this case. If FDW has prepared the foreign transaction, it updates the foreign transaction status to ABORTING and tries to roll back the prepared transaction. If the rollback is successful, the foreign transaction entry is removed. If the rollback is not successful, the foreign prepared transaction resolver takes care of aborting it at a later point in time.
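
The four steps above can be sketched as a small Python simulation. This is only an illustration under stated assumptions, not the patch's C code: `FakeServer`, `Coordinator`, `run`, the `fail_on` switch and the `fx_<xid>_<server>` id format are hypothetical, and the flush to disk in step 2 is reduced to a comment.

```python
class FakeServer:
    """Hypothetical stand-in for a connection to one foreign server."""

    def __init__(self, name, can_2pc=True, fail_on=()):
        self.name = name
        self.can_2pc = can_2pc
        self.fail_on = set(fail_on)  # operations that raise ConnectionError
        self.state = "active"

    def _do(self, op, new_state):
        if op in self.fail_on:
            raise ConnectionError(f"{self.name}: {op} failed")
        self.state = new_state

    def prepare(self, gid):
        self._do("prepare", "prepared")

    def commit_prepared(self, gid):
        self._do("commit_prepared", "committed")

    def rollback_prepared(self, gid):
        self._do("rollback_prepared", "aborted")

    def abort(self):
        self._do("abort", "aborted")


class Coordinator:
    def __init__(self, xid):
        self.xid = xid
        self.servers = []
        self.entries = {}  # server name -> {"gid": ..., "status": ...}

    def register(self, server):
        """Step 1: refuse a multi-server transaction unless all can do 2PC."""
        candidates = self.servers + [server]
        if len(candidates) > 1 and not all(s.can_2pc for s in candidates):
            raise RuntimeError("all servers in a multi-server transaction need 2PC")
        self.servers = candidates

    def pre_commit(self):
        """Step 2: record each entry as PREPARING, then prepare remotely."""
        if len(self.servers) < 2:
            return  # single-server transactions keep the existing path
        for s in self.servers:
            entry = {"gid": f"fx_{self.xid}_{s.name}", "status": "PREPARING"}
            self.entries[s.name] = entry  # the real thing also flushes to disk
            s.prepare(entry["gid"])  # an exception aborts the local transaction
            entry["status"] = "PREPARED"

    def on_commit(self):
        """Step 3: commit prepared transactions; failures wait for the resolver."""
        for s in self.servers:
            entry = self.entries.get(s.name)
            if entry is None:
                continue  # one-phase commit path, not modelled here
            entry["status"] = "COMMITTING"
            try:
                s.commit_prepared(entry["gid"])
            except ConnectionError:
                continue
            del self.entries[s.name]

    def on_abort(self):
        """Step 4: roll back what was prepared, plainly abort the rest."""
        for s in self.servers:
            entry = self.entries.get(s.name)
            try:
                if entry is not None and entry["status"] == "PREPARED":
                    entry["status"] = "ABORTING"
                    s.rollback_prepared(entry["gid"])
                else:
                    s.abort()
            except ConnectionError:
                continue  # the entry stays behind for the resolver
            self.entries.pop(s.name, None)


def run(coordinator):
    """Drive the callbacks the way a local commit attempt would."""
    try:
        coordinator.pre_commit()
    except ConnectionError:
        coordinator.on_abort()
        return "aborted"
    coordinator.on_commit()
    return "committed"


# Scenario 1: s2 becomes unreachable after PREPARE, so its entry is left behind.
s1, s2 = FakeServer("s1"), FakeServer("s2", fail_on={"commit_prepared"})
c1 = Coordinator(xid=100)
c1.register(s1)
c1.register(s2)
outcome1 = run(c1)

# Scenario 2: PREPARE fails on t2, so everything is rolled back.
t1, t2 = FakeServer("t1"), FakeServer("t2", fail_on={"prepare"})
c2 = Coordinator(xid=101)
c2.register(t1)
c2.register(t2)
outcome2 = run(c2)
```

In scenario 1 the local transaction is committed even though s2 is still only prepared; the leftover COMMITTING entry is what a resolver would pick up later.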

B. Foreign prepared transaction resolver
---------------------------------------------------

In the patch this is implemented as a built-in function pg_fdw_resolve(). Ideally the functionality should be run by a background worker process frequently.

The resolver looks at each entry and invokes the FDW routine to resolve the transaction. The FDW routine returns boolean status: true if the prepared transaction was resolved (committed/aborted), false otherwise.

The resolution is as follows -

1. If foreign transaction status is COMMITTING or ABORTING, it commits or aborts the prepared transaction, respectively, through the FDW routine. If the transaction is successfully resolved, it removes the foreign transaction entry.

2. Else, it checks whether the local transaction was committed or aborted, updates the foreign transaction status accordingly, and takes the action described in step 1 above.

3. The resolver doesn't touch entries created by in-progress local transactions.

If the server/backend crashes after it has registered the foreign transaction entry (during step A.2) but before the transaction is prepared on the foreign server, we will be left with a prepared transaction id which was never prepared on the foreign server. Similarly, if the server/backend crashes after it has resolved the foreign prepared transaction but before removing the entry, the same situation can arise. The FDW should detect these situations when the foreign server complains about non-existing prepared transaction ids, and consider such foreign transactions as resolved.

After looking at all the entries the resolver flushes the entries to the disk, so as to retain the latest status across shutdown and crash.
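
One pass of the resolver described in this section could look like the following Python sketch. It is an illustration only, not the code behind pg_fdw_resolve(): `resolve_all`, `make_fdw_routine` and the dictionary layout of an entry are hypothetical, and the fake FDW routine treats a prepared transaction id unknown to the remote side as already resolved.

```python
def resolve_all(entries, local_status, fdw_resolve):
    """Return the entries that are still unresolved after one pass."""
    remaining = []
    for entry in entries:
        if entry["status"] not in ("COMMITTING", "ABORTING"):
            outcome = local_status(entry["xid"])
            if outcome == "in_progress":
                remaining.append(entry)  # rule 3: leave in-progress work alone
                continue
            # rule 2: derive the verdict from the local transaction's fate
            entry["status"] = "COMMITTING" if outcome == "committed" else "ABORTING"
        action = "commit" if entry["status"] == "COMMITTING" else "rollback"
        if not fdw_resolve(entry, action):  # rule 1
            remaining.append(entry)
    return remaining


def make_fdw_routine(remote_prepared, unreachable, issued):
    """Build a fake FDW routine; all three arguments are test fixtures."""
    def fdw_resolve(entry, action):
        if entry["server"] in unreachable:
            return False
        if entry["gid"] in remote_prepared:
            remote_prepared.remove(entry["gid"])
            issued.append((entry["gid"], action))
        # A gid unknown to the remote side was never prepared or is already
        # resolved, so it counts as resolved too.
        return True
    return fdw_resolve


entries = [
    {"xid": 100, "server": "s1", "gid": "fx_100_s1", "status": "COMMITTING"},
    {"xid": 101, "server": "s1", "gid": "fx_101_s1", "status": "PREPARED"},
    {"xid": 102, "server": "s2", "gid": "fx_102_s2", "status": "PREPARED"},
    {"xid": 103, "server": "s1", "gid": "fx_103_s1", "status": "PREPARING"},
    {"xid": 104, "server": "s1", "gid": "fx_104_s1", "status": "PREPARING"},
]
local = {100: "committed", 101: "aborted", 102: "committed",
         103: "in_progress", 104: "aborted"}
remote_prepared = {"fx_100_s1", "fx_101_s1", "fx_102_s2"}
issued = []
fdw = make_fdw_routine(remote_prepared, unreachable={"s2"}, issued=issued)
left = resolve_all(entries, local.get, fdw)
```

Entry 104 stands for the crash case in which the id was registered but never prepared remotely; entry 102 stays in the table because its server cannot be reached.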

C. Other methods and infrastructure
------------------------------------------------

1. Method to show the current foreign transaction entries (in progress or waiting to be resolved). Implemented as function pg_fdw_xact() in the patch.

2. Method to drop foreign transaction entries in case they are resolved by user/DBA themselves. Not implemented in the patch.

3. Method to prevent altering or dropping foreign server and user mapping used to prepare the foreign transaction till the latter gets resolved. Not implemented in the patch. While altering or dropping the foreign server or user mapping, that portion of the code needs to check if there exists a foreign transaction entry depending upon the foreign server or user mapping and should error out.

4. The information about the xid needs to be available till it is decided whether to commit or abort the foreign transaction and that decision is persisted. That should put some constraint on the xid wraparound or oldest active transaction. Not implemented in the patch.

5. Method to propagate the foreign transaction information to the slave.

D. Persistent and in-memory storage considerations
--------------------------------------------------------------------

I considered the following options for persistent storage:

1. in-memory table and file(s) - The foreign transaction entries are saved and manipulated in shared memory. They are written to file whenever persistence is necessary e.g. while registering the foreign transaction in step A.2. Requirements C.1, C.2 need some SQL interface in the form of built-in functions or SQL commands.

The patch implements the in-memory foreign transaction table as a fixed size array of foreign transaction entries (similar to prepared transaction entries in twophase.c). This puts a restriction on the number of foreign prepared transactions that can be maintained at a time. We need separate locks to synchronize the access to the shared memory; the patch uses only a single LW lock. There is a restriction on the length of the prepared transaction id (or, to be general, the prepared transaction information saved by the FDW), since everything is being saved in fixed size memory. We may be able to overcome that restriction by writing this information to separate files (one file per foreign prepared transaction). We need to take the same route as 2PC for C.5.

2. New catalog - This method removes the need for separate methods for C.1, C.5 and even C.2. Synchronization will be taken care of by row locks, and there will be no limit on the number of foreign transactions or on the size of foreign prepared transaction information. But the big problem with this approach is that changes to the catalogs are atomic with the local transaction. If a foreign prepared transaction cannot be aborted while the local transaction is rolled back, that entry needs to be retained. But since the local transaction is aborting, the corresponding catalog entry would become invisible and thus unavailable to the resolver (alas! we do not have autonomous transaction support). We may be able to overcome this by simulating an autonomous transaction through a background worker (which can also act as a resolver). But the amount of communication and synchronization might affect the performance.

A mixed approach where the backend shifts the entries from storage in approach 1 to the catalog, thus lifting the constraints on size, is possible but very complicated.

Any other ideas to use a catalog table as the persistent storage here? Does anybody think a catalog table is a viable option?

3. WAL records - Since the algorithm follows "write ahead of action", WAL seems to be a possible way to persist the foreign transaction entries. But WAL records cannot be scanned repeatedly, as the foreign transaction resolver requires. Also, replaying WAL is controlled by checkpoints, so not all WAL is replayed. If a checkpoint happens while a foreign prepared transaction remains unresolved, the corresponding WAL will never be replayed, thus causing the foreign prepared transaction to remain unresolved forever without a clue. So, WAL alone doesn't seem to be a fit here.
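
To illustrate the trade-offs of option 1, here is a minimal Python sketch of a fixed size entry table guarded by a single lock. It is only an analogy for the shared memory array and LW lock in the patch: `ForeignXactTable`, its method names and the default length limit of 200 bytes are hypothetical choices made for this example.

```python
import threading


class ForeignXactTable:
    """Fixed number of slots, one lock for every access (as in option 1)."""

    def __init__(self, max_entries, max_gid_len=200):
        self._slots = [None] * max_entries
        self._lock = threading.Lock()
        self._max_gid_len = max_gid_len  # arbitrary limit for this sketch

    def insert(self, xid, server_id, gid):
        if len(gid) > self._max_gid_len:
            raise ValueError("prepared transaction id too long")
        with self._lock:
            for i, slot in enumerate(self._slots):
                if slot is None:
                    self._slots[i] = (xid, server_id, gid, "PREPARING")
                    return i
        raise RuntimeError("no free foreign transaction slot")

    def remove(self, index):
        with self._lock:
            self._slots[index] = None

    def in_use(self):
        with self._lock:
            return [s for s in self._slots if s is not None]


table = ForeignXactTable(max_entries=2)
first = table.insert(100, 1, b"fx_100_s1")
second = table.insert(100, 2, b"fx_100_s2")
try:
    table.insert(101, 1, b"fx_101_s1")
    overflowed = False
except RuntimeError:
    overflowed = True  # the table is full until an entry is resolved
table.remove(first)
reused = table.insert(101, 1, b"fx_101_s1")
```

Both restrictions mentioned under option 1 show up directly: a third unresolved entry is refused, and an over-long id is rejected before a slot is even taken.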

The algorithm relies on the FDWs to take the right steps to a large extent, rather than controlling each step explicitly. It expects the FDWs to take the right steps for each event and call the right functions to manipulate foreign transaction entries. It does not ensure the correctness of these steps by, say, examining the foreign transaction entries in response to each event or by making the callback return the information and manipulating the entries within the core. I am willing to go the stricter but more intrusive route if the others also think that way. Otherwise, the current approach is less intrusive and I am fine with that too.

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

pg_fdw_transact.patch

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

26 February 2015, 09:22:00

Added to 2015-06 commitfest to attract some reviews and comments.

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Heikki Linnakangas

Date:

07 July 2015, 09:26:03

On 02/17/2015 11:26 AM, Ashutosh Bapat wrote:
> Hi All,
>
> Here are the steps and infrastructure for achieving atomic commits across
> multiple foreign servers. I have tried to address most of the concerns
> raised in this mail thread before. Let me know, if I have left something.
> Attached is a WIP patch implementing the same for postgres_fdw. I have
> tried to make it FDW-independent.

Wow, this is going to be a lot of new infrastructure. This is going to 
need good documentation, explaining how two-phase commit works in 
general, how it's implemented, how to monitor it etc. It's important to 
explain all the possible failure scenarios where you're left with 
in-doubt transactions, and how the DBA can resolve them.

Since we're building a Transaction Manager into PostgreSQL, please put a 
lot of thought on what kind of APIs it provides to the rest of the 
system. APIs for monitoring it, configuring it, etc. And how an 
extension could participate in a transaction, without necessarily being 
an FDW.

Regarding the configuration, there are many different behaviours that an 
FDW could implement:

1. The FDW is read-only. Commit/abort behaviour is moot.
2. Transactions are not supported. All updates happen immediately 
regardless of the local transaction.
3. Transactions are supported, but two-phase commit is not. There are 
three different ways we can use the remote transactions in that case:
3.1. Commit the remote transaction before local transaction.
3.2. Commit the remote transaction after local transaction.
3.3. As long as there is only one such FDW involved, we can still do 
safe two-phase commit using so-called Last Resource Optimization.
4. Full two-phase commit support

We don't necessarily have to support all of that, but let's keep all 
these cases in mind when we design how to configure FDWs. There's 
more to it than "does it support 2PC".
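
As a rough illustration of why a single boolean is not enough, the cases above can be written down as a small Python sketch. The names `Capability` and `atomic_commit_plan` and the returned strings are hypothetical; the sketch assumes the list holds the foreign servers taking part in one transaction and leaves the choice between 3.1 and 3.2 aside.

```python
from enum import Enum


class Capability(Enum):
    READ_ONLY = 1        # case 1: commit/abort behaviour is moot
    NO_TRANSACTIONS = 2  # case 2: updates happen immediately
    ONE_PHASE = 3        # case 3: transactions, but no two-phase commit
    TWO_PHASE = 4        # case 4: full two-phase commit support


def atomic_commit_plan(capabilities):
    """Say whether the given foreign servers can be committed atomically."""
    writers = [c for c in capabilities if c is not Capability.READ_ONLY]
    if not writers:
        return "nothing to coordinate"
    if any(c is Capability.NO_TRANSACTIONS for c in writers):
        return "not atomic"
    one_phase = sum(1 for c in writers if c is Capability.ONE_PHASE)
    if one_phase == 0:
        return "two-phase commit"
    if one_phase == 1:
        # case 3.3: the single one-phase resource commits last
        return "two-phase commit with last resource optimization"
    return "not atomic"


plans = {
    "all 2PC": atomic_commit_plan([Capability.TWO_PHASE, Capability.TWO_PHASE]),
    "one 1PC": atomic_commit_plan([Capability.TWO_PHASE, Capability.ONE_PHASE]),
    "two 1PC": atomic_commit_plan([Capability.ONE_PHASE, Capability.ONE_PHASE]),
    "readers": atomic_commit_plan([Capability.READ_ONLY]),
}
```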

> A. Steps during transaction processing
> ------------------------------------------------
>
> 1. When an FDW connects to a foreign server and starts a transaction, it
> registers that server with a boolean flag indicating whether that server is
> capable of participating in a two phase commit. In the patch this is
> implemented using function RegisterXactForeignServer(), which raises an
> error, thus aborting the transaction, if there is at least one foreign
> server incapable of 2PC in a multiserver transaction. This error is thrown as
> early as possible. If all the foreign servers involved in the transaction
> are capable of 2PC, the function just updates the information. As of now,
> in the patch the function is in the form of a stub.
>
> Whether a foreign server is capable of 2PC, can be
> a. FDW level decision e.g. file_fdw as of now, is incapable of 2PC but it
> can build the capabilities which can be used for all the servers using
> file_fdw
> b. a decision based on server version type etc. thus FDW can decide that by
> looking at the server properties for each server
> c. a user decision where the FDW can allow a user to specify it in the form
> of CREATE/ALTER SERVER option. Implemented in the patch.
>
> For a transaction involving only a single foreign server, the current code
> remains unaltered as two phase commit is not needed.

Just to be clear: you also need two-phase commit if the transaction 
updated anything in the local server and in even one foreign server.
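
Put as a tiny Python sketch (with the hypothetical name `needs_atomic_commit`), the condition counts the local server as a participant whenever it wrote something:

```python
def needs_atomic_commit(local_wrote, foreign_writers):
    """True when two or more participants have changes to commit."""
    participants = foreign_writers + (1 if local_wrote else 0)
    return participants >= 2


cases = {
    "local only": needs_atomic_commit(True, 0),
    "one foreign only": needs_atomic_commit(False, 1),
    "local + one foreign": needs_atomic_commit(True, 1),
    "two foreign": needs_atomic_commit(False, 2),
}
```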

> D. Persistent and in-memory storage considerations
> --------------------------------------------------------------------
> I considered following options for persistent storage
> 1. in-memory table and file(s) - The foreign transaction entries are saved
> and manipulated in shared memory. They are written to file whenever
> persistence is necessary e.g. while registering the foreign transaction in
> step A.2. Requirements C.1, C.2 need some SQL interface in the form of
> built-in functions or SQL commands.
>
> The patch implements the in-memory foreign transaction table as a fixed
> size array of foreign transaction entries (similar to prepared transaction
> entries in twophase.c). This puts a restriction on number of foreign
> prepared transactions that need to be maintained at a time. We need
> separate locks to synchronize the access to the shared memory; the patch
> uses only a single LW lock. There is restriction on the length of prepared
> transaction id (or prepared transaction information saved by FDW to be
> general), since everything is being saved in fixed size memory. We may be
> able to overcome that restriction by writing this information to separate
> files (one file per foreign prepared transaction). We need to take the same
> route as 2PC for C.5.

Your current approach with a file that's flushed to disk on every update 
has a few problems. Firstly, it's not crash safe. Secondly, if you make 
it crash-safe with fsync(), performance will suffer. You're going to 
need several fsyncs per commit with 2PC anyway, there's no way 
around that, but the scalable way to do that is to use the WAL so that 
one fsync() can flush more than one update in one operation.

So I think you'll need to do something similar to the pg_twophase files. 
WAL-log each update, and only flush the file/files to disk on a 
checkpoint. Perhaps you could use the pg_twophase infrastructure for 
this directly, by essentially treating every local transaction as a 
two-phase transaction, with some extra flag to indicate that it's an 
internally-created one.
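
The point above about using the WAL so that one fsync() covers several updates can be illustrated with a minimal Python sketch. This is not PostgreSQL code: `ToyWal`, `run` and the record contents are hypothetical names chosen for illustration. It only counts fsync() calls when each record is flushed individually versus when records are appended to a shared log and flushed once.

```python
import os
import tempfile


class ToyWal:
    """Toy append-only log that counts how many fsync() calls it issues."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.fsync_calls = 0
        self.pending = 0  # records written but not yet flushed

    def append(self, record: bytes):
        # write() only hands the data to the OS; it is not yet durable.
        os.write(self.fd, record + b"\n")
        self.pending += 1

    def flush(self):
        # A single fsync() makes every pending record durable at once.
        if self.pending:
            os.fsync(self.fd)
            self.fsync_calls += 1
            self.pending = 0

    def close(self):
        os.close(self.fd)


def run(per_record_flush: bool, nrecords: int = 5) -> int:
    """Write nrecords toy records and return the number of fsync() calls."""
    with tempfile.TemporaryDirectory() as tmpdir:
        wal = ToyWal(os.path.join(tmpdir, "toy_wal"))
        for i in range(nrecords):
            wal.append(f"fdw_xact insert {i}".encode())
            if per_record_flush:
                wal.flush()
        wal.flush()  # no-op if nothing is pending
        wal.close()
        return wal.fsync_calls
```

Calling `run(True)` flushes after every record, while `run(False)` defers to one flush at the end; the difference in fsync() count is the cost the batching avoids.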

> 2. New catalog - This method takes out the need to have separate method for
> C1, C5 and even C2, also the synchronization will be taken care of by row
> locks, there will be no limit on the number of foreign transactions as well
> as the size of foreign prepared transaction information. But big problem
> with this approach is that, the changes to the catalogs are atomic with the
> local transaction. If a foreign prepared transaction can not be aborted
> while the local transaction is rolled back, that entry needs to be retained.
> But since the local transaction is aborting the corresponding catalog entry
> would become invisible and thus unavailable to the resolver (alas! we do
> not have autonomous transaction support). We may be able to overcome this,
> by simulating autonomous transaction through a background worker (which can
> also act as a resolver). But the amount of communication and
> synchronization, might affect the performance.

Or you could insert/update the rows in the catalog with xmin=FrozenXid, 
ignoring MVCC. Not sure how well that would work.

> 3. WAL records - Since the algorithm follows "write ahead of action", WAL
> seems to be a possible way to persist the foreign transaction entries. But
> WAL records can not be used for repeated scan as is required by the foreign
> transaction resolver. Also, replaying WALs is controlled by checkpoint, so
> not all WALs are replayed. If a checkpoint happens after a foreign prepared
> transaction remains unresolved, corresponding WALs will never be replayed,
> thus causing the foreign prepared transaction to remain unresolved forever
> without a clue. So, WALs alone don't seem to be a fit here.

Right. The pg_twophase files solve that exact same issue.

There is clearly a lot of work to do here. I'm marking this as Returned 
with Feedback in the commitfest, I don't think more review is going to 
be helpful at this point.

- Heikki

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

09 July 2015, 10:18:36

Hi All,

I have been working on improving the previous implementation and addressing TODOs in my previous mail. Let me explain the approach first and I will get to Heikki's comments later in the same mail.

The patch provides support for atomic commit for transactions involving foreign servers. When a transaction makes changes to foreign servers, either the changes to all the foreign servers commit or they all roll back. We should not see some changes committed and others rolled back.

Hooks and GUCs
==============

The patch introduces a GUC atomic_foreign_transaction, which when ON ensures atomic commit for foreign transactions, otherwise not. The value of this GUC at the time of committing or preparing a local transaction is used. This gives applications the flexibility to choose the behaviour as late in the transaction as possible. This GUC has no effect if there are no foreign servers involved in the transaction.

Another GUC max_fdw_transactions sets the maximum number of transactions that can be simultaneously prepared on all the foreign servers. This limits the memory required for remembering the prepared foreign transactions.

Two new FDW hooks are introduced for transaction management.

1. GetPrepareId: to get the prepared transaction identifier for a given foreign server connection. An FDW which doesn't want to support this feature can keep this hook undefined (NULL). When defined the hook should return a unique identifier for the transaction prepared on the foreign server. The identifier should be unique enough not to conflict with currently prepared or future transactions. This point will be clear when discussing phase 2 of 2PC.

2. HandleForeignTransaction: to end a transaction in specified way. The hook should be able to prepare/commit/rollback current running transaction on given connection or commit/rollback a previously prepared transaction. This is described in detail while describing phase two of two-phase commit. The function is required to return a boolean status of whether the requested operation was successful or not. The function or its minions should not raise any error on failure so as not to interfere with the distributed transaction processing. This point will be clarified more in the description below.
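
As a rough illustration of the contract described for the two hooks, here is a minimal Python sketch. It is not the patch's C API: `Action`, `FdwTransactionHooks`, `ToyFdw` and the identifier format are all hypothetical. The points it tries to capture are that the transaction-ending hook reports failure through its boolean return value instead of raising an error, and that resolving an already-resolved prepared transaction counts as success.

```python
from abc import ABC, abstractmethod
from enum import Enum


class Action(Enum):
    PREPARE = "prepare"
    COMMIT = "commit"                        # one-phase, running transaction
    ROLLBACK = "rollback"                    # one-phase, running transaction
    COMMIT_PREPARED = "commit prepared"
    ROLLBACK_PREPARED = "rollback prepared"


class FdwTransactionHooks(ABC):
    """Hypothetical Python rendering of the two hooks described above."""

    @abstractmethod
    def get_prepare_id(self, server_oid, user_oid, local_xid):
        """Return an identifier for the transaction to be prepared."""

    @abstractmethod
    def handle_foreign_transaction(self, server_oid, user_oid, action, prep_id=None):
        """End a transaction as requested; return True or False, never raise."""


class ToyFdw(FdwTransactionHooks):
    def __init__(self):
        self.prepared = set()  # identifiers currently prepared "remotely"

    def get_prepare_id(self, server_oid, user_oid, local_xid):
        # Assumed format, for illustration only.
        return f"px_{local_xid}_{server_oid}_{user_oid}"

    def handle_foreign_transaction(self, server_oid, user_oid, action, prep_id=None):
        try:
            if action is Action.PREPARE:
                if prep_id in self.prepared:
                    raise ValueError("identifier already in use")
                self.prepared.add(prep_id)
            elif action in (Action.COMMIT_PREPARED, Action.ROLLBACK_PREPARED):
                # A missing identifier means it was already resolved.
                self.prepared.discard(prep_id)
            return True
        except Exception:
            # Failures are reported through the return value only.
            return False
```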

Achieving atomic commit
===================

If atomic_foreign_transaction is enabled, two-phase commit protocol is used to achieve atomic commit for transactions involving foreign servers. All the foreign servers participating in such a transaction should be capable of participating in two-phase commit protocol. If not, the local and foreign transactions are aborted as atomic commit can not be guaranteed.

Phase 1
-----------

Every FDW needs to register the connection while starting a new transaction on a foreign connection (RegisterXactForeignServer()). A foreign server connection is identified by the foreign server oid and the local user oid (similar to the entry cached by postgres_fdw). While registering, the FDW also tells whether the foreign server is capable of participating in two-phase commit protocol. How to decide that is left entirely to the FDW. An FDW like file_fdw may not have 2PC support at all, so none of its foreign servers comply with 2PC. An FDW might have all its servers 2PC compliant. An FDW like postgres_fdw can have some of its servers compliant and some not, depending upon server version, configuration (max_prepared_transactions = 0) etc. An FDW can decide not to register its connections at all, in which case the foreign servers belonging to that FDW will not be considered by the core.
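
The registration and capability check can be sketched as follows. This is a Python stand-in under stated assumptions, not the patch's code: `ForeignXactRegistry` and its methods are hypothetical, and the real RegisterXactForeignServer() is a C function whose signature is not shown in this mail.

```python
class ForeignXactRegistry:
    """Toy stand-in for the connections registered during one local transaction."""

    def __init__(self):
        # (server_oid, user_oid) -> whether the server can do two-phase commit
        self.connections = {}

    def register(self, server_oid, user_oid, two_phase_capable):
        # The FDW decides the capability flag; the core only records it.
        self.connections[(server_oid, user_oid)] = two_phase_capable

    def incapable_servers(self):
        return sorted(key for key, capable in self.connections.items() if not capable)

    def can_commit(self, atomic_foreign_transaction):
        """With the GUC on, every registered server must support 2PC."""
        if not atomic_foreign_transaction:
            return True
        return not self.incapable_servers()
```

An FDW that never calls `register` simply does not appear in `connections`, which mirrors the statement that unregistered servers are not considered at all.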

During pre-commit processing following steps are executed
1. GetPrepareId hook is called on each of the connections registered to get the identifier that will be used to prepare the transaction.
2. For each connection the prepared transaction id along with the connection information, database id and local transaction id (xid) is recorded in the memory.

3. This is logged in XLOG. If standby is configured, it is replayed on standby. In case of master failover a standby is able to resolve in-doubt prepared transactions created by the master.

4. The information is written to an on-disk file in the pg_fdw_xact/ directory. This directory contains one file per prepared transaction on a foreign connection. The file is fsynced during checkpoint, similar to pg_twophase files. File management in this directory is similar to the way files are managed in pg_twophase.

5. HandleForeignTransaction is called to prepare the transaction on given connection with the identifier provided by GetPrepareId().

If the server crashes after step 5, we will remember the transaction prepared on the foreign server and will try to abort it after recovery. If it crashes after step 3 but before step 5 completes, we will remember a transaction that may never have been prepared and try to resolve it later. This scenario will be described while describing phase 2.

If any of the steps fails, including the PREPARE on the foreign server itself, the local transaction will be aborted. All the prepared transactions on foreign servers will be aborted as described in the phase 2 discussion below. Transactions yet to be prepared are rolled back using the same hook. If step 5 fails, the prepared foreign transaction entry is removed from memory and disk following steps 2, 3 and 4 in phase 2. HandleForeignTransaction throwing an error will interfere with this, so it is not expected to throw an error.

If the transactions are prepared on all the foreign servers successfully, we enter phase 2 of 2PC.
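
The phase 1 steps and the failure handling can be sketched as a small simulation. This is Python under stated assumptions, not the patch: `ToyServer`, `phase_one`, the identifier format and the list used as a stand-in for XLOG are hypothetical, and step 4 (the on-disk file) and the rollback of not-yet-prepared transactions are left out for brevity.

```python
class ToyServer:
    """Stand-in for a foreign server; fail_prepare simulates a PREPARE failure."""

    def __init__(self, oid, fail_prepare=False):
        self.oid = oid
        self.fail_prepare = fail_prepare
        self.prepared = set()

    def prepare(self, prep_id):
        if self.fail_prepare:
            return False
        self.prepared.add(prep_id)
        return True

    def rollback_prepared(self, prep_id):
        self.prepared.discard(prep_id)
        return True


def phase_one(xid, servers, fdw_xacts, wal):
    """Return True if every server prepared; otherwise undo and return False."""
    done = []
    for srv in servers:
        prep_id = f"px_{xid}_{srv.oid}"                 # step 1 (assumed format)
        fdw_xacts[(xid, srv.oid)] = prep_id             # step 2: in-memory entry
        wal.append(("insert", xid, srv.oid, prep_id))   # step 3: log before acting
        if srv.prepare(prep_id):                        # step 5
            done.append(srv)
            continue
        # PREPARE failed: forget this entry, then abort what was already prepared.
        del fdw_xacts[(xid, srv.oid)]
        wal.append(("remove", xid, srv.oid))
        for prev in done:
            if prev.rollback_prepared(fdw_xacts[(xid, prev.oid)]):
                del fdw_xacts[(xid, prev.oid)]
                wal.append(("remove", xid, prev.oid))
        return False
    return True
```

Note that the entry is recorded and logged before the PREPARE is sent, so a crash between the two leaves an entry for a transaction that may not exist remotely, which is the scenario phase 2 has to tolerate.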

The local transaction is not required to be prepared per se.

Phase 2
-----------

After the local transaction has committed or aborted, the foreign transactions prepared as part of the local transaction are committed or aborted resp. Committing or aborting a prepared foreign transaction is henceforth termed "resolving" for simplicity. The following steps are executed while resolving a foreign prepared transaction.

1. Resolve the foreign prepared transaction on corresponding foreign server using user mapping of local user used at the time of preparing the transaction. This is done through hook HandleForeignTransaction().

2. If the resolution is successful, remove the prepared foreign transaction entry from memory.

3. Log about removal of entry in XLOG. When this log is replayed during recovery or in standby mode, it executes step 4 below.

4. Remove the corresponding file from pg_fdw_xact directory.

If the resolution is unsuccessful, leave the entry untouched. Since this phase is carried out when no transaction exists, HandleForeignTransaction should not throw an error and should be designed not to access the database while performing this operation.

In case the server crashes after step 1 and before step 3, a resolved foreign transaction will be considered unresolved when the local server recovers or a standby takes over as master. It will try to resolve the prepared transaction again and should get an error from the foreign server. The HandleForeignTransaction hook should treat this as normal and return true since the prepared transaction is resolved (or rather there is nothing that can be done). For such cases it is important that GetPrepareId returns a transaction identifier which does not conflict with a future transaction id, lest we resolve (maybe with the wrong outcome) a prepared transaction which shouldn't be resolved.

Any crash or connection failure in phase 2 leaves the prepared transaction in unresolved state.
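
The resolution steps, including the retry after a crash between steps 1 and 3, can be sketched like this. It is a Python simulation with hypothetical names (`ToyServer`, `PreparedXactMissing`, `handle_resolve`, `resolve`); the WAL is a plain list and the pg_fdw_xact files are a set of keys.

```python
class PreparedXactMissing(Exception):
    """Raised by the toy server when the prepared transaction does not exist."""


class ToyServer:
    def __init__(self, prepared=(), reachable=True):
        self.prepared = set(prepared)
        self.reachable = reachable

    def finish_prepared(self, prep_id, commit):
        # 'commit' selects COMMIT PREPARED vs ROLLBACK PREPARED; the toy only
        # tracks whether the identifier is still prepared.
        if not self.reachable:
            raise ConnectionError("server unreachable")
        if prep_id not in self.prepared:
            raise PreparedXactMissing(prep_id)
        self.prepared.remove(prep_id)


def handle_resolve(server, prep_id, commit):
    """Hook stand-in: returns a bool and never raises."""
    try:
        server.finish_prepared(prep_id, commit)
        return True
    except PreparedXactMissing:
        # Already resolved by an earlier attempt; nothing more can be done.
        return True
    except ConnectionError:
        return False


def resolve(key, commit, server, fdw_xacts, wal, files):
    prep_id = fdw_xacts[key]
    if not handle_resolve(server, prep_id, commit):   # step 1
        return False                                  # leave the entry untouched
    del fdw_xacts[key]                                # step 2: drop in-memory entry
    wal.append(("remove", key))                       # step 3: log the removal
    files.discard(key)                                # step 4: remove the file
    return True
```

Treating "no such prepared transaction" as success is only safe if identifiers are never reused, which is why the uniqueness requirement on GetPrepareId matters.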

Resolving unresolved foreign transactions
================================

A local/foreign server crash or connection failure after a transaction is prepared on the foreign server leaves that transaction in unresolved state. The patch provides a built-in function pg_fdw_resolve() to resolve those after recovering from the failure. This built-in scans all the prepared transactions in-memory and decides the fate (commit/rollback) based on the fate of local transaction that prepared it on the foreign server. It does not touch entries corresponding to the in-progress local transactions. It then executes the same steps as phase 2 to resolve the prepared foreign transactions. Since foreign server information is contained within a database, the function only touches the entries corresponding to the database from which it is invoked. A user can configure a daemon or cron-job to execute this function frequently from various databases. Alternatively, user can use contrib module pg_fdw_xact_resolver which does the same using background worker mechanism. This module needs to be installed and listed in shared_preload_libraries to start the daemon automatically on the startup.
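
The decision rule described for pg_fdw_resolve() can be sketched in a few lines of Python. The entry layout, the status strings and the function name `pick_actions` are assumptions made for illustration; the real function works on the shared-memory entries and the local transaction status.

```python
def pick_actions(entries, xid_status, current_db):
    """Decide what to do with each prepared foreign transaction entry.

    entries     list of dicts with keys 'xid', 'dbid', 'prep_id' (assumed layout)
    xid_status  maps a local xid to 'committed', 'aborted' or 'in_progress'
    current_db  only entries belonging to this database are touched
    """
    actions = []
    for entry in entries:
        if entry["dbid"] != current_db:
            continue  # foreign server definitions live in another database
        status = xid_status[entry["xid"]]
        if status == "in_progress":
            continue  # the local transaction will resolve it itself
        verb = "COMMIT PREPARED" if status == "committed" else "ROLLBACK PREPARED"
        actions.append((entry["prep_id"], verb))
    return actions
```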

A foreign server or user mapping corresponding to an unresolved foreign transaction is not allowed to be dropped or altered until the foreign transaction is resolved. This is required to retain the connection properties which are needed to resolve the prepared transaction on the foreign server.

Crash recovery
============

During crash recovery, the files in pg_fdw_xact/ are created or removed when the corresponding WAL records are replayed. After the redo is done, the pg_fdw_xact directory is scanned for unresolved foreign prepared transactions. The files in this directory are named after the triplet (xid, foreign server oid, user oid) to create a unique name for each file. This scan also emits the oldest transaction id with an unresolved prepared foreign transaction. This affects the oldest active transaction id, since the status of this transaction id is required to decide the fate of the unresolved prepared foreign transaction.
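
A sketch of the file naming and the post-redo scan, in Python. The exact name format used by the patch is not given in this mail, so the three zero-padded hexadecimal fields joined by underscores are an assumption, as are the function names. The scan uses a plain minimum and so ignores xid wraparound, which real code would have to handle.

```python
def fdw_xact_filename(xid, server_oid, user_oid):
    """Build a unique file name from the (xid, server oid, user oid) triplet."""
    return f"{xid:08X}_{server_oid:08X}_{user_oid:08X}"


def parse_fdw_xact_filename(name):
    """Inverse of fdw_xact_filename()."""
    xid, server_oid, user_oid = (int(part, 16) for part in name.split("_"))
    return xid, server_oid, user_oid


def oldest_unresolved_xid(filenames):
    """Return the smallest xid among the files, or None if there are none."""
    xids = [parse_fdw_xact_filename(name)[0] for name in filenames]
    return min(xids) if xids else None
```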

On standby, during WAL replay, files are just created or removed. If the standby is required to finish recovery and take over as master, pg_fdw_xact is scanned to read unresolved foreign prepared transactions into the shared memory.

Preparing transaction involving foreign server/s, on local server
=================================================

While PREPARing a local transaction that involves foreign servers, the transactions are prepared on the foreign server (as described in phase 1 above), if atomic_foreign_transaction is enabled. If the GUC is disabled, such local transactions can not be prepared (as of this patch at least). This also means that all the foreign servers participating in the transaction to be prepared are required to support 2PC. While committing/rolling back the prepared transaction the corresponding foreign prepared transactions are committed or rolled back (as described in phase 2) resp. Any unresolved foreign transactions are resolved the same way as above.

View for checking the current foreign prepared transactions
=============================================

A built-in function pg_fdw_xact() lists all the currently prepared foreign transactions. This function does not list anything on standby while it's replaying WAL, since it doesn't have any entries in memory. A convenient view pg_fdw_xacts lists the same with the oids converted to names.

Handling non-atomic foreign transactions
===============================

When atomic_foreign_transaction is disabled, one-phase commit protocol is used to commit/rollback the foreign transactions. After the local transaction has committed/aborted, all the foreign transactions on the registered foreign connections are committed or aborted resp. using the hook HandleForeignTransaction. Failing to commit a foreign transaction does not affect the other foreign transactions; an attempt is still made to commit them (if the local transaction commits).
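
The one-phase behaviour can be sketched as a loop that keeps going after a failure. This is Python with hypothetical names: `finish_one_phase` and the callables standing in for the hook are for illustration only.

```python
def finish_one_phase(connections, local_committed):
    """Commit or roll back each foreign transaction after the local outcome.

    connections      list of (name, end_transaction) pairs, where
                     end_transaction(commit) returns True on success
    local_committed  outcome of the local transaction
    Returns the names of the connections whose transaction could not be ended.
    """
    failed = []
    for name, end_transaction in connections:
        # A failure on one connection must not stop the remaining ones.
        if not end_transaction(local_committed):
            failed.append(name)
    return failed
```

Because each callable reports failure by returning False, the loop reaches every connection; if one of them raised an error instead, the later connections would never be attempted.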

PITR
====

PITR may rewind the database to a point before an xid associated with an unresolved foreign transaction. There are two approaches to deal with the situation.

1. Just forget about the unresolved foreign transaction and remove the file just like we do for a prepared local transaction. But then the prepared transaction on the foreign server might be left unresolved forever and will keep holding the resources.

2. Do not allow PITR to such point. We can not get rid of the transaction id without getting rid of prepared foreign transaction. If we do so, we might create conflicting files in future and might resolve the transaction with wrong outcome.

Rest of the mail contains replies to Heikki's comments.

On Tue, Jul 7, 2015 at 2:55 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 02/17/2015 11:26 AM, Ashutosh Bapat wrote:
Hi All,

Here are the steps and infrastructure for achieving atomic commits across
multiple foreign servers. I have tried to address most of the concerns
raised in this mail thread before. Let me know, if I have left something.
Attached is a WIP patch implementing the same for postgres_fdw. I have
tried to make it FDW-independent.

Wow, this is going to be a lot of new infrastructure. This is going to need good documentation, explaining how two-phase commit works in general, how it's implemented, how to monitor it etc. It's important to explain all the possible failure scenarios where you're left with in-doubt transactions, and how the DBA can resolve them.

I have included some documentation in the patch. Once we agree on the functionality, design, I will improve the documentation further.

Since we're building a Transaction Manager into PostgreSQL, please put a lot of thought on what kind of APIs it provides to the rest of the system. APIs for monitoring it, configuring it, etc. And how an extension could participate in a transaction, without necessarily being an FDW.

The patch has added all of it except extension thing. Let me know if anything is missing.

Even today an extension can participate in a transaction by registering transaction and subtransaction callbacks. So, as long as an extension (and so some FDW) does things such that the failures in those do not affect the atomicity, they can use these callbacks. However, these callbacks are not enough to handle unresolved prepared transactions or handle connectivity failures in phase 2. The patch adds infrastructure to do that.

dblink might be something on your mind, but to support dblink here, it will need too liberal format for storing information about the prepared transactions on other servers. This format will vary from extension to extension, and may not be very useful as above. What we might be able to do is expose the functions for creating files for prepared transactions and logging about them and let extension use them. BTW, dblink_plus already supports 2PC for dblink.

Regarding the configuration, there are many different behaviours that an FDW could implement:

1. The FDW is read-only. Commit/abort behaviour is moot.

I can think of two flavours of read-only FDW: 1. the underlying data is read-only 2. the FDW is read-only but the underlying data is not.

In first case, the FDW may choose not to participate in the transaction management at all, so doesn't register the foreign connections. Still the rest of the transaction will be atomic.

In second case however, the writes to other foreign server may depend upon what has been read from the read-only FDW esp. in repeatable read and higher isolation levels. So it's important that the data once read remains intact till the transaction commits or at least is prepared, implying we have to start a transaction on the read-only foreign server. Once the other foreign transactions get prepared, we might be able to commit the transaction on read-only foreign server. That optimization is not yet implemented by my patch. But it should be possible to do in the approach taken by the patch. Can we leave that as a future enhancement?

Does that solve your concern?

2. Transactions are not supported. All updates happen immediately regardless of the local transaction.

An FDW can choose not to register its server and local PostgreSQL won't know about it. Is that acceptable behaviour?

3. Transactions are supported, but two-phase commit is not. There are three different ways we can use the remote transactions in that case:

This case is supported by using the GUC atomic_foreign_transaction. The patch implements approach 3.2.

3.1. Commit the remote transaction before local transaction.
3.2. Commit the remote transaction after local transaction.
3.3. As long as there is only one such FDW involved, we can still do safe two-phase commit using so-called Last Resource Optimization.

IIUC LRO, the patch uses the local transaction as last resource, which is always present. The fate of the foreign transaction is decided by the fate of the local transaction, which is not required to be prepared per se. There is a more relevant note later.

4. Full two-phase commit support

We don't necessarily have to support all of that, but let's keep all these cases in mind when we design how to configure FDWs. There's more to it than "does it support 2PC".

A. Steps during transaction processing
------------------------------------------------

1. When an FDW connects to a foreign server and starts a transaction, it
registers that server with a boolean flag indicating whether that server is
capable of participating in a two phase commit. In the patch this is
implemented using function RegisterXactForeignServer(), which raises an
error, thus aborting the transaction, if there is at least one foreign
server incapable of 2PC in a multiserver transaction. This error is thrown as
early as possible. If all the foreign servers involved in the transaction
are capable of 2PC, the function just updates the information. As of now,
in the patch the function is in the form of a stub.

Whether a foreign server is capable of 2PC, can be
a. FDW level decision e.g. file_fdw as of now, is incapable of 2PC but it
can build the capabilities which can be used for all the servers using
file_fdw
b. a decision based on server version type etc. thus FDW can decide that by
looking at the server properties for each server
c. a user decision where the FDW can allow a user to specify it in the form
of CREATE/ALTER SERVER option. Implemented in the patch.

For a transaction involving only a single foreign server, the current code
remains unaltered as two phase commit is not needed.

Just to be clear: you also need two-phase commit if the transaction updated anything in the local server and in even one foreign server.

Any local transaction involving a foreign server transaction uses two-phase commit for the foreign transaction. The local transaction is not prepared per se. However, we should be able to optimize the case when there are no local changes. I am not able to find a way to deduce that there was no local change, so I have left that case out of this patch. Is there a way to know whether a local transaction changed something locally or not?

D. Persistent and in-memory storage considerations
--------------------------------------------------------------------
I considered following options for persistent storage
1. in-memory table and file(s) - The foreign transaction entries are saved
and manipulated in shared memory. They are written to file whenever
persistence is necessary e.g. while registering the foreign transaction in
step A.2. Requirements C.1, C.2 need some SQL interface in the form of
built-in functions or SQL commands.

The patch implements the in-memory foreign transaction table as a fixed
size array of foreign transaction entries (similar to prepared transaction
entries in twophase.c). This puts a restriction on number of foreign
prepared transactions that need to be maintained at a time. We need
separate locks to synchronize the access to the shared memory; the patch
uses only a single LW lock. There is restriction on the length of prepared
transaction id (or prepared transaction information saved by FDW to be
general), since everything is being saved in fixed size memory. We may be
able to overcome that restriction by writing this information to separate
files (one file per foreign prepared transaction). We need to take the same
route as 2PC for C.5.

Your current approach with a file that's flushed to disk on every update has a few problems. Firstly, it's not crash safe. Secondly, if you make it crash-safe with fsync(), performance will suffer. You're going to need several fsyncs per commit with 2PC anyway, there's no way around that, but the scalable way to do that is to use the WAL so that one fsync() can flush more than one update in one operation.

So I think you'll need to do something similar to the pg_twophase files. WAL-log each update, and only flush the file/files to disk on a checkpoint. Perhaps you could use the pg_twophase infrastructure for this directly, by essentially treating every local transaction as a two-phase transaction, with some extra flag to indicate that it's an internally-created one.

I have used an approach similar to pg_twophase, but implemented it as separate code, as the requirements differ. But I would like to minimize code by unifying both, if we finalise this design. Suggestions in this regard will be very helpful.

2. New catalog - This method takes out the need to have separate method for
C1, C5 and even C2, also the synchronization will be taken care of by row
locks, there will be no limit on the number of foreign transactions as well
as the size of foreign prepared transaction information. But big problem
with this approach is that, the changes to the catalogs are atomic with the
local transaction. If a foreign prepared transaction can not be aborted
while the local transaction is rolled back, that entry needs to be retained.
But since the local transaction is aborting the corresponding catalog entry
would become invisible and thus unavailable to the resolver (alas! we do
not have autonomous transaction support). We may be able to overcome this,
by simulating autonomous transaction through a background worker (which can
also act as a resolver). But the amount of communication and
synchronization, might affect the performance.

Or you could insert/update the rows in the catalog with xmin=FrozenXid, ignoring MVCC. Not sure how well that would work.

I am not aware how to do that. Do we have any precedent in the code? Something like a reference implementation, which I can follow. It will help to lift two restrictions

1. Restriction on the number of simultaneously prepared foreign transactions.

2. Restriction on the prepared transaction identifier length.

Obviously we may be able to shed a lot of code related to file management, lookup etc.

3. WAL records - Since the algorithm follows "write ahead of action", WAL
seems to be a possible way to persist the foreign transaction entries. But
WAL records can not be used for repeated scan as is required by the foreign
transaction resolver. Also, replaying WALs is controlled by checkpoint, so
not all WALs are replayed. If a checkpoint happens after a foreign prepared
transaction remains unresolved, corresponding WALs will never be replayed,
thus causing the foreign prepared transaction to remain unresolved forever
without a clue. So, WALs alone don't seem to be a fit here.

Right. The pg_twophase files solve that exact same issue.

There is clearly a lot of work to do here.

I'm marking this as Returned with Feedback in the commitfest, I don't think more review is going to be helpful at this point.

That's sad. I hope people will review the patch and help improve it, even if it's out of the commitfest.

- Heikki

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

pg_fdw_transact.patch

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

17 July 2015, 16:50:28

Overall, you seem to have made some significant progress on the design
since the last version of this patch.  There's probably a lot left to
do, but the design seems more mature now.  I haven't read the code,
but here are some comments based on the email.

On Thu, Jul 9, 2015 at 6:18 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> The patch introduces a GUC atomic_foreign_transaction, which when ON ensures
> atomic commit for foreign transactions, otherwise not. The value of this GUC
> at the time of committing or preparing a local transaction is used. This
> gives applications the flexibility to choose the behaviour as late in the
> transaction as possible. This GUC has no effect if there are no foreign
> servers involved in the transaction.

Hmm.  I'm not crazy about that name, but I don't have a better idea either.

One thing about this design is that it makes atomicity a property of
the transaction rather than the server.  That is, any given
transaction is either atomic with respect to all servers or atomic
with respect to none.  You could also design this the other way: each
server is either configured to do atomic commit, or not.  When a
transaction is committed, it prepares on those servers which are
configured for it, and then commits the others.  So then you can have
a "partially atomic" transaction where, for example, you transfer
money from account A to account B (using one or more FDW connections
that support atomic commit) and also use twitter_fdw to tweet about it
(using an FDW connection that does NOT support atomic commit).  The
tweet will survive even if the local commit fails, but that's OK.  You
could even do this at table granularity: we'll prepare the transaction
if at least one foreign table involved in the transaction has
atomic_commit = true.
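
One way to read the per-server alternative is as an ordered plan of remote commands. The Python sketch below is an interpretation, not something specified in this thread: `commit_plan`, the server names and the exact ordering (prepare the atomic servers, commit the others, commit locally, then commit the prepared ones) are assumptions drawn from the paragraph above.

```python
def commit_plan(servers):
    """Build the command order for a 'partially atomic' commit.

    servers  list of (name, atomic_commit) pairs; atomic_commit says whether
             the server is configured to take part in two-phase commit
    """
    atomic = [name for name, is_atomic in servers if is_atomic]
    plain = [name for name, is_atomic in servers if not is_atomic]

    plan = [("PREPARE", name) for name in atomic]
    # Servers outside the atomic set commit right away, so their effects
    # survive even if the local commit fails afterwards.
    plan += [("COMMIT", name) for name in plain]
    plan.append(("COMMIT", "local"))
    plan += [("COMMIT PREPARED", name) for name in atomic]
    return plan
```

With two bank servers marked atomic and a twitter_fdw server that is not, the plan prepares both banks, commits the tweet, commits locally and only then commits the prepared bank transactions.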

In some sense I think this might be a nicer design, because suppose
you connect to a foreign server and mostly just log stuff but
occasionally do important things there.  In your design, you can do
this, but you'll need to make sure atomic_foreign_transaction is set
for the correct set of transactions.  But in what I'm proposing here
we might be able to derive the correct value mostly automatically.

We should consider other possible designs as well; the choices we make
here may have a significant impact on usability.

> Another GUC max_fdw_transactions sets the maximum number of transactions
> that can be simultaneously prepared on all the foreign servers. This limits
> the memory required for remembering the prepared foreign transactions.

How about max_prepared_foreign_transactions?

> Two new FDW hooks are introduced for transaction management.
> 1. GetPrepareId: to get the prepared transaction identifier for a given
> foreign server connection. An FDW which doesn't want to support this feature
> can keep this hook undefined (NULL). When defined the hook should return a
> unique identifier for the transaction prepared on the foreign server. The
> identifier should be unique enough not to conflict with currently prepared
> or future transactions. This point will be clear when discussing phase 2 of
> 2PC.
>
> 2. HandleForeignTransaction: to end a transaction in specified way. The hook
> should be able to prepare/commit/rollback current running transaction on
> given connection or commit/rollback a previously prepared transaction. This
> is described in detail while describing phase two of two-phase commit. The
> function is required to return a boolean status of whether the requested
> operation was successful or not. The function or its minions should not
> raise any error on failure so as not to interfere with the distributed
> transaction processing. This point will be clarified more in the description
> below.

HandleForeignTransaction is not very descriptive, and I think you're
jamming together things that ought to be separated.  Let's have a
PrepareForeignTransaction and a ResolvePreparedForeignTransaction.

> A foreign server, user mapping corresponding to an unresolved foreign
> transaction is not allowed to be dropped or altered until the foreign
> transaction is resolved. This is required to retain the connection
> properties which need to resolve the prepared transaction on the foreign
> server.

I agree with not letting it be dropped, but I think not letting it be
altered is a serious mistake.  Suppose the foreign server dies in a
fire, its replica is promoted, and we need to re-point the master at
the replica's hostname or IP.

> Handling non-atomic foreign transactions
> ===============================
> When atomic_foreign_transaction is disabled, one-phase commit protocol is
> used to commit/rollback the foreign transactions. After the local
> transaction has committed/aborted, all the foreign transactions on the
> registered foreign connections are committed or aborted resp. using hook
> HandleForeignTransaction. Failing to commit a foreign transaction does not
> affect the other foreign transactions; they are still tried to be committed
> (if the local transaction commits).

Is this a change from the current behavior?  What if we call the first
commit handler and it throws an ERROR?  Presumably then nothing else
gets committed, and the transaction overall aborts.

> PITR
> ====
> PITR may rewind the database to a point before an xid associated with an
> unresolved foreign transaction. There are two approaches to deal with the
> situation.
> 1. Just forget about the unresolved foreign transaction and remove the file
> just like we do for a prepared local transaction. But then the prepared
> transaction on the foreign server might be left unresolved forever and will
> keep holding the resources.
> 2. Do not allow PITR to such point. We can not get rid of the transaction id
> without getting rid of prepared foreign transaction. If we do so, we might
> create conflicting files in future and might resolve the transaction with
> wrong outcome.

I don't think either of these is correct.  The database shouldn't
behave differently when PITR is used than when it isn't.  Otherwise
you are not doing what it says on the tin: recovering to the chosen
point in time.  I recommend adding a function that forgets about a
foreign prepared transaction and making it the DBA's job to figure out
whether to call it in a particular scenario.  After all, the remote
machine might have been subjected to PITR, too.  Or maybe not.  We
can't know, so we should give the DBA the tools to clean things up and
leave it at that.

> IIUC LRO, the patch uses the local transaction as last resource, which is
> always present. The fate of foreign transaction is decided by the fate of
> the local transaction, which is not required to be prepared per say. There
> is more relevant note later.

Personally, I think that's perfectly fine.  We could do more later if
we wanted to, but there's plenty to like here without that.

>> Just to be clear: you also need two-phase commit if the transaction
>> updated anything in the local server and in even one foreign server.
>
> Any local transaction involving a foreign sever transaction uses two-phase
> commit for the foreign transaction. The local transaction is not prepared
> per say. However, we should be able to optimize a case, when there are no
> local changes. I am not able to find a way to deduce that there was no local
> change, so I have left that case in this patch. Is there a way to know
> whether a local transaction changed something locally or not?

You might check whether it wrote any WAL.  There's a global variable
for that somewhere; RecordTransactionCommit() uses it.  But I don't
think this is an essential optimization for v1, either.

> I have used approach similar to pg_twophase, but implemented it as a
> separate code, as the requirements differ. But, I would like to minimize
> code by unifying both, if we finalise this design. Suggestions in this
> regard will be very helpful.

-1 for trying to unify those unless it's really clear that it's a good
idea.  I bet it's not.

>> Or you could insert/update the rows in the catalog with xmin=FrozenXid,
>> ignoring MVCC. Not sure how well that would work.
>
> I am not aware how to do that. Do we have any precedence in the code.

No.  I bet that's also a bad idea.  A non-transactional table is a
good idea that has been proposed before, but let's not try to invent
it in this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

29 July 2015, 10:58:46

On Fri, Jul 17, 2015 at 10:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Overall, you seem to have made some significant progress on the design
since the last version of this patch. There's probably a lot left to
do, but the design seems more mature now. I haven't read the code,
but here are some comments based on the email.

Thanks for your comments.

I have incorporated most of your suggestions (marked as Done) in the attached patch.

On Thu, Jul 9, 2015 at 6:18 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> The patch introduces a GUC atomic_foreign_transaction, which when ON ensures
> atomic commit for foreign transactions, otherwise not. The value of this GUC
> at the time of committing or preparing a local transaction is used. This
> gives applications the flexibility to choose the behaviour as late in the
> transaction as possible. This GUC has no effect if there are no foreign
> servers involved in the transaction.

Hmm. I'm not crazy about that name, but I don't have a better idea either.

One thing about this design is that it makes atomicity a property of
the transaction rather than the server. That is, any given
transaction is either atomic with respect to all servers or atomic
with respect to none. You could also design this the other way: each
server is either configured to do atomic commit, or not. When a
transaction is committed, it prepares on those servers which are
configured for it, and then commits the others. So then you can have
a "partially atomic" transaction where, for example, you transfer
money from account A to account B (using one or more FDW connections
that support atomic commit) and also use twitter_fdw to tweet about it
(using an FDW connection that does NOT support atomic commit). The
tweet will survive even if the local commit fails, but that's OK. You
could even do this at table granularity: we'll prepare the transaction
if at least one foreign table involved in the transaction has
atomic_commit = true.

In some sense I think this might be a nicer design, because suppose
you connect to a foreign server and mostly just log stuff but
occasionally do important things there. In your design, you can do
this, but you'll need to make sure atomic_foreign_transaction is set
for the correct set of transactions. But in what I'm proposing here
we might be able to derive the correct value mostly automatically.

A user may set atomic_foreign_transaction to ON to guarantee atomicity; in other words, the transaction throws an error when atomicity cannot be guaranteed. Thus, if an application accidentally modifies a foreign server that doesn't support 2PC, the transaction aborts. A user may set it to OFF (consciously, taking responsibility for the result) so as not to use 2PC (probably to reduce the overhead) even if the foreign server is 2PC compliant. So I thought a GUC would be necessary. We can incorporate the behaviour you are suggesting by having atomic_foreign_transaction accept three values: "full" (the ON behaviour), "partial" (the behaviour you are describing) and "none" (the OFF behaviour). The default value of this GUC would be "partial". Will that be fine?
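
To make the three proposed values concrete, here is a minimal sketch in Python. It is only a model of the decision, not code from the patch: the `Participant` class, the `plan_commit` function and the server names are hypothetical, and the real logic would live in the C transaction manager.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Participant:
    """A foreign server touched by the transaction (hypothetical model)."""
    name: str
    supports_2pc: bool


def plan_commit(mode: str, participants: List[Participant]) -> Tuple[List[str], List[str]]:
    """Return (servers to prepare first, servers to commit in one phase).

    mode is one of the three values discussed in the mail:
    "full", "partial" or "none".
    """
    if mode not in ("full", "partial", "none"):
        raise ValueError(f"unknown mode: {mode}")

    # "none": never use 2PC, even for servers that could support it.
    if mode == "none":
        return [], [p.name for p in participants]

    incapable = [p.name for p in participants if not p.supports_2pc]

    # "full": refuse to proceed when atomicity cannot be guaranteed.
    if mode == "full" and incapable:
        raise RuntimeError(
            "cannot guarantee atomic commit for: " + ", ".join(incapable)
        )

    # "partial" (or "full" with only capable servers): prepare the capable
    # servers and commit the remaining ones in a single phase.
    prepared = [p.name for p in participants if p.supports_2pc]
    return prepared, incapable
```

With two 2PC-capable servers and one that is not, "partial" prepares the first two and commits the third in one phase, "none" commits all three in one phase, and "full" raises an error.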

About the table-level atomic commit attribute: I agree that some foreign tables might hold "more critical" data than others from the same server, but I am not sure whether that attribute alone should dictate atomicity. A transaction as a whole might need to be "atomic" even if the individual tables it modified do not have the atomic_commit attribute set. So we need a transaction-level attribute for atomicity, which may be overridden by a table-level attribute. Should we add support for the table-level atomicity setting in version 2+?

We should consider other possible designs as well; the choices we make
here may have a significant impact on usability.

I looked at other RDBMSes, such as IBM's federated database and Oracle. They support only the "full" behaviour described above, with some optimizations like LRO. But I would like to hear about other options.

> Another GUC max_fdw_transactions sets the maximum number of transactions
> that can be simultaneously prepared on all the foreign servers. This limits
> the memory required for remembering the prepared foreign transactions.

How about max_prepared_foreign_transactions?

Done.

> Two new FDW hooks are introduced for transaction management.
> 1. GetPrepareId: to get the prepared transaction identifier for a given
> foreign server connection. An FDW which doesn't want to support this feature
> can keep this hook undefined (NULL). When defined the hook should return a
> unique identifier for the transaction prepared on the foreign server. The
> identifier should be unique enough not to conflict with currently prepared
> or future transactions. This point will be clear when discussing phase 2 of
> 2PC.
>
> 2. HandleForeignTransaction: to end a transaction in specified way. The hook
> should be able to prepare/commit/rollback current running transaction on
> given connection or commit/rollback a previously prepared transaction. This
> is described in detail while describing phase two of two-phase commit. The
> function is required to return a boolean status of whether the requested
> operation was successful or not. The function or its minions should not
> raise any error on failure so as not to interfere with the distributed
> transaction processing. This point will be clarified more in the description
> below.

HandleForeignTransaction is not very descriptive, and I think you're
jamming together things that ought to be separated. Let's have a
PrepareForeignTransaction and a ResolvePreparedForeignTransaction.

Done. There are three hooks now:

1. For preparing a foreign transaction
2. For resolving a prepared foreign transaction
3. For committing/aborting a running foreign transaction (more explanation later)

> A foreign server, user mapping corresponding to an unresolved foreign
> transaction is not allowed to be dropped or altered until the foreign
> transaction is resolved. This is required to retain the connection
> properties which need to resolve the prepared transaction on the foreign
> server.

I agree with not letting it be dropped, but I think not letting it be
altered is a serious mistake. Suppose the foreign server dies in a
fire, its replica is promoted, and we need to re-point the master at
the replica's hostname or IP.

Done

IP might be fine, but consider altering the dbname option or dropping it; we won't find the prepared foreign transaction in the new database. I think we should at least warn the user that there exists a prepared foreign transaction on the given foreign server or user mapping; better still, we could let the FDW decide which options may be altered while a foreign prepared transaction exists. The latter requires some surgery in the way we handle the options.

> Handling non-atomic foreign transactions
> ===============================
> When atomic_foreign_transaction is disabled, one-phase commit protocol is
> used to commit/rollback the foreign transactions. After the local
> transaction has committed/aborted, all the foreign transactions on the
> registered foreign connections are committed or aborted resp. using hook
> HandleForeignTransaction. Failing to commit a foreign transaction does not
> affect the other foreign transactions; they are still tried to be committed
> (if the local transaction commits).

Is this a change from the current behavior?

There is no current behaviour defined per se. Each FDW is free to add its own transaction callbacks, which can commit/rollback their respective transactions at pre-commit time or after the commit. postgres_fdw's callback tries to commit the foreign transactions on the PRE_COMMIT event and throws an error if that fails.

What if we call the first
commit handler and it throws an ERROR? Presumably then nothing else
gets committed, and the transaction overall aborts.

In this case, the fate of the transaction depends upon the order in which the foreign transactions are committed, which in turn depends on the order in which they were started. This can produce non-deterministic results. The patch tries to make the behaviour deterministic: commit whatever can be committed and abort the rest. This requires the EndForeignTransaction hook (HandleForeignTransaction in the earlier patch) not to raise an error, although I do not know how to prevent it from throwing one. We might try catching the errors and not rethrowing them, but I haven't tried that.

The same requirement applies to ResolvePreparedForeignTransaction(). If that hook throws an error, we end up with unresolved prepared transactions, which will be committed only when the resolver kicks in.
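
The "commit whatever can be committed" behaviour, including the untried idea of catching errors instead of rethrowing them, can be sketched as below. This is a Python model under stated assumptions, not the patch's C code: `end_foreign_transactions` and the per-server hook callables are hypothetical stand-ins for the EndForeignTransaction hook.

```python
from typing import Callable, Dict, List, Tuple


def end_foreign_transactions(
    end_hooks: Dict[str, Callable[[bool], bool]],
    commit: bool,
) -> Tuple[List[str], List[str]]:
    """Try to end the transaction on every registered server.

    Each hook takes the desired outcome (True = commit) and returns True on
    success. A hook that raises is treated like one that returned False, so
    one failing server never stops the others from being attempted.
    """
    succeeded: List[str] = []
    failed: List[str] = []
    for server, hook in end_hooks.items():
        try:
            ok = hook(commit)
        except Exception:
            # Swallow the error instead of rethrowing it; the failure is
            # only recorded so that the caller can report it.
            ok = False
        (succeeded if ok else failed).append(server)
    return succeeded, failed
```

The result no longer depends on which server happens to be tried first: every server is attempted, and the failures are collected rather than aborting the loop.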

> PITR
> ====
> PITR may rewind the database to a point before an xid associated with an
> unresolved foreign transaction. There are two approaches to deal with the
> situation.
> 1. Just forget about the unresolved foreign transaction and remove the file
> just like we do for a prepared local transaction. But then the prepared
> transaction on the foreign server might be left unresolved forever and will
> keep holding the resources.
> 2. Do not allow PITR to such point. We can not get rid of the transaction id
> without getting rid of prepared foreign transaction. If we do so, we might
> create conflicting files in future and might resolve the transaction with
> wrong outcome.

I don't think either of these is correct. The database shouldn't
behave differently when PITR is used than when it isn't. Otherwise
you are not doing what it says on the tin: recovering to the chosen
point in time. I recommend adding a function that forgets about a
foreign prepared transaction and making it the DBA's job to figure out
whether to call it in a particular scenario. After all, the remote
machine might have been subjected to PITR, too. Or maybe not. We
can't know, so we should give the DBA the tools to clean things up and
leave it at that.

I have added a built-in function pg_fdw_remove() (or any suitable name), which removes the prepared foreign transaction entry from memory and disk. The function needs to be called before attempting PITR. If recovery targets a point in the past without the file having been removed, we abort the recovery. In that case, a DBA can remove the foreign prepared transaction file manually before recovery. I have added a hint to that effect in the error message. Is that enough?

I noticed that the functions pg_fdw_resolve() and pg_fdw_remove(), which resolve or remove an unresolved prepared foreign transaction respectively, effect changes that cannot be rolled back if the transaction that ran them rolls back. They need to be converted into SQL commands, like ROLLBACK PREPARED, which can't be run within a transaction block.

> IIUC LRO, the patch uses the local transaction as last resource, which is
> always present. The fate of foreign transaction is decided by the fate of
> the local transaction, which is not required to be prepared per say. There
> is more relevant note later.

Personally, I think that's perfectly fine. We could do more later if
we wanted to, but there's plenty to like here without that.

Agreed.

>> Just to be clear: you also need two-phase commit if the transaction
>> updated anything in the local server and in even one foreign server.
>
> Any local transaction involving a foreign sever transaction uses two-phase
> commit for the foreign transaction. The local transaction is not prepared
> per say. However, we should be able to optimize a case, when there are no
> local changes. I am not able to find a way to deduce that there was no local
> change, so I have left that case in this patch. Is there a way to know
> whether a local transaction changed something locally or not?

You might check whether it wrote any WAL. There's a global variable
for that somewhere; RecordTransactionCommit() uses it. But I don't
think this is an essential optimization for v1, either.

Agreed.

> I have used approach similar to pg_twophase, but implemented it as a
> separate code, as the requirements differ. But, I would like to minimize
> code by unifying both, if we finalise this design. Suggestions in this
> regard will be very helpful.

-1 for trying to unify those unless it's really clear that it's a good
idea. I bet it's not.

Fine.

>> Or you could insert/update the rows in the catalog with xmin=FrozenXid,
>> ignoring MVCC. Not sure how well that would work.
>
> I am not aware how to do that. Do we have any precedence in the code.

No. I bet that's also a bad idea. A non-transactional table is a
good idea that has been proposed before, but let's not try to invent
it in this patch.

Agreed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

pg_fdw_transact.patch

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

29 July 2015, 20:23:02

On Wed, Jul 29, 2015 at 6:58 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> A user may set atomic_foreign_transaction to ON to guarantee atomicity, IOW
> it throws error when atomicity can not be guaranteed. Thus if application
> accidentally does something to a foreign server, which doesn't support 2PC,
> the transaction would abort. A user may set it to OFF (consciously and takes
> the responsibility of the result) so as not to use 2PC (probably to reduce
> the overheads) even if the foreign server is 2PC compliant. So, I thought a
> GUC would be necessary. We can incorporate the behaviour you are suggesting
> by having atomic_foreign_transaction accept three values "full" (ON
> behaviour), "partial" (behaviour you are describing), "none" (OFF
> behaviour). Default value of this GUC would be "partial". Will that be fine?

I don't really see the point.  If the user attempts a distributed
transaction involving FDWs that can't support atomic foreign
transactions, then I think it's reasonable to assume that they want
that to work rather than arbitrarily fail.  The only situation in
which it's desirable for that to fail is when the user doesn't realize
that the FDW in question doesn't support atomic foreign commit, and
the error message warns them that their assumptions are unfounded.
But can't the user find that out easily enough by reading the
documentation?   So I think that in practice the "full" value of this
GUC would get almost zero use; I think that nearly everyone will be
happy with what you are here calling "partial" or "none".  I'll defer
to any other consensus that emerges, but that's my view.

I think that we should not change the default behavior.  Currently,
the only behavior is not to attempt 2PC.  Let's stick with that.

> About table level atomic commit attribute, I agree that some foreign tables
> might hold "more critical" data than others from the same server, but I am
> not sure whether only that attribute should dictate the atomicity or not. A
> transaction collectively might need to be "atomic" even if the individual
> tables it modified are not set atomic_commit attribute. So, we need a
> transaction level attribute for atomicity, which may be overridden by a
> table level attribute. Should we add support to the table level atomicity
> setting as version 2+?

I'm not hung up on the table-level attribute, but I think having a
server-level attribute rather than a global GUC is a good idea.
However, I welcome other thoughts on that.

>> We should consider other possible designs as well; the choices we make
>> here may have a significant impact on usability.
>
> I looked at other RBDMSes like IBM's federated database or Oracle. They
> support only "full" behaviour as described above with some optimizations
> like LRO. But, I would like to hear about other options.

Yes, I hope others will weigh in.

>> HandleForeignTransaction is not very descriptive, and I think you're
>> jamming together things that ought to be separated.  Let's have a
>> PrepareForeignTransaction and a ResolvePreparedForeignTransaction.
>
> Done, there are three hooks now
> 1. For preparing a foreign transaction
> 2. For resolving a prepared foreign transaction
> 3. For committing/aborting a running foreign transaction (more explanation
> later)

(2) and (3) seem like the same thing.  I don't see any further
explanation later in your email; what am I missing?

> IP might be fine, but consider altering dbname option or dropping it; we
> won't find the prepared foreign transaction in new database.

Probably not, but I think that's the DBA's problem, not ours.

> I think we
> should at least warn the user that there exist a prepared foreign
> transaction on given foreign server or user mapping; better even if we let
> FDW decide which options are allowed to be altered when there exists a
> foreign prepared transaction. The later requires some surgery in the way we
> handle the options.

We can consider that, but I don't think it's an essential part of the
patch, and I'd punt it for now in the interest of keeping this as
simple as possible.

>> Is this a change from the current behavior?
>
> There is no current behaviour defined per say.

My point is that you had some language in the email describing what
happens if the GUC is turned off.  You shouldn't have to describe
that, because there should be absolutely zero difference.  If there
isn't, that's a problem for this patch, and probably a subject for a
different one.

> I have added a built-in pg_fdw_remove() (or any suitable name), which
> removes the prepared foreign transaction entry from the memory and disk. The
> function needs to be called before attempting PITR.  If the recovery points
> to a past time without removing file, we abort the recovery. In such case, a
> DBA can remove the foreign prepared transaction file manually before
> recovery. I have added a hint with that effect in the error message. Is that
> enough?

That seems totally broken.  Before PITR, the database might be
inconsistent, in which case you can't call any functions at all.
Also, you shouldn't be trying to resolve any transactions until the
end of recovery, because you don't know when you see that the
transaction was prepared whether, at some subsequent time, you will
see it resolved.  You need to finish recovery and, only after entering
normal running, decide whether to resolve any transactions that are
still sitting around.  There should be no situation (short of e.g. OS
errors writing the state files) where this stuff makes recovery fail.

> I noticed that the functions pg_fdw_resolve() and pg_fdw_remove() which
> resolve or remove unresolved prepared foreign transaction resp. are
> effecting changes which can not be rolled back if the transaction which ran
> these functions rolled back. These need to be converted into SQL command
> like ROLLBACK PREPARED which can't be run within a transaction.

Yeah, maybe.  I'm not sure using a functional interface is all that
bad, but we could think about changing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

31 July 2015, 10:33:18

On Thu, Jul 30, 2015 at 1:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jul 29, 2015 at 6:58 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> A user may set atomic_foreign_transaction to ON to guarantee atomicity, IOW
> it throws error when atomicity can not be guaranteed. Thus if application
> accidentally does something to a foreign server, which doesn't support 2PC,
> the transaction would abort. A user may set it to OFF (consciously and takes
> the responsibility of the result) so as not to use 2PC (probably to reduce
> the overheads) even if the foreign server is 2PC compliant. So, I thought a
> GUC would be necessary. We can incorporate the behaviour you are suggesting
> by having atomic_foreign_transaction accept three values "full" (ON
> behaviour), "partial" (behaviour you are describing), "none" (OFF
> behaviour). Default value of this GUC would be "partial". Will that be fine?

I don't really see the point. If the user attempts a distributed
transaction involving FDWs that can't support atomic foreign
transactions, then I think it's reasonable to assume that they want
that to work rather than arbitrarily fail. The only situation in
which it's desirable for that to fail is when the user doesn't realize
that the FDW in question doesn't support atomic foreign commit, and
the error message warns them that their assumptions are unfounded.
But can't the user find that out easily enough by reading the
documentation? So I think that in practice the "full" value of this
GUC would get almost zero use; I think that nearly everyone will be
happy with what you are here calling "partial" or "none". I'll defer
to any other consensus that emerges, but that's my view.

I think that we should not change the default behavior. Currently,
the only behavior is not to attempt 2PC. Let's stick with that.

Ok. I will remove the GUC and implement the "partially atomic" behaviour you suggested in your previous mail.

> About table level atomic commit attribute, I agree that some foreign tables
> might hold "more critical" data than others from the same server, but I am
> not sure whether only that attribute should dictate the atomicity or not. A
> transaction collectively might need to be "atomic" even if the individual
> tables it modified are not set atomic_commit attribute. So, we need a
> transaction level attribute for atomicity, which may be overridden by a
> table level attribute. Should we add support to the table level atomicity
> setting as version 2+?

I'm not hung up on the table-level attribute, but I think having a
server-level attribute rather than a global GUC is a good idea.
However, I welcome other thoughts on that.

The patch supports a server-level attribute. Let me repeat the relevant description from my earlier mail:
--
Every FDW needs to register the connection when starting a new transaction on a foreign connection (RegisterXactForeignServer()). A foreign server connection is identified by the foreign server oid and the local user oid (similar to the entry cached by postgres_fdw). While registering, the FDW also tells whether the foreign server is capable of participating in the two-phase commit protocol. How to decide that is left entirely to the FDW. An FDW like file_fdw may not have 2PC support at all, so none of its foreign servers comply with 2PC. An FDW might have all its servers 2PC compliant. An FDW like postgres_fdw can have some of its servers compliant and some not, depending upon server version, configuration (max_prepared_transactions = 0), etc.
--
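
As a rough illustration of that registration step, the following Python sketch keeps one entry per (server oid, user oid) pair along with the capability flag the FDW reports. The class and method names are hypothetical; only the idea of registering a connection with a 2PC capability flag comes from the description above.

```python
from typing import Dict, List, Tuple

ConnKey = Tuple[int, int]  # (foreign server oid, local user oid)


class ForeignXactRegistry:
    """Connections taking part in the current local transaction (model)."""

    def __init__(self) -> None:
        self._conns: Dict[ConnKey, bool] = {}

    def register(self, server_oid: int, user_oid: int, two_phase_capable: bool) -> None:
        # Registering the same connection again just refreshes its flag, so
        # an FDW may call this every time it touches the connection.
        self._conns[(server_oid, user_oid)] = two_phase_capable

    def two_phase_participants(self) -> List[ConnKey]:
        """Connections that would be prepared before the local commit."""
        return sorted(k for k, capable in self._conns.items() if capable)

    def one_phase_participants(self) -> List[ConnKey]:
        """Connections that can only be committed in a single phase."""
        return sorted(k for k, capable in self._conns.items() if not capable)
```

The capability flag is supplied by the caller, which mirrors the point that deciding 2PC compliance is left entirely to the FDW.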

Does that look good?

>> We should consider other possible designs as well; the choices we make
>> here may have a significant impact on usability.
>
> I looked at other RBDMSes like IBM's federated database or Oracle. They
> support only "full" behaviour as described above with some optimizations
> like LRO. But, I would like to hear about other options.

Yes, I hope others will weigh in.

>> HandleForeignTransaction is not very descriptive, and I think you're
>> jamming together things that ought to be separated. Let's have a
>> PrepareForeignTransaction and a ResolvePreparedForeignTransaction.
>
> Done, there are three hooks now
> 1. For preparing a foreign transaction
> 2. For resolving a prepared foreign transaction
> 3. For committing/aborting a running foreign transaction (more explanation
> later)

(2) and (3) seem like the same thing. I don't see any further
explanation later in your email; what am I missing?

In the case of postgres_fdw, hook 2 always fires COMMIT/ROLLBACK PREPARED 'xyz' (fill in the prepared transaction id) and hook 3 always fires COMMIT/ABORT TRANSACTION (notice the absence of PREPARED and 'xyz'). We might want to combine them into a single hook, but there are slight differences depending upon the FDW. For postgres_fdw, hook 2 should get a connection that does not have a running transaction, whereas for hook 3 there has to be a running transaction on that connection. Hook 2 should get the prepared foreign transaction identifier as one of its arguments; hook 3 will not have that argument. Hook 2 is relevant for the two-phase commit protocol, whereas hook 3 is used for connections not using two-phase commit.

The differences are much more visible in the code.
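
For readers without the patch at hand, the statements each hook would send for postgres_fdw can be sketched as follows. This is a Python illustration with hypothetical function names; the SQL commands themselves are the standard PostgreSQL ones, and the literal quoting is deliberately simplified.

```python
def quote_literal(value: str) -> str:
    """Quote a string as an SQL literal by doubling single quotes.

    Simplified for illustration; real code would use the quoting routines
    of the client library.
    """
    return "'" + value.replace("'", "''") + "'"


def prepare_sql(prep_id: str) -> str:
    """Hook 1: prepare the transaction running on the connection."""
    return f"PREPARE TRANSACTION {quote_literal(prep_id)}"


def resolve_prepared_sql(prep_id: str, commit: bool) -> str:
    """Hook 2: resolve an already prepared transaction; needs its id."""
    verb = "COMMIT" if commit else "ROLLBACK"
    return f"{verb} PREPARED {quote_literal(prep_id)}"


def end_running_sql(commit: bool) -> str:
    """Hook 3: end the running transaction; there is no id to pass."""
    return "COMMIT TRANSACTION" if commit else "ABORT TRANSACTION"
```

The differing signatures reflect the distinction drawn above: hook 2 takes the prepared transaction identifier, while hook 3 only takes the desired outcome.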

> IP might be fine, but consider altering dbname option or dropping it; we
> won't find the prepared foreign transaction in new database.

Probably not, but I think that's the DBA's problem, not ours.

Fine.

> I think we
> should at least warn the user that there exist a prepared foreign
> transaction on given foreign server or user mapping; better even if we let
> FDW decide which options are allowed to be altered when there exists a
> foreign prepared transaction. The later requires some surgery in the way we
> handle the options.

We can consider that, but I don't think it's an essential part of the
patch, and I'd punt it for now in the interest of keeping this as
simple as possible.

Fine.

>> Is this a change from the current behavior?
>
> There is no current behaviour defined per say.

My point is that you had some language in the email describing what
happens if the GUC is turned off. You shouldn't have to describe
that, because there should be absolutely zero difference. If there
isn't, that's a problem for this patch, and probably a subject for a
different one.

Ok got it.

> I have added a built-in pg_fdw_remove() (or any suitable name), which
> removes the prepared foreign transaction entry from the memory and disk. The
> function needs to be called before attempting PITR. If the recovery points
> to a past time without removing file, we abort the recovery. In such case, a
> DBA can remove the foreign prepared transaction file manually before
> recovery. I have added a hint with that effect in the error message. Is that
> enough?

That seems totally broken. Before PITR, the database might be
inconsistent, in which case you can't call any functions at all.
Also, you shouldn't be trying to resolve any transactions until the
end of recovery, because you don't know when you see that the
transaction was prepared whether, at some subsequent time, you will
see it resolved. You need to finish recovery and, only after entering
normal running, decide whether to resolve any transactions that are
still sitting around.

That's how it works in the patch for unresolved prepared foreign transactions belonging to xids within the known range. For those belonging to xids in the future (beyond the known range of xids after PITR), we cannot determine the status of the local transaction (as those do not appear in the xlog) and hence cannot decide the fate of the prepared foreign transaction. You seem to be suggesting that we should let the recovery finish and mark those prepared foreign transactions as "cannot be resolved" or something like that. A DBA can remove those entries once s/he has dealt with them on the foreign server.

There's a little problem with that approach. The triplet (xid, serverid, userid) is used to identify a foreign prepared transaction entry in memory and to create a unique file name for storing it on disk. If we allow a future xid to survive PITR, it might conflict with the xid of a transaction that takes place after PITR. That will cause a problem if exactly the same foreign server and user participate in the transaction with the conflicting xid (rare but possible).
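
A small Python sketch shows why the collision can happen. The file name layout used here is an assumption made for illustration (the mail only says the triplet is used to build a unique name), and `fdw_xact_file_name` is a hypothetical helper.

```python
def fdw_xact_file_name(xid: int, server_oid: int, user_oid: int) -> str:
    """Build a state file name from the identifying triplet.

    The hexadecimal, underscore-separated layout is an assumption for this
    example, not necessarily what the patch uses.
    """
    return f"{xid:08X}_{server_oid:08X}_{user_oid:08X}"


# An entry left over from before PITR, whose xid lies beyond the recovery target.
stale = fdw_xact_file_name(xid=1200, server_oid=16401, user_oid=10)

# After PITR the xid counter restarts from an earlier value, so a new
# transaction can eventually be assigned xid 1200 again and touch the same
# server with the same user.
fresh = fdw_xact_file_name(xid=1200, server_oid=16401, user_oid=10)

collides = stale == fresh
```

Here `collides` is true, so the new entry would map to the same file as the stale one; a different server or user in the triplet would give a distinct name.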

The other problem is that the foreign server on which the transaction was prepared (or the user whose mapping was used to prepare the transaction) might have been added at a time in the future with respect to the PITR target, in which case we cannot even know which foreign server this transaction was prepared on.

There should be no situation (short of e.g. OS
errors writing the state files) where this stuff makes recovery fail.

During PITR, if we encounter a prepared (local) transaction with a future xid, we just forget that prepared transaction (instead of failing recovery). Maybe we should do the same for unresolved foreign prepared transactions as well (at least for version 1): forget the unresolved prepared foreign transactions that belong to a future xid. Anyway, as per the timeline after PITR, those never existed.
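
The "forget entries with a future xid" rule could look roughly like this. It is a Python sketch under simplifying assumptions: `forget_future_entries` is a hypothetical name, entries are plain (xid, serverid, userid) tuples, and xids are compared as ordinary integers, which ignores xid wraparound.

```python
from typing import List, Tuple

Entry = Tuple[int, int, int]  # (xid, serverid, userid)


def forget_future_entries(
    entries: List[Entry], next_xid: int
) -> Tuple[List[Entry], List[Entry]]:
    """Split entries into (kept, forgotten) at the end of recovery.

    next_xid is the first xid not yet assigned on the recovered timeline.
    Entries at or beyond it belong to transactions that, on this timeline,
    never existed. Plain integer comparison is used, so xid wraparound is
    not handled here.
    """
    kept = [e for e in entries if e[0] < next_xid]
    forgotten = [e for e in entries if e[0] >= next_xid]
    return kept, forgotten
```

Returning the forgotten entries separately would let the caller log them, so that a DBA can still clean up the matching prepared transactions on the foreign servers.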

Other DBMSes solve this problem by using markers. Markers can be set only at times when there are no unresolved foreign transactions, and PITR is allowed only up to those markers, not to any arbitrary point in time. But this looks out of scope for this patch.

> I noticed that the functions pg_fdw_resolve() and pg_fdw_remove() which
> resolve or remove unresolved prepared foreign transaction resp. are
> effecting changes which can not be rolled back if the transaction which ran
> these functions rolled back. These need to be converted into SQL command
> like ROLLBACK PREPARED which can't be run within a transaction.

Yeah, maybe. I'm not sure using a functional interface is all that
bad, but we could think about changing it.

Fine.

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

31 July 2015, 18:49:04

On Fri, Jul 31, 2015 at 6:33 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>> I'm not hung up on the table-level attribute, but I think having a
>> server-level attribute rather than a global GUC is a good idea.
>> However, I welcome other thoughts on that.
>
> The patch supports server level attribute. Let me repeat the relevant
> description from my earlier mail
> --
> Every FDW needs to register the connection while starting new transaction on
> a foreign connection (RegisterXactForeignServer()). A foreign server
> connection is identified by foreign server oid and the local user oid
> (similar to the entry cached by postgres_fdw). While registering, FDW also
> tells whether the foreign server is capable of participating in two-phase
> commit protocol. How to decide that is left entirely to the FDW. An FDW like
> file_fdw may not have 2PC support at all, so all its foreign servers do not
> comply with 2PC. An FDW might have all its servers 2PC compliant. An FDW
> like postgres_fdw can have some of its servers compliant and some not,
> depending upon server version, configuration (max_prepared_transactions = 0)
> etc.
> --
>
> Does that look good?

OK, sure.  But let's make sure postgres_fdw gets a server-level option
to control this.

>> > Done, there are three hooks now
>> > 1. For preparing a foreign transaction
>> > 2. For resolving a prepared foreign transaction
>> > 3. For committing/aborting a running foreign transaction (more
>> > explanation
>> > later)
>>
>> (2) and (3) seem like the same thing.  I don't see any further
>> explanation later in your email; what am I missing?
>
> In case of postgres_fdw, 2 always fires COMMIT/ROLLBACK PREPARED 'xyz' (fill
> the prepared transaction id) and 3 always fires COMMIT/ABORT TRANSACTION
> (notice absence of PREPARED and 'xyz').

Oh, OK.  But then isn't #3 something we already have?  i.e. pgfdw_xact_callback?

>> That seems totally broken.  Before PITR, the database might be
>> inconsistent, in which case you can't call any functions at all.
>> Also, you shouldn't be trying to resolve any transactions until the
>> end of recovery, because you don't know when you see that the
>> transaction was prepared whether, at some subsequent time, you will
>> see it resolved.  You need to finish recovery and, only after entering
>> normal running, decide whether to resolve any transactions that are
>> still sitting around.
>
> That's how it works in the patch for unresolved prepared foreign
> transactions belonging to xids within the known range. For those belonging
> to xids in future (beyond of known range of xids after PITR), we can not
> determine the status of that local transaction (as those do not appear in
> the xlog) and hence can not decide the fate of prepared foreign transaction.
> You seem to be suggesting that we should let the recovery finish and mark
> those prepared foreign transaction as "can not be resolved" or something
> like that. A DBA can remove those entries once s/he has dealt with them on
> the foreign server.
>
> There's little problem with that approach. Triplet (xid, serverid, userid)
> is used to identify the a foreign prepared transaction entry in memory and
> is used to create unique file name for storing it on the disk. If we allow a
> future xid after PITR, it might conflict with an xid of a transaction that
> might take place after PITR. It will cause problem if exactly same foreign
> server and user participate in the transaction with conflicting xid (rare
> but possible).
>
> Other problem is that the foreign server on which the transaction was
> prepared (or the user whose mapping was used to prepare the transaction),
> might have got added in a future time wrt PITR, in which case, we can not
> even know which foreign server this transaction was prepared on.
>
>> There should be no situation (short of e.g. OS
>> errors writing the state files) where this stuff makes recovery fail.
>
> During PITR, if we encounter a prepared (local) transaction with a future
> xid, we just forget that prepared transaction (instead of failing recovery).
> May be we should do the same for unresolved foreign prepared transaction as
> well (at least for version 1); forget the unresolved prepared foreign
> transactions which belong to a future xid. Anyway, as per the timeline after
> PITR those never existed.

This last sentence seems to me to be exactly on point.  Note the
comment in twophase.c:
* We throw away any prepared xacts with main XID beyond nextXid --- if any
* are present, it suggests that the DBA has done a PITR recovery to an
* earlier point in time without cleaning out pg_twophase.  We dare not
* try to recover such prepared xacts since they likely depend on database
* state that doesn't exist now.

In other words, normally there should never be any XIDs "from the
future" with prepared transactions; but in certain PITR scenarios it
might be possible.  We might as well be consistent with what the
existing 2PC code does in this case - i.e. just warn and then remove
the files.
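
The "warn and then remove" rule can be sketched as below. This is only a minimal illustration, not code from the patch or from twophase.c: the ForeignPreparedXact record and discard_future_entries() are hypothetical names, XIDs are treated as plain integers (wraparound is ignored), and treating an XID equal to nextXid as "future" is an assumption of the sketch.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("fdw_xact_recovery")


@dataclass(frozen=True)
class ForeignPreparedXact:
    """Hypothetical entry: one prepared transaction on one foreign server."""
    local_xid: int
    server_oid: int
    user_oid: int


def discard_future_entries(entries, next_xid):
    """Split entries into (kept, discarded) relative to next_xid.

    Entries whose local XID is at or beyond next_xid belong to a timeline
    that no longer exists after PITR, so they are dropped with a warning
    instead of failing recovery. XID wraparound is ignored in this sketch.
    """
    kept, discarded = [], []
    for entry in entries:
        if entry.local_xid >= next_xid:
            logger.warning(
                "removing future foreign prepared xact: xid=%d server=%d user=%d",
                entry.local_xid, entry.server_oid, entry.user_oid,
            )
            discarded.append(entry)
        else:
            kept.append(entry)
    return kept, discarded
```

A real implementation would also have to remove the corresponding state files; the sketch only shows the decision.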

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Amit Kapila

Date:

01 August 2015, 11:57:13

On Tue, Feb 17, 2015 at 2:56 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

2. New catalog - This method takes out the need to have separate method for C1, C5 and even C2, also the synchronization will be taken care of by row locks, there will be no limit on the number of foreign transactions as well as the size of foreign prepared transaction information. But big problem with this approach is that, the changes to the catalogs are atomic with the local transaction. If a foreign prepared transaction can not be aborted while the local transaction is rolled back, that entry needs to retained. But since the local transaction is aborting the corresponding catalog entry would become invisible and thus unavailable to the resolver (alas! we do not have autonomous transaction support). We may be able to overcome this, by simulating autonomous transaction through a background worker (which can also act as a resolver). But the amount of communication and synchronization, might affect the performance.

For Rollback, why can't we do it the other way round: first roll back
the transaction on the foreign servers and then roll back the local
transaction?  I think for Commit it is essential that we first commit
on the local server, so that we can resolve the status of prepared
transactions on the foreign servers after crash recovery.  For the
Abort case, however, I think that even if we don't roll back on the
local server, the outcome can be deduced during crash recovery (any
transaction which is not committed should be rolled back) for the
purpose of resolving the status of prepared transactions.
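
The deduction described above can be written down as a small rule. This is a sketch under assumptions, not the patch's resolver: LocalStatus and resolution_command() are hypothetical names, and the sketch assumes crash recovery has finished so that no local transaction is still in progress.

```python
from enum import Enum


class LocalStatus(Enum):
    """Outcome of the local transaction as seen after crash recovery."""
    COMMITTED = "committed"
    ABORTED = "aborted"
    NO_RECORD = "no_record"  # neither a commit nor an abort record was found


def resolution_command(status, gid):
    """Return the SQL to send to the foreign server for prepared transaction gid.

    Only a local commit leads to COMMIT PREPARED; anything that is not
    known to be committed is rolled back.
    """
    quoted = gid.replace("'", "''")  # escape quotes inside the identifier
    if status is LocalStatus.COMMITTED:
        return f"COMMIT PREPARED '{quoted}'"
    return f"ROLLBACK PREPARED '{quoted}'"
```

This is why the commit order matters: the local commit record is the only durable evidence that lets the resolver choose COMMIT PREPARED.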

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Transactions involving multiple postgres foreign servers

From

Amit Kapila

Date:

03 August 2015, 04:05:31

On Thu, Jul 9, 2015 at 3:48 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

2. New catalog - This method takes out the need to have separate method for
C1, C5 and even C2, also the synchronization will be taken care of by row
locks, there will be no limit on the number of foreign transactions as well
as the size of foreign prepared transaction information. But big problem
with this approach is that, the changes to the catalogs are atomic with the
local transaction. If a foreign prepared transaction can not be aborted
while the local transaction is rolled back, that entry needs to retained.
But since the local transaction is aborting the corresponding catalog entry
would become invisible and thus unavailable to the resolver (alas! we do
not have autonomous transaction support). We may be able to overcome this,
by simulating autonomous transaction through a background worker (which can
also act as a resolver). But the amount of communication and
synchronization, might affect the performance.

Or you could insert/update the rows in the catalog with xmin=FrozenXid, ignoring MVCC. Not sure how well that would work.

I am not aware how to do that. Do we have any precedence in the code. Something like a reference implementation, which I can follow.

Would something along the lines of COPY FREEZE help here?
However, if you are going to follow this method, then I think you
also need to work out when and how to clear those rows after the
rollback is complete or once the resolver has resolved those prepared
foreign transactions.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

03 August 2015, 12:25:03

On Sat, Aug 1, 2015 at 12:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jul 31, 2015 at 6:33 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>> I'm not hung up on the table-level attribute, but I think having a
>> server-level attribute rather than a global GUC is a good idea.
>> However, I welcome other thoughts on that.
>
> The patch supports server level attribute. Let me repeat the relevant
> description from my earlier mail
> --
> Every FDW needs to register the connection while starting new transaction on
> a foreign connection (RegisterXactForeignServer()). A foreign server
> connection is identified by foreign server oid and the local user oid
> (similar to the entry cached by postgres_fdw). While registering, FDW also
> tells whether the foreign server is capable of participating in two-phase
> commit protocol. How to decide that is left entirely to the FDW. An FDW like
> file_fdw may not have 2PC support at all, so all its foreign servers do not
> comply with 2PC. An FDW might have all its servers 2PC compliant. An FDW
> like postgres_fdw can have some of its servers compliant and some not,
> depending upon server version, configuration (max_prepared_transactions = 0)
> etc.
> --
>
> Does that look good?

OK, sure. But let's make sure postgres_fdw gets a server-level option
to control this.

For postgres_fdw it's a boolean server-level option 'twophase_compliant' (suggestions for name welcome).
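
As a rough model of the registration described in the quoted text: each foreign connection is keyed by (server oid, user oid) and carries a flag saying whether it can take part in two-phase commit. The class and method names below are hypothetical and only mirror the idea behind RegisterXactForeignServer(). What the patch does when a participant is not capable is not shown here.

```python
class ForeignXactRegistry:
    """Hypothetical per-transaction registry of foreign connections."""

    def __init__(self):
        # (server_oid, user_oid) -> True if the connection supports 2PC
        self._participants = {}

    def register(self, server_oid, user_oid, two_phase_capable):
        """Record a foreign connection and the capability reported by its FDW."""
        self._participants[(server_oid, user_oid)] = bool(two_phase_capable)

    def incapable_participants(self):
        """Return the connections that cannot take part in two-phase commit."""
        return sorted(key for key, ok in self._participants.items() if not ok)

    def two_phase_possible(self):
        """True only if every registered connection supports two-phase commit."""
        return not self.incapable_participants()
```

For postgres_fdw the flag would come from the server-level option; for an FDW like file_fdw it would always be false.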

>> > Done, there are three hooks now
>> > 1. For preparing a foreign transaction
>> > 2. For resolving a prepared foreign transaction
>> > 3. For committing/aborting a running foreign transaction (more
>> > explanation
>> > later)
>>
>> (2) and (3) seem like the same thing. I don't see any further
>> explanation later in your email; what am I missing?
>
> In case of postgres_fdw, 2 always fires COMMIT/ROLLBACK PREPARED 'xyz' (fill
> the prepared transaction id) and 3 always fires COMMIT/ABORT TRANSACTION
> (notice absence of PREPARED and 'xyz').

Oh, OK. But then isn't #3 something we already have? i.e. pgfdw_xact_callback?

While transactions are being prepared on the foreign connections, if any prepare fails, we have to abort the transactions on the rest of the connections (and roll back the ones already prepared). pgfdw_xact_callback wouldn't know which connections have prepared transactions and which do not. So, even in the case of two-phase commit, we need all three hooks. Since we have to define these three hooks anyway, we might as well centralize all the transaction processing and let the foreign transaction manager decide which of the hooks to invoke. So the patch moves most of the code in pgfdw_xact_callback into the relevant hooks, and the foreign transaction manager invokes the appropriate hook. The only thing that remains in pgfdw_xact_callback now is end-of-transaction handling such as resetting cursor numbering.
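
A minimal sketch of why the manager needs all three hooks, assuming a stand-in connection object that records SQL instead of sending it. RecordingConnection, the hook functions and prepare_all() are hypothetical names for illustration; they are not the patch's API.

```python
class RecordingConnection:
    """Hypothetical foreign connection; records SQL instead of sending it."""

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.sent = []

    def execute(self, sql):
        if self.fail_prepare and sql.startswith("PREPARE TRANSACTION"):
            raise RuntimeError(f"prepare failed on {self.name}")
        self.sent.append(sql)


def prepare_hook(conn, gid):
    """Hook 1: prepare the running foreign transaction."""
    conn.execute(f"PREPARE TRANSACTION '{gid}'")


def resolve_hook(conn, gid, commit):
    """Hook 2: resolve an already prepared foreign transaction."""
    conn.execute(f"{'COMMIT' if commit else 'ROLLBACK'} PREPARED '{gid}'")


def end_hook(conn, commit):
    """Hook 3: commit or abort a running (not prepared) foreign transaction."""
    conn.execute("COMMIT TRANSACTION" if commit else "ABORT TRANSACTION")


def prepare_all(conns, gid_for):
    """Prepare every connection; on the first failure undo everything.

    Prepared connections get ROLLBACK PREPARED (hook 2); the failed one and
    those not yet reached get ABORT TRANSACTION (hook 3). Returns True only
    if every prepare succeeded.
    """
    prepared = []
    for conn in conns:
        try:
            prepare_hook(conn, gid_for(conn))
        except RuntimeError:
            for done in prepared:
                resolve_hook(done, gid_for(done), commit=False)
            for other in conns:
                if other not in prepared:
                    end_hook(other, commit=False)
            return False
        prepared.append(conn)
    return True
```

The bookkeeping in `prepared` is exactly the knowledge that a plain transaction callback lacks: which connections need hook 2 and which need hook 3.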

>> That seems totally broken. Before PITR, the database might be
>> inconsistent, in which case you can't call any functions at all.
>> Also, you shouldn't be trying to resolve any transactions until the
>> end of recovery, because you don't know when you see that the
>> transaction was prepared whether, at some subsequent time, you will
>> see it resolved. You need to finish recovery and, only after entering
>> normal running, decide whether to resolve any transactions that are
>> still sitting around.
>
> That's how it works in the patch for unresolved prepared foreign
> transactions belonging to xids within the known range. For those belonging
> to xids in future (beyond of known range of xids after PITR), we can not
> determine the status of that local transaction (as those do not appear in
> the xlog) and hence can not decide the fate of prepared foreign transaction.
> You seem to be suggesting that we should let the recovery finish and mark
> those prepared foreign transaction as "can not be resolved" or something
> like that. A DBA can remove those entries once s/he has dealt with them on
> the foreign server.
>
> There's little problem with that approach. Triplet (xid, serverid, userid)
> is used to identify the a foreign prepared transaction entry in memory and
> is used to create unique file name for storing it on the disk. If we allow a
> future xid after PITR, it might conflict with an xid of a transaction that
> might take place after PITR. It will cause problem if exactly same foreign
> server and user participate in the transaction with conflicting xid (rare
> but possible).
>
> Other problem is that the foreign server on which the transaction was
> prepared (or the user whose mapping was used to prepare the transaction),
> might have got added in a future time wrt PITR, in which case, we can not
> even know which foreign server this transaction was prepared on.
>
>> There should be no situation (short of e.g. OS
>> errors writing the state files) where this stuff makes recovery fail.
>
> During PITR, if we encounter a prepared (local) transaction with a future
> xid, we just forget that prepared transaction (instead of failing recovery).
> May be we should do the same for unresolved foreign prepared transaction as
> well (at least for version 1); forget the unresolved prepared foreign
> transactions which belong to a future xid. Anyway, as per the timeline after
> PITR those never existed.

This last sentence seems to me to be exactly on point. Note the
comment in twophase.c:

* We throw away any prepared xacts with main XID beyond nextXid --- if any
* are present, it suggests that the DBA has done a PITR recovery to an
* earlier point in time without cleaning out pg_twophase. We dare not
* try to recover such prepared xacts since they likely depend on database
* state that doesn't exist now.

In other words, normally there should never be any XIDs "from the
future" with prepared transactions; but in certain PITR scenarios it
might be possible. We might as well be consistent with what the
existing 2PC code does in this case - i.e. just warn and then remove
the files.

Ok. Done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Amit Langote

Date:

04 August 2015, 00:20:57

On 2015-08-03 PM 09:24, Ashutosh Bapat wrote:
> On Sat, Aug 1, 2015 at 12:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> OK, sure.  But let's make sure postgres_fdw gets a server-level option
>> to control this.
>>
>>
> For postgres_fdw it's a boolean server-level option 'twophase_compliant'
> (suggestions for name welcome).
> 

How about just 'twophase'?

Thanks,
Amit

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

04 August 2015, 21:11:19

On Mon, Aug 3, 2015 at 8:19 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2015-08-03 PM 09:24, Ashutosh Bapat wrote:
>> On Sat, Aug 1, 2015 at 12:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>>
>>> OK, sure.  But let's make sure postgres_fdw gets a server-level option
>>> to control this.
>>>
>>>
>> For postgres_fdw it's a boolean server-level option 'twophase_compliant'
>> (suggestions for name welcome).
>>
>
> How about just 'twophase'?

How about two_phase_commit?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Amit Langote

Date:

05 August 2015, 00:52:30

On 2015-08-05 AM 06:11, Robert Haas wrote:
> On Mon, Aug 3, 2015 at 8:19 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2015-08-03 PM 09:24, Ashutosh Bapat wrote:
>>> For postgres_fdw it's a boolean server-level option 'twophase_compliant'
>>> (suggestions for name welcome).
>>>
>>
>> How about just 'twophase'?
> 
> How about two_phase_commit?
> 

Much cleaner, +1

Thanks,
Amit

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

11 August 2015, 08:25:16

On Wed, Aug 5, 2015 at 6:20 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2015-08-05 AM 06:11, Robert Haas wrote:
> On Mon, Aug 3, 2015 at 8:19 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2015-08-03 PM 09:24, Ashutosh Bapat wrote:
>>> For postgres_fdw it's a boolean server-level option 'twophase_compliant'
>>> (suggestions for name welcome).
>>>
>>
>> How about just 'twophase'?
>
> How about two_phase_commit?
>

Much cleaner, +1

I was more inclined to use an adjective rather than a noun, since it's a property of the server. But two_phase_commit looks fine as well, so it is included in the attached patch.

The attached patch addresses all the concerns and suggestions from previous mails in this thread.

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

pg_fdw_transact.patch

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

12 August 2015, 10:25:46

The previous patch would not compile on the latest HEAD. Here's an updated patch.

On Tue, Aug 11, 2015 at 1:55 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

On Wed, Aug 5, 2015 at 6:20 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2015-08-05 AM 06:11, Robert Haas wrote:
> On Mon, Aug 3, 2015 at 8:19 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2015-08-03 PM 09:24, Ashutosh Bapat wrote:
>>> For postgres_fdw it's a boolean server-level option 'twophase_compliant'
>>> (suggestions for name welcome).
>>>
>>
>> How about just 'twophase'?
>
> How about two_phase_commit?
>

Much cleaner, +1

I was more inclined to use an adjective, since it's a property of server, instead of a noun. But two_phase_commit looks fine as well, included in the patch attached.

Attached patch addresses all the concerns and suggestions from previous mails in this mail thread.

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

pg_fdw_transact.patch

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

06 November 2015, 18:38:06

On Wed, Aug 12, 2015 at 6:25 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> The previous patch would not compile on the latest HEAD. Here's updated
> patch.

Perhaps unsurprisingly, this doesn't apply any more.  But we have
bigger things to worry about.

The recent eXtensible Transaction Manager and the slides shared at the
Vienna sharding summit, now posted at
https://drive.google.com/file/d/0B8hhdhUVwRHyMXpRRHRSLWFXeXc/view make
me think that some careful thought is needed here about what we want
and how it should work. Slide 10 proposes a method for the extensible
transaction manager API to interact with FDWs.  The FDW would do this:

select dtm_join_transaction(xid);
begin transaction;
update...;
commit;

I think the idea here is that the commit command doesn't really
commit; it just escapes the distributed transaction while leaving it
marked not-committed.  When the transaction subsequently commits on
the local server, the XID is marked committed and the effects of the
transaction become visible on all nodes.

I think that this API is intended to provide not only consistent
cross-node decisions about whether a particular transaction has
committed, but also consistent visibility.  If the API is sufficient
for that and if it can be made sufficiently performant, that's a
strictly stronger guarantee than what this proposal would provide.

On the other hand, I see a couple of problems:

1. The extensible transaction manager API is meant to be pluggable.
Depending on which XTM module you choose to load, the SQL that needs
to be executed by postgres_fdw on the remote node will vary.
postgres_fdw shouldn't have knowledge of all the possible XTMs out
there, so it would need some way to know what SQL to send.

2. If the remote server isn't running the same XTM as the local
server, or if it is running the same XTM but is not part of the same
group of cooperating nodes as the local server, then we can't send a
command to join the distributed transaction at all.  In that case, the
2PC for FDW approach is still, maybe, useful.

On the whole, I'm inclined to think that the XTM-based approach is
probably more useful and more general, if we can work out the problems
with it.  I'm not sure that I'm right, though, nor am I sure how hard
it will be.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Amit Kapila

Date:

07 November 2015, 07:22:44

On Sat, Nov 7, 2015 at 12:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Aug 12, 2015 at 6:25 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
> > The previous patch would not compile on the latest HEAD. Here's updated
> > patch.
>
> Perhaps unsurprisingly, this doesn't apply any more. But we have
> bigger things to worry about.
>
> The recent eXtensible Transaction Manager and the slides shared at the
> Vienna sharding summit, now posted at
> https://drive.google.com/file/d/0B8hhdhUVwRHyMXpRRHRSLWFXeXc/view make
> me think that some careful thought is needed here about what we want
> and how it should work. Slide 10 proposes a method for the extensible
> transaction manager API to interact with FDWs. The FDW would do this:
>
> select dtm_join_transaction(xid);
> begin transaction;
> update...;
> commit;
>
> I think the idea here is that the commit command doesn't really
> commit; it just escapes the distributed transaction while leaving it
> marked not-committed. When the transaction subsequently commits on
> the local server, the XID is marked committed and the effects of the
> transaction become visible on all nodes.

As per my reading of the slides you shared, the commit in the above
context would send a message to the Arbiter indicating the node's vote
that it is ready to commit, and when the Arbiter has the votes from all
nodes participating in the transaction, it sends back an ok message
(this is what I could understand from slides 12 and 13).  I think that
on receiving the ok message each node will mark the transaction as
committed.
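
The voting flow as read above can be modelled in a few lines. This is a toy illustration of that reading only, not the pg_dtm protocol or its API; the Arbiter class and its methods are hypothetical.

```python
class Arbiter:
    """Toy arbiter: collects ready-to-commit votes per transaction."""

    def __init__(self):
        self._expected = {}  # xid -> frozenset of participating nodes
        self._votes = {}     # xid -> set of nodes that have voted so far

    def begin(self, xid, participants):
        """Register the nodes that take part in transaction xid."""
        self._expected[xid] = frozenset(participants)
        self._votes[xid] = set()

    def vote(self, xid, node):
        """Record a node's vote; return True once all participants have voted.

        A True result stands for the "ok" message after which every node
        would mark the transaction as committed.
        """
        if node not in self._expected[xid]:
            raise ValueError(f"{node} does not participate in xid {xid}")
        self._votes[xid].add(node)
        return self._votes[xid] == self._expected[xid]
```

Every vote for every transaction passes through this one object, which is the contention concern raised later in this mail.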

> I think that this API is intended to provide not only consistent
> cross-node decisions about whether a particular transaction has
> committed, but also consistent visibility. If the API is sufficient
> for that and if it can be made sufficiently performant, that's a
> strictly stronger guarantee than what this proposal would provide.
>
>
>
> On the whole, I'm inclined to think that the XTM-based approach is
> probably more useful and more general, if we can work out the problems
> with it. I'm not sure that I'm right, though, nor am I sure how hard
> it will be.
>

If I understood correctly, the main difference between the 2PC idea
used in this patch (assuming we find some way of sharing snapshots in
this approach) and what is shared in the slides is that the XTM-based
approach relies on an external entity, which it refers to as the
Arbiter, for performing consistent transaction commit/abort and for
sharing snapshots across all the nodes, whereas in the approach in this
patch the transaction originator (we can call it the coordinator) is
responsible for consistent transaction commit/abort.  I think the plus
point of the XTM-based approach is that it provides a way of sharing
snapshots, but we still need to evaluate the communication overhead of
the two methods.  As far as I can see, in the Arbiter-based approach
the Arbiter could become a single point of contention for coordinating
messages for all the transactions in a system, whereas if we extend the
approach in this patch such contention could be avoided.  It is quite
possible that fewer messages are exchanged between nodes in the
Arbiter-based approach, but contention could still play a major role.

Another important point which needs more thought before settling on
either approach is detection of deadlocks between different nodes.  The
slides you shared do not discuss deadlocks, so it is not clear whether
deadlock detection will work as it is or whether a modified deadlock
detection system is needed and, if so, how that would be achieved.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Transactions involving multiple postgres foreign servers

From

Amit Kapila

Date:

07 November 2015, 14:16:41

On Sat, Nov 7, 2015 at 12:52 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Nov 7, 2015 at 12:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On the whole, I'm inclined to think that the XTM-based approach is
> > probably more useful and more general, if we can work out the problems
> > with it. I'm not sure that I'm right, though, nor am I sure how hard
> > it will be.
> >
>
> If I understood correctly, then the main difference between 2PC idea
> used in this patch (considering we find some way of sharing snapshots
> in this approach) and what is shared in slides is that XTM-based
> approach

Read it as DTM-based approach.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

09 November 2015, 07:00:02

On Sat, Nov 7, 2015 at 12:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 12, 2015 at 6:25 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> The previous patch would not compile on the latest HEAD. Here's updated
> patch.

Perhaps unsurprisingly, this doesn't apply any more. But we have
bigger things to worry about.

The recent eXtensible Transaction Manager and the slides shared at the
Vienna sharding summit, now posted at
https://drive.google.com/file/d/0B8hhdhUVwRHyMXpRRHRSLWFXeXc/view make
me think that some careful thought is needed here about what we want
and how it should work. Slide 10 proposes a method for the extensible
transaction manager API to interact with FDWs. The FDW would do this:

select dtm_join_transaction(xid);
begin transaction;
update...;
commit;

I think the idea here is that the commit command doesn't really
commit; it just escapes the distributed transaction while leaving it
marked not-committed. When the transaction subsequently commits on
the local server, the XID is marked committed and the effects of the
transaction become visible on all nodes.

Since the foreign server (referred to in the slides as the secondary server) is required to run "create extension pg_dtm" and "select dtm_join_transaction(xid);", I assume that the foreign server has to be a PostgreSQL server which has this extension installed and is of a version that supports it. So we cannot use the extension for all FDWs, and even for postgres_fdw it can be used only with foreign servers that have the above capabilities. The slides mention just FDW, but I think they mean postgres_fdw and not all FDWs.

I think that this API is intended to provide not only consistent
cross-node decisions about whether a particular transaction has
committed, but also consistent visibility. If the API is sufficient
for that and if it can be made sufficiently performant, that's a
strictly stronger guarantee than what this proposal would provide.

On the other hand, I see a couple of problems:

1. The extensible transaction manager API is meant to be pluggable.
Depending on which XTM module you choose to load, the SQL that needs
to be executed by postgres_fdw on the remote node will vary.
postgres_fdw shouldn't have knowledge of all the possible XTMs out
there, so it would need some way to know what SQL to send.

2. If the remote server isn't running the same XTM as the local
server, or if it is running the same XTM but is not part of the same
group of cooperating nodes as the local server, then we can't send a
command to join the distributed transaction at all. In that case, the
2PC for FDW approach is still, maybe, useful.

Elaborating more on this: slide 11 shows the arbiter protocol to start a transaction and the next slide shows the same for commit. Slide 15 shows the transaction flow diagram for tsDTM. The DTM approach doesn't specify how xids are communicated between nodes, but it's implicit in the protocol that the xid space is shared by the nodes. Similarly, tsDTM assumes that the CSN space is shared by all the nodes (see synchronization for max(CSN)). This cannot be assumed for FDWs (not even postgres_fdw), where foreign servers are independent entities with independent xid spaces.

On the whole, I'm inclined to think that the XTM-based approach is
probably more useful and more general, if we can work out the problems
with it. I'm not sure that I'm right, though, nor am I sure how hard
it will be.

2PC for FDW and XTM are trying to solve different problems with some commonality. 2PC for FDW is trying to solve the problem of atomic commit (I am borrowing the terminology you used at PGCon 2015) for FDWs in general (although limited to FDWs which can support two-phase commit), whereas XTM tries to solve the problems of atomic visibility, atomic commit and consistency for postgres_fdw where the foreign servers support XTM. The only thing common between the two is atomic commit.

If we accept XTM and discard 2PC for FDW, we will not be able to support atomic commit for FDWs in general. That, I think, would be a serious limitation for Postgres FDWs, especially now that DMLs are allowed. If we accept only 2PC for FDW and discard XTM, we won't be able to get atomic visibility and consistency for postgres_fdw with foreign servers supporting XTM. That would again be a serious limitation for solutions implementing sharding, multi-master clusters etc.

There are approaches like [1] by which a cluster of heterogeneous servers (with some level of snapshot isolation) can be constructed. Ideally that will enable PostgreSQL users to maximize their utilization of FDWs.

Any distributed transaction management requires 2PC in some form or other. So we should implement 2PC for FDW keeping in mind the various forms of 2PC used in practice, and then use that infrastructure to build XTM-like capabilities for restricted postgres_fdw uses. I have previously asked the authors of XTM to look at my patch and give me feedback about their requirements for implementing the 2PC part of XTM, but I have not heard anything from them.

1. https://domino.mpi-inf.mpg.de/intranet/ag5/ag5publ.nsf/1c0a12a383dd2cd8c125613300585c64/7684dd8109a5b3d5c1256de40051686f/$FILE/tdd99.pdf

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Konstantin Knizhnik

Date:

09 November 2015, 08:55:15


On 06.11.2015 21:37, Robert Haas wrote:
> On Wed, Aug 12, 2015 at 6:25 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> The previous patch would not compile on the latest HEAD. Here's updated
>> patch.
> Perhaps unsurprisingly, this doesn't apply any more.  But we have
> bigger things to worry about.
>
> The recent eXtensible Transaction Manager and the slides shared at the
> Vienna sharding summit, now posted at
> https://drive.google.com/file/d/0B8hhdhUVwRHyMXpRRHRSLWFXeXc/view make
> me think that some careful thought is needed here about what we want
> and how it should work. Slide 10 proposes a method for the extensible
> transaction manager API to interact with FDWs.  The FDW would do this:
>
> select dtm_join_transaction(xid);
> begin transaction;
> update...;
> commit;
>
> I think the idea here is that the commit command doesn't really
> commit; it just escapes the distributed transaction while leaving it
> marked not-committed.  When the transaction subsequently commits on
> the local server, the XID is marked committed and the effects of the
> transaction become visible on all nodes.
>
> I think that this API is intended to provide not only consistent
> cross-node decisions about whether a particular transaction has
> committed, but also consistent visibility.  If the API is sufficient
> for that and if it can be made sufficiently performant, that's a
> strictly stronger guarantee than what this proposal would provide.
>
> On the other hand, I see a couple of problems:
>
> 1. The extensible transaction manager API is meant to be pluggable.
> Depending on which XTM module you choose to load, the SQL that needs
> to be executed by postgres_fdw on the remote node will vary.
> postgres_fdw shouldn't have knowledge of all the possible XTMs out
> there, so it would need some way to know what SQL to send.
>
> 2. If the remote server isn't running the same XTM as the local
> server, or if it is running the same XTM but is not part of the same
> group of cooperating nodes as the local server, then we can't send a
> command to join the distributed transaction at all.  In that case, the
> 2PC for FDW approach is still, maybe, useful.
>
> On the whole, I'm inclined to think that the XTM-based approach is
> probably more useful and more general, if we can work out the problems
> with it.  I'm not sure that I'm right, though, nor am I sure how hard
> it will be.
Sorry, but so far we have considered only the case of a homogeneous environment,
where all cluster instances run PostgreSQL with the same XTM implementation.
I can imagine situations where it may be useful to coordinate transaction
processing in a heterogeneous cluster, but that seems to be quite an exotic
use case.
Combining several different databases in one cluster can be explained by
historical reasons or by the specifics of a particular system architecture.
But I cannot imagine any reason for using different XTM implementations,
and especially for mixing them in one transaction.

Re: Transactions involving multiple postgres foreign servers

From

Konstantin Knizhnik

Date:

09 November 2015, 10:02:05

On 09.11.2015 09:59, Ashutosh Bapat wrote:

> Since the foreign server (referred to in the slides as secondary server) requires to call "create extension pg_dtm" and select dtm_join_transaction(xid);, I assume that the foreign server has to be a PostgreSQL server and one which has this extension installed and has a version that can support this extension. So, we can not use the extension for all FDWs and even for postgres_fdw it can be used only for a foreign server with above capabilities. The slides mention just FDW but I think they mean postgres_fdw and not all FDWs.

The DTM approach is based on sharing XIDs and snapshots between different cluster nodes, so it can really be implemented easily only for PostgreSQL. So I really do have postgres_fdw in mind rather than an abstract FDW.
The approach with timestamps is more universal and in principle can be used for any DBMS where visibility is based on CSNs.

>> I think that this API is intended to provide not only consistent
>> cross-node decisions about whether a particular transaction has
>> committed, but also consistent visibility. If the API is sufficient
>> for that and if it can be made sufficiently performant, that's a
>> strictly stronger guarantee than what this proposal would provide.
>>
>> On the other hand, I see a couple of problems:
>>
>> 1. The extensible transaction manager API is meant to be pluggable.
>> Depending on which XTM module you choose to load, the SQL that needs
>> to be executed by postgres_fdw on the remote node will vary.
>> postgres_fdw shouldn't have knowledge of all the possible XTMs out
>> there, so it would need some way to know what SQL to send.
>>
>> 2. If the remote server isn't running the same XTM as the local
>> server, or if it is running the same XTM but is not part of the same
>> group of cooperating nodes as the local server, then we can't send a
>> command to join the distributed transaction at all. In that case, the
>> 2PC for FDW approach is still, maybe, useful.

> Elaborating more on this: Slide 11 shows arbiter protocol to start a transaction and next slide shows the same for commit. Slide 15 shows the transaction flow diagram for tsDTM. In DTM approach it doesn't specify how xids are communicated between nodes, but it's implicit in the protocol that xid space is shared by the nodes. Similarly for tsDTM it assumes that CSN space is shared by all the nodes (see synchronization for max(CSN)). This can not be assumed for FDWs (not even postgres_fdw) where foreign servers are independent entities with independent xid space.

The proposed DTM architecture includes a "coordinator". The coordinator is a process responsible for managing the logic of a distributed transaction. It can be just a normal client application, or it can be an intermediate master node (as in the case of pg_shard).
It can also be a PostgreSQL instance (as in the case of postgres_fdw) or not. We try to put as few restrictions on the "coordinator" as possible.
It should just communicate with PostgreSQL backends using any communication protocol it likes (e.g. libpq) and invoke some special stored procedures which are part of the particular DTM extension. Such functions also impose a protocol for exchanging data between the different nodes involved in a distributed transaction. In this way we propagate XIDs/CSNs between nodes which may not even know about each other.
In the DTM approach, nodes only know the location of the "arbiter". In the tsDTM approach there is not even an arbiter...

>> On the whole, I'm inclined to think that the XTM-based approach is
>> probably more useful and more general, if we can work out the problems
>> with it. I'm not sure that I'm right, though, nor am I sure how hard
>> it will be.

> 2PC for FDW and XTM are trying to solve different problems with some commonality. 2PC for FDW is trying to solve problem of atomic commit (I am borrowing from the terminology you used in PGCon 2015) for FDWs in general (although limited to FDWs which can support 2 phase commit) and XTM tries to solve problems of atomic visibility, atomic commit and consistency for postgres_fdw where foreign servers support XTM. The only thing common between these two is atomic commit.

> If we accept XTM and discard 2PC for FDW, we will not be able to support atomic commit for FDWs in general. That, I think would be serious limitation for Postgres FDW, esp. now that DMLs are allowed. If we accept only 2PC for FDW and discard XTM, we won't be able to get atomic visibility and consistency for postgres_fdw with foreign servers supporting XTM. That would be again serious limitation for solutions implementing sharding, multi-master clusters etc.
>
> There are approaches like [1] by which cluster of heterogenous servers (with some level of snapshot isolation) can be constructed. Ideally that will enable PostgreSQL users to maximize their utilization of FDWs.
>
> Any distributed transaction management requires 2PC in some or other form. So, we should implement 2PC for FDW keeping in mind various forms of 2PC used practically. Use that infrastructure to build XTM like capabilities for restricted postgres_fdw uses. Previously, I have requested the authors of XTM to look at my patch and provide me feedback about their requirements for implementing 2PC part of XTM. But I have not heard anything from them.
>
> 1. https://domino.mpi-inf.mpg.de/intranet/ag5/ag5publ.nsf/1c0a12a383dd2cd8c125613300585c64/7684dd8109a5b3d5c1256de40051686f/$FILE/tdd99.pdf

Sorry, maybe I missed some message, but I have not received a request from you to provide feedback on your patch.


Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

09 November 2015, 11:02:05

>> Any distributed transaction management requires 2PC in some or other form. So, we should implement 2PC for FDW keeping in mind various forms of 2PC used practically. Use that infrastructure to build XTM like capabilities for restricted postgres_fdw uses. Previously, I have requested the authors of XTM to look at my patch and provide me feedback about their requirements for implementing 2PC part of XTM. But I have not heard anything from them.
>>
>> 1. https://domino.mpi-inf.mpg.de/intranet/ag5/ag5publ.nsf/1c0a12a383dd2cd8c125613300585c64/7684dd8109a5b3d5c1256de40051686f/$FILE/tdd99.pdf
>
> Sorry, may be I missed some message. but I have not received request from you to provide feedback concerning your patch.

See my mail on 31st August on hackers in the thread with subject "Horizontal scalability/sharding".

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

09 November 2015, 11:55:32

On Sat, Nov 7, 2015 at 12:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Aug 12, 2015 at 6:25 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> The previous patch would not compile on the latest HEAD. Here's updated
>> patch.
>
> Perhaps unsurprisingly, this doesn't apply any more. But we have
> bigger things to worry about.

Here's updated patch. I didn't use version numbers in file names in my previous patches. I am starting from this onwards.

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

pg_fdw_transact_v1.patch

Re: Transactions involving multiple postgres foreign servers

From

Michael Paquier

Date:

24 December 2015, 03:02:26

On Mon, Nov 9, 2015 at 8:55 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>
>
> On Sat, Nov 7, 2015 at 12:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Wed, Aug 12, 2015 at 6:25 AM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>> > The previous patch would not compile on the latest HEAD. Here's updated
>> > patch.
>>
>> Perhaps unsurprisingly, this doesn't apply any more.  But we have
>> bigger things to worry about.
>>
>
> Here's updated patch. I didn't use version numbers in file names in my
> previous patches. I am starting from this onwards.

Ashutosh, others, this thread has been stalling for more than 1 month
and a half. There is a new patch that still applies (be careful of
whitespaces btw), but no reviews came in. So what should we do? I
would tend to move this patch to the next CF because of a lack of
reviews.
-- 
Michael

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

24 December 2015, 10:03:16

On Thu, Dec 24, 2015 at 8:32 AM, Michael Paquier <michael.paquier@gmail.com> wrote:

> Ashutosh, others, this thread has been stalling for more than 1 month
> and a half. There is a new patch that still applies (be careful of
> whitespaces btw), but no reviews came in. So what should we do? I
> would tend to move this patch to the next CF because of a lack of
> reviews.

Yes, that would help. Thanks.

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Michael Paquier

Date:

24 December 2015, 12:42:40

On Thu, Dec 24, 2015 at 7:03 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Thu, Dec 24, 2015 at 8:32 AM, Michael Paquier <michael.paquier@gmail.com>
>> Ashutosh, others, this thread has been stalling for more than 1 month
>> and a half. There is a new patch that still applies (be careful of
>> whitespaces btw), but no reviews came in. So what should we do? I
>> would tend to move this patch to the next CF because of a lack of
>> reviews.
>
>
> Yes, that would help. Thanks.

Done.
-- 
Michael

Re: Transactions involving multiple postgres foreign servers

From

Alvaro Herrera

Date:

31 January 2016, 12:43:32

Ashutosh Bapat wrote:

> Here's updated patch. I didn't use version numbers in file names in my
> previous patches. I am starting from this onwards.

Um, I tried this patch and it doesn't apply at all.  There's a large
number of conflicts.  Please update it and resubmit to the next
commitfest.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Transactions involving multiple postgres foreign servers

From

Alvaro Herrera

Date:

01 February 2016, 14:14:50

Alvaro Herrera wrote:
> Ashutosh Bapat wrote:
> 
> > Here's updated patch. I didn't use version numbers in file names in my
> > previous patches. I am starting from this onwards.
> 
> Um, I tried this patch and it doesn't apply at all.  There's a large
> number of conflicts.  Please update it and resubmit to the next
> commitfest.

Also, please run "git show --check" or "git diff origin/master --check"
and fix the whitespace problems that it shows.  It's an easy thing but
there are a lot of red squares on my screen.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Transactions involving multiple postgres foreign servers

From

Vinayak Pokale

Date:

26 August 2016, 04:32:57

Hi All,

Ashutosh proposed the feature 2PC for FDW for achieving atomic commits across multiple foreign servers.
If a transaction makes changes to more than two foreign servers, the current implementation in postgres_fdw doesn't make sure that either all of them commit or all of them roll back their changes.

We (Masahiko Sawada and I) are reopening this thread and will try to contribute to it.

2PC for FDW
============
The patch provides support for atomic commit for transactions involving foreign servers. When a transaction makes changes to foreign servers, either all the changes to all the foreign servers commit or all of them roll back.

The new 2PC for FDW patch set includes the following:

1. Patch 0001 introduces a generic feature. All kinds of FDWs that support 2PC, such as oracle_fdw, mysql_fdw, postgres_fdw etc., can take part in the transaction.

Currently we can push some conditions down to shard nodes, especially since the direct modification feature was introduced in 9.6. But a transaction that modifies data on shard nodes is not executed reliably.
With patch 0002, such modifications are executed with 2PC. This means that we can almost provide a sharding solution using multiple PostgreSQL servers (one parent node and several shard nodes).

For multi-master, we definitely need a transaction manager, but the transaction manager can probably use this 2PC for FDW feature to manage distributed transactions.

2. Patch 0002 makes it possible for postgres_fdw to use 2PC.

Patch 0002 makes postgres_fdw use the APIs below. These APIs are generic features which can be used by all kinds of FDWs.

    a. Execute PREPARE TRANSACTION and COMMIT/ROLLBACK PREPARED instead of COMMIT/ABORT on foreign servers which support 2PC.
    b. Manage information for the foreign prepared transactions resolver.

Masahiko Sawada will post the patch.

Suggestions and comments are helpful to implement this feature.

Regards,

Vinayak Pokale

Re: Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

26 August 2016, 05:53:38

On Fri, Aug 26, 2016 at 1:32 PM, Vinayak Pokale <vinpokale@gmail.com> wrote:
> Hi All,
>
> Ashutosh proposed the feature 2PC for FDW for achieving atomic commits
> across multiple foreign servers.
> If a transaction make changes to more than two foreign servers the current
> implementation in postgres_fdw doesn't make sure that either all of them
> commit or all of them rollback their changes.
>
> We (Masahiko Sawada and me) reopen this thread and trying to contribute in
> it.
>
> 2PC for FDW
> ============
> The patch provides support for atomic commit for transactions involving
> foreign servers. when the transaction makes changes to foreign servers,
> either all the changes to all the foreign servers commit or rollback.
>
> The new patch 2PC for FDW include the following things:
> 1. The patch 0001 introduces a generic feature. All kinds of FDW that
> support 2PC such as oracle_fdw, mysql_fdw, postgres_fdw etc. can involve in
> the transaction.
>
> Currently we can push some conditions down to shard nodes, especially in 9.6
> the directly modify feature has
> been introduced. But such a transaction modifying data on shard node is not
> executed surely.
> Using 0002 patch, that modify is executed with 2PC. It means that we almost
> can provide sharding solution using
> multiple PostgreSQL server (one parent node and several shared node).
>
> For multi master, we definitely need transaction manager but transaction
> manager probably can use this 2PC for FDW feature to manage distributed
> transaction.
>
> 2. 0002 patch makes postgres_fdw possible to use 2PC.
>
> 0002 patch makes postgres_fdw to use below APIs. These APIs are generic
> features which can be used by all kinds of FDWs.
>
>     a. Execute PREAPRE TRANSACTION and COMMIT/ABORT PREAPRED instead of
> COMMIT/ABORT on foreign server which supports 2PC.
>     b. Manage information of foreign prepared transactions resolver
>
> Masahiko Sawada will post the patch.
>
>

Still lot of work to do but attached latest patches.
These are based on the patch Ashutosh posted before, I revised it and
divided into two patches.
Compare with original patch, patch of pg_fdw_xact_resolver and
documentation are lacked.

Feedback and suggestion are very welcome.

Regards,

--
Masahiko Sawada

Attachment

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

26 August 2016, 06:03:09

On Fri, Aug 26, 2016 at 11:22 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:


Thanks Vinayak and Sawada-san for taking this forward and basing your work on my patch.

> Still lot of work to do but attached latest patches.
> These are based on the patch Ashutosh posted before, I revised it and
> divided into two patches.
> Compare with original patch, patch of pg_fdw_xact_resolver and
> documentation are lacked.

I am not able to understand the last statement.

Do you mean to say that your patches do not have pg_fdw_xact_resolver() and the documentation that my patches had?

OR do you mean to say that my patches did not have (lacked) pg_fdw_xact_resolver() and documentation?

OR some combination of those?

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

26 August 2016, 06:07:49

On Fri, Aug 26, 2016 at 3:03 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>
>
> On Fri, Aug 26, 2016 at 11:22 AM, Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
>>
>
> Thanks Vinayak and Sawada-san for taking this forward and basing your work
> on my patch.
>
>>
>> Still lot of work to do but attached latest patches.
>> These are based on the patch Ashutosh posted before, I revised it and
>> divided into two patches.
>> Compare with original patch, patch of pg_fdw_xact_resolver and
>> documentation are lacked.
>
>
> I am not able to understand the last statement.

Sorry to confuse you.

> Do you mean to say that your patches do not have pg_fdw_xact_resolver() and
> documentation that my patches had?

Yes.
I'm confirming them that your patches had.

Regards,

--
Masahiko Sawada

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

26 August 2016, 06:13:09

On Fri, Aug 26, 2016 at 11:37 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> > Do you mean to say that your patches do not have pg_fdw_xact_resolver() and
> > documentation that my patches had?
>
> Yes.
> I'm confirming them that your patches had.

Thanks for the clarification. I had added pg_fdw_xact_resolver() to resolve any transactions which can not be resolved immediately after they were prepared. There was a comment from Kevin (IIRC) that leaving transactions unresolved on the foreign server keeps the resources locked on those servers. That's not a very good situation. And nobody but the initiating server can resolve those. That functionality is important to make it a complete 2PC solution. So, please consider it to be included in your first set of patches.
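
As a rough illustration of why a resolver is needed, the following Python sketch retries the delivery of an already-decided outcome to foreign servers that were unreachable earlier. It is not the code of pg_fdw_xact_resolver(); `RemoteServer`, `Unreachable` and `resolve_pending` are hypothetical names, and the list of pending entries merely stands in for whatever record the initiating server keeps of foreign prepared transactions.

```python
class Unreachable(Exception):
    """Raised by the toy server when the connection is down."""


class RemoteServer:
    """Toy foreign server (hypothetical) that tracks prepared transactions by GID."""

    def __init__(self, name):
        self.name = name
        self.reachable = True
        self.prepared = set()     # GIDs prepared here, still holding resources
        self.committed = set()

    def resolve(self, gid, commit):
        # Stands in for COMMIT PREPARED / ROLLBACK PREPARED on the remote side.
        if not self.reachable:
            raise Unreachable(self.name)
        self.prepared.discard(gid)
        if commit:
            self.committed.add(gid)


def resolve_pending(pending):
    """Try to resolve every (server, gid, commit) entry; return those still pending.

    Only the initiating server knows the decided outcome, so it is the one
    that has to keep retrying until every prepared transaction is resolved.
    """
    still_pending = []
    for server, gid, commit in pending:
        try:
            server.resolve(gid, commit)
        except Unreachable:
            still_pending.append((server, gid, commit))   # retry on a later pass
    return still_pending
```

A caller would run `resolve_pending` repeatedly (for example from a background job) until it returns an empty list; until then the unresolved transactions keep their resources locked on the foreign server.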

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

26 August 2016, 16:29:57

On Fri, Aug 26, 2016 at 3:13 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>
>
> On Fri, Aug 26, 2016 at 11:37 AM, Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
>>
>> On Fri, Aug 26, 2016 at 3:03 PM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>> >
>> >
>> > On Fri, Aug 26, 2016 at 11:22 AM, Masahiko Sawada
>> > <sawada.mshk@gmail.com>
>> > wrote:
>> >>
>> >> On Fri, Aug 26, 2016 at 1:32 PM, Vinayak Pokale <vinpokale@gmail.com>
>> >> wrote:
>> >> > Hi All,
>> >> >
>> >> > Ashutosh proposed the feature 2PC for FDW for achieving atomic
>> >> > commits
>> >> > across multiple foreign servers.
>> >> > If a transaction make changes to more than two foreign servers the
>> >> > current
>> >> > implementation in postgres_fdw doesn't make sure that either all of
>> >> > them
>> >> > commit or all of them rollback their changes.
>> >> >
>> >> > We (Masahiko Sawada and me) reopen this thread and trying to
>> >> > contribute
>> >> > in
>> >> > it.
>> >> >
>> >> > 2PC for FDW
>> >> > ============
>> >> > The patch provides support for atomic commit for transactions
>> >> > involving
>> >> > foreign servers. when the transaction makes changes to foreign
>> >> > servers,
>> >> > either all the changes to all the foreign servers commit or rollback.
>> >> >
>> >> > The new patch 2PC for FDW include the following things:
>> >> > 1. The patch 0001 introduces a generic feature. All kinds of FDW that
>> >> > support 2PC such as oracle_fdw, mysql_fdw, postgres_fdw etc. can
>> >> > involve
>> >> > in
>> >> > the transaction.
>> >> >
>> >> > Currently we can push some conditions down to shard nodes, especially
>> >> > in
>> >> > 9.6
>> >> > the directly modify feature has
>> >> > been introduced. But such a transaction modifying data on shard node
>> >> > is
>> >> > not
>> >> > executed surely.
>> >> > Using 0002 patch, that modify is executed with 2PC. It means that we
>> >> > almost
>> >> > can provide sharding solution using
>> >> > multiple PostgreSQL server (one parent node and several shared node).
>> >> >
>> >> > For multi master, we definitely need transaction manager but
>> >> > transaction
>> >> > manager probably can use this 2PC for FDW feature to manage
>> >> > distributed
>> >> > transaction.
>> >> >
>> >> > 2. 0002 patch makes postgres_fdw possible to use 2PC.
>> >> >
>> >> > 0002 patch makes postgres_fdw to use below APIs. These APIs are
>> >> > generic
>> >> > features which can be used by all kinds of FDWs.
>> >> >
>> >> >     a. Execute PREPARE TRANSACTION and COMMIT/ABORT PREPARED instead
>> >> > of
>> >> > COMMIT/ABORT on foreign server which supports 2PC.
>> >> >     b. Manage information of foreign prepared transactions resolver
>> >> >
>> >> > Masahiko Sawada will post the patch.
>> >> >
>> >> >
>> >>
>> >
>> > Thanks Vinayak and Sawada-san for taking this forward and basing your
>> > work
>> > on my patch.
>> >
>> >>
>> >> Still lot of work to do but attached latest patches.
>> >> These are based on the patch Ashutosh posted before, I revised it and
>> >> divided into two patches.
>> >> Compare with original patch, patch of pg_fdw_xact_resolver and
>> >> documentation are lacked.
>> >
>> >
>> > I am not able to understand the last statement.
>>
>> Sorry to confuse you.
>>
>> > Do you mean to say that your patches do not have pg_fdw_xact_resolver()
>> > and
>> > documentation that my patches had?
>>
>> Yes.
>> I'm confirming them that your patches had.
>
>
> Thanks for the clarification. I had added pg_fdw_xact_resolver() to resolve
> any transactions which can not be resolved immediately after they were
> prepared. There was a comment from Kevin (IIRC) that leaving transactions
> unresolved on the foreign server keeps the resources locked on those
> servers. That's not a very good situation. And nobody but the initiating
> server can resolve those. That functionality is important to make it a
> complete 2PC solution. So, please consider it to be included in your first
> set of patches.
>

Yeah, I know the reason why pg_fdw_xact_resolver is required.
I will add it as a separate patch.

Regards,

--
Masahiko Sawada

Re: Transactions involving multiple postgres foreign servers

From

vinayak

Date:

07 September 2016, 01:56:41

On 2016/08/26 15:13, Ashutosh Bapat wrote:

On Fri, Aug 26, 2016 at 11:37 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Aug 26, 2016 at 3:03 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>
>
> On Fri, Aug 26, 2016 at 11:22 AM, Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
>>
>> On Fri, Aug 26, 2016 at 1:32 PM, Vinayak Pokale <vinpokale@gmail.com>
>> wrote:
>> > Hi All,
>> >
>> > Ashutosh proposed the feature 2PC for FDW for achieving atomic commits
>> > across multiple foreign servers.
>> > If a transaction make changes to more than two foreign servers the
>> > current
>> > implementation in postgres_fdw doesn't make sure that either all of them
>> > commit or all of them rollback their changes.
>> >
>> > We (Masahiko Sawada and me) reopen this thread and trying to contribute
>> > in
>> > it.
>> >
>> > 2PC for FDW
>> > ============
>> > The patch provides support for atomic commit for transactions involving
>> > foreign servers. when the transaction makes changes to foreign servers,
>> > either all the changes to all the foreign servers commit or rollback.
>> >
>> > The new patch 2PC for FDW include the following things:
>> > 1. The patch 0001 introduces a generic feature. All kinds of FDW that
>> > support 2PC such as oracle_fdw, mysql_fdw, postgres_fdw etc. can involve
>> > in
>> > the transaction.
>> >
>> > Currently we can push some conditions down to shard nodes, especially in
>> > 9.6
>> > the directly modify feature has
>> > been introduced. But such a transaction modifying data on shard node is
>> > not
>> > executed surely.
>> > Using 0002 patch, that modify is executed with 2PC. It means that we
>> > almost
>> > can provide sharding solution using
>> > multiple PostgreSQL server (one parent node and several shared node).
>> >
>> > For multi master, we definitely need transaction manager but transaction
>> > manager probably can use this 2PC for FDW feature to manage distributed
>> > transaction.
>> >
>> > 2. 0002 patch makes postgres_fdw possible to use 2PC.
>> >
>> > 0002 patch makes postgres_fdw to use below APIs. These APIs are generic
>> > features which can be used by all kinds of FDWs.
>> >
>> > a. Execute PREPARE TRANSACTION and COMMIT/ABORT PREPARED instead of
>> > COMMIT/ABORT on foreign server which supports 2PC.
>> > b. Manage information of foreign prepared transactions resolver
>> >
>> > Masahiko Sawada will post the patch.
>> >
>> >
>>
>
> Thanks Vinayak and Sawada-san for taking this forward and basing your work
> on my patch.
>
>>
>> Still lot of work to do but attached latest patches.
>> These are based on the patch Ashutosh posted before, I revised it and
>> divided into two patches.
>> Compare with original patch, patch of pg_fdw_xact_resolver and
>> documentation are lacked.
>
>
> I am not able to understand the last statement.

Sorry to confuse you.

> Do you mean to say that your patches do not have pg_fdw_xact_resolver() and
> documentation that my patches had?

Yes.
I'm confirming them that your patches had.

Thanks for the clarification. I had added pg_fdw_xact_resolver() to resolve any transactions which can not be resolved immediately after they were prepared. There was a comment from Kevin (IIRC) that leaving transactions unresolved on the foreign server keeps the resources locked on those servers. That's not a very good situation. And nobody but the initiating server can resolve those. That functionality is important to make it a complete 2PC solution. So, please consider it to be included in your first set of patches.

The attached patch includes pg_fdw_xact_resolver.

Regards,
Vinayak Pokale
NTT Open Source Software Center

Attachment

0003-pg-fdw-xact-resolver.patch

Re: Transactions involving multiple postgres foreign servers

From

vinayak

Date:

26 September 2016, 07:32:08

On 2016/09/07 10:54, vinayak wrote:

Thanks for the clarification. I had added pg_fdw_xact_resolver() to resolve any transactions which can not be resolved immediately after they were prepared. There was a comment from Kevin (IIRC) that leaving transactions unresolved on the foreign server keeps the resources locked on those servers. That's not a very good situation. And nobody but the initiating server can resolve those. That functionality is important to make it a complete 2PC solution. So, please consider it to be included in your first set of patches.
The attached patch included pg_fdw_xact_resolver.

The attached patch includes the documentation.

Regards,
Vinayak Pokale
NTT Open Source Software Center

Attachment

0001-Support-transaction-with-foreign-servers.patch

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

26 September 2016, 10:28:58

My original patch added code to manage the files for two-phase
transactions opened by the local server on the remote servers. This
code was mostly inspired by the code in twophase.c which manages the
files for prepared transactions. The logic to manage 2PC files has
changed since [1] and has been optimized. One of the things I wanted
to do is see whether those optimizations are applicable here as well.
Have you considered that?


[1]. https://www.postgresql.org/message-id/74355FCF-AADC-4E51-850B-47AF59E0B215%40postgrespro.ru

On Fri, Aug 26, 2016 at 11:43 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>
>
> On Fri, Aug 26, 2016 at 11:37 AM, Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
>>
>> On Fri, Aug 26, 2016 at 3:03 PM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>> >
>> >
>> > On Fri, Aug 26, 2016 at 11:22 AM, Masahiko Sawada
>> > <sawada.mshk@gmail.com>
>> > wrote:
>> >>
>> >> On Fri, Aug 26, 2016 at 1:32 PM, Vinayak Pokale <vinpokale@gmail.com>
>> >> wrote:
>> >> > Hi All,
>> >> >
>> >> > Ashutosh proposed the feature 2PC for FDW for achieving atomic
>> >> > commits
>> >> > across multiple foreign servers.
>> >> > If a transaction make changes to more than two foreign servers the
>> >> > current
>> >> > implementation in postgres_fdw doesn't make sure that either all of
>> >> > them
>> >> > commit or all of them rollback their changes.
>> >> >
>> >> > We (Masahiko Sawada and me) reopen this thread and trying to
>> >> > contribute
>> >> > in
>> >> > it.
>> >> >
>> >> > 2PC for FDW
>> >> > ============
>> >> > The patch provides support for atomic commit for transactions
>> >> > involving
>> >> > foreign servers. when the transaction makes changes to foreign
>> >> > servers,
>> >> > either all the changes to all the foreign servers commit or rollback.
>> >> >
>> >> > The new patch 2PC for FDW include the following things:
>> >> > 1. The patch 0001 introduces a generic feature. All kinds of FDW that
>> >> > support 2PC such as oracle_fdw, mysql_fdw, postgres_fdw etc. can
>> >> > involve
>> >> > in
>> >> > the transaction.
>> >> >
>> >> > Currently we can push some conditions down to shard nodes, especially
>> >> > in
>> >> > 9.6
>> >> > the directly modify feature has
>> >> > been introduced. But such a transaction modifying data on shard node
>> >> > is
>> >> > not
>> >> > executed surely.
>> >> > Using 0002 patch, that modify is executed with 2PC. It means that we
>> >> > almost
>> >> > can provide sharding solution using
>> >> > multiple PostgreSQL server (one parent node and several shared node).
>> >> >
>> >> > For multi master, we definitely need transaction manager but
>> >> > transaction
>> >> > manager probably can use this 2PC for FDW feature to manage
>> >> > distributed
>> >> > transaction.
>> >> >
>> >> > 2. 0002 patch makes postgres_fdw possible to use 2PC.
>> >> >
>> >> > 0002 patch makes postgres_fdw to use below APIs. These APIs are
>> >> > generic
>> >> > features which can be used by all kinds of FDWs.
>> >> >
>> >> >     a. Execute PREPARE TRANSACTION and COMMIT/ABORT PREPARED instead
>> >> > of
>> >> > COMMIT/ABORT on foreign server which supports 2PC.
>> >> >     b. Manage information of foreign prepared transactions resolver
>> >> >
>> >> > Masahiko Sawada will post the patch.
>> >> >
>> >> >
>> >>
>> >
>> > Thanks Vinayak and Sawada-san for taking this forward and basing your
>> > work
>> > on my patch.
>> >
>> >>
>> >> Still lot of work to do but attached latest patches.
>> >> These are based on the patch Ashutosh posted before, I revised it and
>> >> divided into two patches.
>> >> Compare with original patch, patch of pg_fdw_xact_resolver and
>> >> documentation are lacked.
>> >
>> >
>> > I am not able to understand the last statement.
>>
>> Sorry to confuse you.
>>
>> > Do you mean to say that your patches do not have pg_fdw_xact_resolver()
>> > and
>> > documentation that my patches had?
>>
>> Yes.
>> I'm confirming them that your patches had.
>
>
> Thanks for the clarification. I had added pg_fdw_xact_resolver() to resolve
> any transactions which can not be resolved immediately after they were
> prepared. There was a comment from Kevin (IIRC) that leaving transactions
> unresolved on the foreign server keeps the resources locked on those
> servers. That's not a very good situation. And nobody but the initiating
> server can resolve those. That functionality is important to make it a
> complete 2PC solution. So, please consider it to be included in your first
> set of patches.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

26 September 2016, 11:56:26

On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> My original patch added code to manage the files for 2 phase
> transactions opened by the local server on the remote servers. This
> code was mostly inspired from the code in twophase.c which manages the
> file for prepared transactions. The logic to manage 2PC files has
> changed since [1] and has been optimized. One of the things I wanted
> to do is see, if those optimizations are applicable here as well. Have
> you considered that?
>
>

Yeah, we're considering it.
After these changes are committed, we will post a patch incorporating
these changes.

But what we need to do first is discuss the design in order to reach
consensus. Since the current design of this patch is to transparently
execute the DCL of 2PC on foreign servers, it changes a lot of code and
is complicated.
Another approach I have in mind is to push down DCL only to foreign
servers that support the 2PC protocol, which is similar to DML push down.
This approach would be simpler than the current idea and would be easy
for a distributed transaction manager to use.
I think that it would be a good place to start.

I'd like to discuss what the best approach is for transactions
involving foreign servers.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

26 September 2016, 12:07:49

On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> My original patch added code to manage the files for 2 phase
>> transactions opened by the local server on the remote servers. This
>> code was mostly inspired from the code in twophase.c which manages the
>> file for prepared transactions. The logic to manage 2PC files has
>> changed since [1] and has been optimized. One of the things I wanted
>> to do is see, if those optimizations are applicable here as well. Have
>> you considered that?
>>
>>
>
> Yeah, we're considering it.
> After these changes are committed, we will post the patch incorporated
> these changes.
>
> But what we need to do first is the discussion in order to get consensus.
> Since current design of this patch is to transparently execute DCL of
> 2PC on foreign server, this code changes lot of code and is
> complicated.

Can you please elaborate? I am not able to understand what DCL is
involved here. According to [1], examples of DCL are the GRANT and REVOKE
commands.

> Another approach I have is to push down DCL to only foreign servers
> that support 2PC protocol, which is similar to DML push down.
> This approach would be more simpler than current idea and is easy to
> use by distributed transaction manager.

Again, can you please elaborate on how that would be different from the
current approach and how it simplifies the code?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

27 September 2016, 09:25:04

On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> My original patch added code to manage the files for 2 phase
>>> transactions opened by the local server on the remote servers. This
>>> code was mostly inspired from the code in twophase.c which manages the
>>> file for prepared transactions. The logic to manage 2PC files has
>>> changed since [1] and has been optimized. One of the things I wanted
>>> to do is see, if those optimizations are applicable here as well. Have
>>> you considered that?
>>>
>>>
>>
>> Yeah, we're considering it.
>> After these changes are committed, we will post the patch incorporated
>> these changes.
>>
>> But what we need to do first is the discussion in order to get consensus.
>> Since current design of this patch is to transparently execute DCL of
>> 2PC on foreign server, this code changes lot of code and is
>> complicated.
>
> Can you please elaborate. I am not able to understand what DCL is
> involved here. According to [1], examples of DCL are GRANT and REVOKE
> command.

I meant transaction management commands such as PREPARE TRANSACTION and
COMMIT/ABORT PREPARED.
The web page I referred to might be wrong, sorry.

>> Another approach I have is to push down DCL to only foreign servers
>> that support 2PC protocol, which is similar to DML push down.
>> This approach would be more simpler than current idea and is easy to
>> use by distributed transaction manager.
>
> Again, can you please elaborate, how that would be different from the
> current approach and how does it simplify the code.
>

The idea is just to push down PREPARE TRANSACTION and COMMIT/ROLLBACK
PREPARED to foreign servers that support 2PC.
With this idea, the client needs to do the following when a foreign
server is involved in the transaction.

BEGIN;
UPDATE parent_table SET ...; -- update including foreign server
PREPARE TRANSACTION 'xact_id';
COMMIT PREPARED 'xact_id';

The above PREPARE TRANSACTION and COMMIT PREPARED commands are pushed
down to the foreign server.
That is, the client needs to execute PREPARE TRANSACTION and
COMMIT/ROLLBACK PREPARED explicitly.

With this idea, I think that we don't need to do the following:

* Providing the prepare id of 2PC.
  The current patch adds a new API, prepare_id_provider(), but we can use
  the prepare id of 2PC that is used on the parent server.

* Keeping track of the status of foreign servers.
  The current patch keeps track of the status of foreign servers involved
  in the transaction, but this idea is just to push down transaction
  management commands to the foreign server, so I think that we no longer
  need to do that.

* Adding the max_prepared_foreign_transactions parameter.
  The number of transactions involving foreign servers would be the same
  as max_prepared_transactions.
Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

27 September 2016, 12:06:45

On Tue, Sep 27, 2016 at 2:54 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
>>> <ashutosh.bapat@enterprisedb.com> wrote:
>>>> My original patch added code to manage the files for 2 phase
>>>> transactions opened by the local server on the remote servers. This
>>>> code was mostly inspired from the code in twophase.c which manages the
>>>> file for prepared transactions. The logic to manage 2PC files has
>>>> changed since [1] and has been optimized. One of the things I wanted
>>>> to do is see, if those optimizations are applicable here as well. Have
>>>> you considered that?
>>>>
>>>>
>>>
>>> Yeah, we're considering it.
>>> After these changes are committed, we will post the patch incorporated
>>> these changes.
>>>
>>> But what we need to do first is the discussion in order to get consensus.
>>> Since current design of this patch is to transparently execute DCL of
>>> 2PC on foreign server, this code changes lot of code and is
>>> complicated.
>>
>> Can you please elaborate. I am not able to understand what DCL is
>> involved here. According to [1], examples of DCL are GRANT and REVOKE
>> command.
>
> I meant transaction management command such as PREPARE TRANSACTION and
> COMMIT/ABORT PREPARED command.
> The web page I refered might be wrong, sorry.
>
>>> Another approach I have is to push down DCL to only foreign servers
>>> that support 2PC protocol, which is similar to DML push down.
>>> This approach would be more simpler than current idea and is easy to
>>> use by distributed transaction manager.
>>
>> Again, can you please elaborate, how that would be different from the
>> current approach and how does it simplify the code.
>>
>
> The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK
> PREPARED to foreign servers that support 2PC.
> With this idea, the client need to do following operation when foreign
> server is involved with transaction.
>
> BEGIN;
> UPDATE parent_table SET ...; -- update including foreign server
> PREPARE TRANSACTION 'xact_id';
> COMMIT PREPARED 'xact_id';
>
> The above PREPARE TRANSACTION and COMMIT PREPARED command are pushed
> down to foreign server.
> That is, the client needs to execute PREPARE TRANSACTION and
> COMMIT/ROLLBACK PREPARED explicitly.
>
> In this idea, I think that we don't need to do followings,
>
> * Providing the prepare id of 2PC.
>   Current patch adds new API prepare_id_provider() but we can use the
> prepare id of 2PC that is used on parent server.
>
> * Keeping track of status of foreign servers.
>   Current patch keeps track of status of foreign servers involved with
> transaction but this idea is just to push down transaction management
> command to foreign server.
>   So I think that we no longer need to do that.

The problem with this approach is the same as the one previously stated. If
the connection between the local and foreign server is lost between
PREPARE and COMMIT, the prepared transaction on the foreign server
remains dangling: no one other than the local server knows what to do
with it, and the local server has lost track of the prepared
transaction on the foreign server. So just pushing down those
commands doesn't work.
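
The failure window can be illustrated with a small Python simulation. Everything here is a toy: the ForeignServer class and push_down() are hypothetical stand-ins, not postgres_fdw code, and the connection loss is forced by a flag.

```python
class ForeignServer:
    """Toy in-memory stand-in for a foreign server (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.prepared = set()    # gids prepared but not yet resolved
        self.reachable = True

    def execute(self, command, gid):
        if not self.reachable:
            raise ConnectionError(f"lost connection to {self.name}")
        if command == "PREPARE TRANSACTION":
            self.prepared.add(gid)
        elif command in ("COMMIT PREPARED", "ROLLBACK PREPARED"):
            self.prepared.remove(gid)
        else:
            raise ValueError(command)


def push_down(server, gid):
    """Forward the two commands with no record kept on the local server."""
    server.execute("PREPARE TRANSACTION", gid)
    server.reachable = False              # simulate the connection dropping
    try:
        server.execute("COMMIT PREPARED", gid)
        return True
    except ConnectionError:
        return False                      # nothing local remembers the gid


shard = ForeignServer("shard1")
committed = push_down(shard, "xact_id")
print(committed, shard.prepared)
```

Running it prints `False {'xact_id'}`: the transaction stays prepared on the foreign side, and the local side holds no state from which to resolve it later.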

>
> * Adding max_prepared_foreign_transactions parameter.
>   It means that the number of transaction involving foreign server is
> the same as max_prepared_transactions.
>

That isn't true exactly. max_prepared_foreign_transactions indicates
how many transactions can be prepared on the foreign server, which in
the method you propose should have a cap of max_prepared_transactions
* number of foreign servers.
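
As a quick arithmetic check of that cap, here is a tiny Python sketch. The function name and the numbers are illustrative only.

```python
def prepared_foreign_xact_cap(max_prepared_transactions, n_foreign_servers):
    """Upper bound on foreign prepared transactions under the push-down idea.

    Each locally prepared transaction may leave one prepared transaction
    on every foreign server, so the bound is a simple product.
    """
    return max_prepared_transactions * n_foreign_servers


# Illustrative numbers only: 50 local prepared transactions, 4 foreign servers.
print(prepared_foreign_xact_cap(50, 4))  # 200
```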

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Michael Paquier

Date:

28 September 2016, 01:04:24

On Tue, Sep 27, 2016 at 6:24 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> * Providing the prepare id of 2PC.
>   Current patch adds new API prepare_id_provider() but we can use the
> prepare id of 2PC that is used on parent server.

And we assume that when this is used across many servers there will be
no GID conflict because each server is careful enough to generate
unique strings, say with UUIDs?
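
For illustration, a UUID-based prepare id could be generated as in the Python sketch below. The "px" prefix and the layout are made up for this example and are not taken from the patch; the length check reflects PostgreSQL's rule that a prepared-transaction identifier must be less than 200 bytes long.

```python
import uuid

GID_MAX_BYTES = 200  # PostgreSQL requires a prepared-transaction id shorter than this


def make_gid(prefix="px"):
    """Build a prepare id that is practically unique without coordination.

    The "px" prefix and the layout are illustrative, not taken from any patch.
    """
    gid = f"{prefix}_{uuid.uuid4().hex}"
    if len(gid.encode("utf-8")) >= GID_MAX_BYTES:
        raise ValueError("gid too long for PREPARE TRANSACTION")
    return gid


print(make_gid())
```
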
-- 
Michael

Re: Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

28 September 2016, 05:14:31

On Tue, Sep 27, 2016 at 9:06 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Tue, Sep 27, 2016 at 2:54 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
>>>> <ashutosh.bapat@enterprisedb.com> wrote:
>>>>> My original patch added code to manage the files for 2 phase
>>>>> transactions opened by the local server on the remote servers. This
>>>>> code was mostly inspired from the code in twophase.c which manages the
>>>>> file for prepared transactions. The logic to manage 2PC files has
>>>>> changed since [1] and has been optimized. One of the things I wanted
>>>>> to do is see, if those optimizations are applicable here as well. Have
>>>>> you considered that?
>>>>>
>>>>>
>>>>
>>>> Yeah, we're considering it.
>>>> After these changes are committed, we will post the patch incorporated
>>>> these changes.
>>>>
>>>> But what we need to do first is the discussion in order to get consensus.
>>>> Since current design of this patch is to transparently execute DCL of
>>>> 2PC on foreign server, this code changes lot of code and is
>>>> complicated.
>>>
>>> Can you please elaborate. I am not able to understand what DCL is
>>> involved here. According to [1], examples of DCL are GRANT and REVOKE
>>> command.
>>
>> I meant transaction management command such as PREPARE TRANSACTION and
>> COMMIT/ABORT PREPARED command.
>> The web page I refered might be wrong, sorry.
>>
>>>> Another approach I have is to push down DCL to only foreign servers
>>>> that support 2PC protocol, which is similar to DML push down.
>>>> This approach would be more simpler than current idea and is easy to
>>>> use by distributed transaction manager.
>>>
>>> Again, can you please elaborate, how that would be different from the
>>> current approach and how does it simplify the code.
>>>
>>
>> The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK
>> PREPARED to foreign servers that support 2PC.
>> With this idea, the client need to do following operation when foreign
>> server is involved with transaction.
>>
>> BEGIN;
>> UPDATE parent_table SET ...; -- update including foreign server
>> PREPARE TRANSACTION 'xact_id';
>> COMMIT PREPARED 'xact_id';
>>
>> The above PREPARE TRANSACTION and COMMIT PREPARED command are pushed
>> down to foreign server.
>> That is, the client needs to execute PREPARE TRANSACTION and
>> COMMIT/ROLLBACK PREPARED explicitly.
>>
>> In this idea, I think that we don't need to do followings,
>>
>> * Providing the prepare id of 2PC.
>>   Current patch adds new API prepare_id_provider() but we can use the
>> prepare id of 2PC that is used on parent server.
>>
>> * Keeping track of status of foreign servers.
>>   Current patch keeps track of status of foreign servers involved with
>> transaction but this idea is just to push down transaction management
>> command to foreign server.
>>   So I think that we no longer need to do that.
>
> The problem with this approach is same as one previously stated. If
> the connection between local and foreign server is lost between
> PREPARE and COMMIT the prepared transaction on the foreign server
> remains dangling, none other than the local server knows what to do
> with it and the local server has lost track of the prepared
> transaction on the foreign server. So, just pushing down those
> commands doesn't work.

Yeah, my idea is only a first step.
A mechanism that resolves the dangling foreign transactions and the
resolver worker process are necessary.

>>
>> * Adding max_prepared_foreign_transactions parameter.
>>   It means that the number of transaction involving foreign server is
>> the same as max_prepared_transactions.
>>
>
> That isn't true exactly. max_prepared_foreign_transactions indicates
> how many transactions can be prepared on the foreign server, which in
> the method you propose should have a cap of max_prepared_transactions
> * number of foreign servers.

Oh, I understand, thanks.

Considering a sharding solution using postgres_fdw (that is, the parent
postgres server has multiple shard postgres servers), we need to
increase max_prepared_foreign_transactions whenever a new shard server
is added to the cluster, or allocate enough in advance. But estimating
a sufficient max_prepared_foreign_transactions would not be
easy; for example, can we estimate it as (max throughput of the system)
* (the number of foreign servers)?

One new idea I came up with is to embed the transaction id on the parent
server in the global transaction id (gid) that is prepared on the shard
server.
The pg_fdw_resolver worker process then periodically resolves dangling
transactions on the foreign server by comparing the lowest active XID on
the parent server with the XID in the gid used by PREPARE TRANSACTION.

For example, suppose that there are one parent server and one shard
server, and the client executes an update transaction (XID = 100)
involving the foreign server.
In the commit phase, the parent server executes the PREPARE TRANSACTION
command with a gid containing 100, say
'px_<random number>_100_<serverid>_<userid>', on the foreign server.
If the shard server crashes before COMMIT PREPARED, transaction
100 becomes a dangling transaction.

But the resolver worker process on the parent server can resolve it with
the following steps.
1. Get the lowest active XID on the parent server (XID = 110).
2. Connect to the foreign server. (Get the foreign server information from
   the pg_foreign_server system catalog.)
3. Check if there is a prepared transaction with an XID less than 110.
4. Roll back the dangling transaction found in step 3.
   gid 'px_<random number>_100_<serverid>_<userid>' was prepared on the
   foreign server by transaction 100, so roll it back.

With this idea, we need the gid provider API, but the parent server doesn't
need to keep persistent foreign transaction data.
Also, we could remove max_prepared_foreign_transactions, and fdw_xact.c
would become a simpler implementation.
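
The resolver steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the gid layout is the one from the example (with numeric fields), build_gid() and dangling_gids() are hypothetical names, the list of prepared gids stands in for what would be read from the foreign server, and XID wraparound is ignored.

```python
import re

# Assumed gid layout from the example: px_<random>_<xid>_<serverid>_<userid>
GID_RE = re.compile(r"^px_(\d+)_(\d+)_(\d+)_(\d+)$")


def build_gid(random_number, xid, server_id, user_id):
    return f"px_{random_number}_{xid}_{server_id}_{user_id}"


def dangling_gids(prepared_gids, lowest_active_xid, server_id):
    """Return gids on one foreign server that the resolver would roll back.

    A gid qualifies when it was created for this server and its embedded XID
    is older than every transaction still active on the parent server.
    Plain integer comparison is used; XID wraparound is deliberately ignored.
    """
    result = []
    for gid in prepared_gids:
        m = GID_RE.match(gid)
        if m is None:
            continue                      # not created by this mechanism
        xid, sid = int(m.group(2)), int(m.group(3))
        if sid == server_id and xid < lowest_active_xid:
            result.append(gid)
    return result


# Steps 1-4 with the numbers from the example (XID 100, lowest active XID 110).
prepared = [build_gid(4711, 100, 1, 10), build_gid(4712, 115, 1, 10), "manual_gid"]
for gid in dangling_gids(prepared, lowest_active_xid=110, server_id=1):
    print(f"ROLLBACK PREPARED '{gid}'")
```

The sketch follows the steps as written and always rolls back. Choosing between commit and rollback would additionally need the local outcome of the parent transaction, which is outside this sketch.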

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

28 September 2016, 06:30:09

On Wed, Sep 28, 2016 at 10:43 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Sep 27, 2016 at 9:06 PM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> On Tue, Sep 27, 2016 at 2:54 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat
>>> <ashutosh.bapat@enterprisedb.com> wrote:
>>>> On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
>>>>> <ashutosh.bapat@enterprisedb.com> wrote:
>>>>>> My original patch added code to manage the files for 2 phase
>>>>>> transactions opened by the local server on the remote servers. This
>>>>>> code was mostly inspired from the code in twophase.c which manages the
>>>>>> file for prepared transactions. The logic to manage 2PC files has
>>>>>> changed since [1] and has been optimized. One of the things I wanted
>>>>>> to do is see, if those optimizations are applicable here as well. Have
>>>>>> you considered that?
>>>>>>
>>>>>>
>>>>>
>>>>> Yeah, we're considering it.
>>>>> After these changes are committed, we will post the patch incorporated
>>>>> these changes.
>>>>>
>>>>> But what we need to do first is the discussion in order to get consensus.
>>>>> Since current design of this patch is to transparently execute DCL of
>>>>> 2PC on foreign server, this code changes lot of code and is
>>>>> complicated.
>>>>
>>>> Can you please elaborate. I am not able to understand what DCL is
>>>> involved here. According to [1], examples of DCL are GRANT and REVOKE
>>>> command.
>>>
>>> I meant transaction management command such as PREPARE TRANSACTION and
>>> COMMIT/ABORT PREPARED command.
>>> The web page I refered might be wrong, sorry.
>>>
>>>>> Another approach I have is to push down DCL to only foreign servers
>>>>> that support 2PC protocol, which is similar to DML push down.
>>>>> This approach would be more simpler than current idea and is easy to
>>>>> use by distributed transaction manager.
>>>>
>>>> Again, can you please elaborate, how that would be different from the
>>>> current approach and how does it simplify the code.
>>>>
>>>
>>> The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK
>>> PREPARED to foreign servers that support 2PC.
>>> With this idea, the client need to do following operation when foreign
>>> server is involved with transaction.
>>>
>>> BEGIN;
>>> UPDATE parent_table SET ...; -- update including foreign server
>>> PREPARE TRANSACTION 'xact_id';
>>> COMMIT PREPARED 'xact_id';
>>>
>>> The above PREPARE TRANSACTION and COMMIT PREPARED command are pushed
>>> down to foreign server.
>>> That is, the client needs to execute PREPARE TRANSACTION and
>>> COMMIT/ROLLBACK PREPARED explicitly.
>>>
>>> In this idea, I think that we don't need to do followings,
>>>
>>> * Providing the prepare id of 2PC.
>>>   Current patch adds new API prepare_id_provider() but we can use the
>>> prepare id of 2PC that is used on parent server.
>>>
>>> * Keeping track of status of foreign servers.
>>>   Current patch keeps track of status of foreign servers involved with
>>> transaction but this idea is just to push down transaction management
>>> command to foreign server.
>>>   So I think that we no longer need to do that.
>>
>> The problem with this approach is same as one previously stated. If
>> the connection between local and foreign server is lost between
>> PREPARE and COMMIT the prepared transaction on the foreign server
>> remains dangling, none other than the local server knows what to do
>> with it and the local server has lost track of the prepared
>> transaction on the foreign server. So, just pushing down those
>> commands doesn't work.
>
> Yeah, my idea is one of the first step.
> Mechanism that resolves the dangling foreign transaction and the
> resolver worker process are necessary.
>
>>>
>>> * Adding max_prepared_foreign_transactions parameter.
>>>   It means that the number of transaction involving foreign server is
>>> the same as max_prepared_transactions.
>>>
>>
>> That isn't true exactly. max_prepared_foreign_transactions indicates
>> how many transactions can be prepared on the foreign server, which in
>> the method you propose should have a cap of max_prepared_transactions
>> * number of foreign servers.
>
> Oh, I understood, thanks.
>
> Consider sharding solution using postgres_fdw (that is, the parent
> postgres server has multiple shard postgres servers), we need to
> increase max_prepared_foreign_transactions whenever new shard server
> is added to cluster, or to allocate enough size in advance. But the
> estimation of enough max_prepared_foreign_transactions would not be
> easy, for example can we estimate it by (max throughput of the system)
> * (the number of foreign servers)?
>
> One new idea I came up with is that we set transaction id on parent
> server to global transaction id (gid) that is prepared on shard
> server.
> And pg_fdw_resolver worker process periodically resolves the dangling
> transaction on foreign server by comparing active lowest XID on parent
> server with the XID in gid used by PREPARE TRANSACTION.
>
> For example, suppose that there are one parent server and one shard
> server, and the client executes update transaction (XID = 100)
> involving foreign servers.
> In commit phase, parent server executes PREPARE TRANSACTION command
> with gid containing 100, say 'px_<random
> number>_100_<serverid>_<userid>', on foreign server.
> If the shard server crashed before COMMIT PREPARED, the transaction
> 100 becomes a dangling transaction.
>
> But resolver worker process on parent server can resolve it with
> following steps.
> 1. Get lowest active XID on parent server(XID=110).
> 2. Connect to foreign server. (Get foreign server information from
> pg_foreign_server system catalog.)
> 3. Check if there is prepared transaction with XID less than 110.
> 4. Rollback the dangling transaction found at #3 step.
>     gid 'px_<random number>_100_<serverid>_<userid>' is prepared on
> foreign server by transaction 100, rollback it.

Why always rollback any dangling transaction? There can be a case that
a foreign server has a dangling transaction which needs to be
committed because the portions of that transaction on the other shards
are committed.

The way gid is crafted, there is no way to check whether the given
prepared transaction was created by the local server or not. Probably
the local server needs to add a unique signature in GID to identify
the transactions prepared by itself. That signature should be
transferred to the standby to cope with fail-over of the local server.
In this idea, one has to keep on polling the foreign server to find
any dangling transactions. In usual scenario, we shouldn't have a
large number of dangling transactions, and thus periodic polling might
be a waste.

>
> In this idea, we need gid provider API but parent server doesn't need
> to have persistent foreign transaction data.
> Also we could remove max_prepared_foreign_transactions, and fdw_xact.c
> would become more simple implementation.
>

I agree, but we need to cope with above two problems.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Michael Paquier

Date:

03 October 2016, 02:17:39

On Wed, Sep 28, 2016 at 3:30 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> I agree, but we need to cope with above two problems.

I have marked the patch as returned with feedback per the last output
Ashutosh has provided.
-- 
Michael

Re: Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

04 October 2016, 01:48:09

On Wed, Sep 28, 2016 at 3:30 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Sep 28, 2016 at 10:43 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Tue, Sep 27, 2016 at 9:06 PM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>> On Tue, Sep 27, 2016 at 2:54 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> On Mon, Sep 26, 2016 at 9:07 PM, Ashutosh Bapat
>>>> <ashutosh.bapat@enterprisedb.com> wrote:
>>>>> On Mon, Sep 26, 2016 at 5:25 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>>>> On Mon, Sep 26, 2016 at 7:28 PM, Ashutosh Bapat
>>>>>> <ashutosh.bapat@enterprisedb.com> wrote:
>>>>>>> My original patch added code to manage the files for 2 phase
>>>>>>> transactions opened by the local server on the remote servers. This
>>>>>>> code was mostly inspired from the code in twophase.c which manages the
>>>>>>> file for prepared transactions. The logic to manage 2PC files has
>>>>>>> changed since [1] and has been optimized. One of the things I wanted
>>>>>>> to do is see, if those optimizations are applicable here as well. Have
>>>>>>> you considered that?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Yeah, we're considering it.
>>>>>> After these changes are committed, we will post the patch incorporated
>>>>>> these changes.
>>>>>>
>>>>>> But what we need to do first is the discussion in order to get consensus.
>>>>>> Since current design of this patch is to transparently execute DCL of
>>>>>> 2PC on foreign server, this code changes lot of code and is
>>>>>> complicated.
>>>>>
>>>>> Can you please elaborate. I am not able to understand what DCL is
>>>>> involved here. According to [1], examples of DCL are GRANT and REVOKE
>>>>> command.
>>>>
>>>> I meant transaction management command such as PREPARE TRANSACTION and
>>>> COMMIT/ABORT PREPARED command.
>>>> The web page I referred to might be wrong, sorry.
>>>>
>>>>>> Another approach I have is to push down DCL to only foreign servers
>>>>>> that support 2PC protocol, which is similar to DML push down.
>>>>>> This approach would be simpler than the current idea and is easy to
>>>>>> use by distributed transaction manager.
>>>>>
>>>>> Again, can you please elaborate, how that would be different from the
>>>>> current approach and how does it simplify the code.
>>>>>
>>>>
>>>> The idea is just to push down PREPARE TRANSACTION, COMMIT/ROLLBACK
>>>> PREPARED to foreign servers that support 2PC.
>>>> With this idea, the client need to do following operation when foreign
>>>> server is involved with transaction.
>>>>
>>>> BEGIN;
>>>> UPDATE parent_table SET ...; -- update including foreign server
>>>> PREPARE TRANSACTION 'xact_id';
>>>> COMMIT PREPARED 'xact_id';
>>>>
>>>> The above PREPARE TRANSACTION and COMMIT PREPARED command are pushed
>>>> down to foreign server.
>>>> That is, the client needs to execute PREPARE TRANSACTION and
>>>> COMMIT/ROLLBACK PREPARED explicitly.
>>>>
>>>> In this idea, I think that we don't need to do followings,
>>>>
>>>> * Providing the prepare id of 2PC.
>>>>   Current patch adds new API prepare_id_provider() but we can use the
>>>> prepare id of 2PC that is used on parent server.
>>>>
>>>> * Keeping track of status of foreign servers.
>>>>   Current patch keeps track of status of foreign servers involved with
>>>> transaction but this idea is just to push down transaction management
>>>> command to foreign server.
>>>>   So I think that we no longer need to do that.
>>>
>>> The problem with this approach is same as one previously stated. If
>>> the connection between local and foreign server is lost between
>>> PREPARE and COMMIT the prepared transaction on the foreign server
>>> remains dangling, none other than the local server knows what to do
>>> with it and the local server has lost track of the prepared
>>> transaction on the foreign server. So, just pushing down those
>>> commands doesn't work.
>>
>> Yeah, my idea is one of the first step.
>> Mechanism that resolves the dangling foreign transaction and the
>> resolver worker process are necessary.
>>
>>>>
>>>> * Adding max_prepared_foreign_transactions parameter.
>>>>   It means that the number of transaction involving foreign server is
>>>> the same as max_prepared_transactions.
>>>>
>>>
>>> That isn't true exactly. max_prepared_foreign_transactions indicates
>>> how many transactions can be prepared on the foreign server, which in
>>> the method you propose should have a cap of max_prepared_transactions
>>> * number of foreign servers.
>>
>> Oh, I understood, thanks.
>>
>> Consider sharding solution using postgres_fdw (that is, the parent
>> postgres server has multiple shard postgres servers), we need to
>> increase max_prepared_foreign_transactions whenever new shard server
>> is added to cluster, or to allocate enough size in advance. But the
>> estimation of enough max_prepared_foreign_transactions would not be
>> easy, for example can we estimate it by (max throughput of the system)
>> * (the number of foreign servers)?
>>
>> One new idea I came up with is that we set transaction id on parent
>> server to global transaction id (gid) that is prepared on shard
>> server.
>> And pg_fdw_resolver worker process periodically resolves the dangling
>> transaction on foreign server by comparing active lowest XID on parent
>> server with the XID in gid used by PREPARE TRANSACTION.
>>
>> For example, suppose that there are one parent server and one shard
>> server, and the client executes update transaction (XID = 100)
>> involving foreign servers.
>> In commit phase, parent server executes PREPARE TRANSACTION command
>> with gid containing 100, say 'px_<random
>> number>_100_<serverid>_<userid>', on foreign server.
>> If the shard server crashed before COMMIT PREPARED, the transaction
>> 100 becomes a dangling transaction.
>>
>> But resolver worker process on parent server can resolve it with
>> following steps.
>> 1. Get lowest active XID on parent server(XID=110).
>> 2. Connect to foreign server. (Get foreign server information from
>> pg_foreign_server system catalog.)
>> 3. Check if there is prepared transaction with XID less than 110.
>> 4. Rollback the dangling transaction found at #3 step.
>>     gid 'px_<random number>_100_<serverid>_<userid>' is prepared on
>> foreign server by transaction 100, rollback it.
>
> Why always rollback any dangling transaction? There can be a case that
> a foreign server has a dangling transaction which needs to be
> committed because the portions of that transaction on the other shards
> are committed.

Right, we can heuristically decide whether to do COMMIT or ABORT on
the local server.
For example, if COMMIT PREPARED succeeded on at least one foreign
server, the local server returns OK to the client and the other dangling
transactions should be committed later.
We can find out whether to commit or abort a dangling transaction by
checking CLOG.

But we need to handle the case where the CLOG file containing the XID
needed to resolve a dangling transaction has been truncated.
If the user runs VACUUM FREEZE just after the remote server crashed, it
could be truncated.
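
A rough sketch of that decision, not code from any patch: the resolver
looks up the outcome of the local transaction and maps it to the command
to send to the foreign server. The names XactStatus, resolve_action and
plan_resolution are hypothetical, and the dictionary stands in for a CLOG
lookup. UNKNOWN models the truncated-CLOG case, where this sketch
deliberately takes no action.

```python
from enum import Enum


class XactStatus(Enum):
    COMMITTED = "committed"
    ABORTED = "aborted"
    IN_PROGRESS = "in progress"
    UNKNOWN = "unknown"  # e.g. the CLOG segment was already truncated


def resolve_action(local_status):
    """Map the local transaction outcome to a remote 2PC command, or None."""
    if local_status is XactStatus.COMMITTED:
        return "COMMIT PREPARED"
    if local_status is XactStatus.ABORTED:
        return "ROLLBACK PREPARED"
    # In progress or unknown: leave the prepared transaction alone.
    return None


def plan_resolution(dangling, clog):
    """dangling maps gid -> local XID; clog maps XID -> XactStatus (assumed)."""
    plan = {}
    for gid, xid in dangling.items():
        action = resolve_action(clog.get(xid, XactStatus.UNKNOWN))
        if action is not None:
            plan[gid] = f"{action} '{gid}'"
    return plan
```

Transactions whose local outcome cannot be determined stay prepared on
the foreign server, which is why the truncation problem matters.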

> The way gid is crafted, there is no way to check whether the given
> prepared transaction was created by the local server or not. Probably
> the local server needs to add a unique signature in GID to identify
> the transactions prepared by itself. That signature should be
> transferred to standby to cope up with the fail-over of local server.

Maybe we can use database system identifier in control file.
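
As an illustration only, a gid carrying such a signature could be built
and checked like this. The layout px_<sysid>_<random>_<xid>_<serverid>_<userid>
and the helper names are assumptions made for this sketch, not what the
patch does. The length check reflects the documented rule that a gid
must be less than 200 bytes long.

```python
import secrets

GID_MAX_BYTES = 199  # PREPARE TRANSACTION requires a gid shorter than 200 bytes


def make_gid(system_id, xid, server_oid, user_oid):
    """Build a gid that embeds the originating cluster's identifier."""
    nonce = secrets.randbelow(2**31)
    gid = f"px_{system_id}_{nonce}_{xid}_{server_oid}_{user_oid}"
    if len(gid.encode()) > GID_MAX_BYTES:
        raise ValueError("gid too long")
    return gid


def parse_gid(gid):
    """Return (system_id, xid, server_oid, user_oid), or None if the
    gid does not follow the assumed layout."""
    parts = gid.split("_")
    if len(parts) != 6 or parts[0] != "px":
        return None
    try:
        system_id, _nonce, xid, server_oid, user_oid = (int(p) for p in parts[1:])
    except ValueError:
        return None
    return system_id, xid, server_oid, user_oid


def prepared_by(gid, system_id):
    """True if the gid was generated by the cluster with this identifier."""
    parsed = parse_gid(gid)
    return parsed is not None and parsed[0] == system_id
```

A resolver would then skip any prepared transaction for which
prepared_by() is false, instead of guessing from the XID alone.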

> In this idea, one has to keep on polling the foreign server to find
> any dangling transactions. In usual scenario, we shouldn't have a
> large number of dangling transactions, and thus periodic polling might
> be a waste.

We can optimize it by storing the XID that is resolved heuristically
into the control file or system catalog, for example.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

04 October 2016, 04:27:10

>>
>> Why always rollback any dangling transaction? There can be a case that
>> a foreign server has a dangling transaction which needs to be
>> committed because the portions of that transaction on the other shards
>> are committed.
>
> Right, we can heuristically make a decision whether we do COMMIT or
> ABORT on local server.
> For example, if COMMIT PREPARED succeeded on at least one foreign
> server, the local server return OK to client and the other dangling
> transactions should be committed later.
> We can find out that we should do either commit or abort the dangling
> transaction by checking CLOG.

Heuristics cannot become the default behavior. A user should be given
an option to choose a heuristic, and should be aware of the pitfalls
when using it. I guess, first, we need to get a solution which ensures
that the transaction gets committed on all the foreign servers involved
or is rolled back on all of them. AFAIR, my patch did that. Once we have
that kind of solution, we can think about heuristics.

>
> But we need to handle the case where the CLOG file containing XID
> necessary for resolving dangling transaction is truncated.
> If the user does VACUUM FREEZE just after remote server crashed, it
> could be truncated.

Hmm, this needs to be fixed. Even my patch relied on XID to determine
whether the transaction committed or rolled back locally and thus to
decide whether it should be committed or rolled back on all the
foreign servers involved. I think I had taken care of the issue you
have pointed out here. Can you please verify the same?

>
>> The way gid is crafted, there is no way to check whether the given
>> prepared transaction was created by the local server or not. Probably
>> the local server needs to add a unique signature in GID to identify
>> the transactions prepared by itself. That signature should be
>> transferred to standby to cope up with the fail-over of local server.
>
> Maybe we can use database system identifier in control file.

Maybe.

>
>> In this idea, one has to keep on polling the foreign server to find
>> any dangling transactions. In usual scenario, we shouldn't have a
>> large number of dangling transactions, and thus periodic polling might
>> be a waste.
>
> We can optimize it by storing the XID that is resolved heuristically
> into the control file or system catalog, for example.
>

There will be many such XIDs. We don't want to dump so many things in
the control file, esp. when that's not control data. A system catalog is
out of the question, since a rollback of the local transaction would make
those rows in the system catalog invisible. That's the reason why I chose
to write the foreign prepared transactions to files rather than to a
system catalog.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Amit Langote

Date:

04 October 2016, 05:22:54

Hi,

On 2016/10/04 13:26, Ashutosh Bapat wrote:
>>>
>>> Why always rollback any dangling transaction? There can be a case that
>>> a foreign server has a dangling transaction which needs to be
>>> committed because the portions of that transaction on the other shards
>>> are committed.
>>
>> Right, we can heuristically make a decision whether we do COMMIT or
>> ABORT on local server.
>> For example, if COMMIT PREPARED succeeded on at least one foreign
>> server, the local server return OK to client and the other dangling
>> transactions should be committed later.
>> We can find out that we should do either commit or abort the dangling
>> transaction by checking CLOG.
> 
> Heuristics can not become the default behavior. A user should be given
> an option to choose a heuristic, and he should be aware of the
> pitfalls when using this heuristic. I guess, first, we need to get a
> solution which ensures that the transaction gets committed on all the
> servers or is rolled back on all the foreign servers involved. AFAIR,
> my patch did that. Once we have that kind of solution, we can think
> about heuristics.

I wonder if Sawada-san is referring to some sort of quorum-based (atomic)
commitment protocol [1, 2], although I agree that that would be an
advanced technique for handling the limitations such as blocking nature of
the basic two-phase commit protocol in case of communication failures,
IOW, meant for better availability rather than correctness.

Thanks,
Amit

[1]
https://en.wikipedia.org/wiki/Quorum_(distributed_computing)#Quorum-based_voting_in_commit_protocols

[2] http://hub.hku.hk/bitstream/10722/158032/1/Content.pdf

Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

04 October 2016, 06:29:18

On Tue, Oct 4, 2016 at 1:26 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>>>
>>> Why always rollback any dangling transaction? There can be a case that
>>> a foreign server has a dangling transaction which needs to be
>>> committed because the portions of that transaction on the other shards
>>> are committed.
>>
>> Right, we can heuristically make a decision whether we do COMMIT or
>> ABORT on local server.
>> For example, if COMMIT PREPARED succeeded on at least one foreign
>> server, the local server return OK to client and the other dangling
>> transactions should be committed later.
>> We can find out that we should do either commit or abort the dangling
>> transaction by checking CLOG.
>
> Heuristics can not become the default behavior. A user should be given
> an option to choose a heuristic, and he should be aware of the
> pitfalls when using this heuristic. I guess, first, we need to get a
> solution which ensures that the transaction gets committed on all the
> servers or is rolled back on all the foreign servers involved. AFAIR,
> my patch did that. Once we have that kind of solution, we can think
> about heuristics.

I meant that we could determine it heuristically only when remote server
crashed in 2nd phase of 2PC.
For example, what does the local server returns to client when no one remote
server returns OK to local server in 2nd phase of 2PC for more than
statement_timeout seconds? Ok or error?

>>
>> But we need to handle the case where the CLOG file containing XID
>> necessary for resolving dangling transaction is truncated.
>> If the user does VACUUM FREEZE just after remote server crashed, it
>> could be truncated.
>
> Hmm, this needs to be fixed. Even my patch relied on XID to determine
> whether the transaction committed or rolled back locally and thus to
> decide whether it should be committed or rolled back on all the
> foreign servers involved. I think I had taken care of the issue you
> have pointed out here. Can you please verify the same?
>
>>
>>> The way gid is crafted, there is no way to check whether the given
>>> prepared transaction was created by the local server or not. Probably
>>> the local server needs to add a unique signature in GID to identify
>>> the transactions prepared by itself. That signature should be
>>> transferred to standby to cope up with the fail-over of local server.
>>
>> Maybe we can use database system identifier in control file.
>
> may be.
>
>>
>>> In this idea, one has to keep on polling the foreign server to find
>>> any dangling transactions. In usual scenario, we shouldn't have a
>>> large number of dangling transactions, and thus periodic polling might
>>> be a waste.
>>
>> We can optimize it by storing the XID that is resolved heuristically
>> into the control file or system catalog, for example.
>>
>
> There will be many such XIDs. We don't want to dump so many things in
> control file, esp. when that's not control data. System catalog is out
> of question since a rollback of local transaction would make those
> rows in the system catalog invisible. That's the reason, why I chose
> to write the foreign prepared transactions to files rather than a
> system catalog.
>

We can store the lowest in-doubt transaction id (say in-doubt XID) that
needs to be resolved later into control file and the CLOG containing XID
greater than in-doubt XID is never truncated.
We need to try to solve such transaction only when in-doubt XID is not NULL.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

04 October 2016, 07:10:23

>>
>> Heuristics can not become the default behavior. A user should be given
>> an option to choose a heuristic, and he should be aware of the
>> pitfalls when using this heuristic. I guess, first, we need to get a
>> solution which ensures that the transaction gets committed on all the
>> servers or is rolled back on all the foreign servers involved. AFAIR,
>> my patch did that. Once we have that kind of solution, we can think
>> about heuristics.
>
> I meant that we could determine it heuristically only when remote server
> crashed in 2nd phase of 2PC.
> For example, what does the local server returns to client when no one remote
> server returns OK to local server in 2nd phase of 2PC for more than
> statement_timeout seconds? Ok or error?
>

The local server doesn't wait for the completion of the second phase
to finish the currently running statement. Once all the foreign
servers have responded to PREPARE request in the first phase, the
local server responds to the client. Am I missing something?


>>
>> There will be many such XIDs. We don't want to dump so many things in
>> control file, esp. when that's not control data. System catalog is out
>> of question since a rollback of local transaction would make those
>> rows in the system catalog invisible. That's the reason, why I chose
>> to write the foreign prepared transactions to files rather than a
>> system catalog.
>>
>
> We can store the lowest in-doubt transaction id (say in-doubt XID) that
> needs to be resolved later into control file and the CLOG containing XID
> greater than in-doubt XID is never truncated.
> We need to try to solve such transaction only when in-doubt XID is not NULL.
>
IIRC, my patch takes care of this. If the oldest active transaction
happens to be later in the time line than the oldest in-doubt
transaction, it sets oldest active transaction id to that of the
oldest in-doubt transaction.
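
The rule amounts to clamping the truncation horizon, which can be
sketched as follows. This is a simplified illustration rather than the
patch's code: the function name is hypothetical, and XIDs are treated as
plain integers, whereas real XID comparison is modular and must account
for wraparound.

```python
def clog_truncation_horizon(oldest_active_xid, in_doubt_xids):
    """Return the oldest XID whose CLOG status must be retained.

    With no in-doubt foreign transactions the horizon is the oldest
    active XID. Otherwise it is pulled back to the oldest in-doubt XID,
    so the commit status needed to resolve it is never truncated.
    """
    if not in_doubt_xids:
        return oldest_active_xid
    return min(oldest_active_xid, min(in_doubt_xids))
```

For example, with oldest active XID 110 and in-doubt XIDs 100 and 105,
the horizon stays at 100 until transaction 100 is resolved.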

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Amit Langote

Date:

04 October 2016, 07:41:28

On 2016/10/04 16:10, Ashutosh Bapat wrote:
>>> Heuristics can not become the default behavior. A user should be given
>>> an option to choose a heuristic, and he should be aware of the
>>> pitfalls when using this heuristic. I guess, first, we need to get a
>>> solution which ensures that the transaction gets committed on all the
>>> servers or is rolled back on all the foreign servers involved. AFAIR,
>>> my patch did that. Once we have that kind of solution, we can think
>>> about heuristics.
>>
>> I meant that we could determine it heuristically only when remote server
>> crashed in 2nd phase of 2PC.
>> For example, what does the local server returns to client when no one remote
>> server returns OK to local server in 2nd phase of 2PC for more than
>> statement_timeout seconds? Ok or error?
>>
> 
> The local server doesn't wait for the completion of the second phase
> to finish the currently running statement. Once all the foreign
> servers have responded to PREPARE request in the first phase, the
> local server responds to the client. Am I missing something?

PREPARE sent to foreign servers involved in a given transaction is
*transparent* to the user who started the transaction, no?  That is, user
just says COMMIT and if it is found that there are multiple servers
involved in the transaction, it must be handled using two-phase commit
protocol *behind the scenes*.  So the aforementioned COMMIT should not
return to the client until after the above two-phase commit processing has
finished.

Or are you and Sawada-san talking about the case where the user issued
PREPARE and not COMMIT?

Thanks,
Amit

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

04 October 2016, 11:29:55

On Tue, Oct 4, 2016 at 1:11 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2016/10/04 16:10, Ashutosh Bapat wrote:
>>>> Heuristics can not become the default behavior. A user should be given
>>>> an option to choose a heuristic, and he should be aware of the
>>>> pitfalls when using this heuristic. I guess, first, we need to get a
>>>> solution which ensures that the transaction gets committed on all the
>>>> servers or is rolled back on all the foreign servers involved. AFAIR,
>>>> my patch did that. Once we have that kind of solution, we can think
>>>> about heuristics.
>>>
>>> I meant that we could determine it heuristically only when remote server
>>> crashed in 2nd phase of 2PC.
>>> For example, what does the local server returns to client when no one remote
>>> server returns OK to local server in 2nd phase of 2PC for more than
>>> statement_timeout seconds? Ok or error?
>>>
>>
>> The local server doesn't wait for the completion of the second phase
>> to finish the currently running statement. Once all the foreign
>> servers have responded to PREPARE request in the first phase, the
>> local server responds to the client. Am I missing something?
>
> PREPARE sent to foreign servers involved in a given transaction is
> *transparent* to the user who started the transaction, no?  That is, user
> just says COMMIT and if it is found that there are multiple servers
> involved in the transaction, it must be handled using two-phase commit
> protocol *behind the scenes*.  So the aforementioned COMMIT should not
> return to the client until after the above two-phase commit processing has
> finished.

No, the COMMIT returns after the first phase. It cannot wait for all
the foreign servers to complete their second phase, which can take
quite long (or never finish) if one of the servers has crashed in between.
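
One way to picture this behavior is the toy coordinator below. It is a
sketch under stated assumptions, not the patch: ForeignServer and
commit_distributed are hypothetical names, and a crash during the second
phase is simulated with ConnectionError. The outcome is fixed once every
server has prepared; a server that cannot be reached in the second phase
is left prepared and reported for later resolution instead of blocking
the caller.

```python
class ForeignServer:
    """Toy participant; states: active, prepared, committed, aborted."""

    def __init__(self, name, prepare_ok=True, reachable_in_phase2=True):
        self.name = name
        self.prepare_ok = prepare_ok
        self.reachable_in_phase2 = reachable_in_phase2
        self.state = "active"

    def prepare(self, gid):
        if not self.prepare_ok:
            self.state = "aborted"
            return False
        self.state = "prepared"
        return True

    def commit_prepared(self, gid):
        if not self.reachable_in_phase2:
            raise ConnectionError(self.name)
        self.state = "committed"

    def rollback(self, gid):
        if self.state in ("active", "prepared"):
            self.state = "aborted"


def commit_distributed(servers, gid):
    """Return (outcome, names of servers still awaiting the second phase)."""
    # Phase 1: every server must prepare, otherwise everything rolls back.
    for server in servers:
        if not server.prepare(gid):
            for other in servers:
                other.rollback(gid)
            return "ROLLBACK", []
    # The decision is COMMIT from here on. Phase 2 is attempted once;
    # failures are deferred to a resolver rather than waited for.
    unresolved = []
    for server in servers:
        try:
            server.commit_prepared(gid)
        except ConnectionError:
            unresolved.append(server.name)
    return "COMMIT", unresolved
```

A server listed in the second return value still holds a prepared
transaction that must eventually be committed, never rolled back.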

>
> Or are you and Sawada-san talking about the case where the user issued
> PREPARE and not COMMIT?

I guess, Sawada-san is still talking about the user issued PREPARE.
But my comment is applicable otherwise as well.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

04 October 2016, 13:09:14

On Tue, Oct 4, 2016 at 8:29 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Tue, Oct 4, 2016 at 1:11 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2016/10/04 16:10, Ashutosh Bapat wrote:
>>>>> Heuristics can not become the default behavior. A user should be given
>>>>> an option to choose a heuristic, and he should be aware of the
>>>>> pitfalls when using this heuristic. I guess, first, we need to get a
>>>>> solution which ensures that the transaction gets committed on all the
>>>>> servers or is rolled back on all the foreign servers involved. AFAIR,
>>>>> my patch did that. Once we have that kind of solution, we can think
>>>>> about heuristics.
>>>>
>>>> I meant that we could determine it heuristically only when remote server
>>>> crashed in 2nd phase of 2PC.
>>>> For example, what does the local server returns to client when no one remote
>>>> server returns OK to local server in 2nd phase of 2PC for more than
>>>> statement_timeout seconds? Ok or error?
>>>>
>>>
>>> The local server doesn't wait for the completion of the second phase
>>> to finish the currently running statement. Once all the foreign
>>> servers have responded to PREPARE request in the first phase, the
>>> local server responds to the client. Am I missing something?
>>
>> PREPARE sent to foreign servers involved in a given transaction is
>> *transparent* to the user who started the transaction, no?  That is, user
>> just says COMMIT and if it is found that there are multiple servers
>> involved in the transaction, it must be handled using two-phase commit
>> protocol *behind the scenes*.  So the aforementioned COMMIT should not
>> return to the client until after the above two-phase commit processing has
>> finished.
>
> No, the COMMIT returns after the first phase. It can not wait for all
> the foreign servers to complete their second phase

Hm, it sounds like it's the same as a normal commit (not 2PC).
What's the difference?

My understanding is that basically the local server cannot return
COMMIT to the client until the 2nd phase is completed.
Otherwise the next transaction can see data that is not committed yet
on the remote server.

> , which can take
> quite long (or never) if one of the servers has crashed in between.
>
>>
>> Or are you and Sawada-san talking about the case where the user issued
>> PREPARE and not COMMIT?
>
> I guess, Sawada-san is still talking about the user issued PREPARE.
> But my comment is applicable otherwise as well.
>

Yes, I'm considering the case where the local server tries to COMMIT
but the remote server crashed after the local server completed the 1st
phase (PREPARE) on all the remote servers.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

06 October 2016, 04:41:14

>>
>> No, the COMMIT returns after the first phase. It can not wait for all
>> the foreign servers to complete their second phase
>
> Hm, it sounds like it's same as normal commit (not 2PC).
> What's the difference?
>
> My understanding is that basically the local server can not return
> COMMIT to the client until 2nd phase is completed.


If we do that, the local server may not return to the client at all,
if the foreign server crashes and never comes up. Practically, it may
take much longer to finish a COMMIT, depending upon how long it takes
for the foreign server to reply to a COMMIT message. I don't think
that's desirable.

> Otherwise the next transaction can see data that is not committed yet
> on remote server.

2PC doesn't guarantee transactional consistency all by itself. It only
guarantees that all legs of a distributed transaction are either all
rolled back or all committed. IOW, it guarantees that a distributed
transaction is not rolled back on some nodes and committed on
others.
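
A toy example of the distinction, using hypothetical names and two
in-memory "shards" rather than real servers: a transfer of 10 from shard
a to shard b is prepared on both, then committed on one at a time. A
reader arriving between the two COMMIT PREPARED steps sees the debit but
not the credit, even though both legs eventually commit.

```python
class Shard:
    """Toy shard holding one committed value and at most one prepared change."""

    def __init__(self, value):
        self.committed = value
        self.prepared = None

    def prepare(self, new_value):
        self.prepared = new_value

    def commit_prepared(self):
        self.committed, self.prepared = self.prepared, None

    def read(self):
        # Readers only ever see committed data on this shard.
        return self.committed


def transfer_and_observe(amount=10):
    a, b = Shard(100), Shard(100)
    # Phase 1: both legs are prepared.
    a.prepare(a.committed - amount)
    b.prepare(b.committed + amount)
    # Phase 2 reaches shard a first.
    a.commit_prepared()
    during = (a.read(), b.read())  # a reader in the middle of phase 2
    b.commit_prepared()
    after = (a.read(), b.read())
    return during, after
```

Atomicity holds, since both legs end up committed, but the intermediate
read is not a consistent snapshot of the two shards.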

Providing a transactionally consistent view is a very hard problem.
Trying to solve all those problems in a single patch would be very
difficult and the amount of changes required may be really huge. Also,
there are many possible definitions of consistency for a distributed
system. I have not seen a consensus on what
kind of consistency model/s we want to support in PostgreSQL. That's
another large debate. We have had previous attempts where people have
tried to complete everything in one go and nothing has been completed
yet.

A 2PC implementation, i.e. guaranteeing that all the legs of a
transaction commit or roll back, is an essential building block of any
kind of distributed transaction manager. So, we should at least support
that one, before attacking further problems.
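
To make the distinction above concrete, here is a minimal sketch in
Python. It is only a toy model and not code from the patch: the Server
class, two_phase_commit() and the observe callback are hypothetical
names invented for illustration. It shows that 2PC gives all-or-nothing
commit across the legs, while a reader that looks between two second-phase
steps can still see one leg committed and the other not yet.

```python
class Server:
    """Toy stand-in for a foreign server (hypothetical, not a real API)."""

    def __init__(self, name, data):
        self.name = name
        self.committed = dict(data)   # data visible to readers
        self.prepared = {}            # gid -> changes waiting for phase 2

    def prepare(self, gid, changes):
        # Phase 1: remember the changes durably, but do not expose them yet.
        self.prepared[gid] = dict(changes)

    def commit_prepared(self, gid):
        # Phase 2 (commit): make the prepared changes visible.
        self.committed.update(self.prepared.pop(gid))

    def rollback_prepared(self, gid):
        # Phase 2 (abort): discard the prepared changes.
        self.prepared.pop(gid, None)


def two_phase_commit(gid, work, observe=None):
    """Run 2PC over work, a list of (server, changes) pairs.

    observe, if given, is called after each second-phase step so that a
    caller can look at the intermediate state.
    """
    prepared = []
    try:
        for server, changes in work:
            server.prepare(gid, changes)
            prepared.append(server)
    except Exception:
        # Any failure in phase 1 rolls back every leg prepared so far.
        for server in prepared:
            server.rollback_prepared(gid)
        return False
    for server, _ in work:
        server.commit_prepared(gid)
        if observe is not None:
            observe()
    return True


a = Server("a", {"x": "old"})
b = Server("b", {"x": "old"})
seen = []
ok = two_phase_commit(
    "gid-1",
    [(a, {"x": "new"}), (b, {"x": "new"})],
    observe=lambda: seen.append((a.committed["x"], b.committed["x"])),
)
print(ok, seen)
```

The first entry recorded in seen is taken after server a has committed
but before server b has, which is exactly the window in which a
concurrent reader gets a mixed view even though the transaction is
atomic in the commit-or-abort sense.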
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

06 October 2016, 08:04:55

On Thu, Oct 6, 2016 at 1:41 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>>>
>>> No, the COMMIT returns after the first phase. It can not wait for all
>>> the foreign servers to complete their second phase
>>
>> Hm, it sounds like it's same as normal commit (not 2PC).
>> What's the difference?
>>
>> My understanding is that basically the local server can not return
>> COMMIT to the client until 2nd phase is completed.
>
>
> If we do that, the local server may not return to the client at all,
> if the foreign server crashes and never comes up. Practically, it may
> take much longer to finish a COMMIT, depending upon how long it takes
> for the foreign server to reply to a COMMIT message.

Yes, I think 2PC behaves that way; please refer to [1].
To prevent the local server from blocking forever due to a
communication failure, we could provide a timeout on the coordinator
side or on the participant side.

>
>> Otherwise the next transaction can see data that is not committed yet
>> on remote server.
>
> 2PC doesn't guarantee transactional consistency all by itself. It only
> guarantees that all legs of a distributed transaction are either all
> rolled back or all committed. IOW, it guarantees that a distributed
> transaction is not rolled back on some nodes and committed on the
> other node.
> Providing a transactionally consistent view is a very hard problem.
> Trying to solve all those problems in a single patch would be very
> difficult and the amount of changes required may be really huge. Then
> there are many possible consistency definitions when it comes to
> consistency of distributed system. I have not seen a consensus on what
> kind of consistency model/s we want to support in PostgreSQL. That's
> another large debate. We have had previous attempts where people have
> tried to complete everything in one go and nothing has been completed
> yet.

Yes, providing atomic visibility is a hard problem, and it's a
separate issue [2].

> 2PC implementation OR guaranteeing that all the legs of a transaction
> commit or roll back, is an essential block of any kind of distributed
> transaction manager. So, we should at least support that one, before
> attacking further problems.

I agree.

[1]https://en.wikipedia.org/wiki/Two-phase_commit_protocol
[2]http://www.bailis.org/papers/ramp-sigmod2014.pdf

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

06 October 2016, 08:45:43

On Thu, Oct 6, 2016 at 1:34 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Oct 6, 2016 at 1:41 PM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>>>>
>>>> No, the COMMIT returns after the first phase. It can not wait for all
>>>> the foreign servers to complete their second phase
>>>
>>> Hm, it sounds like it's same as normal commit (not 2PC).
>>> What's the difference?
>>>
>>> My understanding is that basically the local server can not return
>>> COMMIT to the client until 2nd phase is completed.
>>
>>
>> If we do that, the local server may not return to the client at all,
>> if the foreign server crashes and never comes up. Practically, it may
>> take much longer to finish a COMMIT, depending upon how long it takes
>> for the foreign server to reply to a COMMIT message.
>
> Yes, I think 2PC behaves so, please refer to [1].
> To prevent local server stops forever due to communication failure.,
> we could provide the timeout on coordinator side or on participant
> side.
>

This, too, looks like a heuristic; it shouldn't be the default
behaviour and hence shouldn't be part of the first version of this
feature.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Amit Langote

Date:

06 October 2016, 09:23:16

On 2016/10/06 17:45, Ashutosh Bapat wrote:
> On Thu, Oct 6, 2016 at 1:34 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Thu, Oct 6, 2016 at 1:41 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>>>> My understanding is that basically the local server can not return
>>>> COMMIT to the client until 2nd phase is completed.
>>>
>>> If we do that, the local server may not return to the client at all,
>>> if the foreign server crashes and never comes up. Practically, it may
>>> take much longer to finish a COMMIT, depending upon how long it takes
>>> for the foreign server to reply to a COMMIT message.
>>
>> Yes, I think 2PC behaves so, please refer to [1].
>> To prevent local server stops forever due to communication failure.,
>> we could provide the timeout on coordinator side or on participant
>> side.
> 
> This too, looks like a heuristic and shouldn't be the default
> behaviour and hence not part of the first version of this feature.

At any rate, the coordinator should not return to the client until after
the 2nd phase is completed, which was the original point.  If COMMIT
taking longer is an issue, then it could be handled with one of the
approaches mentioned so far (even if not in the first version), but no
version of this feature should really return COMMIT to the client only
after finishing the first phase.  Am I missing something?

I am saying this because I am assuming that this feature means the client
itself does not invoke 2PC, even knowing that there are multiple servers
involved, but rather relies on the involved FDW drivers and related core
code to handle it transparently.  I may have misunderstood the feature
though, apologies if so.

Thanks,
Amit

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

07 October 2016, 07:26:02

On Thu, Oct 6, 2016 at 2:52 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2016/10/06 17:45, Ashutosh Bapat wrote:
>> On Thu, Oct 6, 2016 at 1:34 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Thu, Oct 6, 2016 at 1:41 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>>>>> My understanding is that basically the local server can not return
>>>>> COMMIT to the client until 2nd phase is completed.
>>>>
>>>> If we do that, the local server may not return to the client at all,
>>>> if the foreign server crashes and never comes up. Practically, it may
>>>> take much longer to finish a COMMIT, depending upon how long it takes
>>>> for the foreign server to reply to a COMMIT message.
>>>
>>> Yes, I think 2PC behaves so, please refer to [1].
>>> To prevent local server stops forever due to communication failure.,
>>> we could provide the timeout on coordinator side or on participant
>>> side.
>>
>> This too, looks like a heuristic and shouldn't be the default
>> behaviour and hence not part of the first version of this feature.
>
> At any rate, the coordinator should not return to the client until after
> the 2nd phase is completed, which was the original point.  If COMMIT
> taking longer is an issue, then it could be handled with one of the
> approaches mentioned so far (even if not in the first version), but no
> version of this feature should really return COMMIT to the client only
> after finishing the first phase.  Am I missing something?

There is a small time window between the actual COMMIT and the commit
message being returned. An actual commit happens when we insert a WAL
record saying transaction X committed, and then we return to the client
saying a COMMIT happened. Note that a transaction may be committed but
we may never return to the client with a commit message, because the
connection was lost or the server crashed. I hope we agree on this.

COMMITTING the foreign prepared transactions happens after we COMMIT
the local transaction. If we do it before COMMITTING the local
transaction and the local server crashes, we will roll back the local
transaction during subsequent recovery while the foreign segments have
committed, resulting in an inconsistent state.

If we are successful in COMMITTING foreign transactions during the
post-commit phase, the COMMIT message will be returned after we have
committed all foreign transactions. But in case we can not reach a
foreign server, and the request times out, we can not revert our
decision that we are going to commit the transaction. That's my answer
to the timeout-based heuristic.
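
The ordering argument above can be sketched as a small Python model.
This is an illustration under simplifying assumptions, not the patch's
logic: run() and consistent() are hypothetical helpers, there is a
single foreign leg, and a "crash" simply stops execution before a given
step, after which recovery aborts any local transaction that has no
commit record.

```python
def run(order, crash_after=None):
    """Execute the steps in order, crashing before step index crash_after.

    Steps are "local_commit" (the commit WAL record is written) and
    "foreign_commit" (COMMIT PREPARED is sent to the foreign server).
    Returns (local_state, foreign_state) after crash recovery.
    """
    local = "in_progress"
    foreign = "prepared"          # phase 1 already succeeded
    for i, step in enumerate(order):
        if i == crash_after:
            break                 # local server crashes here
        if step == "local_commit":
            local = "committed"
        elif step == "foreign_commit":
            foreign = "committed"
    # Recovery: without a commit record the local transaction is rolled back.
    if local != "committed":
        local = "aborted"
    return local, foreign


def consistent(local, foreign):
    """A still-prepared foreign leg can follow the local decision later."""
    return foreign == "prepared" or foreign == local


foreign_first = run(["foreign_commit", "local_commit"], crash_after=1)
local_first = run(["local_commit", "foreign_commit"], crash_after=1)
print(foreign_first, consistent(*foreign_first))
print(local_first, consistent(*local_first))
```

With the foreign commit first, a crash between the two steps leaves the
local transaction aborted and the foreign one committed, which can not
be repaired. With the local commit first, the foreign transaction is
merely left prepared and can still be committed later.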

I don't see much point in holding up post-commit processing for a
non-responsive foreign server, which may not respond for days
together. Can you please elaborate a use case? Which commercial
transaction manager does that?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

13 October 2016, 10:25:35

On Fri, Oct 7, 2016 at 4:25 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Thu, Oct 6, 2016 at 2:52 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2016/10/06 17:45, Ashutosh Bapat wrote:
>>> On Thu, Oct 6, 2016 at 1:34 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> On Thu, Oct 6, 2016 at 1:41 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>>>>>> My understanding is that basically the local server can not return
>>>>>> COMMIT to the client until 2nd phase is completed.
>>>>>
>>>>> If we do that, the local server may not return to the client at all,
>>>>> if the foreign server crashes and never comes up. Practically, it may
>>>>> take much longer to finish a COMMIT, depending upon how long it takes
>>>>> for the foreign server to reply to a COMMIT message.
>>>>
>>>> Yes, I think 2PC behaves so, please refer to [1].
>>>> To prevent local server stops forever due to communication failure.,
>>>> we could provide the timeout on coordinator side or on participant
>>>> side.
>>>
>>> This too, looks like a heuristic and shouldn't be the default
>>> behaviour and hence not part of the first version of this feature.
>>
>> At any rate, the coordinator should not return to the client until after
>> the 2nd phase is completed, which was the original point.  If COMMIT
>> taking longer is an issue, then it could be handled with one of the
>> approaches mentioned so far (even if not in the first version), but no
>> version of this feature should really return COMMIT to the client only
>> after finishing the first phase.  Am I missing something?
>
> There is small time window between actual COMMIT and a commit message
> returned. An actual commit happens when we insert a WAL saying
> transaction X committed and then we return to the client saying a
> COMMIT happened. Note that a transaction may be committed but we will
> never return to the client with a commit message, because connection
> was lost or the server crashed. I hope we agree on this.

Agree.

> COMMITTING the foreign prepared transactions happens after we COMMIT
> the local transaction. If we do it before COMMITTING local transaction
> and the local server crashes, we will roll back local transaction
> during subsequence recovery while the foreign segments have committed
> resulting in an inconsistent state.
>
> If we are successful in COMMITTING foreign transactions during
> post-commit phase, COMMIT message will be returned after we have
> committed all foreign transactions. But in case we can not reach a
> foreign server, and request times out, we can not revert back our
> decision that we are going to commit the transaction. That's my answer
> to the timeout based heuristic.

IIUC 2PC is a protocol that assumes that all of the foreign servers are
alive. In case we can not reach a foreign server during the post-commit
phase, basically the transaction and the following transactions should
stop until the crashed server is revived. This is the starting point for
implementing 2PC for FDW, I think. The heuristic determination approach
I mentioned is one of the optimization ideas to avoid holding up
transactions in case a foreign server crashes.

> I don't see much point in holding up post-commit processing for a
> non-responsive foreign server, which may not respond for days
> together. Can you please elaborate a use case? Which commercial
> transaction manager does that?

For example, the client updates data on a foreign server and then
commits. And the next transaction from the same client selects the new
data which was updated by the previous transaction. In this case,
because the first transaction is committed the second transaction should
be able to see the updated data, but it can see old data in your idea.
Since there is obviously an order between the first transaction and the
second transaction, I think that it's not a problem of providing a
consistent view.

I guess transaction manager of Postgres-XC behaves so, no?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

13 October 2016, 10:38:03

>>
>> If we are successful in COMMITTING foreign transactions during
>> post-commit phase, COMMIT message will be returned after we have
>> committed all foreign transactions. But in case we can not reach a
>> foreign server, and request times out, we can not revert back our
>> decision that we are going to commit the transaction. That's my answer
>> to the timeout based heuristic.
>
> IIUC 2PC is the protocol that assumes that all of the foreign server live.

Do you have any references? Take a look at [1]. The first paragraph
itself mentions that 2PC can achieve its goals despite temporary
failures.

> In case we can not reach a foreign server during post-commit phase,
> basically the transaction and following transaction should stop until
> the crashed server revived.

I have repeatedly given reasons why this is not correct. You and Amit
seem to repeat this statement again and again in turns without giving
any concrete reasons about why this is so.

> This is the first place to implement 2PC
> for FDW, I think. The heuristically determination approach I mentioned
> is one of the optimization idea to avoid holding up transaction in
> case a foreign server crashed.
>
>> I don't see much point in holding up post-commit processing for a
>> non-responsive foreign server, which may not respond for days
>> together. Can you please elaborate a use case? Which commercial
>> transaction manager does that?
>
> For example, the client updates a data on foreign server and then
> commits. And the next transaction from the same client selects new
> data which was updated on previous transaction. In this case, because
> the first transaction is committed the second transaction should be
> able to see updated data, but it can see old data in your idea. Since
> these is obviously order between first transaction and second
> transaction I think that It's not problem of providing consistent
> view.

2PC doesn't guarantee this. For that you need other methods and
protocols. We have discussed this before. [2]


[1] https://en.wikipedia.org/wiki/Two-phase_commit_protocol
[2] https://www.postgresql.org/message-id/CAD21AoCTe1CFfA9g1uqETvLaJZfFH6QoPSDf-L3KZQ-CDZ7q8g%40mail.gmail.com
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Amit Langote

Date:

13 October 2016, 11:27:54

On 2016/10/13 19:37, Ashutosh Bapat wrote:
>> In case we can not reach a foreign server during post-commit phase,
>> basically the transaction and following transaction should stop until
>> the crashed server revived.
> 
> I have repeatedly given reasons why this is not correct. You and Amit
> seem to repeat this statement again and again in turns without giving
> any concrete reasons about why this is so.

As mentioned in description of the "Commit" or "Completion" phase in the
Wikipedia article [1]:

* Success

If the coordinator received an agreement message from all cohorts during
the commit-request phase:

1. The coordinator sends a commit message to all the cohorts.

2. Each cohort completes the operation, and releases all the locks and resources held during the transaction.

3. Each cohort sends an acknowledgment to the coordinator.

4. The coordinator completes the transaction when all acknowledgments have been received.

* Failure

If any cohort votes No during the commit-request phase (or the
coordinator's timeout expires):

1. The coordinator sends a rollback message to all the cohorts.

2. Each cohort undoes the transaction using the undo log, and releases the resources and locks held during the transaction.

3. Each cohort sends an acknowledgement to the coordinator.

4. The coordinator undoes the transaction when all acknowledgements have been received.

In point 4 of both commit and abort cases above, it's been said, "when
*all* acknowledgements have been received."

However, when I briefly read the description in "Transaction Management in
the R* Distributed Database Management System (C. Mohan et al)" [2], it
seems that what Ashutosh is saying might be a correct way to proceed after
all:

"""
2. THE TWO-PHASE COMMIT PROTOCOL

...

After the coordinator receives the votes from all its subordinates, it
initiates the second phase of the protocol. If all the votes were YES
VOTES, then the coordinator moves to the committing state by force-writing
a commit record and sending COMMIT messages to all the subordinates. The
completion of the force-write takes the transaction to its commit point.
Once this point is passed the user can be told that the transaction has
been committed.
...

"""

Sorry about the noise.

Thanks,
Amit

[1] https://en.wikipedia.org/wiki/Two-phase_commit_protocol#Commit_phase

[2] http://www.cs.cmu.edu/~natassa/courses/15-823/F02/papers/p378-mohan.pdf

Re: Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

17 October 2016, 07:33:39

On Thu, Oct 13, 2016 at 7:37 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>>>
>>> If we are successful in COMMITTING foreign transactions during
>>> post-commit phase, COMMIT message will be returned after we have
>>> committed all foreign transactions. But in case we can not reach a
>>> foreign server, and request times out, we can not revert back our
>>> decision that we are going to commit the transaction. That's my answer
>>> to the timeout based heuristic.
>>
>> IIUC 2PC is the protocol that assumes that all of the foreign server live.
>
> Do you have any references? Take a look at [1]. The first paragraph
> itself mentions that 2PC can achieve its goals despite temporary
> failures.

I guess that it doesn't say that 2PC achieves its goals by ignoring
temporary failures. 2PC can also achieve its goals by waiting for the
crashed server to revive.

>> In case we can not reach a foreign server during post-commit phase,
>> basically the transaction and following transaction should stop until
>> the crashed server revived.
>
> I have repeatedly given reasons why this is not correct. You and Amit
> seem to repeat this statement again and again in turns without giving
> any concrete reasons about why this is so.
>
>> This is the first place to implement 2PC
>> for FDW, I think. The heuristically determination approach I mentioned
>> is one of the optimization idea to avoid holding up transaction in
>> case a foreign server crashed.
>>
>>> I don't see much point in holding up post-commit processing for a
>>> non-responsive foreign server, which may not respond for days
>>> together. Can you please elaborate a use case? Which commercial
>>> transaction manager does that?
>>
>> For example, the client updates a data on foreign server and then
>> commits. And the next transaction from the same client selects new
>> data which was updated on previous transaction. In this case, because
>> the first transaction is committed the second transaction should be
>> able to see updated data, but it can see old data in your idea. Since
>> these is obviously order between first transaction and second
>> transaction I think that It's not problem of providing consistent
>> view.
>
> 2PC doesn't guarantee this. For that you need other methods and
> protocols. We have discussed this before. [2]
>

At any rate, I think that it would confuse the user if there is no
guarantee that the latest data updated by the previous transaction can
be seen by the following transaction. I don't think it's worth
sacrificing that in order to get better performance.
Providing atomic visibility for concurrent transactions would be
supported later.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

19 October 2016, 15:47:35

On Thu, Oct 13, 2016 at 7:27 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> However, when I briefly read the description in "Transaction Management in
> the R* Distributed Database Management System (C. Mohan et al)" [2], it
> seems that what Ashutosh is saying might be a correct way to proceed after
> all:

I think Ashutosh is mostly right, but I think there's a lot of room to
doubt whether the design of this patch is good enough that we should
adopt it.

Consider two possible designs.  In design #1, the leader performs the
commit locally and then tries to send COMMIT PREPARED to every foreign
server afterward, and only then acknowledges the commit to the client.
In design #2, the leader performs the commit locally and then
acknowledges the commit to the client at once, leaving the task of
running COMMIT PREPARED to some background process.  Design #2
involves a race condition, because it's possible that the background
process might not complete COMMIT PREPARED on every node before the
user submits the next query, and that query might then fail to see
supposedly-committed changes.  This can't happen in design #1.  On the
other hand, there's always the possibility that the leader's session
is forcibly killed, even perhaps by pulling the plug.  If the
background process contemplated by design #2 is well-designed, it can
recover and finish sending COMMIT PREPARED to each relevant server
after the next restart.  In design #1, that background process doesn't
necessarily exist, so inevitably there is a possibility of orphaning
prepared transactions on the remote servers, which is not good. Even
if the DBA notices them, it won't be easy to figure out whether to
commit them or roll them back.

I think this thought experiment shows that, on the one hand, there is
a point to waiting for commits on the foreign servers, because it can
avoid the anomaly of not seeing the effects of your own commits.  On
the other hand, it's ridiculous to suppose that every case can be
handled by waiting, because that just isn't true.  You can't be sure
that you'll be able to wait long enough for COMMIT PREPARED to
complete, and even if that works out, you may not want to wait
indefinitely for a dead server.  Waiting for a ROLLBACK PREPARED has
no value whatsoever unless the system design is such that failing to
wait for it results in the ROLLBACK PREPARED never getting performed
-- which is a pretty poor excuse.

Moreover, there are good reasons to think that doing this kind of
cleanup work in the post-commit hooks is never going to be acceptable.
Generally, the post-commit hooks need to be no-fail, because it's too
late to throw an ERROR.  But there's very little hope that a
connection to a remote server can be no-fail; anything that involves a
network connection is, by definition, prone to failure.  We can try to
guarantee that every single bit of code that runs in the path that
sends COMMIT PREPARED only raises a WARNING or NOTICE rather than an
ERROR, but that's going to be quite difficult to do: even palloc() can
throw an error.  And what about interrupts?  We don't want to be stuck
inside this code for a long time without any hope of the user
recovering control of the session by pressing ^C, but of course the
way that works is it throws an ERROR, which we can't handle here.  We
fixed a similar issue for synchronous replication in
9a56dc3389b9470031e9ef8e45c95a680982e01a by making an interrupt emit a
WARNING in that case and then return control to the user.  But if we
do that here, all of the code that every FDW emits has to be aware of
that rule and follow it, and it just adds to the list of ways that the
user backend can escape this code without having cleaned up all of the
prepared transactions on the remote side.

It seems to me that the only way to really make this feature robust is
to have a background worker as part of the equation.  The background
worker launches at startup and looks around for local state that tells
it whether there are any COMMIT PREPARED or ROLLBACK PREPARED
operations pending that weren't completed during the last server
lifetime, whether because of a local crash or remote unavailability.
It attempts to complete those and retries periodically.  When a new
transaction needs this type of coordination, it adds the necessary
crash-proof state and then signals the background worker.  If
appropriate, it can wait for the background worker to complete, just
like a CHECKPOINT waits for the checkpointer to finish -- but if the
CHECKPOINT command is interrupted, the actual checkpoint is
unaffected.
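
As a rough illustration of the background-worker idea described above,
here is a minimal Python sketch. Everything in it is an assumption made
for the example: PendingStore, FakeServer and resolve_pending() are
hypothetical names, the "crash-proof state" is just a JSON file replaced
atomically, and a real implementation would live in the server and use
its own durable storage and worker infrastructure.

```python
import json
import os
import tempfile


class PendingStore:
    """Hypothetical durable list of unresolved foreign prepared transactions."""

    def __init__(self, path):
        self.path = path

    def load(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return json.load(f)

    def save(self, entries):
        # Write a temp file, fsync it, then rename over the old file, so a
        # crash leaves either the old list or the new one, never a torn file.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(entries, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def add(self, server, gid, action):
        entries = self.load()
        entries.append({"server": server, "gid": gid, "action": action})
        self.save(entries)


class FakeServer:
    """Stand-in for a foreign server that may be unreachable."""

    def __init__(self):
        self.up = True
        self.resolved = []

    def resolve(self, gid, action):
        if not self.up:
            raise ConnectionError("server unreachable")
        self.resolved.append((gid, action))


def resolve_pending(store, servers):
    """One pass of the worker: try every pending entry, keep the failures."""
    remaining = []
    for entry in store.load():
        try:
            servers[entry["server"]].resolve(entry["gid"], entry["action"])
        except ConnectionError:
            remaining.append(entry)   # retry on a later pass
    store.save(remaining)
    return len(remaining)
```

A committing backend would call add() before acknowledging anything and
then signal the worker. The worker calls resolve_pending() at startup
and periodically, so entries for an unreachable server survive both a
local restart and a remote outage until a later pass succeeds.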

More broadly, the question has been raised as to whether it's right to
try to handle atomic commit and atomic visibility as two separate
problems.  The XTM API proposed by Postgres Pro aims to address both
with a single stroke.  I don't think that API was well-designed, but
maybe the idea is good even if the code is not.  Generally, there are
two ways in which you could imagine that a distributed version of
PostgreSQL might work.  One possibility is that one node makes
everything work by going around and giving instructions to the other
nodes, which are more or less unaware that they are part of a cluster.
That is basically the design of Postgres-XC and certainly the design
being proposed here.  The other possibility is that the nodes are
actually clustered in some way and agree on things like whether a
transaction committed or what snapshot is current using some kind of
consensus protocol.  It is obviously possible to get a fairly long way
using the first approach but it seems likely that the second one is
fundamentally more powerful: among other things, because the first
approach is so centralized, the leader is apt to become a bottleneck.
And, quite apart from that, can a centralized architecture with the
leader manipulating the other workers ever allow for atomic
visibility?  If atomic visibility can build on top of atomic commit,
then it makes sense to do atomic commit first, but if we build this
infrastructure and then find that we need an altogether different
solution for atomic visibility, that will be unfortunate.

I know I was one of the people initially advocating this approach, but
I'm no longer convinced that it's going to work out well.  I don't
mean that we should abandon all work on this topic, or even less all
discussion, but I think we should be careful not to get so sucked into
the details of perfecting this particular patch that we ignore the
bigger design questions here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Bruce Momjian

Date:

19 October 2016, 17:08:49

On Wed, Oct 19, 2016 at 11:47:25AM -0400, Robert Haas wrote:
> It seems to me that the only way to really make this feature robust is
> to have a background worker as part of the equation.  The background
> worker launches at startup and looks around for local state that tells
> it whether there are any COMMIT PREPARED or ROLLBACK PREPARED
> operations pending that weren't completed during the last server
> lifetime, whether because of a local crash or remote unavailability.

Yes, you really need both commit on foreign servers before acknowledging
commit to the client, and a background process to clean things up from
an abandoned server.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

21 October 2016, 05:38:16

On Wed, Oct 19, 2016 at 9:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 13, 2016 at 7:27 AM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> However, when I briefly read the description in "Transaction Management in
>> the R* Distributed Database Management System (C. Mohan et al)" [2], it
>> seems that what Ashutosh is saying might be a correct way to proceed after
>> all:
>
> I think Ashutosh is mostly right, but I think there's a lot of room to
> doubt whether the design of this patch is good enough that we should
> adopt it.
>
> Consider two possible designs.  In design #1, the leader performs the
> commit locally and then tries to send COMMIT PREPARED to every standby
> server afterward, and only then acknowledges the commit to the client.
> In design #2, the leader performs the commit locally and then
> acknowledges the commit to the client at once, leaving the task of
> running COMMIT PREPARED to some background process.  Design #2
> involves a race condition, because it's possible that the background
> process might not complete COMMIT PREPARED on every node before the
> user submits the next query, and that query might then fail to see
> supposedly-committed changes.  This can't happen in design #1.  On the
> other hand, there's always the possibility that the leader's session
> is forcibly killed, even perhaps by pulling the plug.  If the
> background process contemplated by design #2 is well-designed, it can
> recover and finish sending COMMIT PREPARED to each relevant server
> after the next restart.  In design #1, that background process doesn't
> necessarily exist, so inevitably there is a possibility of orphaning
> prepared transactions on the remote servers, which is not good. Even
> if the DBA notices them, it won't be easy to figure out whether to
> commit them or roll them back.
>
> I think this thought experiment shows that, on the one hand, there is
> a point to waiting for commits on the foreign servers, because it can
> avoid the anomaly of not seeing the effects of your own commits.  On
> the other hand, it's ridiculous to suppose that every case can be
> handled by waiting, because that just isn't true.  You can't be sure
> that you'll be able to wait long enough for COMMIT PREPARED to
> complete, and even if that works out, you may not want to wait
> indefinitely for a dead server.  Waiting for a ROLLBACK PREPARED has
> no value whatsoever unless the system design is such that failing to
> wait for it results in the ROLLBACK PREPARED never getting performed
> -- which is a pretty poor excuse.
>
> Moreover, there are good reasons to think that doing this kind of
> cleanup work in the post-commit hooks is never going to be acceptable.
> Generally, the post-commit hooks need to be no-fail, because it's too
> late to throw an ERROR.  But there's very little hope that a
> connection to a remote server can be no-fail; anything that involves a
> network connection is, by definition, prone to failure.  We can try to
> guarantee that every single bit of code that runs in the path that
> sends COMMIT PREPARED only raises a WARNING or NOTICE rather than an
> ERROR, but that's going to be quite difficult to do: even palloc() can
> throw an error.  And what about interrupts?  We don't want to be stuck
> inside this code for a long time without any hope of the user
> recovering control of the session by pressing ^C, but of course the
> way that works is it throws an ERROR, which we can't handle here.  We
> fixed a similar issue for synchronous replication in
> 9a56dc3389b9470031e9ef8e45c95a680982e01a by making an interrupt emit a
> WARNING in that case and then return control to the user.  But if we
> do that here, all of the code that every FDW emits has to be aware of
> that rule and follow it, and it just adds to the list of ways that the
> user backend can escape this code without having cleaned up all of the
> prepared transactions on the remote side.

Hmm, IIRC, my patch, and possibly the patch by Masahiko-san and Vinayak,
tries to resolve prepared transactions in post-commit code. I agree
with you here that this should be avoided and that the backend should take
over the job of resolving transactions.

>
> It seems to me that the only way to really make this feature robust is
> to have a background worker as part of the equation.  The background
> worker launches at startup and looks around for local state that tells
> it whether there are any COMMIT PREPARED or ROLLBACK PREPARED
> operations pending that weren't completed during the last server
> lifetime, whether because of a local crash or remote unavailability.
> It attempts to complete those and retries periodically.  When a new
> transaction needs this type of coordination, it adds the necessary
> crash-proof state and then signals the background worker.  If
> appropriate, it can wait for the background worker to complete, just
> like a CHECKPOINT waits for the checkpointer to finish -- but if the
> CHECKPOINT command is interrupted, the actual checkpoint is
> unaffected.

My patch, and hence the patch by Masahiko-san and Vinayak, has a
background worker in the equation. The background worker tries to
resolve prepared transactions on the foreign server periodically.
IIRC, sending it a signal when another backend creates foreign
prepared transactions is not implemented. That may be a good addition.
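
As a rough illustration of that retry behaviour, here is a toy Python
sketch. It is not code from any of the patches: Resolver,
PendingResolution and the send() callback are hypothetical names, and a
plain list stands in for the crash-proof local state.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PendingResolution:
    server: str   # foreign server name (hypothetical)
    gid: str      # identifier of the prepared transaction on that server
    commit: bool  # True: COMMIT PREPARED, False: ROLLBACK PREPARED


class Resolver:
    """Toy stand-in for the background worker; not PostgreSQL code."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        # send(server, sql) is assumed to raise ConnectionError on failure.
        self.send = send
        # A plain list stands in for the crash-proof local state.
        self.pending: List[PendingResolution] = []

    def register(self, entry: PendingResolution) -> None:
        # A real implementation would also signal the worker here.
        self.pending.append(entry)

    def run_once(self) -> int:
        """Try each pending entry once; keep failures for the next pass."""
        still_pending = []
        for entry in self.pending:
            verb = "COMMIT" if entry.commit else "ROLLBACK"
            try:
                self.send(entry.server, f"{verb} PREPARED '{entry.gid}'")
            except ConnectionError:
                still_pending.append(entry)
        resolved = len(self.pending) - len(still_pending)
        self.pending = still_pending
        return resolved
```

Calling run_once() periodically gives the polling behaviour; calling it
right after register() would approximate the "signal the worker"
addition.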

>
> More broadly, the question has been raised as to whether it's right to
> try to handle atomic commit and atomic visibility as two separate
> problems.  The XTM API proposed by Postgres Pro aims to address both
> with a single stroke.  I don't think that API was well-designed, but
> maybe the idea is good even if the code is not.  Generally, there are
> two ways in which you could imagine that a distributed version of
> PostgreSQL might work.  One possibility is that one node makes
> everything work by going around and giving instructions to the other
> nodes, which are more or less unaware that they are part of a cluster.
> That is basically the design of Postgres-XC and certainly the design
> being proposed here.  The other possibility is that the nodes are
> actually clustered in some way and agree on things like whether a
> transaction committed or what snapshot is current using some kind of
> consensus protocol.  It is obviously possible to get a fairly long way
> using the first approach but it seems likely that the second one is
> fundamentally more powerful: among other things, because the first
> approach is so centralized, the leader is apt to become a bottleneck.
> And, quite apart from that, can a centralized architecture with the
> leader manipulating the other workers ever allow for atomic
> visibility?  If atomic visibility can build on top of atomic commit,
> then it makes sense to do atomic commit first, but if we build this
> infrastructure and then find that we need an altogether different
> solution for atomic visibility, that will be unfortunate.
>

There are two problems to solve as far as visibility is concerned:

1. Consistency: which transactions' changes are visible to a given
   transaction.
2. Making the changes made by all the segments of a given distributed
   transaction on different foreign servers visible at the same time.
   IOW, no other transaction sees the changes made by only a few
   segments without seeing the changes made by all the segments.

The first problem is hard to solve and there are many consistency
semantics. It is a large topic of discussion.

The second problem can be solved on top of this infrastructure by
extending the PREPARE TRANSACTION API. I am writing down my ideas so that
they don't get lost. It's not a completed design.

Assume that we have syntax which identifies the originating server that
prepared the transaction: PREPARE TRANSACTION <GID> FOR SERVER <local
server name> with ID <xid>, where xid is the transaction identifier on
the local server. Or we may incorporate that information in the GID
itself, provided the foreign server knows how to decode it.
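
To make the second option concrete, here is a small Python sketch of
one possible GID layout. The "fx_<xid>_<server>" format and the helper
names are purely hypothetical, not what any patch uses; the only
PostgreSQL fact relied on is that a GID must be less than 200 bytes
long.

```python
GID_PREFIX = "fx"    # hypothetical marker for FDW-originated transactions
GID_MAX_BYTES = 200  # PREPARE TRANSACTION requires a GID under 200 bytes


def encode_gid(origin_server: str, xid: int) -> str:
    """Pack the originating server name and its local xid into a GID."""
    gid = f"{GID_PREFIX}_{xid}_{origin_server}"
    if len(gid.encode("utf-8")) >= GID_MAX_BYTES:
        raise ValueError("GID too long")
    return gid


def decode_gid(gid: str) -> tuple:
    """Return (origin_server, xid); raise ValueError for foreign formats."""
    # The server name goes last so that underscores in it are harmless.
    parts = gid.split("_", 2)
    if len(parts) != 3 or parts[0] != GID_PREFIX or not parts[1].isdigit():
        raise ValueError(f"not an FDW-originated GID: {gid!r}")
    return parts[2], int(parts[1])
```

With this layout a DBA can also read the originating server straight
off the prepared transaction's identifier.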

Once we have that information, the foreign server can actively poll
the local server to get the status of transaction xid and resolve the
prepared transaction itself. It can go a step further and inform the
local server that it has resolved the transaction, so that the local
server can purge it from its own state. It can remember the fate of
xid, which can be consulted by another foreign server if the local
server is down. If another transaction on the foreign server stumbles
on a transaction prepared (but not resolved) by the local server, the
foreign server has two options: 1. consult the local server and
resolve; 2. if the first option fails to get the status of xid, or is
not workable, throw an error, e.g. in-doubt transaction. There is
probably more network traffic happening here. Usually, the local
server should be able to resolve the transaction before any other
transaction stumbles upon it. The overhead is incurred only when
necessary.
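
The decision the foreign server has to make could look roughly like
the Python sketch below. Everything in it is an assumption made for
illustration: ask_origin() stands for whatever call would fetch the
status of xid from the local server, and known_fates stands for fates
remembered by other servers.

```python
from enum import Enum
from typing import Callable, Dict


class XactStatus(Enum):
    COMMITTED = "committed"
    ABORTED = "aborted"
    IN_PROGRESS = "in progress"


class InDoubtTransaction(Exception):
    """Raised when the fate of the originating transaction is unknown."""


def resolve_prepared(gid: str, origin_xid: int,
                     ask_origin: Callable[[int], XactStatus],
                     known_fates: Dict[int, XactStatus]) -> str:
    """Return the SQL that resolves gid, or raise InDoubtTransaction."""
    try:
        # Option 1: consult the originating (local) server.
        status = ask_origin(origin_xid)
    except ConnectionError:
        # Origin is down: fall back to a fate remembered elsewhere, if any.
        status = known_fates.get(origin_xid)
    if status is XactStatus.COMMITTED:
        return f"COMMIT PREPARED '{gid}'"
    if status is XactStatus.ABORTED:
        return f"ROLLBACK PREPARED '{gid}'"
    # Option 2: the status is unknown or still in progress.
    raise InDoubtTransaction(gid)
```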

-- 
Best Wishes
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

26 October 2016, 06:01:02

On Fri, Oct 21, 2016 at 2:38 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Oct 19, 2016 at 9:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Oct 13, 2016 at 7:27 AM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>> However, when I briefly read the description in "Transaction Management in
>>> the R* Distributed Database Management System (C. Mohan et al)" [2], it
>>> seems that what Ashutosh is saying might be a correct way to proceed after
>>> all:
>>
>> I think Ashutosh is mostly right, but I think there's a lot of room to
>> doubt whether the design of this patch is good enough that we should
>> adopt it.
>>
>> Consider two possible designs.  In design #1, the leader performs the
>> commit locally and then tries to send COMMIT PREPARED to every standby
>> server afterward, and only then acknowledges the commit to the client.
>> In design #2, the leader performs the commit locally and then
>> acknowledges the commit to the client at once, leaving the task of
>> running COMMIT PREPARED to some background process.  Design #2
>> involves a race condition, because it's possible that the background
>> process might not complete COMMIT PREPARED on every node before the
>> user submits the next query, and that query might then fail to see
>> supposedly-committed changes.  This can't happen in design #1.  On the
>> other hand, there's always the possibility that the leader's session
>> is forcibly killed, even perhaps by pulling the plug.  If the
>> background process contemplated by design #2 is well-designed, it can
>> recover and finish sending COMMIT PREPARED to each relevant server
>> after the next restart.  In design #1, that background process doesn't
>> necessarily exist, so inevitably there is a possibility of orphaning
>> prepared transactions on the remote servers, which is not good. Even
>> if the DBA notices them, it won't be easy to figure out whether to
>> commit them or roll them back.
>>
>> I think this thought experiment shows that, on the one hand, there is
>> a point to waiting for commits on the foreign servers, because it can
>> avoid the anomaly of not seeing the effects of your own commits.  On
>> the other hand, it's ridiculous to suppose that every case can be
>> handled by waiting, because that just isn't true.  You can't be sure
>> that you'll be able to wait long enough for COMMIT PREPARED to
>> complete, and even if that works out, you may not want to wait
>> indefinitely for a dead server.  Waiting for a ROLLBACK PREPARED has
>> no value whatsoever unless the system design is such that failing to
>> wait for it results in the ROLLBACK PREPARED never getting performed
>> -- which is a pretty poor excuse.
>>
>> Moreover, there are good reasons to think that doing this kind of
>> cleanup work in the post-commit hooks is never going to be acceptable.
>> Generally, the post-commit hooks need to be no-fail, because it's too
>> late to throw an ERROR.  But there's very little hope that a
>> connection to a remote server can be no-fail; anything that involves a
>> network connection is, by definition, prone to failure.  We can try to
>> guarantee that every single bit of code that runs in the path that
>> sends COMMIT PREPARED only raises a WARNING or NOTICE rather than an
>> ERROR, but that's going to be quite difficult to do: even palloc() can
>> throw an error.  And what about interrupts?  We don't want to be stuck
>> inside this code for a long time without any hope of the user
>> recovering control of the session by pressing ^C, but of course the
>> way that works is it throws an ERROR, which we can't handle here.  We
>> fixed a similar issue for synchronous replication in
>> 9a56dc3389b9470031e9ef8e45c95a680982e01a by making an interrupt emit a
>> WARNING in that case and then return control to the user.  But if we
>> do that here, all of the code that every FDW emits has to be aware of
>> that rule and follow it, and it just adds to the list of ways that the
>> user backend can escape this code without having cleaned up all of the
>> prepared transactions on the remote side.
>
> Hmm, IIRC, my patch and possibly patch by Masahiko-san and Vinayak,
> tries to resolve prepared transactions in post-commit code. I agree
> with you here, that it should be avoided and the backend should take
> over the job of resolving transactions.
>
>>
>> It seems to me that the only way to really make this feature robust is
>> to have a background worker as part of the equation.  The background
>> worker launches at startup and looks around for local state that tells
>> it whether there are any COMMIT PREPARED or ROLLBACK PREPARED
>> operations pending that weren't completed during the last server
>> lifetime, whether because of a local crash or remote unavailability.
>> It attempts to complete those and retries periodically.  When a new
>> transaction needs this type of coordination, it adds the necessary
>> crash-proof state and then signals the background worker.  If
>> appropriate, it can wait for the background worker to complete, just
>> like a CHECKPOINT waits for the checkpointer to finish -- but if the
>> CHECKPOINT command is interrupted, the actual checkpoint is
>> unaffected.
>
> My patch and hence patch by Masahiko-san and Vinayak have the
> background worker in the equation. The background worker tries to
> resolve prepared transactions on the foreign server periodically.
> IIRC, sending it a signal when another backend creates foreign
> prepared transactions is not implemented. That may be a good addition.
>
>>
>> More broadly, the question has been raised as to whether it's right to
>> try to handle atomic commit and atomic visibility as two separate
>> problems.  The XTM API proposed by Postgres Pro aims to address both
>> with a single stroke.  I don't think that API was well-designed, but
>> maybe the idea is good even if the code is not.  Generally, there are
>> two ways in which you could imagine that a distributed version of
>> PostgreSQL might work.  One possibility is that one node makes
>> everything work by going around and giving instructions to the other
>> nodes, which are more or less unaware that they are part of a cluster.
>> That is basically the design of Postgres-XC and certainly the design
>> being proposed here.  The other possibility is that the nodes are
>> actually clustered in some way and agree on things like whether a
>> transaction committed or what snapshot is current using some kind of
>> consensus protocol.  It is obviously possible to get a fairly long way
>> using the first approach but it seems likely that the second one is
>> fundamentally more powerful: among other things, because the first
>> approach is so centralized, the leader is apt to become a bottleneck.
>> And, quite apart from that, can a centralized architecture with the
>> leader manipulating the other workers ever allow for atomic
>> visibility?  If atomic visibility can build on top of atomic commit,
>> then it makes sense to do atomic commit first, but if we build this
>> infrastructure and then find that we need an altogether different
>> solution for atomic visibility, that will be unfortunate.
>>
>
> There are two problems to solve as far as visibility is concerned. 1.
> Consistency: changes by which transactions are visible to a given
> transaction 2. Making visible, the changes by all the segments of a
> given distributed transaction on different foreign servers, at the
> same time IOW no other transaction sees changes by only few segments
> but does not see changes by all the transactions.
>
> First problem is hard to solve and there are many consistency
> symantics. A large topic of discussion.
>
> The second problem can be solved on top of this infrastructure by
> extending PREPARE transaction API. I am writing down my ideas so that
> they don't get lost. It's not a completed design.
>
> Assume that we have syntax which tells the originating server which
> prepared the transaction. PREPARE TRANSACTION <GID> FOR SERVER <local
> server name> with ID <xid> ,where xid is the transaction identifier on
> local server. OR we may incorporate that information in GID itself and
> the foreign server knows how to decode it.
>
> Once we have that information, the foreign server can actively poll
> the local server to get the status of transaction xid and resolves the
> prepared transaction itself. It can go a step further and inform the
> local server that it has resolved the transaction, so that the local
> server can purge it from it's own state. It can remember the fate of
> xid, which can be consulted by another foreign server if the local
> server is down. If another transaction on the foreign server stumbles
> on a transaction prepared (but not resolved) by the local server,
> foreign server has two options - 1. consult the local server and
> resolve 2. if the first options fails to get the status of xid or that
> if that option is not workable, throw an error e.g. indoubt
> transaction. There is probably more network traffic happening here.
> Usually, the local server should be able to resolve the transaction
> before any other transaction stumbles upon it. The overhead is
> incurred only when necessary.
>

I think we can consider atomic commit and atomic visibility
separately, and atomic visibility can build on top of atomic commit.
We can't provide atomic visibility across multiple nodes without
consistent updates, so I'd like to focus on atomic commit in this
thread. As for providing atomic commit, the two-phase commit protocol
is the perfect solution. Whatever type of solution for atomic
visibility we have, atomic commit by 2PC is a necessary feature. We
can consider an atomic commit feature that has the following
functionality:

* The local node is responsible for transaction management among the
  relevant remote servers using 2PC.
* The local node has information about the state of the distributed
  transaction.
* There is a process that resolves in-doubt transactions.
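
As a minimal sketch of the first two points, assuming an in-memory
Participant class in place of real foreign servers and a plain list in
place of the local node's durable state (all names here are
hypothetical, not taken from the patch):

```python
from typing import List, Tuple


class Participant:
    """Toy foreign server that tracks the state of one transaction."""

    def __init__(self, name: str, fail_prepare: bool = False) -> None:
        self.name = name
        self.fail_prepare = fail_prepare
        self.state = "active"

    def prepare(self, gid: str) -> None:
        if self.fail_prepare:
            self.state = "aborted"  # a failed PREPARE aborts locally
            raise RuntimeError(f"{self.name}: PREPARE failed")
        self.state = "prepared"

    def commit_prepared(self, gid: str) -> None:
        self.state = "committed"

    def rollback_prepared(self, gid: str) -> None:
        self.state = "aborted"


def atomic_commit(gid: str, participants: List[Participant],
                  log: List[Tuple[str, str]]) -> bool:
    """Two-phase commit driven by the local node."""
    prepared = []
    try:
        for p in participants:          # phase 1: prepare everywhere
            p.prepare(gid)
            prepared.append(p)
    except RuntimeError:
        log.append((gid, "abort"))      # record the decision first
        for p in prepared:
            p.rollback_prepared(gid)
        return False
    log.append((gid, "commit"))         # decision point
    for p in participants:              # phase 2: commit everywhere
        p.commit_prepared(gid)
    return True
```

The log entry written before phase 2 is what a resolver process would
consult after a crash to finish any in-doubt transaction.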

As Ashutosh mentioned, the current patch supports almost all of this
functionality. But I'm trying to update it so that it can keep
information about multiple foreign servers in one FDWXact file and one
entry in the shared buffer. Otherwise, even though a new remote server
can be added on the fly, we might need to restart the local server in
order to allocate a larger shared buffer for FDW transactions whenever
a remote server is added. I'm also incorporating the other comments.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

27 October 2016, 18:19:06

On Fri, Oct 21, 2016 at 1:38 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> Once we have that information, the foreign server can actively poll
> the local server to get the status of transaction xid and resolves the
> prepared transaction itself. It can go a step further and inform the
> local server that it has resolved the transaction, so that the local
> server can purge it from it's own state. It can remember the fate of
> xid, which can be consulted by another foreign server if the local
> server is down. If another transaction on the foreign server stumbles
> on a transaction prepared (but not resolved) by the local server,
> foreign server has two options - 1. consult the local server and
> resolve 2. if the first options fails to get the status of xid or that
> if that option is not workable, throw an error e.g. indoubt
> transaction. There is probably more network traffic happening here.
> Usually, the local server should be able to resolve the transaction
> before any other transaction stumbles upon it. The overhead is
> incurred only when necessary.

Yes, something like this could be done.  It's pretty complicated, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

27 October 2016, 18:20:02

On Wed, Oct 26, 2016 at 2:00 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I think we can consider the atomic commit and the atomic visibility
> separately, and the atomic visibility can build on the top of the
> atomic commit.

It is true that we can do that, but I'm not sure whether it's the best design.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

31 October 2016, 00:48:23

On Fri, Oct 28, 2016 at 3:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Oct 26, 2016 at 2:00 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I think we can consider the atomic commit and the atomic visibility
>> separately, and the atomic visibility can build on the top of the
>> atomic commit.
>
> It is true that we can do that, but I'm not sure whether it's the best design.

I'm not sure it's the best design either. We need to discuss it more.
But this is not a feature specific to the sharding solution. Atomic
commit using 2PC is useful for other servers that can use 2PC, not
only postgres_fdw.

Attached are the latest 3 patches, which incorporate the review
comments so far. The recovery speed improvement that is discussed on
another thread is not incorporated yet.
Please give me feedback.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

On Fri, Nov 11, 2016 at 5:38 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> 2PC is a basic building block to support the atomic commit and there
> are some optimizations way in order to reduce disadvantage of 2PC. As
> you mentioned, it's hard to support a single model that would suit
> several type of FDWs. But even if it's not a purpose for sharding,
> because many other database which could be connected to PostgreSQL via
> FDW supports 2PC, 2PC for FDW would be useful for not only sharding
> purpose. That's why I was focusing on implementing 2PC for FDW so far.

Moved to next CF with "needs review" status.

Regards,

Hari Babu

Fujitsu Australia

Re: Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

05 December 2016, 05:43:05

On Mon, Dec 5, 2016 at 11:04 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
>
> On Fri, Nov 11, 2016 at 5:38 PM, Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
>>
>>
>> 2PC is a basic building block to support the atomic commit and there
>> are some optimizations way in order to reduce disadvantage of 2PC. As
>> you mentioned, it's hard to support a single model that would suit
>> several type of FDWs. But even if it's not a purpose for sharding,
>> because many other database which could be connected to PostgreSQL via
>> FDW supports 2PC, 2PC for FDW would be useful for not only sharding
>> purpose. That's why I was focusing on implementing 2PC for FDW so far.
>
>
> Moved to next CF with "needs review" status.

I think this should be changed to "returned with feedback". The
design and approach itself need to be discussed. I think we should
let the authors decide whether they want it added to the next
commitfest or not.

When I first started this work, Tom had suggested that I try to
make PREPARE and COMMIT/ROLLBACK PREPARED involving foreign servers, or
at least postgres_fdw servers, work. I think most of my work that
Vinayak and Sawada have rebased to the latest master will be required
for getting what Tom suggested done. We wouldn't need a lot of changes
to that design. PREPARE involving foreign servers errors out right
now. If we start supporting prepared transactions involving foreign
servers, that will be a good improvement over the current status quo.
Once we get that done, we can continue working on the larger problem
of supporting ACID transactions involving foreign servers.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Transactions involving multiple postgres foreign servers

From

Haribabu Kommi

Date:

05 December 2016, 05:55:36

On Mon, Dec 5, 2016 at 4:42 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

> On Mon, Dec 5, 2016 at 11:04 AM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>>
>>
>> On Fri, Nov 11, 2016 at 5:38 PM, Masahiko Sawada <sawada.mshk@gmail.com>
>> wrote:
>>>
>>>
>>> 2PC is a basic building block to support the atomic commit and there
>>> are some optimizations way in order to reduce disadvantage of 2PC. As
>>> you mentioned, it's hard to support a single model that would suit
>>> several type of FDWs. But even if it's not a purpose for sharding,
>>> because many other database which could be connected to PostgreSQL via
>>> FDW supports 2PC, 2PC for FDW would be useful for not only sharding
>>> purpose. That's why I was focusing on implementing 2PC for FDW so far.
>>
>>
>> Moved to next CF with "needs review" status.
>
> I think this should be changed to "returned with feedback.". The
> design and approach itself needs to be discussed. I think, we should
> let authors decide whether they want it to be added to the next
> commitfest or not.
>
> When I first started with this work, Tom had suggested me to try to
> make PREPARE and COMMIT/ROLLBACK PREPARED involving foreign servers or
> at least postgres_fdw servers work. I think, most of my work that
> Vinayak and Sawada have rebased to the latest master will be required
> for getting what Tom suggested done. We wouldn't need a lot of changes
> to that design. PREPARE involving foreign servers errors out right
> now. If we start supporting prepared transactions involving foreign
> servers that will be a good improvement over the current status-quo.
> Once we get that done, we can continue working on the larger problem
> of supporting ACID transactions involving foreign servers.

Thanks for the update.

I closed it in commitfest 2017-01 with "returned with feedback". The
author can update it once the new patch is submitted.

Regards,

Hari Babu

Fujitsu Australia

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

vinayak

Date:

09 December 2016, 09:02:54

On 2016/12/05 14:42, Ashutosh Bapat wrote:
> On Mon, Dec 5, 2016 at 11:04 AM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>
>
> On Fri, Nov 11, 2016 at 5:38 PM, Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
>>>
>>> 2PC is a basic building block to support the atomic commit and there
>>> are some optimizations way in order to reduce disadvantage of 2PC. As
>>> you mentioned, it's hard to support a single model that would suit
>>> several type of FDWs. But even if it's not a purpose for sharding,
>>> because many other database which could be connected to PostgreSQL via
>>> FDW supports 2PC, 2PC for FDW would be useful for not only sharding
>>> purpose. That's why I was focusing on implementing 2PC for FDW so far.
>>
>> Moved to next CF with "needs review" status.
> I think this should be changed to "returned with feedback.". The
> design and approach itself needs to be discussed. I think, we should
> let authors decide whether they want it to be added to the next
> commitfest or not.
>
> When I first started with this work, Tom had suggested me to try to
> make PREPARE and COMMIT/ROLLBACK PREPARED involving foreign servers or
> at least postgres_fdw servers work. I think, most of my work that
> Vinayak and Sawada have rebased to the latest master will be required
> for getting what Tom suggested done. We wouldn't need a lot of changes
> to that design. PREPARE involving foreign servers errors out right
> now. If we start supporting prepared transactions involving foreign
> servers that will be a good improvement over the current status-quo.
> Once we get that done, we can continue working on the larger problem
> of supporting ACID transactions involving foreign servers.
In the PgConf.Asia developers meeting, Bruce Momjian and other
developers discussed FDW-based sharding [1]. The suggestion from other
hackers was that we need to discuss the big picture and use cases of
sharding. Bruce has listed all the building blocks of built-in
sharding on the wiki [2]. IIUC, a transaction manager involving
foreign servers is one part of sharding.
As per Bruce's wiki page, there are two use cases for transactions
involving multiple foreign servers:
1. Cross-node read-only queries on read/write shards:
    This will require a global snapshot manager to make sure the shards
    return consistent data.
2. Cross-node read-write queries:
    This will require a global snapshot manager and global transaction
    manager.

I agree with you that if we start supporting PREPARE and
COMMIT/ROLLBACK PREPARED involving foreign servers, that will be a
good improvement.

[1] https://wiki.postgresql.org/wiki/PgConf.Asia_2016_Developer_Meeting
[2] https://wiki.postgresql.org/wiki/Built-in_Sharding

Regards,
Vinayak Pokale
NTT Open Source Software Center

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

09 December 2016, 10:02:23

On Fri, Dec 9, 2016 at 3:02 PM, vinayak <Pokale_Vinayak_q3@lab.ntt.co.jp> wrote:
> On 2016/12/05 14:42, Ashutosh Bapat wrote:
>>
>> On Mon, Dec 5, 2016 at 11:04 AM, Haribabu Kommi
>> <kommi.haribabu@gmail.com> wrote:
>>
>>
>> On Fri, Nov 11, 2016 at 5:38 PM, Masahiko Sawada <sawada.mshk@gmail.com>
>> wrote:
>>>>
>>>>
>>>> 2PC is a basic building block to support the atomic commit and there
>>>> are some optimizations way in order to reduce disadvantage of 2PC. As
>>>> you mentioned, it's hard to support a single model that would suit
>>>> several type of FDWs. But even if it's not a purpose for sharding,
>>>> because many other database which could be connected to PostgreSQL via
>>>> FDW supports 2PC, 2PC for FDW would be useful for not only sharding
>>>> purpose. That's why I was focusing on implementing 2PC for FDW so far.
>>>
>>>
>>> Moved to next CF with "needs review" status.
>>
>> I think this should be changed to "returned with feedback.". The
>> design and approach itself needs to be discussed. I think, we should
>> let authors decide whether they want it to be added to the next
>> commitfest or not.
>>
>> When I first started with this work, Tom had suggested me to try to
>> make PREPARE and COMMIT/ROLLBACK PREPARED involving foreign servers or
>> at least postgres_fdw servers work. I think, most of my work that
>> Vinayak and Sawada have rebased to the latest master will be required
>> for getting what Tom suggested done. We wouldn't need a lot of changes
>> to that design. PREPARE involving foreign servers errors out right
>> now. If we start supporting prepared transactions involving foreign
>> servers that will be a good improvement over the current status-quo.
>> Once we get that done, we can continue working on the larger problem
>> of supporting ACID transactions involving foreign servers.
>
> In the pgconf ASIA depelopers meeting Bruce Momjian and other developers
> discussed
> on FDW based sharding [1]. The suggestions from other hackers was that we
> need to discuss
> the big picture and use cases of sharding. Bruce has listed all the building
> blocks of built-in sharding
> on wiki [2]. IIUC,transaction manager involving foreign servers is one part
> of sharding.

Yeah, 2PC on FDW is a basic building block for FDW-based sharding,
and it would be useful not only for FDW sharding but also for other
purposes. As far as I can tell from surveying some papers, many kinds
of distributed transaction management architectures use 2PC for atomic
commit, with some optimisations. And using 2PC to provide atomic commit
for distributed transactions has much affinity with the current
PostgreSQL implementation from some perspectives.

> As per the Bruce's wiki page there are two use cases for transactions
> involved multiple foreign servers:
> 1. Cross-node read-only queries on read/write shards:
>     This will require a global snapshot manager to make sure the shards
> return consistent data.
> 2. Cross-node read-write queries:
>     This will require a global snapshot manager and global transaction
> manager.
>
> I agree with you that if we start supporting PREPARE and COMMIT/ROLLBACK
> PREPARED
> involving foreign servers that will be good improvement.
>
> [1] https://wiki.postgresql.org/wiki/PgConf.Asia_2016_Developer_Meeting
> [2] https://wiki.postgresql.org/wiki/Built-in_Sharding
>

I also agree to work on implementing atomic commit across the
foreign servers and then continue to work on the larger problem.
I think that this will be a large step forward. I'm going to submit
the updated version of the patch to CF3.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

22 December 2016, 19:49:30

On Fri, Dec 9, 2016 at 4:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Dec 9, 2016 at 3:02 PM, vinayak <Pokale_Vinayak_q3@lab.ntt.co.jp> wrote:
>> On 2016/12/05 14:42, Ashutosh Bapat wrote:
>>>
>>> On Mon, Dec 5, 2016 at 11:04 AM, Haribabu Kommi
>>> <kommi.haribabu@gmail.com> wrote:
>>>
>>>
>>> On Fri, Nov 11, 2016 at 5:38 PM, Masahiko Sawada <sawada.mshk@gmail.com>
>>> wrote:
>>>>>
>>>>>
>>>>> 2PC is a basic building block to support the atomic commit and there
>>>>> are some optimizations way in order to reduce disadvantage of 2PC. As
>>>>> you mentioned, it's hard to support a single model that would suit
>>>>> several type of FDWs. But even if it's not a purpose for sharding,
>>>>> because many other database which could be connected to PostgreSQL via
>>>>> FDW supports 2PC, 2PC for FDW would be useful for not only sharding
>>>>> purpose. That's why I was focusing on implementing 2PC for FDW so far.
>>>>
>>>>
>>>> Moved to next CF with "needs review" status.
>>>
>>> I think this should be changed to "returned with feedback.". The
>>> design and approach itself needs to be discussed. I think, we should
>>> let authors decide whether they want it to be added to the next
>>> commitfest or not.
>>>
>>> When I first started with this work, Tom had suggested me to try to
>>> make PREPARE and COMMIT/ROLLBACK PREPARED involving foreign servers or
>>> at least postgres_fdw servers work. I think, most of my work that
>>> Vinayak and Sawada have rebased to the latest master will be required
>>> for getting what Tom suggested done. We wouldn't need a lot of changes
>>> to that design. PREPARE involving foreign servers errors out right
>>> now. If we start supporting prepared transactions involving foreign
>>> servers that will be a good improvement over the current status-quo.
>>> Once we get that done, we can continue working on the larger problem
>>> of supporting ACID transactions involving foreign servers.
>>
>> In the pgconf ASIA depelopers meeting Bruce Momjian and other developers
>> discussed
>> on FDW based sharding [1]. The suggestions from other hackers was that we
>> need to discuss
>> the big picture and use cases of sharding. Bruce has listed all the building
>> blocks of built-in sharding
>> on wiki [2]. IIUC,transaction manager involving foreign servers is one part
>> of sharding.
>
> Yeah, the 2PC on FDW is a basic building block for FDW based sharding
> and it would be useful not only FDW sharding but also other purposes.
> As far as I surveyed some papers the many kinds of distributed
> transaction management architectures use the 2PC for atomic commit
> with some optimisations. And using 2PC to provide atomic commit on
> distributed transaction has much affinity with current PostgreSQL
> implementation from some perspective.
>
>> As per the Bruce's wiki page there are two use cases for transactions
>> involved multiple foreign servers:
>> 1. Cross-node read-only queries on read/write shards:
>>     This will require a global snapshot manager to make sure the shards
>> return consistent data.
>> 2. Cross-node read-write queries:
>>     This will require a global snapshot manager and global transaction
>> manager.
>>
>> I agree with you that if we start supporting PREPARE and COMMIT/ROLLBACK
>> PREPARED
>> involving foreign servers that will be good improvement.
>>
>> [1] https://wiki.postgresql.org/wiki/PgConf.Asia_2016_Developer_Meeting
>> [2] https://wiki.postgresql.org/wiki/Built-in_Sharding
>>
>
> I also agree to work on implementing the atomic commit across the
> foreign servers and then continue to work on the more larger problem.
> I think that this will be large step forward. I'm going to submit the
> updated version patch to CF3.

Attached are the latest versions of the patches. The design is almost
the same as in the previous patches; I incorporated some optimisations
and updated the documentation. However, the documentation and regression
tests are still not sufficient.

The 000 patch adds some new FDW APIs to achieve atomic commit involving
foreign servers using two-phase commit. If more than one foreign
server is involved in the transaction, or if the transaction changes
local data and involves even one foreign server, the local node executes
PREPARE and COMMIT/ROLLBACK PREPARED on the foreign servers at commit.
Much of this implementation is inspired by the two-phase commit code, so
I incorporated recent changes to that code, for example the recovery
speed improvement, into this patch.
The 001 patch makes postgres_fdw support atomic commit. If
two_phase_commit is set to 'on' for a foreign server, two-phase commit
will be used at commit. The 002 patch adds pg_fdw_resolver, a new contrib
module that is a bgworker process that resolves in-doubt transactions
on foreign servers, if there are any.
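
To illustrate the commit rule described above, here is a minimal Python
sketch. It is not code from the patches: the ForeignServer class, the
functions needs_two_phase_commit and commit_distributed, the in-memory
command log, and the identifier format are all hypothetical stand-ins.
The real logic is written in C inside the transaction manager and
postgres_fdw, and the commit of the local transaction is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class ForeignServer:
    """Hypothetical in-memory stand-in for one foreign server connection."""
    name: str
    fail_on_prepare: bool = False   # test knob: simulate a failed PREPARE
    log: list = field(default_factory=list)

    def execute(self, command: str) -> None:
        if self.fail_on_prepare and command.startswith("PREPARE TRANSACTION"):
            raise RuntimeError(f"{self.name}: prepare failed")
        self.log.append(command)


def needs_two_phase_commit(num_foreign_servers: int, local_data_changed: bool) -> bool:
    """More than one foreign server, or local changes plus at least one."""
    return num_foreign_servers > 1 or (local_data_changed and num_foreign_servers >= 1)


def commit_distributed(servers: list, local_data_changed: bool, xid: int) -> bool:
    """Return True if every server committed, False if all were rolled back."""
    if not needs_two_phase_commit(len(servers), local_data_changed):
        for server in servers:
            server.execute("COMMIT")
        return True

    prepared = []
    try:
        # Phase 1: prepare the transaction on every foreign server.
        for server in servers:
            gid = f"px_{xid}_{server.name}"   # made-up identifier format
            server.execute(f"PREPARE TRANSACTION '{gid}'")
            prepared.append((server, gid))
    except RuntimeError:
        # One prepare failed: undo the prepared ones, abort the rest.
        for server, gid in prepared:
            server.execute(f"ROLLBACK PREPARED '{gid}'")
        for server in servers[len(prepared):]:
            server.execute("ROLLBACK")
        return False

    # Phase 2: everything is prepared, so commit everywhere.
    for server, gid in prepared:
        server.execute(f"COMMIT PREPARED '{gid}'")
    return True
```

With two servers, each log ends up holding a PREPARE TRANSACTION followed
by a COMMIT PREPARED; with a single server and no local changes, a plain
COMMIT is sent instead.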

My replies might be late next week, but feedback and review comments are
very welcome.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Fri, Jan 13, 2017 at 3:48 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Jan 13, 2017 at 3:20 PM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>>>>
>>>
>>> Long time passed since original patch proposed by Ashutosh, so I
>>> explain again about current design and functionality of this feature.
>>> If you have any question, please feel free to ask.
>>
>> Thanks for the summary.
>>
>>>
>>> Parameters
>>> ==========
>>
>> [ snip ]
>>
>>>
>>> Cluster-wide atomic commit
>>> =======================
>>> Since the distributed transaction commit on foreign servers are
>>> executed independently, the transaction that modified data on the
>>> multiple foreign servers is not ensured that transaction did either
>>> all of them commit or all of them rollback. The patch adds the
>>> functionality that guarantees distributed transaction did either
>>> commit or rollback on all foreign servers. IOW the goal of this patch
>>> is achieving the cluster-wide atomic commit across foreign server that
>>> is capable two phase commit protocol.
>>
>> In [1], I proposed that we solve the problem of supporting PREPARED
>> transactions involving foreign servers and in subsequent mail Vinayak
>> agreed to that. But this goal has wider scope than that proposal. I am
>> fine widening the scope, but then it would again lead to the same
>> discussion we had about the big picture. May be you want to share
>> design (or point out the parts of this design that will help) for
>> solving smaller problem and tone down the patch for the same.
>>
>
> Sorry for confuse you. I'm still focusing on solving only that
> problem. What I was trying to say is that I think that supporting
> PREPARED transaction involving foreign server is the means, not the
> end. So once we supports PREPARED transaction involving foreign
> servers we can achieve cluster-wide atomic commit in a sense.
>

Attached are updated patches. I fixed some bugs and added a 003 patch
that adds TAP tests for foreign transactions.
The 003 patch depends on the 000 and 001 patches.

Please give me feedback.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

vinayak

Date:

19 January 2017, 10:04:38

On 2017/01/16 17:35, Masahiko Sawada wrote:
> On Fri, Jan 13, 2017 at 3:48 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Fri, Jan 13, 2017 at 3:20 PM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>>>> Long time passed since original patch proposed by Ashutosh, so I
>>>> explain again about current design and functionality of this feature.
>>>> If you have any question, please feel free to ask.
>>> Thanks for the summary.
>>>
>>>> Parameters
>>>> ==========
>>> [ snip ]
>>>
>>>> Cluster-wide atomic commit
>>>> =======================
>>>> Since the distributed transaction commit on foreign servers are
>>>> executed independently, the transaction that modified data on the
>>>> multiple foreign servers is not ensured that transaction did either
>>>> all of them commit or all of them rollback. The patch adds the
>>>> functionality that guarantees distributed transaction did either
>>>> commit or rollback on all foreign servers. IOW the goal of this patch
>>>> is achieving the cluster-wide atomic commit across foreign server that
>>>> is capable two phase commit protocol.
>>> In [1], I proposed that we solve the problem of supporting PREPARED
>>> transactions involving foreign servers and in subsequent mail Vinayak
>>> agreed to that. But this goal has wider scope than that proposal. I am
>>> fine widening the scope, but then it would again lead to the same
>>> discussion we had about the big picture. May be you want to share
>>> design (or point out the parts of this design that will help) for
>>> solving smaller problem and tone down the patch for the same.
>>>
>> Sorry for confuse you. I'm still focusing on solving only that
>> problem. What I was trying to say is that I think that supporting
>> PREPARED transaction involving foreign server is the means, not the
>> end. So once we supports PREPARED transaction involving foreign
>> servers we can achieve cluster-wide atomic commit in a sense.
>>
> Attached updated patches. I fixed some bugs and add 003 patch that
> adds TAP test for foreign transaction.
> 003 patch depends 000 and 001 patch.
>
> Please give me feedback.

I have tested prepared transactions with foreign servers, but after
preparing the transaction the following error occurs repeatedly
without end.
Test:
=====
=#BEGIN;
=#INSERT INTO ft1_lt VALUES (10);
=#INSERT INTO ft2_lt VALUES (20);
=#PREPARE TRANSACTION 'prep_xact_with_fdw';

2017-01-18 15:09:48.378 JST [4312] ERROR:  function pg_fdw_resolve() does not exist at character 8
2017-01-18 15:09:48.378 JST [4312] HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
2017-01-18 15:09:48.378 JST [4312] QUERY:  SELECT pg_fdw_resolve()
2017-01-18 15:09:48.378 JST [29224] LOG:  worker process: foreign transaction resolver (dbid 13119) (PID 4312) exited with exit code 1
.....

If we check the status from another session, it shows the status as
prepared.
=# select * from pg_fdw_xacts;
 dbid  | transaction | serverid | userid |  status  | identifier
-------+-------------+----------+--------+----------+------------------------
 13119 |        1688 |    16388 |     10 | prepared | px_2102366504_16388_10
 13119 |        1688 |    16389 |     10 | prepared | px_749056984_16389_10
(2 rows)

I think this is a bug.
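
The identifiers in the pg_fdw_xacts output above appear to carry the
server OID and user OID after a leading number, i.e.
px_<number>_<serverid>_<userid>. That layout is inferred only from the
two sample rows, not from the patch source, so the following Python
sketch rests on that assumption. The helper name
parse_fdw_xact_identifier is hypothetical; it shows how a DBA might map
a prepared transaction found on a remote server back to its origin.

```python
import re
from typing import NamedTuple


class FdwXactId(NamedTuple):
    number: int     # leading numeric part; its meaning is not stated in the thread
    serverid: int   # OID of the foreign server, as in pg_fdw_xacts.serverid
    userid: int     # OID of the user, as in pg_fdw_xacts.userid


# Assumed layout, taken from the sample rows: px_<number>_<serverid>_<userid>
_PATTERN = re.compile(r"px_(\d+)_(\d+)_(\d+)")


def parse_fdw_xact_identifier(identifier: str) -> FdwXactId:
    """Split an identifier into its three numeric components."""
    match = _PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    return FdwXactId(*(int(group) for group in match.groups()))
```

For the first sample row, parse_fdw_xact_identifier("px_2102366504_16388_10")
yields serverid 16388 and userid 10, matching the other columns of that row.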

Regards,
Vinayak Pokale
NTT Open Source Software Center

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

19 January 2017, 11:44:08

On Thu, Jan 19, 2017 at 4:04 PM, vinayak
<Pokale_Vinayak_q3@lab.ntt.co.jp> wrote:
>
> On 2017/01/16 17:35, Masahiko Sawada wrote:
>>
>> On Fri, Jan 13, 2017 at 3:48 PM, Masahiko Sawada <sawada.mshk@gmail.com>
>> wrote:
>>>
>>> On Fri, Jan 13, 2017 at 3:20 PM, Ashutosh Bapat
>>> <ashutosh.bapat@enterprisedb.com> wrote:
>>>>>
>>>>> Long time passed since original patch proposed by Ashutosh, so I
>>>>> explain again about current design and functionality of this feature.
>>>>> If you have any question, please feel free to ask.
>>>>
>>>> Thanks for the summary.
>>>>
>>>>> Parameters
>>>>> ==========
>>>>
>>>> [ snip ]
>>>>
>>>>> Cluster-wide atomic commit
>>>>> =======================
>>>>> Since the distributed transaction commit on foreign servers are
>>>>> executed independently, the transaction that modified data on the
>>>>> multiple foreign servers is not ensured that transaction did either
>>>>> all of them commit or all of them rollback. The patch adds the
>>>>> functionality that guarantees distributed transaction did either
>>>>> commit or rollback on all foreign servers. IOW the goal of this patch
>>>>> is achieving the cluster-wide atomic commit across foreign server that
>>>>> is capable two phase commit protocol.
>>>>
>>>> In [1], I proposed that we solve the problem of supporting PREPARED
>>>> transactions involving foreign servers and in subsequent mail Vinayak
>>>> agreed to that. But this goal has wider scope than that proposal. I am
>>>> fine widening the scope, but then it would again lead to the same
>>>> discussion we had about the big picture. May be you want to share
>>>> design (or point out the parts of this design that will help) for
>>>> solving smaller problem and tone down the patch for the same.
>>>>
>>> Sorry for confuse you. I'm still focusing on solving only that
>>> problem. What I was trying to say is that I think that supporting
>>> PREPARED transaction involving foreign server is the means, not the
>>> end. So once we supports PREPARED transaction involving foreign
>>> servers we can achieve cluster-wide atomic commit in a sense.
>>>
>> Attached updated patches. I fixed some bugs and add 003 patch that
>> adds TAP test for foreign transaction.
>> 003 patch depends 000 and 001 patch.
>>
>> Please give me feedback.
>
>
> I have tested prepared transactions with foreign servers but after preparing
> the transaction
> the following error occur infinitely.
> Test:
> =====
> =#BEGIN;
> =#INSERT INTO ft1_lt VALUES (10);
> =#INSERT INTO ft2_lt VALUES (20);
> =#PREPARE TRANSACTION 'prep_xact_with_fdw';
>
> 2017-01-18 15:09:48.378 JST [4312] ERROR:  function pg_fdw_resolve() does
> not exist at character 8
> 2017-01-18 15:09:48.378 JST [4312] HINT:  No function matches the given name
> and argument types. You might need to add explicit type casts.
> 2017-01-18 15:09:48.378 JST [4312] QUERY:  SELECT pg_fdw_resolve()
> 2017-01-18 15:09:48.378 JST [29224] LOG:  worker process: foreign
> transaction resolver (dbid 13119) (PID 4312) exited with exit code 1
> .....
>
> If we check the status on another session then it showing the status as
> prepared.
> =# select * from pg_fdw_xacts;
>  dbid  | transaction | serverid | userid |  status  | identifier
> -------+-------------+----------+--------+----------+------------------------
>  13119 |        1688 |    16388 |     10 | prepared | px_2102366504_16388_10
>  13119 |        1688 |    16389 |     10 | prepared | px_749056984_16389_10
> (2 rows)
>
> I think this is a bug.
>

Thank you for reviewing!

I think this is a bug in the pg_fdw_resolver contrib module. I had
forgotten to change the SQL executed by the pg_fdw_resolver process.
Attached is the latest version of the 002 patch.
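
For readers unfamiliar with what such a resolver has to do, here is a
rough Python sketch of one resolution pass. It is not the module's
actual logic: the function resolve_in_doubt, the shape of the entries,
the local_outcome mapping, and the send callback are all assumptions
made for illustration. The idea shown is only that each prepared
foreign transaction is committed or rolled back according to the fate
of the local transaction, and that unreachable servers are retried on a
later pass.

```python
def resolve_in_doubt(entries, local_outcome, send):
    """Run one resolver pass and return the entries still unresolved.

    entries       -- list of dicts with 'transaction', 'serverid' and
                     'identifier' keys (modelled on pg_fdw_xacts rows)
    local_outcome -- dict mapping a local transaction id to 'committed'
                     or 'aborted'; a missing key means "not decided yet"
    send          -- callable(serverid, command); raises ConnectionError
                     if the foreign server cannot be reached
    """
    remaining = []
    for entry in entries:
        outcome = local_outcome.get(entry["transaction"])
        if outcome is None:
            # The local transaction has not finished, so leave it alone.
            remaining.append(entry)
            continue
        verb = "COMMIT" if outcome == "committed" else "ROLLBACK"
        gid = entry["identifier"]
        try:
            send(entry["serverid"], f"{verb} PREPARED '{gid}'")
        except ConnectionError:
            # Server unreachable: keep the entry for the next pass.
            remaining.append(entry)
    return remaining
```

A real bgworker would call something like this in a loop with a sleep
between passes, rather than exiting on the first error.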

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

002_pg_fdw_resolver_contrib_v5.patch

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

26 January 2017, 10:51:29

On Thu, Jan 19, 2017 at 5:44 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Jan 19, 2017 at 4:04 PM, vinayak
> <Pokale_Vinayak_q3@lab.ntt.co.jp> wrote:
>>
>> On 2017/01/16 17:35, Masahiko Sawada wrote:
>>>
>>> On Fri, Jan 13, 2017 at 3:48 PM, Masahiko Sawada <sawada.mshk@gmail.com>
>>> wrote:
>>>>
>>>> On Fri, Jan 13, 2017 at 3:20 PM, Ashutosh Bapat
>>>> <ashutosh.bapat@enterprisedb.com> wrote:
>>>>>>
>>>>>> Long time passed since original patch proposed by Ashutosh, so I
>>>>>> explain again about current design and functionality of this feature.
>>>>>> If you have any question, please feel free to ask.
>>>>>
>>>>> Thanks for the summary.
>>>>>
>>>>>> Parameters
>>>>>> ==========
>>>>>
>>>>> [ snip ]
>>>>>
>>>>>> Cluster-wide atomic commit
>>>>>> =======================
>>>>>> Since the distributed transaction commit on foreign servers are
>>>>>> executed independently, the transaction that modified data on the
>>>>>> multiple foreign servers is not ensured that transaction did either
>>>>>> all of them commit or all of them rollback. The patch adds the
>>>>>> functionality that guarantees distributed transaction did either
>>>>>> commit or rollback on all foreign servers. IOW the goal of this patch
>>>>>> is achieving the cluster-wide atomic commit across foreign server that
>>>>>> is capable two phase commit protocol.
>>>>>
>>>>> In [1], I proposed that we solve the problem of supporting PREPARED
>>>>> transactions involving foreign servers and in subsequent mail Vinayak
>>>>> agreed to that. But this goal has wider scope than that proposal. I am
>>>>> fine widening the scope, but then it would again lead to the same
>>>>> discussion we had about the big picture. May be you want to share
>>>>> design (or point out the parts of this design that will help) for
>>>>> solving smaller problem and tone down the patch for the same.
>>>>>
>>>> Sorry for confuse you. I'm still focusing on solving only that
>>>> problem. What I was trying to say is that I think that supporting
>>>> PREPARED transaction involving foreign server is the means, not the
>>>> end. So once we supports PREPARED transaction involving foreign
>>>> servers we can achieve cluster-wide atomic commit in a sense.
>>>>
>>> Attached updated patches. I fixed some bugs and add 003 patch that
>>> adds TAP test for foreign transaction.
>>> 003 patch depends 000 and 001 patch.
>>>
>>> Please give me feedback.
>>
>>
>> I have tested prepared transactions with foreign servers but after preparing
>> the transaction
>> the following error occur infinitely.
>> Test:
>> =====
>> =#BEGIN;
>> =#INSERT INTO ft1_lt VALUES (10);
>> =#INSERT INTO ft2_lt VALUES (20);
>> =#PREPARE TRANSACTION 'prep_xact_with_fdw';
>>
>> 2017-01-18 15:09:48.378 JST [4312] ERROR:  function pg_fdw_resolve() does
>> not exist at character 8
>> 2017-01-18 15:09:48.378 JST [4312] HINT:  No function matches the given name
>> and argument types. You might need to add explicit type casts.
>> 2017-01-18 15:09:48.378 JST [4312] QUERY:  SELECT pg_fdw_resolve()
>> 2017-01-18 15:09:48.378 JST [29224] LOG:  worker process: foreign
>> transaction resolver (dbid 13119) (PID 4312) exited with exit code 1
>> .....
>>
>> If we check the status on another session then it showing the status as
>> prepared.
>> =# select * from pg_fdw_xacts;
>>  dbid  | transaction | serverid | userid |  status  | identifier
>> -------+-------------+----------+--------+----------+------------------------
>>  13119 |        1688 |    16388 |     10 | prepared | px_2102366504_16388_10
>>  13119 |        1688 |    16389 |     10 | prepared | px_749056984_16389_10
>> (2 rows)
>>
>> I think this is a bug.
>>
>
> Thank you for reviewing!
>
> I think this is a bug of pg_fdw_resolver contrib module. I had
> forgotten to change the SQL executed by pg_fdw_resolver process.
> Attached latest version 002 patch.
>

As the previous versions of the patches conflict with current HEAD,
attached are updated versions. I also fixed some bugs in
pg_fdw_xact_resolver and added some documentation.
Please review them.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

vinayak

Date:

26 January 2017, 12:04:10

Hi Sawada-san,

On 2017/01/26 16:51, Masahiko Sawada wrote:

> Thank you for reviewing!
>
> I think this is a bug of pg_fdw_resolver contrib module. I had
> forgotten to change the SQL executed by pg_fdw_resolver process.
> Attached latest version 002 patch.
>
> As previous version patch conflicts to current HEAD, attached updated
> version patches. Also I fixed some bugs in pg_fdw_xact_resolver and
> added some documentations.
> Please review it.

Thank you for updating the patches.

I applied the patches on Postgres HEAD.
I created the postgres_fdw extension in PostgreSQL and then got a segmentation fault.
Details:
=# 2017-01-26 17:52:56.156 JST [3411] LOG: worker process: foreign transaction resolver launcher (PID 3418) was terminated by signal 11: Segmentation fault
2017-01-26 17:52:56.156 JST [3411] LOG: terminating any other active server processes
2017-01-26 17:52:56.156 JST [3425] WARNING: terminating connection because of crash of another server process
2017-01-26 17:52:56.156 JST [3425] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-01-26 17:52:56.156 JST [3425] HINT: In a moment you should be able to reconnect to the database and repeat your command.

Is this a bug?

Regards,
Vinayak Pokale
NTT Open Source Software Center

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

26 January 2017, 12:49:51

On Thu, Jan 26, 2017 at 6:04 PM, vinayak
<Pokale_Vinayak_q3@lab.ntt.co.jp> wrote:
> Hi Sawada-san,
>
> On 2017/01/26 16:51, Masahiko Sawada wrote:
>
> Thank you for reviewing!
>
> I think this is a bug of pg_fdw_resolver contrib module. I had
> forgotten to change the SQL executed by pg_fdw_resolver process.
> Attached latest version 002 patch.
>
> As previous version patch conflicts to current HEAD, attached updated
> version patches. Also I fixed some bugs in pg_fdw_xact_resolver and
> added some documentations.
> Please review it.
>
> Thank you updating the patches.
>
> I have applied patches on Postgres HEAD.
> I have created the postgres=fdw extension in PostgreSQL and then I got
> segmentation fault.
> Details:
> =# 2017-01-26 17:52:56.156 JST [3411] LOG:  worker process: foreign
> transaction resolver launcher (PID 3418) was terminated by signal 11:
> Segmentation fault
> 2017-01-26 17:52:56.156 JST [3411] LOG:  terminating any other active server
> processes
> 2017-01-26 17:52:56.156 JST [3425] WARNING:  terminating connection because
> of crash of another server process
> 2017-01-26 17:52:56.156 JST [3425] DETAIL:  The postmaster has commanded
> this server process to roll back the current transaction and exit, because
> another server process exited abnormally and possibly corrupted shared
> memory.
> 2017-01-26 17:52:56.156 JST [3425] HINT:  In a moment you should be able to
> reconnect to the database and repeat your command.
>
> Is this a bug?
>

Thank you for testing!

Sorry, I attached the wrong version of the pg_fdw_xact_resolver patch.
Please use the attached patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

002_pg_fdw_resolver_contrib_v7.patch

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Peter Eisentraut

Date:

28 January 2017, 18:11:32

On 1/26/17 4:49 AM, Masahiko Sawada wrote:
> Sorry, I attached wrong version patch of pg_fdw_xact_resovler. Please
> use attached patch.

So in some other thread we are talking about renaming "xlog", because
nobody knows what the "x" means.  In the spirit of that, let's find
better names for new functions as well.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

vinayak

Date:

30 January 2017, 04:20:27

On 2017/01/29 0:11, Peter Eisentraut wrote:
> On 1/26/17 4:49 AM, Masahiko Sawada wrote:
>> Sorry, I attached wrong version patch of pg_fdw_xact_resovler. Please
>> use attached patch.
> So in some other thread we are talking about renaming "xlog", because
> nobody knows what the "x" means.  In the spirit of that, let's find
> better names for new functions as well.
+1

Regards,
Vinayak Pokale
NTT Open Source Software Center

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

30 January 2017, 06:50:41

On Sat, Jan 28, 2017 at 8:41 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 1/26/17 4:49 AM, Masahiko Sawada wrote:
>> Sorry, I attached wrong version patch of pg_fdw_xact_resovler. Please
>> use attached patch.
>
> So in some other thread we are talking about renaming "xlog", because
> nobody knows what the "x" means.  In the spirit of that, let's find
> better names for new functions as well.

It's common in English (not just in database jargon) to abbreviate
"trans" as "x" [1]. xlog went a bit far by abbreviating the whole word
"transaction" as "x". But here "xact" means "transact", which is fine.
Maybe we should use 'X' instead of 'x', I don't know. That said, I am
fine with any other name which conveys what the function does.

[1] https://en.wikipedia.org/wiki/X

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

30 January 2017, 10:30:20

On Mon, Jan 30, 2017 at 12:50 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Sat, Jan 28, 2017 at 8:41 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 1/26/17 4:49 AM, Masahiko Sawada wrote:
>>> Sorry, I attached wrong version patch of pg_fdw_xact_resovler. Please
>>> use attached patch.
>>
>> So in some other thread we are talking about renaming "xlog", because
>> nobody knows what the "x" means.  In the spirit of that, let's find
>> better names for new functions as well.
>
> It's common in English (not just the database jargon) to abbreviate
> "trans" by "x" [1]. xlog went a bit far by abbreviating whole
> "transaction" by "x". But here "xact" means "transact", which is fine.
> May be we should use 'X' instead of 'x', I don't know. Said that, I am
> fine with any other name which conveys what the function does.
>
> [1] https://en.wikipedia.org/wiki/X
>

"txn" can be used for abbreviation of "Transaction", so for example
pg_fdw_txn_resolver?
I'm also fine with changing the module and function names.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Michael Paquier

Date:

01 February 2017, 07:06:43

On Thu, Jan 26, 2017 at 6:49 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Sorry, I attached wrong version patch of pg_fdw_xact_resovler. Please
> use attached patch.

This patch has been moved to CF 2017-03.
-- 
Michael

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

01 February 2017, 22:25:40

On Mon, Jan 30, 2017 at 2:30 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> "txn" can be used for abbreviation of "Transaction", so for example
> pg_fdw_txn_resolver?
> I'm also fine to change the module and function name.

If we're judging the relative clarity of various ways of abbreviating
the word "transaction", "txn" surely beats "x".

To repeat my usual refrain, is there any merit to abbreviating at all?
Could we call it, say, "fdw_transaction_resolver" or
"fdw_transaction_manager"?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

06 February 2017, 16:48:56

On Wed, Feb 1, 2017 at 8:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 30, 2017 at 2:30 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> "txn" can be used for abbreviation of "Transaction", so for example
>> pg_fdw_txn_resolver?
>> I'm also fine to change the module and function name.
>
> If we're judging the relative clarity of various ways of abbreviating
> the word "transaction", "txn" surely beats "x".
>
> To repeat my usual refrain, is there any merit to abbreviating at all?
>  Could we call it, say, "fdw_transaction_resolver" or
> "fdw_transaction_manager"?
>

Almost all modules in contrib are named with a "pg_" prefix, but I prefer
"fdw_transaction_resolver" if we don't need the "pg_" prefix.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

15 February 2017, 09:11:02

On Mon, Feb 6, 2017 at 10:48 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Wed, Feb 1, 2017 at 8:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Jan 30, 2017 at 2:30 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> "txn" can be used for abbreviation of "Transaction", so for example
>>> pg_fdw_txn_resolver?
>>> I'm also fine to change the module and function name.
>>
>> If we're judging the relative clarity of various ways of abbreviating
>> the word "transaction", "txn" surely beats "x".
>>
>> To repeat my usual refrain, is there any merit to abbreviating at all?
>>  Could we call it, say, "fdw_transaction_resolver" or
>> "fdw_transaction_manager"?
>>
>
> Almost modules in contrib are name with "pg_" prefix but I prefer
> "fdw_transcation_resolver" if we don't need  "pg_" prefix.
>

Since the previous patches conflict with current HEAD, attached are the
latest versions of the patches.
Please review them.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

28 February 2017, 10:54:07

On Wed, Feb 15, 2017 at 3:11 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, Feb 6, 2017 at 10:48 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Wed, Feb 1, 2017 at 8:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Mon, Jan 30, 2017 at 2:30 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> "txn" can be used for abbreviation of "Transaction", so for example
>>>> pg_fdw_txn_resolver?
>>>> I'm also fine to change the module and function name.
>>>
>>> If we're judging the relative clarity of various ways of abbreviating
>>> the word "transaction", "txn" surely beats "x".
>>>
>>> To repeat my usual refrain, is there any merit to abbreviating at all?
>>>  Could we call it, say, "fdw_transaction_resolver" or
>>> "fdw_transaction_manager"?
>>>
>>
>> Almost modules in contrib are name with "pg_" prefix but I prefer
>> "fdw_transcation_resolver" if we don't need  "pg_" prefix.
>>
>
> Since previous patches conflict to current HEAD, attached latest
> version patches.
> Please review them.
>

I've created a wiki page [1] describing the design and functionality
of this feature. It also has some example use cases, so the page
should be helpful for testing as well. Please refer to it if
you're interested in testing this feature.

[1] 2PC on FDW
<https://wiki.postgresql.org/wiki/2PC_on_FDW>

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

vinayak

Date:

02 March 2017, 05:56:23

On 2017/02/28 16:54, Masahiko Sawada wrote:

> I've created a wiki page[1] describing about the design and
> functionality of this feature. Also it has some examples of use case,
> so this page would be helpful for even testing. Please refer it if
> you're interested in testing this feature.
>
> [1] 2PC on FDW
> <https://wiki.postgresql.org/wiki/2PC_on_FDW>

Thank you for creating the wiki page.

In the "src/test/regress/pg_regress.c" file
-                * xacts. (Note: to reduce the probability of unexpected shmmax
-                * failures, don't set max_prepared_transactions any higher than
-                * actually needed by the prepared_xacts regression test.)
+                * xacts. We also set max_fdw_transctions to enable testing of atomic
+                * foreign transactions. (Note: to reduce the probability of unexpected
+                * shmmax failures, don't set max_prepared_transactions or
+                * max_prepared_foreign_transactions any higher than actually needed by the
+                * corresponding regression tests.).

I think we are not setting the "max_fdw_transctions" anywhere.
Is this correct?

In the "src/bin/pg_waldump/rmgrdesc.c" file, the following header file is included twice:
+ #include "access/fdw_xact.h"
I think we need to remove one of the two lines.
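
A duplicated #include like this one is easy to spot mechanically. The
small Python helper below is only an illustration and not part of the
patch set; the function name duplicate_includes is made up. It works
purely on text, so it ignores conditional compilation and would also
report a header that is deliberately included in two different #ifdef
branches.

```python
import re
from collections import Counter

# Matches lines such as:  #include "access/fdw_xact.h"  or  # include <stdio.h>
_INCLUDE = re.compile(r'^\s*#\s*include\s*([<"][^>"]+[>"])')


def duplicate_includes(source: str) -> list:
    """Return the headers that are #included more than once, in first-seen order."""
    counts = Counter()
    for line in source.splitlines():
        match = _INCLUDE.match(line)
        if match:
            counts[match.group(1)] += 1
    return [header for header, count in counts.items() if count > 1]
```

Passing the contents of a C file as a string returns a list such as
['"access/fdw_xact.h"'] when that header appears twice.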

Regards,
Vinayak Pokale

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

02 March 2017, 19:24:10

On Thu, Mar 2, 2017 at 11:56 AM, vinayak
<Pokale_Vinayak_q3@lab.ntt.co.jp> wrote:
>
> On 2017/02/28 16:54, Masahiko Sawada wrote:
>
> I've created a wiki page[1] describing about the design and
> functionality of this feature. Also it has some examples of use case,
> so this page would be helpful for even testing. Please refer it if
> you're interested in testing this feature.
>
> [1] 2PC on FDW
> <https://wiki.postgresql.org/wiki/2PC_on_FDW>
>
> Thank you for creating the wiki page.

Thank you for looking at this patch.

> In the "src/test/regress/pg_regress.c" file
> -                * xacts.  (Note: to reduce the probability of unexpected
> shmmax
> -                * failures, don't set max_prepared_transactions any higher
> than
> -                * actually needed by the prepared_xacts regression test.)
> +                * xacts. We also set max_fdw_transctions to enable testing
> of atomic
> +                * foreign transactions. (Note: to reduce the probability of
> unexpected
> +                * shmmax failures, don't set max_prepared_transactions or
> +                * max_prepared_foreign_transactions any higher than
> actually needed by the
> +                * corresponding regression tests.).
>
> I think we are not setting the "max_fdw_transctions" anywhere.
> Is this correct?

This comment is out of date. Will fix.

>
> In the "src/bin/pg_waldump/rmgrdesc.c" file following header file used two
> times.
> + #include "access/fdw_xact.h"
> I think we need to remove one line.
>

Not necessary. Will get rid of it.

Since these are not functional bugs, I will incorporate the fixes when
I post the updated patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

07 March 2017, 11:04:57

On Fri, Mar 3, 2017 at 1:24 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Mar 2, 2017 at 11:56 AM, vinayak
> <Pokale_Vinayak_q3@lab.ntt.co.jp> wrote:
>>
>> On 2017/02/28 16:54, Masahiko Sawada wrote:
>>
>> I've created a wiki page[1] describing about the design and
>> functionality of this feature. Also it has some examples of use case,
>> so this page would be helpful for even testing. Please refer it if
>> you're interested in testing this feature.
>>
>> [1] 2PC on FDW
>> <https://wiki.postgresql.org/wiki/2PC_on_FDW>
>>
>> Thank you for creating the wiki page.
>
> Thank you for looking at this patch.
>
>> In the "src/test/regress/pg_regress.c" file
>> -                * xacts.  (Note: to reduce the probability of unexpected
>> shmmax
>> -                * failures, don't set max_prepared_transactions any higher
>> than
>> -                * actually needed by the prepared_xacts regression test.)
>> +                * xacts. We also set max_fdw_transctions to enable testing
>> of atomic
>> +                * foreign transactions. (Note: to reduce the probability of
>> unexpected
>> +                * shmmax failures, don't set max_prepared_transactions or
>> +                * max_prepared_foreign_transactions any higher than
>> actually needed by the
>> +                * corresponding regression tests.).
>>
>> I think we are not setting the "max_fdw_transctions" anywhere.
>> Is this correct?
>
> This comment is out of date. Will fix.
>
>>
>> In the "src/bin/pg_waldump/rmgrdesc.c" file following header file used two
>> times.
>> + #include "access/fdw_xact.h"
>> I think we need to remove one line.
>>
>
> Not necessary. Will get rid of it.
>
> Since these are not feature bugs I will incorporate these when making
> update version patches.
>

Attached is an updated set of patches.
The differences from the previous patches are:
  * Fixed a few bugs.
  * Separated the previous 000 patch into two patches.
  * Renamed the pg_fdw_xact_resolver contrib module to
fdw_transaction_resolver.
  * Incorporated the review comments from Vinayak.

Please review these patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

13 March 2017, 03:59:21

On Tue, Mar 7, 2017 at 5:04 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Mar 3, 2017 at 1:24 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Thu, Mar 2, 2017 at 11:56 AM, vinayak
>> <Pokale_Vinayak_q3@lab.ntt.co.jp> wrote:
>>>
>>> On 2017/02/28 16:54, Masahiko Sawada wrote:
>>>
>>> I've created a wiki page[1] describing about the design and
>>> functionality of this feature. Also it has some examples of use case,
>>> so this page would be helpful for even testing. Please refer it if
>>> you're interested in testing this feature.
>>>
>>> [1] 2PC on FDW
>>> <https://wiki.postgresql.org/wiki/2PC_on_FDW>
>>>
>>> Thank you for creating the wiki page.
>>
>> Thank you for looking at this patch.
>>
>>> In the "src/test/regress/pg_regress.c" file
>>> -                * xacts.  (Note: to reduce the probability of unexpected
>>> shmmax
>>> -                * failures, don't set max_prepared_transactions any higher
>>> than
>>> -                * actually needed by the prepared_xacts regression test.)
>>> +                * xacts. We also set max_fdw_transctions to enable testing
>>> of atomic
>>> +                * foreign transactions. (Note: to reduce the probability of
>>> unexpected
>>> +                * shmmax failures, don't set max_prepared_transactions or
>>> +                * max_prepared_foreign_transactions any higher than
>>> actually needed by the
>>> +                * corresponding regression tests.).
>>>
>>> I think we are not setting the "max_fdw_transctions" anywhere.
>>> Is this correct?
>>
>> This comment is out of date. Will fix.
>>
>>>
>>> In the "src/bin/pg_waldump/rmgrdesc.c" file following header file used two
>>> times.
>>> + #include "access/fdw_xact.h"
>>> I think we need to remove one line.
>>>
>>
>> Not necessary. Will get rid of it.
>>
>> Since these are not feature bugs I will incorporate these when making
>> update version patches.
>>
>
> Attached updated set of patches.
> The differences from previous patch are,
>   * Fixed a few bugs.
>   * Separated previous 000 patch into two patches.
>   * Changed name pg_fdw_xact_resovler contrib module to
> fdw_transaction_resolver.
>   * Incorporated review comments got from Vinayak
>
> Please review these patches.
>

Since the previous v9 patches conflict with current HEAD, I've attached the latest patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Attachment

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Vinayak Pokale

Date:

16 March 2017, 08:37:01

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

I have tested the latest patch and it looks good to me,
so I marked it "Ready for committer".
Anyway, it would be great if anyone could also have a look at the patches and send comments.

The new status of this patch is: Ready for Committer

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

21 March 2017, 20:49:55

On Thu, Mar 16, 2017 at 2:37 PM, Vinayak Pokale
<pokale_vinayak_q3@lab.ntt.co.jp> wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       tested, passed
> Spec compliant:           tested, passed
> Documentation:            tested, passed
>
> I have tested the latest patch and it looks good to me,
> so I marked it "Ready for committer".
> Anyway, it would be great if anyone could also have a look at the patches and send comments.
>
> The new status of this patch is: Ready for Committer
>

Thank you for updating the status, but I found a bug in the 001 patch. Attached are the latest patches.
The differences are:
  * Fixed a bug.
  * Ran pgindent.
  * Separated the patch supporting GetPrepareID API.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Attachment

Re: Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

29 March 2017, 17:14:53

On Wed, Mar 22, 2017 at 2:49 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Mar 16, 2017 at 2:37 PM, Vinayak Pokale
> <pokale_vinayak_q3@lab.ntt.co.jp> wrote:
>> The following review has been posted through the commitfest application:
>> make installcheck-world:  tested, passed
>> Implements feature:       tested, passed
>> Spec compliant:           tested, passed
>> Documentation:            tested, passed
>>
>> I have tested the latest patch and it looks good to me,
>> so I marked it "Ready for committer".
>> Anyway, it would be great if anyone could also have a look at the patches and send comments.
>>
>> The new status of this patch is: Ready for Committer
>>
>
> Thank you for updating but I found a bug in 001 patch. Attached latest patches.
> The differences are
>   * Fixed a bug.
>   * Ran pgindent.
>   * Separated the patch supporting GetPrepareID API.
>

Since previous patches conflict with current HEAD, I attached latest
set of patches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

> On 31 Jul 2017, at 20:03, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Regardless of whether we share XIDs or DXIDs, we need a more complex
> concept of transaction state than we have now.

It seems the discussion has shifted from 2PC itself to the general issues with distributed
transactions, so it is probably appropriate to share here a summary of what we have
done in the area of distributed visibility. Over the last two years we tried three quite different
approaches and finally settled on Clock-SI.

At first, to test different approaches, we wrote a small patch that wraps calls to visibility-related
functions (SetTransactionStatus, GetSnapshot, etc.; described in detail on the wiki[1]) so that
they can be overridden from an extension. Such an approach allows implementing almost anything
related to distributed visibility, since you have full control over how local visibility is done.
That API isn't a hard prerequisite, and if one wants to create a concrete implementation
it can be done in place. However, I think it is good to have such an API in some form.

So, the three approaches that we tried:

1) Postgres-XL-like:

That is the most straightforward way. Basically we need a separate network service (GTM/DTM) that is
responsible for xid generation and for managing the running list of transactions. So acquiring
an xid and a snapshot is done by network calls. Because of the shared xid space it is possible
to compare xids in the ordinary way and get the right order. The gap between non-simultaneous
commits by 2PC is covered by the fact that we get our snapshots from the GTM, and
it removes an xid from the running list only when the transaction has committed on both nodes.

Such an approach is okay for OLAP-style transactions where tps isn't high. But for OLTP with
a high transaction rate the GTM will immediately become a bottleneck, since even write transactions
need to get a snapshot from the GTM, even if they access only one node.
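
To make the round-trip cost concrete, here is a minimal single-process sketch of the kind
of service described above. It is not Postgres-XL code: the class ToyGTM, its methods and
the visible() helper are hypothetical names, aborts are not modelled, and every method call
stands in for one network call to the GTM.

```python
class ToyGTM:
    """Hypothetical stand-in for a GTM: hands out xids and snapshots."""

    def __init__(self):
        self._last_xid = 0
        self._running = set()   # xids not yet committed on every node
        self.round_trips = 0    # each method call models one network call

    def begin(self):
        self.round_trips += 1
        self._last_xid += 1
        self._running.add(self._last_xid)
        return self._last_xid

    def snapshot(self):
        self.round_trips += 1
        # Snapshot = xids still in progress, plus the first unassigned xid.
        return frozenset(self._running), self._last_xid + 1

    def finish(self, xid):
        # Called only after xid has committed on all participating nodes.
        self.round_trips += 1
        self._running.discard(xid)


def visible(xid, snapshot):
    """A committed xid is visible if it is older than xmax and not running."""
    running, xmax = snapshot
    return xid < xmax and xid not in running


gtm = ToyGTM()
t1 = gtm.begin()
snap_during = gtm.snapshot()   # t1 committed on one node but not yet on the other
gtm.finish(t1)                 # now committed everywhere
snap_after = gtm.snapshot()
```

In this toy run t1 is hidden from snap_during and visible to snap_after, and four
round trips were needed for a single transaction and two snapshots.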

2) Incremental SI [2]

An approach with a central coordinator that allows local reads without network
communication by slightly altering the visibility rules.

Apart from the fact that it is kind of patented, we also failed to achieve proper visibility
by implementing the algorithms from that paper. It always showed some inconsistencies,
maybe because of bugs in our implementation, maybe because of some
typos/mistakes in the algorithm description itself. The reasoning in the paper wasn't very
clear to us, nor were the patent issues, so we just dropped it.

3) Clock-SI [3]

It is an MS Research paper that describes an algorithm similar to the ones used in Spanner and
CockroachDB, without a central GTM and with reads that do not require a network roundtrip.

There are two ideas behind it:

* Assuming snapshot isolation, and that visibility on a node is based on CSN, use local time as the CSN.
Then, when you are doing 2PC, collect the prepare time from all participating nodes and
commit the transaction everywhere with the maximum of those times. If a node, during a read, faces tuples
committed by a tx with a CSN greater than its snapshot CSN (which can happen due to
time desynchronisation between nodes), it just waits until that time comes. So time desynchronisation
can affect performance, but can't affect correctness.

* During a distributed commit the transaction is neither running (if it commits, its tuples
should already be visible) nor committed/aborted (it can still be aborted, so it is illegal to read them).
So here an IN-DOUBT transaction state appears, in which readers should wait for writers.
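
The two ideas can be sketched in a few lines. This is only an illustration under simplifying
assumptions, not our extension's code: TxState, choose_commit_csn and check_tuple are
hypothetical names, CSNs are plain integers, and a reader is told to wait on any in-doubt
writer regardless of its prepare time.

```python
from enum import Enum


class TxState(Enum):
    RUNNING = "running"
    IN_DOUBT = "in-doubt"     # prepared; final commit CSN not yet known
    COMMITTED = "committed"
    ABORTED = "aborted"


def choose_commit_csn(prepare_times):
    """Commit everywhere with the maximum of the participants' prepare times."""
    return max(prepare_times)


def check_tuple(writer_state, writer_csn, snapshot_csn):
    """Return 'visible', 'invisible' or 'wait' for a tuple written by one tx."""
    if writer_state is TxState.IN_DOUBT:
        return "wait"         # the reader must wait for the writer's outcome
    if writer_state is TxState.COMMITTED and writer_csn <= snapshot_csn:
        return "visible"
    return "invisible"        # running, aborted, or committed after the snapshot


commit_csn = choose_commit_csn([105, 98, 110])   # prepare times from three nodes
```

Here commit_csn is 110, so a reader whose snapshot CSN is 100 does not see the transaction
once it is committed, while a reader with snapshot CSN 120 does.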

We managed to implement that using the XTM API mentioned above. The XID<->CSN mapping is
maintained by the extension itself. Speed and scalability are also good.

I want to resubmit an implementation of that algorithm for FDW later in August, along with some
isolation tests based on the set of queries in [4].

[1] https://wiki.postgresql.org/wiki/DTM#eXtensible_Transaction_Manager_API
[2] http://pi3.informatik.uni-mannheim.de/~norman/dsi_jour_2014.pdf
[3] https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/samehe-clocksi.srds2013.pdf
[4] https://github.com/ept/hermitage

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Michael Paquier

Date:

03 August 2017, 13:08:21

On Mon, Jul 31, 2017 at 7:27 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>
>> An alternative approach is to have some kind of other identifier,
>> let's call it a distributed transaction ID (DXID) which is mapped by
>> each node onto a local XID.
>
> Postgres-XL seems to manage this problem by using a transaction manager
> node, which is in charge of assigning snapshots.  I don't know how that
> works, but perhaps adding that concept here could be useful too.  One
> critical point to that design is that the app connects not directly to
> the underlying Postgres server but instead to some other node which is
> or connects to the node that manages the snapshots.
>
> Maybe Michael can explain in better detail how it works, and/or how (and
> if) it could be applied here.

XL (and XC) use a transaction ID that plugs in directly with the
internal XID assigned by Postgres, actually bypassing what Postgres
assigns to each backend if a transaction needs one. So if transactions
are not heavily shared among multiple nodes, performance gets
impacted. Now when we worked on this project we noticed that we gained
in performance by reducing the number of requests and grouping them
together, so a proxy layer has been added between the global
transaction manager and Postgres to group those requests. This does
not change the fact that read-committed transactions still need
a snapshot for each query, which is costly. So this approach hurts
less with analytic queries, and more with OLTP.

2PC transaction status was tracked in the GTM as well. This allows
fancy things like being able to prepare a transaction on node 1 and
commit it on node 2, for example. I am honestly not sure that you need
to add anything at the clog level, for example, but I think that having
the metadata of a transaction stored at the FDW level is a rather
correct approach on the matter. That's what Greenplum actually does if
I recall correctly (Heikki save me!): it has one coordinator with such
metadata handling, and a bunch of underlying nodes that store the data.
Citus also does that if I recall correctly. So instead of
decentralizing this information, this gets stored in a Postgres
coordinator instance.
-- 
Michael

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

26 September 2017, 12:06:51

On Tue, Aug 1, 2017 at 1:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jul 27, 2017 at 8:25 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> The remote transaction can be committed/aborted only after the fate of
>> the local transaction is decided. If we commit remote transaction and
>> abort local transaction, that's not good. AtEOXact* functions are
>> called immediately after that decision in post-commit/abort phase. So,
>> if we want to commit/abort the remote transaction immediately it has
>> to be done in post-commit/abort processing. Instead if we delegate
>> that to the remote transaction resolved backend (introduced by the
>> patches) the delay between local commit and remote commits depends
>> upon when the resolve gets a chance to run and process those
>> transactions. One could argue that that delay would anyway exist when
>> post-commit/abort processing fails to resolve remote transaction. But
>> given the real high availability these days, in most of the cases
>> remote transaction will be resolved in the post-commit/abort phase. I
>> think we should optimize for most common case. Your concern is still
>> valid, that we shouldn't raise an error or do anything critical in
>> post-commit/abort phase. So we should devise a way to send
>> COMMIT/ABORT prepared messages to the remote server in asynchronous
>> fashion carefully avoiding errors. Recent changes to 2PC have improved
>> performance in that area to a great extent. Relying on resolver
>> backend to resolve remote transactions would erode that performance
>> gain.
>
> I think there are two separate but interconnected issues here.  One is
> that if we give the user a new command prompt without resolving the
> remote transaction, then they might run a new query that sees their
> own work as committed, which would be bad.  Or, they might commit,
> wait for the acknowledgement, and then tell some other session to go
> look at the data, and find it not there.  That would also be bad.  I
> think the solution is likely to do something like what we did for
> synchronous replication in commit
> 9a56dc3389b9470031e9ef8e45c95a680982e01a -- wait for the remote
> transaction to be resolved (by the background process) but allow an
> interrupt to escape the wait-loop.
>
> The second issue is that having the resolver resolve transactions
> might be slower than doing it in the foreground.  I don't necessarily
> see a reason why that should be a big problem.  I mean, the resolver
> might need to establish a separate connection, but if it keeps that
> connection open for a while (say, 5 minutes) in case further
> transactions arrive then it won't be an issue except on really
> low-volume system which isn't really a case I think we need to worry
> about very much.  Also, the hand-off to the resolver might take some
> time, but that's equally true for sync rep and we're living with it
> there.  Anything else is presumably just the resolver itself being
> inefficient which seems like something that can simply be fixed.
>
> FWIW, I don't think the present resolver implementation is likely to
> be what we want.  IIRC, it's just calling an SQL function which
> doesn't seem like a good approach.  Ideally we should stick an entry
> into a shared memory queue and then ping the resolver via SetLatch,
> and it can directly invoke an FDW method on the data from the shared
> memory queue.  It should be possible to set things up so that a user
> who wishes to do so can run multiple copies of the resolver thread at
> the same time, which would be a good way to keep latency down if the
> system is very busy with distributed transactions.
>

Based on the review comments from Robert, I'm planning to make a big
change to the architecture of this patch so that a backend process
works together with a dedicated background worker that is responsible
for resolving the foreign transactions. The usage of this feature
will be almost the same as in the current patch, except
for a new GUC parameter that controls the number of resolver
processes to launch. That is, we can have multiple resolver processes
to keep latency down.

From a technical point of view, the processing of a transaction involving
multiple foreign servers will change as follows.

* Backend processes
1. In PreCommit phase, prepare the transaction on foreign servers and
save fdw_xact entries into the array on shmem. Also create a
fdw_xact_state entry on shmem hash that has the index of each fdw_xact
entry.
2. Local commit/abort.
3. Change its process state to FDWXACT_WAITING and enqueue the MyProc
to the shmem queue.
4. Ping the resolver process via SetLatch.
5. Wait to be woken up.

* Resolver processes
1. Fetch PGPROC entry from the shmem queue and get its XID (say, XID-a).
2. Get the fdw_xact_state entry from shmem hash by XID-a.
3. Iterate fdw_xact entries using the index, and resolve the foreign
transactions.
3-a. If even one foreign transaction failed to resolve, raise an error.
4. Change the waiting backend state to FDWXACT_COMPLETED and release it.
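
The hand-off between the two kinds of process can be sketched as below. This is only
a rough model of the steps above, not the patch: Python threads stand in for processes, a
queue.Queue for the shmem queue, a threading.Event for the latch, and a plain dict for
the shmem hash. Backend, resolver_main and commit_and_wait are hypothetical names, and
step 3-a (error handling) is left out.

```python
import queue
import threading

FDWXACT_WAITING, FDWXACT_COMPLETED = "waiting", "completed"


class Backend:
    """Hypothetical stand-in for a PGPROC entry waiting on the resolver."""

    def __init__(self, xid):
        self.xid = xid
        self.state = None
        self.wakeup = threading.Event()      # plays the role of the latch


def resolver_main(waiters, fdw_xacts, resolve_one):
    """Resolver loop: pop a waiting backend, resolve its entries, release it."""
    while True:
        backend = waiters.get()              # 1. fetch an entry from the queue
        if backend is None:                  # sentinel to stop (sketch only)
            return
        for entry in fdw_xacts.pop(backend.xid, []):   # 2-3. look up by XID
            resolve_one(entry)
        backend.state = FDWXACT_COMPLETED    # 4. mark done and wake the backend
        backend.wakeup.set()


def commit_and_wait(backend, prepared_servers, waiters, fdw_xacts, timeout=5.0):
    """Backend side, after PREPARE on every server and the local commit."""
    fdw_xacts[backend.xid] = list(prepared_servers)
    backend.state = FDWXACT_WAITING
    waiters.put(backend)                     # enqueue; put() also wakes the resolver
    return backend.wakeup.wait(timeout)      # sleep until released (or timeout)


resolved = []
waiters, fdw_xacts = queue.Queue(), {}
worker = threading.Thread(target=resolver_main,
                          args=(waiters, fdw_xacts, resolved.append))
worker.start()
b = Backend(xid=1234)
released = commit_and_wait(b, ["server_a", "server_b"], waiters, fdw_xacts)
waiters.put(None)
worker.join()
```

Launching several resolver threads on the same queue would model the multiple-resolver
case; each waiting backend is still handled by exactly one of them.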

Also, the resolver process periodically scans the array of fdw_xact
entries and tries to resolve in-doubt transactions.

This patch still has design concerns, and I'm planning to
update it for the next commit fest. So I'll mark this as
"Waiting on Author".

Feedback and suggestions are very welcome.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

26 September 2017, 15:50:29

On Tue, Sep 26, 2017 at 5:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Based on the review comment from Robert, I'm planning to do the big
> change to the architecture of this patch so that a backend process
> work together with a dedicated background worker that is responsible
> for resolving the foreign transactions. For the usage of this feature,
> it will be almost the same as what this patch has been doing except
> for adding a new GUC paramter that controls the number of resovler
> process launch. That is, we can have multiple resolver process to keep
> latency down.

Having multiple resolver processes is useful but gets a bit complicated.  For
example, if process 1 has a connection open to foreign server A and
process 2 does not, and a request arrives that needs to be handled on
foreign server A, what happens?  If process 1 is already busy doing
something else, probably we want process 2 to try to open a new
connection to foreign server A and handle the request.  But if process
1 and 2 are both idle, ideally we'd like 1 to get that request rather
than 2.  That seems a bit difficult to get working though.  Maybe we
should just ignore such considerations in the first version.

> * Resovler processes
> 1. Fetch PGPROC entry from the shmem queue and get its XID (say, XID-a).
> 2. Get the fdw_xact_state entry from shmem hash by XID-a.
> 3. Iterate fdw_xact entries using the index, and resolve the foreign
> transactions.
> 3-a. If even one foreign transaction failed to resolve, raise an error.
> 4. Change the waiting backend state to FDWXACT_COMPLETED and release it.

Comments:

- Note that any error we raise here won't reach the user; this is a
background process.  We don't want to get into a loop where we just
error out repeatedly forever -- at least not if there's any other
reasonable choice.

- I suggest that we ought to track the status for each XID separately
on each server rather than just track the XID status overall.  That
way, if transaction resolution fails on one server, we don't keep
trying to reconnect to the others.

- If we go to resolve a remote transaction and find that no such
remote transaction exists, what should we do?  I'm inclined to think
that we should regard that as if we had succeeded in resolving the
transaction.  Certainly, if we've retried the server repeatedly, it
might be that we previously succeeded in resolving the transaction but
then the network connection was broken before we got the success
message back from the remote server.  But even if that's not the
scenario, I think we should assume that the DBA or some other system
resolved it and therefore we don't need to do anything further.  If we
assume anything else, then we just go into an infinite error loop,
which isn't useful behavior.  We could log a message, though (for
example, LOG: unable to resolve foreign transaction ... because it
does not exist).
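
A rough sketch of that last rule, in Python rather than C for brevity. All names here
(NoSuchPreparedXact, resolve_foreign_xact, the commit_prepared callback) are hypothetical
and not part of the patch or of any FDW API; the point is only that "does not exist" is
treated as success, while an unreachable server leaves the entry for a later retry.

```python
import logging

log = logging.getLogger("fdw_resolver_sketch")


class NoSuchPreparedXact(Exception):
    """Hypothetical error: the remote server has no such prepared transaction."""


def resolve_foreign_xact(commit_prepared, server, gid):
    """Return True when the entry can be forgotten, False to retry later."""
    try:
        commit_prepared(server, gid)
    except NoSuchPreparedXact:
        # Assume an earlier attempt, the DBA, or some other system resolved it.
        log.info("unable to resolve foreign transaction %s on %s "
                 "because it does not exist", gid, server)
        return True
    except ConnectionError:
        # Server unreachable: keep the entry and try again later.
        return False
    return True
```

With this shape the resolver never raises for a missing transaction, so it cannot get
stuck retrying an entry that somebody else has already cleaned up.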

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

27 September 2017, 09:41:25

On Tue, Sep 26, 2017 at 9:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 26, 2017 at 5:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Based on the review comment from Robert, I'm planning to do the big
>> change to the architecture of this patch so that a backend process
>> work together with a dedicated background worker that is responsible
>> for resolving the foreign transactions. For the usage of this feature,
>> it will be almost the same as what this patch has been doing except
>> for adding a new GUC paramter that controls the number of resovler
>> process launch. That is, we can have multiple resolver process to keep
>> latency down.
>
> Multiple resolver processes is useful but gets a bit complicated.  For
> example, if process 1 has a connection open to foreign server A and
> process 2 does not, and a request arrives that needs to be handled on
> foreign server A, what happens?  If process 1 is already busy doing
> something else, probably we want process 2 to try to open a new
> connection to foreign server A and handle the request.  But if process
> 1 and 2 are both idle, ideally we'd like 1 to get that request rather
> than 2.  That seems a bit difficult to get working though.  Maybe we
> should just ignore such considerations in the first version.

Understood. I'll keep it simple in the first version.

>> * Resovler processes
>> 1. Fetch PGPROC entry from the shmem queue and get its XID (say, XID-a).
>> 2. Get the fdw_xact_state entry from shmem hash by XID-a.
>> 3. Iterate fdw_xact entries using the index, and resolve the foreign
>> transactions.
>> 3-a. If even one foreign transaction failed to resolve, raise an error.
>> 4. Change the waiting backend state to FDWXACT_COMPLETED and release it.
>
> Comments:
>
> - Note that any error we raise here won't reach the user; this is a
> background process.  We don't want to get into a loop where we just
> error out repeatedly forever -- at least not if there's any other
> reasonable choice.

Thank you for the comments.

Agreed.

> - I suggest that we ought to track the status for each XID separately
> on each server rather than just track the XID status overall.  That
> way, if transaction resolution fails on one server, we don't keep
> trying to reconnect to the others.

Agreed. In the current patch we manage fdw_xact entries that track the
status for each XID separately on each server. I'm going to use the
same mechanism. The resolver process gets a target XID from the shmem
queue and gets all the fdw_xact entries associated with that XID from
the fdw_xact array in shmem. But since scanning the whole fdw_xact
array could be slow, because the number of entries could be large
(e.g., max_connections * # of foreign servers), I'm considering having
a linked list of all the fdw_xact entries associated with the same XID,
and a shmem hash pointing to the first fdw_xact entry of the linked
list for each XID. That way, we can find the target fdw_xact entries
in the array in O(1).
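
As a rough illustration of the layout I have in mind (a Python model only; the real
structure would live in shared memory, and FdwXactTable and its methods are hypothetical
names):

```python
class FdwXactTable:
    """Fixed-size entry array, a per-XID chain, and a hash to each chain head."""

    def __init__(self, capacity):
        self.entries = [None] * capacity      # slot -> (xid, server) or None
        self.next_slot = [-1] * capacity      # slot -> next slot with the same XID
        self.free = list(range(capacity - 1, -1, -1))
        self.head = {}                        # xid -> first slot of its chain

    def insert(self, xid, server):
        slot = self.free.pop()                # raises IndexError when the array is full
        self.entries[slot] = (xid, server)
        self.next_slot[slot] = self.head.get(xid, -1)
        self.head[xid] = slot                 # new entry becomes the chain head
        return slot

    def entries_for(self, xid):
        """Walk only this XID's chain instead of scanning the whole array."""
        slot = self.head.get(xid, -1)
        while slot != -1:
            yield self.entries[slot]
            slot = self.next_slot[slot]

    def remove_all(self, xid):
        """Release every slot of this XID once all its entries are resolved."""
        slot = self.head.pop(xid, -1)
        while slot != -1:
            following = self.next_slot[slot]
            self.entries[slot] = None
            self.next_slot[slot] = -1
            self.free.append(slot)
            slot = following
```

Finding the head of the chain is a single hash lookup; walking the chain then costs
one step per foreign server involved in that transaction rather than one per array slot.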

> - If we go to resolve a remote transaction and find that no such
> remote transaction exists, what should we do?  I'm inclined to think
> that we should regard that as if we had succeeded in resolving the
> transaction.  Certainly, if we've retried the server repeatedly, it
> might be that we previously succeeded in resolving the transaction but
> then the network connection was broken before we got the success
> message back from the remote server.  But even if that's not the
> scenario, I think we should assume that the DBA or some other system
> resolved it and therefore we don't need to do anything further.  If we
> assume anything else, then we just go into an infinite error loop,
> which isn't useful behavior.  We could log a message, though (for
> example, LOG: unable to resolve foreign transaction ... because it
> does not exist).

Agreed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

27 September 2017, 10:05:39

On Wed, Sep 27, 2017 at 12:11 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Sep 26, 2017 at 9:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Sep 26, 2017 at 5:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> Based on the review comment from Robert, I'm planning to do the big
>>> change to the architecture of this patch so that a backend process
>>> work together with a dedicated background worker that is responsible
>>> for resolving the foreign transactions. For the usage of this feature,
>>> it will be almost the same as what this patch has been doing except
>>> for adding a new GUC paramter that controls the number of resovler
>>> process launch. That is, we can have multiple resolver process to keep
>>> latency down.
>>
>> Multiple resolver processes is useful but gets a bit complicated.  For
>> example, if process 1 has a connection open to foreign server A and
>> process 2 does not, and a request arrives that needs to be handled on
>> foreign server A, what happens?  If process 1 is already busy doing
>> something else, probably we want process 2 to try to open a new
>> connection to foreign server A and handle the request.  But if process
>> 1 and 2 are both idle, ideally we'd like 1 to get that request rather
>> than 2.  That seems a bit difficult to get working though.  Maybe we
>> should just ignore such considerations in the first version.
>
> I understood. I keep it simple in the first version.

While a resolver process is useful for resolving transactions later, it
seems more efficient to try to resolve the prepared foreign
transactions in the post-commit phase, in the same backend that prepared
them, for two reasons: 1. the backend already has a connection to that
foreign server, and 2. it has just run some commands to completion on that
foreign server, so it's highly likely that a COMMIT PREPARED would
succeed too. If we let a resolver process do that, we will spend time
1. signalling the resolver process and 2. setting up a connection to the
foreign server, and 3. by the time the resolver process tries to resolve
the prepared transaction the foreign server may have become unavailable,
thus delaying the resolution.

That said, I agree that the post-commit phase doesn't have a transaction
of its own, and thus catalog lookups and error reporting are not
possible. We will need a different approach here, which may not be
straightforward. So we may need to defer this optimization to v2. I
think we have discussed this before, but I can't find the mail offhand.

>
>>> * Resolver processes
>>> 1. Fetch PGPROC entry from the shmem queue and get its XID (say, XID-a).
>>> 2. Get the fdw_xact_state entry from shmem hash by XID-a.
>>> 3. Iterate fdw_xact entries using the index, and resolve the foreign
>>> transactions.
>>> 3-a. If even one foreign transaction failed to resolve, raise an error.
>>> 4. Change the waiting backend state to FDWXACT_COMPLETED and release it.
>>
>> Comments:
>>
>> - Note that any error we raise here won't reach the user; this is a
>> background process.  We don't want to get into a loop where we just
>> error out repeatedly forever -- at least not if there's any other
>> reasonable choice.
>
> Thank you for the comments.
>
> Agreed.

We should probably log an error message in the server log, so that
DBAs are aware of such a failure. Is that something you are
planning to do?

>
>> - I suggest that we ought to track the status for each XID separately
>> on each server rather than just track the XID status overall.  That
>> way, if transaction resolution fails on one server, we don't keep
>> trying to reconnect to the others.
>
> Agreed. In the current patch we manage fdw_xact entries that track the
> status for each XID separately on each server. I'm going to use the
> same mechanism. The resolver process gets a target XID from the shmem
> queue and gets all the fdw_xact entries associated with that XID from
> the fdw_xact array in shmem. But scanning the whole fdw_xact array
> could be slow, because the number of entries could be large
> (e.g., max_connections * # of foreign servers), so
> I'm considering keeping a linked list of all the fdw_xact entries
> associated with the same XID, and a shmem hash pointing to the
> first fdw_xact entry of the linked list for each XID. That way, we
> can find the target fdw_xact entries in the array in O(1).
>

If we want to do something like this, would it be useful to use a data
structure similar to what is used for maintaining subtransactions? Just
a thought.

>> - If we go to resolve a remote transaction and find that no such
>> remote transaction exists, what should we do?  I'm inclined to think
>> that we should regard that as if we had succeeded in resolving the
>> transaction.  Certainly, if we've retried the server repeatedly, it
>> might be that we previously succeeded in resolving the transaction but
>> then the network connection was broken before we got the success
>> message back from the remote server.  But even if that's not the
>> scenario, I think we should assume that the DBA or some other system
>> resolved it and therefore we don't need to do anything further.  If we
>> assume anything else, then we just go into an infinite error loop,
>> which isn't useful behavior.  We could log a message, though (for
>> example, LOG: unable to resolve foreign transaction ... because it
>> does not exist).
>
> Agreed.
>

Yes. I think the current patch takes care of this, except probably the
error message.
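
A minimal sketch of the rule Robert describes above, not the patch's actual
code. It assumes the remote server reports a missing prepared transaction
with SQLSTATE 42704 (undefined_object); the type and function names are
hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical outcome of one COMMIT PREPARED / ROLLBACK PREPARED attempt. */
typedef enum
{
    RESOLVE_DONE,               /* the fdw_xact entry can be removed */
    RESOLVE_RETRY               /* keep the entry and try again later */
} ResolveOutcome;

/* SQLSTATE 42704 is undefined_object. */
#define SQLSTATE_UNDEFINED_OBJECT "42704"

/*
 * Decide what to do after trying to resolve the prepared transaction "gid".
 * "ok" is true if the remote command succeeded; otherwise "sqlstate" holds
 * the remote error code, or NULL if none was received (e.g. connection loss).
 */
static ResolveOutcome
classify_resolve_result(bool ok, const char *sqlstate, const char *gid)
{
    if (ok)
        return RESOLVE_DONE;

    if (sqlstate != NULL && strcmp(sqlstate, SQLSTATE_UNDEFINED_OBJECT) == 0)
    {
        /* Someone else (or an earlier attempt) already resolved it. */
        fprintf(stderr,
                "LOG: unable to resolve foreign transaction %s because it does not exist\n",
                gid);
        return RESOLVE_DONE;
    }

    /* Any other failure: leave the entry in place for a later retry. */
    return RESOLVE_RETRY;
}
```

Treating the missing transaction as resolved is what keeps the resolver out
of the infinite error loop mentioned above.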

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Stas Kelvich

Date:

27 September 2017, 13:12:43

> On 26 Sep 2017, at 12:06, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
> Based on the review comment from Robert, I'm planning to do the big
> change to the architecture of this patch so that a backend process
> work together with a dedicated background worker that is responsible
> for resolving the foreign transactions.

For what it's worth, I rebased the latest patch onto current master.

As far as I understand, it is planned to change the resolver architecture,
so is it okay to review the code that is intended for the non-faulty
work scenarios?




Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Attachment

fdw2pc_v13.diff

Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

28 September 2017, 06:15:20

On Wed, Sep 27, 2017 at 4:05 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Sep 27, 2017 at 12:11 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Tue, Sep 26, 2017 at 9:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Tue, Sep 26, 2017 at 5:06 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> Based on the review comment from Robert, I'm planning to do the big
>>>> change to the architecture of this patch so that a backend process
>>>> work together with a dedicated background worker that is responsible
>>>> for resolving the foreign transactions. For the usage of this feature,
>>>> it will be almost the same as what this patch has been doing except
>>>> for adding a new GUC parameter that controls the number of resolver
>>>> processes to launch. That is, we can have multiple resolver processes to keep
>>>> latency down.
>>>
>>> Multiple resolver processes is useful but gets a bit complicated.  For
>>> example, if process 1 has a connection open to foreign server A and
>>> process 2 does not, and a request arrives that needs to be handled on
>>> foreign server A, what happens?  If process 1 is already busy doing
>>> something else, probably we want process 2 to try to open a new
>>> connection to foreign server A and handle the request.  But if process
>>> 1 and 2 are both idle, ideally we'd like 1 to get that request rather
>>> than 2.  That seems a bit difficult to get working though.  Maybe we
>>> should just ignore such considerations in the first version.
>>
>> I understood. I keep it simple in the first version.
>
> While a resolver process is useful for resolving transaction later, it
> seems performance effective to try to resolve the prepared foreign
> transaction, in post-commit phase, in the same backend which prepared
> those for two reasons 1. the backend already has a connection to that
> foreign server 2. it has just run some commands to completion on that
> foreign server, so it's highly likely that a COMMIT PREPARED would
> succeed too. If we let a resolver process do that, we will spend time
> in 1. signalling resolver process 2. setting up a connection to the
> foreign server and 3. by the time resolver process tries to resolve
> the prepared transaction the foreign server may become unavailable,
> thus delaying the resolution.

I think that having a resolver process keep a connection cache to each
foreign server for a while can reduce the overhead of connecting to
foreign servers. These connections will be invalidated by DDL. Also,
most of the time we spend committing a distributed transaction goes to the
interaction between the coordinator and the foreign servers in the
two-phase commit protocol. So I guess the time spent signalling a
resolver process would not be a big overhead.

> Said that, I agree that post-commit phase doesn't have a transaction
> of itself, and thus any catalog lookup, error reporting is not
> possible. We will need some different approach here, which may not be
> straight forward. So, we may need to delay this optimization for v2. I
> think we have discussed this before, but I don't find a mail off-hand.
>
>>
>>>> * Resolver processes
>>>> 1. Fetch PGPROC entry from the shmem queue and get its XID (say, XID-a).
>>>> 2. Get the fdw_xact_state entry from shmem hash by XID-a.
>>>> 3. Iterate fdw_xact entries using the index, and resolve the foreign
>>>> transactions.
>>>> 3-a. If even one foreign transaction failed to resolve, raise an error.
>>>> 4. Change the waiting backend state to FDWXACT_COMPLETED and release it.
>>>
>>> Comments:
>>>
>>> - Note that any error we raise here won't reach the user; this is a
>>> background process.  We don't want to get into a loop where we just
>>> error out repeatedly forever -- at least not if there's any other
>>> reasonable choice.
>>
>> Thank you for the comments.
>>
>> Agreed.
>
> We should probably log an error message in the server log, so that
> DBAs are aware of such a failure. Is that something you are
> considering to do?

Yes, a resolver process logs an error message in that case.

>
>>
>>> - I suggest that we ought to track the status for each XID separately
>>> on each server rather than just track the XID status overall.  That
>>> way, if transaction resolution fails on one server, we don't keep
>>> trying to reconnect to the others.
>>
>> Agreed. In the current patch we manage fdw_xact entries that track the
>> status for each XID separately on each server. I'm going to use the
>> same mechanism. The resolver process gets a target XID from the shmem
>> queue and gets all the fdw_xact entries associated with that XID from
>> the fdw_xact array in shmem. But scanning the whole fdw_xact array
>> could be slow, because the number of entries could be large
>> (e.g., max_connections * # of foreign servers), so
>> I'm considering keeping a linked list of all the fdw_xact entries
>> associated with the same XID, and a shmem hash pointing to the
>> first fdw_xact entry of the linked list for each XID. That way, we
>> can find the target fdw_xact entries in the array in O(1).
>>
>
> If we want to do something like this, would it be useful to use a data
> structure similar to what is used for maintaining subtransactions? Just
> a thought.

Thank you for the advice, I'll consider that. But what I want to do is
just group the fdw_xact entries by XID and fetch that group in O(1),
so we might not need a stack-like structure such as the one used for
maintaining subtransactions.
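
A rough sketch of that grouping, under stated assumptions: a plain
in-process array stands in for the fdw_xact array in shmem, a small
open-addressing table stands in for the shmem hash, and all names and sizes
are hypothetical. Removal of entries and locking are left out.

```c
#include <stdint.h>
#include <string.h>

#define MAX_FDW_XACTS 64        /* hypothetical size of the fdw_xact array */
#define HASH_SLOTS    128       /* larger than MAX_FDW_XACTS, so probing ends */
#define NO_ENTRY      (-1)

typedef struct FdwXactEntry
{
    uint32_t    xid;            /* local transaction id */
    uint32_t    serverid;       /* foreign server the entry belongs to */
    int         next;           /* next entry with the same xid, or NO_ENTRY */
} FdwXactEntry;

typedef struct XidSlot
{
    int         used;
    uint32_t    xid;
    int         head;           /* first entry of the per-XID linked list */
} XidSlot;

typedef struct FdwXactTable
{
    FdwXactEntry entries[MAX_FDW_XACTS];
    int          nentries;
    XidSlot      slots[HASH_SLOTS];
} FdwXactTable;

static void
fdwxact_table_init(FdwXactTable *t)
{
    memset(t, 0, sizeof(*t));
}

/* Linear probing: return the slot holding xid, or the free slot for it. */
static XidSlot *
find_slot(FdwXactTable *t, uint32_t xid)
{
    uint32_t    i = (xid * 2654435761u) % HASH_SLOTS;

    while (t->slots[i].used && t->slots[i].xid != xid)
        i = (i + 1) % HASH_SLOTS;
    return &t->slots[i];
}

/* Add an entry and push it onto the front of its XID's list. */
static int
fdwxact_add(FdwXactTable *t, uint32_t xid, uint32_t serverid)
{
    XidSlot    *slot;
    int         idx;

    if (t->nentries >= MAX_FDW_XACTS)
        return NO_ENTRY;

    idx = t->nentries++;
    slot = find_slot(t, xid);
    t->entries[idx].xid = xid;
    t->entries[idx].serverid = serverid;
    t->entries[idx].next = slot->used ? slot->head : NO_ENTRY;
    slot->used = 1;
    slot->xid = xid;
    slot->head = idx;
    return idx;
}

/* Expected O(1): index of the first entry for xid, or NO_ENTRY. */
static int
fdwxact_first(FdwXactTable *t, uint32_t xid)
{
    XidSlot    *slot = find_slot(t, xid);

    return slot->used ? slot->head : NO_ENTRY;
}
```

A resolver would then walk `entries[i].next` starting from
`fdwxact_first(&table, xid)` instead of scanning the whole array.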

>
>>> - If we go to resolve a remote transaction and find that no such
>>> remote transaction exists, what should we do?  I'm inclined to think
>>> that we should regard that as if we had succeeded in resolving the
>>> transaction.  Certainly, if we've retried the server repeatedly, it
>>> might be that we previously succeeded in resolving the transaction but
>>> then the network connection was broken before we got the success
>>> message back from the remote server.  But even if that's not the
>>> scenario, I think we should assume that the DBA or some other system
>>> resolved it and therefore we don't need to do anything further.  If we
>>> assume anything else, then we just go into an infinite error loop,
>>> which isn't useful behavior.  We could log a message, though (for
>>> example, LOG: unable to resolve foreign transaction ... because it
>>> does not exist).
>>
>> Agreed.
>>
>
> Yes. I think the current patch takes care of this, except probably the
> error message.
>

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Robert Haas

Date:

29 September 2017, 18:42:11

On Wed, Sep 27, 2017 at 11:15 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I think that making a resolver process have connection caches to each
> foreign server for a while can reduce the overhead of connection to
> foreign servers. These connections will be invalidated by DDLs. Also,
> most of the time we spend to commit a distributed transaction is the
> interaction between the coordinator and foreign servers using
> two-phase commit protocol. So I guess the time in signalling to a
> resolver process would not be a big overhead.

I agree.  Also, in the future, we might try to allow connections to be
shared across backends.  I did some research on this a number of years
ago and found that every operating system I investigated had some way
of passing a file descriptor from one process to another -- so a
shared connection cache might be possible.

Also, we might port the whole backend to use threads, and then this
problem goes away.  But I don't have time to write that patch this
week.  :-)

It's possible that we might find that neither of the above approaches
is practical and that the performance benefits of resolving the
transaction from the original connection are large enough that we want
to try to make it work anyhow.  However, I think we can postpone that
work to a future time.  Any general solution to this problem at least
needs to be ABLE to resolve transactions at a later time from a
different session, so let's get that working first, and then see what
else we want to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

02 October 2017, 09:31:34

On Sat, Sep 30, 2017 at 12:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Sep 27, 2017 at 11:15 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I think that making a resolver process have connection caches to each
>> foreign server for a while can reduce the overhead of connection to
>> foreign servers. These connections will be invalidated by DDLs. Also,
>> most of the time we spend to commit a distributed transaction is the
>> interaction between the coordinator and foreign servers using
>> two-phase commit protocol. So I guess the time in signalling to a
>> resolver process would not be a big overhead.
>
> I agree.  Also, in the future, we might try to allow connections to be
> shared across backends.  I did some research on this a number of years
> ago and found that every operating system I investigated had some way
> of passing a file descriptor from one process to another -- so a
> shared connection cache might be possible.

That sounds like a good idea.

> Also, we might port the whole backend to use threads, and then this
> problem goes away.  But I don't have time to write that patch this
> week.  :-)
>
> It's possible that we might find that neither of the above approaches
> are practical and that the performance benefits of resolving the
> transaction from the original connection are large enough that we want
> to try to make it work anyhow.  However, I think we can postpone that
> work to a future time.  Any general solution to this problem at least
> needs to be ABLE to resolve transactions at a later time from a
> different session, so let's get that working first, and then see what
> else we want to do.
>

Understood and agreed. I'll post the first version of the patch with the
new design to the next CF.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Daniel Gustafsson

Date:

02 October 2017, 13:13:18

> On 02 Oct 2017, at 08:31, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Sep 30, 2017 at 12:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Sep 27, 2017 at 11:15 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> I think that making a resolver process have connection caches to each
>>> foreign server for a while can reduce the overhead of connection to
>>> foreign servers. These connections will be invalidated by DDLs. Also,
>>> most of the time we spend to commit a distributed transaction is the
>>> interaction between the coordinator and foreign servers using
>>> two-phase commit protocol. So I guess the time in signalling to a
>>> resolver process would not be a big overhead.
>>
>> I agree.  Also, in the future, we might try to allow connections to be
>> shared across backends.  I did some research on this a number of years
>> ago and found that every operating system I investigated had some way
>> of passing a file descriptor from one process to another -- so a
>> shared connection cache might be possible.
>
> It sounds good idea.
>
>> Also, we might port the whole backend to use threads, and then this
>> problem goes away.  But I don't have time to write that patch this
>> week.  :-)
>>
>> It's possible that we might find that neither of the above approaches
>> are practical and that the performance benefits of resolving the
>> transaction from the original connection are large enough that we want
>> to try to make it work anyhow.  However, I think we can postpone that
>> work to a future time.  Any general solution to this problem at least
>> needs to be ABLE to resolve transactions at a later time from a
>> different session, so let's get that working first, and then see what
>> else we want to do.
>
> I understood and agreed. I'll post the first version patch of new
> design to next CF.

Closing this patch as Returned with Feedback in this commitfest, looking
forward to a new version in an upcoming commitfest.

cheers ./daniel


Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Ashutosh Bapat

Date:

03 October 2017, 08:09:56

On Fri, Sep 29, 2017 at 9:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> It's possible that we might find that neither of the above approaches
> are practical and that the performance benefits of resolving the
> transaction from the original connection are large enough that we want
> to try to make it work anyhow.  However, I think we can postpone that
> work to a future time.  Any general solution to this problem at least
> needs to be ABLE to resolve transactions at a later time from a
> different session, so let's get that working first, and then see what
> else we want to do.
>
+1.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Transactions involving multiple postgres foreign servers

From

Masahiko Sawada

Date:

25 October 2017, 00:45:46

On Mon, Oct 2, 2017 at 3:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Sat, Sep 30, 2017 at 12:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Sep 27, 2017 at 11:15 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> I think that making a resolver process have connection caches to each
>>> foreign server for a while can reduce the overhead of connection to
>>> foreign servers. These connections will be invalidated by DDLs. Also,
>>> most of the time we spend to commit a distributed transaction is the
>>> interaction between the coordinator and foreign servers using
>>> two-phase commit protocol. So I guess the time in signalling to a
>>> resolver process would not be a big overhead.
>>
>> I agree.  Also, in the future, we might try to allow connections to be
>> shared across backends.  I did some research on this a number of years
>> ago and found that every operating system I investigated had some way
>> of passing a file descriptor from one process to another -- so a
>> shared connection cache might be possible.
>
> It sounds good idea.
>
>> Also, we might port the whole backend to use threads, and then this
>> problem goes away.  But I don't have time to write that patch this
>> week.  :-)
>>
>> It's possible that we might find that neither of the above approaches
>> are practical and that the performance benefits of resolving the
>> transaction from the original connection are large enough that we want
>> to try to make it work anyhow.  However, I think we can postpone that
>> work to a future time.  Any general solution to this problem at least
>> needs to be ABLE to resolve transactions at a later time from a
>> different session, so let's get that working first, and then see what
>> else we want to do.
>>
>
> I understood and agreed. I'll post the first version patch of new
> design to next CF.
>

Attached is the latest version of the patch. I've changed it heavily since
the previous one. The parts I modified most are the resolution of foreign
transactions and the handling of dangling transactions. The management of
fdwxact entries is almost the same as in the previous patch.

Foreign Transaction Resolver
============================
I introduced a new background worker called the "foreign transaction
resolver", which is responsible for resolving the transactions prepared
on foreign servers. The foreign transaction resolver process is
launched by a backend process when it commits or rolls back a
transaction, and it periodically resolves the queued transactions on a
database as long as the queue is not empty. If the queue has been empty
for the time specified by the foreign_transaction_resolver_time GUC
parameter, it exits. A backend doesn't launch a new resolver process
if a resolver process is already running; in that case, the backend
just adds its entry to the queue in shared memory and wakes the resolver
up. The maximum number of resolver processes we can launch is controlled
by max_foreign_transaction_resolvers. So we recommend setting
max_foreign_transaction_resolvers to a value larger than the number of
databases. In each cycle the resolver process also tries to resolve
dangling transactions.
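
The launch and exit rules above can be sketched as two small decision
functions. This is only an illustration: the function names, the
millisecond unit, and the "no free slot" result are assumptions, since the
description does not say what a backend does when
max_foreign_transaction_resolvers is already reached.

```c
#include <stdbool.h>

/* What a committing backend does about the resolver (hypothetical names). */
typedef enum
{
    RESOLVER_WAKE_EXISTING,     /* a resolver already serves this database */
    RESOLVER_LAUNCH_NEW,        /* none running yet and a slot is free */
    RESOLVER_NO_FREE_SLOT       /* max_foreign_transaction_resolvers reached */
} ResolverAction;

static ResolverAction
choose_resolver_action(bool running_for_db, int nrunning, int max_resolvers)
{
    if (running_for_db)
        return RESOLVER_WAKE_EXISTING;
    if (nrunning < max_resolvers)
        return RESOLVER_LAUNCH_NEW;
    return RESOLVER_NO_FREE_SLOT;
}

/*
 * A resolver exits once its queue has stayed empty for the configured time
 * (foreign_transaction_resolver_time; milliseconds are assumed here).
 */
static bool
resolver_should_exit(bool queue_empty, long idle_ms, long timeout_ms)
{
    return queue_empty && idle_ms >= timeout_ms;
}
```

With one resolver per database, a limit smaller than the number of databases
would make the third case reachable, which is presumably why a larger value
is recommended.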

Processing Sequence
===================
I've changed the processing sequence for resolving foreign transactions
so that the second phase of the two-phase commit protocol (COMMIT/ROLLBACK
PREPARED) is executed by a resolver process, not by the backend process.
The basic processing sequence is as follows:

* Backend process
1. In pre-commit phase, the backend process saves fdwxact entries, and
then prepares transaction on all foreign servers that can execute
two-phase commit protocol.
2. Local commit.
3. Enqueue itself to the shmem queue and change its status to WAITING.
4. Launch or wake up a resolver process and wait.

    * Resolver process
    1. Dequeue the waiting process from the shmem queue.
    2. Collect the fdwxact entries that are associated with the waiting process.
    3. Resolve the foreign transactions.
    4. Release the waiting process.

5. Wake up and restart
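
The sequence above can be sketched as a single-process simulation. This is
not the patch's code: real backends and resolvers are separate processes
communicating through shared memory and latches, failures are not modelled,
and every name except the WAITING and FDWXACT_COMPLETED states is
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARTICIPANTS 8
#define QUEUE_SIZE       16

typedef enum
{
    FDWXACT_NOT_WAITING,
    FDWXACT_WAITING,
    FDWXACT_COMPLETED
} WaitState;

typedef struct Backend
{
    WaitState   state;
    int         nparticipants;              /* foreign servers written to */
    bool        prepared[MAX_PARTICIPANTS]; /* PREPAREd but not yet resolved */
} Backend;

typedef struct ResolverQueue
{
    Backend    *items[QUEUE_SIZE];
    int         head;
    int         count;
} ResolverQueue;

/* Backend steps 1-3: prepare everywhere, commit locally, enqueue, wait. */
static bool
backend_commit(Backend *b, ResolverQueue *q)
{
    int         i;

    if (q->count == QUEUE_SIZE)
        return false;

    for (i = 0; i < b->nparticipants; i++)
        b->prepared[i] = true;      /* step 1: PREPARE TRANSACTION */

    /* step 2: the local commit would happen here */

    b->state = FDWXACT_WAITING;     /* step 3 */
    q->items[(q->head + q->count) % QUEUE_SIZE] = b;
    q->count++;
    return true;                    /* step 4: caller wakes the resolver */
}

/* Resolver steps 1-4 for one queued backend; false if the queue is empty. */
static bool
resolver_run_once(ResolverQueue *q)
{
    Backend    *b;
    int         i;

    if (q->count == 0)
        return false;

    b = q->items[q->head];          /* 1. dequeue the waiting process */
    q->head = (q->head + 1) % QUEUE_SIZE;
    q->count--;

    for (i = 0; i < b->nparticipants; i++)  /* 2. collect its entries */
        b->prepared[i] = false;     /* 3. COMMIT PREPARED on each server */

    b->state = FDWXACT_COMPLETED;   /* 4. release the waiting process */
    return true;
}
```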

This is still in the design phase, and I'm sure there is room for
improvement and more subtle behaviour to consider, but I'd like to
share the current status of the patch. The patch includes regression
tests but not full documentation.

Feedback and comments are very welcome.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Wed, Dec 13, 2017 at 10:47 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Wed, Dec 13, 2017 at 12:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Dec 11, 2017 at 5:20 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> The question I have is how would we deal with a foreign server that is
>>>> not available for longer duration due to crash, longer network outage
>>>> etc. Example is the foreign server crashed/got disconnected after
>>>> PREPARE but before COMMIT/ROLLBACK was issued. The backend will remain
>>>> blocked for much longer duration without user having an idea of what's
>>>> going on. Maybe we should add some timeout.
>>>
>>> After more thought, I agree with adding some timeout. I can imagine
>>> there are users who want the timeout, for example, who cannot accept
>>> even a few seconds of latency. If the timeout occurs, the backend unlocks
>>> the foreign transactions and breaks the loop. The resolver process will
>>> continue to resolve foreign transactions at a certain interval.
>>
>> I don't think a timeout is a very good idea.  There is no timeout for
>> synchronous replication and the issues here are similar.  I will not
>> try to block a patch adding a timeout, but I think it had better be
>> disabled by default and have very clear documentation explaining why
>> it's really dangerous.  And this is why: with no timeout, you can
>> count on being able to see the effects of your own previous
>> transactions, unless at some point you sent a query cancel or got
>> disconnected.  With a timeout, you may or may not see the effects of
>> your own previous transactions depending on whether or not you hit the
>> timeout, which you have no sure way of knowing.
>>
>>>>> transactions after the coordinator server recovered. On the other
>>>>> hand, for the reading a consistent result on such situation by
>>>>> subsequent reads, for example, we can disallow backends to inquiry SQL
>>>>> to the foreign server if a foreign transaction of the foreign server
>>>>> is remained.
>>>>
>>>> +1 for the last sentence. If we do that, we don't need the backend to
>>>> be blocked by resolver since a subsequent read accessing that foreign
>>>> server would get an error and not inconsistent data.
>>>
>>> Yeah, however the disadvantage of this is that we manage foreign
>>> transactions per foreign server. If a transaction that modified even
>>> one table remains as an in-doubt transaction, we cannot issue any
>>> SQL that touches that foreign server. Can we raise an error at
>>> ExecInitForeignScan()?
>>
>> I really feel strongly we shouldn't complicate the initial patch with
>> this kind of thing.  Let's make it enough for this patch to guarantee
>> that either all parts of the transaction commit eventually or they all
>> abort eventually.  Ensuring consistent visibility is a different and
>> hard project, and if we try to do that now, this patch is not going to
>> be done any time soon.
>>
>
> Thank you for the suggestion.
>
> I was really wondering whether we should add a timeout to this feature.
> It's a common concern that we want a timeout around a critical
> section. But currently we don't have such a timeout for either
> synchronous replication or WAL writes. I can imagine there will be
> users who want a timeout for such cases, but obviously it makes this
> feature more complex. Anyway, even if we add a timeout to this feature,
> we can make it a separate patch and feature. So I'd like to keep
> it simple as a first step. This patch guarantees that the transaction
> is either committed or rolled back on all foreign servers, unless the
> user cancels.
>
> Regards,
>

I've updated the documentation in the patches and fixed some bugs. I did
some failure tests of this feature using a fault simulation tool[1] for
PostgreSQL that I created.

The 0001 patch adds a mechanism to keep track of writes on the local
server. This is required to determine whether we should use 2PC at commit.
The 0002 patch is the main part. It adds a distributed transaction manager
(currently only for atomic commit), APIs for 2PC, and the foreign
transaction resolver process. The 0003 patch makes postgres_fdw
support atomic commit using 2PC.

Please review the patches.

[1] https://github.com/MasahikoSawada/pg_simula

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

On Thu, Feb 8, 2018 at 3:11 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jan 9, 2018 at 9:49 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> If I understand correctly, XactLastRecEnd can be set by, for example,
>>> a HOT cleanup record, so that doesn't seem like a good thing to use.
>>
>> Yes, that's right.
>>
>>> Whether we need to use 2PC across remote nodes seems like it shouldn't
>>> depend on whether a local SELECT statement happened to do a HOT
>>> cleanup or not.
>>
>> So I think we need to check if the top transaction is invalid or not as well.
>
> Even if you check both, it doesn't sound like it really does what you
> want.  Won't you still end up partially dependent on whether a HOT
> cleanup happened, if not in quite the same way as before?  How about
> defining a new bit in MyXactFlags for XACT_FLAGS_WROTENONTEMPREL?
> Just have heap_insert, heap_update, and heap_delete do something like:
>
> if (RelationNeedsWAL(relation))
>     MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;

Agreed.
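
A self-contained sketch of how such a flag could feed the commit-time
decision. The flag name and the RelationNeedsWAL test come from the
suggestion above; the bit value, the stand-in Relation type, and the rule
"use 2PC when at least two participants wrote, counting the local server"
are assumptions made only for illustration.

```c
#include <stdbool.h>

/* Bit value is hypothetical; only the flag name comes from the thread. */
#define XACT_FLAGS_WROTENONTEMPREL (1U << 2)

/* Stand-in for the backend-global transaction flags. */
static unsigned int MyXactFlags = 0;

/* Stand-in for a relation; needs_wal mimics RelationNeedsWAL(relation). */
typedef struct Relation
{
    bool        needs_wal;      /* false for temporary relations */
} Relation;

/* What heap_insert, heap_update and heap_delete would each do. */
static void
note_heap_write(const Relation *relation)
{
    if (relation->needs_wal)
        MyXactFlags |= XACT_FLAGS_WROTENONTEMPREL;
}

/*
 * Assumed rule: 2PC is needed when two or more participants made changes,
 * where the local server counts as one if it wrote a non-temp relation.
 * A HOT cleanup during a SELECT never sets the flag, so it cannot change
 * the answer.
 */
static bool
need_two_phase_commit(int n_foreign_servers_written)
{
    int         participants = n_foreign_servers_written;

    if (MyXactFlags & XACT_FLAGS_WROTENONTEMPREL)
        participants++;
    return participants >= 2;
}
```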

>
> Overall, what's the status of this patch?  Are we hung up on this
> issue only, or are there other things?

AFAIK there is no other technical issue in this patch so far apart from
this one. The patch has tests and docs, and includes everything needed to
support atomic commit for distributed transactions: it introduces the
atomic commit capability for distributed transactions along with the
corresponding FDW APIs, and makes postgres_fdw support 2PC. I think
this patch needs to be reviewed, especially the foreign transaction
resolution functionality, which has been redesigned.

The previous patches don't apply cleanly to current HEAD, and I've
fixed some issues. Attached is the latest patch set.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

On Sat, May 26, 2018 at 12:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, May 18, 2018 at 11:21 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Regarding the API design, should we use 2PC for a distributed
>> transaction if both two or more 2PC-capable foreign servers and
>> a 2PC-non-capable foreign server are involved in it?  Or should we end
>> up with an error? The 2PC-non-capable server might be either one that has
>> 2PC functionality but has it disabled or one that doesn't have it.
>
> It seems to me that this is functionality that many people will not
> want to use.  First, doing a PREPARE and then a COMMIT for each FDW
> write transaction is bound to be more expensive than just doing a
> COMMIT.  Second, because the default value of
> max_prepared_transactions is 0, this can only work at all if special
> configuration has been done on the remote side.  Because of the second
> point in particular, it seems to me that the default for this new
> feature must be "off".  It would make no sense to ship a default configuration
> of PostgreSQL that doesn't work with the default configuration of
> postgres_fdw, and I do not think we want to change the default value
> of max_prepared_transactions.  It was changed from 5 to 0 a number of
> years back for good reason.

I'm not sure that many people will not want to use this feature,
because it seems to me that there are many people who don't want to
use a database that lacks transaction atomicity. But I agree
that this feature should not be enabled by default, just as 2PC is
disabled by default.

>
> So, I think the question could be broadened a bit: how you enable this
> feature if you want it, and what happens if you want it but it's not
> available for your choice of FDW?  One possible enabling method is a
> GUC (e.g. foreign_twophase_commit).  It could be true/false, with true
> meaning use PREPARE for all FDW writes and fail if that's not
> supported, or it could be three-valued, like require/prefer/disable,
> with require throwing an error if PREPARE support is not available and
> prefer using PREPARE where available but without failing when it isn't
> available.  Another possibility could be to make it an FDW option,
> possibly capable of being set at multiple levels (e.g. server or
> foreign table).  If any FDW involved in the transaction demands
> distributed 2PC semantics then the whole transaction must have those
> semantics or it fails.  I was previously leaning toward the latter
> approach, but I guess now the former approach is sounding better.  I'm
> not totally certain I know what's best here.
>

I agree that the former is better. That way, we can also control the
parameter at the transaction level. If we allowed the 'prefer' behavior,
we would need to manage not only 2PC-capable foreign servers but also
2PC-non-capable foreign servers, which would require all FDWs to call
the registration function. So I think a two-valued parameter would be
better.
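
To make the difference between the two GUC shapes concrete, here is a
minimal sketch in C. The enum names and choose_commit_action() are
hypothetical and only illustrate the semantics discussed above; they are
not taken from the patch. With a two-valued parameter only the
FTC_DISABLE and FTC_REQUIRE rows would exist.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values for a foreign_twophase_commit setting. */
typedef enum
{
    FTC_DISABLE,                /* never use PREPARE */
    FTC_PREFER,                 /* use PREPARE where available */
    FTC_REQUIRE                 /* use PREPARE or fail */
} ForeignTwophaseCommit;

/* How to end the transaction on one written foreign server. */
typedef enum
{
    ACT_COMMIT_ONLY,
    ACT_PREPARE_THEN_COMMIT,
    ACT_ERROR
} CommitAction;

/*
 * Decide what to do for one foreign server that the transaction wrote to.
 * can_prepare says whether that server's FDW supports PREPARE.
 */
CommitAction
choose_commit_action(ForeignTwophaseCommit mode, bool can_prepare)
{
    assert(mode == FTC_DISABLE || mode == FTC_PREFER || mode == FTC_REQUIRE);

    switch (mode)
    {
        case FTC_DISABLE:
            return ACT_COMMIT_ONLY;
        case FTC_PREFER:
            /* Fall back to a plain COMMIT instead of failing. */
            return can_prepare ? ACT_PREPARE_THEN_COMMIT : ACT_COMMIT_ONLY;
        case FTC_REQUIRE:
            /* No PREPARE support means the transaction must fail. */
            return can_prepare ? ACT_PREPARE_THEN_COMMIT : ACT_ERROR;
    }
    return ACT_ERROR;           /* not reached */
}
```

The 'prefer' row is the only one that has to track 2PC-non-capable
servers without raising an error, which is the extra bookkeeping I'd
like to avoid.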

BTW, sorry for the delay in submitting the updated patch. I'll post it
this week, but I'd like to share the design of the new APIs
beforehand.

The APIs I'd like to add are 4 functions and 1 registration function:
PrepareForeignTransaction, CommitForeignTransaction,
RollbackForeignTransaction, IsTwoPhaseCommitEnabled and
FdwXactRegisterForeignServer. Every FDW that wants to support atomic
commit has to provide all of these APIs and call the registration
function when it opens a foreign transaction.
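
As a rough illustration of the shape of these APIs, here is a sketch in
C. Only the four function names and the 'prepared' argument come from
this mail; the FdwXactInfo struct, the exact signatures, the
fdw_supports_atomic_commit() helper and the demo_* callbacks are
assumptions made for the example, and the real patch may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-transaction state handed to each callback. */
typedef struct FdwXactInfo
{
    unsigned int serverid;      /* foreign server */
    unsigned int userid;        /* local user of the connection */
    const char  *fdwxact_id;    /* identifier used for PREPARE, may be NULL */
} FdwXactInfo;

/* The four callbacks named above; the signatures are assumptions. */
typedef struct FdwXactCallbacks
{
    bool (*PrepareForeignTransaction) (const FdwXactInfo *info);
    bool (*CommitForeignTransaction) (const FdwXactInfo *info, bool prepared);
    bool (*RollbackForeignTransaction) (const FdwXactInfo *info, bool prepared);
    bool (*IsTwoPhaseCommitEnabled) (unsigned int serverid);
} FdwXactCallbacks;

/* An FDW can take part in atomic commit only if it provides every callback. */
bool
fdw_supports_atomic_commit(const FdwXactCallbacks *cb)
{
    return cb != NULL &&
        cb->PrepareForeignTransaction != NULL &&
        cb->CommitForeignTransaction != NULL &&
        cb->RollbackForeignTransaction != NULL &&
        cb->IsTwoPhaseCommitEnabled != NULL;
}

/* Toy callbacks standing in for a real FDW; they only report success. */
bool
demo_prepare(const FdwXactInfo *info)
{
    assert(info != NULL);
    return info->fdwxact_id != NULL;    /* PREPARE needs an identifier */
}

bool
demo_commit(const FdwXactInfo *info, bool prepared)
{
    assert(info != NULL);
    (void) prepared;            /* COMMIT vs. COMMIT PREPARED in a real FDW */
    return true;
}

bool
demo_rollback(const FdwXactInfo *info, bool prepared)
{
    assert(info != NULL);
    (void) prepared;            /* ROLLBACK vs. ROLLBACK PREPARED */
    return true;
}

bool
demo_twophase_enabled(unsigned int serverid)
{
    (void) serverid;
    return true;
}
```

An FDW that leaves any of the four pointers unset would simply be
treated as not supporting atomic commit in this sketch.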

The transaction processing sequence with atomic commit will be as follows.

1. FDW begins a transaction on a 2PC-capable foreign server.
2. FDW registers the foreign server, with or without a foreign
transaction identifier, by calling FdwXactRegisterForeignServer().
    * The foreign transaction identifier passed in can be NULL. If it's
NULL, the core code constructs one like 'fx_<4 random
chars>_<serverid>_<userid>'.
    * Providing the foreign transaction identifier at the beginning of
the transaction is useful because some DBMSs, such as Oracle Database or
MySQL, require a transaction identifier at the beginning of an XA
transaction.
    * Registering the foreign transaction guarantees that the
transaction is controlled by the core and that the APIs are called at
the appropriate time.
3. Perform 1 and 2 whenever the distributed transaction opens a
transaction on 2PC-capable foreign servers.
    * When the distributed transaction modifies a foreign server, we
mark it as 'modified'.
        * This mark is used at commit to check if it's necessary to use 2PC.
    * At the same time, we also check whether the foreign server enables
2PC by calling IsTwoPhaseCommitEnabled().
        * If an FDW disables 2PC or doesn't provide that function, we set
XACT_FLAGS_FDWNONPREPARE. This is necessary because we need to
remember that we wrote to a 2PC-non-capable foreign server.
    * When the distributed transaction modifies a non-temp table locally,
we set XACT_FLAGS_WROTENONTEMREL.
        * This is also used at commit to check if it's necessary to use 2PC.
4. During pre-commit, if 2PC is required, we prepare all foreign
transactions by calling PrepareForeignTransaction().
    * If we don't need to use 2PC, we commit all foreign transactions
by calling CommitForeignTransaction() with 'prepared' == false.
    * If the transaction raises an error at or before pre-commit for
whatever reason, we roll them back by calling
RollbackForeignTransaction(). In case of rollback, we may call
RollbackForeignTransaction() with 'prepared' == true even though the
corresponding foreign transaction might not exist. This is part of the
API contract.
5. Local commit
6. Launch a foreign transaction resolver process and wait for it to
resolve all foreign transactions.
    * The foreign transactions are resolved according to the status of
the local transaction by calling CommitForeignTransaction() or
RollbackForeignTransaction() with 'prepared' == true.
7. After resolving all foreign transactions, the resolver process wakes
the waiting backend process up.
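
To illustrate steps 2 to 4, here is a small self-contained sketch in C.
It shows one way the default identifier could be built and how the
'modified' marks and the two flags could feed the pre-commit decision.
Everything here is hypothetical: the names build_fdwxact_id(),
Participant and choose_end_mode() are made up for the example, and the
decision rule (use 2PC when two or more nodes were written, counting
the local node, and fail if one of the written servers cannot prepare)
is my assumption for the two-valued setting, not something the steps
above spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Build 'fx_<4 random chars>_<serverid>_<userid>' into buf.
 * Returns the length snprintf() reports for the full identifier.
 */
int
build_fdwxact_id(char *buf, size_t buflen,
                 unsigned int serverid, unsigned int userid)
{
    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz0123456789";
    char        rnd[5];

    assert(buf != NULL && buflen > 0);
    for (int i = 0; i < 4; i++)
        rnd[i] = alphabet[(size_t) rand() % (sizeof(alphabet) - 1)];
    rnd[4] = '\0';
    return snprintf(buf, buflen, "fx_%s_%u_%u", rnd, serverid, userid);
}

/* One registered foreign server (steps 1 to 3). */
typedef struct Participant
{
    unsigned int serverid;
    bool        modified;       /* written by this transaction */
    bool        twophase_ok;    /* what IsTwoPhaseCommitEnabled() reported */
} Participant;

typedef enum
{
    END_ONE_PHASE,              /* CommitForeignTransaction(prepared = false) */
    END_TWO_PHASE,              /* PrepareForeignTransaction() first */
    END_ERROR                   /* atomic commit cannot be guaranteed */
} EndMode;

/* Pre-commit decision (step 4) under the assumed rule described above. */
EndMode
choose_end_mode(const Participant *p, int n, bool wrote_local_nontemp)
{
    int         writers = wrote_local_nontemp ? 1 : 0;
    bool        wrote_nonprepare = false;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        if (!p[i].modified)
            continue;           /* read-only servers never need 2PC */
        writers++;
        if (!p[i].twophase_ok)
            wrote_nonprepare = true;
    }

    if (writers < 2)
        return END_ONE_PHASE;
    return wrote_nonprepare ? END_ERROR : END_TWO_PHASE;
}
```

For example, with two modified 2PC-capable servers choose_end_mode()
picks END_TWO_PHASE, while a single modified server with no local
write stays on the one-phase path.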

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center