Thread: BDR and TX obeyance
I have been experimenting with BDR and have a question about how BDR interacts with transactions. bdrdemo=# create table thingy (id INT, value TEXT, PRIMARY KEY(id)); CREATE TABLE bdrdemo=# create table tx_log(id INT, msg TEXT, PRIMARY KEY(id)); CREATE TABLE bdrdemo=# insert into thingy (id, value) VALUES (1, 'insert from node1'); INSERT 0 1 From node1: bdrdemo=# begin; BEGIN bdrdemo=# update thingy set value='update from node1' where id=1; UPDATE 1 bdrdemo=# insert into tx_log (id, msg) values (1, 'tx log insert from node1'); INSERT 0 1 bdrdemo=# commit; COMMIT Simultaneously from node2: bdrdemo=# begin; BEGIN bdrdemo=# update thingy set value='update from node2' where id=1; UPDATE 1 bdrdemo=# insert into tx_log (id, msg) values (2, 'tx log insert from node2'); INSERT 0 1 bdrdemo=# commit; COMMIT ... bdrdemo=# select * from tx_log ; id | msg ----+-------------------------- 1 | tx log insert from node1 2 | tx log insert from node2 (2 rows) bdrdemo=# select * from thingy ; id | value ----+------------------- 1 | update from node2 (1 row) The conflict on the "thingy" table has resulted in node2 winning based on last_update wins default resolution. However, both inserts have applied. My expectation is that the entire TX applies or does not apply. This expectation is clearly wrong. Question is: is there a way (via a custom conflict handler) to have the TX obeyed? I can't see a way to even implement a simple bank account database that changes multiple tables in a single transaction without having the data end up in an inconsistent state. Am I missing something obvious here? Thanks in advance for any help. riley
I think this is the nature of "async multi master"... IMHO, It would be necessary to be "sync multi master" (with two-phase commit?) to get the behavior you expect. Atenciosamente, Edson Carlos Ericksson Richter Em 04/01/2016 18:09, Riley Berton escreveu: > I have been experimenting with BDR and have a question about how BDR > interacts with transactions. > > bdrdemo=# create table thingy (id INT, value TEXT, PRIMARY KEY(id)); > CREATE TABLE > bdrdemo=# create table tx_log(id INT, msg TEXT, PRIMARY KEY(id)); > CREATE TABLE > bdrdemo=# insert into thingy (id, value) VALUES (1, 'insert from node1'); > INSERT 0 1 > > From node1: > > bdrdemo=# begin; > BEGIN > bdrdemo=# update thingy set value='update from node1' where id=1; > UPDATE 1 > bdrdemo=# insert into tx_log (id, msg) values (1, 'tx log insert from node1'); > INSERT 0 1 > bdrdemo=# commit; > COMMIT > > Simultaneously from node2: > > bdrdemo=# begin; > BEGIN > bdrdemo=# update thingy set value='update from node2' where id=1; > UPDATE 1 > bdrdemo=# insert into tx_log (id, msg) values (2, 'tx log insert from node2'); > INSERT 0 1 > bdrdemo=# commit; > COMMIT > > ... > > bdrdemo=# select * from tx_log ; > id | msg > ----+-------------------------- > 1 | tx log insert from node1 > 2 | tx log insert from node2 > (2 rows) > > bdrdemo=# select * from thingy ; > id | value > ----+------------------- > 1 | update from node2 > (1 row) > > The conflict on the "thingy" table has resulted in node2 winning based > on last_update wins default resolution. However, both inserts have > applied. My expectation is that the entire TX applies or does not > apply. This expectation is clearly wrong. > > Question is: is there a way (via a custom conflict handler) to have the > TX obeyed? I can't see a way to even implement a simple bank account > database that changes multiple tables in a single transaction without > having the data end up in an inconsistent state. Am I missing something > obvious here? > > Thanks in advance for any help. > > riley >
BTW, I'm also looking for a "synchronous multi-master" solution... If you find one, please share :-) The only solution I've found so far is a middleware that is close, the C-Jdbc/Sequoia, which seems not being actively maintained for a while now. Regards, Edson Atenciosamente, Edson Carlos Ericksson Richter Em 04/01/2016 18:09, Riley Berton escreveu: > I have been experimenting with BDR and have a question about how BDR > interacts with transactions. > > bdrdemo=# create table thingy (id INT, value TEXT, PRIMARY KEY(id)); > CREATE TABLE > bdrdemo=# create table tx_log(id INT, msg TEXT, PRIMARY KEY(id)); > CREATE TABLE > bdrdemo=# insert into thingy (id, value) VALUES (1, 'insert from node1'); > INSERT 0 1 > > From node1: > > bdrdemo=# begin; > BEGIN > bdrdemo=# update thingy set value='update from node1' where id=1; > UPDATE 1 > bdrdemo=# insert into tx_log (id, msg) values (1, 'tx log insert from node1'); > INSERT 0 1 > bdrdemo=# commit; > COMMIT > > Simultaneously from node2: > > bdrdemo=# begin; > BEGIN > bdrdemo=# update thingy set value='update from node2' where id=1; > UPDATE 1 > bdrdemo=# insert into tx_log (id, msg) values (2, 'tx log insert from node2'); > INSERT 0 1 > bdrdemo=# commit; > COMMIT > > ... > > bdrdemo=# select * from tx_log ; > id | msg > ----+-------------------------- > 1 | tx log insert from node1 > 2 | tx log insert from node2 > (2 rows) > > bdrdemo=# select * from thingy ; > id | value > ----+------------------- > 1 | update from node2 > (1 row) > > The conflict on the "thingy" table has resulted in node2 winning based > on last_update wins default resolution. However, both inserts have > applied. My expectation is that the entire TX applies or does not > apply. This expectation is clearly wrong. > > Question is: is there a way (via a custom conflict handler) to have the > TX obeyed? I can't see a way to even implement a simple bank account > database that changes multiple tables in a single transaction without > having the data end up in an inconsistent state. Am I missing something > obvious here? > > Thanks in advance for any help. > > riley >
Edson Richter <edsonrichter@hotmail.com> writes: > BTW, I'm also looking for a "synchronous multi-master" solution... If > you find one, please share :-) > The only solution I've found so far is a middleware that is close, the > C-Jdbc/Sequoia, which seems not being actively maintained for a while > now. See Postgres-R for sync multi-master. http://www.postgres-r.org/documentation/ Note that it is specifically geared towards low-latency environments and is likely not suitable for geo-distributed applications. It hasn't been touched in 4 years so likely not actively maintained. riley > > Regards, > > Edson > > Atenciosamente, > > Edson Carlos Ericksson Richter > > Em 04/01/2016 18:09, Riley Berton escreveu: >> I have been experimenting with BDR and have a question about how BDR >> interacts with transactions. >> >> bdrdemo=# create table thingy (id INT, value TEXT, PRIMARY KEY(id)); >> CREATE TABLE >> bdrdemo=# create table tx_log(id INT, msg TEXT, PRIMARY KEY(id)); >> CREATE TABLE >> bdrdemo=# insert into thingy (id, value) VALUES (1, 'insert from node1'); >> INSERT 0 1 >> >> From node1: >> >> bdrdemo=# begin; >> BEGIN >> bdrdemo=# update thingy set value='update from node1' where id=1; >> UPDATE 1 >> bdrdemo=# insert into tx_log (id, msg) values (1, 'tx log insert from node1'); >> INSERT 0 1 >> bdrdemo=# commit; >> COMMIT >> >> Simultaneously from node2: >> >> bdrdemo=# begin; >> BEGIN >> bdrdemo=# update thingy set value='update from node2' where id=1; >> UPDATE 1 >> bdrdemo=# insert into tx_log (id, msg) values (2, 'tx log insert from node2'); >> INSERT 0 1 >> bdrdemo=# commit; >> COMMIT >> >> ... >> >> bdrdemo=# select * from tx_log ; >> id | msg >> ----+-------------------------- >> 1 | tx log insert from node1 >> 2 | tx log insert from node2 >> (2 rows) >> >> bdrdemo=# select * from thingy ; >> id | value >> ----+------------------- >> 1 | update from node2 >> (1 row) >> >> The conflict on the "thingy" table has resulted in node2 winning based >> on last_update wins default resolution. However, both inserts have >> applied. My expectation is that the entire TX applies or does not >> apply. This expectation is clearly wrong. >> >> Question is: is there a way (via a custom conflict handler) to have the >> TX obeyed? I can't see a way to even implement a simple bank account >> database that changes multiple tables in a single transaction without >> having the data end up in an inconsistent state. Am I missing something >> obvious here? >> >> Thanks in advance for any help. >> >> riley >> > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
Em 05/01/2016 11:42, Riley Berton escreveu: > Edson Richter <edsonrichter@hotmail.com> writes: > >> BTW, I'm also looking for a "synchronous multi-master" solution... If >> you find one, please share :-) >> The only solution I've found so far is a middleware that is close, the >> C-Jdbc/Sequoia, which seems not being actively maintained for a while >> now. > See Postgres-R for sync multi-master. > http://www.postgres-r.org/documentation/ > > Note that it is specifically geared towards low-latency environments and > is likely not suitable for geo-distributed applications. It hasn't been > touched in 4 years so likely not actively maintained. > > riley That seems to be what I'm looking for... As soon as I get some free time, I'll give a try. Regards, Edson
On 5 January 2016 at 04:09, Riley Berton <rberton@appnexus.com> wrote:
The conflict on the "thingy" table has resulted in node2 winning based
on last_update wins default resolution. However, both inserts have
applied. My expectation is that the entire TX applies or does not
apply. This expectation is clearly wrong.
Correct. Conflicts are resolved row-by-row. Their outcomes are determined (by default) by transaction commit timestamps, but the conflicts themselves are row-by-row.
Because BDR:
* applies changes to other nodes only AFTER commit on the origin node; and
* does not take row and table locks across nodes
it has no way to sensibly apply all or none of a transaction on downstream peers because the client has already committed and moved on to other things. If the xact doesn't apply, what do we do? Log output on the failing node(s) and throw it away?
It's probably practical to have xacts abort on the first conflict, though some thought would be needed about making sure that doesn't break consistency requirements across nodes. It's not clear if doing so is useful though.
For that you IMO want synchronous replication where the client doesn't get a local COMMIT until all nodes have confirmed they can commit the xact. That's something that could be added to BDR in future, but doing it well it requires support for logical decoding of prepared transactions which is currently missing from PostgreSQL's logical decoding support. If it's something you think is important/useful you might want to explore what's involved in implementing that.
Question is: is there a way (via a custom conflict handler) to have the
TX obeyed?
No.
Even if you ERROR in your handler, BDR will just retry the xact. It has no concept of "throw this transaction away forever".
I can't see a way to even implement a simple bank account
database that changes multiple tables in a single transaction without
having the data end up in an inconsistent state. Am I missing something
obvious here?
You're trying to use asynchronous multimaster replication as if it was an application-transparent synchronous cluster with a global transaction manager and global lock manager.
BDR is not application-transparent. You need to understand replication conflicts and think about them. It does not preserve full READ COMMITTED semantics across nodes. This comes with big benefits in partition tolerance, performance and latency tolerance, but it means you can't point an existing app at more than one node and expect it to work properly.
The documentation tries over and over to emphasise this. Can you suggest where it can be made clearer or more prominent?
Craig Ringer <craig@2ndquadrant.com> writes: > On 5 January 2016 at 04:09, Riley Berton <rberton@appnexus.com> wrote: > >> >> The conflict on the "thingy" table has resulted in node2 winning based >> on last_update wins default resolution. However, both inserts have >> applied. My expectation is that the entire TX applies or does not >> apply. This expectation is clearly wrong. >> > > Correct. Conflicts are resolved row-by-row. Their outcomes are determined > (by default) by transaction commit timestamps, but the conflicts themselves > are row-by-row. > > Because BDR: > > * applies changes to other nodes only AFTER commit on the origin node; and > * does not take row and table locks across nodes > > it has no way to sensibly apply all or none of a transaction on downstream > peers because the client has already committed and moved on to other > things. If the xact doesn't apply, what do we do? Log output on the failing > node(s) and throw it away? Yes. This is impossible. I understand that clearly now. > > It's probably practical to have xacts abort on the first conflict, though > some thought would be needed about making sure that doesn't break > consistency requirements across nodes. It's not clear if doing so is useful > though. > > For that you IMO want synchronous replication where the client doesn't get > a local COMMIT until all nodes have confirmed they can commit the xact. > That's something that could be added to BDR in future, but doing it well it > requires support for logical decoding of prepared transactions which is > currently missing from PostgreSQL's logical decoding support. If it's > something you think is important/useful you might want to explore what's > involved in implementing that. I have considered 2 paths here. 1. What you suggest above. 2. Write sharding across the masters with RLS to prevent writes to the wrong master. I have not fully thought through whether this will work in practice, but as long as the constraints are identical on all the masters and we never mutate the same row(s) on multiple masters we should never get conflicts. This requires application design that ties all the data to some root node which can be used to shard on and is not applicable generally. > > Question is: is there a way (via a custom conflict handler) to have the >> TX obeyed? > > > No. > > Even if you ERROR in your handler, BDR will just retry the xact. It has no > concept of "throw this transaction away forever". > > >> I can't see a way to even implement a simple bank account >> database that changes multiple tables in a single transaction without >> having the data end up in an inconsistent state. Am I missing something >> obvious here? >> > > You're trying to use asynchronous multimaster replication as if it was an > application-transparent synchronous cluster with a global transaction > manager and global lock manager. > > BDR is not application-transparent. You need to understand replication > conflicts and think about them. It does not preserve full READ COMMITTED > semantics across nodes. This comes with big benefits in partition > tolerance, performance and latency tolerance, but it means you can't point > an existing app at more than one node and expect it to work properly. > > The documentation tries over and over to emphasise this. Can you suggest > where it can be made clearer or more prominent? I was not the only one to be confused by this. I think the reputation of PostgreSQL is for correct transactional semantics by default. BDR requires a different way of thinking about it. You might prevent future confusion by giving some example scenarios in the Overview (or Concepts) where a traditional single master would result in X but BDR across 2 masters would result in Y. Thanks so much for the detailed response. riley > > -- > Craig Ringer http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services
On 4 January 2016 at 20:09, Riley Berton <rberton@appnexus.com> wrote:
--
The conflict on the "thingy" table has resulted in node2 winning based
on last_update wins default resolution. However, both inserts have
applied. My expectation is that the entire TX applies or does not
apply. This expectation is clearly wrong.
I'm sorry to say: Yes, I think so.
If you try to update the same data at the same time in multiple locations, your application has a significant problem, period. That's just physics.
How that problem manifests itself is really based upon your choice of technology. Choosing Postgres, Oracle or ProblemoDB won't change that.
If you choose single master, then you get an error because one of the nodes can't be updated at all. If you have multiple masters, then you get to choose between an early abort because of serialization errors (which causes a huge performance overhead), or a later difficulty when conflict resolution kicks in (which is why BDR supports post-commit conflict resolution semantics). Or it you use a shared cache system like RAC then you get significant performance degradation as the data blocks ping around the cluster.
I'm personally in favour of giving people choice about how they configure their databases. So you will see me acting to extend the range of options available to users, allowing them to make informed choices.
Question is: is there a way (via a custom conflict handler) to have the
TX obeyed? I can't see a way to even implement a simple bank account
database that changes multiple tables in a single transaction without
having the data end up in an inconsistent state. Am I missing something
obvious here?
Don't use updates in that way. Banks never do, but financial institutions and many others have been using replication technology that supports post-commit conflict resolution for more than a decade in products from SQLServer, Informix and others. BDR was specified by a customer/user with expert knowledge of that area and who knew what he wanted and didn't want to see in the final product. I think those choices were good ones.
Design your applications carefully, understanding the trade-offs between availability, local access times, serializability and performance.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services