Re: BDR and TX obeyance - Mailing list pgsql-general

From Riley Berton
Subject Re: BDR and TX obeyance
Date
Msg-id rgslh80ry7j.fsf@rberton.i-did-not-set--mail-host-address--so-tickle-me
In response to Re: BDR and TX obeyance  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-general
Craig Ringer <craig@2ndquadrant.com> writes:

> On 5 January 2016 at 04:09, Riley Berton <rberton@appnexus.com> wrote:
>
>>
>> The conflict on the "thingy" table has resulted in node2 winning based
>> on the default last-update-wins resolution.  However, both inserts have
>> applied.  My expectation is that the entire TX applies or does not
>> apply.  This expectation is clearly wrong.
>>
>
> Correct. Conflicts are resolved row-by-row. Their outcomes are determined
> (by default) by transaction commit timestamps, but the conflicts themselves
> are row-by-row.
>
> Because BDR:
>
> * applies changes to other nodes only AFTER commit on the origin node; and
> * does not take row and table locks across nodes
>
> it has no way to sensibly apply all or none of a transaction on downstream
> peers because the client has already committed and moved on to other
> things. If the xact doesn't apply, what do we do? Log output on the failing
> node(s) and throw it away?

Yes.  This is impossible.  I understand that clearly now.
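
To make the row-by-row behaviour concrete, here is a minimal sketch in
plain Python.  It is a toy model, not BDR code: the "other" table, the
row keys, and the timestamps are made up for illustration, and a local
commit is modelled as an apply.  Each transaction touches two rows, and
only the "thingy" row collides.

```python
# Toy model of row-level last-update-wins resolution. Not BDR code.
# A "transaction" is a commit timestamp plus a list of row changes.

def apply_remote(local_rows, tx):
    """Apply a transaction row by row; return the keys that lost a conflict."""
    lost = []
    for key, value in tx["rows"]:
        current = local_rows.get(key)
        if current is not None and current["ts"] > tx["ts"]:
            # The local row is newer: skip this row only, keep going.
            lost.append(key)
            continue
        local_rows[key] = {"value": value, "ts": tx["ts"]}
    return lost


# Hypothetical transactions: both write thingy row 1, plus one row each.
tx_node1 = {"ts": 100, "rows": [(("thingy", 1), "from node1"),
                                (("other", "a"), "node1 child")]}
tx_node2 = {"ts": 101, "rows": [(("thingy", 1), "from node2"),
                                (("other", "b"), "node2 child")]}

# node1 commits its own transaction, then receives node2's.
node1 = {}
apply_remote(node1, tx_node1)
lost_on_node1 = apply_remote(node1, tx_node2)

# node2 commits its own transaction, then receives node1's.
node2 = {}
apply_remote(node2, tx_node2)
lost_on_node2 = apply_remote(node2, tx_node1)
```

In this model both nodes converge on the same rows, with node2's
"thingy" row winning, yet node1's non-conflicting row is still applied
on node2.  That is the "half a transaction" outcome described above.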

>
> It's probably practical to have xacts abort on the first conflict, though
> some thought would be needed about making sure that doesn't break
> consistency requirements across nodes. It's not clear if doing so is useful
> though.
>
> For that you IMO want synchronous replication where the client doesn't get
> a local COMMIT until all nodes have confirmed they can commit the xact.
> That's something that could be added to BDR in future, but doing it well
> requires support for logical decoding of prepared transactions which is
> currently missing from PostgreSQL's logical decoding support. If it's
> something you think is important/useful you might want to explore what's
> involved in implementing that.
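
The "no local COMMIT until all nodes have confirmed" idea can be
sketched as a toy two-phase commit.  This is only an illustration in
plain Python, not a BDR feature or API: the Node class, its methods, and
the "key already exists" conflict rule are assumptions made up for the
example, and locking and failure handling are not modelled.

```python
# Toy model of "commit only if every node can commit". Not BDR code.

class Node:
    def __init__(self, name, rows=None):
        self.name = name
        self.rows = dict(rows or {})
        self.pending = None

    def prepare(self, changes):
        """Vote yes only if none of the inserted keys already exist here."""
        if any(key in self.rows for key in changes):
            return False
        self.pending = dict(changes)
        return True

    def commit(self):
        self.rows.update(self.pending)
        self.pending = None

    def rollback(self):
        self.pending = None


def synchronous_commit(nodes, changes):
    """Commit on every node and return True, or commit nowhere and return False."""
    prepared = []
    for node in nodes:
        if not node.prepare(changes):
            # One node cannot commit: undo the prepares done so far.
            for p in prepared:
                p.rollback()
            return False
        prepared.append(node)
    for node in prepared:
        node.commit()
    return True


# node2 already holds thingy row 1, so the whole transaction is refused.
node1 = Node("node1")
node2 = Node("node2", {("thingy", 1): "existing"})
ok = synchronous_commit([node1, node2],
                        {("thingy", 1): "new", ("other", "a"): "child"})
```

Here the client would be told the transaction failed, and neither row
appears on either node.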

I have considered 2 paths here.

1. What you suggest above.
2. Write sharding across the masters, with RLS (row-level security) to
prevent writes to the wrong master.  I have not fully thought through
whether this will work in practice, but as long as the constraints are
identical on all the masters and we never mutate the same row(s) on
multiple masters, we should never get conflicts.  This requires an
application design that ties all the data to some root node which can
be used as the shard key, so it is not generally applicable.
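
A rough sketch of the routing half of option 2, in plain Python.  The
node names and the owner_of/check_write helpers are hypothetical, and
this is an application-side check standing in for what the RLS policy
would enforce inside the database.

```python
import hashlib

MASTERS = ["node1", "node2"]  # hypothetical node names


def owner_of(root_id, masters=MASTERS):
    """Map a root key to exactly one master using a stable hash."""
    digest = hashlib.sha256(str(root_id).encode("utf-8")).digest()
    return masters[int.from_bytes(digest[:8], "big") % len(masters)]


def check_write(local_node, root_id):
    """Reject a write whose root key is owned by a different master."""
    owner = owner_of(root_id)
    if owner != local_node:
        raise PermissionError(
            f"root {root_id!r} is owned by {owner}, not {local_node}")
    return owner
```

Every row hanging off the same root then gets written on one master
only, which is the condition for never conflicting.  Note that changing
the list of masters changes the mapping, so this simple modulo scheme
assumes a fixed set of nodes.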

>
>> Question is: is there a way (via a custom conflict handler) to have the
>> TX obeyed?
>
>
> No.
>
> Even if you ERROR in your handler, BDR will just retry the xact. It has no
> concept of "throw this transaction away forever".
>
>
>> I can't see a way to even implement a simple bank account
>> database that changes multiple tables in a single transaction without
>> having the data end up in an inconsistent state.  Am I missing something
>> obvious here?
>>
>
> You're trying to use asynchronous multimaster replication as if it was an
> application-transparent synchronous cluster with a global transaction
> manager and global lock manager.
>
> BDR is not application-transparent. You need to understand replication
> conflicts and think about them. It does not preserve full READ COMMITTED
> semantics across nodes. This comes with big benefits in partition
> tolerance, performance and latency tolerance, but it means you can't point
> an existing app at more than one node and expect it to work properly.
>
> The documentation tries over and over to emphasise this. Can you suggest
> where it can be made clearer or more prominent?

I was not the only one to be confused by this.  I think the reputation
of PostgreSQL is for correct transactional semantics by default.  BDR
requires a different way of thinking about it.  You might prevent future
confusion by giving some example scenarios in the Overview (or Concepts)
where a traditional single master would result in X but BDR across 2
masters would result in Y.
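
As one possible example of such a scenario, here is a toy comparison in
plain Python.  It is not BDR code: the starting balance, the amounts,
and the timestamps are invented, and conflict resolution is reduced to
"the row with the later timestamp wins".

```python
# Toy comparison, not BDR code: one account with a balance of 100,
# and two clients that each try to withdraw 80.

def withdraw(balance, amount):
    if balance < amount:
        raise ValueError("insufficient funds")
    return balance - amount


# Single master: both withdrawals run against the same row in turn.
single = 100
single = withdraw(single, 80)
try:
    single = withdraw(single, 80)
    second_accepted = True
except ValueError:
    second_accepted = False

# Two asynchronous masters: each checks only its own local copy,
# so both withdrawals commit locally before replication happens.
node1 = {"balance": 100, "ts": 0}
node2 = {"balance": 100, "ts": 0}
node1 = {"balance": withdraw(node1["balance"], 80), "ts": 1}
node2 = {"balance": withdraw(node2["balance"], 80), "ts": 2}


def last_update_wins(a, b):
    """Keep whichever version of the row has the later timestamp."""
    return a if a["ts"] >= b["ts"] else b


merged = last_update_wins(node1, node2)
```

On the single master the second withdrawal is refused and the balance
ends at 20.  Across the two masters both withdrawals are accepted (160
paid out in total) and the merged balance is still 20, because one
update simply replaces the other.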

Thanks so much for the detailed response.

riley

>
> --
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
