Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Transactions involving multiple postgres foreign servers, take 2
Date
Msg-id CAExHW5u_8DcnYApXXLRovpcG33B3k=BcP0oLs+5FDRDNyn2fBQ@mail.gmail.com
Whole thread Raw
In response to Re: Transactions involving multiple postgres foreign servers, take 2  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Thu, Jun 18, 2020 at 6:49 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, Jun 18, 2020 at 04:09:56PM +0530, Amit Kapila wrote:
> > You are right and we are not going to claim that after this feature is
> > committed.  This feature has independent use cases like it can allow
> > parallel copy when foreign tables are involved once we have parallel
> > copy and surely there will be more.  I think it is clear that we need
> > atomic visibility (some way to ensure global consistency) to avoid the
> > data inconsistency problems you and I are worried about and we can do
> > that as a separate patch but at this stage, it would be good if we can
> > have some high-level design of that as well so that if we need some
> > adjustments in the design/implementation of this patch then we can do
> > it now.  I think there is some discussion on the other threads (like
> > [1]) about the kind of stuff we are worried about which I need to
> > follow up on to study the impact.
> >
> > Having said that, I don't think that is a reason to stop reviewing or
> > working on this patch.
>
> I think our first step is to allow sharding to work on read-only
> databases, e.g. data warehousing.  Read/write will require global
> snapshots.  It is true that 2PC is limited usefulness without global
> snapshots, because, by definition, systems using 2PC are read-write
> systems.   However, I can see cases where you are loading data into a
> data warehouse but want 2PC so the systems remain consistent even if
> there is a crash during loading.
>

For sharding, just implementing 2PC without global consistency
provides limited functionality. But for general purpose federated
databases 2PC serves an important functionality - atomic visibility.
When PostgreSQL is used as one of the coordinators in a heterogeneous
federated database system, it's not expected to have global
consistency or even atomic visibility. But it needs a guarantee that
once a transaction commit, all its legs are committed. 2PC provides
that guarantee as long as the other databases keep their promise that
prepared transactions will always get committed when requested so.
Subtle to this is HA requirement from these databases as well. So the
functionality provided by this patch is important outside the sharding
case as well.

As you said, even for a data warehousing application, there is some
write in the form of loading/merging data. If that write happens
across multiple servers, we need  atomic commit to be guaranteed. Some
of these applications can work even if global consistency and atomic
visibility is guaranteed eventually.

-- 
Best Wishes,
Ashutosh Bapat



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: snowball ASCII stemmer configuration
Next
From: Thomas Munro
Date:
Subject: Re: doing something about the broken dynloader.h symlink