Re: High Availability, Load Balancing, and Replication Feature Matrix - Mailing list pgsql-docs

From Bruce Momjian
Subject Re: High Availability, Load Balancing, and Replication Feature Matrix
Date
Msg-id 200711101920.lAAJKFf23421@momjian.us
In response to Re: High Availability, Load Balancing, and Replication Feature Matrix  (Markus Schiltknecht <markus@bluegap.ch>)
Responses Re: High Availability, Load Balancing, and Replication Feature Matrix  (Markus Schiltknecht <markus@bluegap.ch>)
List pgsql-docs
Markus Schiltknecht wrote:
> Hello Bruce,
>
> Bruce Momjian wrote:
> > I have added a High Availability, Load Balancing, and Replication
> > Feature Matrix table to the docs:
>
> Nice work. I appreciate your efforts in clearing up the uncertainty that
> surrounds this topic.
>
> As you might have guessed, I have some complaints regarding the Feature
> Matrix. I hope this won't discourage you, but I'd rather like to
> contribute to an improved variant.

Not sure if you were around when we wrote this chapter but there was a
lot of good discussion to get it to where it is now.

> First of all, I don't quite like the negated formulations. I can see
> that you want a dot to mark a positive feature, but I find it hard to
> understand.

Well, the idea is to say "what things do I want and what offers it?"  If
you have positive/negative it makes it harder to do that.  I realize it
is confusing in a different way.  We could split out the negatives into
a different table but that seems worse.

> What I'm especially puzzled about is the "master never locks others"
> item. All of the first four, namely "shared disk failover", "file system
> replication", "warm standby" and "master slave replication", block
> others (the slaves) completely, which is about the worst kind of lock.

That item assumes you have slaves that are trying to do work.  The point
is that multi-master slows down the other slaves in a way no other
option does, which is the reason we don't support it yet.  I have
updated the wording to "No inter-server locking delay".
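The delay being discussed can be illustrated with a toy timing model (the numbers and function names are made up for illustration, not a benchmark): an eager multi-master commit cannot return until every peer has acknowledged, so it always pays the latency of the slowest server, while a single-master commit pays only the local cost.

```python
# Toy model of commit latency (hypothetical numbers, not a benchmark).
# An eager multi-master commit must wait for every peer to acknowledge,
# so it is stalled by the slowest one; a single master commits locally.

def single_master_commit(local_ms):
    """Single master: only the local write matters."""
    return local_ms

def sync_multi_master_commit(local_ms, peer_latencies_ms):
    """Eager multi-master: wait for all peers before returning."""
    return local_ms + max(peer_latencies_ms)

print(single_master_commit(1))                   # 1 ms
print(sync_multi_master_commit(1, [5, 40, 8]))   # 41 ms: stalled by slowest peer
```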

> Comparing "File System Replication" and "Shared Disk Failover",
> you state that the former has "master server overhead", while the latter
> doesn't. Seen solely from the single server node, this might be true.
> But summed over the cluster, you have a network with quite similar
> load in both cases. I wouldn't say one has less overhead than the other
> by definition.

The point is that file system replication has to wait for the standby
server to write the blocks, while disk failover does not.  I don't think
the network is an issue considering many use NAS anyway.
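As an illustration of that wait, DRBD (a commonly used block-level replication tool for this setup) in its synchronous mode, protocol C, only completes a local write once the peer has the block on its disk; a minimal, illustrative resource definition (hostnames, devices, and addresses are made up):

```
resource r0 {
  protocol C;        # synchronous: local write completes only after
                     # the peer confirms the block is on its disk
  on alpha {
    device   /dev/drbd0;
    disk     /dev/sda7;
    address  10.0.0.1:7788;
  }
  on beta {
    device   /dev/drbd0;
    disk     /dev/sda7;
    address  10.0.0.2:7788;
  }
}
```

With shared disk failover there is no such remote acknowledgment to wait for, which is the asymmetry the table entry records.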

> Then, you are mixing apples and oranges. Why should a "statement based
> replication solution" not require conflict resolution? You can build
> eager as well as lazy statement based replication solutions; one does
> not have anything to do with the other, does it?

There is no dot there, so I am saying a "statement based replication
solution" requires conflict resolution.  Agreed, you could do it without
conflict resolution, and it is somewhat independent.  How should we deal
with this?
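For what the table means by conflict resolution, a deliberately naive sketch (not how any shipping product implements it): in a lazy multi-master setup, both nodes commit locally and the conflict only surfaces when changes are exchanged later, at which point something like last-writer-wins must pick a survivor.

```python
# Naive last-writer-wins conflict resolution for a lazy (asynchronous)
# multi-master setup. Each node commits locally; conflicting updates
# surface only when changes are exchanged. All names are illustrative.

def resolve(updates):
    """Pick a winner per row key by the highest commit timestamp."""
    winners = {}
    for node, key, value, ts in updates:
        if key not in winners or ts > winners[key][2]:
            winners[key] = (node, value, ts)
    return {key: value for key, (node, value, ts) in winners.items()}

# Two nodes updated the same row while working independently:
updates = [
    ("node_a", "row1", "priced at 10", 100),
    ("node_b", "row1", "priced at 12", 105),  # later timestamp wins
]
print(resolve(updates))  # {'row1': 'priced at 12'}
```

An eager solution avoids this step entirely by locking before commit, which is exactly the lazy/eager distinction Markus raises.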

> Same applies to "master slave replication" and "per table granularity".

I tried to mark them based on existing or typical solutions, but you are
right, especially if the master/slave setup is not PITR-based.  Some
can't do per-table granularity, like disk failover.

> And in the special case of (async, but eager) Postgres-R also to "async
> multi-master replication" and "no conflict resolution necessary".
> Although I can understand that that's a pretty nifty difference.

Yea, the table isn't going to be 100% accurate, but it tries to
summarize what is in the section above.

> Given the matrix focuses on practically available solutions, I can see
> some value in it. But from a more theoretical viewpoint, I find it
> pretty confusing. Now, if you want a practically usable feature
> comparison table, I'd strongly vote for clearly mentioning the products
> you have in mind - otherwise the table pretends to be something it is not.

I considered that, and I can add something that says you have to consult
the text above for more details.  Some entries require mentioning a
specific solution, like Slony, while others do not, like disk failover.

> If it should be theoretically correct without mentioning available
> solutions, I'd rather vote for explaining the terms and concepts.
>
> To clarify my viewpoint, I'll quickly go over the features you're
> mentioning and associate them with the concepts, as I understand them.
>
>   - special hardware:  always nice, not much theoretical effect, a
>                        network is a network, storage is storage.
>
>   - multiple masters:  that's what single- vs multi masters is about:
>                        writing transactions. Can be mixed with
>                        eager/lazy, every combination makes
>                        sense for certain applications.
>
>   - overhead:          replication by definition generates overhead,
>                        the question is: how much, and where.
>
>   - locking of others: again, question of how much and how fine grained
>                        the locking is. In a single master repl. sol., the
>                        slaves are locked completely. In lazy repl. sol.,
>                        the locking is deferred until after the commit,
>                        during conflict resolution. In eager repl. sol.,
>                        the locking needs to take place before the commit.
>                        But all replication systems need some kind of
>                        locks!
>
>   - data loss on fail: solely dependent on eager/lazy. (Given a real
>                        replication, with a replica, which shared storage
>                        does not provide, IMO)
>
>   - slaves read only:  theoretically possible with all replication
>                        systems, whether they are lazy/eager, single-
>                        or multi-master. That we are unable to read
>                        from slave nodes is an implementation annoyance
>                        of Postgres, if you will.
>
>   - per table gran.:   again, independent of lazy/eager, single-/multi.
>                        Depends solely on the level where data is
>                        replicated: block device, file system, statement,
>                        WAL or other internal format.
>
>   - conflict resol.:   in multi master systems, that depends on the
>                        lazy/eager property. Single master systems
>                        obviously never need to resolve conflicts.

Right, but the point of the chart is to give people guidance, not to
give them details; those are in the part above.

> IMO, "data partitioning" is entirely perpendicular to replication. It
> can be combined, in various ways. There's horizontal and vertical
> partitioning, eager/lazy and single-/multi-master replication. I guess
> we could find a use case for most of the combinations thereof. (Kudos
> for finding a combination which definitely has no use case).

Really?  Are you saying the office example is useless?  What is a good
use case for this?
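The office example referred to here can be sketched roughly as follows (the routing code, server names, and offices are hypothetical, purely to illustrate the combination of partitioning with read-only copies): each office's server is the only writer for its own rows, while the other offices hold read-only copies.

```python
# Sketch of the data-partitioning "office" example: each office's server
# is the sole writer for its own partition; other servers hold read-only
# copies. Names and routing logic are illustrative only.

OWNER = {"london": "server_uk", "paris": "server_fr"}

def route_write(office, local_server):
    """A write is only allowed on the server that owns the partition."""
    if OWNER[office] != local_server:
        raise PermissionError(f"{local_server} holds a read-only copy "
                              f"of {office} rows")
    return f"write applied on {local_server}"

print(route_write("london", "server_uk"))   # write applied on server_uk
# route_write("paris", "server_uk") would raise PermissionError
```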

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
