Re: High Availability, Load Balancing, and Replication Feature Matrix - Mailing list pgsql-docs

From: Bruce Momjian
Subject: Re: High Availability, Load Balancing, and Replication Feature Matrix
Date:
Msg-id: 200711111452.lABEqv023393@momjian.us
In response to: Re: High Availability, Load Balancing, and Replication Feature Matrix (Markus Schiltknecht <markus@bluegap.ch>)
Responses: Re: High Availability, Load Balancing, and Replication Feature Matrix (Markus Schiltknecht <markus@bluegap.ch>)
List: pgsql-docs

Markus Schiltknecht wrote:
> Hello Bruce,
>
> thank you for your detailed answer.
>
> Bruce Momjian wrote:
> > Not sure if you were around when we wrote this chapter but there was a
> > lot of good discussion to get it to where it is now.
>
> Uh.. IIRC quite a good part of the discussion for chapter 23 was between
> you and me, almost exactly a year ago. Or what discussion are you
> referring to?

Sorry, I forgot who was involved in that discussion.

> >> First of all, I don't quite like the negated formulations. I can see
> >> that you want a dot to mark a positive feature, but I find it hard to
> >> understand.
> >
> > Well, the idea is to say "what things do I want, and what offers them?"
> > If you mix positives and negatives, it is harder to do that.  I realize
> > it is confusing in a different way.  We could split out the negatives
> > into a different table, but that seems worse.
>
> Hm.. yeah, I can understand that. As those are things the user wants, I
> think we could formulate positive wishes. Just a proposal:
>
> No special hardware required:        works with commodity hardware
>
> No conflict resolution necessary:    maintains durability property
>
> master failure will never lose data: maintains durability
>                                       on single node failure
>
> With the other two I'm unsure.. I see it's very hard to find helpful
> positive formulations...

Yea, that's where I got stuck --- that the positives were harder to
understand.

> >> What I'm especially puzzled about is the "master never locks others"
> >> item. All of the first four, namely "shared disk failover", "file
> >> system replication", "warm standby" and "master slave replication",
> >> block others (the slaves) completely, which is about the worst kind
> >> of lock.
> >
> > That item assumes you have slaves that are trying to do work.
>
> Yes, replication in general assumes that. So does high availability,
> IMO. Having read-only slaves means nothing else but locking them from
> write access.
>
> > The point
> > is that multi-master slows down the other slaves in a way no other
> > option does,
>
> Uh.. you mean the other masters? But according to that statement, "async

Sorry, I meant that a master that is modifying data is slowed down by
other masters to an extent that doesn't happen in other cases (e.g. with
slaves).  Is the current "No inter-server locking delay" OK?

> multi-master replication" as well as "statement-based replication
> middleware" should not have a dot, because those slow down other masters
> as well. In the async case at different points in time, yes, but all
> masters have to write the data, which slows them down.

Yea, that is why I have the new text about locking.

> I suspect you are really talking about the network-dependent commit
> latency of eager replication solutions. I find the term "locking delay"
> for that rather confusing. How about: "normal commit latency"? (Normal,
> as in: it depends on the storage system used, instead of on the network
> and storage).

Uh, I assume that multi-master locking often happens before the commit.
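
To illustrate what I mean, here is a sketch of a hypothetical two-node
synchronous multi-master setup (the table and values are made up):

    -- On node A:
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    -- Before this UPDATE can return, node A must also obtain the
    -- equivalent row lock on node B.  If a transaction on node B already
    -- holds that lock, node A waits here, well before COMMIT is reached.
    COMMIT;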

> > which is the reason we don't support it yet.
>
> Uhm.. PgCluster *is* a synchronous multi-master replication solution. It
> is also middleware, and it does statement-based replication. Which dots
> of the matrix do you think apply to it?

I don't consider PgCluster middleware because the servers have to
cooperate with the middleware.  And I am told it is much slower for
writes than a single server, which supports my "locking" item, though
the delay is really more "waiting for other masters", I think.

> >> Comparing "File System Replication" and "Shared Disk Failover", you
> >> state that the former has "master server overhead", while the latter
> >> doesn't. Seen solely from the single server node, this might be true.
> >> But summed over the cluster, the network carries quite a similar load
> >> in both cases. I wouldn't say one has less overhead than the other by
> >> definition.
> >
> > The point is that file system replication has to wait for the standby
> > server to write the blocks, while disk failover does not.
>
> In "disk failover", the master has to wait for the NAS to write the
> blocks on mirrored disks, while in "file system replication" the master
> has to wait for multiple nodes to write the blocks. As the nodes of a
> replicated file system can write in parallel, very much like a RAID-1
> NAS, I don't see that much of a difference there.

I don't assume the disk failover setup has mirrored disks.  It can, just
as a single server can, but that isn't part of the backend process, and I
assume a RAID card with RAM that can cache writes.  In the file system
replication case the server has to send commands to the mirror and wait
for completion.

> > I don't think
> > the network is an issue considering many use NAS anyway.
>
> I think you are comparing an enterprise NAS to a low-cost, commodity
> hardware clustered filesystem. Take the same amount of money and the
> same number of mirrors and you'll get comparable performance.

Agreed.  In the one case you are relying on another server, and in the
NAS case you are relying on a black box server.  I think the big
difference is that the other server is a separate entity, while the NAS
is a shared item.

> > There is no dot there so I am saying "statement based replication
> > solution" requires conflict resolution.  Agreed you could do it without
> > conflict resolution and it is kind of independent.  How should we deal
> > with this?
>
> Maybe a third state: 'n/a'?

Good idea, or "~".  How would middleware avoid conflicts, i.e. how would
it know that two incoming queries were in conflict?
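
For example (a made-up table, purely to show the problem), two clients
might send these statements to the middleware at almost the same time:

    -- Client 1 sends:
    UPDATE seats SET holder = 'alice' WHERE seat = '12A' AND holder IS NULL;

    -- Client 2 sends:
    UPDATE seats SET holder = 'bob' WHERE seat = '12A' AND holder IS NULL;

    -- If the middleware forwards the two statements to its backend servers
    -- in different orders, one server ends up with 'alice' and the other
    -- with 'bob'.  Detecting that both statements touch the same row would
    -- require the middleware to understand the data, not just the SQL text.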

> >> And in the special case of (async, but eager) Postgres-R also to "async
> >> multi-master replication" and "no conflict resolution necessary".
> >> Although I can understand that that's a pretty nifty difference.
> >
> > Yea, the table isn't going to be 100% but tries to summarize what is in
> > the section above.
>
> That's fine.
>
>  > [...]
>  >
> > Right, but the point of the chart is to give people guidance, not to
> > give them details;  that is in the part above.
>
> Well, sure. But then we are back at the discussion of the parts above,
> which are quite fuzzy, IMO. I'm still missing those details. And I'm
> dubious about them being a basis for a feature matrix with clear dots or
> no dots, for the reasons explained above.
>
> >> IMO, "data partitioning" is entirely orthogonal to replication. The
> >> two can be combined in various ways. There's horizontal and vertical
> >> partitioning, eager/lazy and single-/multi-master replication. I guess
> >> we could find a use case for most of the combinations thereof. (Kudos
> >> for finding a combination which definitely has no use case).
> >
> > Really?  Are you saying the office example is useless?  What is a good
> > use case for this?
>
> Uhm, no sorry, I was unclear here. And not even correct. I was trying to
> say that there's a use case for each and every combination of the three
> properties above.

OK.

> I'm now revoking one: "master-slave" combines very badly with "eager
> replication", because if you do eager replication, you might as well
> have multiple masters without any additional cost. So, only these three

Right.  I was trying to hit typical usages.

> combinations make sense:
>
>   - lazy, master-slave
>   - eager, master-slave
>   - eager, multi-master

Yep.

> Now, no partitioning, horizontal partitioning, and vertical partitioning
> can each be combined with any of the above replication methods, giving a
> total of nine combinations, all of which make perfect sense for certain
> applications.
>
> If I understand correctly, your office example is about horizontal data
> partitioning, with lazy, master-slave replication for the read-only copy
> of the remote data. It makes perfect sense.

I did move it below and removed it from the chart because, as you say,
how to replicate to the slaves is an independent issue.
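
For reference, a minimal sketch of what that setup could look like (the
schema and names are invented, and Slony-I is just one possible tool for
the lazy replication part):

    -- London is master for its own partition ...
    CREATE TABLE customers_london (
        id    integer PRIMARY KEY,
        name  text NOT NULL
    );

    -- ... and keeps a read-only copy of New York's partition, refreshed
    -- lazily (for example with a trigger-based tool such as Slony-I, or
    -- a periodic dump/restore).
    CREATE TABLE customers_newyork (
        id    integer PRIMARY KEY,
        name  text NOT NULL
    );

    -- A view presents both partitions as one logical table.
    CREATE VIEW customers AS
        SELECT * FROM customers_london
        UNION ALL
        SELECT * FROM customers_newyork;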

> With regard to replication, there's another feature that I think would
> be worth mentioning: dynamic addition or removal of nodes (masters or
> slaves). But that's solely implementation-dependent, so it probably
> doesn't fit into the matrix.

Yea, I had that but found you could add/remove slaves easily in most
cases.

> Another interesting property I'm missing is the existence of single
> points of failure.

Ah, yea, but then you get into power and fire issues.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
