Hi Markus,
> I didn't have many reliability issues with ensemble, appia or spread, so
> far. Although, I admit I didn't ever run any of these in production.
> Performance is certainly an issue, yes.
>
May I suggest another reading? Even though it is a bit dated, most of the
results still apply:
http://jmob.objectweb.org/jgroups/JGroups-middleware-2004.pdf
The bottom line is that if you use UDP multicast, you need a dedicated
switch, and tuning is a nightmare. I discussed these issues with the
Spread developers and they have no real magic either. TCP seems a more
reliable alternative (in particular, its performance is more
predictable), but TCP timeouts are also tricky to tune, depending on the
platform. We worked quite a bit with Nuno on Appia in the context of
Sequoia, and performance can be outstanding when properly tuned or
absolutely awful if some default values are wrong. The chaotic behavior
of a GCS under stress quickly compromises the reliability of the
replication system, and there is still no good solution for admission
control on UDP multicast.
It's just a heads-up about what awaits you in production when the
system is stressed. So far, the only workable approach is good
admission control on top of the GCS, in the application.
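To make "admission control on top of the GCS" a bit more concrete, here
is a minimal sketch, assuming the application can (a) decide whether to
hand a message to the GCS or queue/reject it, and (b) get a callback
when its own multicast has been delivered. It combines a token bucket
(caps the send rate) with an in-flight window (caps unconfirmed
messages). AdmissionController, try_admit and on_delivered are
hypothetical names, not part of Spread, Appia or JGroups, and the
rate/burst/window values would have to be tuned per deployment.

```python
import threading
import time


class AdmissionController:
    """Hypothetical application-level gate in front of a GCS send call.

    rate:          tokens (messages) added per second
    burst:         maximum number of tokens the bucket can hold
    max_in_flight: maximum messages sent but not yet delivered back
    """

    def __init__(self, rate, burst, max_in_flight, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.max_in_flight = max_in_flight
        self.clock = clock
        self.tokens = float(burst)  # start with a full bucket
        self.last = clock()
        self.in_flight = 0
        self.lock = threading.Lock()

    def try_admit(self):
        """Return True if the caller may send one message to the GCS now."""
        with self.lock:
            now = self.clock()
            # Refill the bucket in proportion to elapsed time, capped at burst.
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.in_flight >= self.max_in_flight or self.tokens < 1.0:
                return False  # caller should queue, delay or reject the request
            self.tokens -= 1.0
            self.in_flight += 1
            return True

    def on_delivered(self):
        """Call from the (hypothetical) GCS delivery hook for our own message."""
        with self.lock:
            if self.in_flight > 0:
                self.in_flight -= 1
```

The point of the in-flight window is that, unlike a pure rate limit, it
reacts when the GCS itself slows down under stress: if deliveries stop
coming back, the application stops injecting new messages instead of
piling them onto an already overloaded group.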
I am now off for the holidays.
Cheers,
Emmanuel
--
Emmanuel Cecchet
Aster Data Systems
Web: http://www.asterdata.com