Thread: Thoughts on the mirroring system etc

Thoughts on the mirroring system etc

From
"Magnus Hagander"
Date:
Hello!

In light of yesterday's release and what was probably the largest hit so
far on the current website's "way of things", I had a couple of thoughts.
The site more or less went down, which is not good. What's in there now
is a temporary fix, and a permanent one is needed - one that does not
need manual intervention (as this one did). So here are some thoughts on
what I think needs to be done.

I know some of these things have been discussed before. Some exactly the
same way, some slightly different. I know steps are in motion to do some
of them. I'm just lining up everything here. And yes, actually offering
to help out if wanted. Just say the words.

And if I'm stepping on someone's toes here, let me apologize in advance.
Just point me in the right direction. It's not my intention to be
someone who just complains about how things are now; I'd rather be
someone who helps with ideas on how to move forward.


Number of mirrors
-----------------
* There are currently almost 60 mirrors for the static web content.

* During the very largest load during the slashdotting etc., the three
servers serving up the static content totalled only a little
over 6Mbit/sec of traffic, at just under 500 requests/second.

* During this time, wwwmaster pushed around 1.5Mbit/sec

* As long as www.postgresql.org is fast, people will *not* pick their
local mirror for the web (ftp is a different thing, as it's more
bandwidth-intensive).


This leads me to the conclusion that we do *not* in fact need the large
mirror network to handle the bandwidth load. In fact, most of those
sites probably use up more bandwidth syncing than they save. It *is much
needed* for redundancy, however, and we need better automation for that.
(A lot of man-hours went into fixing this problem; next time, it's
better if the work is done beforehand.)

My suggestion for this is to limit the number of mirrors to around 5,
give or take a few. But instead, put higher demands on these mirrors
than we do now. Demand they sync every 30 minutes (or 60, but you get
the point). Demand that they have a fast machine and a fast network
connection. There have been enough offers of servers and networks that
this should not be a major problem. Demand that they respond to
www.postgresql.org - if it can have a dedicated IP, even better.
Distributed across the world of course.

The other mirrors can stay if they want. To keep the load down, don't
let them sync from the master, just from another mirror (as it is now,
with only svr4, borg and eastside syncing from wwwmaster, and all others
syncing from svr4).

For wwwmaster, have two machines at different locations. Use Slony to
replicate the database. Some coding is probably needed to manually
handle some updates (like the logs), since Slony isn't multimaster yet.
wwwmaster held up fine this time, but if something happens to the box or
the network it's on, we're dead in the water.


Then do some "DNS magic" to do the load balancing:
* Create a new zone, let's call it "mirrors.postgresql.org", with a TTL
of no more than 10-15 minutes. Distribute this zone to more DNS servers
than the current zone, since the load on the nameservers will be much
higher. But require that all these machines respond to update
notifications so they pick up changes *right away*. By creating a new
zone we can both separate the handling of it (so a bug only affects this
and not, say, the mailing lists etc), and we can keep the TTL on the
main zone fairly large.

* Add a CNAME for www.postgresql.org to
www-static.mirrors.postgresql.org

* Have a script running on a dedicated machine somewhere *very* well
connected that is *not* one of the webservers. This script will poll the
website every 5 minutes. If a site does not respond, it's dropped from
the zone right away. If it responds but is not up to date - more than
<n> minutes old (depending on how often sync is demanded) - it's
likewise dropped. (A rough sketch of such a script follows this list.)

* This also provides a way to gracefully take one machine out of the
cluster without needing any manual hacking of DNS zones, etc. Simply
stop syncing, wait an hour or so, and all requests should be going
elsewhere. Then once the machine is upgraded/reinstalled/moved/whatever,
just start syncing again and it will be picked up again.
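
To make this concrete, here's a rough sketch of what such a monitor
could look like. Everything in it is made up for illustration - the
mirror list, the idea of each mirror publishing a /mirror.timestamp
file with its last sync time, the zone file path, the BIND reload - and
the real thing would need locking, logging and a proper SOA:

#!/usr/bin/env python3
# Poll each mirror; drop dead or stale ones from the
# mirrors.postgresql.org zone, then reload the nameserver.
import subprocess
import time
import urllib.request

MIRRORS = {
    "mirror1": "192.0.2.10",   # hypothetical primary mirrors
    "mirror2": "192.0.2.20",
    "mirror3": "192.0.2.30",
}
MAX_AGE = 60 * 60              # <n> minutes of staleness allowed, here 60
ZONEFILE = "/etc/bind/db.mirrors.postgresql.org"

def mirror_ok(ip):
    """The mirror must answer as www.postgresql.org and be freshly synced."""
    req = urllib.request.Request("http://%s/mirror.timestamp" % ip,
                                 headers={"Host": "www.postgresql.org"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            last_sync = int(resp.read().decode().strip())
    except Exception:
        return False           # no response: drop from the zone right away
    return time.time() - last_sync < MAX_AGE

def write_zone(ips):
    with open(ZONEFILE, "w") as f:
        f.write("$TTL 600\n")  # 10-minute TTL, as discussed above
        f.write("@ IN SOA ns.postgresql.org. hostmaster.postgresql.org. "
                "(%d 3600 600 86400 600)\n" % int(time.time()))
        f.write("@ IN NS ns.postgresql.org.\n")
        for ip in ips:         # one A record per healthy mirror (round robin)
            f.write("www-static IN A %s\n" % ip)

alive = [ip for ip in MIRRORS.values() if mirror_ok(ip)]
if alive:                      # never push an empty zone by mistake
    write_zone(alive)
    subprocess.run(["rndc", "reload", "mirrors.postgresql.org"], check=True)

Run from cron every 5 minutes, that's pretty much the whole failover
mechanism.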

A similar solution for wwwmaster, of course.


I am willing to invest some time in writing these scripts if wanted. I
don't think it's a huge amount of work, and parts of it have already
been done by Dave in the current mirror checking script.


A similar solution can be made for the ftp servers, but I think there is
less need for it there. If we want to do it, let's start with www and
take it from there if necessary.


Sync speed
----------
After setting up eastside to help handle the load of www.postgresql.org,
I noticed the sync was horribly slow when nothing had changed. This was
because it synced the attributes on all files every time - the update
date, I believe. Dave has committed a couple of patches I made for this
now, and sync time has dropped from >5 minutes down to <5 seconds.

A mirror pull when *nothing* has changed is right now around 400Kb. With
60 servers syncing up that's a full 24Mb every time when nothing has
changed. With just 5 servers, well, do the math ;-)


Bittorrent/Ftp
--------------
As Dave has already mentioned, I think it'd be good to put bittorrent
links on every file in the ftp browser. Slashdot linked directly to the
bittorrent downloads, and that showed. But once the story dropped off
the slashdot page, the number of people using bittorrent fell off very
fast. During the peak, my two seeders pushed about 4Mbit/sec over
bittorrent. Also, that load hit bt.postgresql.org instead of
www.postgresql.org, so it was not distributed.

Since this means more bittorrent seeders, they should perhaps be on a
separate box from the web stuff. There could be several boxes that just
rsync the .torrents between each other, so the project always has a
couple of seeders in place. This would also be a very easy point for
people to just "plug in more bandwidth" when required, since bittorrent
automatically makes sure that nobody can serve a non-up-to-date file,
etc. With some tweaks to the scripts it ought to be possible to make
this run with just one process serving a whole lot of torrents - they
just need to be in the same directory.
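
The glue between such seeder boxes could be as small as this sketch.
The peer hostname, rsync module names, paths and pidfile are invented,
and the launcher command stands in for whatever the installed
BitTorrent tools provide (a btlaunchmany-style launcher already does
the one-process-per-directory part):

#!/usr/bin/env python3
# Keep this seeder box in sync with a fellow seeder, and make sure one
# process is seeding everything in the torrent directory.
import os
import subprocess

PEER = "bt2.postgresql.org"      # hypothetical fellow seeder box
TORRENTS = "/srv/bt/torrents"    # the .torrent files
DATA = "/srv/bt/data"            # the payload they describe

# Pull new/changed torrents and payload from the peer, so both boxes
# always offer the same set and the project always has seeders in.
subprocess.run(["rsync", "-a", PEER + "::torrents/", TORRENTS + "/"],
               check=True)
subprocess.run(["rsync", "-a", PEER + "::btdata/", DATA + "/"],
               check=True)

# One process for the whole directory; restart it if it isn't running.
PIDFILE = "/var/run/btseed.pid"

def seeder_running():
    try:
        with open(PIDFILE) as f:
            os.kill(int(f.read()), 0)   # signal 0 = existence check only
        return True
    except (OSError, ValueError):
        return False

if not seeder_running():
    proc = subprocess.Popen(["btlaunchmany", TORRENTS])
    with open(PIDFILE, "w") as f:
        f.write(str(proc.pid))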

As for ftp mirrors, the bandwidth demand there is no doubt much higher
than it is on the web servers, so keeping more mirrors makes a lot of
sense there. Also, some of the ftp sites that mirror us now have *huge*
amounts of bandwidth (on the order of many gigabits/sec).


wwwmaster
---------
If you hit the ftp browser (or a download link), and then click anything
in the menu, you get the whole site served from wwwmaster. If the above
is fixed, so mirrors are all referred to as www.postgresql.org, it
should be as simple as sticking a <base href> in there or something. But
until then, perhaps some creative coding in the framework can fix it so
links that are hit on wwwmaster point back to www whereas the static
site uses relative links only?
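
For illustration, the framework tweak could be as small as this sketch
(the function name and the building_static flag are invented - the
framework presumably knows which version of a page it is generating;
with the single-hostname scheme above, the whole thing collapses to a
<base href="http://www.postgresql.org/"> tag instead):

# Pages built for wwwmaster get absolute links back to www; pages built
# for the static mirrors keep their relative links.
WWW = "http://www.postgresql.org"

def render_link(href, building_static):
    # Only rewrite site-internal links; leave external URLs untouched.
    if href.startswith("/") and not building_static:
        return WWW + href        # dynamic page: send the reader to www
    return href                  # static copy keeps the relative link

# The same template then yields different output per build:
assert render_link("/about/", building_static=False) == \
       "http://www.postgresql.org/about/"
assert render_link("/about/", building_static=True) == "/about/"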




Wow. That was a lot longer than initially intended. Hope someone has the
patience to read it all ;-)

//Magnus

Re: Thoughts on the mirroring system etc

From
"Dave Page"
Date:

> -----Original Message-----
> From: pgsql-www-owner@postgresql.org
> [mailto:pgsql-www-owner@postgresql.org] On Behalf Of Magnus Hagander
> Sent: 20 January 2005 12:12
> To: pgsql-www@postgresql.org
> Subject: [pgsql-www] Thoughts on the mirroring system etc
>
> And if I'm stepping on someone's toes here, let me apologize
> in advance.

Only really mine - and I knew you were writing this :-)

>
>
> Number of mirrors
> -----------------
> * There are currently almost 60 mirrors for the static web content.
>
> * During the very largest load during the slashdotting etc., the three
> servers serving up the static content totalled only a little
> over 6Mbit/sec of traffic, at just under 500 requests/second.
>
> * During this time, wwwmaster pushed around 1.5Mbit/sec
>
> * As long as www.postgresql.org is fast, people will *not* pick their
> local mirror for the web (ftp is a different thing, as it's more
> bandwidth-intensive).
>
>
> This leads me to the conclusion that we do *not* in fact need
> the large
> mirror network to handle the bandwidth load. In fact, most of those
> sites probably use up more bandwidth syncing than they save.

Yes. I posted comments to this effect before Christmas. During the fun
yesterday morning, I also noticed that rsync connections were taking
significant amounts of CPU - in fact, 4 concurrent ones were taking
around 40% CPU between them on svr4 for at least a few minutes. Disk IO
was almost certainly equally high. I cannot believe that the bandwidth
saved by 60-odd mirrors justifies the CPU, network and disk IO required
to rsync.

As an example, I run www.uk.postgresql.org. On the 10th Jan, a date
picked pretty much at random, I logged 2448 http requests. Each hit on
the homepage results in about 30(!) httpd requests, so that represents
as few as 82 hits!

Yesterday, release day, I only logged 2387 hits!!

> My suggestion for this is to limit the number of mirrors to around 5,
> give or take a few. But instead, put higher demands on these mirrors
> than we do now. Demand they sync every 30 minutes (or 60, but you get
> the point). Demand that they have a fast machine and a fast network
> connection. There have been enough offers of servers and networks that
> this should not be a major problem. Demand that they respond to
> www.postgresql.org - if it can have a dedicated IP, even better.
> Distributed across the world of course.

Yes - we are already planning to do this, and indeed some of the work
has been done. The mirror tracker checks whether or not a mirror will
respond to www.postgresql.org requests, and the backend database has a
flag to mark the 'primary' mirrors.

>
> Then do some "DNS magic" to do the load balancing:

<snip DNS Magic>

Yes, the current mirror tracker could easily be adapted to do this.

>
> A similar solution for wwwmaster, of course.
>

The major problem with wwwmaster is that we need multimaster replication
to handle it properly, without having a single point of failure. Slony 1
will not resolve that basic issue.

> wwwmaster
> ---------
> If you hit the ftp browser (or a download link), and then
> click anything
> in the menu, you get the whole site served from wwwmaster. If
> the above
> is fixed, so mirrors are all referred to as www.postgresql.org, it
> should be as simple as sticking a <base href> in there or
> something. But
> until then, perhaps some creative coding in the framework can
> fix it so
> links that are hit on wwwmaster point back to www whereas the static
> site uses relative links only?

Yes, I need to think about this. At the moment, the flags on the mirror
pages have been hardcoded back to www, but a better solution is needed.

> Wow. That was a lot longer than initially intended. Hope
> someone has the
> patience to read it all ;-)

I did :-)

/D

Re: Thoughts on the mirroring system etc

From
"Magnus Hagander"
Date:
>> And if I'm stepping on someone's toes here, let me apologize
>> in advance.
>
>Only really mine - and I knew you were writing this :-)

:-)
Well, feel free to tell me to stop ;-)


>As an example, I run www.uk.postgresql.org. On the 10th Jan, a date
>picked pretty much at random, I logged 2448 http requests. Each hit on
>the homepage results in about 30(!) httpd requests, so that represents
>as few as 82 hits!
>
>Yesterday, release day, I only logged 2387 hits!!

Which proves my point (and yours). Thanks :-)



>> My suggestion for this is to limit the number of mirrors to around 5,
>> give or take a few. But instead, put higher demands on these mirrors
>> than we do now. Demand they sync every 30 minutes (or 60, but you get
>> the point). Demand that they have a fast machine and a fast network
>> connection. There have been enough offers of servers and
>networks that
>> this should not be a major problem. Demand that they respond to
>> www.postgresql.org - if it can have a dedicated IP, even better.
>> Distributed across the world of course.
>
>Yes - we are already planning to do this, and indeed some of the work
>has been done. The mirror tracker checks whether or not a mirror will
>respond to www.postgresql.org requests, and the backend database has a
>flag to mark the 'primary' mirrors.

Ok. All good. Though I believe some of this is unnecessary - if you
operate from the standpoint that *all* approved mirrors need to answer
requests for www.postgresql.org.


>> Then do some "DNS magic" to do the load balancing:
>
><snip DNS Magic>
>
>Yes, the current mirror tracker could easily be adapted to do this.

Right. And IMHO, it should be moved off wwwmaster and onto a separate
system - it has different needs, and separation of critical services is
good.


>> A similar solution for wwwmaster, of course.
>>
>
>The major problem with wwwmaster is that we need multimaster
>replication
>to handle it properly, without having a single point of
>failure. Slony 1
>will not resolve that basic issue.

No, I believe you can solve this. Let's assume we don't care if we can't
add/remove news and events. AFAIK, the database is then almost only
INSERTs, right - answers to surveys, redirect logging etc?
For this, create two tables, say "log1" and "log2", where each server
owns one table and only writes to that one. You set up two Slony
replication sets, one in each direction. Then you create a view that is
a UNION ALL of the two; that's the one used when you read from the
table.
Simplified, but most of the time you can spot fairly easy ways to do
this in the application.
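
In concrete terms, something like this sketch, here for the redirect
log. The table and column names and the connection string are made up,
and psycopg2 just stands in for whatever the framework uses; note the
disjoint sequences so ids stay unique across the union:

#!/usr/bin/env python3
# Two-table fake-multimaster scheme. The same DDL runs on both nodes;
# Slony then replicates log1 from A to B, and log2 from B to A.
import psycopg2

DDL = """
-- Odd ids on server A, even ids on server B.
CREATE SEQUENCE log1_id_seq INCREMENT BY 2 START WITH 1;
CREATE SEQUENCE log2_id_seq INCREMENT BY 2 START WITH 2;

CREATE TABLE log1 (   -- written only by server A
    id int PRIMARY KEY DEFAULT nextval('log1_id_seq'),
    logtime timestamptz NOT NULL DEFAULT now(),
    url text NOT NULL
);
CREATE TABLE log2 (   -- written only by server B
    id int PRIMARY KEY DEFAULT nextval('log2_id_seq'),
    logtime timestamptz NOT NULL DEFAULT now(),
    url text NOT NULL
);

-- Readers see one combined log no matter which node they hit.
CREATE VIEW redirect_log AS
    SELECT * FROM log1 UNION ALL SELECT * FROM log2;
"""

conn = psycopg2.connect("dbname=pgweb")
cur = conn.cursor()
cur.execute(DDL)

# Server A's web code inserts into its own table only:
cur.execute("INSERT INTO log1 (url) VALUES (%s)", ("/download/",))
conn.commit()

Reads go through redirect_log, and each node's inserts flow to the
other with plain one-way Slony sets.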

Getting rid of as many single points of failure as possible should be an
overall goal.


>> wwwmaster
>> ---------
>> If you hit the ftp browser (or a download link), and then
>> click anything
>> in the menu, you get the whole site served from wwwmaster. If
>> the above
>> is fixed, so mirrors are all referred to as www.postgresql.org, it
>> should be as simple as sticking a <base href> in there or
>> something. But
>> until then, perhaps some creative coding in the framework can
>> fix it so
>> links that are hit on wwwmaster point back to www whereas the static
>> site uses relative links only?
>
>Yes, I need to think about this. At the moment, the flags on the mirror
>pages have been hardcoded back to www, but a better solution is needed.

If the decision is made to require that all mirrors "we care about"
answer to www.postgresql.org, then the problem goes away.


>> Wow. That was a lot longer than initially intended. Hope
>> someone has the
>> patience to read it all ;-)
>
>I did :-)

Yay!

//Magnus

Re: Thoughts on the mirroring system etc

From
Robert Treat
Date:
On Thursday 20 January 2005 07:47, Dave Page wrote:
> The major problem with wwwmaster is that we need multimaster replication
> to handle it properly, without having a single point of failure. Slony 1
> will not resolve that basic issue.
>

I wonder if daffodil replicator is worth looking into.  I don't think it is a
true multi-master system, but it claims to do bi-directional replicating and
data-synchronization between databases, so it might do what we need.  I'm
sure if we told them we were thinking of using it for the pg website they
would be willing to help.

As an aside, I'm not convinced that we have to go the db replication route.
AFAIK the php website doesn't use any db replication, instead relying on one
central server for submitting comments and some apache/php magic to do
automagic mirror redirection.  I don't see any reason we can't use that
scheme.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

Re: Thoughts on the mirroring system etc

From
"Gavin M. Roy"
Date:
As I mentioned before, I'd be happy to donate an apache module
(commercial) that, when combined with mod_rewrite, would redirect
www.postgresql.org to www.<country_code>.postgresql.org.  It takes very
little overhead and would be very effective for the mirroring.

Gavin

Robert Treat wrote:

>On Thursday 20 January 2005 07:47, Dave Page wrote:
>
>
>>The major problem with wwwmaster is that we need multimaster replication
>>to handle it properly, without having a single point of failure. Slony 1
>>will not resolve that basic issue.
>>
>>
>>
>
>I wonder if daffodil replicator is worth looking into.  I don't think it is a
>true multi-master system, but it claims to do bi-directional replicating and
>data-synchronization between databases, so it might do what we need.  I'm
>sure if we told them we were thinking of using it for the pg website they
>would be willing to help.
>
>As an aside, I'm not convinced that we have to go the db replication route.
>AFAIK the php website doesn't use any db replication, instead relying on one
>central server for submitting comments and some apache/php magic to do
>automagic mirror redirection.  I don't see any reason we can't use that
>scheme.
>
>
>


Re: Thoughts on the mirroring system etc

From
"Magnus Hagander"
Date:
>As I mentioned before, I'd be happy to donate an apache module
>(commercial) that, when combined with mod_rewrite, would redirect
>www.postgresql.org to www.<country_code>.postgresql.org.  It takes very
>little overhead and would be very effective for the mirroring.

Doesn't the main problem remain? If the server running this module goes
down, or its network connection has problems, then the whole site is
down, no? So you'd still need multiple machines running
www.postgresql.org, and some way of dealing with them when they go
down?

//Magnus

News Events do NOT mirror

From
Josh Berkus
Date:
People:

Go to:  www3.us.postgresql.org
Click on "PostgreSQL 8.0.0 Released"
Get:
The requested URL /about/news.277 was not found on this server.

This issue seems to be consistent across mirrors.


--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: Thoughts on the mirroring system etc

From
"Magnus Hagander"
Date:
>As an aside, I'm not convinced that we have to go the db
>replication route.
>AFAIK the php website doesn't use any db replication, instead
>relying on one
>central server for submitting comments and some apache/php magic to do
>automagic mirror redirection.  I don't see any reason we can't
>use that
>scheme.

Depends on how much you want in the database. If that machine goes down
or off the net, everything relying on the db will go down. Right now,
that includes the ftp downloads, because they go through the download
counter. It also includes news and events, survey responses etc, which
are much less of a loss if they go down.

Same for redirection - if the redundancy is based on redirection only,
everything falls apart if the redirector goes down. It can help get the
bandwidth down, but I just don't see that as the main problem.

//Magnus

Re: Thoughts on the mirroring system etc

From
Justin Clift
Date:
Magnus Hagander wrote:
> Dave Page wrote:
<snip>
>>The major problem with wwwmaster is that we need multimaster
>>replication to handle it properly, without having a single point of
>>failure. Slony 1 will not resolve that basic issue.
>
> No, I believe you can solve this. Let's assume we don't care if we can't
> add/remove news and events. AFAIK, the database is then almost only
> INSERTs, right - answers to surveys, redirect logging etc?
> For this, create two tables, say "log1" and "log2", where each server
> owns one table and only writes to that one. You set up two Slony
> replication sets, one in each direction. Then you create a view that is
> a UNION ALL of the two; that's the one used when you read from the
> table.

This would work.

It would also allow for transient individual server failure(s) and keep
the "web presence" going strongly too.  Working with a small number of
independent servers (i.e. 5 or 6), this should also scale to what we
need.

Good thinking Magnus.  :)

Regards and best wishes,

Justin Clift

--
"One who sees the invisible can do the impossible."
  + Frank Gaines

Re: Thoughts on the mirroring system etc

From
"Dave Page"
Date:

> -----Original Message-----
> From: Magnus Hagander [mailto:mha@sollentuna.net]
> Sent: 20 January 2005 18:00
> To: Dave Page; pgsql-www@postgresql.org
> Subject: RE: [pgsql-www] Thoughts on the mirroring system etc
>
> >As an example, I run www.uk.postgresql.org. On the 10th Jan, a date
> >picked pretty much at random, I logged 2448 http requests. Each hit on
> >the homepage results in about 30(!) httpd requests, so that represents
> >as few as 82 hits!
> >
> >Yesterday, release day, I only logged 2387 hits!!
>
> Which proves my point (and yours). Thanks :-)

Actually, looking at my terminology there - I meant requests, not hits.
Jan 10 was busier than the 19th. Which proves the point even more.

> >Yes - we are already planning to do this, and indeed some of the work
> >has been done. The mirror tracker checks whether or not a mirror will
> >respond to www.postgresql.org requests, and the backend
> database has a
> >flag to mark the 'primary' mirrors.
>
> Ok. All good. Though I believe some of this is unnecessary - if you
> operate from the standpoint that *all* approved mirrors need to answer
> requests for www.postgresql.org.

Yeah, but it was written with slightly different aims in mind.

>
> >> Then do some "DNS magic" to do the load balancing:
> >
> ><snip DNS Magic>
> >
> >Yes, the current mirror tracker could easily be adapted to do this.
>
> Right. And IMHO, it should be moved off wwwmaster and onto a separate
> system - it has different needs, and separation of critical
> services is
> good.

Absolutely.

> >> A similar solution for wwwmaster, of course.
> >>
> >
> >The major problem with wwwmaster is that we need multimaster
> >replication
> >to handle it properly, without having a single point of
> >failure. Slony 1
> >will not resolve that basic issue.
>
> No, I believe you can solve this. Let's assume we don't care if we
> can't add/remove news and events. AFAIK, the database is then almost
> only INSERTs, right - answers to surveys, redirect logging etc?
> For this, create two tables, say "log1" and "log2", where each server
> owns one table and only writes to that one. You set up two Slony
> replication sets, one in each direction. Then you create a view that
> is a UNION ALL of the two; that's the one used when you read from the
> table.
> Simplified, but most of the time you can spot fairly easy ways to do
> this in the application.

Yeah, that'd work. Then we just have the news, events and docs etc. to
worry about.

So if we ran 2 wwwmasters and, say, four static primary servers, I guess
we would basically split them into 2 sets, so that a pair of front ends
and one backend work together?

Regards, Dave.

Re: Thoughts on the mirroring system etc

From
"Dave Page"
Date:

> -----Original Message-----
> From: Robert Treat [mailto:xzilla@users.sourceforge.net]
> Sent: 20 January 2005 18:24
> To: Dave Page
> Cc: Magnus Hagander; pgsql-www@postgresql.org
> Subject: Re: [pgsql-www] Thoughts on the mirroring system etc
>
> As an aside, I'm not convinced that we have to go the db
> replication route.
> AFAIK the php website doesn't use any db replication, instead
> relying on one
> central server for submitting comments and some apache/php
> magic to do
> automagic mirror redirection.  I don't see any reason we
> can't use that
> scheme.

Hmm, I have to disagree there - one of their UK mirrors is occasionally
flakey, and I end up tearing my hair out trying to get to the docs. We
used to have a similar situation with the pg website years ago - I'd
like to avoid that happening again.

Daffodil sounds good though - I will try to look at that.

Regards, Dave

Re: Thoughts on the mirroring system etc

From
"Magnus Hagander"
Date:
> > >Yes - we are already planning to do this, and indeed some
> of the work
> > >has been done. The mirror tracker checks whether or not a
> mirror will
> > >respond to www.postgresql.org requests, and the backend
> > database has a
> > >flag to mark the 'primary' mirrors.
> >
> > Ok. All good. Though I believe some of this is unnecessary - if you
> > operate from the standpoint that *all* approved mirrors
> need to answer
> > requests for www.postgresql.org.
>
> Yeah, but it was written with slightly different aims in mind.

Oh, certainly, I wasn't implying anything else.


> > >> A similar solution for wwwmaster, of course.
> > >>
> > >
> > >The major problem with wwwmaster is that we need multimaster
> > >replication to handle it properly, without having a single
> point of
> > >failure. Slony 1 will not resolve that basic issue.
> >
> > No, I believe you can solve this. Let's assume we don't care if we
> > can't add/remove news and events. AFAIK, the database is then almost
> > only INSERTs, right - answers to surveys, redirect logging etc?
> > For this, create two tables, say "log1" and "log2", where each
> > server owns one table and only writes to that one. You set up two
> > Slony replication sets, one in each direction. Then you create a
> > view that is a UNION ALL of the two; that's the one used when you
> > read from the table.
> > Simplified, but most of the time you can spot fairly easy ways to
> > do this in the application.
>
> Yeah, that'd work. Then we just have the news, events and
> docs etc. to worry about.
>
> So if we ran 2 wwwmasters, and say, four static primary
> servers, I guess we would basically split them into 2 sets,
> so a pair of front ends and one backend worked together?

No, I'd go for keeping them all independent. E.g. 2 independent dynamic
and 4 independent static. KISS principle, again. And it makes it very
easy to add more static boxes if need be. Dynamic boxes are a bit more
work, since they need to get db tables set up etc.
//Magnus

Re: Thoughts on the mirroring system etc

From
"Dave Page"
Date:

> -----Original Message-----
> From: Magnus Hagander [mailto:mha@sollentuna.net]
> Sent: 21 January 2005 08:24
> To: Dave Page; pgsql-www@postgresql.org
> Subject: RE: [pgsql-www] Thoughts on the mirroring system etc
>
> No, I'd go for keeping them all independent. E.g. 2
> independent dynamic
> and 4 independent static. KISS principle, again. And it makes it very
> easy to add more static boxes if need be. Dynamic boxes are a bit more
> work, since they need to get db tables set up etc.

Well they can't all be independent - the statics have to mirror from
somewhere.

/D

Re: Thoughts on the mirroring system etc

From
Tom Lane
Date:
"Dave Page" <dpage@vale-housing.co.uk> writes:
> The major problem with wwwmaster is that we need multimaster replication
> to handle it properly, without having a single point of failure. Slony 1
> will not resolve that basic issue.

This is a bogus conclusion, and the later-proposed solution involving a
UNION view is just silly.

Supposing that you replicate the database to one or more other machines
via Slony-I, you have a defense against complete loss of wwwmaster,
namely you can just (manually) decree that one of the other copies is
now the master.  So that solves one of the problems posed.  The other
problem this poses is getting the web server machines to hit a working
copy of the database when they need to serve up dynamic content.  That
problem has zero to do with your replication technology.  I think a
DNS-based solution similar to Magnus' proposal would work fine.

Multimaster replication is only important if you need reliable 24x7
updating of the database, which as far as I understand isn't needed for
this one.  So a single master (at a time) ought to work fine, and that
can be handled just fine with Slony-I.

Having just returned from Afilias' mini conference about design of
Slony-II, I can tell you that multimaster replication isn't right
around the corner ;-).  It's gonna take some work.

            regards, tom lane

Re: Thoughts on the mirroring system etc

From
"Magnus Hagander"
Date:
>> The major problem with wwwmaster is that we need multimaster
>replication
>> to handle it properly, without having a single point of
>failure. Slony 1
>> will not resolve that basic issue.
>
>This is a bogus conclusion, and the later-proposed solution involving a
>UNION view is just silly.
>
>Supposing that you replicate the database to one or more other machines
>via Slony-I, you have a defense against complete loss of wwwmaster,
>namely you can just (manually) decree that one of the other copies is
>now the master.  So that solves one of the problems posed.  The other
>problem this poses is getting the web server machines to hit a working
>copy of the database when they need to serve up dynamic content.  That
>problem has zero to do with your replication technology.  I think a
>DNS-based solution similar to Magnus' proposal would work fine.

Right, as long as you accept that you cannot take any writes pending
the manual changing of who is the master. It also permits no
load-balancing between the servers, since only one can accept writes.
Unless the "promote to master" step can be automated safely, we're
looking at quite noticeable downtime if this happens. (And it's not all
that simple to do, I believe. Consider lost connectivity between the two
machines, and you may end up with two machines that believe they are the
master, unless you have some fairly advanced checks.)

I believe that, using the bidirectional replication and UNIONs, this can
be solved in a very simple way. It does permit two (well, more as well,
but the complexity increases fast) redundant servers with full
functionality. And setting up Slony-I in two directions can't be *that*
much more complicated than setting it up in one, no?

Finally, the dynamic web server *must* have a local database, IMHO. You
don't want to be crossing the Atlantic, for example, between the
webserver and the database server - that will kill performance in a
heartbeat.


>Multimaster replication is only important if you need reliable 24x7
>updating of the database, which as far as I understand isn't needed for
>this one.  So a single master (at a time) ought to work fine, and that
>can be handled just fine with Slony-I.

As the site is designed now, we need reliable 24x7 read and INSERT, but
not UPDATE. But db INSERTs are done on every download of a file, on
interactive docs and on surveys (I may have missed something there; the
point being there are >0).


>Having just returned from Afilias' mini conference about design of
>Slony-II, I can tell you that multimaster replication isn't right
>around the corner ;-).  It's gonna take some work.

Certainly :-)
I still think the UNION way, with different tables replicating in
different directions, can be a decent fake version of MM replication
under certain restricted conditions.


//Magnus

Re: Thoughts on the mirroring system etc

From
"Dave Page"
Date:

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: 22 January 2005 19:15
> To: Dave Page
> Cc: Magnus Hagander; pgsql-www@postgresql.org
> Subject: Re: [pgsql-www] Thoughts on the mirroring system etc
>
> "Dave Page" <dpage@vale-housing.co.uk> writes:
> > The major problem with wwwmaster is that we need
> multimaster replication
> > to handle it properly, without having a single point of
> failure. Slony 1
> > will not resolve that basic issue.
>
> This is a bogus conclusion, and the later-proposed solution
> involving a
> UNION view is just silly.
>
> Supposing that you replicate the database to one or more
> other machines
> via Slony-I, you have a defense against complete loss of wwwmaster,
> namely you can just (manually) decree that one of the other copies is
> now the master.  So that solves one of the problems posed.

Assuming there is someone available who is able to do that. We are
looking for a robust solution that requires no manual intervention to
keep the site running in the event of a failure.

>  The other
> problem this poses is getting the web server machines to hit a working
> copy of the database when they need to serve up dynamic content.  That
> problem has zero to do with your replication technology.  I think a
> DNS-based solution similar to Magnus' proposal would work fine.

The webservers and the database machines are the same boxes. The current
implementation has a fully dynamic version of the website running on
wwwmaster, which also builds a static HTML version of the site. This is
currently rsync'd to three distributed front end servers which link back
to wwwmaster only on pages that cannot be statically generated.

> Multimaster replication is only important if you need reliable 24x7
> updating of the database, which as far as I understand isn't
> needed for
> this one.  So a single master (at a time) ought to work fine, and that
> can be handled just fine with Slony-I.

A single master will work fine assuming that a replacement can be
swapped in quickly somehow. We then of course have the fun of reversing
the master/slave relationship(s) when the original master comes back up,
and making sure that both (or more) servers are properly in sync. Also,
having a single master doesn't help at all with load distribution of
course.

The beauty of Magnus' union idea is that any small number of masters is
possible, and it would be easy to support and pretty much
self-maintaining - the existing mirror tracker would do everything that
is required if run regularly enough; with a small DNS TTL, that gives a
system that can patch up a failure after only a few minutes.

> Having just returned from Afilias' mini conference about design of
> Slony-II, I can tell you that multimaster replication isn't right
> around the corner ;-).  It's gonna take some work.

Yeah, I guessed as much :-)

Regards, Dave.