Thread: PG on two nodes with shared disk (OCFS2 & DRBD)
Hi,
I have to build a load-balanced PG cluster and wanted to ask whether this configuration would work:
A DRBD disk in dual-primary mode with an OCFS2 filesystem.
Will there be any conflicts if we use the shared volume as the PGDATA directory?
R+W (read and write on both nodes) is a required feature for this cluster.
Thank you
Jasmin
On Sun, Feb 27, 2011 at 01:48:24PM +0100, Jasmin Dizdarevic wrote:
> A DRBD disk in dual-primary mode with an OCFS2 filesystem.
>
> Will there be any conflicts if we use the shared volume as the PGDATA
> directory?

Yes.

A

--
Andrew Sullivan
ajs@crankycanuck.ca
On 02/27/11 4:48 AM, Jasmin Dizdarevic wrote:
> I have to build a load-balanced PG cluster and wanted...

Master-master doesn't work well with databases, especially ones like
Postgres that are optimized for a high level of concurrency and
transactional integrity. On proper hardware, Postgres can run quite
high transactional volumes on a single server.

If high availability is a requirement, you can run a second server as
a standby slave using either Postgres streaming replication, DRBD-style
block replication, or another similar technique.

You -can- distribute read accesses between a master and a slave server
via tools like pgpool2, where all inserts, updates, DDL changes, etc.
are made on the master server, but reads go to either. Note that you
do NOT want to use block-level replication like DRBD for this: the
DRBD slave cannot be actively mounted, nor could the slave instance of
Postgres be aware of changes to the underlying storage. Instead, you
would use the streaming replication built into PostgreSQL 9.0.

Another approach is to use the master server for all OLTP-type
accesses and the hot standby server for more complex, long-running
OLAP queries for reporting, etc. In case of master failure, the slave
becomes the new master, and you shut down the presumably less
important OLAP operations until a new slave can be deployed.
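For concreteness, a minimal 9.0 streaming replication setup looks
roughly like this (the host addresses and the replication role are
made up for the example):

    # postgresql.conf on the master
    wal_level = hot_standby
    max_wal_senders = 3
    wal_keep_segments = 32    # keep some WAL around for a lagging standby

    # pg_hba.conf on the master -- allow the standby (example address)
    # to make a replication connection
    host  replication  replicator  192.168.0.2/32  md5

    # postgresql.conf on the standby (restored from a base backup)
    hot_standby = on

    # recovery.conf on the standby
    standby_mode = 'on'
    primary_conninfo = 'host=192.168.0.1 port=5432 user=replicator'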
On Sun, Feb 27, 2011 at 12:10:36PM -0800, John R Pierce wrote:
> are made on the master server, but reads go to either. Note that you
> do NOT want to use block-level replication like DRBD for this: the
> DRBD slave cannot be actively mounted, nor could the slave instance
> of Postgres be aware of changes to the underlying storage. Instead,
> you would use the streaming replication built into PostgreSQL 9.0.

Note that with DRBD, you can have a piece of hot standby hardware
sitting there to take over the filesystem in real time, in the event
the original master blows up or something. My experience with systems
designed like this is that they are a foot-bazooka: the only real
utility I ever saw in them was to increase on-call hours for sysadmins
after they blew off their own foot (and too often, my database) doing
something tricky with the standby server. If it were me setting it up,
I'd think the streaming replication approach a better bet. Not that
anything will save you when someone else has root and decides to play
with a production server.

I believe that Greenplum sells a system based on Postgres that is
supposed to do some kind of distributed cluster thing. I don't
understand the details and it's been a long time since I had any look
at it. I think it's intended to compete in the scalability rather than
the availability market. Maybe someone around here knows more.

The only people I'm aware of who really do this sort of thing for
availability are Oracle with RAC, and Oracle with some mostly-works
clustering stuff in MySQL. I have never met a happy customer of the
former, but I've heard some people tell me it's real impressive
technology when it's working. (The unhappy people seemed mostly
unhappy because, for that kind of coin, they would like it to work
most of the time. I know at least one metronet deployment that didn't
work even once for two years.) In the case of the MySQL stuff, there
are some trade-offs in the design that make my heart sink. But maybe
for the OP's application it will work.

A

--
Andrew Sullivan
ajs@crankycanuck.ca
Thank you for the detailed information about HA and LB. First of all,
it's a pity that there is no built-in feature for LB+HA (both at the
same time).
In my eyes, the pgpool2/3 solution has too many disadvantages and
restrictions.
My idea was the one John described: DML and DDL are done on the small
box and reporting on the "big mama", with streaming replication and
hot standby enabled. The only problem is that we use temp tables for
reporting purposes - a hot standby runs read-only, so temporary tables
cannot be created there. I hope the query-duration impact of not using
temp tables will be offset by running DML/DDL on the small box.
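(To illustrate the problem - the table and columns here are made up -
any session on the hot standby gets this:)

    CREATE TEMPORARY TABLE report_scratch (customer_id int, total numeric);
    ERROR:  cannot execute CREATE TABLE in a read-only transaction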
I think this will be the final configuration:
- DRBD in dual-primary mode (OCFS2) as the WAL archive location for the primary node
- streaming replication and hot standby
This is a good way to get real high availability when the primary node
goes down, but for now I'm going to deploy the described configuration
with manual failover.
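Roughly what I have in mind (device names, addresses, and paths are
made up):

    # /etc/drbd.d/r0.res -- dual-primary so both nodes can mount the
    # OCFS2 archive volume (DRBD 8.3-style syntax; example devices)
    resource r0 {
        protocol C;               # dual-primary requires protocol C
        net     { allow-two-primaries; }
        startup { become-primary-on both; }
        on node1 {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   10.0.0.1:7788;
            meta-disk internal;
        }
        on node2 {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   10.0.0.2:7788;
            meta-disk internal;
        }
    }

    # postgresql.conf on the primary -- archive WAL to the shared mount
    archive_mode = on
    archive_command = 'test ! -f /mnt/ocfs2/archive/%f && cp %p /mnt/ocfs2/archive/%f'

    # recovery.conf on the standby -- read archived WAL if the stream
    # falls behind
    restore_command = 'cp /mnt/ocfs2/archive/%f %p'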
Regards,
Jasmin
On Mon, Feb 28, 2011 at 12:13:32AM +0100, Jasmin Dizdarevic wrote:
> Thank you for the detailed information about HA and LB. First of all,
> it's a pity that there is no built-in feature for LB+HA (both at the
> same time).

I think it's a pity that I'm not paid a million dollars a year, too,
but barring magic I don't think it'll happen soon. Multi-master
transactional ACID-type databases are very hard.

A

--
Andrew Sullivan
ajs@crankycanuck.ca
On Mon, Feb 28, 2011 at 12:13:32AM +0100, Jasmin Dizdarevic wrote:
> My idea was the one John described: DML and DDL are done on the small
> box and reporting on the "big mama", with streaming replication and
> hot standby enabled. The only problem is that we use temp tables for
> reporting purposes. I hope the query-duration impact of not using
> temp tables will be offset by running DML/DDL on the small box.

By the way, despite my flip comment, it is entirely possible that what
you need would be better handled by one of the other replication
systems. Slony is actually well-suited to this sort of thing, despite
the overhead that it imposes (a rough sketch of a Slony setup follows
below). This is a matter of trade-offs, and you might want to think
about different roles for different boxes -- especially since hardware
is so cheap these days.

A

--
Andrew Sullivan
ajs@crankycanuck.ca
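For reference, a rough slonik sketch of that kind of Slony setup: one
replication set, origin on the small box, subscriber on the reporting
box. The cluster name, connection strings, and the replicated table
are all made up, and the syntax is along the lines of Slony-I 2.0:

    # define the cluster and admin connections (example conninfo)
    cluster name = reporting;
    node 1 admin conninfo = 'dbname=app host=smallbox user=slony';
    node 2 admin conninfo = 'dbname=app host=bigmama user=slony';

    # initialize the origin and register the subscriber
    init cluster (id = 1, comment = 'origin: small OLTP box');
    store node (id = 2, comment = 'subscriber: big reporting box',
                event node = 1);
    store path (server = 1, client = 2,
                conninfo = 'dbname=app host=smallbox user=slony');
    store path (server = 2, client = 1,
                conninfo = 'dbname=app host=bigmama user=slony');

    # a set with one example table, then subscribe the reporting box
    create set (id = 1, origin = 1, comment = 'tables for reporting');
    set add table (set id = 1, origin = 1, id = 1,
                   fully qualified name = 'public.sales');
    subscribe set (id = 1, provider = 1, receiver = 2, forward = no);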
On 02/27/11 4:07 PM, Andrew Sullivan wrote:
> Multi-master transactional ACID-type databases are very hard.

Indeed. Oracle RAC works by having a distributed cache and lock
manager replicating over a fast interconnect like InfiniBand. Oracle
fundamentally uses a transaction redo log rather than a write-ahead
log like Postgres; this is somewhat more amenable to distributed
processing when combined with the distributed cache and lock manager.
The trade-off is that it's a really complicated and fragile system,
with a high opportunity for catastrophic failure.

It seems to me like Oracle wants to sell turnkey database servers with
their Exadata stuff.
On Sun, Feb 27, 2011 at 7:17 PM, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
> On Mon, Feb 28, 2011 at 12:13:32AM +0100, Jasmin Dizdarevic wrote:
>> My idea was the one John described: DML and DDL are done on the small
>> box and reporting on the "big mama", with streaming replication and
>> hot standby enabled. The only problem is that we use temp tables for
>> reporting purposes. I hope the query-duration impact of not using
>> temp tables will be offset by running DML/DDL on the small box.
>
> By the way, despite my flip comment, it is entirely possible that what
> you need would be better handled by one of the other replication
> systems. Slony is actually well-suited to this sort of thing, despite
> the overhead that it imposes. This is a matter of trade-offs, and you
> might want to think about different roles for different boxes --
> especially since hardware is so cheap these days.

Yeah, it's possible one of the async master-master systems like
Bucardo or rubyrep would also fit his needs. There are options here,
just no full-on pony/unicorn/pegasus mix like everyone hopes for.

Oh, I guess if someone is looking to fund/help development of such a
thing, it might be worth pointing people to Postgres-XC
(http://wiki.postgresql.org/wiki/Postgres-XC). It's got a ways to go,
but they are at least trying.

Robert Treat
play: xzilla.net
work: omniti.com
hiring: l42.org/Lg
hehe...
Andrew, I appreciate PG and its free, open-source features - maybe I
chose the wrong wording.
In my eyes, such a feature is getting more important nowadays.
Postgres-R and -XC are interesting ideas.
Thanks, everybody, for the comments.
Regards,
Jasmin
On 03/03/11 09:01, Jasmin Dizdarevic wrote:
> hehe...
> Andrew, I appreciate PG and its free, open-source features - maybe I
> chose the wrong wording.
> In my eyes, such a feature is getting more important nowadays.

Why? Shared disk means a shared point of failure, and poor redundancy
against a variety of non-total failure conditions (data corruption,
etc). Add the synchronization costs to the mix, and I don't see the
appeal.

I think clustering _in general_ is becoming a big issue, but I don't
really see the appeal of shared-disk clustering personally.

--
Craig Ringer