Thread: PostgreSQL clustering (shared disk)
Hello list,
I have a mission to implement a two-node active-passive PostgreSQL cluster. The databases in the cluster are rather large (hundreds of GBs), which prompts me to consider a shared-disk environment. I know this is not natively supported by PostgreSQL, but I have been investigating the Red Hat Cluster Suite with GFS. The idea is that the cluster software together with GFS (and HP iLO) would make sure that only one postmaster at a time can access the shared disk, and if the active node fails, the cluster software would shift the services to the previously passive node. What I'm pondering is whether the cluster can keep the postmasters synchronized at all times so that the database won't get corrupted.
Is there anyone on the list who has seen such a configuration, or, even better, implemented it themselves? I found a small document by Devrim Gunduz describing this scenario, but it was rather scant on details.
If shared disk is definitely out of the question, the fallback plan would be to use DRBD and Linux-HA.
Regards
MP
On 16.08.2007 08:42, Mikko Partio wrote:
> I have a mission to implement a two-node active-passive PostgreSQL
> cluster. The databases in the cluster are rather large (hundreds of GBs),
> which prompts me to consider a shared-disk environment. I know this is
> not natively supported by PostgreSQL, but I have been investigating the
> Red Hat Cluster Suite with GFS. The idea is that the cluster software
> together with GFS (and HP iLO) would make sure that only one postmaster
> at a time can access the shared disk, and if the active node fails, the
> cluster software would shift the services to the previously passive node.
> What I'm pondering is whether the cluster can keep the postmasters
> synchronized at all times so that the database won't get corrupted.
>
> Is there anyone on the list who has seen such a configuration, or, even
> better, implemented it themselves? I found a small document by Devrim
> Gunduz describing this scenario, but it was rather scant on details.
>
> If shared disk is definitely out of the question, the fallback plan would
> be to use DRBD and Linux-HA.

The usual setup is DRBD + Heartbeat, which is fast, simple and proven. Using a shared disk / SAN has disadvantages, such as a single point of failure, (usually) non-native fencing and (usually) much higher latency. DRBD handles a lot of things by itself that you would have to take care of yourself with a plain shared device.

Using a cluster file system such as GFS2, OCFS2 or Lustre is a waste of resources, as you can't have active/active with PostgreSQL anyway.

--
Regards,
Hannes Dorbath
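(For illustration: the DRBD + Heartbeat combination described above is usually driven by a DRBD resource definition plus a Heartbeat v1 haresources line. The sketch below is only an outline; the host names, devices, addresses, mount point and init script name are placeholders, not a tested configuration.)

    # /etc/drbd.conf (sketch) -- one DRBD resource mirrored between the two nodes
    resource r0 {
      protocol C;                    # synchronous: a write completes only once it is on both disks
      on node1 {
        device    /dev/drbd0;
        disk      /dev/sda5;         # underlying partition that holds $PGDATA
        address   10.0.0.1:7788;
        meta-disk internal;
      }
      on node2 {
        device    /dev/drbd0;
        disk      /dev/sda5;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }

    # /etc/ha.d/haresources (sketch) -- Heartbeat v1 style: the active node promotes the
    # DRBD resource, mounts it, takes over the service IP and starts PostgreSQL
    node1 drbddisk::r0 Filesystem::/dev/drbd0::/var/lib/pgsql::ext3 IPaddr::192.168.0.10 postgresql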
Hi,

On Thu, 2007-08-16 at 09:42 +0300, Mikko Partio wrote:
> The idea is that the cluster software together with GFS (and HP iLO)
> would make sure that only one postmaster at a time can access the shared
> disk, and if the active node fails, the cluster software would shift the
> services to the previously passive node.

AFAIK, it is the fence device that will prevent postmaster access from the failed node. RHCS will just switch the servers.

> What I'm pondering is whether the cluster can keep the postmasters
> synchronized at all times so that the database won't get corrupted.

Keep all of $PGDATA on the shared disk. That would minimize data loss. (Of course, there is still a risk of data loss -- the postmasters are not aware of each other and they don't share each other's buffers, etc.)

Regards,
--
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
Hi,

On Thu, 2007-08-16 at 10:05 +0300, Devrim GÜNDÜZ wrote:
> (Of course, there is still a risk of data loss -- the postmasters are
> not aware of each other and they don't share each other's buffers, etc.)

Err... I was talking about uncommitted transactions, and of course that does not mean data loss. (Thanks to Magnus for the reminder.)

Regards,
--
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
Devrim GÜNDÜZ <devrim@CommandPrompt.com> writes:
>> What I'm pondering is whether the cluster can keep the postmasters
>> synchronized at all times so that the database won't get corrupted.
>
> Keep all of $PGDATA on the shared disk. That would minimize data loss.
> (Of course, there is still a risk of data loss -- the postmasters are
> not aware of each other and they don't share each other's buffers, etc.)

It would be much better to have the cluster software only run one postmaster at a time, starting up the secondary if the primary fails. That's the usual practice with shared storage.

-Doug
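(For illustration: the shared-storage practice Doug describes -- exactly one postmaster, started on whichever node currently owns the storage, with fencing keeping the other node away -- roughly corresponds to an RHCS cluster.conf along the following lines. Node names, iLO hostnames and credentials, the SAN device and the service IP are placeholders, not a tested configuration; note that $PGDATA sits on a plain ext3 volume mounted on one node at a time, so no GFS is involved.)

    <?xml version="1.0"?>
    <cluster name="pgcluster" config_version="1">
      <clusternodes>
        <clusternode name="node1" nodeid="1">
          <fence><method name="1"><device name="ilo-node1"/></method></fence>
        </clusternode>
        <clusternode name="node2" nodeid="2">
          <fence><method name="1"><device name="ilo-node2"/></method></fence>
        </clusternode>
      </clusternodes>
      <fencedevices>
        <fencedevice agent="fence_ilo" name="ilo-node1" hostname="ilo-node1.example.com"
                     login="admin" passwd="secret"/>
        <fencedevice agent="fence_ilo" name="ilo-node2" hostname="ilo-node2.example.com"
                     login="admin" passwd="secret"/>
      </fencedevices>
      <rm>
        <!-- rgmanager runs this service on exactly one node and relocates it on failure -->
        <service name="postgresql" autostart="1" recovery="relocate">
          <fs name="pgdata" device="/dev/mapper/san-pgdata" mountpoint="/var/lib/pgsql" fstype="ext3"/>
          <ip address="192.168.0.10"/>
          <script name="postgresql" file="/etc/init.d/postgresql"/>
        </service>
      </rm>
    </cluster>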
On 8/16/07, Douglas McNaught <doug@mcnaught.org> wrote:
> It would be much better to have the cluster software only run one
> postmaster at a time, starting up the secondary if the primary fails.
> That's the usual practice with shared storage.

This was my original intention. I'm still quite hesitant to trust the fencing device's ability to guarantee that only one postmaster at a time is running, because of the disastrous possibility of corrupting the whole database.

Maybe I'm just better off using the simpler (crude?) method of DRBD + Heartbeat?
Regards
MP
On 17.08.2007 11:12, Mikko Partio wrote:
> Maybe I'm just better off using the simpler (crude?) method of DRBD +
> Heartbeat?

Crude? Use what you like to use, but you should keep one thing in mind: if you don't know the software you are running in each and every detail, and how it behaves in each and every situation you can think of, it's a bad idea to use it in an HA setup.

You don't want to be one of those admins who configured something in a few days, moved production stuff onto it, and then failed to recover from a split-brain situation. Setting up an HA environment is something you do in months, not days, at least if you want to do it right. There is so much that can go wrong, and so much to learn. Keep it simple.

--
Regards,
Hannes Dorbath
On 8/17/07, Hannes Dorbath <light@theendofthetunnel.de> wrote:
> Setting up an HA environment is something you do in months, not days, at
> least if you want to do it right. There is so much that can go wrong, and
> so much to learn. Keep it simple.

Exactly my thoughts, as I have some experience with DRBD and I know it works. My point was that since I have access to a SAN environment, shared storage would be a more "elegant" solution, but as you pointed out, it's probably better to stick with the method that feels most comfortable.
Thanks for your thoughts.
Regards
MP
"Mikko Partio" <mpartio@gmail.com> writes: > This was my original intention. I'm still quite hesitant to trust the > fencing devices ability to quarantee that only one postmaster at a time is > running, because of the disastrous possibility of corrupting the whole > database. Making that guarantee is a fencing device's only excuse for existence. So I think you should trust that a properly-implemented fence will do what it's claimed to do. On the other side of the coin, I have little confidence in DRBD providing the storage semantics we need (in particular guaranteeing write ordering). So that path doesn't sound exactly risk-free either. regards, tom lane
On 17.08.2007 15:59, Tom Lane wrote:
> On the other side of the coin, I have little confidence in DRBD
> providing the storage semantics we need (in particular guaranteeing
> write ordering). So that path doesn't sound exactly risk-free either.

To my understanding, DRBD provides this. I think a discussion about that with the DRBD developers would be very useful for the many users searching for a solution to replicate PostgreSQL, so I'm cross-posting this to the DRBD list. Maybe you can make clear in detail what requirements PostgreSQL has.

--
Regards,
Hannes Dorbath
Hi,

> On the other side of the coin, I have little confidence in DRBD
> providing the storage semantics we need (in particular guaranteeing
> write ordering). So that path doesn't sound exactly risk-free either.

DRBD seems to enforce strict write ordering on both sides of the link, according to the docs. I didn't look at the code, but my plug-pulling tests on a busy PostgreSQL server didn't cause any problems. No conclusive evidence, but useful at least in my use case.

(And yes: I take pg_dumps often, just in case.)

- Sander
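(A minimal sketch of the kind of routine dump Sander mentions, assuming a crontab entry for the postgres user; the database name, schedule and backup path are placeholders.)

    # nightly custom-format dump of "mydb" at 02:00; % must be escaped in crontab entries
    0 2 * * *  pg_dump -Fc mydb > /var/backups/mydb-$(date +\%Y\%m\%d).dump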
On Fri, Aug 17, 2007 at 04:19:57PM +0200, Hannes Dorbath wrote:
> On 17.08.2007 15:59, Tom Lane wrote:
> > On the other side of the coin, I have little confidence in DRBD
> > providing the storage semantics we need (in particular guaranteeing
> > write ordering). So that path doesn't sound exactly risk-free either.
>
> To my understanding, DRBD provides this. I think a discussion about that
> with the DRBD developers would be very useful for the many users
> searching for a solution to replicate PostgreSQL, so I'm cross-posting
> this to the DRBD list. Maybe you can make clear in detail what
> requirements PostgreSQL has.

It does, AFAIK, if you configure it properly. I think it's the "protocol" parameter you need to set to C, which is the slowest, but it's the only one that waits for the block to hit *both* disks.

//Magnus
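(For reference, the "protocol" parameter Magnus mentions lives in the DRBD resource definition; a minimal fragment, assuming DRBD 8.x configuration syntax.)

    resource r0 {
      protocol C;   # C: a write is complete only once it has reached the disk on both nodes
                    # (A: local disk + local TCP send buffer; B: local disk + remote buffer cache)
      # per-host "on" sections, disk and net options as usual
    }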
On Fri, Aug 17, 2007 at 09:59:26AM -0400, Tom Lane wrote:
> "Mikko Partio" <mpartio@gmail.com> writes:
> > This was my original intention. I'm still quite hesitant to trust the
> > fencing device's ability to guarantee that only one postmaster at a
> > time is running, because of the disastrous possibility of corrupting
> > the whole database.
>
> Making that guarantee is a fencing device's only excuse for existence.
> So I think you should trust that a properly-implemented fence will do
> what it's claimed to do.
>
> On the other side of the coin, I have little confidence in DRBD
> providing the storage semantics we need (in particular guaranteeing
> write ordering). So that path doesn't sound exactly risk-free either.
>
> regards, tom lane

Of course we guarantee write ordering.

We (LINBIT, the company behind DRBD, paying the DRBD developers) operate quite a few Postgres clusters in production on clusters "powered by Heartbeat and DRBD". There are many more that we do not operate directly ourselves. Just because we happen to have a partnership with MySQL does not mean we don't like Postgres very much indeed :)

To get an idea of what DRBD does for you, please, if you are interested, read some of http://www.drbd.org/publications.html, maybe starting with the LinuxConf 2007 PDF.

cheers,

--
: Lars Ellenberg                           Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH     Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :