Thread: Protecting against multiple instances per cluster

Protecting against multiple instances per cluster

From: Thom Brown

Hi all,

I've come across a PostgreSQL setup where there are 2 servers, each
with the same version of PostgreSQL installed, both mounting the same
SAN onto their respective file systems.  It was intended that only 1
of the servers would be running an instance of PostgreSQL at a time,
as they both point to the same pgdata.  This was dubbed a "high
availability" setup, where if one server went down, they could start
PostgreSQL on the other.  (Yes, I know what you're thinking.)  Now
normally there is protection against 2 instances running, but only
when the instances are on the same server, since the interlock relies
on shared memory.  In this case, neither server has access to the
other's shared memory, so each has to rely on the pid file.  But the
pid file isn't enough by itself.  In this setup, if someone were to
inadvertently start up a Postgres instance on the 2nd server whilst
the 1st was still running, it would do very bad things.

For example, when I set up the same scenario on my own network, it
indeed let me start up the 2nd instance.  I then tried setting up a
table and generating lots of data for it, then... KABOOM:

postgres=# create table things (id serial, things int);
NOTICE:  CREATE TABLE will create implicit sequence "things_id_seq"
for serial column "things.id"
CREATE TABLE
postgres=# insert into things (things) select x from
generate_series(1,500000) a(x);
LOG:  could not link file "pg_xlog/xlogtemp.28426" to
"pg_xlog/000000010000000000000002" (initialization of log file 0,
segment 2): Operation not supported
STATEMENT:  insert into things (things) select x from
generate_series(1,500000) a(x);
PANIC:  could not open file "pg_xlog/000000010000000000000002" (log
file 0, segment 2): No such file or directory
STATEMENT:  insert into things (things) select x from
generate_series(1,500000) a(x);
PANIC:  could not open file "pg_xlog/000000010000000000000002" (log
file 0, segment 2): No such file or directory
PANIC:  could not open file "pg_xlog/000000010000000000000002" (log
file 0, segment 2): No such file or directory
The connection to the server was lost. Attempting reset: LOG:  server
process (PID 28426) was terminated by signal 6: Abort trap
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
Failed.
!> LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2011-09-08 19:04:47 BST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  consistent recovery state reached at 0/1755268
LOG:  redo starts at 0/1755268
LOG:  could not open file "pg_xlog/000000010000000000000002" (log file
0, segment 2): No such file or directory
LOG:  redo done at 0/1FFFFA8
LOG:  last completed transaction was at log time 2011-09-08 19:05:14.098496+01
LOG:  could not link file "pg_xlog/xlogtemp.28429" to
"pg_xlog/000000010000000000000002" (initialization of log file 0,
segment 2): Operation not supported
PANIC:  could not open file "pg_xlog/000000010000000000000002" (log
file 0, segment 2): No such file or directory
LOG:  startup process (PID 28429) was terminated by signal 6: Abort trap
LOG:  aborting startup due to startup process failure

Now obviously no-one should ever set up their system in such a
fashion, but some have, as I've witnessed.  I suspect this is
potentially the cause of their continued database corruption and
outages, where expected WAL files aren't in the pg_xlog directory, so
recovery can't finish, clog files go missing, etc.

While I appreciate that this isn't a bug, and that no-one should
actually be setting things up in this way, it does introduce the
ability to hose one's own cluster without realising (until it's
possibly too late).

Would there be a way to prevent this abhorrent scenario from coming
into existence?  One idea is to have a configuration option to be
strict about the presence of a pid file in the data directory and
force manual intervention, but I'm not sure this would solve the
problem in most cases where it exists, as someone would have had to
specifically seek out the option and set it.  It might also encourage
some to just delete the pid file, thinking that would make the nasty
errors go away.
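
For illustration, here's roughly the behaviour I mean (just a sketch,
not actual PostgreSQL code, and the option name is made up):

/* Hypothetical "strict" pid file check: if the option is on and a
 * pid file exists at all, refuse to start and make a human confirm
 * and remove it, rather than guessing whether it's stale. */
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static bool strict_pidfile_check = true;    /* hypothetical setting */

int
main(void)
{
    if (strict_pidfile_check && access("postmaster.pid", F_OK) == 0)
    {
        fprintf(stderr,
                "postmaster.pid exists; another instance may be running,\n"
                "possibly on another host.  Verify no other server is\n"
                "using this data directory, then remove the file.\n");
        return 1;
    }
    return 0;
}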

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Protecting against multiple instances per cluster

From: Tom Lane

Thom Brown <thom@linux.com> writes:
> I've come across a PostgreSQL setup where there are 2 servers, each
> with the same version of PostgreSQL installed, both mounting the same
> SAN onto their respective file systems.  It was intended that only 1
> of the servers would be running an instance of PostgreSQL at a time,
> as they both point to the same pgdata.  This was dubbed a "high
> availability" setup, where if one server went down, they could start
> PostgreSQL on the other.  (Yes, I know what you're thinking.)

Multiply by about ten and you might have an idea what I'm thinking.

> Now normally there is protection against 2 instances running, but
> only when the instances are on the same server, since the interlock
> relies on shared memory.  In this case, neither server has access to
> the other's shared memory, so each has to rely on the pid file.  But
> the pid file isn't enough by itself.

The pid file is not merely "not enough", it's entirely useless, since
the two machines aren't sharing a process ID space.  If somebody starts
a postmaster on machine 2, it will almost certainly see the pid in the
pidfile as not running (on machine 2).  So you have no safety interlock
whatsoever in this configuration.
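
To illustrate, here's roughly the kind of liveness test involved (a
simplified sketch, not the actual postmaster code):

/* Read a PID out of the pid file and ask the *local* kernel whether
 * that process exists.  On a second host sharing the data directory
 * over a SAN, the postmaster holding the file is alive on the other
 * machine, so this check wrongly concludes the pid file is stale. */
#include <sys/types.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>

int
main(void)
{
    FILE *f = fopen("postmaster.pid", "r");
    long  pid;

    if (f == NULL)
        return 0;               /* no pid file: assume no other instance */

    if (fscanf(f, "%ld", &pid) != 1)
    {
        fclose(f);
        return 1;               /* unreadable pid file: refuse to start */
    }
    fclose(f);

    if (kill((pid_t) pid, 0) == 0 || errno == EPERM)
    {
        fprintf(stderr, "another postmaster (PID %ld) appears to be running\n",
                pid);
        return 1;
    }

    /* errno == ESRCH: no such process on *this* machine, so the pid
     * file looks stale and startup proceeds, possibly unsafely. */
    printf("pid file looks stale; proceeding\n");
    return 0;
}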

It is possible to build configurations of this type safely, but you need
some external dead-man-switch or STONITH arrangement that forcibly kills
machine 1 (or at least disconnects it from the SAN) before machine 2 is
allowed to start.  Postgres can't do it on its own, and as you've
undoubtedly already found out, human DBAs can't be trusted to get it
right either.
        regards, tom lane


Re: Protecting against multiple instances per cluster

From: Magnus Hagander

On Thu, Sep 8, 2011 at 20:40, Thom Brown <thom@linux.com> wrote:
> Hi all,
>
> I've come across a PostgreSQL setup where there are 2 servers, each
> with the same version of PostgreSQL installed, both mounting the same
> SAN onto their respective file systems.  It was intended that only 1
> of the servers would be running an instance of PostgreSQL at a time,
> as they both point to the same pgdata.  This was dubbed a "high

<snip>

> Would there be a way to prevent this abhorrent scenario from coming
> into existence?  One idea is to have a configuration option to be
> strict about the presence of a pid file in the data directory and
> force manual intervention, but I'm not sure this would solve the
> problem in most cases where it exists, as someone would have had to
> specifically seek out the option and set it.  It might also encourage
> some to just delete the pid file, thinking that would make the nasty
> errors go away.

There are plenty of clustering products out there that are really
designed for one thing primarily, and that's dealing with this kind of
fencing.  The proper solution is to use one of those.  There's no way
we can do what's required from inside PostgreSQL, and I see no reason
why we should re-invent generic clustering software.  (For example, if
you set things up this way, we can't prevent the two kernels from
corrupting the filesystem on the shared storage, which we rely on
working.)

Such software is often marketed as an "easy way to set up high
availability".  It's easy to set up.  It requires some actual
expertise to set up *right*.  But once you've done that, it works
well, and it prevents this kind of situation from happening.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Protecting against multiple instances per cluster

From: Greg Stark

On Thu, Sep 8, 2011 at 10:03 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> Would there be a way to prevent this abhorrent scenario from coming
>> into existence?

> There are plenty of clustering products out there that are really
> designed for one thing primarily, and that's dealing with this kind
> of fencing.

Wouldn't those products exist to *allow* you to set up an environment
like this safely?

I think what Thom is saying is that it would be nice if we could
notice that the situation looks bad and *stop* the user from doing
this at all.

We could do that easily if we were willing to trade off some
convenience for users who don't have shared storage by just removing
the code for determining if there's a stale lock file.

Also, if the shared filesystem happened to have a working locking
server and we used the right file-locking API, then we would be able
to notice an apparently stale lock file that is nonetheless locked by
another postgres instance.  There was some talk about using one of
the locking APIs a while back.
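
Roughly, something like this (just a sketch to illustrate the idea,
not actual PostgreSQL code; it assumes the running postmaster took
and kept an fcntl() lock on the pid file at startup, and that the
filesystem's lock manager actually works across hosts):

/* Probe the pid file for a live advisory lock.  fcntl() locks are
 * mediated by the filesystem's lock manager, so on e.g. NFS with a
 * working lockd, a lock held by a postmaster on another host shows
 * up here, even though its PID means nothing locally. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    struct flock fl;
    int          fd = open("postmaster.pid", O_RDWR);

    if (fd < 0)
        return 0;                   /* no pid file: nothing to check */

    fl.l_type = F_WRLCK;            /* probe for a conflicting lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                   /* whole file */

    if (fcntl(fd, F_GETLK, &fl) == 0 && fl.l_type != F_UNLCK)
    {
        /* Someone, possibly on another host, still holds the lock. */
        fprintf(stderr, "pid file is locked (pid %ld); refusing to start\n",
                (long) fl.l_pid);
        close(fd);
        return 1;
    }

    /* No holder: the lock file really is stale.  A real interlock
     * would now take the lock itself with F_SETLK and hold the file
     * descriptor open for the life of the postmaster. */
    close(fd);
    return 0;
}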


-- 
greg