Thread: postgresql in FreeBSD jails: proposal

postgresql in FreeBSD jails: proposal

From
Mischa Sandberg
Date:
Here (@sophos.com) we run machine cluster tests using FreeBSD jails. A
jail is halfway between a chroot and a VM. Jails blow a number of
assumptions about a unix environment: sysv ipc's are global to all
jails; but a process can only "see" other processes also running in the
jail. In fact, the quickest way to tell whether you're running in a jail
is to test for process 1.
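
As a rough illustration of the "test for process 1" idea (not part of the patch; the helper name init_is_visible is made up here), kill() with signal 0 sends nothing and only performs the existence and permission checks:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Returns 1 if process 1 is visible to the caller, 0 if it is not.
 * EPERM means the process exists but we may not signal it, so it
 * still counts as visible; only ESRCH means "no such process".
 */
int
init_is_visible(void)
{
    if (kill((pid_t) 1, 0) == 0)
        return 1;
    return errno != ESRCH;
}
```

A return of 0 would suggest a jail with no init of its own. A jail (or any other container) that runs its own process 1 would not be detected this way.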

PGSharedMemoryCreate chooses/reuses an ipc key in a reasonable way to
cover previous postmasters crashing and leaving a shm seg behind,
possibly with some backends still running.

Unfortunately, with multiple jails running PG servers and (due to app
limitations) all servers having same PGPORT, you get the situation that
when jail#2 (,jail#3,...) server comes up, it:
- detects that there is a shm seg with ipc key 5432001
- checks whether the associated postmaster process exists (with kill -0)
- overwrites the segment created and being used by jail #1
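
A minimal sketch of the two ingredients in that sequence, under assumptions: the first key tried is taken to be PGPORT * 1000 + 1 (which matches the 5432001 above), and the liveness check is taken to be kill(pid, 0) on the pid recorded for the segment. The function names are hypothetical, not the ones in sysv_shmem.c.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* First IPC key tried for a given port (assumed: port * 1000 + 1). */
long
first_shmem_key(int port)
{
    return (long) port * 1000 + 1;
}

/*
 * Returns 1 when kill(pid, 0) reports "no such process", else 0.
 * Inside a jail, a live postmaster in a different jail also yields
 * ESRCH, so it is wrongly reported as dead here.
 */
int
creator_looks_dead(pid_t creator)
{
    if (creator == getpid())
        return 0;               /* we are the creator, clearly alive */
    return kill(creator, 0) != 0 && errno == ESRCH;
}
```

With every jail using port 5432, each postmaster starts from the same key, and creator_looks_dead() cannot distinguish a crashed postmaster from one that is merely invisible.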

There's a workaround (there always is) other than this patch, involving
NAT translation so that the postmasters listen on different ports, but
the outside world sees them each listening on 5432. But that seems
somewhat circuitous.

I've hacked sysv_shmem.c (in PG 8.0.9) to handle this problem. Given the
trouble that postmaster goes to, to stop shm seg leakage, I'd like to
solicit any opinions on the wisdom of this edge case. If this patch IS
useful, what would be the right level of compile-time restriction
("#ifdef __FreeBSD__" ???)

@@ -319,7 +319,8 @@
 
                if (makePrivate)                /* a standalone backend shouldn't do this */
                        continue;
-
+               /* In a FreeBSD jail, you can't "kill -0" a postmaster
+                * running in a different jail, so the shm seg might
+                * still be in use. Safer to test nattch ?
+                */
+               if (kill(1,0) && errno == ESRCH && !PGSharedMemoryIsInUse(0,NextShmSegID))
+                       continue;
                if ((memAddress = PGSharedMemoryAttach(NextShmemSegID, &shmid)) == NULL)
                        continue;                       /* can't attach, not one of mine */
 
End of Patch.



Re: postgresql in FreeBSD jails: proposal

From
Tom Lane
Date:
Mischa Sandberg <mischa_sandberg@telus.net> writes:
> +               /* In a FreeBSD jail, you can't "kill -0" a postmaster
> +                * running in a different jail, so the shm seg might
> +                * still be in use. Safer to test nattch ?
> +                */
> +               if (kill(1,0) && errno == ESRCH && !PGSharedMemoryIsInUse(0,NextShmSegID))
> +                       continue;

Isn't the last part of that test backward?  If it isn't, I don't
understand what it's for at all.
        regards, tom lane


Re: postgresql in FreeBSD jails: proposal

From
Mischa Sandberg
Date:
Quoting Tom Lane <tgl@sss.pgh.pa.us>:

> Mischa Sandberg <mischa_sandberg@telus.net> writes:
> > +               /* In a FreeBSD jail, you can't "kill -0" a postmaster
> > +                * running in a different jail, so the shm seg might
> > +                * still be in use. Safer to test nattch ?
> > +                */
> > +               if (kill(1,0) && errno == ESRCH && PGSharedMemoryIsInUse(0,NextShmemSegID))
> > +                       continue;
> 
> Isn't the last part of that test backward?  If it isn't, I don't
> understand what it's for at all.

Serious blush here. Yes.



Re: postgresql in FreeBSD jails: proposal

From
Tom Lane
Date:
Mischa Sandberg <mischa_sandberg@telus.net> writes:
> Quoting Tom Lane <tgl@sss.pgh.pa.us>:
>> Mischa Sandberg <mischa_sandberg@telus.net> writes:
>>> +               if (kill(1,0) && errno == ESRCH && PGSharedMemoryIsInUse(0,NextShmemSegID))
>>> +                       continue;
>> 
>> Isn't the last part of that test backward?  If it isn't, I don't
>> understand what it's for at all.

> Serious blush here. Yes.

Actually, after re-reading what PGSharedMemoryIsInUse does, I don't
think you want to use it: it goes to considerable lengths to avoid
returning a false positive, whereas in this context I believe we
*do* need to avoid segments that belong to other data directories.
So you probably need a separate chunk of code that only does the
nattch test.
        regards, tom lane
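
One possible shape for such a nattch-only test, as a sketch rather than a proposed patch. The function names are invented for this example; shmget(), shmctl(), IPC_STAT and shm_nattch are the standard SysV interfaces. Anything other than a clear "nobody attached" is reported so that the caller can leave the segment alone.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Number of processes attached to segment `shmid`, or -1 if the
 * segment cannot be examined (removed meanwhile, no permission, ...).
 */
long
shmid_nattch(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) < 0)
        return -1;
    return (long) ds.shm_nattch;
}

/*
 *  0 = no segment with this key, or a segment with nattch == 0
 *  1 = segment exists and at least one process is attached
 * -1 = cannot tell; a cautious caller treats this like 1
 */
int
shmem_key_is_attached(key_t key)
{
    int  shmid = shmget(key, 0, 0);   /* look up only, never create */
    long n;

    if (shmid < 0)
        return (errno == ENOENT) ? 0 : -1;
    n = shmid_nattch(shmid);
    if (n < 0)
        return -1;
    return n > 0;
}
```

Unlike a pid check, this does not depend on whether the attached processes are visible from the caller's jail, only on the kernel's attach count.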


Re: postgresql in FreeBSD jails: proposal

From
Stephen Frost
Date:
* Mischa Sandberg (mischa_sandberg@telus.net) wrote:
> Here (@sophos.com) we run machine cluster tests using FreeBSD jails. A
> jail is halfway between a chroot and a VM. Jails blow a number of
> assumptions about a unix environment: sysv ipc's are global to all
> jails; but a process can only "see" other processes also running in the
> jail. In fact, the quickest way to tell whether you're running in a jail
> is to test for process 1.

I've got a couple of concerns about this-

#1: Having the shared memory be global is a rather large problem when it
    comes to something like PG which can have a fair bit of data going
    through that area that could be sensitive.

#2: Isn't there already a uid check that's done?  Wouldn't this make
    more sense anyway (and hopefully minimize the impact of a bad person
    getting control of the PG database/user in a given jail)?

#3: At least in the linux-equivalent to jails (linux-vservers, imv
    anyway), they started w/o an init process and eventually decided it
    made sense to have one, so I'm not sure that this test will always
    work and the result might catch someone by surprise at some later
    date.  Is there a better/more explicit test?
 
Thanks,
    Stephen

Re: postgresql in FreeBSD jails: proposal

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> I've got a couple of concerns about this-

> #1: Having the shared memory be global is a rather large problem when it
>     comes to something like PG which can have a fair bit of data going
>     through that area that could be sensitive.

Well, you'd have to talk to the FreeBSD kernel hackers about changing
that, but I imagine it's still true that userid permissions checking
applies.  Whether to run the postmasters that are in different jails
under different userids is a separate question.

> #3: At least in the linux-equivalent to jails (linux-vservers, imv
>     anyway), they started w/o an init process and eventually decided it
>     made sense to have one, so I'm not sure that this test will always
>     work and the result might catch someone by surprise at some later
>     date.  Is there a better/more explicit test?

We could just leave out the kill(1,0) part.  In fact I wonder whether
we shouldn't do something like this on all platforms not only FreeBSD.
Quite aside from any considerations of jails, it seems like a pretty
bad idea to try to zap a shmem segment that has any attached processes.

Consider a system that normally runs multiple postmasters, in which one
postmaster has died but left orphaned backends behind, and we are trying
to start an unrelated postmaster.  The current code seems capable of
deciding to zap the segment with those orphaned backends attached.
This'd require a shmem key collision which seems pretty improbable given
our key assignments, but not quite impossible.  If it did happen then
the net effect would be to clear the segment's ID (since it can't
actually go away till the connected processes do).  The bad thing about
that is that if the dead postmaster were then restarted, it wouldn't
recognize the segment as being its own, and would happily start up
despite the orphaned backends.  Result: exactly the kind of conflicts
and data corruption that all these interlocks are trying to prevent.
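
To make the "can't actually go away" part concrete, here is a small self-contained sketch (the function name is made up for illustration). It relies only on standard SysV behaviour: IPC_RMID marks a segment for destruction, and the kernel destroys it after the last attached process detaches.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Create a private segment, attach, mark it for removal, and check
 * whether the attached memory is still usable afterwards.
 * Returns 1 if the data survived IPC_RMID, 0 if not, -1 on setup failure.
 */
int
segment_survives_rmid(void)
{
    const char msg[] = "still here";
    int   shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    char *mem;
    int   ok;

    if (shmid < 0)
        return -1;
    mem = shmat(shmid, NULL, 0);
    if (mem == (char *) -1)
    {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    strcpy(mem, msg);
    /* Mark for destruction; this is deferred while we stay attached. */
    if (shmctl(shmid, IPC_RMID, NULL) < 0)
    {
        shmdt(mem);
        return -1;
    }
    ok = (strcmp(mem, msg) == 0);
    shmdt(mem);                 /* last detach: now it really goes away */
    return ok;
}
```

The orphaned backends in the scenario above are in the position of this function between the IPC_RMID and the shmdt(): still attached and still writing, while nobody can find the segment by key any more.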

So unless I'm missing something here, adding a check for nattch = 0
is a good idea, quite aside from making FreeBSD jails safer.

I think the worrisome question that follows on from Stephen's is really
whether FreeBSD will ever decide to lie about nattch (ie, exclude
processes in other jails from that count).
        regards, tom lane


Re: postgresql in FreeBSD jails: proposal

From
"Marc G. Fournier"
Date:
mischa_sandberg@telus.net (Mischa Sandberg) writes:

>Unfortunately, with multiple jails running PG servers and (due to app
>limitations) all servers having same PGPORT, you get the situation that
>when jail#2 (,jail#3,...) server comes up, it:
>- detects that there is a shm seg with ipc key 5432001
>- checks whether the associated postmaster process exists (with kill -0)
>- overwrites the segment created and being used by jail #1

Easiest fix: change the UID of the user running the postmaster (ie. pgsql) so
that each runs as a distinct UID (instead of distinct PGPORT) ... been doing
this since moving to FreeBSD 6.x ... no patches required ...
--
----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
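
A sketch of why distinct UIDs help, with assumptions stated: the segment is created owner-only (mode 0600 is assumed here), so a postmaster running under another UID gets a permission error instead of a usable segment. Both function names are invented for this example.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

/* Create a segment that only its owner can read or write. */
int
create_owner_only_segment(size_t size)
{
    return shmget(IPC_PRIVATE, size, IPC_CREAT | IPC_EXCL | 0600);
}

/*
 * 1 if the segment is owned by the caller's effective uid, 0 if it
 * belongs to someone else, -1 if it cannot be examined at all
 * (which is what another non-root uid would normally see for 0600).
 */
int
segment_is_mine(int shmid)
{
    struct shmid_ds ds;

    if (shmctl(shmid, IPC_STAT, &ds) < 0)
        return -1;
    return ds.shm_perm.uid == geteuid();
}
```

This only separates clusters that really do run under different UIDs; with one shared UID across jails the permission check passes for everybody.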

Re: [ADMIN] postgresql in FreeBSD jails: proposal

From
Tom Lane
Date:
"Marc G. Fournier" <scrappy@hub.org> writes:
> mischa_sandberg@telus.net (Mischa Sandberg) writes:
>> Unfortunately, with multiple jails running PG servers and (due to app
>> limitations) all servers having same PGPORT, you get the situation that
>> when jail#2 (,jail#3,...) server comes up, it:
>> - detects that there is a shm seg with ipc key 5432001
>> - checks whether the associated postmaster process exists (with kill -0)
>> - overwrites the segment created and being used by jail #1

> Easiest fix: change the UID of the user running the postmaster (ie. pgsql) so
> that each runs as a distinct UID (instead of distinct PGPORT) ... been doing 
> this since moving to FreeBSD 6.x ... no patches required ...

Sure, but in the spirit of "belt and suspenders too", I'd think that
doing that *and* something like Mischa's proposal wouldn't be bad.

(BTW, as far as I saw the original post only went to -hackers
... there's something messed up about your reply.)
        regards, tom lane


Re: [ADMIN] postgresql in FreeBSD jails: proposal

From
"Marc G. Fournier"
Date:
--On Thursday, January 17, 2008 01:12:54 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Marc G. Fournier" <scrappy@hub.org> writes:
>> mischa_sandberg@telus.net (Mischa Sandberg) writes:
>>> Unfortunately, with multiple jails running PG servers and (due to app
>>> limitations) all servers having same PGPORT, you get the situation that
>>> when jail#2 (,jail#3,...) server comes up, it:
>>> - detects that there is a shm seg with ipc key 5432001
>>> - checks whether the associated postmaster process exists (with kill -0)
>>> - overwrites the segment created and being used by jail #1
>
>> Easiest fix: change the UID of the user running the postmaster (ie. pgsql) so
>> that each runs as a distinct UID (instead of distinct PGPORT) ... been doing
>> this since moving to FreeBSD 6.x ... no patches required ...
>
> Sure, but in the spirit of "belt and suspenders too", I'd think that
> doing that *and* something like Mischa's proposal wouldn't be bad.

No arguments here, just pointing out that changing PGPORT isn't/wasn't the only
way of addressing this problem ... if we can do something more 'internal', it
would definitely make life a lot easier ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664



Re: [ADMIN] postgresql in FreeBSD jails: proposal

From
Alvaro Herrera
Date:
Cc:
"pgadmin-devteam@postgresql.org.pgadmin-hackers@postgresql.org.pgadmin-support@postgresql.org.pgsql-admin@postgresql.org.pgsql-advocacy@postgresql.org.pgsql-announce@postgresql.org.pgsql-benchmarks@postgresql.org.pgsql-bugs@postgresql.org.pgsql-chat"@post

Hey, this is exactly the sort of weird "Cc:" line I saw in the recent
spam surge.  Since I suspect you are using the news server to post, I
suggest you take a long and careful look at the gateway's configuration.
It seems there's something very broken here.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [ADMIN] postgresql in FreeBSD jails: proposal

From
"Dave Page"
Date:
On 17/01/2008, Alvaro Herrera <alvherre@commandprompt.com> wrote:
>
> Cc:
> "pgadmin-devteam@postgresql.org.pgadmin-hackers@postgresql.org.pgadmin-support@postgresql.org.pgsql-admin@postgresql.org.pgsql-advocacy@postgresql.org.pgsql-announce@postgresql.org.pgsql-benchmarks@postgresql.org.pgsql-bugs@postgresql.org.pgsql-chat"@post
>
> Hey, this is exactly the sort of weird "Cc:" line I saw in the recent
> spam surge.  Since I suspect you are using the news server to post, I
> suggest you take a long and careful look at the gateway's configuration.
> It seems there's something very broken here.

I sure hope that pgadmin-devteam isn't going out on the news server -
that's the pgAdmin equivalent to pgsql-core :-O

/D


Re: [ADMIN] postgresql in FreeBSD jails: proposal

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> "Marc G. Fournier" <scrappy@hub.org> writes:
> > Easiest fix: change the UID of the user running the postmaster (ie. pgsql) so
> > that each runs as a distinct UID (instead of distinct PGPORT) ... been doing
> > this since moving to FreeBSD 6.x ... no patches required ...
>
> Sure, but in the spirit of "belt and suspenders too", I'd think that
> doing that *and* something like Mischa's proposal wouldn't be bad.

I agree that we should try to be careful about stepping on segments that
might still be in use, but I would also discourage jail users from using
the same uid for multiple PG clusters since the jail doesn't protect the
shmem segment.  We use separate uids even w/ linux-vservers where shmem
and everything *is* separate, following the same 'belt and suspenders
too' spirit for security.
Thanks,
    Stephen

Re: [ADMIN] postgresql in FreeBSD jails: proposal

From
"Marc G. Fournier"
Date:
--On Thursday, January 17, 2008 13:58:36 +0000 Dave Page <dpage@postgresql.org> wrote:

> On 17/01/2008, Alvaro Herrera <alvherre@commandprompt.com> wrote:
>>
>> Cc:
>> "pgadmin-devteam@postgresql.org.pgadmin-hackers@postgresql.org.pgadmin-suppo
>> rt@postgresql.org.pgsql-admin@postgresql.org.pgsql-advocacy@postgresql.org.p
>> gsql-announce@postgresql.org.pgsql-benchmarks@postgresql.org.pgsql-bugs@post
>> gresql.org.pgsql-chat"@post
>>
>> Hey, this is exactly the sort of weird "Cc:" line I saw in the recent
>> spam surge.  Since I suspect you are using the news server to post, I
>> suggest you take a long and careful look at the gateway's configuration.
>> It seems there's something very broken here.
>
> I sure hope that pgadmin-devteam isn't going out on the newserver -
> thats the pgAdmin equivalent to pgsql-core :-O

Just checked the subscriber list for that list, and I don't see news listed on 
it ...and no such newsgroup either:

%grep pgadmin db/active
pgsql.interfaces.pgadmin.hackers 0000004826 0000000001 y
pgsql.interfaces.pgadmin.support 0000002946 0000000001 y


----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664



Re: [ADMIN] postgresql in FreeBSD jails: proposal

From
Mischa Sandberg
Date:
Quoting Stephen Frost <sfrost@snowman.net>:

> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> > "Marc G. Fournier" <scrappy@hub.org> writes:
> > > Easiest fix: change the UID of the user running the postmaster (ie. pgsql) so
> > > that each runs as a distinct UID (instead of distinct PGPORT) ... been doing
> > > this since moving to FreeBSD 6.x ... no patches required ...
> >
> > Sure, but in the spirit of "belt and suspenders too", I'd think that
> > doing that *and* something like Mischa's proposal wouldn't be bad.
> 
> I agree that we should try to be careful about stepping on 
> segments that might still be in use, but I would also discourage
> jail users from using the same uid for multiple PG clusters 
> since the jail doesn't protect the shmem segment.  
> We use separate uids even w/ linux-vservers where shmem
> and everything *is* separate, following the same
> 'belt and suspenders too' spirit for security.

Thanks for all the input. Fixing freebsd might get answered
on a different channel :-)

Unfortunately, different uid's is not even an option here;
but serious security in this sitch is not relevant, either.

We have a freebsd core guy here, and he says that there's no
pressing incentive for jails to handle sysv ipc, given mmap
and file locking :-( And given his other comments, I wouldn't
consider jails a "secure" environment, just a modest and
convenient way to emulate multiple machines with caveats.
.........................................................
So, given Tom's comment, that it's antisocial to zap a shm seg 
that other processes have attached ...

I'm going to skip the kill(1,0) test and depend on nattch only,
with a function that PGSharedMemoryIsInUse() can also use.
(For a healthy server, nattch is never less than 2, right?)
If no unpleasant edge cases come out of this in our test
framework, I'd like to submit that as a patch. 
Talked with our Linux guys about vserver, and they see no issues. 
Mr. Solaris here is currently a long way ooto ... opinions?

Afaics the change in behaviour is, if a degraded server exited
with some backend hanging, the second server will create a new
segment after bumping the ipc key; if system shm limits do not
allow for two such shm segments, the second server will bail.
For production systems, ensuring no orphan shm segs
is not left to heuristic clean-up by server re-start.
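
The behaviour change described above, reduced to its decision logic as a sketch. Everything here is hypothetical scaffolding: seg_state, probe_fn, demo_probe and choose_shmem_key are invented names, and max_tries only keeps the example finite (the failure mode mentioned above is the system shm limit, not a retry count).

```c
/* What a probe found for one IPC key. */
typedef enum { SEG_ABSENT, SEG_IDLE, SEG_ATTACHED } seg_state;

typedef seg_state (*probe_fn)(long key);

/*
 * Walk keys upward from first_key. Return the first key that is
 * either unused or holds a segment with nattch == 0; any segment
 * with attached processes is left alone. Returns -1 after max_tries.
 */
long
choose_shmem_key(long first_key, int max_tries, probe_fn probe)
{
    for (int i = 0; i < max_tries; i++)
    {
        long key = first_key + i;

        if (probe(key) != SEG_ATTACHED)
            return key;
    }
    return -1;
}

/* Demo probe: pretend key 5432001 still has a hung backend attached. */
seg_state
demo_probe(long key)
{
    return (key == 5432001L) ? SEG_ATTACHED : SEG_ABSENT;
}
```

With demo_probe, a second server starting at 5432001 skips that key and settles on 5432002, which is the "create a new segment after bumping the ipc key" case.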

Hope that makes sense for the generic Postgres world.

If anyone is interested in creating hung backends, you can
create a named pipe, and tell the server to COPY from it.
---
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.




Re: [ADMIN] postgresql in FreeBSD jails: proposal

From
Tom Lane
Date:
Mischa Sandberg <mischa_sandberg@telus.net> writes:
> If anyone is interested in creating hung backends, you can
> create a named pipe, and tell the server to COPY from it.

That won't create a problematic situation though, until/unless you
SIGQUIT the parent postmaster.

Personally I think of this as "what happens after someone
kill -9's a postmaster that has live children".
        regards, tom lane


Re: [ADMIN] postgresql in FreeBSD jails: proposal

From
Tom Lane
Date:
Mischa Sandberg <mischa_sandberg@telus.net> writes:
> I'm going to skip the kill(1,0) test and depend on nattch only,
> with a function that PGSharedMemoryIsInUse() can also use.
> (For a healthy server, nattch is never less than 2, right?)

Oh, forgot to mention: healthy servers are not the point here.
You should make the code keep its hands off any segment with
nonzero nattch, because even one orphaned backend is enough
to cause trouble.
        regards, tom lane


Re: [ADMIN] postgresql in FreeBSD jails: proposal

From
Mischa Sandberg
Date:
Quoting Tom Lane <tgl@sss.pgh.pa.us>:

> Mischa Sandberg <mischa_sandberg@telus.net> writes:
> > I'm going to skip the kill(1,0) test and depend on nattch only,
> > with a function that PGSharedMemoryIsInUse() can also use.
> > (For a healthy server, nattch is never less than 2, right?)
> 
> Oh, forgot to mention: healthy servers are not the point here.
> You should make the code keep its hands off any segment with
> nonzero nattch, because even one orphaned backend is enough
> to cause trouble.

Note taken. Worth putting a warning in the log, too?




Re: postgresql in FreeBSD jails: proposal

From
Bruce Momjian
Date:
Added to TODO:

* Improve detection of shared memory segments being used by other FreeBSD jails
 http://archives.postgresql.org/pgsql-hackers/2008-01/msg00656.php


-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: postgresql in FreeBSD jails: proposal

From
Alvaro Herrera
Date:
Bruce Momjian wrote:
> 
> Added to TODO:
> 
> * Improve detection of shared memory segments being used by other
>   FreeBSD jails
> 
>   http://archives.postgresql.org/pgsql-hackers/2008-01/msg00656.php

There's a bit more than that to it -- see
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00673.php

In short, it's not just a FreeBSD issue, but something a bit more
general.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: postgresql in FreeBSD jails: proposal

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > 
> > Added to TODO:
> > 
> > * Improve detection of shared memory segments being used by other
> >   FreeBSD jails
> > 
> >   http://archives.postgresql.org/pgsql-hackers/2008-01/msg00656.php
> 
> There's a bit more than that to it -- see
> http://archives.postgresql.org/pgsql-hackers/2008-01/msg00673.php
> 
> In short, it's not just a FreeBSD issue, but something a bit more
> general.

Added to TODO:

* Improve detection of shared memory segments being used by others by
  checking the SysV shared memory field 'nattch'

  http://archives.postgresql.org/pgsql-hackers/2008-01/msg00656.php
  http://archives.postgresql.org/pgsql-hackers/2008-01/msg00673.php

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +