Thread: Detecting libpq connections improperly shared via fork()

Detecting libpq connections improperly shared via fork()

From
Daniel Farina
Date:
Per http://archives.postgresql.org/pgsql-hackers/2012-10/msg00167.php

On Wed, Oct 3, 2012 at 9:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> To bring that closer to home, suppose you have a program with an open
> database connection in libpq, and you fork(), and then parent and child
> both try to use the connection.  How well would that work?  Is it the
> fault of fork()?

This brings to mind a lot of issues we see reported among our users:

I see the problem of database connections shared among processes as a
reported problem *all the time*, because of the popularity of
fork-based web servers.  If you do something just a tiny bit wrong you
end up establishing the connection before the fork and then two things
can happen:

* If SSL is off, you never notice until you get some really bizarre
issues that result from an unfortunate collision of protocol traffic.
Since many applications have idle-time, this happens rarely enough to
be subtle that some people never take notice.  A tell-tell sign is an
error reported from something that looks like one query jammed into
another.

* When SSL is enabled this strangeness is seen more or less
immediately, but shows up as cryptic OpenSSL complaints, to which no
earthly person is going to know what to do.

It would be fantastic for libpq to somehow monitor use of a connection
from multiple PIDs that share a parent and deliver an error indicating
what is wrong.  Unfortunately detecting that would require either a
file or some kind of shared memory map, AFAIK, and I don't know how
keen anyone is on accepting that patch.  So, may I ask: how keen is
anyone on accepting such a patch, and under what conditions of
mechanism?

As a minor functional downside, it would hurt a hypothetical user that
is carefully sharing a backend file descriptor between processes using
libpq and synchronizing them very carefully (notably, with encryption
this becomes almost comically difficult and brittle).  I conjecture
such a person is almost entirely hypothetical in nature.

-- 
fdr



Re: Detecting libpq connections improperly shared via fork()

From
Daniel Farina
Date:
On Wed, Oct 3, 2012 at 3:14 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Thursday, October 04, 2012 12:08:18 AM Daniel Farina wrote:
>> It would be fantastic for libpq to somehow monitor use of a connection
>> from multiple PIDs that share a parent and deliver an error indicating
>> what is wrong.  Unfortunately detecting that would require either a
>> file or some kind of shared memory map, AFAIK, and I don't know how
>> keen anyone is on accepting that patch.  So, may I ask: how keen is
>> anyone on accepting such a patch, and under what conditions of
>> mechanism?
> Hm. An easier version of this could just be storing the pid of the process
> that did the PQconnectdb* in the PGconn struct. You can then check that
> PGconn->pid == getpid() at relatively few places and error out on a mismatch.
> That should be doable with only minor overhead.

I suppose this might needlessly eliminate someone who forks and hands
off the PGconn struct to exactly one child, but it's hard to argue
with its simplicity and portability of mechanism.

-- 
fdr



Re: Detecting libpq connections improperly shared via fork()

From
Andres Freund
Date:
On Thursday, October 04, 2012 12:08:18 AM Daniel Farina wrote:
> It would be fantastic for libpq to somehow monitor use of a connection
> from multiple PIDs that share a parent and deliver an error indicating
> what is wrong.  Unfortunately detecting that would require either a
> file or some kind of shared memory map, AFAIK, and I don't know how
> keen anyone is on accepting that patch.  So, may I ask: how keen is
> anyone on accepting such a patch, and under what conditions of
> mechanism?
Hm. An easier version of this could just be storing the pid of the process 
that did the PQconnectdb* in the PGconn struct. You can then check that 
PGconn->pid == getpid() at relatively few places and error out on a mismatch. 
That should be doable with only minor overhead.

I have seen such errors before...

Andres
-- 
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Detecting libpq connections improperly shared via fork()

From
Andres Freund
Date:
On Thursday, October 04, 2012 12:16:14 AM Daniel Farina wrote:
> On Wed, Oct 3, 2012 at 3:14 PM, Andres Freund <andres@2ndquadrant.com> 
wrote:
> > On Thursday, October 04, 2012 12:08:18 AM Daniel Farina wrote:
> >> It would be fantastic for libpq to somehow monitor use of a connection
> >> from multiple PIDs that share a parent and deliver an error indicating
> >> what is wrong.  Unfortunately detecting that would require either a
> >> file or some kind of shared memory map, AFAIK, and I don't know how
> >> keen anyone is on accepting that patch.  So, may I ask: how keen is
> >> anyone on accepting such a patch, and under what conditions of
> >> mechanism?
> > 
> > Hm. An easier version of this could just be storing the pid of the
> > process that did the PQconnectdb* in the PGconn struct. You can then
> > check that PGconn->pid == getpid() at relatively few places and error
> > out on a mismatch. That should be doable with only minor overhead.
> 
> I suppose this might needlessly eliminate someone who forks and hands
> off the PGconn struct to exactly one child, but it's hard to argue
> with its simplicity and portability of mechanism.
Even that scenario isn't easy to get right... You need to get the socket from 
libpq in the parent after the fork() and close it. Only after that you can 
PQfinish it. Which you probably need to do before establishing other 
connections.
The only scenario where I can dream up some use for doing something like that 
isn't really convincing and revolves around setreuid() and peer 
authentication. But you can do the setreuid after the fork just fine...

Andres
-- 
Andres Freund        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Detecting libpq connections improperly shared via fork()

From
Daniel Farina
Date:
On Wed, Oct 3, 2012 at 3:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Thursday, October 04, 2012 12:16:14 AM Daniel Farina wrote:
>> On Wed, Oct 3, 2012 at 3:14 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
>> > On Thursday, October 04, 2012 12:08:18 AM Daniel Farina wrote:
>> >> It would be fantastic for libpq to somehow monitor use of a connection
>> >> from multiple PIDs that share a parent and deliver an error indicating
>> >> what is wrong.  Unfortunately detecting that would require either a
>> >> file or some kind of shared memory map, AFAIK, and I don't know how
>> >> keen anyone is on accepting that patch.  So, may I ask: how keen is
>> >> anyone on accepting such a patch, and under what conditions of
>> >> mechanism?
>> >
>> > Hm. An easier version of this could just be storing the pid of the
>> > process that did the PQconnectdb* in the PGconn struct. You can then
>> > check that PGconn->pid == getpid() at relatively few places and error
>> > out on a mismatch. That should be doable with only minor overhead.
>>
>> I suppose this might needlessly eliminate someone who forks and hands
>> off the PGconn struct to exactly one child, but it's hard to argue
>> with its simplicity and portability of mechanism.
> Even that scenario isn't easy to get right... You need to get the socket from
> libpq in the parent after the fork() and close it. Only after that you can
> PQfinish it. Which you probably need to do before establishing other
> connections.

Well, as a general rule, people care a lot less about "that strange
error that happens when I'm done" as opposed to "that thing that
happens randomly while I'm mid-workload," so odds are very slim that
I'd see that defect reported -- I think chances are also poor that
that bug would go fixed in applications that have it, because the
impact would appear minor, so as per the natural of order of things
it'll be deprioritized indefinitely.  But I see your point: the number
of intentional abusers might be even smaller than I thought.

-- 
fdr



Re: Detecting libpq connections improperly shared via fork()

From
Tom Lane
Date:
Daniel Farina <daniel@heroku.com> writes:
> On Wed, Oct 3, 2012 at 3:14 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Hm. An easier version of this could just be storing the pid of the process
>> that did the PQconnectdb* in the PGconn struct. You can then check that
>> PGconn->pid == getpid() at relatively few places and error out on a mismatch.
>> That should be doable with only minor overhead.

> I suppose this might needlessly eliminate someone who forks and hands
> off the PGconn struct to exactly one child, but it's hard to argue
> with its simplicity and portability of mechanism.

Yeah, the same thing had occurred to me, but I'm not sure that getpid()
is really cheap enough to claim that the overhead is negligible.

A bigger problem with this is that it only fixes the issue for cases in
which somebody makes new threads of control with fork().  I believe that
issues involving multiple threads trying to use the same PGconn are at
least as widespread.  I'm not terribly excited about removing
functionality and adding overhead to protect against just one variant of
the problem.
        regards, tom lane



Re: Detecting libpq connections improperly shared via fork()

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On Thursday, October 04, 2012 12:16:14 AM Daniel Farina wrote:
>> I suppose this might needlessly eliminate someone who forks and hands
>> off the PGconn struct to exactly one child, but it's hard to argue
>> with its simplicity and portability of mechanism.

> Even that scenario isn't easy to get right... You need to get the socket from
> libpq in the parent after the fork() and close it. Only after that you can 
> PQfinish it. Which you probably need to do before establishing other 
> connections.

No, it's much easier than that: the parent can simply forget that it has
a PGconn.  It will leak the memory occupied by the PGconn object, and it
will leak an open socket (which will only be half-open after the child
does PQfinish).  This would be noticeable if the parent is long-lived
and creates many such connections over its lifespan, but otherwise
people could be doing it just fine.  In fact, I had to look closely to
convince myself that pgbench didn't do it already.

I suspect that if we provide a mechanism like this, we'll have to
provide a way to turn it off, or somebody is going to complain that
we broke their code.
        regards, tom lane



Re: Detecting libpq connections improperly shared via fork()

From
Andrew Dunstan
Date:
On 10/03/2012 09:23 PM, Tom Lane wrote:

> A bigger problem with this is that it only fixes the issue for cases in
> which somebody makes new threads of control with fork().  I believe that
> issues involving multiple threads trying to use the same PGconn are at
> least as widespread.  I'm not terribly excited about removing
> functionality and adding overhead to protect against just one variant of
> the problem.
>
>             

I had the same thought re threads.

cheers

andrew




Re: Detecting libpq connections improperly shared via fork()

From
Andres Freund
Date:
On Thursday, October 04, 2012 03:23:54 AM Tom Lane wrote:
> Daniel Farina <daniel@heroku.com> writes:
> > On Wed, Oct 3, 2012 at 3:14 PM, Andres Freund <andres@2ndquadrant.com> 
wrote:
> >> Hm. An easier version of this could just be storing the pid of the
> >> process that did the PQconnectdb* in the PGconn struct. You can then
> >> check that PGconn->pid == getpid() at relatively few places and error
> >> out on a mismatch. That should be doable with only minor overhead.
> > 
> > I suppose this might needlessly eliminate someone who forks and hands
> > off the PGconn struct to exactly one child, but it's hard to argue
> > with its simplicity and portability of mechanism.
> 
> Yeah, the same thing had occurred to me, but I'm not sure that getpid()
> is really cheap enough to claim that the overhead is negligible.
I guess its going to be os/libc dependant. In glibc systems getpid() should be 
basically just be a function call (no syscall).

> A bigger problem with this is that it only fixes the issue for cases in
> which somebody makes new threads of control with fork().  I believe that
> issues involving multiple threads trying to use the same PGconn are at
> least as widespread.  I'm not terribly excited about removing
> functionality and adding overhead to protect against just one variant of
> the problem.
True. But people seem to be more wary of problems in the case of threads... We 
could play similar things with pthread_self() or gettid() but I am not sure how 
portable even the former is...

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Detecting libpq connections improperly shared via fork()

From
Daniel Farina
Date:
On Tue, Oct 9, 2012 at 2:51 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Thursday, October 04, 2012 03:23:54 AM Tom Lane wrote:
>> Daniel Farina <daniel@heroku.com> writes:
>> > On Wed, Oct 3, 2012 at 3:14 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
>> >> Hm. An easier version of this could just be storing the pid of the
>> >> process that did the PQconnectdb* in the PGconn struct. You can then
>> >> check that PGconn->pid == getpid() at relatively few places and error
>> >> out on a mismatch. That should be doable with only minor overhead.
>> >
>> > I suppose this might needlessly eliminate someone who forks and hands
>> > off the PGconn struct to exactly one child, but it's hard to argue
>> > with its simplicity and portability of mechanism.
>>
>> Yeah, the same thing had occurred to me, but I'm not sure that getpid()
>> is really cheap enough to claim that the overhead is negligible.
> I guess its going to be os/libc dependant. In glibc systems getpid() should be
> basically just be a function call (no syscall).

To protect users who fork but then thoroughly forget about the
connection in either the parent or child process, the original sketch
I had in mind (which did not use getpid) would be to
increment-and-check a monotonic number of some kind when protocol
traffic is initiated (effectively "tell" on the socket).  If that
shared state is incremented in an unexpected way, then it is known
that another process somewhere has mucked with the protocol state, and
it's time to deliver a lucid error.

That means both a shared (such as an anonymous mmap) and a not-shared
(process-local as per most things when forking, or in the case of
threads thread-local) state would be required.  Both halves have
thorny portability problems AFAIK, so I was somewhat hesitant to bring
it up.

However, I would like to re-iterate that this is a very common
problem, so it may be worth pausing to think about solving it.
Whenever someone writes in saying "what's up with these strange SSL
errors", generally the first question in response is "are you using
unicorn?" (for Ruby, 'gunicorn' for Python).  The answer is almost
invariably yes.  The remainder have renegotiation issues.

-- 
fdr



Re: Detecting libpq connections improperly shared via fork()

From
Martijn van Oosterhout
Date:
On Thu, Oct 04, 2012 at 12:14:02AM +0200, Andres Freund wrote:
> On Thursday, October 04, 2012 12:08:18 AM Daniel Farina wrote:
> > It would be fantastic for libpq to somehow monitor use of a connection
> > from multiple PIDs that share a parent and deliver an error indicating
> > what is wrong.  Unfortunately detecting that would require either a
> > file or some kind of shared memory map, AFAIK, and I don't know how
> > keen anyone is on accepting that patch.  So, may I ask: how keen is
> > anyone on accepting such a patch, and under what conditions of
> > mechanism?
> Hm. An easier version of this could just be storing the pid of the process
> that did the PQconnectdb* in the PGconn struct. You can then check that
> PGconn->pid == getpid() at relatively few places and error out on a mismatch.
> That should be doable with only minor overhead.

On system's that support it, pthread_atfork() might help. Though being
like a signal handler you don't have access to the PGconn structure, so
you can't flag anything easily. Maybe just to cache getpid()?

In any case, adding a check to PQexec and friends would probably be
sufficient, since that's what every program is going to start with.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer