Thread: Detecting libpq connections improperly shared via fork()
Per http://archives.postgresql.org/pgsql-hackers/2012-10/msg00167.php On Wed, Oct 3, 2012 at 9:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > To bring that closer to home, suppose you have a program with an open > database connection in libpq, and you fork(), and then parent and child > both try to use the connection. How well would that work? Is it the > fault of fork()? This brings to mind a lot of issues we see reported among our users: I see the problem of database connections shared among processes as a reported problem *all the time*, because of the popularity of fork-based web servers. If you do something just a tiny bit wrong you end up establishing the connection before the fork and then two things can happen: * If SSL is off, you never notice until you get some really bizarre issues that result from an unfortunate collision of protocol traffic. Since many applications have idle-time, this happens rarely enough to be subtle that some people never take notice. A tell-tell sign is an error reported from something that looks like one query jammed into another. * When SSL is enabled this strangeness is seen more or less immediately, but shows up as cryptic OpenSSL complaints, to which no earthly person is going to know what to do. It would be fantastic for libpq to somehow monitor use of a connection from multiple PIDs that share a parent and deliver an error indicating what is wrong. Unfortunately detecting that would require either a file or some kind of shared memory map, AFAIK, and I don't know how keen anyone is on accepting that patch. So, may I ask: how keen is anyone on accepting such a patch, and under what conditions of mechanism? As a minor functional downside, it would hurt a hypothetical user that is carefully sharing a backend file descriptor between processes using libpq and synchronizing them very carefully (notably, with encryption this becomes almost comically difficult and brittle). I conjecture such a person is almost entirely hypothetical in nature. -- fdr
On Wed, Oct 3, 2012 at 3:14 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On Thursday, October 04, 2012 12:08:18 AM Daniel Farina wrote: >> It would be fantastic for libpq to somehow monitor use of a connection >> from multiple PIDs that share a parent and deliver an error indicating >> what is wrong. Unfortunately detecting that would require either a >> file or some kind of shared memory map, AFAIK, and I don't know how >> keen anyone is on accepting that patch. So, may I ask: how keen is >> anyone on accepting such a patch, and under what conditions of >> mechanism? > Hm. An easier version of this could just be storing the pid of the process > that did the PQconnectdb* in the PGconn struct. You can then check that > PGconn->pid == getpid() at relatively few places and error out on a mismatch. > That should be doable with only minor overhead. I suppose this might needlessly eliminate someone who forks and hands off the PGconn struct to exactly one child, but it's hard to argue with its simplicity and portability of mechanism. -- fdr
On Thursday, October 04, 2012 12:08:18 AM Daniel Farina wrote: > It would be fantastic for libpq to somehow monitor use of a connection > from multiple PIDs that share a parent and deliver an error indicating > what is wrong. Unfortunately detecting that would require either a > file or some kind of shared memory map, AFAIK, and I don't know how > keen anyone is on accepting that patch. So, may I ask: how keen is > anyone on accepting such a patch, and under what conditions of > mechanism? Hm. An easier version of this could just be storing the pid of the process that did the PQconnectdb* in the PGconn struct. You can then check that PGconn->pid == getpid() at relatively few places and error out on a mismatch. That should be doable with only minor overhead. I have seen such errors before... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thursday, October 04, 2012 12:16:14 AM Daniel Farina wrote: > On Wed, Oct 3, 2012 at 3:14 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > On Thursday, October 04, 2012 12:08:18 AM Daniel Farina wrote: > >> It would be fantastic for libpq to somehow monitor use of a connection > >> from multiple PIDs that share a parent and deliver an error indicating > >> what is wrong. Unfortunately detecting that would require either a > >> file or some kind of shared memory map, AFAIK, and I don't know how > >> keen anyone is on accepting that patch. So, may I ask: how keen is > >> anyone on accepting such a patch, and under what conditions of > >> mechanism? > > > > Hm. An easier version of this could just be storing the pid of the > > process that did the PQconnectdb* in the PGconn struct. You can then > > check that PGconn->pid == getpid() at relatively few places and error > > out on a mismatch. That should be doable with only minor overhead. > > I suppose this might needlessly eliminate someone who forks and hands > off the PGconn struct to exactly one child, but it's hard to argue > with its simplicity and portability of mechanism. Even that scenario isn't easy to get right... You need to get the socket from libpq in the parent after the fork() and close it. Only after that you can PQfinish it. Which you probably need to do before establishing other connections. The only scenario where I can dream up some use for doing something like that isn't really convincing and revolves around setreuid() and peer authentication. But you can do the setreuid after the fork just fine... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 3, 2012 at 3:34 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On Thursday, October 04, 2012 12:16:14 AM Daniel Farina wrote: >> On Wed, Oct 3, 2012 at 3:14 PM, Andres Freund <andres@2ndquadrant.com> > wrote: >> > On Thursday, October 04, 2012 12:08:18 AM Daniel Farina wrote: >> >> It would be fantastic for libpq to somehow monitor use of a connection >> >> from multiple PIDs that share a parent and deliver an error indicating >> >> what is wrong. Unfortunately detecting that would require either a >> >> file or some kind of shared memory map, AFAIK, and I don't know how >> >> keen anyone is on accepting that patch. So, may I ask: how keen is >> >> anyone on accepting such a patch, and under what conditions of >> >> mechanism? >> > >> > Hm. An easier version of this could just be storing the pid of the >> > process that did the PQconnectdb* in the PGconn struct. You can then >> > check that PGconn->pid == getpid() at relatively few places and error >> > out on a mismatch. That should be doable with only minor overhead. >> >> I suppose this might needlessly eliminate someone who forks and hands >> off the PGconn struct to exactly one child, but it's hard to argue >> with its simplicity and portability of mechanism. > Even that scenario isn't easy to get right... You need to get the socket from > libpq in the parent after the fork() and close it. Only after that you can > PQfinish it. Which you probably need to do before establishing other > connections. Well, as a general rule, people care a lot less about "that strange error that happens when I'm done" as opposed to "that thing that happens randomly while I'm mid-workload," so odds are very slim that I'd see that defect reported -- I think chances are also poor that that bug would go fixed in applications that have it, because the impact would appear minor, so as per the natural of order of things it'll be deprioritized indefinitely. But I see your point: the number of intentional abusers might be even smaller than I thought. -- fdr
Daniel Farina <daniel@heroku.com> writes: > On Wed, Oct 3, 2012 at 3:14 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> Hm. An easier version of this could just be storing the pid of the process >> that did the PQconnectdb* in the PGconn struct. You can then check that >> PGconn->pid == getpid() at relatively few places and error out on a mismatch. >> That should be doable with only minor overhead. > I suppose this might needlessly eliminate someone who forks and hands > off the PGconn struct to exactly one child, but it's hard to argue > with its simplicity and portability of mechanism. Yeah, the same thing had occurred to me, but I'm not sure that getpid() is really cheap enough to claim that the overhead is negligible. A bigger problem with this is that it only fixes the issue for cases in which somebody makes new threads of control with fork(). I believe that issues involving multiple threads trying to use the same PGconn are at least as widespread. I'm not terribly excited about removing functionality and adding overhead to protect against just one variant of the problem. regards, tom lane
Andres Freund <andres@2ndquadrant.com> writes: > On Thursday, October 04, 2012 12:16:14 AM Daniel Farina wrote: >> I suppose this might needlessly eliminate someone who forks and hands >> off the PGconn struct to exactly one child, but it's hard to argue >> with its simplicity and portability of mechanism. > Even that scenario isn't easy to get right... You need to get the socket from > libpq in the parent after the fork() and close it. Only after that you can > PQfinish it. Which you probably need to do before establishing other > connections. No, it's much easier than that: the parent can simply forget that it has a PGconn. It will leak the memory occupied by the PGconn object, and it will leak an open socket (which will only be half-open after the child does PQfinish). This would be noticeable if the parent is long-lived and creates many such connections over its lifespan, but otherwise people could be doing it just fine. In fact, I had to look closely to convince myself that pgbench didn't do it already. I suspect that if we provide a mechanism like this, we'll have to provide a way to turn it off, or somebody is going to complain that we broke their code. regards, tom lane
On 10/03/2012 09:23 PM, Tom Lane wrote: > A bigger problem with this is that it only fixes the issue for cases in > which somebody makes new threads of control with fork(). I believe that > issues involving multiple threads trying to use the same PGconn are at > least as widespread. I'm not terribly excited about removing > functionality and adding overhead to protect against just one variant of > the problem. > > I had the same thought re threads. cheers andrew
On Thursday, October 04, 2012 03:23:54 AM Tom Lane wrote: > Daniel Farina <daniel@heroku.com> writes: > > On Wed, Oct 3, 2012 at 3:14 PM, Andres Freund <andres@2ndquadrant.com> wrote: > >> Hm. An easier version of this could just be storing the pid of the > >> process that did the PQconnectdb* in the PGconn struct. You can then > >> check that PGconn->pid == getpid() at relatively few places and error > >> out on a mismatch. That should be doable with only minor overhead. > > > > I suppose this might needlessly eliminate someone who forks and hands > > off the PGconn struct to exactly one child, but it's hard to argue > > with its simplicity and portability of mechanism. > > Yeah, the same thing had occurred to me, but I'm not sure that getpid() > is really cheap enough to claim that the overhead is negligible. I guess its going to be os/libc dependant. In glibc systems getpid() should be basically just be a function call (no syscall). > A bigger problem with this is that it only fixes the issue for cases in > which somebody makes new threads of control with fork(). I believe that > issues involving multiple threads trying to use the same PGconn are at > least as widespread. I'm not terribly excited about removing > functionality and adding overhead to protect against just one variant of > the problem. True. But people seem to be more wary of problems in the case of threads... We could play similar things with pthread_self() or gettid() but I am not sure how portable even the former is... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 9, 2012 at 2:51 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On Thursday, October 04, 2012 03:23:54 AM Tom Lane wrote: >> Daniel Farina <daniel@heroku.com> writes: >> > On Wed, Oct 3, 2012 at 3:14 PM, Andres Freund <andres@2ndquadrant.com> > wrote: >> >> Hm. An easier version of this could just be storing the pid of the >> >> process that did the PQconnectdb* in the PGconn struct. You can then >> >> check that PGconn->pid == getpid() at relatively few places and error >> >> out on a mismatch. That should be doable with only minor overhead. >> > >> > I suppose this might needlessly eliminate someone who forks and hands >> > off the PGconn struct to exactly one child, but it's hard to argue >> > with its simplicity and portability of mechanism. >> >> Yeah, the same thing had occurred to me, but I'm not sure that getpid() >> is really cheap enough to claim that the overhead is negligible. > I guess its going to be os/libc dependant. In glibc systems getpid() should be > basically just be a function call (no syscall). To protect users who fork but then thoroughly forget about the connection in either the parent or child process, the original sketch I had in mind (which did not use getpid) would be to increment-and-check a monotonic number of some kind when protocol traffic is initiated (effectively "tell" on the socket). If that shared state is incremented in an unexpected way, then it is known that another process somewhere has mucked with the protocol state, and it's time to deliver a lucid error. That means both a shared (such as an anonymous mmap) and a not-shared (process-local as per most things when forking, or in the case of threads thread-local) state would be required. Both halves have thorny portability problems AFAIK, so I was somewhat hesitant to bring it up. However, I would like to re-iterate that this is a very common problem, so it may be worth pausing to think about solving it. Whenever someone writes in saying "what's up with these strange SSL errors", generally the first question in response is "are you using unicorn?" (for Ruby, 'gunicorn' for Python). The answer is almost invariably yes. The remainder have renegotiation issues. -- fdr
On Thu, Oct 04, 2012 at 12:14:02AM +0200, Andres Freund wrote: > On Thursday, October 04, 2012 12:08:18 AM Daniel Farina wrote: > > It would be fantastic for libpq to somehow monitor use of a connection > > from multiple PIDs that share a parent and deliver an error indicating > > what is wrong. Unfortunately detecting that would require either a > > file or some kind of shared memory map, AFAIK, and I don't know how > > keen anyone is on accepting that patch. So, may I ask: how keen is > > anyone on accepting such a patch, and under what conditions of > > mechanism? > Hm. An easier version of this could just be storing the pid of the process > that did the PQconnectdb* in the PGconn struct. You can then check that > PGconn->pid == getpid() at relatively few places and error out on a mismatch. > That should be doable with only minor overhead. On system's that support it, pthread_atfork() might help. Though being like a signal handler you don't have access to the PGconn structure, so you can't flag anything easily. Maybe just to cache getpid()? In any case, adding a check to PQexec and friends would probably be sufficient, since that's what every program is going to start with. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer