Thread: Sharing database handles across forked child processes

Sharing database handles across forked child processes

From

dan@sidhe.org

Date:

13 November 2007, 13:02:41

How does Postgres handle sharing database handles across child processes?
That is, if I have a process that opens a connection to the database and
then forks a few child processes, what happens?


Can the child processes safely use the handle?


If one child closes the handle, what happens to the handle in all the
other children? The parent?

This isn't a great thing to do, I realize, but I'm wedging database access
into an existing heavily fork-bound perl program, so my hands are somewhat
tied architecturally. (If it means I have to constantly test to see if the
handle's valid, and may have to deal with a handle randomly going away on
me, I can handle that -- I'm more worried about data corruption and
deadlock problems here, stuff I can't reasonably catch at the application
level)

-Dan

Re: Sharing database handles across forked child processes

From

Martijn van Oosterhout

Date:

13 November 2007, 13:13:50

On Tue, Nov 13, 2007 at 12:02:31PM -0500, dan@sidhe.org wrote:
> How does Postgres handle sharing database handles across child processes?
> That is, if I have a process that opens a connection to the database and
> then forks a few child processes, what happens?
>
> Can the child processes safely use the handle?

No.

> If one child closes the handle, what happens to the handle in all the
> other children? The parent?

Just closing the file descriptor is OK. Just forgetting about it is OK
too. Best to just ignore that you have it open at all.

> This isn't a great thing to do, I realize, but I'm wedging database access
> into an existing heavily fork-bound perl program, so my hands are somewhat
> tied architecturally. (If it means I have to constantly test to see if the
> handle's valid, and may have to deal with a handle randomly going away on
> me, I can handle that -- I'm more worried about data corruption and
> deadlock problems here, stuff I can't reasonably catch at the application
> level)

You're going to need a separate handle for each process. Two processes
writing to the same socket won't work. Maybe just set up a table indexed
by PID and make sure you only use your own. Or after a fork() do a
"close $dbh->getfd()" (untested).
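A minimal sketch of the "table indexed by PID" idea, in plain Perl. The connector `make_fake_handle` is hypothetical and returns a bare hashref; real code would call something like DBI->connect there. The assumption is that every database access goes through `get_handle`, so a process never uses a handle created under a different PID. Inherited entries are flagged with InactiveDestroy (a real DBI attribute) before being dropped, so destroying them in the child leaves the parent's connection alone.

```perl
use strict;
use warnings;

# Handles keyed by the PID of the process that created them.
my %handle_for_pid;

# Hypothetical connector for illustration only; real code would call
# something like DBI->connect($dsn, $user, $pass) here instead.
sub make_fake_handle {
    return { owner_pid => $$ };
}

# Return this process's own handle, creating one on first use in a
# process whose PID has no entry yet (e.g. a freshly forked child).
sub get_handle {
    my ($connect) = @_;
    $connect ||= \&make_fake_handle;
    unless ($handle_for_pid{$$}) {
        # Entries under any other PID were inherited across fork().
        # Flag them so that destroying them here leaves the parent's
        # connection alone (InactiveDestroy is a real DBI attribute; on
        # the fake hashrefs it is just a key), then forget them.
        for my $pid (grep { $_ != $$ } keys %handle_for_pid) {
            $handle_for_pid{$pid}{InactiveDestroy} = 1;
            delete $handle_for_pid{$pid};
        }
        $handle_for_pid{$$} = $connect->();
    }
    return $handle_for_pid{$$};
}
```

Inside a process, repeated calls return the same cached handle. The first call after a fork() misses (the PID changed) and opens a fresh one.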

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
>  -- John F Kennedy

Re: Sharing database handles across forked child processes

From

Tom Lane

Date:

13 November 2007, 13:27:00

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Tue, Nov 13, 2007 at 12:02:31PM -0500, dan@sidhe.org wrote:
>> How does Postgres handle sharing database handles across child processes?
>> That is, if I have a process that opens a connection to the database and
>> then forks a few child processes, what happens?
>>
>> Can the child processes safely use the handle?

> No.

For some time now, libpq has set FD_CLOEXEC on the socket connection to
the backend, which ensures that child processes won't be able to mess up
the parent's database connection.  However it sounded like Dan might be
doing fork without exec, in which case he's definitely at risk ...
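Tom's point can be seen without a database. In the sketch below, a socketpair stands in for libpq's connection socket. That substitution is an assumption made for illustration. The socket is marked FD_CLOEXEC, and then the process forks without exec. The flag only takes effect on exec(), so the child can still write straight into the parent's "connection".

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC);

# A socketpair stands in for libpq's connection socket; this is an
# illustration only, no database is involved.
socketpair(my $conn, my $backend, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$conn->autoflush(1);

# Mark the "connection" close-on-exec, as libpq does for its socket.
my $flags = fcntl($conn, F_GETFD, 0) or die "F_GETFD: $!";
fcntl($conn, F_SETFD, $flags | FD_CLOEXEC) or die "F_SETFD: $!";

my $pid = fork() // die "fork: $!";
if ($pid == 0) {
    # Child: fork() without exec(), so FD_CLOEXEC never fires and the
    # inherited descriptor is still open and writable.
    print {$conn} "written by child $$\n";
    exit 0;
}
waitpid($pid, 0);

# The "backend" end receives the child's bytes on the parent's connection.
my $line = <$backend>;
print "backend received: $line";
```

Perl itself already marks descriptors above $^F (normally 2) close-on-exec, so the explicit fcntl mostly mirrors what libpq does. Either way, it offers no protection in the fork-only case.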

            regards, tom lane

Re: Sharing database handles across forked child processes

From

dan@sidhe.org

Date:

13 November 2007, 14:18:35

> Martijn van Oosterhout <kleptog@svana.org> writes:
>> On Tue, Nov 13, 2007 at 12:02:31PM -0500, dan@sidhe.org wrote:
>>> How does Postgres handle sharing database handles across child processes?
>>> That is, if I have a process that opens a connection to the database and
>>> then forks a few child processes, what happens?
>>>
>>> Can the child processes safely use the handle?
>
>> No.
>
> For some time now, libpq has set FD_CLOEXEC on the socket connection to
> the backend, which ensures that child processes won't be able to mess up
> the parent's database connection.  However it sounded like Dan might be
> doing fork without exec, in which case he's definitely at risk ...

Yep, this is a fork without exec. And the child processes often aren't
even doing any database access -- the database connection's opened and
held, then a child is forked off, and the child 'helpfully' closes the
handle during the child's global destruction phase.

Am I at any risk in the parent process? That is, if the parent's got some
transaction open, the child is forked, then the child either issues
(perhaps in error) a command to the database or shuts the handle down, am
I going to see any sort of corruption of the data on the back end?

I fully realize this is a Bad Thing, no argument there -- I'm just trying
to get a feel for my failure modes. If it's just going to be that the
parent sees the handle go away that's one thing, if I'm going to see weird
interleaving of commands from the parent and child or the back end's going
to get confused enough to corrupt the database it's something else
entirely.

-Dan

Re: Sharing database handles across forked child processes

From

"Greg Sabino Mullane"

Date:

13 November 2007, 14:37:53

> Yep, this is a fork without exec. And the child processes often aren't
> even doing any database access -- the database connection's opened and
> held, then a child is forked off, and the child 'helpfully' closes the
> handle during the child's global destruction phase.
>
> Am I at any risk in the parent process?

Yes. But there is an easy solution, assuming you are using DBI:

$dbh->{InactiveDestroy} = 1;

This tells DBI not to do anything special when inside of DESTROY. Set it
in the kids immediately after forking.

> "the child processes often aren't even doing any database access"
                       ^^^^^^^^^^^^

Often aren't? This should be "never", period, unless the parent contracts
to stop doing database access after the fork. You can't have two processes
sharing a handle.

Note also that InactiveDestroy should not be your first choice. Far
better to do the forking before the database connection whenever possible.
If they both need access, you can also disconnect, fork, and have both
reconnect afterwards.

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200711131332
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

Re: Sharing database handles across forked child processes

From

Martijn van Oosterhout

Date:

13 November 2007, 14:41:33

On Tue, Nov 13, 2007 at 01:18:25PM -0500, dan@sidhe.org wrote:
> Yep, this is a fork without exec. And the child processes often aren't
> even doing any database access -- the database connection's opened and
> held, then a child is forked off, and the child 'helpfully' closes the
> handle during the child's global destruction phase.

Yes, that happens.

> Am I at any risk in the parent process? That is, if the parent's got some
> transaction open, the child is forked, then the child either issues
> (perhaps in error) a command to the database or shuts the handle down, am
> I going to see any sort of corruption of the data on the back end?

Well, corruption of the backend is unlikely, but you're likely to get
a lot of strange errors on your other connections. That's why I
suggested closing the filehandle behind the library's back. Or
better, dup2() /dev/null over the top of it. Then during global
destruction it'll just think the DB went away.

It's a bit of a hack though.
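The dup2() trick can be sketched in plain Perl using the real POSIX::dup2. A socketpair stands in for the database connection, and a print in the child simulates the "helpful" shutdown during global destruction. Both are assumptions made for illustration. After the dup2(), the child's copy of the descriptor number points at /dev/null, so its goodbye never reaches the parent's connection.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;
use POSIX qw(dup2);

# A socketpair stands in for the database connection (illustration only).
socketpair(my $conn, my $backend, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$conn->autoflush(1);

my $pid = fork() // die "fork: $!";
if ($pid == 0) {
    # Child: point the inherited descriptor number at /dev/null, behind
    # the back of whatever object owns $conn.
    open(my $null, '+<', '/dev/null') or die "open /dev/null: $!";
    dup2(fileno($null), fileno($conn)) // die "dup2: $!";
    # Simulated shutdown during global destruction: these bytes now go
    # to /dev/null instead of the parent's connection.
    print {$conn} "TERMINATE\n";
    exit 0;
}
waitpid($pid, 0);

# The parent's connection is untouched and still usable.
print {$conn} "parent query\n";
my $first = <$backend>;
print "backend saw first: $first";
```

Opening /dev/null read-write means the child's writes succeed silently rather than failing. With a real driver, whether it then "thinks the DB went away" depends on what it does next on that descriptor.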

> I fully realize this is a Bad Thing, no argument there -- I'm just trying
> to get a feel for my failure modes. If it's just going to be that the
> parent sees the handle go away that's one thing, if I'm going to see weird
> interleaving of commands from the parent and child or the back end's going
> to get confused enough to corrupt the database it's something else
> entirely.

I think the effect is comparable to two people typing into the same
shell, and each only getting half the output back. Sure, you're unlikely
to lose anything big, but do you want to risk it?

Have a nice day,

Re: Sharing database handles across forked child processes

From

Vivek Khera

Date:

13 November 2007, 15:20:01

On Nov 13, 2007, at 1:18 PM, dan@sidhe.org wrote:

> Yep, this is a fork without exec. And the child processes often aren't
> even doing any database access -- the database connection's opened and
> held, then a child is forked off, and the child 'helpfully' closes the
> handle during the child's global destruction phase.

What's your programming language?  If it is Perl using the DBI, you
*must* close the handle in the child; otherwise Perl's object destruction
will try to close the handle by doing a shutdown on the connection, which
will muck up your parent.  The voodoo to make this happen is this:

  $dbh->{InactiveDestroy} = 1;
  $dbh = undef;

Also note that for some reason, this invalidates any prepared
statements in the parent DBI object, so you need to make sure you
don't have any, or just re-open the handle on the parent too.
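The failure mode and the voodoo above can be modelled without a database. `ToyHandle` is a hypothetical class, not DBI. Its DESTROY writes a goodbye message on the shared socket, loosely mimicking a driver's shutdown, unless its `InactiveDestroy` key is set. The child applies the same two lines before exiting.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;

# Toy stand-in for a DBI handle (NOT DBI's implementation): DESTROY sends
# a goodbye on the connection unless InactiveDestroy has been set.
package ToyHandle;
sub new {
    my ($class, $sock) = @_;
    return bless { sock => $sock, InactiveDestroy => 0 }, $class;
}
sub DESTROY {
    my $self = shift;
    return if $self->{InactiveDestroy};   # leave the connection alone
    print { $self->{sock} } "TERMINATE from $$\n";
}

package main;

socketpair(my $conn, my $backend, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$conn->autoflush(1);
my $dbh = ToyHandle->new($conn);

my $pid = fork() // die "fork: $!";
if ($pid == 0) {
    # The same two lines as above, applied in the child.
    $dbh->{InactiveDestroy} = 1;
    $dbh = undef;                          # DESTROY runs now and does nothing
    exit 0;
}
waitpid($pid, 0);

print {$conn} "parent query\n";
my $first = <$backend>;
print "backend saw first: $first";

# The parent really is finished with its handle, so letting DESTROY send
# its goodbye here is the legitimate case.
undef $dbh;
```

Remove the `InactiveDestroy` line from the child and the backend's first line becomes the child's TERMINATE message, arriving ahead of the parent's query.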

Re: Sharing database handles across forked child processes

From

dan@sidhe.org

Date:

13 November 2007, 15:56:33

>> Yep, this is a fork without exec. And the child processes often aren't
>> even doing any database access -- the database connection's opened and
>> held, then a child is forked off, and the child 'helpfully' closes the
>> handle during the child's global destruction phase.
>>
>> Am I at any risk in the parent process?
>
> Yes. But there is an easy solution, assuming you are using DBI:
>
> $dbh->{InactiveDestroy} = 1;
>
> This tells DBI not to do anything special when inside of DESTROY. Set
> on the kids immediately after forking.

I don't currently have a wedge into the parts of the programs that're
forking. I'd hoped to avoid having to, but at this point I'm thinking that
was a touch naive. (I'm also thinking I may want to hassle Rafael into
putting a post-fork handler into 5.10, but that's a separate issue)

>> "the child processes often aren't even doing any database access"
>                        ^^^^^^^^^^^^
>
> Often aren't? This should be "never", period, unless the parent contracts
> to stop doing database access after the fork. You can't have two processes
> sharing a handle.

The child processes are supposed to get their own handles; there's some
caching involved, but the cache checks pids. That doesn't mean the
children all do get their own handles, just that they're supposed to.

Regardless, at this point I'm sufficiently convinced that things will
potentially be bad (or at least annoying) enough that it warrants fixing
it now, rather than just putting it off and relying on error traps.

-Dan