Thread: Trouble in paradise: cancel via postmaster ain't so cool

Trouble in paradise: cancel via postmaster ain't so cool

From
Tom Lane
Date:
Well, I've got this new code running, and it works.  Sort of.
The postmaster and backend seem to be fine ... but psql has a tendency
to coredump right after sending a cancel request.

After digging into it, I realized that the problem is that psql.c is
coded to invoke PQrequestCancel() directly from its SIGINT signal
handler.  That was cool when the only thing PQrequestCancel() did
was to invoke send().

But now, PQrequestCancel requires allocating memory, opening a new
connection, sending some data, closing the connection, and freeing
memory.

On my machine, the C library is not reentrant, and if you try to do
this sort of stuff from a signal handler that has interrupted a call
to malloc() or printf() or some such, you can expect to crash.

I can see several alternatives, none very attractive:

1. Try to code the new PQrequestCancel so that it doesn't invoke
any likely-non-reentrant part of the C library.  Difficult at best,
maybe impossible (is gethostbyname reentrant?  I doubt it if malloc
isn't).

2. Live with PQrequestCancel not being reentrant: code apps using it
to invoke it from main line not a signal handler.  The trouble is that
this makes it *substantially* harder to use.  In psql.c, for example,
we could no longer use plain PQexec; we'd have to write some kind of
loop around the more primitive libpq functions, so that control would
block out in psql.c while waiting for a backend response, and not down
in the guts of libpq.

3. Keep a connection to the postmaster open at all times so that
PQrequestCancel only needs to do a send() and not any of the hard
stuff.  This is not good because it risks overflowing the number of
open files the postmaster process can have at one time.  It also means
establishing two IPC connections not one during backend startup, which
is clearly a performance loss.

4. Stick with OOB-based cancels and live with the portability
limitations thereof.

I will work on #1 but I am not very hopeful of success.  Has anyone
got a better idea?

            regards, tom lane

Re: [HACKERS] Trouble in paradise: cancel via postmaster ain't so cool

From
"Henry B. Hotz"
Date:
At 3:46 PM -0700 7/7/98, Tom Lane wrote:
>I can see several alternatives, none very attractive:
>
>1. Try to code the new PQrequestCancel so that it doesn't invoke
>any likely-non-reentrant part of the C library.  Difficult at best,
>maybe impossible (is gethostbyname reentrant?  I doubt it if malloc
>isn't).
...

>I will work on #1 but I am not very hopeful of success.  Has anyone
>got a better idea?

Idea A:  precompute everything you need to do a cancel as part of sending
the request in the first place so #1 above takes minimum effort (i.e. no
malloc(), no gethostbyname(), no nothing).

Idea B:  spawn (vfork()/exec()) a cancel process so all the funny stuff
happens in a different address space.

Idea C:  look at what some standard network clients do to handle similar
problems.  What does ftp do for example?  It also seems like some network
programming textbooks, like Stevens, should discuss this problem.

Signature failed Preliminary Design Review.
Feasibility of a new signature is currently being evaluated.
h.b.hotz@jpl.nasa.gov, or hbhotz@oxy.edu



"Henry B. Hotz" <hotz@jpl.nasa.gov> writes:
> Idea A:  precompute everything you need to do a cancel as part of sending
> the request in the first place so #1 above takes minimum effort (i.e. no
> malloc(), no gethostbyname(), no nothing).

Yeah, that's what I planned to try.  It'll mean making the PGconn
structure a little bigger to hold the info, but that seems OK.

> Idea B:  spawn (vfork()/exec()) a cancel process so all the funny stuff
> happens in a different address space.

Hadn't thought of that one... it'd have to be a real fork not a vfork,
consequently could be pretty expensive for a large application.
Still it might be a better answer than living with a nonreentrant
PQrequestCancel().

After sleeping on it, I feel like it should be possible to solve the
problem along the lines Henry mentions: have connectDB save everything
that it needs to get from C library routines, so that only kernel calls
are needed to open a new postmaster connection in PQrequestCancel.
Will work on that.

            regards, tom lane