Thread: Trouble in paradise: cancel via postmaster ain't so cool
Well, I've got this new code running, and it works. Sort of. The postmaster and backend seem to be fine ... but psql has a tendency to coredump right after sending a cancel request. After digging into it, I realized that the problem is that psql.c is coded to invoke PQrequestCancel() directly from its SIGINT signal handler. That was cool when the only thing PQrequestCancel() did was to invoke send(). But now, PQrequestCancel requires allocating memory, opening a new connection, sending some data, closing the connection, and freeing memory. On my machine, the C library is not reentrant, and if you try to do this sort of stuff from a signal handler that has interrupted a call to malloc() or printf() or some such, you can expect to crash. I can see several alternatives, none very attractive: 1. Try to code the new PQrequestCancel so that it doesn't invoke any likely-non-reentrant part of the C library. Difficult at best, maybe impossible (is gethostbyname reentrant? I doubt it if malloc isn't). 2. Live with PQrequestCancel not being reentrant: code apps using it to invoke it from main line not a signal handler. The trouble is that this makes it *substantially* harder to use. In psql.c, for example, we could no longer use plain PQexec; we'd have to write some kind of loop around the more primitive libpq functions, so that control would block out in psql.c while waiting for a backend response, and not down in the guts of libpq. 3. Keep a connection to the postmaster open at all times so that PQrequestCancel only needs to do a send() and not any of the hard stuff. This is not good because it risks overflowing the number of open files the postmaster process can have at one time. It also means establishing two IPC connections not one during backend startup, which is clearly a performance loss. 4. Stick with OOB-based cancels and live with the portability limitations thereof. I will work on #1 but I am not very hopeful of success. Has anyone got a better idea? regards, tom lane
At 3:46 PM -0700 7/7/98, Tom Lane wrote: >I can see several alternatives, none very attractive: > >1. Try to code the new PQrequestCancel so that it doesn't invoke >any likely-non-reentrant part of the C library. Difficult at best, >maybe impossible (is gethostbyname reentrant? I doubt it if malloc >isn't). ... >I will work on #1 but I am not very hopeful of success. Has anyone >got a better idea? Idea A: precompute everything you need to do a cancel as part of sending the request in the first place so #1 above takes minimum effort (i.e. no malloc(), no gethostbyname(), no nothing). Idea B: spawn (vfork()/exec()) a cancel process so all the funny stuff happens in a different address space. Idea C: look at what some standard network clients do to handle similar problems. What does ftp do for example? It also seems like some network programming textbooks, like Stevens, should discuss this problem. Signature failed Preliminary Design Review. Feasibility of a new signature is currently being evaluated. h.b.hotz@jpl.nasa.gov, or hbhotz@oxy.edu
"Henry B. Hotz" <hotz@jpl.nasa.gov> writes: > Idea A: precompute everything you need to do a cancel as part of sending > the request in the first place so #1 above takes minimum effort (i.e. no > malloc(), no gethostbyname(), no nothing). Yeah, that's what I planned to try. It'll mean making the PGconn structure a little bigger to hold the info, but that seems OK. > Idea B: spawn (vfork()/exec()) a cancel process so all the funny stuff > happens in a different address space. Hadn't thought of that one... it'd have to be a real fork not a vfork, consequently could be pretty expensive for a large application. Still it might be a better answer than living with a nonreentrant PQrequestCancel(). After sleeping on it, I feel like it should be possible to solve the problem along the lines Henry mentions: have connectDB save everything that it needs to get from C library routines, so that only kernel calls are needed to open a new postmaster connection in PQrequestCancel. Will work on that. regards, tom lane