Steinar H. Gunderson wrote:
> On Wed, Jul 12, 2006 at 08:43:18AM -0700, Craig A. James wrote:
>>> Then you killed the wrong backend...
>>> No queries run in postmaster. They all run in postgres backends. The
>>> postmaster does very little actual work, other than keeping track of
>>> everybody else.
>> It turns out I was confused by this: ps(1) reports a process called
>> "postgres", but top(1) reports a process called "postmaster", but they both
>> have the same pid. I guess postmaster replaces its own name in the process
>> table when it's executing a query, and it's not really the postmaster even
>> though top(1) calls it postmaster.
>>
>> So "kill -15 <pid>" is NOT killing the process -- to kill the process, I
>> have to use signal 9. But if I do that, ALL queries in progress are
>> aborted. I might as well shut down and restart the database, which is an
>> unacceptable solution for a web site.
>
> I don't follow your logic here. If you do "kill -15 <pid>" of the postmaster
> doing the work, the query should be aborted without taking down the entire
> cluster. I don't see why you'd need -9 (which is a really bad idea anyhow)...
I've solved this mystery. "kill -15" doesn't kill the job immediately -- it aborts the query, but the cleanup might take
15-30 seconds.
This confused me, because the query I was using to test took about 30 seconds, so the SIGTERM didn't seem to make a
difference. But when I used a harder query, one that would run for 5-10 minutes, SIGTERM still stopped it after 15
seconds, which isn't great but is acceptable.
Bottom line is that I was expecting "instant death" with SIGTERM, but instead got an agonizing, drawn-out -- but safe
-- death of the query. At least that's my deduction based on experiments. I haven't dug into the source to confirm.
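
For anyone who wants to repeat the experiment, here is a rough sketch of what I mean: send SIGTERM to one process, then poll until it is really gone and report how long that took. It is generic POSIX signal handling, not anything PostgreSQL-specific. The function names, the 60-second timeout and the polling interval are all made up for illustration, and it deliberately never escalates to signal 9.

```python
import os
import signal
import time


def pid_exists(pid):
    """Return True if a process with this pid is still in the process table."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, it sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def terminate_and_wait(pid, timeout=60.0, poll_interval=0.5):
    """Send SIGTERM, then wait for the process to exit.

    Returns the seconds it took to disappear, or None if it was still
    alive after `timeout` seconds. The 60-second default is an assumed
    value, chosen only to leave room for a 15-30 second cleanup.
    """
    start = time.monotonic()
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return 0.0  # already gone before we signalled it
    while time.monotonic() - start < timeout:
        if not pid_exists(pid):
            return time.monotonic() - start
        time.sleep(poll_interval)
    return None  # still running; the caller decides what to do next
```

Calling terminate_and_wait(pid) with the pid of the process running the query shows how long the cleanup really takes, instead of assuming the signal had no effect. One caveat: if the target is a child of the calling process itself, it lingers as a zombie until the parent reaps it, so the wait only ends once that happens.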
Thanks everyone for your answers. My "kill this query" feature is now acceptable.
Craig