(sorry for the late reply, this fell through the cracks...)
On 10.08.2011 11:48, Dave Page wrote:
> On Thu, Aug 4, 2011 at 8:30 PM, Merlin Moncure<mmoncure@gmail.com> wrote:
>> On Thu, Aug 4, 2011 at 2:19 PM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> I created 100 identical pgagent jobs, with one step that simply does "SELECT
>>> pg_sleep(10)". I then forced them all to run immediately, with "UPDATE
>>> pgagent.pga_job SET jobnextrun=now();". pgagent crashed.
>>>
>>> What happened is that when all those jobs were launched at the same
>>> time, the server ran into the max_connections limit, and pgagent
>>> didn't handle that well. The JobThread::JobThread constructor does
>>> not check for a NULL result from DBConn::Get(), and passes a NULL
>>> connection to Job::Job, which tries to dereference it, leading to a
>>> segfault.
>>>
>>> I propose the attached patch.
>>
>> Hm, in the event that happens, is it logged on the client somehow?
>> Wouldn't you want to throw an exception or something like that?
>
> I think the most straightforward way to handle this is to dump an
> error into pgagent.pga_joblog when deleting the thread. It might be a
> little ugly to pass the original error message back rather than a
> generic one, though. Can you take a look, Heikki?
You mean something like the attached? It works for me, but inserting a
joblog entry for each failed connection attempt might create a lot of
entries there if the problem persists.
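
For the archives, the patch does roughly the following. This is a
simplified sketch rather than the literal diff: the exact DBConn::Get()
arguments, the Job/JobThread members, the serviceConn handle, and the
pga_joblog columns and the 'i' (internal error) status code are written
from memory and may differ from the real code.

  /* 1. Don't construct a Job around a NULL connection. */
  JobThread::JobThread(const wxString &jid)
      : wxThread(wxTHREAD_DETACHED), jobid(jid), job(NULL)
  {
      // DBConn::Get() returns NULL when the connection attempt fails,
      // e.g. when the server has hit max_connections.
      DBConn *conn = DBConn::Get(jobid);
      if (conn != NULL)
          job = new Job(conn, jobid);
      // else: leave job NULL; the failure is reported when the thread
      // is deleted, see below.
  }

  /* 2. When the thread is deleted without the job ever having been
   *    created, record the failure in pgagent.pga_joblog so that it
   *    shows up in the job's history. */
  JobThread::~JobThread()
  {
      if (job == NULL)
      {
          serviceConn->ExecuteVoid(
              wxT("INSERT INTO pgagent.pga_joblog (jlgjobid, jlgstatus) ")
              wxT("VALUES (") + jobid + wxT(", 'i')"));
      }
  }

The downside, as said above, is that a persistent connection problem
produces one such row per failed attempt, so some rate limiting of the
retries (or of the logging) might be worth considering as a follow-up.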
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com