On Thu, Aug 4, 2011 at 8:30 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Thu, Aug 4, 2011 at 2:19 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> I created 100 identical pgagent jobs, with one step that simply does "SELECT
>> pg_sleep(10)". I then forced them all to run immediately, with "UPDATE
>> pgagent.pga_job SET jobnextrun=now();". pgagent crashed.
>>
>> What happened is that when all those jobs are launched at the same time,
>> the server ran into the max_connections limit, and pgagent didn't handle
>> that too well. JobThread::JobThread constructor does not check for NULL
>> result from DBConn::Get(), and passes a NULL connection to Job::Job, which
>> tries to reference it, leading to a segfault.
>>
>> I propose the attached patch.
>
> hm, in the event that happens, is that logged in the client somehow?
> wouldn't you want to throw an exception or something like that?

I think the most straightforward way to handle this is to write an
error into pgagent.pga_joblog when deleting the thread. It might be a
little ugly to pass the original error message back rather than a
generic one, though. Can you take a look, Heikki?
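
A rough sketch of that idea, purely for illustration. ConnPool, JobLog
and JobThreadSketch are made-up stand-ins, not pgagent's actual classes,
and the error text is only an example. The point is the shape: check the
connection for NULL in the constructor, keep the original error, and
write it to the log when the thread is torn down.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not pgagent's real types.
struct Conn {};

struct ConnPool {
    int available;          // connections left before the limit is hit
    std::string lastError;  // message from the most recent failed attempt

    // Returns nullptr when no connection can be opened, mirroring the
    // NULL result from DBConn::Get() described in the report.
    std::unique_ptr<Conn> get() {
        if (available <= 0) {
            lastError = "sorry, too many clients already";  // example text
            return nullptr;
        }
        --available;
        return std::unique_ptr<Conn>(new Conn());
    }
};

// Stand-in for rows that would be written to pgagent.pga_joblog.
struct JobLog {
    std::vector<std::string> rows;
};

class JobThreadSketch {
public:
    JobThreadSketch(ConnPool &pool, int jobId)
        : jobId_(jobId), conn_(pool.get()) {
        // Check for a missing connection here instead of handing a
        // null pointer on to the job object.
        if (!conn_)
            error_ = pool.lastError;
    }

    // The caller should only start the job when this returns true.
    bool runnable() const { return conn_ != nullptr; }

    // Called when the thread is being deleted: record why the job
    // never started, passing the original error message through.
    void finish(JobLog &log) const {
        if (!conn_)
            log.rows.push_back("job " + std::to_string(jobId_) +
                               " not started: " + error_);
    }

private:
    int jobId_;
    std::unique_ptr<Conn> conn_;
    std::string error_;
};
```

With a pool that has one connection left, the first job would be
runnable and the second would leave a single log row carrying the
original error text rather than a generic one.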
--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company