Thread: PATCH: pgagent fix for 'Caught unhandled unknown exception; terminating' bug in SQL steps

Hi!

Running pgagent 3.4.0 on Debian jessie with an SQL jobstep crashes the
worker thread with the error message

Caught unhandled unknown exception; terminating

Both job and jobstep status then remain as 'r' (running) until the
pgagent process quits. After restarting pgagent, which cleans up the
job status and sets it to 'd' (aborted), the job containing the SQL
jobstep is free to run again, but the next run will end up in the same
deadlock situation.

I have tracked down the problem to DBconn::GetLastError(), which gets
called after each SQL jobstep execution in Job::Execute(): there,
DBconn::GetLastError() throws an exception whenever the last error
message is empty.

The attached patch fixes this problem by adding missing bounds checks
to DBconn::GetLastError().

Here is a log of pgagent with an SQL jobstep:

$ pgagent -l 2 -f hostaddr=127.0.0.1 port=5432 dbname=pgdb user=pgdb
DEBUG: Creating primary connection
DEBUG: Connection Information:
DEBUG:      user         : pgdb
DEBUG:      port         : 5432
DEBUG:      host         : 127.0.0.1
DEBUG:      dbname       : pgdb
DEBUG:      password     :
DEBUG:      conn timeout : 0
DEBUG: Connection Information:
DEBUG:      user         : pgdb
DEBUG:      port         : 5432
DEBUG:      host         : 127.0.0.1
DEBUG:      dbname       : pgdb
DEBUG:      password     :
DEBUG:      conn timeout : 0
DEBUG: Creating DB connection: user=pgdb port=5432 hostaddr=127.0.0.1 dbname=pgdb
DEBUG: Database sanity check
DEBUG: Clearing zombies
DEBUG: Checking for jobs to run
DEBUG: Sleeping...
DEBUG: Clearing inactive connections
DEBUG: Connection stats: total - 1, free - 0, deleted - 0
DEBUG: Checking for jobs to run
DEBUG: Sleeping...
[...]
DEBUG: Clearing inactive connections
DEBUG: Connection stats: total - 1, free - 0, deleted - 0
DEBUG: Checking for jobs to run
DEBUG: Sleeping...
DEBUG: Connection stats: total - 1, free - 0, deleted - 0
DEBUG: Checking for jobs to run
DEBUG: Creating job thread for job 5
DEBUG: Creating DB connection: user=pgdb port=5432 hostaddr=127.0.0.1 dbname=pgdb
DEBUG: Allocating new connection to database pgdb
DEBUG: Starting job: 5
DEBUG: Sleeping...
DEBUG: Creating DB connection: user=pgdb port=5432 hostaddr=127.0.0.1 dbname=pgdb dbname=pgdb
DEBUG: Allocating new connection to database pgdb
DEBUG: Executing SQL step 40 (part of job 5)
*** Caught unhandled unknown exception; terminating
DEBUG: Destroying job thread for job 5
DEBUG: Checking for jobs to run
DEBUG: Sleeping...
DEBUG: Clearing inactive connections
DEBUG: Connection stats: total - 3, free - 0, deleted - 0
DEBUG: Checking for jobs to run
DEBUG: Sleeping...
DEBUG: Clearing inactive connections
DEBUG: Connection stats: total - 3, free - 0, deleted - 0
DEBUG: Checking for jobs to run
DEBUG: Sleeping...
[...]

Attachment
Ashesh, can you do the honours with this please? Thanks!

On Thu, Mar 26, 2015 at 12:49 PM, Thomas Krennwallner
<tk+pgsql@postsubmeta.net> wrote:
> Hi!
>
> Running pgagent 3.4.0 on Debian jessie with an SQL jobstep crashes the
> worker thread with the error message
>
> Caught unhandled unknown exception; terminating
>
> Both job and jobstep status then remain as 'r' (running) until the
> pgagent process quits. After restarting pgagent, which cleans up the
> job status and sets it to 'd' (aborted), the job containing the SQL
> jobstep is free to run again, but the next run will end up in the same
> deadlock situation.
>
> I have tracked down the problem to DBconn::GetLastError(), which gets
> called after each SQL jobstep execution in Job::Execute(): there,
> DBconn::GetLastError() throws an exception whenever the last error
> message is empty.
>
> The attached patch fixes this problem by adding missing bounds checks
> to DBconn::GetLastError().
>
> Here is a log of pgagent with an SQL jobstep:
>
> $ pgagent -l 2 -f hostaddr=127.0.0.1 port=5432 dbname=pgdb user=pgdb
> DEBUG: Creating primary connection
> DEBUG: Connection Information:
> DEBUG:      user         : pgdb
> DEBUG:      port         : 5432
> DEBUG:      host         : 127.0.0.1
> DEBUG:      dbname       : pgdb
> DEBUG:      password     :
> DEBUG:      conn timeout : 0
> DEBUG: Connection Information:
> DEBUG:      user         : pgdb
> DEBUG:      port         : 5432
> DEBUG:      host         : 127.0.0.1
> DEBUG:      dbname       : pgdb
> DEBUG:      password     :
> DEBUG:      conn timeout : 0
> DEBUG: Creating DB connection: user=pgdb port=5432 hostaddr=127.0.0.1 dbname=pgdb
> DEBUG: Database sanity check
> DEBUG: Clearing zombies
> DEBUG: Checking for jobs to run
> DEBUG: Sleeping...
> DEBUG: Clearing inactive connections
> DEBUG: Connection stats: total - 1, free - 0, deleted - 0
> DEBUG: Checking for jobs to run
> DEBUG: Sleeping...
> [...]
> DEBUG: Clearing inactive connections
> DEBUG: Connection stats: total - 1, free - 0, deleted - 0
> DEBUG: Checking for jobs to run
> DEBUG: Sleeping...
> DEBUG: Connection stats: total - 1, free - 0, deleted - 0
> DEBUG: Checking for jobs to run
> DEBUG: Creating job thread for job 5
> DEBUG: Creating DB connection: user=pgdb port=5432 hostaddr=127.0.0.1 dbname=pgdb
> DEBUG: Allocating new connection to database pgdb
> DEBUG: Starting job: 5
> DEBUG: Sleeping...
> DEBUG: Creating DB connection: user=pgdb port=5432 hostaddr=127.0.0.1 dbname=pgdb dbname=pgdb
> DEBUG: Allocating new connection to database pgdb
> DEBUG: Executing SQL step 40 (part of job 5)
> *** Caught unhandled unknown exception; terminating
> DEBUG: Destroying job thread for job 5
> DEBUG: Checking for jobs to run
> DEBUG: Sleeping...
> DEBUG: Clearing inactive connections
> DEBUG: Connection stats: total - 3, free - 0, deleted - 0
> DEBUG: Checking for jobs to run
> DEBUG: Sleeping...
> DEBUG: Clearing inactive connections
> DEBUG: Connection stats: total - 3, free - 0, deleted - 0
> DEBUG: Checking for jobs to run
> DEBUG: Sleeping...
> [...]
>
>
> --
> Sent via pgadmin-hackers mailing list (pgadmin-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgadmin-hackers
>



--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Thu, Mar 26, 2015 at 6:19 PM, Thomas Krennwallner <tk+pgsql@postsubmeta.net> wrote:
Hi!

Running pgagent 3.4.0 on Debian jessie with an SQL jobstep crashes the
worker thread with the error message

Caught unhandled unknown exception; terminating

Both job and jobstep status then remain as 'r' (running) until the
pgagent process quits. After restarting pgagent, which cleans up the
job status and sets it to 'd' (aborted), the job containing the SQL
jobstep is free to run again, but the next run will end up in the same
deadlock situation.

I have tracked down the problem to DBconn::GetLastError(), which gets
called after each SQL jobstep execution in Job::Execute(): there,
DBconn::GetLastError() throws an exception whenever the last error
message is empty.

The attached patch fixes this problem by adding missing bounds checks
to DBconn::GetLastError().
Thanks for the patch.
But - I was thinking about another fix for the same.
I used the wxString::Trim(trimRight=true) function in order to remove the white-spaces from the right side.


Dave,

diff --git a/connection.cpp b/connection.cpp
index 6103c00..ccbb5c7 100644
--- a/connection.cpp
+++ b/connection.cpp
@@ -314,14 +314,7 @@ int DBconn::ExecuteVoid(const wxString &query)
 wxString DBconn::GetLastError()
 {
        // Return the last error message, minus any trailing line ends
-       if (lastError.substr(lastError.length() - 2, 2) == wxT("\r\n")) // DOS
-               return lastError.substr(0, lastError.length() - 2);
-       else if (lastError.substr(lastError.length() - 1, 1) == wxT("\n")) // Unix
-               return lastError.substr(0, lastError.length() - 1);
-       else if (lastError.substr(lastError.length() - 1, 1) == wxT("\r")) // Mac
-               return lastError.substr(0, lastError.length() - 1);
-       else
-               return lastError;
+       return lastError.Trim(true);
 }


Do you think - above diff make sense?
If yes - I will commit the code.

--

Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company


http://www.linkedin.com/in/asheshvashi


Here is a log of pgagent with an SQL jobstep:

$ pgagent -l 2 -f hostaddr=127.0.0.1 port=5432 dbname=pgdb user=pgdb
DEBUG: Creating primary connection
DEBUG: Connection Information:
DEBUG:      user         : pgdb
DEBUG:      port         : 5432
DEBUG:      host         : 127.0.0.1
DEBUG:      dbname       : pgdb
DEBUG:      password     :
DEBUG:      conn timeout : 0
DEBUG: Connection Information:
DEBUG:      user         : pgdb
DEBUG:      port         : 5432
DEBUG:      host         : 127.0.0.1
DEBUG:      dbname       : pgdb
DEBUG:      password     :
DEBUG:      conn timeout : 0
DEBUG: Creating DB connection: user=pgdb port=5432 hostaddr=127.0.0.1 dbname=pgdb
DEBUG: Database sanity check
DEBUG: Clearing zombies
DEBUG: Checking for jobs to run
DEBUG: Sleeping...
DEBUG: Clearing inactive connections
DEBUG: Connection stats: total - 1, free - 0, deleted - 0
DEBUG: Checking for jobs to run
DEBUG: Sleeping...
[...]
DEBUG: Clearing inactive connections
DEBUG: Connection stats: total - 1, free - 0, deleted - 0
DEBUG: Checking for jobs to run
DEBUG: Sleeping...
DEBUG: Connection stats: total - 1, free - 0, deleted - 0
DEBUG: Checking for jobs to run
DEBUG: Creating job thread for job 5
DEBUG: Creating DB connection: user=pgdb port=5432 hostaddr=127.0.0.1 dbname=pgdb
DEBUG: Allocating new connection to database pgdb
DEBUG: Starting job: 5
DEBUG: Sleeping...
DEBUG: Creating DB connection: user=pgdb port=5432 hostaddr=127.0.0.1 dbname=pgdb dbname=pgdb
DEBUG: Allocating new connection to database pgdb
DEBUG: Executing SQL step 40 (part of job 5)
*** Caught unhandled unknown exception; terminating
DEBUG: Destroying job thread for job 5
DEBUG: Checking for jobs to run
DEBUG: Sleeping...
DEBUG: Clearing inactive connections
DEBUG: Connection stats: total - 3, free - 0, deleted - 0
DEBUG: Checking for jobs to run
DEBUG: Sleeping...
DEBUG: Clearing inactive connections
DEBUG: Connection stats: total - 3, free - 0, deleted - 0
DEBUG: Checking for jobs to run
DEBUG: Sleeping...
[...]


--
Sent via pgadmin-hackers mailing list (pgadmin-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgadmin-hackers




On Tue, Mar 31, 2015 at 8:45 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
On Thu, Mar 26, 2015 at 6:19 PM, Thomas Krennwallner <tk+pgsql@postsubmeta.net> wrote:
Hi!

Running pgagent 3.4.0 on Debian jessie with an SQL jobstep crashes the
worker thread with the error message

Caught unhandled unknown exception; terminating

Both job and jobstep status then remain as 'r' (running) until the
pgagent process quits. After restarting pgagent, which cleans up the
job status and sets it to 'd' (aborted), the job containing the SQL
jobstep is free to run again, but the next run will end up in the same
deadlock situation.

I have tracked down the problem to DBconn::GetLastError(), which gets
called after each SQL jobstep execution in Job::Execute(): there,
DBconn::GetLastError() throws an exception whenever the last error
message is empty.

The attached patch fixes this problem by adding missing bounds checks
to DBconn::GetLastError().
Thanks for the patch.
But - I was thinking about another fix for the same.
I used the wxString::Trim(trimRight=true) function in order to remove the white-spaces from the right side.


Dave,

diff --git a/connection.cpp b/connection.cpp
index 6103c00..ccbb5c7 100644
--- a/connection.cpp
+++ b/connection.cpp
@@ -314,14 +314,7 @@ int DBconn::ExecuteVoid(const wxString &query)
 wxString DBconn::GetLastError()
 {
        // Return the last error message, minus any trailing line ends
-       if (lastError.substr(lastError.length() - 2, 2) == wxT("\r\n")) // DOS
-               return lastError.substr(0, lastError.length() - 2);
-       else if (lastError.substr(lastError.length() - 1, 1) == wxT("\n")) // Unix
-               return lastError.substr(0, lastError.length() - 1);
-       else if (lastError.substr(lastError.length() - 1, 1) == wxT("\r")) // Mac
-               return lastError.substr(0, lastError.length() - 1);
-       else
-               return lastError;
+       return lastError.Trim(true);
 }


Do you think - above diff make sense?
If yes - I will commit the code.


That does seem like it would do it. 

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Mar 31, 2015 at 6:20 PM, Dave Page <dpage@pgadmin.org> wrote:


On Tue, Mar 31, 2015 at 8:45 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
On Thu, Mar 26, 2015 at 6:19 PM, Thomas Krennwallner <tk+pgsql@postsubmeta.net> wrote:
Hi!

Running pgagent 3.4.0 on Debian jessie with an SQL jobstep crashes the
worker thread with the error message

Caught unhandled unknown exception; terminating

Both job and jobstep status then remain as 'r' (running) until the
pgagent process quits. After restarting pgagent, which cleans up the
job status and sets it to 'd' (aborted), the job containing the SQL
jobstep is free to run again, but the next run will end up in the same
deadlock situation.

I have tracked down the problem to DBconn::GetLastError(), which gets
called after each SQL jobstep execution in Job::Execute(): there,
DBconn::GetLastError() throws an exception whenever the last error
message is empty.

The attached patch fixes this problem by adding missing bounds checks
to DBconn::GetLastError().
Thanks for the patch.
But - I was thinking about another fix for the same.
I used the wxString::Trim(trimRight=true) function in order to remove the white-spaces from the right side.


Dave,

diff --git a/connection.cpp b/connection.cpp
index 6103c00..ccbb5c7 100644
--- a/connection.cpp
+++ b/connection.cpp
@@ -314,14 +314,7 @@ int DBconn::ExecuteVoid(const wxString &query)
 wxString DBconn::GetLastError()
 {
        // Return the last error message, minus any trailing line ends
-       if (lastError.substr(lastError.length() - 2, 2) == wxT("\r\n")) // DOS
-               return lastError.substr(0, lastError.length() - 2);
-       else if (lastError.substr(lastError.length() - 1, 1) == wxT("\n")) // Unix
-               return lastError.substr(0, lastError.length() - 1);
-       else if (lastError.substr(lastError.length() - 1, 1) == wxT("\r")) // Mac
-               return lastError.substr(0, lastError.length() - 1);
-       else
-               return lastError;
+       return lastError.Trim(true);
 }


Do you think - above diff make sense?
If yes - I will commit the code.


That does seem like it would do it. 
Thanks.
I will commit the changes.

--

Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company


http://www.linkedin.com/in/asheshvashi

 

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

On Tue, Mar 31, 2015 at 6:26 PM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
On Tue, Mar 31, 2015 at 6:20 PM, Dave Page <dpage@pgadmin.org> wrote:


On Tue, Mar 31, 2015 at 8:45 AM, Ashesh Vashi <ashesh.vashi@enterprisedb.com> wrote:
On Thu, Mar 26, 2015 at 6:19 PM, Thomas Krennwallner <tk+pgsql@postsubmeta.net> wrote:
Hi!

Running pgagent 3.4.0 on Debian jessie with an SQL jobstep crashes the
worker thread with the error message

Caught unhandled unknown exception; terminating

Both job and jobstep status then remain as 'r' (running) until the
pgagent process quits. After restarting pgagent, which cleans up the
job status and sets it to 'd' (aborted), the job containing the SQL
jobstep is free to run again, but the next run will end up in the same
deadlock situation.

I have tracked down the problem to DBconn::GetLastError(), which gets
called after each SQL jobstep execution in Job::Execute(): there,
DBconn::GetLastError() throws an exception whenever the last error
message is empty.

The attached patch fixes this problem by adding missing bounds checks
to DBconn::GetLastError().
Thanks for the patch.
But - I was thinking about another fix for the same.
I used the wxString::Trim(trimRight=true) function in order to remove the white-spaces from the right side.


Dave,

diff --git a/connection.cpp b/connection.cpp
index 6103c00..ccbb5c7 100644
--- a/connection.cpp
+++ b/connection.cpp
@@ -314,14 +314,7 @@ int DBconn::ExecuteVoid(const wxString &query)
 wxString DBconn::GetLastError()
 {
        // Return the last error message, minus any trailing line ends
-       if (lastError.substr(lastError.length() - 2, 2) == wxT("\r\n")) // DOS
-               return lastError.substr(0, lastError.length() - 2);
-       else if (lastError.substr(lastError.length() - 1, 1) == wxT("\n")) // Unix
-               return lastError.substr(0, lastError.length() - 1);
-       else if (lastError.substr(lastError.length() - 1, 1) == wxT("\r")) // Mac
-               return lastError.substr(0, lastError.length() - 1);
-       else
-               return lastError;
+       return lastError.Trim(true);
 }


Do you think - above diff make sense?
If yes - I will commit the code.


That does seem like it would do it. 
Thanks.
I will commit the changes.
I have committed the changes.
Thanks Thomas Krennwallner for the report and for the patch too.

--

Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company


http://www.linkedin.com/in/asheshvashi


--

Thanks & Regards,

Ashesh Vashi
EnterpriseDB INDIA: Enterprise PostgreSQL Company


http://www.linkedin.com/in/asheshvashi

 

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company