Thread: psycopg2 conn.poll() hangs when used with sshtunnel

psycopg2 conn.poll() hangs when used with sshtunnel

From
Akshay Joshi
Date:

Hello

I am using psycopg2 2.7.4 with asynchronous support to connect to a PostgreSQL database server, and that works absolutely fine. In pgAdmin4 I also use sshtunnel v0.1.3. When I connect to the PostgreSQL server through the SSH tunnel and run a valid query, everything works; but when I run an invalid query (for example, one referencing a nonexistent column of a table), my application hangs in the conn.poll() function.

Please refer to the following code to see how we use conn.poll() with timeouts: https://git.postgresql.org/gitweb/?p=pgadmin4.git;a=blob;f=web/pgadmin/utils/driver/psycopg2/connection.py;h=4f11c12b30882209c308cb3558e67189c97ea31e;hb=15fe26a7106610b710f3de5b604cd038302c926a#l1363
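As background for the poll-with-timeouts approach mentioned above, here is a minimal sketch of an asynchronous wait loop with a deadline, modelled on the `wait()` helper in the psycopg2 asynchronous-support documentation. It is not pgAdmin4's code; `wait_with_timeout` and its `timeout` parameter are illustrative names. One caveat is relevant to this thread: the timeout only bounds the time spent in `select()`, so it cannot interrupt a `conn.poll()` call that blocks internally.

```python
import select
import time

try:
    from psycopg2.extensions import POLL_OK, POLL_READ, POLL_WRITE
except ImportError:
    # Fallback so the sketch runs without psycopg2; these match the
    # values psycopg2 documents for its POLL_* constants.
    POLL_OK, POLL_READ, POLL_WRITE = 0, 1, 2


def wait_with_timeout(conn, timeout):
    """Drive an async connection until conn.poll() returns POLL_OK.

    Raises TimeoutError if the socket is not ready before the deadline.
    Errors from a failed query are raised by conn.poll() and propagate.
    The deadline is only checked around select(): if conn.poll() itself
    blocks inside libpq, this function blocks with it.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = conn.poll()
        if state == POLL_OK:
            return
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("timed out waiting for the server")
        fd = conn.fileno()
        if state == POLL_READ:
            ready = select.select([fd], [], [], remaining)[0]
        elif state == POLL_WRITE:
            ready = select.select([], [fd], [], remaining)[1]
        else:
            raise RuntimeError("unexpected poll() state: %r" % (state,))
        if not ready:
            raise TimeoutError("timed out waiting for the server")
```

With a real connection, the usual pattern (assuming `dsn` and `sql` are your own values) would be `conn = psycopg2.connect(dsn, async_=True)`, then `wait_with_timeout(conn, 10)`, then `cur = conn.cursor()`, `cur.execute(sql)` and `wait_with_timeout(conn, 10)` again.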

Can anyone please provide some pointers or suggestions?


--
Akshay Joshi
Sr. Software Architect


Phone: +91 20-3058-9517
Mobile: +91 976-788-8246

Re: psycopg2 conn.poll() hangs when used with sshtunnel

From
Akshay Joshi
Date:
Hi Team 

An issue (https://github.com/psycopg/psycopg2/issues/781) has already been opened about conn.poll() hanging when a COPY command is executed in async mode. Similarly, conn.poll() hangs when we run an invalid query in async mode through sshtunnel v0.1.3.

Can someone please suggest what goes wrong when sshtunnel is used, or give some pointers on why the poll() method hangs?

On Fri, Sep 28, 2018 at 5:43 PM Akshay Joshi <akshay.joshi@enterprisedb.com> wrote:

Re: psycopg2 conn.poll() hangs when used with sshtunnel

From
Akshay Joshi
Date:
Hi Team 

I have created a sample application to reproduce the issue. It is attached; please read the "README.txt" file to get started.
Can someone please suggest why the poll() function hangs when an invalid query is run through sshtunnel v0.1.4?

On Mon, Oct 8, 2018 at 12:19 PM Akshay Joshi <akshay.joshi@enterprisedb.com> wrote:

Attachment

Re: psycopg2 conn.poll() hangs when used with sshtunnel

From
Daniele Varrazzo
Date:
Taking a look. Thank you for the test.

On Tue, Oct 30, 2018 at 12:36 PM Akshay Joshi <akshay.joshi@enterprisedb.com> wrote:

Re: psycopg2 conn.poll() hangs when used with sshtunnel

From
Daniele Varrazzo
Date:
It seems to me that the problem is in the pq_get_last_result() function in pqpath.c. Its loop assumes that PQgetResult() won't block, which seems to always be the case with a direct connection; but when connecting through the tunnel, PQisBusy() actually returns 1, meaning that PQgetResult() would block waiting for more input.

So, at first read, the simplification provided by pq_get_last_result() is broken: the results should be returned one by one in the normal loop that goes through PQisBusy(), PQconsumeInput(), and the polling machinery. It looks like a chunky refactoring, but as things are now I don't like pq_get_last_result() anymore :(
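To make this diagnosis concrete, here is a toy Python model (not psycopg's C code, and not a real libpq binding). `FakePGconn` is a hypothetical stand-in whose methods play the roles of libpq's `PQconsumeInput()`, `PQisBusy()` and `PQgetResult()`; where the real `PQgetResult()` would block, the model raises `WouldBlock` instead. `last_result_naive` mimics a loop that assumes `PQgetResult()` never blocks, while `poll_step` follows the shape described above: consume input, fetch results only while not busy, and otherwise hand control back to the caller to wait on the socket.

```python
class WouldBlock(Exception):
    """Raised where the real PQgetResult() would block waiting for input."""


class FakePGconn:
    """Hypothetical toy stand-in for a libpq connection (not a real API).

    `results` is what the server sends for one command; the final None is
    the end-of-command marker (PQgetResult() returning NULL). Each
    consume_input() call moves up to `per_read` items off the "socket":
    a direct connection might deliver everything at once, a tunnel may
    deliver it in pieces.
    """

    def __init__(self, results, per_read):
        self._wire = list(results)   # not yet read from the socket
        self._buffer = []            # read, not yet handed out
        self._per_read = per_read

    def consume_input(self):         # plays the role of PQconsumeInput()
        for _ in range(min(self._per_read, len(self._wire))):
            self._buffer.append(self._wire.pop(0))

    def is_busy(self):               # plays the role of PQisBusy()
        return not self._buffer

    def get_result(self):            # plays the role of PQgetResult()
        if self.is_busy():
            raise WouldBlock("get_result() called while busy")
        return self._buffer.pop(0)


def last_result_naive(conn):
    """Loop that assumes get_result() never blocks (the shape criticised above)."""
    last = None
    while True:
        res = conn.get_result()
        if res is None:
            return last
        last = res


def poll_step(conn, out):
    """One poll-driven step: "ok" when the command is done, else "read"."""
    conn.consume_input()
    while not conn.is_busy():
        res = conn.get_result()
        if res is None:
            return "ok"
        out.append(res)
    return "read"   # caller waits for the socket to be readable, then retries
```

In the model, the naive loop succeeds when the whole reply (error result plus end-of-command marker) is already buffered, as tends to happen on a direct connection, and raises `WouldBlock` when the reply arrives in pieces, which is the situation the tunnel seems to trigger.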

-- Daniele 


On Tue, Oct 30, 2018 at 12:52 PM Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:

Re: psycopg2 conn.poll() hangs when used with sshtunnel

From
Akshay Joshi
Date:
Hi

On Tue, 30 Oct 2018, 19:17 Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
> So, at first read, the simplification provided by pq_get_last_result() is broken: the results should be returned one by one in the normal loop that goes through PQisBusy(), PQconsumeInput(), and the polling machinery. It looks like a chunky refactoring, but as things are now I don't like pq_get_last_result() anymore :(

What changes should I make to fix this? Or should it be fixed in psycopg2 itself?

Re: psycopg2 conn.poll() hangs when used with sshtunnel

From
Daniele Varrazzo
Date:
On Tue, Oct 30, 2018 at 1:55 PM Akshay Joshi <akshay.joshi@enterprisedb.com> wrote:
> What changes should I make to fix this? Or should it be fixed in psycopg2 itself?

It is a problem in psycopg. I have a quick fix that I can ship in the next bugfix release if it shows no regressions (https://github.com/psycopg/psycopg2/issues/801). A better fix requires a non-trivial rewrite of the async and green paths (https://github.com/psycopg/psycopg2/issues/802): I'd like to do it, but it may happen later if the band-aid holds.

I was just about to prepare the 2.7.6 release, so if the quick fix is easy and doesn't cause regressions, it will be released soon.

-- Daniele

Re: psycopg2 conn.poll() hangs when used with sshtunnel

From
Akshay Joshi
Date:
Thanks Daniele 

On Tue, 30 Oct 2018, 19:56 Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:

Re: psycopg2 conn.poll() hangs when used with sshtunnel

From
Daniele Varrazzo
Date:
Actually, I have another observation. I was surprised to see that the whole test suite passed when running the connection through an SSH tunnel, since there are definitely failing queries in the test suite. So I ran other tests, and it seems the bad condition only happens when using the `sshtunnel` module. If I open a tunnel manually with something like:

    ssh -L 36421:localhost:5432 -N localhost

and point your script to port 36421, everything works OK.
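One way to apply this workaround from Python, instead of using the sshtunnel module, is to spawn the OpenSSH client as a child process. This is a sketch under assumptions, not something pgAdmin4 does: `ssh_forward_command` and `open_ssh_tunnel` are hypothetical helpers, and it assumes `ssh` is on `PATH` with non-interactive (key-based) authentication. `ExitOnForwardFailure=yes` is a standard OpenSSH option that makes `ssh` exit if the local port cannot be forwarded.

```python
import subprocess


def ssh_forward_command(local_port, remote_host, remote_port, ssh_host):
    """Build argv equivalent to `ssh -L local:remote_host:remote_port -N ssh_host`."""
    return [
        "ssh",
        "-N",                               # no remote command: forward only
        "-o", "ExitOnForwardFailure=yes",   # exit if the -L bind fails
        "-L", "%d:%s:%d" % (local_port, remote_host, remote_port),
        ssh_host,
    ]


def open_ssh_tunnel(local_port, remote_host, remote_port, ssh_host):
    """Start ssh in the background; call .terminate() on the result to close it."""
    cmd = ssh_forward_command(local_port, remote_host, remote_port, ssh_host)
    return subprocess.Popen(cmd, stdin=subprocess.DEVNULL)
```

For the tunnel above, `open_ssh_tunnel(36421, "localhost", 5432, "localhost")` would start it, and psycopg2 would then connect to host `localhost`, port 36421. Since `Popen` returns before ssh has finished setting up the forward, the first connection attempt may need a retry.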


Conversely, I don't seem able to run the test suite through the tunnel opened by the sshtunnel module. It doesn't hang, but the tunnel breaks during one test, with the following reported on the sshtunnel side:

    2018-10-30 15:24:25,550| ERROR   | Socket exception: Bad file descriptor (9)
    2018-10-30 15:24:25,551| ERROR   | Could not establish connection from ('127.0.0.1', 33743) to remote side of the tunnel

and the subsequent tests fail to run because the connection is broken.


Con-conversely, the patch I had in mind to fix #801, which very brutally is:

```
@@ -1136,6 +1136,13 @@ pq_get_last_result(connectionObject *conn)
                 || status == PGRES_COPY_IN) {
             break;
         }
+        if (PQisBusy(conn->pgconn)) {
+            /* This happens connecting through ssh tunnel
+             * TODO: just kill this function. The loop should happen within
+             * the async/green machinery. */
+            Dprintf("pq_get_last_result: we are busy");
+            break;
+        }
     }
 
     return result;
```

doesn't work, and failing queries leave the connection in an inconsistent state.


So, wrapping up: I think there is something you can do on your side, namely checking why the sshtunnel module behaves differently from a normal SSH tunnel; if you use the latter (or configure sshtunnel to behave the same way), you shouldn't hit the problem. On our side, I don't think we can fix #801 with a quick band-aid; we should rather do #802, but as things stand now I don't trust sshtunnel to do the right thing, so I feel less urgency to do it.

Please let us know if you find out what the difference is between the module and a tunnel opened via `ssh -L`. Thank you!

-- Daniele

Re: psycopg2 conn.poll() hangs when used with sshtunnel

From
Akshay Joshi
Date:


On Tue, 30 Oct 2018, 21:06 Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:
> Please let us know if you find out what the difference is between the module and a tunnel opened via `ssh -L`. Thank you!

I have sent the sample application and created an issue on the sshtunnel GitHub repository. I'm not sure where the problem is: since the poll() function hangs, I thought it was an issue in psycopg2.

Re: psycopg2 conn.poll() hangs when used with sshtunnel

From
Akshay Joshi
Date:
I have one more observation: when I run this application, it works about one time out of 10. I had also logged another issue about the COPY command hanging, so I thought this was a problem in psycopg2 as well.

On Tue, 30 Oct 2018, 21:22 Akshay Joshi <akshay.joshi@enterprisedb.com> wrote:

Re: psycopg2 conn.poll() hangs when used with sshtunnel

From
Daniele Varrazzo
Date:
On Tue, Oct 30, 2018 at 3:57 PM Akshay Joshi
<akshay.joshi@enterprisedb.com> wrote:
>
> I have one more observation: when I run this application, it works about one time out of 10. I had also logged another issue about the COPY command hanging, so I thought this was a problem in psycopg2 as well.

The problem with COPY was a different one: we just didn't manage the
state resulting from running COPY through execute(), and ended up
looping forever on fake results returned by libpq. The condition was
detected and handled for COPY BOTH (resulting from replication
statements) but not for the normal COPY TO/FROM. See
<https://github.com/psycopg/psycopg2/issues/781>.
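The following toy model, in Python purely for illustration, shows why such a loop never ends: once libpq is in COPY state, each further `PQgetResult()` call hands back another COPY result rather than the NULL that would end the loop. `fake_get_result_stream` and `last_status` are hypothetical names; the `PGRES_COPY_*` strings mirror libpq's status codes, and the early exit mirrors the `PGRES_COPY_IN` check visible in the context lines of the patch earlier in the thread.

```python
COPY_STATUSES = {"PGRES_COPY_OUT", "PGRES_COPY_IN", "PGRES_COPY_BOTH"}


def fake_get_result_stream(statuses):
    """Hypothetical model of successive PQgetResult() calls for one command.

    Yields each status in turn, then None (NULL: no more results). Once a
    COPY status appears, the same status is yielded on every later call,
    because libpq keeps reporting the COPY state until the transfer ends.
    """
    for status in statuses:
        yield status
        if status in COPY_STATUSES:
            while True:
                yield status
    yield None


def last_status(results, stop_on_copy=True, max_calls=10000):
    """Drain results the way a "get last result" loop would.

    With stop_on_copy=False this reproduces the endless loop; max_calls
    is a safety cap added only for this sketch.
    """
    last = None
    for _ in range(max_calls):
        status = next(results)
        if status is None:
            return last
        last = status
        if stop_on_copy and status in COPY_STATUSES:
            return last   # hand over to the COPY data-transfer code
    raise RuntimeError("no end of results after %d calls" % max_calls)
```

Without the COPY check the loop would spin until something outside it intervenes; the real code had no cap like `max_calls`, which is why the reported symptom was a hang.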

This is a different story but, as described in
<https://github.com/pahaz/sshtunnel/issues/135>, it seems that even if
we are doing something not entirely legit with
isBusy/consumeInput/getResult, it only becomes a problem through the
sshtunnel module, not through ssh -L.

-- Daniele