Thread: Handling connection loss

Handling connection loss

From
Dave Page
Date:
I've been sufficiently annoyed by Josh's discovery that the
reconnection changes in 1.14 are still broken in some situations that
I've spent most of the day hacking on this. It's become pretty clear
to me that the current code structure really doesn't lend itself to
cleanly handling all possible disconnection scenarios - every time I
managed to fix one potential problem area I found another - and then
realised that I was building connection handling code into all sorts
of unrelated places where it really shouldn't be.

So, I bit the bullet and moved all the reconnection code into the low
level database classes. Originally I shied away from that due to the
need to throw up message boxes, but I'm pretty much convinced now that
that is the lesser of the evils. So, the attached patch (against
REL-1_14_0_PATCHES) centralises the connection test code into on
function in pgConn:CheckConnection, which will check the connection
and allows the user to reconnect if desired. All the other pgConn
functions that use the database connection (except those doing the
setup) now call that function upon entry, as do most of the places in
other parts of the app that used to do some form of connection
checking.

The one downside to this design is that failure to reconnect
constitutes a fatal error, as there's no way from the low level
database classes that we can cleanup state in the higher levels. It's
not quite as bad as it initially sounds though - the user can request
infinite connection retries; it only exits when they hit "No" to the
reconnect prompt (which is clear that doing so will be fatal).

I'm proposing this for 1.14, however I'm aware it's a big change, and
it's late in the cycle. I'd therefore like lots of review and testing
please :-)

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Handling connection loss

From
Guillaume Lelarge
Date:
On Mon, 2011-07-18 at 15:18 +0100, Dave Page wrote:
> I've been sufficiently annoyed by Josh's discovery that the
> reconnection changes in 1.14 are still broken in some situations that
> I've spent most of the day hacking on this. It's become pretty clear
> to me that the current code structure really doesn't lend itself to
> cleanly handling all possible disconnection scenarios - every time I
> managed to fix one potential problem area I found another - and then
> realised that I was building connection handling code into all sorts
> of unrelated places where it really shouldn't be.
>
> So, I bit the bullet and moved all the reconnection code into the low
> level database classes. Originally I shied away from that due to the
> need to throw up message boxes, but I'm pretty much convinced now that
> that is the lesser of the evils. So, the attached patch (against
> REL-1_14_0_PATCHES) centralises the connection test code into on
> function in pgConn:CheckConnection, which will check the connection
> and allows the user to reconnect if desired. All the other pgConn
> functions that use the database connection (except those doing the
> setup) now call that function upon entry, as do most of the places in
> other parts of the app that used to do some form of connection
> checking.
>
> The one downside to this design is that failure to reconnect
> constitutes a fatal error, as there's no way from the low level
> database classes that we can cleanup state in the higher levels. It's
> not quite as bad as it initially sounds though - the user can request
> infinite connection retries; it only exits when they hit "No" to the
> reconnect prompt (which is clear that doing so will be fatal).
>
> I'm proposing this for 1.14, however I'm aware it's a big change, and
> it's late in the cycle. I'd therefore like lots of review and testing
> please :-)
>

Did some tests and found something weird. I connect pgadmin to a
database, I look into some objects with pgAdmin's browser. Then, I stop
PostgreSQL, and start it again. Next object I click on will display a
nice "lost connection" message, and another messagebox asking me if I
want to reconnect. I click Yes and everything seems good (iow, I have
the info I asked). I then click on another object, and it asks me again
if I want to reconnect. Weird. I click Yes, and I get reconnected, and I
won't be bothered again. I find it weird to have to say Yes two times
when the connection was lost only once.

Moreover, with the SSH tunnel as described by Christophe Chauvet on
-support, I don't even have the message box asking me if I want to
reconnect.


--
Guillaume
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com


Re: Handling connection loss

From
Jasmin Dizdarevic
Date:
There are some unexpected errors, when reconnecting.

1. Open Function collection, shutdown DB, click on a function: Result: stacktrace.png
2. Open Table collection, shutdown DB, click on a table, start DB, reconnect: Result: stacktrace2.png

Can't we include a third layer between application and database code, which handle's corrupt connections?

2011/7/18 Guillaume Lelarge <guillaume@lelarge.info>
On Mon, 2011-07-18 at 15:18 +0100, Dave Page wrote:
> I've been sufficiently annoyed by Josh's discovery that the
> reconnection changes in 1.14 are still broken in some situations that
> I've spent most of the day hacking on this. It's become pretty clear
> to me that the current code structure really doesn't lend itself to
> cleanly handling all possible disconnection scenarios - every time I
> managed to fix one potential problem area I found another - and then
> realised that I was building connection handling code into all sorts
> of unrelated places where it really shouldn't be.
>
> So, I bit the bullet and moved all the reconnection code into the low
> level database classes. Originally I shied away from that due to the
> need to throw up message boxes, but I'm pretty much convinced now that
> that is the lesser of the evils. So, the attached patch (against
> REL-1_14_0_PATCHES) centralises the connection test code into on
> function in pgConn:CheckConnection, which will check the connection
> and allows the user to reconnect if desired. All the other pgConn
> functions that use the database connection (except those doing the
> setup) now call that function upon entry, as do most of the places in
> other parts of the app that used to do some form of connection
> checking.
>
> The one downside to this design is that failure to reconnect
> constitutes a fatal error, as there's no way from the low level
> database classes that we can cleanup state in the higher levels. It's
> not quite as bad as it initially sounds though - the user can request
> infinite connection retries; it only exits when they hit "No" to the
> reconnect prompt (which is clear that doing so will be fatal).
>
> I'm proposing this for 1.14, however I'm aware it's a big change, and
> it's late in the cycle. I'd therefore like lots of review and testing
> please :-)
>

Did some tests and found something weird. I connect pgadmin to a
database, I look into some objects with pgAdmin's browser. Then, I stop
PostgreSQL, and start it again. Next object I click on will display a
nice "lost connection" message, and another messagebox asking me if I
want to reconnect. I click Yes and everything seems good (iow, I have
the info I asked). I then click on another object, and it asks me again
if I want to reconnect. Weird. I click Yes, and I get reconnected, and I
won't be bothered again. I find it weird to have to say Yes two times
when the connection was lost only once.

Moreover, with the SSH tunnel as described by Christophe Chauvet on
-support, I don't even have the message box asking me if I want to
reconnect.


--
Guillaume
 http://blog.guillaume.lelarge.info
 http://www.dalibo.com


--
Sent via pgadmin-hackers mailing list (pgadmin-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgadmin-hackers

Attachment

Re: Handling connection loss

From
Dave Page
Date:
On Mon, Jul 18, 2011 at 9:11 PM, Guillaume Lelarge
<guillaume@lelarge.info> wrote:
>
> Did some tests and found something weird. I connect pgadmin to a
> database, I look into some objects with pgAdmin's browser. Then, I stop
> PostgreSQL, and start it again. Next object I click on will display a
> nice "lost connection" message, and another messagebox asking me if I
> want to reconnect. I click Yes and everything seems good (iow, I have
> the info I asked). I then click on another object, and it asks me again
> if I want to reconnect. Weird. I click Yes, and I get reconnected, and I
> won't be bothered again. I find it weird to have to say Yes two times
> when the connection was lost only once.

Was the second object in the same database? Don't forget that we might
have to restore multiple connections to individual databases, and the
maintenance DB - and we only test them when we need them.

> Moreover, with the SSH tunnel as described by Christophe Chauvet on
> -support, I don't even have the message box asking me if I want to
> reconnect.

I was testing using Josh's SSH tunnel test primarily, so that worked
well for me. Can you describe the steps to reproduce the issue?



--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Handling connection loss

From
Dave Page
Date:
On Mon, Jul 18, 2011 at 11:23 PM, Jasmin Dizdarevic
<jasmin.dizdarevic@gmail.com> wrote:
> There are some unexpected errors, when reconnecting.
> 1. Open Function collection, shutdown DB, click on a function: Result:
> stacktrace.png
> 2. Open Table collection, shutdown DB, click on a table, start DB,
> reconnect: Result: stacktrace2.png

I believe these would be fixed by the attached update.

> Can't we include a third layer between application and database code, which
> handle's corrupt connections?

I'm not sure that would help much, except to give us a place to bleed
the abstraction layers into.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Handling connection loss

From
Guillaume Lelarge
Date:
On Tue, 2011-07-19 at 09:53 +0100, Dave Page wrote:
> On Mon, Jul 18, 2011 at 9:11 PM, Guillaume Lelarge
> <guillaume@lelarge.info> wrote:
> >
> > Did some tests and found something weird. I connect pgadmin to a
> > database, I look into some objects with pgAdmin's browser. Then, I stop
> > PostgreSQL, and start it again. Next object I click on will display a
> > nice "lost connection" message, and another messagebox asking me if I
> > want to reconnect. I click Yes and everything seems good (iow, I have
> > the info I asked). I then click on another object, and it asks me again
> > if I want to reconnect. Weird. I click Yes, and I get reconnected, and I
> > won't be bothered again. I find it weird to have to say Yes two times
> > when the connection was lost only once.
>
> Was the second object in the same database? Don't forget that we might
> have to restore multiple connections to individual databases, and the
> maintenance DB - and we only test them when we need them.
>

It was the same one.

> > Moreover, with the SSH tunnel as described by Christophe Chauvet on
> > -support, I don't even have the message box asking me if I want to
> > reconnect.
>
> I was testing using Josh's SSH tunnel test primarily, so that worked
> well for me. Can you describe the steps to reproduce the issue?
>

Create an SSH tunnel (for example like this ssh -L 2000:127.0.0.1:5432
myuser@my_ip).

Launch PgAdmin, open a connection with server localhost, port 2000. You
should be able to connect to a database. Then, go to your terminal, drop
the tunnel connection, and then reopen the tunnel connection. Go back to
the browser, click on some objects. pgAdmin didn't ask if I want to
reconnect.


--
Guillaume
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com


Re: Handling connection loss

From
Dave Page
Date:
On Tue, Jul 19, 2011 at 10:19 AM, Guillaume Lelarge
<guillaume@lelarge.info> wrote:
> On Tue, 2011-07-19 at 09:53 +0100, Dave Page wrote:
>> On Mon, Jul 18, 2011 at 9:11 PM, Guillaume Lelarge
>> <guillaume@lelarge.info> wrote:
>> >
>> > Did some tests and found something weird. I connect pgadmin to a
>> > database, I look into some objects with pgAdmin's browser. Then, I stop
>> > PostgreSQL, and start it again. Next object I click on will display a
>> > nice "lost connection" message, and another messagebox asking me if I
>> > want to reconnect. I click Yes and everything seems good (iow, I have
>> > the info I asked). I then click on another object, and it asks me again
>> > if I want to reconnect. Weird. I click Yes, and I get reconnected, and I
>> > won't be bothered again. I find it weird to have to say Yes two times
>> > when the connection was lost only once.
>>
>> Was the second object in the same database? Don't forget that we might
>> have to restore multiple connections to individual databases, and the
>> maintenance DB - and we only test them when we need them.
>>
>
> It was the same one.

Hmm. Not sure why that would happen. Do you recall what the object types were?

>> > Moreover, with the SSH tunnel as described by Christophe Chauvet on
>> > -support, I don't even have the message box asking me if I want to
>> > reconnect.
>>
>> I was testing using Josh's SSH tunnel test primarily, so that worked
>> well for me. Can you describe the steps to reproduce the issue?
>>
>
> Create an SSH tunnel (for example like this ssh -L 2000:127.0.0.1:5432
> myuser@my_ip).
>
> Launch PgAdmin, open a connection with server localhost, port 2000. You
> should be able to connect to a database. Then, go to your terminal, drop
> the tunnel connection, and then reopen the tunnel connection. Go back to
> the browser, click on some objects. pgAdmin didn't ask if I want to
> reconnect.

That's basically how I've been testing it. I wonder if you're getting
confused by the fact that not all object types will access the
database when clicked (and others won't access the database if they've
previously been clicked or expanded), thus won't necessarily cause a
connection check to occur.

Do you see any cases where the connection isn't checked and bad things
happen? Or just cases where you expected a check, but don't see one.

BTW; thanks to both you and Jasmin for testing.


--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Handling connection loss

From
Guillaume Lelarge
Date:
On Tue, 2011-07-19 at 10:56 +0100, Dave Page wrote:
> On Tue, Jul 19, 2011 at 10:19 AM, Guillaume Lelarge
> <guillaume@lelarge.info> wrote:
> > On Tue, 2011-07-19 at 09:53 +0100, Dave Page wrote:
> >> On Mon, Jul 18, 2011 at 9:11 PM, Guillaume Lelarge
> >> <guillaume@lelarge.info> wrote:
> >> >
> >> > Did some tests and found something weird. I connect pgadmin to a
> >> > database, I look into some objects with pgAdmin's browser. Then, I stop
> >> > PostgreSQL, and start it again. Next object I click on will display a
> >> > nice "lost connection" message, and another messagebox asking me if I
> >> > want to reconnect. I click Yes and everything seems good (iow, I have
> >> > the info I asked). I then click on another object, and it asks me again
> >> > if I want to reconnect. Weird. I click Yes, and I get reconnected, and I
> >> > won't be bothered again. I find it weird to have to say Yes two times
> >> > when the connection was lost only once.
> >>
> >> Was the second object in the same database? Don't forget that we might
> >> have to restore multiple connections to individual databases, and the
> >> maintenance DB - and we only test them when we need them.
> >>
> >
> > It was the same one.
>
> Hmm. Not sure why that would happen. Do you recall what the object types were?
>

Tables and schemas.

> >> > Moreover, with the SSH tunnel as described by Christophe Chauvet on
> >> > -support, I don't even have the message box asking me if I want to
> >> > reconnect.
> >>
> >> I was testing using Josh's SSH tunnel test primarily, so that worked
> >> well for me. Can you describe the steps to reproduce the issue?
> >>
> >
> > Create an SSH tunnel (for example like this ssh -L 2000:127.0.0.1:5432
> > myuser@my_ip).
> >
> > Launch PgAdmin, open a connection with server localhost, port 2000. You
> > should be able to connect to a database. Then, go to your terminal, drop
> > the tunnel connection, and then reopen the tunnel connection. Go back to
> > the browser, click on some objects. pgAdmin didn't ask if I want to
> > reconnect.
>
> That's basically how I've been testing it. I wonder if you're getting
> confused by the fact that not all object types will access the
> database when clicked (and others won't access the database if they've
> previously been clicked or expanded), thus won't necessarily cause a
> connection check to occur.
>

No, it was tables too.

> Do you see any cases where the connection isn't checked and bad things
> happen? Or just cases where you expected a check, but don't see one.
>
> BTW; thanks to both you and Jasmin for testing.
>
>

I tried something else today. Suppose a database with two schemas. Both
have objects in them, let's say two tables in each schema. Connect to
the database, click on the + of the first schema. You'll see all objects
in it. Now, restart PostgreSQL. And click on the + of the second schema.
It'll tell you you're disconnected and will ask you if you want to
reconnect. You say yes, it reconnects and you'll see all the objects
types with nothing ("Tables (0)" for example instead of "Tables (2)").
If you refresh the schema, you'll have the right info (ie, "Tables
(2)").

BTW, I used the v2 patch.


--
Guillaume
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com


Re: Handling connection loss

From
Dave Page
Date:
On Tue, Jul 19, 2011 at 7:47 PM, Guillaume Lelarge
<guillaume@lelarge.info> wrote:
> I tried something else today. Suppose a database with two schemas. Both
> have objects in them, let's say two tables in each schema. Connect to
> the database, click on the + of the first schema. You'll see all objects
> in it. Now, restart PostgreSQL. And click on the + of the second schema.
> It'll tell you you're disconnected and will ask you if you want to
> reconnect. You say yes, it reconnects and you'll see all the objects
> types with nothing ("Tables (0)" for example instead of "Tables (2)").
> If you refresh the schema, you'll have the right info (ie, "Tables
> (2)").

Yeah, I think we're going to run into issues like this. It comes about
because the database classes have no way of telling the browser to do
things like refresh again. The question is, is this approach better
than struggling on with what we have now, and if so, what else needs
to be done to make the reconnection invisible to the rest of the app
(it's supposed to be - what did I miss)?

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Handling connection loss

From
Guillaume Lelarge
Date:
On Tue, 2011-07-19 at 21:12 +0100, Dave Page wrote:
> On Tue, Jul 19, 2011 at 7:47 PM, Guillaume Lelarge
> <guillaume@lelarge.info> wrote:
> > I tried something else today. Suppose a database with two schemas. Both
> > have objects in them, let's say two tables in each schema. Connect to
> > the database, click on the + of the first schema. You'll see all objects
> > in it. Now, restart PostgreSQL. And click on the + of the second schema.
> > It'll tell you you're disconnected and will ask you if you want to
> > reconnect. You say yes, it reconnects and you'll see all the objects
> > types with nothing ("Tables (0)" for example instead of "Tables (2)").
> > If you refresh the schema, you'll have the right info (ie, "Tables
> > (2)").
>
> Yeah, I think we're going to run into issues like this. It comes about
> because the database classes have no way of telling the browser to do
> things like refresh again. The question is, is this approach better
> than struggling on with what we have now, and if so, what else needs
> to be done to make the reconnection invisible to the rest of the app
> (it's supposed to be - what did I miss)?
>

Actually, I don't know. I mean, showing no objects in a schema when it
should shows objects is a really major burden. But the lost connection
issue is also a major burden.

I don't think we should hurry on this, we have to take the time to do it
right. Whatever that means.


--
Guillaume
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com