Thread: dubious improvement in new psql
The new psql automatically tries to reconnect if the backend disconnects
unexpectedly. This feature strikes me as ill-conceived; furthermore
it appears to be buggy.

It's ill-conceived because:
(1) under WAL, following a backend crash the postmaster is going to be
spending a few seconds reinitializing; an immediate reconnect attempt
is almost guaranteed to fail.
(2) if I'm running an SQL script, I think it's extremely foolhardy
to press on with executing the script as though nothing had happened.
A backend crash is not an event to be lightly ignored.

It's buggy because: it doesn't work reliably. While poking at the
backend's problems with oversize btree index entries, I saw psql claim
it had successfully reconnected, and then go into a catatonic state.
It wouldn't give me a new command prompt (not even with ^C), wouldn't
exit with ^D, and had to be killed from another shell window.

This behavior doesn't seem to happen for every crash, but I'm not
really interested in trying to debug it. I think the "feature"
ought to be ripped out.

regards, tom lane

On Sat, 25 Dec 1999, Tom Lane wrote:

> The new psql automatically tries to reconnect if the backend disconnects
> unexpectedly. This feature strikes me as ill-conceived; furthermore
> it appears to be buggy.
>
> It's ill-conceived because:
> (1) under WAL, following a backend crash the postmaster is going to be
> spending a few seconds reinitializing; an immediate reconnect attempt
> is almost guaranteed to fail.

Good point.

> (2) if I'm running an SQL script, I think it's extremely foolhardy
> to press on with executing the script as though nothing had happened.
> A backend crash is not an event to be lightly ignored.

It only does the reconnect thing if it's used interactively. I suppose
leaving psql in an unconnected state (which does exist) would be a
better solution.

I'll investigate the behaviour you observed below after I get back from
my vacation.

> It's buggy because: it doesn't work reliably. While poking at the
> backend's problems with oversize btree index entries, I saw psql claim
> it had successfully reconnected, and then go into a catatonic state.
> It wouldn't give me a new command prompt (not even with ^C), wouldn't
> exit with ^D, and had to be killed from another shell window.
>
> This behavior doesn't seem to happen for every crash, but I'm not
> really interested in trying to debug it. I think the "feature"
> ought to be ripped out.
>
> regards, tom lane

--
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden

At 11:14 PM 12/28/99 +0100, Peter Eisentraut wrote:
>On Sat, 25 Dec 1999, Tom Lane wrote:
>
>> The new psql automatically tries to reconnect if the backend disconnects
>> unexpectedly. This feature strikes me as ill-conceived; furthermore
>> it appears to be buggy.
>>
>> It's ill-conceived because:
>> (1) under WAL, following a backend crash the postmaster is going to be
>> spending a few seconds reinitializing; an immediate reconnect attempt
>> is almost guaranteed to fail.
>
>Good point.
>
>> (2) if I'm running an SQL script, I think it's extremely foolhardy
>> to press on with executing the script as though nothing had happened.
>> A backend crash is not an event to be lightly ignored.
>
>It only does the reconnect thing if it's used interactively.

This raises a question, then. What should drivers for (say) web servers
that are expected to stay up 24/7 do if reconnecting to a broken db
connection can't be made reliable? I've currently rewritten the
AOLserver driver to do just that, and it's working fine with 6.5.3.

The AOLserver driver for Oracle most certainly can reconnect to a
broken connection - to tell folks that this can't be done with the WAL
version of Postgres will simply reinforce those of my friends who laugh
at me for trying to use Postgres instead of simply biting the bullet
and buying an Oracle license...

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.

Don Baccus <dhogaza@pacifier.com> writes:
> The AOLserver driver for Oracle most
> certainly can reconnect to a broken connection - to tell folks that
> this can't be done with the WAL version of Postgres

I said no such thing!

You certainly *can* reconnect, although under WAL it will take a delay
(or better, a retry loop).

However, I think reconnection has to be integrated into the
application's logic at a level where you can have some idea of what
needs to be redone after reconnecting. That's why I objected to having
psql do it. If psql's only going to do it interactively then I guess
it's safe enough, though.

Question for discussion: when the WAL postmaster is running a database
start or restart, perhaps it should simply delay processing of new
connection requests until the DB is ready, instead of rejecting them
immediately? That would eliminate the need for retry loops in
applications, and thereby avoid wasted retry processing on both sides.
On the other hand, I can see where an unexpected multi-second delay to
connect might be bad news, too. Comments?

regards, tom lane

Tom Lane wrote:
> Question for discussion: when the WAL postmaster is running a database
> start or restart, perhaps it should simply delay processing of new
> connection requests until the DB is ready, instead of rejecting them
> immediately? That would eliminate the need for retry loops in
> applications, and thereby avoid wasted retry processing on both sides.
> On the other hand, I can see where an unexpected multi-second delay to
> connect might be bad news, too. Comments?

Suggestion: Make the delay/reconnect optional with configurable
parameters for how many times to retry, how long to retry, etc. I have
an Apache mod-perl app already doing this reconnect logic, and I'm very
glad my app has control over those parameters.

Cheers,
Ed Loehr

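A minimal sketch of the application-side retry loop discussed above, with the attempt count and delay left under the application's control as Ed suggests. All names here (RetryPolicy, ConnectFn, connect_with_retry, fail_n_times) are hypothetical and are not part of libpq. The single connection attempt is passed in as a callback; in a libpq client it would presumably wrap PQreset followed by a PQstatus check, but that wiring is an assumption and is not shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical retry policy; none of these names come from libpq. */
typedef struct {
    int  max_attempts;          /* give up after this many tries */
    long delay_ms;              /* requested pause between tries */
    void (*wait_ms)(long ms);   /* how to pause; NULL means do not pause */
} RetryPolicy;

/* Makes one connection attempt and reports whether it succeeded. */
typedef bool (*ConnectFn)(void *ctx);

/*
 * Try to connect up to policy->max_attempts times, pausing between
 * failed attempts. Returns the number of attempts used on success,
 * or 0 if every attempt failed.
 */
int connect_with_retry(const RetryPolicy *policy, ConnectFn try_connect,
                       void *ctx)
{
    for (int attempt = 1; attempt <= policy->max_attempts; attempt++) {
        if (try_connect(ctx))
            return attempt;
        /* No point in waiting after the final failed attempt. */
        if (attempt < policy->max_attempts && policy->wait_ms != NULL)
            policy->wait_ms(policy->delay_ms);
    }
    return 0;
}

/*
 * Stand-in for a real attempt, for illustration only: ctx points to an
 * int holding how many more times the "server" should refuse us.
 */
bool fail_n_times(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

Keeping the wait itself behind a function pointer means the caller decides whether to sleep, yield to an event loop, or not wait at all, which is one way to leave these parameters in the application's hands.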
At 01:48 PM 1/1/00 -0500, Tom Lane wrote:
>I said no such thing!
>
>You certainly *can* reconnect, although under WAL it will take a delay
>(or better, a retry loop).
>
>However, I think reconnection has to be integrated into the
>application's logic at a level where you can have some idea of what
>needs to be redone after reconnecting. That's why I objected to having
>psql do it. If psql's only going to do it interactively then I guess
>it's safe enough, though.

OK, my misunderstanding. I couldn't understand why psql in interactive
mode should be a problem and took your comments in a more general
context.

>Question for discussion: when the WAL postmaster is running a database
>start or restart, perhaps it should simply delay processing of new
>connection requests until the DB is ready, instead of rejecting them
>immediately? That would eliminate the need for retry loops in
>applications, and thereby avoid wasted retry processing on both sides.
>On the other hand, I can see where an unexpected multi-second delay to
>connect might be bad news, too. Comments?

I've been thinking about this one, actually... Perhaps letting the
caller decide in some manner? In my driver environment I'm not really
supposed to call sleep or the like and a busy-wait for the
connection(s) to be rebuilt probably isn't the best thing to do, since
the postmaster is going to be hard at work straightening out things
with the WAL.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.

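One possible shape for a driver that may neither sleep nor busy-wait, sketched under assumptions: instead of looping, the driver records the earliest time another attempt is allowed and checks it whenever a request arrives. ReconnectGate, gate_ensure_connected and count_and_fail are hypothetical names; this is not how the AOLserver driver is known to work, only an illustration of letting the caller drive the retries.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-connection state; not part of any real driver. */
typedef struct {
    bool   connected;
    time_t next_attempt;     /* earliest time another attempt is allowed */
    int    min_interval_s;   /* seconds to leave between attempts */
} ReconnectGate;

/* Makes one connection attempt and reports whether it succeeded. */
typedef bool (*ConnectFn)(void *ctx);

/*
 * Called whenever the caller wants to use the connection. It never
 * sleeps: if the connection is down and the interval has not elapsed,
 * it returns false at once so the caller can report "unavailable".
 */
bool gate_ensure_connected(ReconnectGate *g, time_t now,
                           ConnectFn try_connect, void *ctx)
{
    if (g->connected)
        return true;
    if (now < g->next_attempt)
        return false;   /* too soon; leave the postmaster alone */
    g->connected = try_connect(ctx);
    if (!g->connected)
        g->next_attempt = now + g->min_interval_s;
    return g->connected;
}

/*
 * Stand-in attempt for illustration: always fails, and counts its
 * calls in the int that ctx points to.
 */
bool count_and_fail(void *ctx)
{
    int *calls = ctx;
    (*calls)++;
    return false;
}
```

Passing the current time in as an argument keeps the function free of hidden clock reads, so the spacing of attempts is easy to check; a caller would normally pass time(NULL).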
Okay, I looked at the code again and I can't see anything wrong
conceptually. It follows libpq semantics which I remember to have
grabbed from the documentation:

    results = PQexec(pset->db, query);
    /* do something with result */

    if (PQstatus(pset->db) == CONNECTION_BAD)
    {
        fputs("The connection to the server was lost. Attempting reset: ", stderr);
        PQreset(pset->db);
        if (PQstatus(pset->db) == CONNECTION_BAD)
        {
            fputs("Failed.\n", stderr);
            PQfinish(pset->db);
            PQclear(results);
            pset->db = NULL;
            return false;
        }
        else
            fputs("Succeeded.\n", stderr);
    }

If you can still reproduce this somehow, I'd like to know where it
hangs and/or what the output was.

On 1999-12-25, Tom Lane mentioned:

> The new psql automatically tries to reconnect if the backend disconnects
> unexpectedly. This feature strikes me as ill-conceived; furthermore
> it appears to be buggy.
>
> It's ill-conceived because:
> (1) under WAL, following a backend crash the postmaster is going to be
> spending a few seconds reinitializing; an immediate reconnect attempt
> is almost guaranteed to fail.

Then rip out PQreset. It's not psql's job to make these kinds of
decisions.

> (2) if I'm running an SQL script, I think it's extremely foolhardy
> to press on with executing the script as though nothing had happened.
> A backend crash is not an event to be lightly ignored.

Then rip out PQreset. To quote from the docs:

"This function will close the connection to the backend and attempt to
reestablish a new connection to the same postmaster, using all the same
parameters previously used. This may be useful for error recovery if a
working connection is lost."

I don't know all the possible ways a backend can go down, but one of
them might be a short network failure. In that case attempting a reset
might be the reasonable thing to do. Again, this should be addressed at
the libpq level.

> It's buggy because: it doesn't work reliably. While poking at the
> backend's problems with oversize btree index entries, I saw psql claim
> it had successfully reconnected, and then go into a catatonic state.

Look at the above code; seems like a libpq problem.

> It wouldn't give me a new command prompt (not even with ^C), wouldn't
> exit with ^D, and had to be killed from another shell window.
>
> This behavior doesn't seem to happen for every crash, but I'm not
> really interested in trying to debug it. I think the "feature"

I am. :)

> ought to be ripped out.

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
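The policy that emerges from the thread (attempt a reset only when interactive, never press on with a script, and fall back to an unconnected session if the reset fails) can be sketched as a small decision function. This is an illustration under assumptions, not psql's actual code: SessionState, ResetFn, handle_lost_connection and the two stub callbacks are hypothetical names, and the reset itself is abstracted behind a callback rather than calling libpq.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes after the backend connection is lost. */
typedef enum {
    SESSION_CONNECTED,     /* reset worked; keep prompting */
    SESSION_UNCONNECTED,   /* stay at the prompt without a connection */
    SESSION_ABORT          /* stop processing the script */
} SessionState;

/* Makes one reset attempt and reports whether it succeeded. */
typedef bool (*ResetFn)(void *ctx);

/*
 * Decide what to do once a lost connection has been detected.
 * A script is never continued; an interactive session gets one
 * reset attempt and otherwise remains usable but unconnected.
 */
SessionState handle_lost_connection(bool interactive, ResetFn try_reset,
                                    void *ctx)
{
    if (!interactive)
        return SESSION_ABORT;
    return try_reset(ctx) ? SESSION_CONNECTED : SESSION_UNCONNECTED;
}

/* Stand-in reset callbacks, for illustration only. */
bool reset_succeeds(void *ctx) { (void) ctx; return true; }
bool reset_fails(void *ctx)    { (void) ctx; return false; }
```

Returning a state rather than printing and returning false, as the excerpt above does, leaves the caller free to decide how to report the outcome.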