Thread: Beating Oracle
I posted a question a while back but got no response :-( Due to network glitch, a PostgreSQL connection over IP is being dropped. In the same situation, Oracle can keep it's connections going. Is there any way to keep PostgreSQL connections alive across glitches, or perhaps re-connect to an existing backend connection? I have people using a mix of PostgreSQL and Oracle at the moment. I suspect they will drop PostgreSQL if there is no answer to this issue. BTW I am the author of the Smalltalk PostgreSQL library (http://sourceforge.net/projects/st-postgresql) Thanks.
Bruce Badger wrote: > I posted a question a while back but got no response :-( > > Due to network glitch, a PostgreSQL connection over IP is being dropped. > > In the same situation, Oracle can keep it's connections going. > > Is there any way to keep PostgreSQL connections alive across glitches, > or perhaps re-connect to an existing backend connection? > > I have people using a mix of PostgreSQL and Oracle at the moment. I > suspect they will drop PostgreSQL if there is no answer to this issue. > > BTW I am the author of the Smalltalk PostgreSQL library > (http://sourceforge.net/projects/st-postgresql) Wow, the TCP connection gets dropped and Oracle somehow reestablishes the connection? That is a nice feature. TCP has it's own restart logic that will handle the dropping of some packets, but if the TCP connection itself shuts down, PostgreSQL has no way of reestablishing it. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Badger <bbadger@BadgerSE.com> writes: > Due to network glitch, a PostgreSQL connection over IP is being dropped. Perhaps there is something wrong with your TCP stacks. I do not think it's Postgres' responsibility to second-guess the network transport code... regards, tom lane
Tom Lane wrote: >Bruce Badger <bbadger@BadgerSE.com> writes: > >>Due to network glitch, a PostgreSQL connection over IP is being dropped. >> > >Perhaps there is something wrong with your TCP stacks. I do not think >it's Postgres' responsibility to second-guess the network transport >code... > > regards, tom lane > Not *my* stack, BTW, but that aside I agree with you. The people involved suspect that a firewall is causing the problem. The thing is that from their point of view, Oracle keeps going where PostgreSQL does not. Given that both are running over the same network, the response of these people could well be to simply move everything to Oracle. The question, then, is: what can we PostgreSQL people can do to match this Oracle ability??
Bruce Badger wrote: > Tom Lane wrote: > > >Bruce Badger <bbadger@BadgerSE.com> writes: > > > >>Due to network glitch, a PostgreSQL connection over IP is being dropped. > >> > > > >Perhaps there is something wrong with your TCP stacks. I do not think > >it's Postgres' responsibility to second-guess the network transport > >code... > > > > regards, tom lane > > > Not *my* stack, BTW, but that aside I agree with you. The people > involved suspect that a firewall is causing the problem. > > The thing is that from their point of view, Oracle keeps going where > PostgreSQL does not. Given that both are running over the same network, > the response of these people could well be to simply move everything to > Oracle. > > The question, then, is: what can we PostgreSQL people can do to match > this Oracle ability?? Unless there is more information, nothing. This is the first time I have heard of this in my 6 years with PostgreSQL. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: >Bruce Badger wrote: > >>Tom Lane wrote: >> >>>Bruce Badger <bbadger@BadgerSE.com> writes: >>> >>>>Due to network glitch, a PostgreSQL connection over IP is being dropped. >>>> >>>Perhaps there is something wrong with your TCP stacks. I do not think >>>it's Postgres' responsibility to second-guess the network transport >>>code... >>> >>> regards, tom lane >>> >>Not *my* stack, BTW, but that aside I agree with you. The people >>involved suspect that a firewall is causing the problem. >> >>The thing is that from their point of view, Oracle keeps going where >>PostgreSQL does not. Given that both are running over the same network, >>the response of these people could well be to simply move everything to >>Oracle. >> >>The question, then, is: what can we PostgreSQL people can do to match >>this Oracle ability?? >> > >Unless there is more information, nothing. This is the first time I >have heard of this in my 6 years with PostgreSQL. > OK - now I have spoken with you sages, I will indeed go back and see if these people have any more information. Many thanks.
Hello Bruce. I'd be willing to bet that if a firewall is involved, it's been especially tuned for Oracle, whereas special rules weren't set for Postgres. As soon as those rules get put in, things will probably be very equivalent. -Brian On Mar 2, 10:15am, Bruce Badger wrote: } Subject: Re: Beating Oracle } Bruce Momjian wrote: } } >Bruce Badger wrote: } > } >>Tom Lane wrote: } >> } >>>Bruce Badger <bbadger@BadgerSE.com> writes: } >>> } >>>>Due to network glitch, a PostgreSQL connection over IP is being dropped. } >>>> } >>>Perhaps there is something wrong with your TCP stacks. I do not think } >>>it's Postgres' responsibility to second-guess the network transport } >>>code... } >>> } >>> regards, tom lane } >>> } >>Not *my* stack, BTW, but that aside I agree with you. The people } >>involved suspect that a firewall is causing the problem. } >> } >>The thing is that from their point of view, Oracle keeps going where } >>PostgreSQL does not. Given that both are running over the same network, } >>the response of these people could well be to simply move everything to } >>Oracle. } >> } >>The question, then, is: what can we PostgreSQL people can do to match } >>this Oracle ability?? } >> } > } >Unless there is more information, nothing. This is the first time I } >have heard of this in my 6 years with PostgreSQL. } > } OK - now I have spoken with you sages, I will indeed go back and see if } these people have any more information. } } Many thanks. } } } } ---------------------------(end of broadcast)--------------------------- } TIP 4: Don't 'kill -9' the postmaster >-- End of excerpt from Bruce Badger
Bruce Badger <bbadger@BadgerSE.com> writes: > The thing is that from their point of view, Oracle keeps going where > PostgreSQL does not. Given that both are running over the same network, > the response of these people could well be to simply move everything to > Oracle. <shrug> If that feature makes Oracle worth its price to them, then no doubt that's what they'll do. Postgres doesn't have 6000 full-time developers engaged in inventing bizarre features (and a substitute for something that should be done at the network level is bizarre in my book). The reason I'm so unenthused about this is that I think it'd be quite difficult to do transparently at the database level. (1) How can you know where communication was lost, from the other end's point of view? (eg, client has no idea whether database completed the last request sent.) I think you'd wind up reimplementing TCP, or something much like it, just to deal with that aspect of the problem. (2) How can you even be sure that you're reconnecting to the same other-end session? Opportunities for security holes seem rampant. (3) Should the database hold sessions open indefinitely, waiting to see if the client can be recontacted? Won't take too long until your session limit is entirely filled by dead connections. Non-broken TCP stacks are normally quite tenacious about surviving transmission glitches in already-open connections. I think you ought to be looking for solutions at that level, not trying to tell us that it's our problem. regards, tom lane
Tom Lane wrote: >Bruce Badger <bbadger@BadgerSE.com> writes: > >>The thing is that from their point of view, Oracle keeps going where >>PostgreSQL does not. Given that both are running over the same network, >>the response of these people could well be to simply move everything to >>Oracle. >> > ><shrug> If that feature makes Oracle worth its price to them, then >no doubt that's what they'll do. Postgres doesn't have 6000 full-time >developers engaged in inventing bizarre features (and a substitute for >something that should be done at the network level is bizarre in my >book). > >The reason I'm so unenthused about this is that I think it'd be quite >difficult to do transparently at the database level. (1) How can you >know where communication was lost, from the other end's point of view? >(eg, client has no idea whether database completed the last request >sent.) I think you'd wind up reimplementing TCP, or something much like >it, just to deal with that aspect of the problem. (2) How can you even >be sure that you're reconnecting to the same other-end session? >Opportunities for security holes seem rampant. (3) Should the database >hold sessions open indefinitely, waiting to see if the client can be >recontacted? Won't take too long until your session limit is entirely >filled by dead connections. > >Non-broken TCP stacks are normally quite tenacious about surviving >transmission glitches in already-open connections. I think you ought >to be looking for solutions at that level, not trying to tell us that >it's our problem. > > regards, tom lane > Ah, Tom. I'm not demanding anything, nor saying this is any individual's problem. I was merely seeking the wisdom of others, and I am grateful for your feedback (apart from the bits that slap me for something I have not done ;-) ). FWIW, I too have invested a great deal of my own time in making PostgreSQL available to others, having built and supported the Smalltalk PostgreSQL library.
> The reason I'm so unenthused about this is that I think it'd be quite > difficult to do transparently at the database level. (1) How can you > know where communication was lost, from the other end's point of view? Actually, if we're talking about transient TCP failures here (ie. one in X connections dies halfway through a transaction), then [1]libpqxx does have the logic built-in for semi-transparent connection recovery. I'm not sure this is quite what is needed here, but what it does right now is reset the backend connection (PQreset() I think the function was), forget about the current attempt at performing the transaction, and re- running the client code that makes up the transaction. So we're not talking about a simple replay of old SQL, but a simple framework for writing restartable code that "plugs into" a connection. Look for the Transactor base class. If PQreset() is not sufficient, I see no fundamental problem in doing an entire new connect instead. The one underlying assumption of course is that the client side will know if the connection was lost. If TCP and libpq guarantee that much, then the libpqxx mechanism should solve the problem perfectly. All we need now is two-phase commit. :-) > (eg, client has no idea whether database completed the last request > sent.) I think you'd wind up reimplementing TCP, or something much > like it, just to deal with that aspect of the problem. (2) How can you > even be sure that you're reconnecting to the same other-end session? > Opportunities for security holes seem rampant. (3) Should the database > hold sessions open indefinitely, waiting to see if the client can be > recontacted? Won't take too long until your session limit is entirely > filled by dead connections. Basically, libpqxx solves these problems at the client side--it doesn't try to reconnect to an older session, but simply leaves the old session to wither and die while it starts up a new one to perform the same task. This takes very little in the way of state-keeping on either side. Jeroen [1] http://members.ams.chello.nl/j.vermeulen31/proj-libpqxx.html
Bruce Momjian wrote: > TCP has it's own restart logic that will handle the dropping of some > packets, but if the TCP connection itself shuts down, PostgreSQL has no > way of reestablishing it. Yupp, and it already has spent some time trying to retransmit lost packets in order to keep this connection alive, before telling "sorry, lost contact". Do we want to decide dynamically, maybe based on the number of row locks the current transaction holds, what a good amount of time for trying to reconnect is? I mean, someone else might wait for these rows to be available again. Maybe even the same user, because his client library decided earlier that the connection is lost, unfortunately some "reestablish packets" drowned in the net too, but his app finally succeeded in connectingto a new backend. If you have connection aborts, the client and application code is the wrong place to fix. In the "golden era",95% of all connectivity problems used to be cables or plugs. These days, firewalls take care that increased reliability of cables and plugs doesn't increase overall network reliability. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
<jtv@xs4all.nl> writes: > Basically, libpqxx solves these problems at the client side--it doesn't try > to reconnect to an older session, but simply leaves the old session to > wither and die while it starts up a new one to perform the same task. This > takes very little in the way of state-keeping on either side. This scares the hell out of me. How do you know whether the old session committed your transaction, but the connection died just before it could tell you so? libpqxx is the very LAST place in the chain that is qualified to undertake the task of error recovery. There are cases where the best way to design the client code is to make it automatically rollback and then retry whole transactions. (In particular, if you use SERIALIZABLE transaction mode then this is more or less essential to cope with concurrency violations.) If you've got a client that's built that way, then it's a pretty simple matter to extend the client to do the same sort of retry for connection-level failures, too. But I'd still be worried about the prospect of double execution of a transaction, if the link was lost just after you issued a COMMIT and just before the backend could tell you it had committed. So you'd also need to add code to verify you hadn't already executed the transaction. In any case, I do not think that libpq or libpqxx can or should try to hide this problem from the client. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Postgres doesn't have 6000 full-time developers engaged in inventing > bizarre features (and a substitute for something that should be done > at the network level is bizarre in my book). May be, though, it's just a matter of not setting SO_KEEPALIVE? In my experience, the difference between always keeping and spontaneously losing a connection during a network outage is often this option. In src/backend/libpq/pqcomm.c, we find (reformatted slightly here): /* select NODELAY and KEEPALIVE options if it's a TCP connection */ if (port->laddr.sa.sa_family == AF_INET) { inton = 1; if (setsockopt(port->sock, IPPROTO_TCP, TCP_NODELAY, (char *) &on, sizeof(on)) < 0) { perror("postmaster: StreamConnection: setsockopt(TCP_NODELAY)"); return STATUS_ERROR; } if (setsockopt(port->sock,SOL_SOCKET, SO_KEEPALIVE, (char *) &on, sizeof(on)) < 0) { perror("postmaster:StreamConnection: setsockopt(SO_KEEPALIVE)"); return STATUS_ERROR; } } And in my nearest manual entry on setsockopt(), I find: SO_KEEPALIVE enables the periodic transmission of messages on a connected socket. Should the connected party fail torespond to these messages, the connection is considered broken and processes using the socket are notified via a SIGPIPEsignal when attempting to send data. Betcha if Bruce commented out that second setsockopt() block, he'd suddenly see his PostgreSQL connections stay up, just like the Oracle ones! :-) > Non-broken TCP stacks are normally quite tenacious about surviving > transmission glitches in already-open connections. ...but you can turn part of that off by turning on SO_KEEPALIVE. -tih -- Puritanism -- the haunting fear that someone, somewhere, may be happy.
Bruce Badger <bbadger@BadgerSE.com> writes: > Due to network glitch, a PostgreSQL connection over IP is being dropped. > > In the same situation, Oracle can keep it's connections going. I mentioned the SO_KEEPALIVE socket option in another email, but the other possibility here is, of course, that the PostgreSQL connection might be receiving an ICMP error message that Oracle isn't. This would be due to different handling of the two protocols in a firewall or something -- and whether it happens or not would be something you could find out using tcpdump. -tih -- Puritanism -- the haunting fear that someone, somewhere, may be happy.
Tom Ivar Helbekkmo <tih@kpnQwest.no> writes: >> Non-broken TCP stacks are normally quite tenacious about surviving >> transmission glitches in already-open connections. > ...but you can turn part of that off by turning on SO_KEEPALIVE. In an RFC-compliant stack, the outage interval required before KEEPALIVE will kill the connection is of the order of hours. RFC 1122 specifies that the minimum interval before the first probe is even sent is 2 hours (since last activity on connection), and that a single failed probe is not sufficient reason to drop the connection. RFC 2525 does note that excessively short keepalive timeout is a common form of TCP-stack bug. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > In an RFC-compliant stack, the outage interval required before KEEPALIVE > will kill the connection is of the order of hours. RFC 1122 specifies > that the minimum interval before the first probe is even sent is 2 hours > (since last activity on connection), and that a single failed probe is > not sufficient reason to drop the connection. Ah, yes, I see that now. On NetBSD, it's four hours until the first keepalive, then eight missed ones at 150 second intervals (totalling 20 minutes) are required before the connection is considered dead. > RFC 2525 does note that excessively short keepalive timeout is a common > form of TCP-stack bug. So, Bruce might still be bothered with something like that, and/or (for all he's given us of details) he might actually be talking about a situation where Oracle will wait through severely prolonged outages where PostgreSQL won't. Details, Bruce? -tih -- Puritanism -- the haunting fear that someone, somewhere, may be happy.
Tom Ivar Helbekkmo <tih@kpnQwest.no> writes: > So, Bruce might still be bothered with something like that, and/or > (for all he's given us of details) he might actually be talking about > a situation where Oracle will wait through severely prolonged outages > where PostgreSQL won't. I believe that Oracle uses its own networking code (SQLNet) which might well have different error response behavior than TCP does. Still, it's hard to believe that even a broken TCP stack will not hold connections through short- or moderate-duration network glitches; and Bruce did not say that he was trying to deal with multiple-hour outages. I have been wondering about whether Bruce's problem is firewall misbehavior. In particular, if there's a NAT translation happening anywhere between his client and his server, then the firewall could break the connection by dropping that particular port mapping, which perhaps it might do if there's no activity for awhile. In this case, it might actually be that the default KEEPALIVE timeout of 2 hours is too long for us :-(. (RFC 1122 says that the timeout shall be configurable, but this requirement seems to be widely ignored.) As for why Oracle doesn't suffer the same problem, someone suggested that the firewall might be specially configured not to drop Oracle connections (or perhaps to pass them through without NAT mapping). I don't know enough about SQLNet to know what to look for. regards, tom lane
Tom Ivar Helbekkmo wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > > In an RFC-compliant stack, the outage interval required before KEEPALIVE > > will kill the connection is of the order of hours. RFC 1122 specifies > > that the minimum interval before the first probe is even sent is 2 hours > > (since last activity on connection), and that a single failed probe is > > not sufficient reason to drop the connection. > > Ah, yes, I see that now. On NetBSD, it's four hours until the first > keepalive, then eight missed ones at 150 second intervals (totalling > 20 minutes) are required before the connection is considered dead. > > > RFC 2525 does note that excessively short keepalive timeout is a common > > form of TCP-stack bug. > > So, Bruce might still be bothered with something like that, and/or > (for all he's given us of details) he might actually be talking about > a situation where Oracle will wait through severely prolonged outages > where PostgreSQL won't. The question is "what exactly is the network glitch"? Firewalls doing NAT frequently cleanup a little too much, namely connections that just have been idle for some time. Maybe Oracle has it's private li'l keepalive ping to avoid that? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
On Fri, Mar 01, 2002 at 02:40:06PM -0500, Tom Lane wrote: > > I have been wondering about whether Bruce's problem is firewall > misbehavior. In particular, if there's a NAT translation happening > anywhere between his client and his server, then the firewall could > break the connection by dropping that particular port mapping, which > perhaps it might do if there's no activity for awhile. In this case, > it might actually be that the default KEEPALIVE timeout of 2 hours is > too long for us :-(. (RFC 1122 says that the timeout shall be > configurable, but this requirement seems to be widely ignored.) If that turns out to be the case, sending null queries periodically should help some. Jeroen
what is your tcpdump looks like? I have been using Pgsql for about five years. I have not seen something like that. If you do have stateful firewall, may be it does not keep state for port 5432.I could be wrong. just a thought. anil anil Tom Lane wrote: > Bruce Badger <bbadger@BadgerSE.com> writes: > > The thing is that from their point of view, Oracle keeps going where > > PostgreSQL does not. Given that both are running over the same network, > > the response of these people could well be to simply move everything to > > Oracle. >
Thanks for that. I have passed the all the comments from the list back to the people with the problem. I'll pass your tcpdump suggestion on too. I'll post the outcome when I hear what it is. Thanks to you all. anil wrote: >what is your tcpdump looks like? I have been using Pgsql for about five years. >I have not seen something like that. If you do have stateful firewall, may be >it does not keep state for port 5432. > I could be wrong. just a thought. > >anil > >anil > >Tom Lane wrote: > >>Bruce Badger <bbadger@BadgerSE.com> writes: >> >>>The thing is that from their point of view, Oracle keeps going where >>>PostgreSQL does not. Given that both are running over the same network, >>>the response of these people could well be to simply move everything to >>>Oracle. >>> > > >---------------------------(end of broadcast)--------------------------- >TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
Hello all Please how can I modify - insert, update and delete -records using libpq interfaces. I can use it to SELECT but cannot modify records. Thanks Arnold __________________________________________________ Do You Yahoo!? Try FREE Yahoo! Mail - the world's greatest free email! http://mail.yahoo.com/