Thread: Beating Oracle

Beating Oracle

From

Bruce Badger

Date:

28 February 2002, 21:51:20

I posted a question a while back but got no response :-(

Due to network glitch, a PostgreSQL connection over IP is being dropped.

In the same situation, Oracle can keep it's connections going.

Is there any way to keep PostgreSQL connections alive across glitches, 
or perhaps re-connect to an existing backend connection?

I have people using a mix of PostgreSQL and Oracle at the moment.  I 
suspect they will drop PostgreSQL if there is no answer to this issue.

BTW I am the author of the Smalltalk PostgreSQL library 
(http://sourceforge.net/projects/st-postgresql)

Thanks.

Re: Beating Oracle

From

Bruce Momjian

Date:

28 February 2002, 21:59:28

Bruce Badger wrote:
> I posted a question a while back but got no response :-(
> 
> Due to network glitch, a PostgreSQL connection over IP is being dropped.
> 
> In the same situation, Oracle can keep it's connections going.
> 
> Is there any way to keep PostgreSQL connections alive across glitches, 
> or perhaps re-connect to an existing backend connection?
> 
> I have people using a mix of PostgreSQL and Oracle at the moment.  I 
> suspect they will drop PostgreSQL if there is no answer to this issue.
> 
> BTW I am the author of the Smalltalk PostgreSQL library 
> (http://sourceforge.net/projects/st-postgresql)

Wow, the TCP connection gets dropped and Oracle somehow reestablishes
the connection?  That is a nice feature.

TCP has it's own restart logic that will handle the dropping of some
packets, but if the TCP connection itself shuts down, PostgreSQL has no
way of reestablishing it.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: Beating Oracle

From

Tom Lane

Date:

01 March 2002, 01:49:44

Bruce Badger <bbadger@BadgerSE.com> writes:
> Due to network glitch, a PostgreSQL connection over IP is being dropped.

Perhaps there is something wrong with your TCP stacks.  I do not think
it's Postgres' responsibility to second-guess the network transport
code...
        regards, tom lane

Re: Beating Oracle

From

Bruce Badger

Date:

01 March 2002, 02:02:11

Tom Lane wrote:

>Bruce Badger <bbadger@BadgerSE.com> writes:
>
>>Due to network glitch, a PostgreSQL connection over IP is being dropped.
>>
>
>Perhaps there is something wrong with your TCP stacks.  I do not think
>it's Postgres' responsibility to second-guess the network transport
>code...
>
>            regards, tom lane
>
Not *my* stack, BTW, but that aside I agree with you.  The people 
involved suspect that a firewall is causing the problem.

The thing is that from their point of view, Oracle keeps going where 
PostgreSQL does not.  Given that both are running over the same network, 
the response of these people could well be to simply move everything to 
Oracle.

The question, then, is: what can we PostgreSQL people can do to match 
this Oracle ability??

Re: Beating Oracle

From

Bruce Momjian

Date:

01 March 2002, 02:06:08

Bruce Badger wrote:
> Tom Lane wrote:
> 
> >Bruce Badger <bbadger@BadgerSE.com> writes:
> >
> >>Due to network glitch, a PostgreSQL connection over IP is being dropped.
> >>
> >
> >Perhaps there is something wrong with your TCP stacks.  I do not think
> >it's Postgres' responsibility to second-guess the network transport
> >code...
> >
> >            regards, tom lane
> >
> Not *my* stack, BTW, but that aside I agree with you.  The people 
> involved suspect that a firewall is causing the problem.
> 
> The thing is that from their point of view, Oracle keeps going where 
> PostgreSQL does not.  Given that both are running over the same network, 
> the response of these people could well be to simply move everything to 
> Oracle.
> 
> The question, then, is: what can we PostgreSQL people can do to match 
> this Oracle ability??

Unless there is more information, nothing.  This is the first time I
have heard of this in my 6 years with PostgreSQL.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: Beating Oracle

From

Bruce Badger

Date:

01 March 2002, 02:16:31

Bruce Momjian wrote:

>Bruce Badger wrote:
>
>>Tom Lane wrote:
>>
>>>Bruce Badger <bbadger@BadgerSE.com> writes:
>>>
>>>>Due to network glitch, a PostgreSQL connection over IP is being dropped.
>>>>
>>>Perhaps there is something wrong with your TCP stacks.  I do not think
>>>it's Postgres' responsibility to second-guess the network transport
>>>code...
>>>
>>>            regards, tom lane
>>>
>>Not *my* stack, BTW, but that aside I agree with you.  The people 
>>involved suspect that a firewall is causing the problem.
>>
>>The thing is that from their point of view, Oracle keeps going where 
>>PostgreSQL does not.  Given that both are running over the same network, 
>>the response of these people could well be to simply move everything to 
>>Oracle.
>>
>>The question, then, is: what can we PostgreSQL people can do to match 
>>this Oracle ability??
>>
>
>Unless there is more information, nothing.  This is the first time I
>have heard of this in my 6 years with PostgreSQL.
>
OK - now I have spoken with you sages, I will indeed go back and see if 
these people have any more information.

Many thanks.

Re: Beating Oracle

From

buhrow@lothlorien.nfbcal.org (Brian Buhrow)

Date:

01 March 2002, 02:23:57

    Hello Bruce.  I'd be willing to bet that if a firewall
is involved, it's been especially tuned for Oracle, whereas special rules
weren't set for Postgres.  As soon as those rules get put in, things will
probably be very equivalent.
-Brian
On Mar 2, 10:15am, Bruce Badger wrote:
} Subject: Re: Beating Oracle
} Bruce Momjian wrote:
} 
} >Bruce Badger wrote:
} >
} >>Tom Lane wrote:
} >>
} >>>Bruce Badger <bbadger@BadgerSE.com> writes:
} >>>
} >>>>Due to network glitch, a PostgreSQL connection over IP is being dropped.
} >>>>
} >>>Perhaps there is something wrong with your TCP stacks.  I do not think
} >>>it's Postgres' responsibility to second-guess the network transport
} >>>code...
} >>>
} >>>            regards, tom lane
} >>>
} >>Not *my* stack, BTW, but that aside I agree with you.  The people 
} >>involved suspect that a firewall is causing the problem.
} >>
} >>The thing is that from their point of view, Oracle keeps going where 
} >>PostgreSQL does not.  Given that both are running over the same network, 
} >>the response of these people could well be to simply move everything to 
} >>Oracle.
} >>
} >>The question, then, is: what can we PostgreSQL people can do to match 
} >>this Oracle ability??
} >>
} >
} >Unless there is more information, nothing.  This is the first time I
} >have heard of this in my 6 years with PostgreSQL.
} >
} OK - now I have spoken with you sages, I will indeed go back and see if 
} these people have any more information.
} 
} Many thanks.
} 
} 
} 
} ---------------------------(end of broadcast)---------------------------
} TIP 4: Don't 'kill -9' the postmaster
>-- End of excerpt from Bruce Badger

Re: Beating Oracle

From

Tom Lane

Date:

01 March 2002, 02:27:21

Bruce Badger <bbadger@BadgerSE.com> writes:
> The thing is that from their point of view, Oracle keeps going where 
> PostgreSQL does not.  Given that both are running over the same network, 
> the response of these people could well be to simply move everything to 
> Oracle.

<shrug>  If that feature makes Oracle worth its price to them, then
no doubt that's what they'll do.  Postgres doesn't have 6000 full-time
developers engaged in inventing bizarre features (and a substitute for
something that should be done at the network level is bizarre in my
book).

The reason I'm so unenthused about this is that I think it'd be quite
difficult to do transparently at the database level.  (1) How can you
know where communication was lost, from the other end's point of view?
(eg, client has no idea whether database completed the last request
sent.)  I think you'd wind up reimplementing TCP, or something much like
it, just to deal with that aspect of the problem.  (2) How can you even
be sure that you're reconnecting to the same other-end session?
Opportunities for security holes seem rampant.  (3) Should the database
hold sessions open indefinitely, waiting to see if the client can be
recontacted?  Won't take too long until your session limit is entirely
filled by dead connections.

Non-broken TCP stacks are normally quite tenacious about surviving
transmission glitches in already-open connections.  I think you ought
to be looking for solutions at that level, not trying to tell us that
it's our problem.
        regards, tom lane

Re: Beating Oracle

From

Bruce Badger

Date:

01 March 2002, 02:48:05

Tom Lane wrote:

>Bruce Badger <bbadger@BadgerSE.com> writes:
>
>>The thing is that from their point of view, Oracle keeps going where 
>>PostgreSQL does not.  Given that both are running over the same network, 
>>the response of these people could well be to simply move everything to 
>>Oracle.
>>
>
><shrug>  If that feature makes Oracle worth its price to them, then
>no doubt that's what they'll do.  Postgres doesn't have 6000 full-time
>developers engaged in inventing bizarre features (and a substitute for
>something that should be done at the network level is bizarre in my
>book).
>
>The reason I'm so unenthused about this is that I think it'd be quite
>difficult to do transparently at the database level.  (1) How can you
>know where communication was lost, from the other end's point of view?
>(eg, client has no idea whether database completed the last request
>sent.)  I think you'd wind up reimplementing TCP, or something much like
>it, just to deal with that aspect of the problem.  (2) How can you even
>be sure that you're reconnecting to the same other-end session?
>Opportunities for security holes seem rampant.  (3) Should the database
>hold sessions open indefinitely, waiting to see if the client can be
>recontacted?  Won't take too long until your session limit is entirely
>filled by dead connections.
>
>Non-broken TCP stacks are normally quite tenacious about surviving
>transmission glitches in already-open connections.  I think you ought
>to be looking for solutions at that level, not trying to tell us that
>it's our problem.
>
>            regards, tom lane
>
Ah, Tom.  I'm not demanding anything, nor saying this is any 
individual's problem.  I was merely seeking the wisdom of others, and I 
am grateful for your feedback (apart from the bits that slap me for 
something I have not done ;-) ).  

FWIW,  I too have invested a great deal of my own time in making 
PostgreSQL available to others, having built and supported the Smalltalk 
PostgreSQL library.

Re: Beating Oracle

From

Date:

01 March 2002, 11:06:47

> The reason I'm so unenthused about this is that I think it'd be quite
> difficult to do transparently at the database level.  (1) How can you
> know where communication was lost, from the other end's point of view?

Actually, if we're talking about transient TCP failures here (ie. one in X 
connections dies halfway through a transaction), then [1]libpqxx does have 
the logic built-in for semi-transparent connection recovery.

I'm not sure this is quite what is needed here, but what it does right now 
is reset the backend connection (PQreset() I think the function was), 
forget about the current attempt at performing the transaction, and re-
running the client code that makes up the transaction.  So we're not 
talking about a simple replay of old SQL, but a simple framework for 
writing restartable code that "plugs into" a connection.  Look for the 
Transactor base class.

If PQreset() is not sufficient, I see no fundamental problem in doing an 
entire new connect instead.

The one underlying assumption of course is that the client side will know 
if the connection was lost.  If TCP and libpq guarantee that much, then the 
libpqxx mechanism should solve the problem perfectly.  All we need now is 
two-phase commit.  :-)


> (eg, client has no idea whether database completed the last request
> sent.)  I think you'd wind up reimplementing TCP, or something much
> like it, just to deal with that aspect of the problem.  (2) How can you
> even be sure that you're reconnecting to the same other-end session?
> Opportunities for security holes seem rampant.  (3) Should the database
> hold sessions open indefinitely, waiting to see if the client can be
> recontacted?  Won't take too long until your session limit is entirely
> filled by dead connections.

Basically, libpqxx solves these problems at the client side--it doesn't try 
to reconnect to an older session, but simply leaves the old session to 
wither and die while it starts up a new one to perform the same task.  This 
takes very little in the way of state-keeping on either side.


Jeroen

[1] http://members.ams.chello.nl/j.vermeulen31/proj-libpqxx.html

Re: Beating Oracle

From

Jan Wieck

Date:

01 March 2002, 12:50:43

Bruce Momjian wrote:
> TCP has it's own restart logic that will handle the dropping of some
> packets, but if the TCP connection itself shuts down, PostgreSQL has no
> way of reestablishing it.
   Yupp, and it already has spent some time trying to retransmit   lost packets in order to keep this connection
alive, before   telling "sorry, lost contact".

   Do  we  want to decide dynamically, maybe based on the number   of row locks the  current  transaction  holds,  what
a  good   amount  of  time  for trying to reconnect is? I mean, someone   else might wait for these rows to be
available again.  Maybe   even  the  same  user,  because  his  client  library decided   earlier that  the  connection
is  lost,  unfortunately  some   "reestablish  packets"  drowned  in  the net too, but his app   finally succeeded in
connectingto a new backend.

   If you have connection aborts,  the  client  and  application   code  is  the wrong place to fix. In the "golden
era",95% of   all connectivity problems used to be cables or  plugs.  These   days,  firewalls  take  care  that
increased reliability of   cables   and   plugs   doesn't   increase   overall   network   reliability.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

Re: Beating Oracle

From

Tom Lane

Date:

01 March 2002, 12:55:31

<jtv@xs4all.nl> writes:
> Basically, libpqxx solves these problems at the client side--it doesn't try 
> to reconnect to an older session, but simply leaves the old session to 
> wither and die while it starts up a new one to perform the same task.  This 
> takes very little in the way of state-keeping on either side.

This scares the hell out of me.  How do you know whether the old session
committed your transaction, but the connection died just before it could
tell you so?  libpqxx is the very LAST place in the chain that is
qualified to undertake the task of error recovery.

There are cases where the best way to design the client code is to make
it automatically rollback and then retry whole transactions.  (In
particular, if you use SERIALIZABLE transaction mode then this is more
or less essential to cope with concurrency violations.)  If you've got a
client that's built that way, then it's a pretty simple matter to extend
the client to do the same sort of retry for connection-level failures,
too.  But I'd still be worried about the prospect of double execution of
a transaction, if the link was lost just after you issued a COMMIT and
just before the backend could tell you it had committed.  So you'd also
need to add code to verify you hadn't already executed the transaction.

In any case, I do not think that libpq or libpqxx can or should try to
hide this problem from the client.
        regards, tom lane

Re: Beating Oracle

From

Tom Ivar Helbekkmo

Date:

01 March 2002, 16:48:50

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Postgres doesn't have 6000 full-time developers engaged in inventing
> bizarre features (and a substitute for something that should be done
> at the network level is bizarre in my book).

May be, though, it's just a matter of not setting SO_KEEPALIVE?  In my
experience, the difference between always keeping and spontaneously
losing a connection during a network outage is often this option.  In
src/backend/libpq/pqcomm.c, we find (reformatted slightly here):
  /* select NODELAY and KEEPALIVE options if it's a TCP connection */  if (port->laddr.sa.sa_family == AF_INET)  {
inton = 1;    if (setsockopt(port->sock, IPPROTO_TCP, TCP_NODELAY,           (char *) &on, sizeof(on)) < 0)      {
 perror("postmaster: StreamConnection: setsockopt(TCP_NODELAY)");        return STATUS_ERROR;      }    if
(setsockopt(port->sock,SOL_SOCKET, SO_KEEPALIVE,           (char *) &on, sizeof(on)) < 0)      {
perror("postmaster:StreamConnection: setsockopt(SO_KEEPALIVE)");        return STATUS_ERROR;      }  }

And in my nearest manual entry on setsockopt(), I find:
  SO_KEEPALIVE enables the periodic transmission of messages on a  connected socket.  Should the connected party fail
torespond to  these messages, the connection is considered broken and processes  using the socket are notified via a
SIGPIPEsignal when attempting  to send data.

Betcha if Bruce commented out that second setsockopt() block, he'd
suddenly see his PostgreSQL connections stay up, just like the Oracle
ones!  :-)

> Non-broken TCP stacks are normally quite tenacious about surviving
> transmission glitches in already-open connections.

...but you can turn part of that off by turning on SO_KEEPALIVE.

-tih
-- 
Puritanism -- the haunting fear that someone, somewhere, may be happy.

Re: Beating Oracle

From

Tom Ivar Helbekkmo

Date:

01 March 2002, 16:54:19

Bruce Badger <bbadger@BadgerSE.com> writes:

> Due to network glitch, a PostgreSQL connection over IP is being dropped.
> 
> In the same situation, Oracle can keep it's connections going.

I mentioned the SO_KEEPALIVE socket option in another email, but the
other possibility here is, of course, that the PostgreSQL connection
might be receiving an ICMP error message that Oracle isn't.  This
would be due to different handling of the two protocols in a firewall
or something -- and whether it happens or not would be something you
could find out using tcpdump.

-tih
-- 
Puritanism -- the haunting fear that someone, somewhere, may be happy.

Re: Beating Oracle

From

Tom Lane

Date:

01 March 2002, 17:15:34

Tom Ivar Helbekkmo <tih@kpnQwest.no> writes:
>> Non-broken TCP stacks are normally quite tenacious about surviving
>> transmission glitches in already-open connections.

> ...but you can turn part of that off by turning on SO_KEEPALIVE.

In an RFC-compliant stack, the outage interval required before KEEPALIVE
will kill the connection is of the order of hours.  RFC 1122 specifies
that the minimum interval before the first probe is even sent is 2 hours
(since last activity on connection), and that a single failed probe is
not sufficient reason to drop the connection.

RFC 2525 does note that excessively short keepalive timeout is a common
form of TCP-stack bug.
        regards, tom lane

Re: Beating Oracle

From

Tom Ivar Helbekkmo

Date:

01 March 2002, 17:23:09

Tom Lane <tgl@sss.pgh.pa.us> writes:

> In an RFC-compliant stack, the outage interval required before KEEPALIVE
> will kill the connection is of the order of hours.  RFC 1122 specifies
> that the minimum interval before the first probe is even sent is 2 hours
> (since last activity on connection), and that a single failed probe is
> not sufficient reason to drop the connection.

Ah, yes, I see that now.  On NetBSD, it's four hours until the first
keepalive, then eight missed ones at 150 second intervals (totalling
20 minutes) are required before the connection is considered dead.

> RFC 2525 does note that excessively short keepalive timeout is a common
> form of TCP-stack bug.

So, Bruce might still be bothered with something like that, and/or
(for all he's given us of details) he might actually be talking about
a situation where Oracle will wait through severely prolonged outages
where PostgreSQL won't.

Details, Bruce?

-tih
-- 
Puritanism -- the haunting fear that someone, somewhere, may be happy.

Re: Beating Oracle

From

Tom Lane

Date:

01 March 2002, 17:40:38

Tom Ivar Helbekkmo <tih@kpnQwest.no> writes:
> So, Bruce might still be bothered with something like that, and/or
> (for all he's given us of details) he might actually be talking about
> a situation where Oracle will wait through severely prolonged outages
> where PostgreSQL won't.

I believe that Oracle uses its own networking code (SQLNet) which might
well have different error response behavior than TCP does.  Still, it's
hard to believe that even a broken TCP stack will not hold connections
through short- or moderate-duration network glitches; and Bruce did not
say that he was trying to deal with multiple-hour outages.

I have been wondering about whether Bruce's problem is firewall
misbehavior.  In particular, if there's a NAT translation happening
anywhere between his client and his server, then the firewall could
break the connection by dropping that particular port mapping, which
perhaps it might do if there's no activity for awhile.  In this case,
it might actually be that the default KEEPALIVE timeout of 2 hours is
too long for us :-(.  (RFC 1122 says that the timeout shall be
configurable, but this requirement seems to be widely ignored.)

As for why Oracle doesn't suffer the same problem, someone suggested
that the firewall might be specially configured not to drop Oracle
connections (or perhaps to pass them through without NAT mapping).
I don't know enough about SQLNet to know what to look for.
        regards, tom lane

Re: Beating Oracle

From

Jan Wieck

Date:

01 March 2002, 19:19:37

Tom Ivar Helbekkmo wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>
> > In an RFC-compliant stack, the outage interval required before KEEPALIVE
> > will kill the connection is of the order of hours.  RFC 1122 specifies
> > that the minimum interval before the first probe is even sent is 2 hours
> > (since last activity on connection), and that a single failed probe is
> > not sufficient reason to drop the connection.
>
> Ah, yes, I see that now.  On NetBSD, it's four hours until the first
> keepalive, then eight missed ones at 150 second intervals (totalling
> 20 minutes) are required before the connection is considered dead.
>
> > RFC 2525 does note that excessively short keepalive timeout is a common
> > form of TCP-stack bug.
>
> So, Bruce might still be bothered with something like that, and/or
> (for all he's given us of details) he might actually be talking about
> a situation where Oracle will wait through severely prolonged outages
> where PostgreSQL won't.
   The  question  is  "what  exactly  is  the  network  glitch"?   Firewalls doing NAT frequently cleanup  a  little
too much,   namely  connections  that  just have been idle for some time.   Maybe Oracle has it's private li'l
keepalive ping  to  avoid   that?
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

Re: Beating Oracle

From

jtv

Date:

02 March 2002, 09:23:17

On Fri, Mar 01, 2002 at 02:40:06PM -0500, Tom Lane wrote:
> 
> I have been wondering about whether Bruce's problem is firewall
> misbehavior.  In particular, if there's a NAT translation happening
> anywhere between his client and his server, then the firewall could
> break the connection by dropping that particular port mapping, which
> perhaps it might do if there's no activity for awhile.  In this case,
> it might actually be that the default KEEPALIVE timeout of 2 hours is
> too long for us :-(.  (RFC 1122 says that the timeout shall be
> configurable, but this requirement seems to be widely ignored.)

If that turns out to be the case, sending null queries periodically
should help some.


Jeroen

Re: Beating Oracle

From

anil

Date:

07 March 2002, 12:04:34

what is your tcpdump looks like? I have been using Pgsql for about five years.
I have not seen something like that. If you do have stateful firewall, may be
it does not keep state for port 5432.I could be wrong. just a thought.

anil

anil

Tom Lane wrote:

> Bruce Badger <bbadger@BadgerSE.com> writes:
> > The thing is that from their point of view, Oracle keeps going where
> > PostgreSQL does not.  Given that both are running over the same network,
> > the response of these people could well be to simply move everything to
> > Oracle.
>

Re: Beating Oracle

From

Bruce Badger

Date:

07 March 2002, 20:14:53

Thanks for that.  I have passed the all the comments from the list back 
to the people with the problem.  I'll pass your tcpdump suggestion on 
too.  I'll post the outcome when I hear what it is.

Thanks to you all.

anil wrote:

>what is your tcpdump looks like? I have been using Pgsql for about five years.
>I have not seen something like that. If you do have stateful firewall, may be
>it does not keep state for port 5432.
> I could be wrong. just a thought.
>
>anil
>
>anil
>
>Tom Lane wrote:
>
>>Bruce Badger <bbadger@BadgerSE.com> writes:
>>
>>>The thing is that from their point of view, Oracle keeps going where
>>>PostgreSQL does not.  Given that both are running over the same network,
>>>the response of these people could well be to simply move everything to
>>>Oracle.
>>>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>

INSERTing and UPDATEing records with libpq

From

Mars T

Date:

12 March 2002, 15:43:50

Hello all

Please how can I modify - insert, update and delete
-records using libpq interfaces. I can use it to
SELECT but cannot modify records.

Thanks

Arnold



__________________________________________________
Do You Yahoo!?
Try FREE Yahoo! Mail - the world's greatest free email!
http://mail.yahoo.com/