Thread: Feature freeze date for 8.1

Feature freeze date for 8.1

From
Bruce Momjian
Date:
You might remember that when we released 8.0, the plan was to have a
12-month development cycle for 8.1, unless there were Win32 problems
that required complex fixes, in which case we would have a shorter 8.1
cycle.

Well the good news is that there have been almost no Win32 problems, but
the other good news is that we are getting a lot of powerful features
for 8.1 already:
    o two-phase (Heikki Linnakangas, almost done)
    o multiple out function parameters (Tom, done)
    o bitmapped indexes (Tom, almost done)
    o shared row locks (Alvaro, almost done)
    o integrated auto-vacuum (Bruce)
    o buffer cache fixes for SMP (Tom, done)

It is possible all these items will be done by sometime in June.  Now,
if that happens, do we want to continue with the 12-month plan or
shorten the 8.1 release cycle, perhaps targeting a release in the
September/October timeframe?

The current core proposal is to do feature freeze on July 1, with the
understanding that we will be done most of the items above by then and
have the outstanding patches applied.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Feature freeze date for 8.1

From
Bruno Wolff III
Date:
On Thu, Apr 28, 2005 at 09:02:40 -0400, Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> Well the good news is that there have been almost no Win32 problems, but
> the other good news is that we are getting a lot of powerful features
> for 8.1 already:

You forgot to list the indexed aggregate feature for max and min. While
this isn't that important for experienced postgres users, it is a gotcha
for new users. Between this, integrated autovacuum and the cross type
index changes in 8.0 we have covered almost all of the newbie gotchas.
Getting started with Postgres should then be about as easy for new
users as it is with MySQL. That could significantly
boost usage at the low end once word gets out.


Re: Feature freeze date for 8.1

From
Rob Butler
Date:
As a user, I would definitely prefer to see 8.1
released sooner with the feature set listed below,
than wait another 6+ months for a few other features. 
Additionally, the beta may go smoother/faster if you
don't have too many huge features going in at once.

Just my opinion.
Later
Rob
--- Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> You might remember that when we released 8.0, the
> plan was to have a
> 12-month development cycle for 8.1, unless there
> were Win32 problems
> that required complex fixes, in which case we would
> have a shorter 8.1
> cycle.
> 
> Well the good news is that there have been almost no
> Win32 problems, but
> the other good news is that we are getting a lot of
> powerful features
> for 8.1 already:
> 
>     o two-phase (Heikki Linnakangas, almost done)
>     o multiple out function parameters (Tom, done)
>     o bitmapped indexes (Tom, almost done)
>     o shared row locks (Alvaro, almost done)
>     o integrated auto-vacuum (Bruce)
>     o buffer cache fixes for SMP (Tom, done)
> 
> It is possible all these items will be done by
> sometime in June.  Now,
> if that happens, do we want to continue with the
> 12-month plan or
> shorten the 8.1 release cycle, perhaps targeting a
> release in the
> September/October timeframe?
> 
> The current core proposal is to do feature freeze on
> July 1, with the
> understanding that we will be done most of the items
> above by then and
> have the outstanding patches applied.


Re: Feature freeze date for 8.1

From
Andreas Pflug
Date:
Bruce Momjian wrote:
> You might remember that when we released 8.0, the plan was to have a
> 12-month development cycle for 8.1, unless there were Win32 problems
> that required complex fixes, in which case we would have a shorter 8.1
> cycle.
> 
> Well the good news is that there have been almost no Win32 problems, but
> the other good news is that we are getting a lot of powerful features
> for 8.1 already:
> 
>     o two-phase (Heikki Linnakangas, almost done)
>     o multiple out function parameters (Tom, done)
>     o bitmapped indexes (Tom, almost done)
>     o shared row locks (Alvaro, almost done)
>     o integrated auto-vacuum (Bruce)
>     o buffer cache fixes for SMP (Tom, done)
> 
> It is possible all these items will be done by sometime in June.  Now,
> if that happens, do we want to continue with the 12-month plan or
> shorten the 8.1 release cycle, perhaps targeting a release in the
> September/October timeframe?
> 
> The current core proposal is to do feature freeze on July 1, with the
> understanding that we will be done most of the items above by then and
> have the outstanding patches applied.

It seems to be a good idea to take the chance now to make a release.
Delaying the release would keep features out of wide use
although they appear production grade. OTOH, if feature freeze is
delayed, seemingly essential features for 8.1 that arise in
the meantime might delay the release further, or force late feature
exclusion when last-minute issues are discovered. Integrated
autovacuum seems a good enough reason to release early.

Regards,
Andreas



Re: Feature freeze date for 8.1

From
Christopher Browne
Date:
In the last exciting episode, pgman@candle.pha.pa.us (Bruce Momjian) wrote:
>     o integrated auto-vacuum (Bruce)

If this can kick off a vacuum of a Very Large Table at an unfortunate
time, this can turn out to be a pretty painful misfeature.

What I'd _really_ love to see (and alas, it's beyond my ken) is some
parallel to the FSM, namely a "Recently Updated Blocks Map," which
would enable a vacuuming approach that would not go through entire
tables, but which would rather go through only those blocks known to
be recently updated.

There continues to be trouble if you have a table that grows to 50
million rows where there are 100K rows that are being heavily
updated.  In effect, only the 100K rows need vacuuming.
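
The "Recently Updated Blocks Map" idea could be sketched roughly as
follows. This is only an illustration of the bookkeeping, not anything
that exists in PostgreSQL: the class name, the method names and the
vacuum_block callback are all hypothetical.

```python
from collections import defaultdict


class RecentlyUpdatedBlocksMap:
    """Hypothetical per-table set of block numbers touched since the last vacuum."""

    def __init__(self):
        self._dirty = defaultdict(set)  # table name -> set of block numbers

    def note_update(self, table, block):
        # Called whenever a tuple in `block` is updated or deleted.
        self._dirty[table].add(block)

    def blocks_to_vacuum(self, table):
        # Hand back the recorded blocks in order and forget them.
        return sorted(self._dirty.pop(table, set()))


def targeted_vacuum(rubm, table, vacuum_block):
    """Vacuum only the recorded blocks; return how many were visited."""
    blocks = rubm.blocks_to_vacuum(table)
    for block in blocks:
        vacuum_block(table, block)  # caller-supplied stand-in for the real work
    return len(blocks)
```

With a map like this, the work for the 50-million-row table would scale
with the number of blocks that were actually touched rather than with
the table size.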
-- 
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://linuxfinances.info/info/emacs.html
Group Dynamics
"Following Minsky and Schelling, consider a person as a society of
agents. A group is then a larger society of such agents. Understand
groups by examining interactions of coalitions of agents that
cross-cut their grouping into people."
-- Mark Miller


Re: Feature freeze date for 8.1

From
Tom Lane
Date:
Christopher Browne <cbbrowne@acm.org> writes:
> In the last exciting episode, pgman@candle.pha.pa.us (Bruce Momjian) wrote:
>> o integrated auto-vacuum (Bruce)

> If this can kick off a vacuum of a Very Large Table at an unfortunate
> time, this can turn out to be a pretty painful misfeature.

[ shrug... ]  You'll always be able to turn it off if you don't want it.
I'm not sure that we'll be ready to turn it on by default even in 8.1.
        regards, tom lane


Re: Feature freeze date for 8.1

From
Bruce Momjian
Date:
Tom Lane wrote:
> Christopher Browne <cbbrowne@acm.org> writes:
> > In the last exciting episode, pgman@candle.pha.pa.us (Bruce Momjian) wrote:
> >> o integrated auto-vacuum (Bruce)
> 
> > If this can kick off a vacuum of a Very Large Table at an unfortunate
> > time, this can turn out to be a pretty painful misfeature.
> 
> [ shrug... ]  You'll always be able to turn it off if you don't want it.
> I'm not sure that we'll be ready to turn it on by default even in 8.1.

Agreed.  It will just be there to turn on from postgresql.conf if you
want it, and we do have TODO information about keeping such an FSM for
recently expired pages.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Feature freeze date for 8.1

From
Bruno Wolff III
Date:
On Fri, Apr 29, 2005 at 10:09:43 -0400, Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> Tom Lane wrote:
> > Christopher Browne <cbbrowne@acm.org> writes:
> > > In the last exciting episode, pgman@candle.pha.pa.us (Bruce Momjian) wrote:
> > >> o integrated auto-vacuum (Bruce)
> > 
> > > If this can kick off a vacuum of a Very Large Table at an unfortunate
> > > time, this can turn out to be a pretty painful misfeature.
> > 
> > [ shrug... ]  You'll always be able to turn it off if you don't want it.
> > I'm not sure that we'll be ready to turn it on by default even in 8.1.
> 
> Agreed.  It will just be there to turn on from postgresql.conf if you
> want it, and we do have TODO information about keeping such an FSM for
> recently expired pages.

I think if we aren't finding problems in testing that it would be better
to turn pg_autovacuum on by default. Vacuum is something that burns new
users and having it on by default is going to cut down on surprises.
Experienced users know about vacuum and will probably read the release
notes when upgrading and do something that is appropriate for them.


Re: Feature freeze date for 8.1

From
"Matthew T. O'Connor"
Date:
Tom Lane wrote:

>Christopher Browne <cbbrowne@acm.org> writes:
>  
>
>>If this can kick off a vacuum of a Very Large Table at an unfortunate
>>time, this can turn out to be a pretty painful misfeature.
>>    
>>
>
>[ shrug... ]  You'll always be able to turn it off if you don't want it.
>I'm not sure that we'll be ready to turn it on by default even in 8.1.
>  
>

What do people think about having an optional "maintenance window" so
that autovac only takes action during an approved time?  But perhaps
just using the vacuum delay settings will be enough.



Re: Feature freeze date for 8.1

From
"Marc G. Fournier"
Date:
On Fri, 29 Apr 2005, Bruno Wolff III wrote:

> On Fri, Apr 29, 2005 at 10:09:43 -0400,
>  Bruce Momjian <pgman@candle.pha.pa.us> wrote:
>> Tom Lane wrote:
>>> Christopher Browne <cbbrowne@acm.org> writes:
>>>> In the last exciting episode, pgman@candle.pha.pa.us (Bruce Momjian) wrote:
>>>>> o integrated auto-vacuum (Bruce)
>>>
>>>> If this can kick off a vacuum of a Very Large Table at an unfortunate
>>>> time, this can turn out to be a pretty painful misfeature.
>>>
>>> [ shrug... ]  You'll always be able to turn it off if you don't want it.
>>> I'm not sure that we'll be ready to turn it on by default even in 8.1.
>>
>> Agreed.  It will just be there to turn on from postgresql.conf if you
>> want it, and we do have TODO information about keeping such an FSM for
>> recently expired pages.
>
> I think if we aren't finding problems in testing that it would be better
> to turn pg_autovacuum on by default. Vacuum is something that burns new
> users and having it on by default is going to cut down on surprises.

Except for the surprise of periodically having the system go unresponsive
because it hit a large table, and that new user wondering what is wrong 
with postgresql that it just stalls seemingly randomly :(

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664


Re: Feature freeze date for 8.1

From
Bruno Wolff III
Date:
On Fri, Apr 29, 2005 at 12:43:37 -0300, "Marc G. Fournier" <scrappy@postgresql.org> wrote:
> On Fri, 29 Apr 2005, Bruno Wolff III wrote:
> 
> Except for the surprise of periodically having the system go unresponsive
> because it hit a large table, and that new user wondering what is wrong 
> with postgresql that it just stalls seemingly randomly :(

I think most users running systems with tables that large will know
what they are doing and will be more careful with vacuum.

Vacuum running for a long time on large tables with few pages that need
updating is really a separate problem that could use its own solution.


Re: Feature freeze date for 8.1

From
Tom Lane
Date:
"Matthew T. O'Connor" <matthew@zeut.net> writes:
> What do people think about having an optional "maintenance window" so
> that autovac only takes action during an approved time?  But perhaps
> just using the vacuum delay settings will be enough.

I'm not sure autovac should go completely catatonic during the day;
what if someone does an unusual mass deletion, or something?  But
it does seem pretty reasonable to have a notion of a maintenance
window where it should be more active than it is at other times.

Maybe what you want is two complete sets of autovac parameters.
Definitely at least two sets of the vacuum-delay values.
        regards, tom lane
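
A minimal sketch of the "two sets of vacuum-delay values" idea, assuming
a single nightly window. The DelaySettings type, the window bounds and
the numbers are made up for illustration; none of them are actual
autovacuum parameters.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class DelaySettings:
    cost_delay_ms: int   # sleep length once the cost limit is reached
    cost_limit: int      # accumulated cost that triggers a sleep


# Illustrative values only; not recommendations.
DAYTIME = DelaySettings(cost_delay_ms=20, cost_limit=100)
MAINTENANCE = DelaySettings(cost_delay_ms=0, cost_limit=200)


def in_window(now, start, end):
    """True if `now` falls in [start, end), allowing windows that wrap midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def settings_for(now, start=time(1, 0), end=time(5, 0)):
    # Autovac stays active all day; it is just gentler outside the window.
    return MAINTENANCE if in_window(now, start, end) else DAYTIME
```

Autovac never goes fully idle in this sketch, so an unusual mass
deletion during the day would still be picked up, only more slowly.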


Re: Feature freeze date for 8.1

From
"Marc G. Fournier"
Date:
On Fri, 29 Apr 2005, Tom Lane wrote:

> "Matthew T. O'Connor" <matthew@zeut.net> writes:
>> What do people think about having an optional "maintenance window" so
>> that autovac only takes action during an approved time?  But perhaps
>> just using the vacuum delay settings will be enough.
>
> I'm not sure autovac should go completely catatonic during the day;
> what if someone does an unusual mass deletion, or something?  But
> it does seem pretty reasonable to have a notion of a maintenance
> window where it should be more active than it is at other times.
>
> Maybe what you want is two complete sets of autovac parameters.
> Definitely at least two sets of the vacuum-delay values.

With the introduction of the stats collector, is there not some way of 
extending it so that autovac has more information to work off of?  For 
instance, in my environment, we have clients in every timezone hitting the 
database ... our Japanese clients will be busy at a totally different time 
of day than our East Coast/NA clients, so a 'maintenance window' is near
impossible to state ...

I know one person was talking about being able to target only those
pages that have changes, instead of the whole table ... but some sort of
"load monitoring" that checks # of active connections and tries to find 
'lulls'?

Basically, everything right now is being keyed to updates to the tables 
themselves, but isn't looking at what the system itself is doing ... if 
I'm doing a massive import of data into a table, the last thing I want is a
VACUUM to cut in and slow down the loading ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664


Re: Feature freeze date for 8.1

From
"Jim C. Nasby"
Date:
I think what you're suggesting is that vacuum settings (most likely
delay) take into consideration the load on the database, which I think
is a great idea. One possibility is if vacuum tracks how many blocks
it's read/written, it can see how many blocks the database has done
overall; subtract the two and you know how much other disk IO is going
on in the system. You can then use that number to decide how long you'll
sleep before the next vacuum cycle.
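
As a rough sketch of that calculation (the function names, the linear
scaling and the busy_blocks threshold are all assumptions made for the
example, not an existing interface):

```python
def other_io_blocks(total_blocks, vacuum_blocks):
    """Blocks moved by everything except vacuum during the sampling interval."""
    return max(total_blocks - vacuum_blocks, 0)


def next_sleep_ms(total_blocks, vacuum_blocks,
                  min_sleep_ms=0, max_sleep_ms=100, busy_blocks=10_000):
    """Scale the sleep linearly with the non-vacuum I/O, capped at max_sleep_ms.

    busy_blocks is a hypothetical threshold: at or above this much other
    I/O per interval the system counts as fully busy.
    """
    load = min(other_io_blocks(total_blocks, vacuum_blocks) / busy_blocks, 1.0)
    return round(min_sleep_ms + load * (max_sleep_ms - min_sleep_ms))
```

When vacuum accounts for all of the I/O in the interval the sleep drops
to the minimum; as other activity grows, vacuum backs off.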

On Fri, Apr 29, 2005 at 01:34:56PM -0300, Marc G. Fournier wrote:
> On Fri, 29 Apr 2005, Tom Lane wrote:
> 
> >"Matthew T. O'Connor" <matthew@zeut.net> writes:
> >>What do people think about having an optional "maintenance window" so
> >>that autovac only takes action during an approved time?  But perhaps
> >>just using the vacuum delay settings will be enough.
> >
> >I'm not sure autovac should go completely catatonic during the day;
> >what if someone does an unusual mass deletion, or something?  But
> >it does seem pretty reasonable to have a notion of a maintenance
> >window where it should be more active than it is at other times.
> >
> >Maybe what you want is two complete sets of autovac parameters.
> >Definitely at least two sets of the vacuum-delay values.
> 
> With the introduction of the stats collector, is there not some way of 
> extending it so that autovac has more information to work off of?  For 
> instance, in my environment, we have clients in every timezone hitting the 
> database ... our Japanese clients will be busy at a totally different time 
> of day than our East Coast/NA clients, so a 'maintenance window' is near
> impossible to state ...
> 
> I know one person was talking about being able to target only those
> pages that have changes, instead of the whole table ... but some sort of
> "load monitoring" that checks # of active connections and tries to find 
> 'lulls'?
> 
> Basically, everything right now is being keyed to updates to the tables 
> themselves, but isn't looking at what the system itself is doing ... if 
> I'm doing a massive import of data into a table, the last thing I want is a
> VACUUM to cut in and slow down the loading ...

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: Feature freeze date for 8.1

From
Christopher Browne
Date:
Martha Stewart called it a Good Thing when scrappy@postgresql.org ("Marc G. Fournier") wrote:
> I know one person was talking about being able to target only those
> pages that have changes, instead of the whole table ... but some
> sort of "load monitoring" that checks # of active connections and
> tries to find 'lulls'?

I have some "log table purging" processes I'd like to put in place; it
would be really slick to be able to get some statistics from the
system as to how busy the DB has been in the last little while.  

The nice, adaptive algorithm:

- Loop forever

 - Once a minute, evaluate how busy things seem, giving some metric X

  -> If X is "high" then purge 10 elderly tuples from table log_table
  -> If X is "moderate" then purge 100 elderly tuples from table
     log_table
  -> If X is "low" then purge 1000 elderly tuples from table
     log_table

The trouble is in measuring some form of "X."

Some reasonable approximations might include:
- How much disk I/O was recorded in the last 60 seconds?
- How many application transactions (e.g. - invoices or such) were
  issued in the last 60 seconds (monitoring a sequence could be
  good enough).
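
The loop above might look something like this in code. It is a sketch
under assumptions: X is taken to be a number between 0 and 1, the two
thresholds are arbitrary, and measure_load and purge are placeholders
the caller has to supply, since measuring X is exactly the unsolved
part.

```python
def purge_batch_size(load, moderate=0.3, high=0.7):
    """Map a load metric X in [0, 1] to the number of tuples to purge."""
    if load >= high:
        return 10
    if load >= moderate:
        return 100
    return 1000


def purge_loop(measure_load, purge, cycles, sleep=lambda seconds: None):
    """Run `cycles` iterations: measure X, purge a batch, wait a minute.

    measure_load and purge are caller-supplied stand-ins; a real job
    would loop forever and pass time.sleep as `sleep`.
    """
    purged = 0
    for _ in range(cycles):
        purged += purge(purge_batch_size(measure_load()))
        sleep(60)
    return purged
```

The cycle count and the injectable sleep are there only so the loop can
be exercised without waiting; a production job would drop both.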
 
-- 
output = reverse("gro.mca" "@" "enworbbc")
http://linuxfinances.info/info/slony.html
?OM ERROR


Re: Feature freeze date for 8.1

From
"Marc G. Fournier"
Date:
On Fri, 29 Apr 2005, Christopher Browne wrote:

> Martha Stewart called it a Good Thing when scrappy@postgresql.org ("Marc G. Fournier") wrote:
>> I know one person was talking about being able to target only those
>> pages that have changes, instead of the whole table ... but some
>> sort of "load monitoring" that checks # of active connections and
>> tries to find 'lulls'?
>
> I have some "log table purging" processes I'd like to put in place; it
> would be really slick to be able to get some statistics from the
> system as to how busy the DB has been in the last little while.
>
> The nice, adaptive algorithm:
>
> - Loop forever
>
>  - Once a minute, evaluate how busy things seem, giving some metric X
>
>   -> If X is "high" then purge 10 elderly tuples from table log_table
>   -> If X is "moderate" then purge 100 elderly tuples from table
>      log_table
>   -> If X is "low" then purge 1000 elderly tuples from table
>      log_table
>
> The trouble is in measuring some form of "X."
>
> Some reasonable approximations might include:
> - How much disk I/O was recorded in the last 60 seconds?
> - How many application transactions (e.g. - invoices or such) were
>   issued in the last 60 seconds (monitoring a sequence could be
>   good enough).

Some way of doing a 'partial vacuum' would be nice ... where a VACUUM 
could stop after it processed those '10 elderly tuples' and on the next 
pass, resume from that point instead of starting from the beginning again 
...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664


Re: Feature freeze date for 8.1

From
"Matthew T. O'Connor"
Date:
Marc G. Fournier wrote:

> On Fri, 29 Apr 2005, Christopher Browne wrote:
>
>> Some reasonable approximations might include:
>> - How much disk I/O was recorded in the last 60 seconds?
>> - How many application transactions (e.g. - invoices or such) were
>>   issued in the last 60 seconds (monitoring a sequence could be
>>   good enough).
>
>
> Some way of doing a 'partial vacuum' would be nice ... where a VACUUM 
> could stop after it processed those '10 elderly tuples' and on the 
> next pass, resume from that point instead of starting from the 
> beginning again ...


That is sorta what the vacuum delay settings accomplish.
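
For readers unfamiliar with the delay settings, here is a simplified
model of how cost-based throttling behaves. It is a sketch, not the
server's code: the per-page cost constants and the defaults are assumed
values chosen for the example.

```python
PAGE_HIT, PAGE_MISS, PAGE_DIRTY = 1, 10, 20  # per-page costs, assumed


def vacuum_with_delay(pages, cost_limit=200, cost_delay_ms=10,
                      sleep=lambda ms: None):
    """Process pages in order, napping whenever the cost balance hits the limit.

    `pages` is a sequence of per-page costs. Returns the number of naps.
    """
    balance = 0
    naps = 0
    for cost in pages:
        balance += cost           # "vacuum" this page
        if balance >= cost_limit:
            sleep(cost_delay_ms)  # let other work through
            balance = 0
            naps += 1
    return naps
```

In this model the scan still covers the whole table in a single pass;
the work is spread out over time rather than stopped and resumed later,
which may be why it only "sorta" matches the partial vacuum idea.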


Re: Feature freeze date for 8.1

From
"Sander Steffann"
Date:
Hi,

> What do people think about having an optional "maintenance window" so
> that autovac only takes action during an approved time.

This sounds like a really good idea to me!
Sander.




Re: Feature freeze date for 8.1

From
Date:
We have talked about performance and some new features
before the freeze of 8.1, such as:

·        Bitmap indexes
·        Autovacuum
·        GIS features
·        Object-Oriented features
·        PITR
·        Table Partition
   But there is one feature that is very important for a
database: availability. PostgreSQL does not have high
availability now, and we must discuss it here. Imagine a
database that has lots of features that others don't
have, but lacks this one. I tested PostgreSQL for this
feature and did not find it adequate. Here is the case:
   Process A starts to update / insert some rows in a table
and then the connection of process A to PostgreSQL is lost
before it sends commit or rollback. Other processes want to
update the same rows or SELECT ... FOR UPDATE the same
rows. Now these processes show SELECT WAITING, or get
CANCEL QUERY if statement_timeout was set. Imagine the
number of these processes keeps growing. What will we do now?
Restart the backend, or find process A and kill it?
   Now, do you think that PostgreSQL is a highly
available database? A feature must be added to solve this
problem, or PostgreSQL will never get a good
place among huge databases.
Best Regards
Adnan DURSUN
ASRIN Bilişim Ltd.
Ankara / TURKEY


Re: Feature freeze date for 8.1

From
Alvaro Herrera
Date:
On Sun, May 01, 2005 at 03:09:37PM +0300, adnandursun@asrinbilisim.com.tr wrote:

>     Process A starts to update / insert some rows in a table
> and then the connection of process A to PostgreSQL is lost
> before it sends commit or rollback. Other processes want to
> update the same rows or SELECT ... FOR UPDATE the same
> rows. Now these processes show SELECT WAITING, or get
> CANCEL QUERY if statement_timeout was set. Imagine the
> number of these processes keeps growing. What will we do now?
> Restart the backend, or find process A and kill it?

Well, if process A loses the connection to the client, then the
transaction will be rolled back and other processes will be able to
continue.

Another thing to keep in mind is that if process A is inserting a tuple,
other processes will not see it because it isn't committed.  So MVCC
rules protect them from blocking.  (Unless there is a unique restriction
and some other process wants to insert the same value to it.)

Now, we do have an "availability" problem in 8.0 and earlier, which is
that you could block trying to check a foreign key that another process is
also checking.  I am happy to say that it doesn't happen anymore so
that's one less barrier.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"In fact, the basic problem with Perl 5's subroutines is that they're not
crufty enough, so the cruft leaks out into user-defined code instead, by
the Conservation of Cruft Principle."  (Larry Wall, Apocalypse 6)


Re: Feature freeze date for 8.1

From
Peter Eisentraut
Date:
Alvaro Herrera wrote:
> On Sun, May 01, 2005 at 03:09:37PM +0300, adnandursun@asrinbilisim.com.tr wrote:
> >     Process A starts to update / insert some rows in a table
> > and then the connection of process A to PostgreSQL is lost
> > before it sends commit or rollback. Other processes want to
> > update the same rows or SELECT ... FOR UPDATE the same
> > rows. Now these processes show SELECT WAITING, or get
> > CANCEL QUERY if statement_timeout was set. Imagine the
> > number of these processes keeps growing. What will we do now?
> > Restart the backend, or find process A and kill it?
>
> Well, if process A loses the connection to the client, then the
> transaction will be rolled back and other processes will be able to
> continue.

The problem, as I understand it, is that if you have a long-running 
query and the client process disappears, the query keeps running and 
holds whatever resources it may have until it finishes.  In fact, it 
keeps sending data to the client and keeps ignoring the SIGPIPE it gets 
(in case of a Unix-domain socket connection).

Now of course this has nothing to do with "high availability" and does 
not warrant hijacking a thread about the release schedule, but it may 
be worth investigating.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Feature freeze date for 8.1

From
Dennis Bjorklund
Date:
On Sun, 1 May 2005, Alvaro Herrera wrote:

> Well, if process A loses the connection to the client, then the
> transaction will be rolled back and other processes will be able to
> continue.

If the other end of a tcp/ip connection just disappears, for example if the
network cable is cut, then in linux it can take up to 2 hours by
default for it to close the connection. Normally if a client application
dies then the client OS cleans up and closes the socket so that the server
knows about it.

There are some settings that one can alter to change the time it waits
before probing and killing the connection, i.e. tcp_keepalive_time in
/proc/sys/net/ipv4/.

It's documented in "man tcp", which says that it will take 2h11m by default
to kill off such a connection.

Pg could of course also implement some pinging protocol that should be done
every now and then by the client so that the server knows that it is
alive. For now you just have to lower the global settings such as the one above
if you want it to handle it better.
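
To make the 2h11m figure concrete, and to show a per-socket variant of
the same knobs, here is a small sketch. Note the swap: the text above
is about the global /proc settings, while this uses socket options on a
single connection. The Linux defaults assumed are 7200 seconds idle, 75
seconds between probes and 9 probes; the function names and the tuned
values are invented for the example.

```python
import socket


def default_detection_seconds(idle=7200, interval=75, probes=9):
    """Idle time before the first probe plus the time for all probes to fail."""
    return idle + interval * probes


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on keepalive for one socket; tune the timers where the OS allows."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_KEEP* options are platform-specific, so probe for each one.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

With the assumed defaults, 7200 + 9 * 75 = 7875 seconds, which is 2
hours, 11 minutes and 15 seconds.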

-- 
/Dennis Björklund



Re: Feature freeze date for 8.1

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> The problem, as I understand it, is that if you have a long-running 
> query and the client process disappears, the query keeps running and 
> holds whatever resources it may have until it finishes.

There is a trivial solution for this: it's called statement_timeout.

If the concern is that a process may block other processes for a long
time, what does it matter whether the client is still connected or not?
It's the long-running command in itself that is the problem.  So you
limit the time the command can run.

It might be interesting to think about a transaction_timeout as well,
to bound the time locks can be held.  But none of this has anything
to do with "high availability" as I understand the term.  It looks
more like a forcing function to make your users fix poorly-written
client software ;-)
        regards, tom lane


Re: Feature freeze date for 8.1

From
Date:
-------Original Message-------
From: Dennis Bjorklund
Date: 05/01/05 17:57:44
To: Alvaro Herrera
Cc: adnandursun@asrinbilisim.com.tr; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Feature freeze date for 8.1
On Sun, 1 May 2005, Alvaro Herrera wrote:
> Well, if process A loses the connection to the client, then the
> transaction will be rolled back and other processes will be able to
> continue.

Never. The process waits until it is killed or canceled, for
example when the network cable is unplugged, the client machine
crashes, or the network connection is lost.

>If the other end of a tcp/ip connection just disappears, for example if the
>network cable is cut, then in linux it can take up to 2 hours by
>default for it to close the connection. Normally if a client application
>dies then the client OS cleans up and closes the socket so that the server
>knows about it.

>There are some settings that one can alter to change the time it waits
>before probing and killing the connection, i.e. tcp_keepalive_time in
>/proc/sys/net/ipv4/.

But TCP experts don't advise changing this setting. In any case,
it is not a suitable method for this purpose.

>It's documented in "man tcp", which says that it will take 2h11m by default
>to kill off such a connection.

>Pg could of course also implement some pinging protocol that should be done
>every now and then by the client so that the server knows that it is
>alive. For now you just have to lower the global settings such as the one above
>if you want it to handle it better.

If a database wants to grow in usage, features like this
must be implemented. The tcp_keepalive_time setting is for TCP; it is
not for database connections. If it were, Oracle would also recommend it.
But we advise people to use PostgreSQL instead of Oracle and others
because it is open source. How do we convince the people who want to
use PostgreSQL? What do we recommend to them?

  1. change the tcp_keepalive_time setting, or
  2. restart the database, or
  3. find and kill the pid, etc.

These are not good options for them.
We sometimes discuss geographic system datatypes and features here.
First, a database must have real database features, not extreme
features.
Best Regards
Adnan DURSUN
ASRIN Bilişim Ltd.Şti


Network write errors (was: Re: Feature freeze date for 8.1)

From
Andrew - Supernews
Date:
On 2005-05-01, Peter Eisentraut <peter_e@gmx.net> wrote:
> The problem, as I understand it, is that if you have a long-running 
> query and the client process disappears, the query keeps running and 
> holds whatever resources it may have until it finishes.  In fact, it 
> keeps sending data to the client and keeps ignoring the SIGPIPE it gets 
> (in case of a Unix-domain socket connection).

Ignoring the SIGPIPE is exactly the right thing to do.

What's _not_ a good idea is ignoring the EPIPE error from write(), which
seems to currently be reported via ereport(COMMERROR), which doesn't try
to abort the query as far as I can tell.
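
The distinction can be illustrated outside the backend. The sketch
below is a Python analogy, not PostgreSQL code: Python already ignores
SIGPIPE, so a write to a vanished peer surfaces as an error on the call
itself, and the loop treats that error as the signal to stop working.
The send_rows function is invented for the example.

```python
import socket


def send_rows(sock, rows):
    """Send rows until done or the peer is gone; return (sent, aborted)."""
    sent = 0
    for row in rows:
        try:
            sock.sendall(row)
        except (BrokenPipeError, ConnectionResetError):
            # EPIPE/ECONNRESET: nobody is listening, so stop the "query".
            return sent, True
        sent += 1
    return sent, False
```

The point is that the write error, not the signal, is what carries the
"client is gone" information back to the code doing the work.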

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: Feature freeze date for 8.1

From
Date:
On Sun, 01 May 2005 11:37:47 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Peter Eisentraut <peter_e@gmx.net> writes:
>> The problem, as I understand it, is that if you have a long-running
>> query and the client process disappears, the query keeps running and
>> holds whatever resources it may have until it finishes.
>
>There is a trivial solution for this: it's called statement_timeout.

statement_timeout is not a solution if many processes are
waiting for the resource. statement_timeout only provides an
escape mechanism. Imagine tens of processes are waiting:
statement_timeout cancels them, but these processes must do
their work.

>If the concern is that a process may block other processes
>for a long time, what does it matter whether the client is
>still connected or not? It's the long-running command in
>itself that is the problem.  So you limit the time the
>command can run.
>
>It might be interesting to think about a
>transaction_timeout as well, to bound the time locks can be
>held.  But none of this has anything to do with "high
>availability" as I understand the term.  It looks more like
>a forcing function to make your users fix poorly-written
>client software ;-)

Listen Tom, write a client software that releases the
resources / locks that were held before client power went
down or the client connection was lost.

Must we suggest this solution to people who want to use
PostgreSQL, or must we implement a pinging mechanism to
check whether the client is dead or alive?

Best Regards

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.
Ankara / TURKEY


Re: Feature freeze date for 8.1

From
Dennis Bjorklund
Date:
On Sun, 1 May 2005 adnandursun@asrinbilisim.com.tr wrote:

> If a database wants to get bigger on the usage these settings like this
> must be implemented.

Lucky thing that postgresql is open source so you or anyone else that needs
it can implement or sponsor it. Postgresql gets as good as we make it and
nothing happens unless someone that needs a feature sits down and implements
it.

> First, a database must have real database features, not extreme
> features.

Different people have different needs. For me this has not even once been
a problem, so it's not something that I personally will lose any sleep
over. It doesn't mean I wouldn't welcome that someone else work on it.

-- 
/Dennis Björklund



Re: Feature freeze date for 8.1

From
Date:
On Sun, 1 May 2005 19:35:01 +0200 (CEST) Dennis Bjorklund <db@zigo.dhs.org> wrote:
>On Sun, 1 May 2005 adnandursun@asrinbilisim.com.tr wrote:
>
>> If a database wants to get bigger on the usage these
>> settings like this must be implemented.
>
>Lucky thing that postgresql is open source so you or
>anyone else that needs it can implement or sponsor it.
>Postgresql gets as good as we make it and nothing happens
>unless someone that needs a feature sits down and
>implements it.

Yes, I agree with you, and since I haven't sat down to
implement it, I am discussing with you a feature that would
bring more availability to the PostgreSQL database. All I
want is to get PostgreSQL widely used in my country. I talk
to a lot of people about the PostgreSQL database and I try
to convince them to use PostgreSQL.

>> First, a database must have real database features, not
>> extreme features.
>
>Different people have different needs. For me this has
>not even once been a problem, so it's not something that I
>personally will lose any sleep over. It doesn't mean I
>wouldn't welcome that someone else work on it.

Yes, but you have developed a database application, so
there are a few principal features that must be added
instead of extreme features.
I would not wait a minute to join the PostgreSQL
development group if I knew enough C/C++. All I can do is
talk to people and convince them to use open source software
like PostgreSQL. The features that we discuss here are meant
to grow the usage of the PostgreSQL database. I am a DBA. I
know some other databases like Oracle, MSSQL etc.

Adnan DURSUN
ASRIN Bilisim Hiz.Ltd.

Adnan DURSUN
ASRIN Bilisim Hiz.Ltd.


Re: Feature freeze date for 8.1

From
Bruno Wolff III
Date:
On Sun, May 01, 2005 at 19:57:37 +0300, adnandursun@asrinbilisim.com.tr wrote:
> 
> Listen Tom, write a client software that releases the
> resources / locks that were held before client power went
> down or the client connection was lost.

If Postgres can tell the connection has been lost then it should roll
back the connection. The problem is that you can't always tell if
a connection has been lost. All you can do is timeout, either when TCP
times out or some other timeout (such as a statement timeout) that you
set.


Re: Feature freeze date for 8.1

From
Date:
On Sun, 1 May 2005 14:35:37 -0500 Bruno Wolff III <bruno@wolff.to> wrote:
>On Sun, May 01, 2005 at 19:57:37 +0300,
>  adnandursun@asrinbilisim.com.tr wrote:
>>
>> Listen Tom, write a client software that releases the
>> resources / locks that were held before client power went
>> down or the client connection was lost.
>
>If Postgres can tell the connection has been lost then it
>should roll back the connection.

Yes, but can PostgreSQL know which connection is lost,
alive, or dead?

>The problem is that you can't always
>tell if a connection has been lost. All you can do is
>timeout, either when TCP times out or some other timeout
>(such as a statement timeout) that you set.

You are right, a timeout parameter must be used for that
on the backend. A client application can never find the
previous instance from before it crashed. However, more
than one connection can be established to the PostgreSQL
backend.

Statement_timeout is just an escape mechanism for an active
transaction. Imagine: you've started a process to update
the rows in a table, then your PC lost power but you
have not sent commit or rollback yet. What will happen now?
Example code:

-- Client side of the code

1. send statement_timeout = 10;
2. start a transaction;
3. start to update table;   ** connection is lost here
4. commit;

Best Regards,

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.
Ankara / TURKEY


Re: Feature freeze date for 8.1

From
Christopher Kings-Lynne
Date:
>>Well, if process A loses the connection to the client, then the
>>transaction will be rolled back and other processes will be able to
>>continue.
>
> Never. The process waits until it is killed or canceled, for
> example with an unplugged network cable, a crashed client
> machine, or a lost network connection.

Always.  That's how it works for me.

Chris


Re: Feature freeze date for 8.1

From
Neil Conway
Date:
adnandursun@asrinbilisim.com.tr wrote:
>   statement_timeout is not a solution if many processes are
> waiting for the resource.

Why not?

I think the only problem with using statement_timeout for this purpose 
is that the client connection might die during a long-running 
transaction at a point when no statement is currently executing. Tom's 
suggested transaction_timeout would be a reasonable way to fix this. 
Adnan, if you think this is such a significant problem (I can't say that 
I agree), I'd encourage you to submit a patch.

-Neil


Re: Feature freeze date for 8.1

From
Oliver Jowett
Date:
Neil Conway wrote:
> adnandursun@asrinbilisim.com.tr wrote:
> 
>>   statement_timeout is not a solution if many processes are
>> waiting for the resource.
> 
> 
> Why not?
> 
> I think the only problem with using statement_timeout for this purpose 
> is that the client connection might die during a long-running 
> transaction at a point when no statement is currently executing. Tom's 
> suggested transaction_timeout would be a reasonable way to fix this. 
> Adnan, if you think this is such a significant problem (I can't say that 
> I agree), I'd encourage you to submit a patch.

I raised this a while back on -hackers:
  http://archives.postgresql.org/pgsql-hackers/2005-02/msg00397.php

but did not get much feedback.

Does anyone have comments on that email?

It's a problem that is unlikely to happen in normal operation, but you 
do need to deal with it to cover the network failure cases if you have 
an otherwise failure-tolerant cluster..

-O


Re: Feature freeze date for 8.1

From
Jaime Casanova
Date:
On 5/1/05, adnandursun@asrinbilisim.com.tr
<adnandursun@asrinbilisim.com.tr> wrote:
> On Sun, 1 May 2005 14:35:37 -0500
>  Bruno Wolff III <bruno@wolff.to> wrote:
> >On Sun, May 01, 2005 at 19:57:37 +0300,
> >  adnandursun@asrinbilisim.com.tr wrote:
> >>
> >> Listen Tom, write a client software that releases the
> >> resources / locks that were held before client power went
> >> down or the client connection was lost.
> >
> >If Postgres can tell the connection has been lost then it
> >should roll back the connection.
>
> Yes, but can PostgreSQL know which connection is lost,
> alive, or dead?
>
> >The problem is that you can't always
> >tell if a connection has been lost. All you can do is
> >timeout, either when TCP times out or some other timeout
> >(such as a statement timeout) that you set.
>
> You are right, a timeout parameter must be used for that
> on the backend. A client application can never find the
> previous instance from before it crashed. However, more
> than one connection can be established to the PostgreSQL
> backend.
>
> Statement_timeout is just an escape mechanism for an active
> transaction. Imagine: you've started a process to update
> the rows in a table, then your PC lost power but you
> have not sent commit or rollback yet. What will happen now?

If you send the update outside a transaction and...

Option 1) ...the client crashes then the update will commit, i think.
If you don't want that send the update inside a begin/commit block.

Option 2) ...the server crashes the update will rollback.


If you send the update inside a transaction and...

Option 1) ...the client crashes then the update will rollback.
Option 2) ...the server crashes the update will rollback.

Actually, i can't see what's the problem. :)

--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


Re: Feature freeze date for 8.1

From
Neil Conway
Date:
Oliver Jowett wrote:
> I raised this a while back on -hackers:
> 
>   http://archives.postgresql.org/pgsql-hackers/2005-02/msg00397.php
> 
> but did not get much feedback.

Perhaps you can interpret silence as consent? :)

> Does anyone have comments on that email?

I wouldn't be opposed to it. It would be different than 
statement_timeout, in that we'd be measuring transaction *idle* time, 
not total transaction runtime, so perhaps "transaction_idle_timeout" is 
a better name than "transaction_timeout". Also, presumably when the 
transaction idle timeout fires, we should just rollback the current 
transaction, not close the client connection -- so you could potentially 
have idle backends sticking around for the full TCP timeout period. 
Since they shouldn't be holding any locks I don't see that as a big problem.

-Neil


Re: Feature freeze date for 8.1

From
Tom Lane
Date:
Jaime Casanova <systemguards@gmail.com> writes:
> Actually, i can't see what's the problem. :)

I think the issue is "how long does it take for the rollback to happen?"

While that isn't an unreasonable issue on its face, I think it really
boils down to this: the OP is complaining because he thinks the
connection-loss timeout mandated by the TCP RFCs is too long.  Perhaps
the OP knows network engineering far better than the authors of those
RFCs, or perhaps not.  I'm not convinced that Postgres ought to provide
a way to second-guess the TCP stack ... this looks to me like "I can't
convince the network software people to provide me an easy way to
override their decisions, so I'll beat up on the database people to
override 'em instead.  Perhaps the database people don't know the issues
and can be browbeaten more easily."
        regards, tom lane


Re: Feature freeze date for 8.1

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
>> Does anyone have comments on that email?

> I wouldn't be opposed to it. It would be different than 
> statement_timeout, in that we'd be measuring transaction *idle* time, 

We would?  Why?  Please provide a motivation that justifies the
considerably higher cost to make it count that way, as opposed to
time-since-BEGIN.  If the point is to limit the time for which locks
are held, I should think this would actually be *less* desirable than
constraining time-since-BEGIN.

> Also, presumably when the 
> transaction idle timeout fires, we should just rollback the current 
> transaction, not close the client connection

Certainly ...

> -- so you could potentially 
> have idle backends sticking around for the full TCP timeout period. 

... but that doesn't necessarily follow.  Once we've been motivated to
try to send an error message to the client, the relevant timeouts are
way shorter than they are under connection-idle conditions.

> Since they shouldn't be holding any locks I don't see that as a big problem.

Right, once we've released the transaction the pain grows greatly less.
We are still occupying a backend slot though, so failing sooner has some
value, if there is no doubt the connection is unrecoverable.  (But see
my upthread doubts about whether we know that better than the TCP stack
does.)
        regards, tom lane


Re: Feature freeze date for 8.1

From
Neil Conway
Date:
Tom Lane wrote:
> We would?  Why?  Please provide a motivation that justifies the
> considerably higher cost to make it count that way, as opposed to
> time-since-BEGIN.

The specific scenario this feature is intended to resolve is 
idle-in-transaction backends holding on to resources while the network 
connection times out; it isn't intended to implement "I never want to 
run a transaction that takes more than X seconds to execute." While 
long-running transactions aren't usually a great idea, I can certainly 
imagine installations in which some transactions might take, say, 30 
minutes to execute but the admin would like to timeout idle connections 
in less than that amount of time.

As for cost, this feature has zero cost until it is enabled. I would 
also guess that setitimer() is reasonably cheap on most kernels, 
although let me know if I'm mistaken. If it's sufficiently expensive 
that a setitimer() per query is noticeable, then I agree that 
setitimer() at BEGIN-time is probably sufficient for most people.

> Once we've been motivated to try to send an error message to the
> client, the relevant timeouts are way shorter than they are under
> connection-idle conditions.

Sorry, yes, I should have been more clear.

-Neil


Re: Feature freeze date for 8.1

From
Oliver Jowett
Date:
Tom Lane wrote:

> I'm not convinced that Postgres ought to provide
> a way to second-guess the TCP stack ... this looks to me like "I can't
> convince the network software people to provide me an easy way to
> override their decisions, so I'll beat up on the database people to
> override 'em instead.  Perhaps the database people don't know the issues
> and can be browbeaten more easily."

Would you be ok with a patch that allowed configuration of the 
TCP_KEEPCNT / TCP_KEEPIDLE / TCP_KEEPINTVL socket options on backend 
sockets?

-O


Re: Feature freeze date for 8.1

From
Russell Smith
Date:
On Mon, 2 May 2005 03:05 pm, Neil Conway wrote:
> Tom Lane wrote:
> > We would?  Why?  Please provide a motivation that justifies the
> > considerably higher cost to make it count that way, as opposed to
> > time-since-BEGIN.
> 
> The specific scenario this feature is intended to resolve is 
> idle-in-transaction backends holding on to resources while the network 
> connection times out; it isn't intended to implement "I never want to 
> run a transaction that takes more than X seconds to execute." While 
> long-running transactions aren't usually a great idea, I can certainly 
> imagine installations in which some transactions might take, say, 30 
> minutes to execute but the admin would like to timeout idle connections 
> in less than that amount of time.
> 
The two big long running transactions I can think of are VACUUM on a large db,
and there is no way to shorten that time, since to stop wraparound you must vacuum
the whole db.

Backups with pg_dump can run for quite a long time.

I would prefer an idle timeout if it's not costly.  Because otherwise estimates need to be
made about how long VACUUM and backup could take, and set the timeout longer.  Which
in some senses defeats the purpose of being able to cleanup idle connection quickly.

The VACUUM issue may not be a problem, as if BEGIN is not issued, then the transaction
timeout would probably not be used. But the issues would remain for backups.

Just some thoughts

Regards

Russell Smith


Re: Feature freeze date for 8.1

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Tom Lane wrote:
>> We would?  Why?  Please provide a motivation that justifies the
>> considerably higher cost to make it count that way, as opposed to
>> time-since-BEGIN.

> The specific scenario this feature is intended to resolve is 
> idle-in-transaction backends holding on to resources while the network 
> connection times out; it isn't intended to implement "I never want to 
> run a transaction that takes more than X seconds to execute."

[ itch... ]  This seems to me to be conflating several distinct issues.
AFAIR the points that have been raised in the thread are:

#1  Defend against loss of connectivity to client

#2  Defend against client sitting idle while holding locks (or just
    holding an open transaction and thereby preventing VACUUM cleanup)

#3  Defend against client holding locks unreasonably long, even though
    not idle (obviously any such constraint will cause clients to
    fail midstream, but perhaps some DBAs will see this as the lesser
    of two evils)

I claim that if you have a problem with #1 you ought to go discuss it
with some TCP hackers: you basically want to second-guess the TCP
stack's ideas about appropriate timeouts.  Maybe you know what you
are doing or maybe not, but it's not a database-level issue.

#2 is a fair point if you need to cope with poorly-programmed clients,
but I'm not seeing exactly why it has to be measured by "idle time"
rather than "total time".  The known cases of this involve client
code that issues a BEGIN and then just sits, so there's no difference.

For point #3, I claim you have to measure total time not idle time
or you'll fail to perceive the problem at all.

> While long-running transactions aren't usually a great idea, I can
> certainly imagine installations in which some transactions might take,
> say, 30 minutes to execute but the admin would like to timeout idle
> connections in less than that amount of time.

The fallacy in that argument is that if you don't like a transaction
that sits idle for 30 minutes, you aren't likely to like ones that hog
the CPU for 30 minutes either.  The idle xact is certainly not chewing
more resources than the busy xact.  If you have a problem with it, it
has to be along the lines of holding-locks-too-long, and that would
apply just as much to the busy guy.
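
The two ways of counting that are being compared can be written down as a small model. This is only a sketch: the function names, the 60-second limit and the timestamps are made up for illustration and do not correspond to any existing PostgreSQL setting.

```python
def total_timeout_fires(begin: float, now: float, limit: float) -> bool:
    """Time-since-BEGIN policy: fire once the transaction is older than limit."""
    return now - begin > limit

def idle_timeout_fires(last_activity: float, now: float, limit: float) -> bool:
    """Idle-time policy: fire once no statement has run for longer than limit."""
    return now - last_activity > limit

LIMIT = 60.0  # seconds, hypothetical

# Case #2: the client issues BEGIN at t=0 and then just sits.
# BEGIN is also the last activity, so both clocks start together.
begin_and_sit = (
    total_timeout_fires(begin=0.0, now=61.0, limit=LIMIT),
    idle_timeout_fires(last_activity=0.0, now=61.0, limit=LIMIT),
)

# Case #3: the client has been busy for 30 minutes while holding locks;
# its last statement finished one second before the check.
busy_for_30_min = (
    total_timeout_fires(begin=0.0, now=1800.0, limit=LIMIT),
    idle_timeout_fires(last_activity=1799.0, now=1800.0, limit=LIMIT),
)

print(begin_and_sit)    # (True, True): no difference between the policies
print(busy_for_30_min)  # (True, False): only total time notices the busy client
```

In the begin-and-sit case the two policies agree, while in the busy case only the time-since-BEGIN clock ever fires, which is the asymmetry argued for #2 and #3.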
        regards, tom lane


Re: Feature freeze date for 8.1

From
Tom Lane
Date:
Oliver Jowett <oliver@opencloud.com> writes:
> Tom Lane wrote:
>> I'm not convinced that Postgres ought to provide
>> a way to second-guess the TCP stack ...

> Would you be ok with a patch that allowed configuration of the 
> TCP_KEEPCNT / TCP_KEEPIDLE / TCP_KEEPINTVL socket options on backend 
> sockets?

[ shrug... ]  As long as it doesn't fail to build on platforms that
don't offer those options, I couldn't complain too hard.  But do we
really need all that?
        regards, tom lane


Re: Feature freeze date for 8.1

From
Tom Lane
Date:
Russell Smith <mr-russ@pws.com.au> writes:
> I would prefer an idle timeout if it's not costly.  Because otherwise
> estimates need to be made about how long VACUUM and backup could take,
> and set the timeout longer.

Why?  No one has suggested that the same timeout must be applied to
every connection.  Clients that are going to do maintenance stuff like
VACUUM could just disable the timeout.

This does bring up thoughts of whether the timeout needs to be a
protected variable (SUSET or higher).  I'd argue not, since a
noncooperative client can certainly cause performance issues aplenty
no matter what you try to impose with timeouts.
        regards, tom lane


Re: Feature freeze date for 8.1

From
Neil Conway
Date:
Tom Lane wrote:
> #3  Defend against client holding locks unreasonably long, even though
>     not idle

I can't get too excited about this case. If the client is malicious, 
this feature is surely insufficient to stop them from consuming a lot of 
resources (for example, they could easily drop and then reacquire the 
locks every (timeout * 0.9) seconds). And how many DBAs are really going 
to want to abort non-malicious clients doing useful work if they happen 
to exceed a certain total runtime? Perhaps a motivating example would 
help...

> I claim that if you have a problem with #1 you ought to go discuss it
> with some TCP hackers: you basically want to second-guess the TCP
> stack's ideas about appropriate timeouts.

Well, no -- you might want to set a different timeout for PostgreSQL 
connections than for other connections. Is there a way to change the 
socket timeout for some subset of the processes on the machine without 
hacking the client or server source? You might also want to set this 
timeout on a more granular basis (e.g. per user, per database, etc.) 
Implementing this via setting a socket option (provided it can be done 
portably) would be fine with me.

-Neil


Re: Feature freeze date for 8.1

From
Dennis Bjorklund
Date:
On Mon, 2 May 2005, Tom Lane wrote:

> #1  Defend against loss of connectivity to client
> 
> I claim that if you have a problem with #1 you ought to go discuss it
> with some TCP hackers: you basically want to second-guess the TCP
> stack's ideas about appropriate timeouts.  Maybe you know what you
> are doing or maybe not, but it's not a database-level issue.

Different applications can have different needs here. For some it's okay 
to wait a long time, for others it is not.

The tcp hackers have provided an api for clients to set these values per
socket (setsockopt with TCP_KEEPIDLE and similar (in linux at least)).
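
For illustration, a per-socket setup along those lines might look like the sketch below. It is written in Python rather than C, `enable_keepalive` is a hypothetical helper, and the values 60 / 10 / 5 are arbitrary examples. Options the platform does not define are skipped, so the code still runs where TCP_KEEPIDLE and friends are missing.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive and tune it where the platform allows.

    Returns the names of the options that were actually set.
    """
    applied = []
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied.append("SO_KEEPALIVE")
    tunables = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    )
    for name, value in tunables:
        option = getattr(socket, name, None)
        if option is None:
            continue  # this platform does not offer the option
        sock.setsockopt(socket.IPPROTO_TCP, option, value)
        applied.append(name)
    return applied

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
applied = enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()

print(applied, keepalive_on)
```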

My problem with the above setting is that some operations can be in
progress for a long time on the server without generating any tcp/ip
traffic to the client (a non verbose vacuum I guess is such a case). Such
an operation would look like it's idle.

There is an overlap with session and transaction timeouts, most
applications work fine with any of these.

-- 
/Dennis Björklund



Re: Feature freeze date for 8.1

From
Oliver Jowett
Date:
Tom Lane wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
> 
>>Tom Lane wrote:
>>
>>>I'm not convinced that Postgres ought to provide
>>>a way to second-guess the TCP stack ...
> 
> 
>>Would you be ok with a patch that allowed configuration of the 
>>TCP_KEEPCNT / TCP_KEEPIDLE / TCP_KEEPINTVL socket options on backend 
>>sockets?
> 
> 
> [ shrug... ]  As long as it doesn't fail to build on platforms that
> don't offer those options, I couldn't complain too hard.  But do we
> really need all that?

I can't see how you'd aggregate or discard any of those options without 
losing useful tuning knobs.. if you're going to have one, you might as 
well have them all.

-O


Re: Feature freeze date for 8.1

From
Oliver Jowett
Date:
Neil Conway wrote:

> Is there a way to change the 
> socket timeout for some subset of the processes on the machine without 
> hacking the client or server source?

The only ways I can see of tuning the TCP idle parameters on Linux are
globally via sysctl (/proc/sys/net/ipv4/tcp_keepalive_*), or per-socket
via setsockopt().

You could LD_PRELOAD something to wrap accept(), I suppose, but that 
seems needlessly ugly..

-O


Re: Feature freeze date for 8.1

From
Peter Eisentraut
Date:
Neil Conway wrote:
> The specific scenario this feature is intended to resolve is
> idle-in-transaction backends holding on to resources while the
> network connection times out;

I was under the impression that the specific scenario is 
busy-in-transaction backends continuing to produce and send data while 
the client has disappeared.  Why does the backend ignore network errors 
and keep sending data?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Feature freeze date for 8.1

From
Oliver Jowett
Date:
Peter Eisentraut wrote:
> Neil Conway wrote:
> 
>>The specific scenario this feature is intended to resolve is
>>idle-in-transaction backends holding on to resources while the
>>network connection times out;
> 
> 
> I was under the impression that the specific scenario is 
> busy-in-transaction backends continuing to produce and send data while 
> the client has disappeared.  Why does the backend ignore network errors 
> and keep sending data?

The scenario I need to deal with is this:

There are multiple nodes, network-separated, participating in a cluster.
One node is selected to talk to a particular postgresql instance (call
this node A).

A starts a transaction and grabs some locks in the course of that
transaction. Then A falls off the network before committing because of a
hardware or network failure. A's connection might be completely idle
when this happens.

The cluster liveness machinery notices that A is dead and selects a new
node to talk to postgresql (call this node B). B resumes the work that A
was doing prior to failure.

B has to wait for any locks held by A to be released before it can make
any progress.

Without some sort of tunable timeout, it could take a very long time (2+
hours by default on Linux) before A's connection finally times out and
releases the locks.
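
The "2+ hours" figure can be reproduced with a little arithmetic. The sketch assumes that keepalive is enabled on the socket and that the stock Linux sysctl defaults apply (tcp_keepalive_time = 7200 s, tcp_keepalive_intvl = 75 s, tcp_keepalive_probes = 9); a tuned system will give different numbers.

```python
def keepalive_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Rough time until an idle, dead peer is declared gone:
    the idle period before the first probe, plus one interval per unanswered probe."""
    return idle + interval * probes

# Assumed stock Linux defaults, in seconds / count.
LINUX_DEFAULTS = {"idle": 7200, "interval": 75, "probes": 9}

seconds = keepalive_detection_seconds(**LINUX_DEFAULTS)
hours, remainder = divmod(seconds, 3600)
minutes, secs = divmod(remainder, 60)

print(f"{seconds} s = {hours} h {minutes} min {secs} s")  # 7875 s = 2 h 11 min 15 s
```

During all of that time node B would be blocked on the locks still held by A's backend.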

-O


Re: Feature freeze date for 8.1

From
Date:
On Mon, 02 May 2005 12:05:45 +1000 Neil Conway <neilc@samurai.com> wrote:
>adnandursun@asrinbilisim.com.tr wrote:
>>   statement_timeout is not a solution if many processes
>> are waiting for the resource.
>
>Why not?

Imagine a process has locked some rows to update, and the
process code looks like this:

-- Sample client code:

  1. Start a transaction
  2. Set statement_timeout to 10000 or any value
  3. Update the rows
       * after the update is completed the connection is lost,
         so the commit can no longer be sent
  4. Send commit to PostgreSQL

Above, because the "update" has completed,
statement_timeout no longer has any effect and cannot cancel
the query. And the other processes that wait for the same
resources / rows are still waiting.

>I think the only problem with using statement_timeout for
>this purpose is that the client connection might die
>during a long-running transaction at a point when no
>statement is currently executing. Tom's suggested
>transaction_timeout would be a reasonable way to fix this.
>Adnan, if you think this is such a significant problem (I
>can't say that I agree), I'd encourage you to submit a
>patch.

OK Neil, a transaction_timeout parameter solves this, but
there is a worst case: some people use the MS ADO connection
component, and an ADO connection has an attribute that sends
"start transaction" after a commit or after a rollback, so
there is always a transaction open on the connection /
session.

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.


Re: Feature freeze date for 8.1

From
Date:
On Sun, 1 May 2005 23:08:39 -0500 Jaime Casanova <systemguards@gmail.com> wrote:
>On 5/1/05, adnandursun@asrinbilisim.com.tr
><adnandursun@asrinbilisim.com.tr> wrote:
>> On Sun, 1 May 2005 14:35:37 -0500
>>  Bruno Wolff III <bruno@wolff.to> wrote:
>> >On Sun, May 01, 2005 at 19:57:37 +0300,
>> >  adnandursun@asrinbilisim.com.tr wrote:
>> >>
>> >> Listen Tom, write a client software that releases the
>> >> resources / locks that were held before client power
>> >> went down or the client connection was lost.
>> >
>> >If Postgres can tell the connection has been lost then
>> >it should roll back the connection.
>>
>> Yes, but can PostgreSQL know which connection is lost,
>> alive, or dead?
>>
>> >The problem is that you can't always
>> >tell if a connection has been lost. All you can do is
>> >timeout, either when TCP times out or some other timeout
>> >(such as a statement timeout) that you set.
>>
>> You are right, a timeout parameter must be used for
>> that on the backend. A client application can never find
>> the previous instance from before it crashed. However,
>> more than one connection can be established to the
>> PostgreSQL backend.
>>
>> Statement_timeout is just an escape mechanism for an
>> active transaction. Imagine: you've started a process to
>> update the rows in a table, then your PC lost power but
>> you have not sent commit or rollback yet. What will
>> happen now?
>>
>If you send the update outside a transaction and...
>
>Option 1) ...the client crashes then the update will
>commit, i think.
>If you don't want that send the update inside a
>begin/commit block.
>
>Option 2) ...the server crashes the update will rollback.
>
>
>If you send the update inside a transaction and...
>
>Option 1) ...the client crashes then the update will
>rollback.
>Option 2) ...the server crashes the update will rollback.
>
>Actually, i can't see what's the problem. :)

No, the process waits until it is completed.

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.


Re: Feature freeze date for 8.1

From
Date:
On Mon, 02 May 2005 00:25:33 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Jaime Casanova <systemguards@gmail.com> writes:
>> Actually, i can't see what's the problem. :)
>
>I think the issue is "how long does it take for the
>rollback to happen?"
>
> ....so I'll beat up on the database people to override
>'em instead.  Perhaps the database people don't know the
>issues and can be browbeaten more easily."

Never. How can people change TCP timeouts when PostgreSQL
can run on different OSes? I don't want to browbeat you. All
I want is to provide a good mechanism to solve this
problem in PostgreSQL, like other databases have, e.g.
Oracle, MSSQL, PervasiveSQL and so on.

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.


Re: Feature freeze date for 8.1

From
Date:
On Mon, 02 May 2005 01:35:14 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

>[ itch... ]  This seems to me to be conflating several
>distinct issues.
>AFAIR the points that have been raised in the thread are:
>
>#1  Defend against loss of connectivity to client
>
>#2  Defend against client sitting idle while holding locks
>    (or just holding an open transaction and thereby
>    preventing VACUUM cleanup)
>
>#3  Defend against client holding locks unreasonably long,
>    even though not idle (obviously any such constraint will
>    cause clients to fail midstream, but perhaps some DBAs
>    will see this as the lesser of two evils)
>
>I claim that if you have a problem with #1 you ought to go
>discuss it with some TCP hackers: you basically want to
>second-guess the TCP stack's ideas about appropriate
>timeouts.  Maybe you know what you are doing or maybe not,
>but it's not a database-level issue.
>
>#2 is a fair point if you need to cope with
>poorly-programmed clients, but I'm not seeing exactly why
>it has to be measured by "idle time" rather than "total
>time".  The known cases of this involve client code that
>issues a BEGIN and then just sits, so there's no
>difference.

OK, the client sent BEGIN and then the connection was lost.
Does that mean that the client just sits?

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.


Re: Feature freeze date for 8.1

From
Date:
On Mon, 02 May 2005 16:07:07 +1000 Neil Conway <neilc@samurai.com> wrote:

>> I claim that if you have a problem with #1 you ought to
>> go discuss it with some TCP hackers: you basically want to
>> second-guess the TCP stack's ideas about appropriate
>> timeouts.
>
>Well, no -- you might want to set a different timeout for
>PostgreSQL connections than for other connections. Is
>there a way to change the socket timeout for some subset
>of the processes on the machine without hacking the client
>or server source? You might also want to set this timeout
>on a more granular basis (e.g. per user, per database,
>etc.) Implementing this via setting a socket option
>(provided it can be done portably) would be fine with me.

You are right, Neil. A machine can make more than one
connection to the same server. Some of them are database
connections and some are plain OS connections. For example,
Oracle has a parameter to find and kill dead sessions.

PostgreSQL can run on different OSes. I think this should be
solved at the database level rather than the OS level.

Best Regards,

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.


Re: Feature freeze date for 8.1

From
Date:
On Mon, 2 May 2005 10:11:40 +0200 Peter Eisentraut <peter_e@gmx.net> wrote:

>I was under the impression that the specific scenario is
>busy-in-transaction backends continuing to produce and
>send data while the client has disappeared.  Why does the
>backend ignore network errors and keep sending data?

Yes. I think PostgreSQL doesn't know whether the client is
dead or alive? Or it knows, but keeps sending data anyway.

Best Regards,

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.


Re: Feature freeze date for 8.1

From
Hannu Krosing
Date:
On Sun, 2005-05-01 at 11:37 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > The problem, as I understand it, is that if you have a long-running 
> > query and the client process disappears, the query keeps running and 
> > holds whatever resources it may have until it finishes.
> 
> There is a trivial solution for this: it's called statement_timeout.

statement_timeout does not solve the fundamental problem of the server not
seeing when a client connection has disappeared.

I had another variant of the same problem: I had 300 client connections
to the same postmaster, and the postmaster was configured to handle 500
simultaneous connections. Then there was a network outage for a few
minutes, during which the clients saw that the postmaster was not there,
closed their connections and tried to make new ones. After the network
came back, only 200 of them were able to reconnect, as the server never
saw them leaving and just kept the 300 zombie connections.

> It might be interesting to think about a transaction_timeout as well,
> to bound the time locks can be held.  But none of this has anything
> to do with "high availability" as I understand the term.  It looks
> more like a forcing function to make your users fix poorly-written
> client software ;-)

In my case all transactions were implicit one-command function calls
("select * from dbfunc()"), so a transaction timeout would not help.

Probably the only way for the server to detect stale connections would be
sending/receiving some kind of keepalives.

-- 
Hannu Krosing <hannu@skype.net>



Re: Feature freeze date for 8.1

From
Hannu Krosing
Date:
On Mon, 2005-05-02 at 01:35 -0400, Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
> > Tom Lane wrote:
> >> We would?  Why?  Please provide a motivation that justifies the
> >> considerably higher cost to make it count that way, as opposed to
> >> time-since-BEGIN.
> 
> > The specific scenario this feature is intended to resolve is 
> > idle-in-transaction backends holding on to resources while the network 
> > connection times out; it isn't intended to implement "I never want to 
> > run a transaction that takes more than X seconds to execute."
> 
> [ itch... ]  This seems to me to be conflating several distinct issues.
> AFAIR the points that have been raised in the thread are:
> 
> #1  Defend against loss of connectivity to client
> #2  Defend against client sitting idle while holding locks (or just
>     holding an open transaction and thereby preventing VACUUM cleanup)
> #3  Defend against client holding locks unreasonably long, even though
>     not idle (obviously any such constraint will cause clients to
>     fail midstream, but perhaps some DBAs will see this as the lesser
>     of two evils)
> 
> I claim that if you have a problem with #1 you ought to go discuss it
> with some TCP hackers: you basically want to second-guess the TCP
> stack's ideas about appropriate timeouts.  Maybe you know what you
> are doing or maybe not, but it's not a database-level issue.

Well, I've had problems with clients which resolve DB timeouts by
closing the current connection and establishing a new one.

If it is an actual DB timeout, then it is all OK: the server soon notices
that the client connection is closed and kills itself.

Problems happen when the timeout is caused by actual network problems -
when I have 300 clients (server's max_connections=500) which try to
reconnect after a network outage, only 200 of them can do so, as the server
is holding on to 300 old connections.

In my case this has nothing to do with locks or transactions.

It would be nice if I could set up some timeout using keepalives (like
ssh's "ProtocolKeepAlives") and use similar timeouts on client and server.

-- 
Hannu Krosing <hannu@skype.net>



Re: Feature freeze date for 8.1

From
Date:
On Mon, 02 May 2005 13:59:21 +0300 Hannu Krosing <hannu@skype.net> wrote:
>On Mon, 2005-05-02 at 01:35 -0400, Tom Lane wrote:

>Well, I've had problems with clients which resolve DB
>timeouts by closing the current connection and establishing
>a new one.
>
>If it is an actual DB timeout, then it is all OK: the server
>soon notices that the client connection is closed and
>kills itself.
>
>Problems happen when the timeout is caused by actual
>network problems - when I have 300 clients (server's
>max_connections=500) which try to reconnect after a network
>outage, only 200 of them can do so, as the server is
>holding on to 300 old connections.
>
>In my case this has nothing to do with locks or
>transactions.

Yes, but if you did hold locks, the same problem would occur.

>It would be nice if I could set up some timeout using
>keepalives (like ssh's "ProtocolKeepAlives") and use similar
>timeouts on client and server.

Different people use different operating systems.

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.


Re: Feature freeze date for 8.1

From
Date:
On Sun, 01 May 2005 22:23:19 +0300 Hannu Krosing <hannu@skype.net> wrote:
>On Sun, 2005-05-01 at 11:37 -0400, Tom Lane wrote:
>In my case all transactions were implicit one-command
>function calls ("select * from dbfunc()"), so a transaction
>timeout would not help.
>
>Probably the only way for the server to detect stale
>connections would be sending/receiving some kind of
>keepalives.

...and then clearing the stale connections. This would be a
nice feature for PostgreSQL.

Best Regards,

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.


Re: Feature freeze date for 8.1

From
Alvar Freude
Date:
Hi,

-- Dennis Bjorklund <db@zigo.dhs.org> wrote:

> The tcp hackers have provided an api for clients to set these values per
> socket (setsockopt with TCP_KEEPIDLE and similar (in linux at least)).

you can use SO_KEEPALIVE:
  [...] SO_KEEPALIVE enables the periodic transmission of messages on a
  connected socket.  Should the connected party fail to respond to these
  messages, the connection is considered broken and processes using the
  socket are notified via a SIGPIPE signal when attempting to send data.



For me it seems to be a good idea to support this. For the applications
(server, client) it is transparent, but a cut or broken network
connection is noticed faster than before...

But even with this you only realise that the connection is gone when sending
something, AFAIK.
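
As a rough illustration of the two knobs mentioned above (plain SO_KEEPALIVE, plus the per-socket TCP_KEEPIDLE-style options Dennis refers to), here is a minimal Python sketch. The function name enable_keepalive and the timer values are made up for the example. TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-style options and are not available everywhere, so the sketch only sets the ones the local socket module exposes. It is not a description of what the PostgreSQL backend does.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive and tune per-socket timers where supported.

    Returns the names of the options that were actually applied.
    The default timer values are illustrative assumptions only.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = ["SO_KEEPALIVE"]
    per_socket = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    )
    for name, value in per_socket:
        option = getattr(socket, name, None)
        if option is None:
            continue  # this platform does not expose the option
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            continue  # exposed but rejected by the OS; leave the system default
        applied.append(name)
    return applied


demo = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(enable_keepalive(demo))
demo.close()
```

Where the per-socket options are missing, only the system-wide keepalive timers apply, which is the situation the rest of this thread complains about.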


Ciao Alvar


-- 
Alvar C.H. Freude -- http://alvar.a-blast.org/
http://odem.org/
http://www.assoziations-blaster.de/info/Hommingberger-Gepardenforelle.html
http://www.assoziations-blaster.de/

Re: Feature freeze date for 8.1

From
Date:
On Mon, 02 May 2005 13:32:18 +0200 Alvar Freude <alvar@a-blast.org> wrote:
>Hi,
>
>-- Dennis Bjorklund <db@zigo.dhs.org> wrote:
>
>> The tcp hackers have provided an api for clients to set
>> these values per socket (setsockopt with TCP_KEEPIDLE and
>> similar (in linux at least)).
>
>you can use SO_KEEPALIVE:
>
>   [...] SO_KEEPALIVE enables the periodic transmission of
>   messages on a connected socket.  Should the connected
>   party fail to respond to these messages, the connection
>   is considered broken and processes using the socket are
>   notified via a SIGPIPE signal when attempting to send
>   data.
>
>
>For me it seems to be a good idea to support this. For the
>applications (server, client) it is transparent, but a cut
>or broken network connection is noticed faster than
>before...
>
>But even with this you only realise that the connection is
>gone when sending something, AFAIK.

So this means that if the client never tries to send data,
the resources will stay held.
I don't think this is a good way to find zombie / dead
connections and clear them.

Best Regards,

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.


Re: Feature freeze date for 8.1

From
Christopher Browne
Date:
Centuries ago, Nostradamus foresaw when adnandursun@asrinbilisim.com.tr would write:
> We sometime discuss here for geographic system datatypes
> and feature. First, a database must have real database
> features, not extreme features.

Oh, but it would be so much better if we could call the next version "PostgreSQL 8.1 Extreme Edition"
because it was so much more "extreme."

That way we could get skateboarders to do promotional appearances.
And Vin Diesel for a "XXX" appearance...  :-)
-- 
let name="cbbrowne" and tld="gmail.com" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/slony.html
Signs  of  a Klingon  Programmer  -  4. "You  cannot really appreciate
Dilbert unless you've read it in the original Klingon."


Re: Feature freeze date for 8.1

From
Christopher Browne
Date:
After takin a swig o' Arrakan spice grog, adnandursun@asrinbilisim.com.tr belched out:
> On Sun, 1 May 2005 14:35:37 -0500
>  Bruno Wolff III <bruno@wolff.to> wrote:
>>On Sun, May 01, 2005 at 19:57:37 +0300,
>>  adnandursun@asrinbilisim.com.tr wrote:
>>> 
>>> Listen Tom, write client software that releases the
>>> resources / locks that were held before the client lost
>>> power or the client connection was lost.
>>
>>If Postgres can tell the connection has been lost then it
>>should roll back the connection. 
>
> Yes, but can PostgreSQL know which connection is lost,
> alive or dead?

Certainly, with the "magic connection analysis protocol."

Or not...

In order to do that sort of analysis, you need to have some form of
heartbeat monitor on the connection, thereby requiring extra
connections.

It could make sense to attach that kind of extra apparatus to a
connection pool manager like pgpool, but probably not directly to
PostgreSQL itself.

You might want to look into pgpool; that is something that many people
interested in "enterprise usage" of PostgreSQL are looking into...
-- 
(reverse (concatenate 'string "moc.liamg" "@" "enworbbc"))
http://linuxdatabases.info/info/slony.html
"A army's effectiveness depends  on its size, training, experience and
morale, and morale is worth more than all the other factors combined."
-- Napoleon Bonaparte


Re: Feature freeze date for 8.1

From
Christopher Browne
Date:
The world rejoiced as matthew@zeut.net ("Matthew T. O'Connor") wrote:
> Marc G. Fournier wrote:
>
>> On Fri, 29 Apr 2005, Christopher Browne wrote:
>>
>>> Some reasonable approximations might include:
>>> - How much disk I/O was recorded in the last 60 seconds?
>>> - How many application transactions (e.g. - invoices or such) were
>>>   issued in the last 60 seconds (monitoring a sequence could be
>>>   good enough).
>>
>>
>> Some way of doing a 'partial vacuum' would be nice ... where a
>> VACUUM could stop after it processed those '10 elderly tuples' and
>> on the next pass, resume from that point instead of starting from
>> the beginning again ...
>
> That is sorta what the vacuum delay settings accomplish.

What they do is orthogonal to that.

"Vacuum delay" prevents vacuum I/O from taking over the I/O bus.

Unfortunately, if you have a table with a very large number of _live_
tuples, there is no way to skip over those and only concentrate on the
dead ones.

In that scenario "vacuum delay" leads to the vacuum on the table
running for a Very, Very Long Time, because it sits there delaying a
lot as it walks through pages it never modifies.  The one piece of good
news is that, for any pages where no tuples are touched, the indices are
also left untouched.
-- 
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','gmail.com').
http://linuxdatabases.info/info/slony.html
"The Board views the endemic use of PowerPoint briefing slides instead
of technical papers  as an illustration of  the problematic methods of
technical communication at NASA."   -- Official report on the Columbia
shuttle disaster.


Re: Feature freeze date for 8.1

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I was under the impression that the specific scenario is 
> busy-in-transaction backends continuing to produce and send data while 
> the client has disappeared.  Why does the backend ignore network errors 
> and keep sending data?

There are a couple of reasons behind that:

1. In terms of resources held against the rest of the database,
a query doing SELECT is hardly likely to be your worst problem.
Queries doing UPDATE, VACUUM, etc are holding stronger locks and
probably chewing at least as many resources; but they aren't doing
any client I/O (until the command tag at the very end) and so are
not going to detect client connection loss significantly earlier
than they do now anyway.

2. Because of the lack of output, the behavior of non-SELECT queries
is that they complete before the backend notices client connection
loss.  This is nice because it makes it simpler to reason about what
will happen.  Currently the backend guarantees the same for SELECT
queries, which is also nice, since we have plenty of SELECT queries
with side-effects (consider user function invocation).  Do we really
want to give that up?

3. If we error out on send() failure then we have turned a probable
failure into certain failure, because we will have lost message-boundary
sync with the client --- in other words the error might as well be a
FATAL one.  This seems like it might be overkill.

4. Erroring out in the low-level routines is trickier than it looks;
in particular, after you elog() your complaint, elog.c is going to
come right back to you with an error message to send to the client.
Not having this turn into an infinite recursion is a bit ticklish.
Making sure it stays working is even trickier, considering what a
seldom-tested code path it's going to be.
        regards, tom lane


Re: Feature freeze date for 8.1

From
Tom Lane
Date:
Oliver Jowett <oliver@opencloud.com> writes:
> The scenario I need to deal with is this:

> There are multiple nodes, network-separated, participating in a cluster.
> One node is selected to talk to a particular postgresql instance (call
> this node A).

> A starts a transaction and grabs some locks in the course of that
> transaction. Then A falls off the network before committing because of a
> hardware or network failure. A's connection might be completely idle
> when this happens.

> The cluster liveness machinery notices that A is dead and selects a new
> node to talk to postgresql (call this node B). B resumes the work that A
> was doing prior to failure.

> B has to wait for any locks held by A to be released before it can make
> any progress.

> Without some sort of tunable timeout, it could take a very long time (2+
> hours by default on Linux) before A's connection finally times out and
> releases the locks.

Wouldn't it be reasonable to expect the "cluster liveness machinery" to
notify the database server's kernel that connections to A are now dead?
I find it really unconvincing to suppose that the above problem should
be solved at the database level.
        regards, tom lane


Re: Feature freeze date for 8.1

From
Heikki Linnakangas
Date:
On Mon, 2 May 2005, Hannu Krosing wrote:

> Well, I've had problems with clients which resolve DB timeouts by
> closing the current connection and establishing a new one.
>
> If it is an actual DB timeout, then it is all OK: the server soon notices
> that the client connection is closed and kills itself.
>
> Problems happen when the timeout is caused by actual network problems -
> when I have 300 clients (server's max_connections=500) which try to
> reconnect after a network outage, only 200 of them can do so, as the server
> is holding on to 300 old connections.
>
> In my case this has nothing to do with locks or transactions.
>
> It would be nice if I could set up some timeout using keepalives (like
> ssh's "ProtocolKeepAlives") and use similar timeouts on client and server.

FWIW, I've been bitten by this problem twice with other applications.

1. We had a DB2 database with clients running on other computers in the
network. A faulty switch caused random network outages. If the connection
timed out and the client was unable to send its request to the server,
the client would notice that the connection was down, and open a new one.
But the server never noticed that the connection was dead. Eventually,
the maximum number of connections was reached, and the administrator had
to kill all the connections manually.

2. We had a custom client-server application using TCP across a network.
There was a stateful firewall between the server and the clients that
dropped the connection at night when there was no activity. After a
couple of days, the server reached the maximum number of threads on the
platform and stopped accepting new connections.

In case 1, the switch was fixed. If another switch fails, the same will 
happen again. In case 2, we added an application-level heartbeat that 
sends a dummy message from server to client every 10 minutes.

TCP keep-alive with a small interval would have saved the day in both 
cases. Unfortunately the default interval must be >= 2 hours, according 
to RFC1122.
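
To put a number on the "2+ hours" figure quoted elsewhere in this thread, here is a small back-of-the-envelope sketch. It assumes the usual Linux sysctl defaults (tcp_keepalive_time=7200, tcp_keepalive_intvl=75, tcp_keepalive_probes=9); other systems use different values, and the function name is made up for the example. The "tuned" values are arbitrary.

```python
def keepalive_detection_seconds(idle, interval, probes):
    """Approximate time for keepalive to give up on an idle, silently dead peer.

    idle:     seconds of idleness before the first probe is sent
    interval: seconds between unanswered probes
    probes:   number of unanswered probes before the connection is dropped
    """
    return idle + interval * probes


# Assumed Linux defaults: 7200 s idle, 75 s between probes, 9 probes.
linux_default = keepalive_detection_seconds(7200, 75, 9)
print(linux_default)              # 7875
print(divmod(linux_default, 60))  # (131, 15), i.e. 131 minutes 15 seconds

# Arbitrary "small interval" settings for comparison.
print(keepalive_detection_seconds(60, 10, 5))  # 110
```

So with those defaults a dead but idle connection lingers for a bit over two hours, while a tuned setup could notice within a couple of minutes.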

On most platforms, including Windows and Linux, the TCP keep-alive 
interval can't be set on a per-connection basis. The ideal solution would 
be to modify the operating system to support it.

What we can do in PostgreSQL is to introduce an application-level 
heartbeat. A simple "Hello world" message sent from server to client that 
the client would ignore would do the trick.
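
A minimal sketch of the client half of that idea, in Python. The framing here (one type byte, a 4-byte length, then the payload) and the "H"/"D" message types are invented for the illustration; this is not the actual PostgreSQL wire protocol, and the function names are hypothetical.

```python
import socket
import struct

HEARTBEAT = b"H"  # hypothetical message type that the client ignores
DATA = b"D"       # hypothetical message type carrying real results


def send_message(sock, msg_type, payload=b""):
    """Send one framed message: type byte, 4-byte big-endian length, payload."""
    sock.sendall(msg_type + struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_data(sock):
    """Return the next DATA payload, silently discarding heartbeats."""
    while True:
        header = _recv_exact(sock, 5)
        msg_type = header[:1]
        (length,) = struct.unpack("!I", header[1:])
        payload = _recv_exact(sock, length)
        if msg_type == HEARTBEAT:
            continue  # the "Hello world" the client is meant to ignore
        return payload


server, client = socket.socketpair()
send_message(server, HEARTBEAT, b"Hello world")
send_message(server, DATA, b"row 1")
print(recv_data(client))  # b'row 1'
server.close()
client.close()
```

The point of the heartbeat is on the sending side: if the network is gone, the server's TCP stack fails to get the heartbeat acknowledged and eventually reports an error on the socket, instead of waiting forever on an idle connection.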

- Heikki


Re: Feature freeze date for 8.1

From
Date:
On Mon, 2 May 2005 18:47:14 +0300 (EEST) Heikki Linnakangas <hlinnaka@iki.fi> wrote:

>FWIW, I've been bitten by this problem twice with other
>applications.
>
>1. We had a DB2 database with clients running on other
>computers in the network. A faulty switch caused random
>network outages. If the connection timed out and the
>client was unable to send its request to the server, the
>client would notice that the connection was down, and open
>a new one. But the server never noticed that the
>connection was dead. Eventually, the maximum number of
>connections was reached, and the administrator had to kill
>all the connections manually.

Are you happy with this behavior in DB2? I think you
will say no :-)

>2. We had a custom client-server application using TCP
>across a network. There was a stateful firewall between the
>server and the clients that dropped the connection at
>night when there was no activity. After a couple of days,
>the server reached the maximum number of threads on the
>platform and stopped accepting new connections.

Yes, because your firewall drops only the connections between
the clients and the firewall, not the ones on the database side.

>In case 1, the switch was fixed. If another switch fails,
>the same will happen again. In case 2, we added an
>application-level heartbeat that sends a dummy message
>from server to client every 10 minutes.
>
>TCP keep-alive with a small interval would have saved the
>day in both cases. Unfortunately the default interval must
>be >= 2 hours, according to RFC1122.
Yes..

>On most platforms, including Windows and Linux, the TCP
>keep-alive interval can't be set on a per-connection
>basis. The ideal solution would be to modify the operating
>system to support it.

How would we do this?

>What we can do in PostgreSQL is to introduce an
>application-level heartbeat. A simple "Hello world"
>message sent from server to client that the client would
>ignore would do the trick.

It should not be forgotten that a client can have more than
one connection to the database, and only one of them may be lost.

Best Regards,

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.


Re: Feature freeze date for 8.1

From
Andrew - Supernews
Date:
On 2005-05-02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> While that isn't an unreasonable issue on its face, I think it really
> boils down to this: the OP is complaining because he thinks the
> connection-loss timeout mandated by the TCP RFCs is too long.  Perhaps
> the OP knows network engineering far better than the authors of those
> RFCs, or perhaps not.  I'm not convinced that Postgres ought to provide
> a way to second-guess the TCP stack ...

Speaking as someone who _does_ know network engineering, I would say that
yes, Postgres absolutely should do so.

The TCP keepalive timeout _is not intended_ to do this job; virtually
every application-level protocol manages its own timeouts independently
of TCP. (The few exceptions, such as telnet, tend to be purely interactive
protocols that rely on the user to figure out that something got stuck.)

One way to handle this is to have an option, set by the client, that
causes the server to send some ignorable message after a given period
of time idle while waiting for the client. If the idleness was due to
network partitioning or similar failure, then this ensures that the
connection breaks within a known time. This is safer than simply having
the backend abort after a given idle period.
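
Roughly, the server-side loop could look like the following Python sketch. The function name, the one-byte ping and the timeout are all assumptions made for the example; it only shows the shape of the idea, not how the PostgreSQL backend is structured.

```python
import select
import socket


def wait_for_client(sock, idle_timeout, ping=b"\x00"):
    """Wait for client input, pinging the client whenever it has been idle.

    Returns the received bytes, or None once the connection is known dead.
    The ping payload stands in for "some ignorable message".
    """
    while True:
        readable, _, _ = select.select([sock], [], [], idle_timeout)
        if readable:
            try:
                data = sock.recv(4096)
            except OSError:
                return None      # a pending socket error: the connection broke
            return data or None  # b"" means the client closed the connection
        try:
            # Idle for idle_timeout seconds: send the ignorable message.
            # On a partitioned network this send is never acknowledged, so
            # the TCP stack eventually fails the socket and recv() errors out.
            sock.sendall(ping)
        except OSError:
            return None


server_side, client_side = socket.socketpair()
client_side.sendall(b"SELECT 1")
print(wait_for_client(server_side, idle_timeout=0.1))  # b'SELECT 1'
server_side.close()
client_side.close()
```

Note that the ping itself usually succeeds locally; the failure only shows up later, once retransmissions time out. That still bounds the detection time, which is the point being made above.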

If you want comparisons from other protocols, just look around - SMTP,
ssh, IRC, BGP, NNTP, FTP, and many, many more protocols all use timeouts
(or in some cases keepalive messages) with intervals much shorter than the
TCP keepalive timeout itself.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: Feature freeze date for 8.1

From
Rob Butler
Date:
 
> One way to handle this is to have an option, set by
> the client, that
> causes the server to send some ignorable message
> after a given period
> of time idle while waiting for the client. If the
> idleness was due to
> network partitioning or similar failure, then this
> ensures that the
> connection breaks within a known time. This is safer
> than simply having
> the backend abort after a given idle period.

Another option is to have the client driver send some
ignorable message instead of the server.  If the
server doesn't get a message every timeout
minutes/seconds + slop factor, then it drops the
connection.  So libpq, JDBC, .net etc would all have
to have this implemented, but the changes to the
server would probably be simpler this way, wouldn't they?



Re: Feature freeze date for 8.1

From
Bruno Wolff III
Date:
On Mon, May 02, 2005 at 12:29:33 -0700, Rob Butler <crodster2k@yahoo.com> wrote:
>  
> > One way to handle this is to have an option, set by
> > the client, that
> > causes the server to send some ignorable message
> > after a given period
> > of time idle while waiting for the client. If the
> > idleness was due to
> > network partitioning or similar failure, then this
> > ensures that the
> > connection breaks within a known time. This is safer
> > than simply having
> > the backend abort after a given idle period.
> 
> Another option is to have the client driver send some
> ignorable message instead of the server.  If the
> server doesn't get a message every timeout
> minutes/seconds + slop factor, then it drops the
> connection.  So libpq, JDBC, .net etc would all have
> to have this implemented, but the changes to the
> server would probably be simpler this way, wouldn't they?

Except it won't work, because the server is who needs to know about
the problem. If the network is down, you can't send a TCP RST packet
to close the connection on the server side.


Re: Feature freeze date for 8.1

From
Andrew - Supernews
Date:
On 2005-05-02, Rob Butler <crodster2k@yahoo.com> wrote:
> Another option is to have the client driver send some
> ignorable message instead of the server.  If the
> server doesn't get a message every timeout
> minutes/seconds + slop factor, then it drops the
> connection.  So libpq, JDBC, .net etc would all have
> to have this implemented, but the changes to the
> server would probably be simpler this way, wouldn't they?

Then the client has to guarantee that it can stop whatever it was doing
(which might have nothing to do with the database) every so often in
order to send a message; this isn't feasible for most clients.

The server-based method is actually no more complex to implement on the
server end and does not impose any such restrictions on the client (even
if the client sets the option and then ignores the database connection
for a long time, all that happens is that the TCP window fills up).

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: Feature freeze date for 8.1

From
Tom Lane
Date:
Andrew - Supernews <andrew+nonews@supernews.com> writes:
> Then the client has to guarantee that it can stop whatever it was doing
> (which might have nothing to do with the database) every so often in
> order to send a message; this isn't feasible for most clients.

It's certainly infeasible for libpq, which has no portable way to force
the calling app to give it control.
        regards, tom lane


Re: Feature freeze date for 8.1

From
Date:
On Mon, 02 May 2005 19:53:56 -0000 Andrew - Supernews <andrew+nonews@supernews.com> wrote:

>The server-based method is actually no more complex to
>implement on the server end and does not impose any such
>restrictions on the client (even if the client sets the
>option and then ignores the database connection for a long
>time, all that happens is that the TCP window fills up).

Yes, any solution on the client side also requires every
client connection interface to be changed. With a server-side
solution, only the server would have to change :-) I don't
know whether it is complex or not. As you know, Oracle has a
parameter that implements this on the server side.

Best Regards,

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.


Re: Feature freeze date for 8.1

From
Alvar Freude
Date:
Hi,

-- adnandursun@asrinbilisim.com.tr wrote:

>   So this means, If client does never try to send data the
> resources would be going to be held.
> I think it is not a good solution to find zombie / dead
> connection and clear them..

With TCP/IP you DON'T have any option other than waiting for a timeout, in
one way or another. This is a feature of TCP connections.


But when does this happen?

 A) when you unplug the client or server from the net without shutdown
 B) when you firewall the client or server ("deny all from any to any")


When one of these happens, there is another serious problem. You should
handle that first.


AFAIK, A) is manageable by switches or routers.


With SO_KEEPALIVE there is a chance to detect dead connections earlier: when
sending or receiving data. Perhaps it is possible to write a daemon which
surveys all open sockets of a machine and kills the owner of any dead socket.
But this is something for the OS, not for PostgreSQL.

Another option is to check the connections periodically with select(2)
(instead of sending/receiving something); I'm not a TCP/IP specialist, but
perhaps this helps in those rare situations when the user knows that the
connections between server and client break often. This would need a lot of
changes in the PG backend, I guess. And it may cost performance, so please
make it a compile-time switch only.
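
For what it's worth, a select()-based check might look like this Python sketch (the function name is made up). One caveat: it only notices a close or reset that has actually arrived at this end. If the cable is unplugged, nothing arrives, and the socket keeps looking alive until something is sent or a keepalive fires.

```python
import select
import socket


def peer_has_closed(sock):
    """Poll the socket without blocking and without consuming any data.

    Returns True if the peer has closed or reset the connection, and False
    if the connection still looks alive from this end.
    """
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing pending: no close notification has arrived
    try:
        data = sock.recv(1, socket.MSG_PEEK)  # look at the data, leave it queued
    except BlockingIOError:
        return False  # spurious wakeup on a non-blocking socket
    except OSError:
        return True   # e.g. connection reset by peer
    return data == b""  # an empty read means an orderly shutdown


a, b = socket.socketpair()
print(peer_has_closed(a))  # False
b.close()
print(peer_has_closed(a))  # True
a.close()
```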


Because someone mentioned MySQL as a positive example: that's wrong. MySQL
*cannot* handle broken/closed connections when the remote machine is gone
(e.g. firewalled or unplugged). I had this scenario some weeks ago: the
client was sending queries to the server, and the firewall blocked them
(because the socket had been idle for about 5 minutes before this query);
the whole application was blocked.


Ciao Alvar


Alvar C.H. Freude -- http://alvar.a-blast.org/
http://odem.org/
http://www.assoziations-blaster.de/info/Hommingberger-Gepardenforelle.html
http://www.assoziations-blaster.de/





Re: Feature freeze date for 8.1

From
"Jim C. Nasby"
Date:
FWIW, I've found myself wishing I could set statement_timeout on a per user
or per group basis. Likewise for log_min_duration_statement.

On Mon, May 02, 2005 at 11:38:12PM +0300, adnandursun@asrinbilisim.com.tr wrote:
> On Mon, 02 May 2005 19:53:56 -0000
>  Andrew - Supernews <andrew+nonews@supernews.com> wrote:
> 
> >The server-based method is actually no more complex to
> >implement on the server end and does not impose any such
> >restrictions on the client (even if the client sets the
> >option and then ignores the database connection for a long
> >time, all that happens is that the TCP window fills up).
> 
> Yes, any solution on the client side also requires every
> client connection interface to be changed. With a server-side
> solution, only the server would have to change :-) I don't
> know whether it is complex or not. As you know, Oracle has a
> parameter that implements this on the server side.
> 
> Best Regards,
> 
> Adnan DURSUN
> ASRIN Bilişim Hiz.Ltd.
> 

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: Feature freeze date for 8.1

From
Oliver Jowett
Date:
Tom Lane wrote:

> Wouldn't it be reasonable to expect the "cluster liveness machinery" to
> notify the database server's kernel that connections to A are now dead?

No, because it's a node-level liveness test, not a machine-level
liveness. It's possible that all that happened is the node's VM crashed.
The clustering is all done in userspace.

-O


Re: Feature freeze date for 8.1

From
Oliver Jowett
Date:
Tom Lane wrote:

> Wouldn't it be reasonable to expect the "cluster liveness machinery" to
> notify the database server's kernel that connections to A are now dead?
> I find it really unconvincing to suppose that the above problem should
> be solved at the database level.

Actually, if you were to implement this as you suggest, you either put
full-blown group communication in the kernel (ow, no thanks!) or you
implement a system where the DB server's kernel has a heartbeat to each
peer (e.g. A) and if that heartbeat stops, it kills the corresponding
connections.

But that functionality already exists: it is SO_KEEPALIVE.

(I think we're arguing in circles here..)

-O


Re: Feature freeze date for 8.1

From
"Chuck McDevitt"
Date:

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Tom Lane
> Sent: Monday, May 02, 2005 1:17 PM
> To: andrew@supernews.com
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Feature freeze date for 8.1
>
> Andrew - Supernews <andrew+nonews@supernews.com> writes:
> > Then the client has to guarantee that it can stop whatever it was
> > doing (which might have nothing to do with the database) every so
> > often in order to send a message; this isn't feasible for most
> > clients.
>
> It's certainly infeasible for libpq, which has no portable way to
> force the calling app to give it control.
>
>             regards, tom lane

Why not just use SO_KEEPALIVE on the TCP socket?  Then the TCP stack
handles sending the keepalive messages, and there is no requirement that
the client application give control to anything... It's all handled by
the TCP stack.




Re: Feature freeze date for 8.1

From
Kris Jurka
Date:

On Mon, 2 May 2005, Jim C. Nasby wrote:

> FWIW, I've found myself wishing I could set statement_timeout on a per user
> or per group basis. Likewise for log_min_duration_statement.
> 

See ALTER USER ... SET

Kris Jurka
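
To spell out what that looks like, here is a small Python helper that builds such a statement. The helper and the role name "reporting" are invented for the example, and the quoting is deliberately simplistic; the only thing taken from the thread is the ALTER USER ... SET form itself. statement_timeout is given in milliseconds, so 30000 means 30 seconds.

```python
def per_user_setting_sql(user, parameter, value):
    """Build an ALTER USER ... SET statement for one configuration parameter.

    Illustrative only: the role name is quoted by doubling embedded quotes,
    and the parameter name is restricted to letters, digits and underscores.
    """
    if not parameter.replace("_", "").isalnum():
        raise ValueError("unexpected characters in parameter name")
    quoted_user = '"' + user.replace('"', '""') + '"'
    if isinstance(value, int):
        literal = str(value)
    else:
        literal = "'" + str(value).replace("'", "''") + "'"
    return f"ALTER USER {quoted_user} SET {parameter} = {literal};"


print(per_user_setting_sql("reporting", "statement_timeout", 30000))
# ALTER USER "reporting" SET statement_timeout = 30000;
```

The setting takes effect for sessions that the user starts afterwards.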



Re: Feature freeze date for 8.1

From
Oliver Jowett
Date:
Chuck McDevitt wrote:

> Why not just use SO_KEEPALIVE on the TCP socket? 

We already do, but the default keepalive interval makes it next to useless.

-O


Re: Feature freeze date for 8.1

From
"Chuck McDevitt"
Date:

> -----Original Message-----
> From: Oliver Jowett [mailto:oliver@opencloud.com]
> Sent: Monday, May 02, 2005 3:06 PM
> To: Chuck McDevitt
> Cc: Tom Lane; andrew@supernews.com; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Feature freeze date for 8.1
>
> Chuck McDevitt wrote:
>
> > Why not just use SO_KEEPALIVE on the TCP socket?
>
> We already do, but the default keepalive interval makes it next to
> useless.
>
> -O

So, change the default.  On Linux it's in
/proc/sys/net/ipv4/tcp_keepalive_time

Admittedly, this isn't a great solution, but it has the advantage of
being simple.
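
For comparison, here is a minimal Python sketch of doing the same thing per socket instead of changing the system-wide default. It assumes a platform that exposes the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options (Linux does; others may offer only some of them, hence the getattr checks). The function name and the timer values are illustrative only and are not taken from PostgreSQL.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive and, where supported, shorten its timers.

    Returns a dict of the platform-specific options that were applied.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {}
    # The per-socket timer options are not portable, so probe for each one.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
    return applied
```

Where none of the timer options exist, the call still enables SO_KEEPALIVE and the connection falls back to the system default interval.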



Re: Feature freeze date for 8.1

From
Dawid Kuroczko
Date:
On 5/2/05, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> What we can do in PostgreSQL is to introduce an application-level
> heartbeat. A simple "Hello world" message sent from server to client that
> the client would ignore would do the trick.

Hmm, a quick-and-dirty implementation could be that a client issues a
"LISTEN heartbeat;" command, and another client issues
"NOTIFY heartbeat;" every few minutes.  I am not sure, but this would
probably make the server send these messages to the client regardless
of whether the client is doing something or not.  Again, I am not sure.
Ah, and so many NOTIFY messages probably wouldn't be very nice for the
system tables.

  Regards,
      Dawid


Re: Feature freeze date for 8.1

From
Hannu Krosing
Date:
On E, 2005-05-02 at 18:47 +0300, Heikki Linnakangas wrote:
> On Mon, 2 May 2005, Hannu Krosing wrote:

> > It would be nice if I could set up some timeout using keepalives (like
> > ssh's "ProtocolKeepAlives") and use similar timeouts on client and server.
> 
> FWIW, I've been bitten by this problem twice with other applications.
> 
> 1. We had a DB2 database with clients running in other computers in the 
> network. A faulty switch caused random network outages. If the connection 
> timed out and the client was unable to send its request to the server, 
> the client would notice that the connection was down, and open a new one. 
> But the server never noticed that the connection was dead. Eventually, 
> the maximum number of connections was reached, and the administrator had 
> to kill all the connections manually.
> 
> 2. We had a custom client-server application using TCP across a network. 
> There was a stateful firewall between the server and the clients that 
> dropped the connection at night when there was no activity. After a 
> couple of days, the server reached the maximum number of threads on the 
> platform and stopped accepting new connections.
> 
> In case 1, the switch was fixed. If another switch fails, the same will 
> happen again. In case 2, we added an application-level heartbeat that 
> sends a dummy message from server to client every 10 minutes.
> 
> TCP keep-alive with a small interval would have saved the day in both 
> cases. Unfortunately the default interval must be >= 2 hours, according 
> to RFC1122.
> 
> On most platforms, including Windows and Linux, the TCP keep-alive 
> interval can't be set on a per-connection basis. The ideal solution would 
> be to modify the operating system to support it.

Yep. I think this could be done for (our instance of) linux, but getting
it into mainstream kernel, and then into all popular distros is a lot of
effort.

Going the ssh way (protocol level keepalives) might be way simpler.

> What we can do in PostgreSQL is to introduce an application-level 
> heartbeat. A simple "Hello world" message sent from server to client that 
> the client would ignore would do the trick.

Actually we would need a round-trip indicator (some there-and-back
message: A: do you copy 42 --> B: yes I copy 42), and not just a send. The
difficult part is what to do when one side happens to send the keepalive
in the middle of an actual data transfer?

Move to packet-oriented connections (UDP) and make the different packet
types independent of each other?
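
A toy illustration of the round-trip idea, written in Python over a plain socket. The "PING n" / "PONG n" wire format, the function names and the timeout are all invented for this sketch; nothing like it exists in the PostgreSQL protocol, and it sidesteps the hard part (interleaving with real data).

```python
import socket

def read_line(sock):
    """Read bytes up to and including a newline; raise if the peer goes away."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def send_ping(sock, token):
    """Side A: 'do you copy <token>'."""
    sock.sendall(b"PING %d\n" % token)

def answer_ping(sock):
    """Side B: echo the token back, 'yes I copy <token>'."""
    kind, token = read_line(sock).split()
    if kind == b"PING":
        sock.sendall(b"PONG " + token + b"\n")

def wait_for_pong(sock, token, timeout=5.0):
    """Side A: True only if the matching reply arrives within the timeout."""
    sock.settimeout(timeout)
    try:
        kind, echoed = read_line(sock).split()
    except (socket.timeout, ConnectionError):
        return False
    return kind == b"PONG" and int(echoed) == token
```

Note that side B has to be in a position to call answer_ping() at some point, which is exactly the control-flow requirement a client library may not be able to meet.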

-- 
Hannu Krosing <hannu@skype.net>



Re: Feature freeze date for 8.1

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
>> What we can do in PostgreSQL is to introduce an application-level 
>> heartbeat. A simple "Hello world" message sent from server to client that 
>> the client would ignore would do the trick.

> Actually we would need a round-trip indicator (some there-and-back
> message: A: do you copy 42 --> B: yes I copy 42), and not just send.

No, a one-way message is sufficient.  The reason is that once we've
asked the TCP stack to send something, the customary timeouts before
declaring the connection dead are far shorter than they are for
keepalives.  Also see the point that we must not assume that the
client-side library can get control on short notice (or indeed any
notice).

I am a tad worried about the possibility that if the client does nothing
for long enough, the TCP output buffer will fill causing the backend to
block at send().  A permanently blocked backend is bad news from a
performance point of view (it degrades the sinval protocol for everyone
else).
        regards, tom lane


Re: Feature freeze date for 8.1

From
"Dave Held"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, May 03, 2005 9:31 AM
> To: Hannu Krosing
> Cc: Heikki Linnakangas; Neil Conway; Oliver Jowett;
> adnandursun@asrinbilisim.com.tr; Peter Eisentraut; Alvaro Herrera;
> pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Feature freeze date for 8.1
>
> [...]
> I am a tad worried about the possibility that if the client
> does nothing for long enough, the TCP output buffer will fill
> causing the backend to block at send().  A permanently blocked
> backend is bad news from a performance point of view (it
> degrades the sinval protocol for everyone else).

So use MSG_DONTWAIT or O_NONBLOCK on the keepalive packets.
That won't stop the buffer from getting filled up, but if you
get an EAGAIN while sending a keepalive packet, you know the
client is either dead or really busy.
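
To make the EAGAIN idea concrete, here is a small Python sketch using a non-blocking socket; Python reports EAGAIN/EWOULDBLOCK as BlockingIOError. The function name and the payload are made up for illustration. The sketch only shows how a full send buffer is detected, and says nothing about what dropping a message would do to an encrypted stream.

```python
import socket

def try_send_keepalive(sock, payload=b"\0"):
    """Return True if the whole payload was queued, False if the buffer is full."""
    try:
        return sock.send(payload) == len(payload)
    except BlockingIOError:
        # EAGAIN / EWOULDBLOCK: the peer is not draining its receive buffer.
        return False

# Demo: a peer that never reads eventually makes the send buffer fill up.
server, client = socket.socketpair()
server.setblocking(False)
queued = 0
while try_send_keepalive(server, b"x" * 4096):
    queued += 1
print("chunks queued before the buffer filled:", queued)
server.close()
client.close()
```

How many chunks fit before the call returns False depends on the platform's socket buffer sizes, so a single failure says little about whether the peer is dead or merely slow.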

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129


Re: Feature freeze date for 8.1

From
Heikki Linnakangas
Date:
On Tue, 3 May 2005, Tom Lane wrote:

> I am a tad worried about the possibility that if the client does nothing
> for long enough, the TCP output buffer will fill causing the backend to
> block at send().  A permanently blocked backend is bad news from a
> performance point of view (it degrades the sinval protocol for everyone
> else).

Do you mean this scenario:

1. client application doesn't empty its receive buffer (doesn't call read)
2. server keeps sending data
3. client receive buffer fills
4. server send buffer fills
5. server send blocks.

Unfortunately there's no way to tell if the client is misbehaving or the 
network connection is slow or the client is too busy to handle the data 
fast enough.

I guess we could increase the send buffer (can it be set per-connection?), 
but that only delays the problem.

Does statement_timeout fire on that scenario? How about the new
transaction_timeout option discussed in other threads?

- Heikki


Re: Feature freeze date for 8.1

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> Does statement_timeout fire on that scenario? How about the new
> transaction_timeout option discussed in other threads?

Probably neither, since very likely you aren't in a transaction at all.
I'd not expect the server to send these messages except when it's been
idle for a while, so statement_timeout is certainly irrelevant.

BTW, the upthread proposal of just dropping the message (which is what
O_NONBLOCK would do) doesn't work; it will lose encryption sync on SSL
connections.
        regards, tom lane


Re: Feature freeze date for 8.1

From
"Dave Held"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, May 03, 2005 12:39 PM
> To: Heikki Linnakangas
> Cc: Hannu Krosing; Neil Conway; Oliver Jowett;
> adnandursun@asrinbilisim.com.tr; Peter Eisentraut; Alvaro Herrera;
> pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Feature freeze date for 8.1
>
> [...]
> BTW, the upthread proposal of just dropping the message (which is what
> O_NONBLOCK would do) doesn't work; it will lose encryption sync on SSL
> connections.

How about an optional second connection to send keepalive pings?
It could be unencrypted and non-blocking.  If authentication is
needed on the ping port (which it doesn't seem like it would need
to be), it could be very simple, like this:

* client connects to main port
* server authenticates client normally
* server sends nonce token for keepalive authentication
* client connects to keepalive port
* client sends nonce token on keepalive port
* server associates matching keepalive connection with main connection
* if server does not receive matching token within a small timeout,
  no keepalive support enabled for this session
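
The handshake above could be modelled roughly as follows. This is a Python sketch of the server-side bookkeeping only, with no real sockets; the class and method names, the 16-byte token size and the 5-second window are all assumptions made for the example.

```python
import hmac
import secrets
import time

class KeepaliveRegistry:
    """Tracks nonces issued on main connections until a keepalive connection claims them."""

    def __init__(self, window=5.0, clock=time.monotonic):
        self.window = window      # seconds the client has to present its token
        self.clock = clock
        self.pending = {}         # session id -> (nonce, deadline)

    def issue(self, session_id):
        """Called after normal authentication; returns the nonce to send to the client."""
        nonce = secrets.token_bytes(16)
        self.pending[session_id] = (nonce, self.clock() + self.window)
        return nonce

    def claim(self, presented):
        """Called when a token arrives on the keepalive port.

        Returns the matching session id, or None if the token is unknown or late.
        """
        now = self.clock()
        for session_id, (nonce, deadline) in list(self.pending.items()):
            if now > deadline:
                # Too late: this session simply runs without keepalive support.
                del self.pending[session_id]
            elif hmac.compare_digest(nonce, presented):
                del self.pending[session_id]
                return session_id
        return None
```

Each nonce can be claimed once, and an expired entry is dropped the next time any token is checked, which matches the "no keepalive support enabled for this session" fallback in the last step.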

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129


Re: Feature freeze date for 8.1

From
Adnan Dursun
Date:
On Tue, 3 May 2005 13:02:46 -0500 "Dave Held" <dave.held@arraysg.com> wrote:

>How about an optional second connection to send keepalive pings?
>It could be unencrypted and non-blocking.  If authentication is
>needed on the ping port (which it doesn't seem like it would need
>to be), it could be very simple, like this:
>
>* client connects to main port
>* server authenticates client normally
>* server sends nonce token for keepalive authentication
>* client connects to keepalive port
>* client sends nonce token on keepalive port
>* server associates matching keepalive connection with main
>    connection
>* if server does not receive matching token within a small
>    timeout, no keepalive support enabled for this session

Yes, this looks good. But:
     1. Do the client interfaces (ODBC, JDBC, OLEDB, etc.) need to
        be changed?
     2. If a firewall is used, people need to know the second port
        number, which means that two parameters should be added to
        postgres: the first is the timeout value, and the second is
        the number of the port that would be used for keepalive.

Best Regards,

Adnan DURSUN
ASRIN Bilişim Hiz. Ltd.


Re: Feature freeze date for 8.1

From
"Dave Held"
Date:
> -----Original Message-----
> From: adnandursun@asrinbilisim.com.tr
> [mailto:adnandursun@asrinbilisim.com.tr]
> Sent: Tuesday, May 03, 2005 3:36 PM
> To: Dave Held; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Feature freeze date for 8.1
>
> [...]
> Yes, this looks good. But:
>
>       1. Do the client interfaces (ODBC, JDBC, OLEDB, etc.) need to
>          be changed?

Only if they want to support the keepalive mechanism.  It should
be purely optional.

>       2. If a firewall is used, people need to know the second port
>          number, which means that two parameters should be added to
>          postgres: the first is the timeout value, and the second is
>          the number of the port that would be used for keepalive.

Sounds fine to me.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129


Re: Feature freeze date for 8.1

From
Tom Lane
Date:
"Dave Held" <dave.held@arraysg.com> writes:
> How about an optional second connection to send keepalive pings?
> It could be unencrypted and non-blocking.  If authentication is
> needed on the ping port (which it doesn't seem like it would need
> to be), it could be very simple, like this:

> * client connects to main port
> * server authenticates client normally
> * server sends nonce token for keepalive authentication
> * client connects to keepalive port
> * client sends nonce token on keepalive port
> * server associates matching keepalive connection with main 
>     connection
> * if server does not receive matching token within a small
>     timeout, no keepalive support enabled for this session


This seems to have nothing whatever to do with the stated problem?
        regards, tom lane


Re: Feature freeze date for 8.1

From
Thomas Swan
Date:
On 5/3/05, Dave Held <dave.held@arraysg.com> wrote:
> > -----Original Message-----
> > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> > Sent: Tuesday, May 03, 2005 12:39 PM
> > To: Heikki Linnakangas
> > Cc: Hannu Krosing; Neil Conway; Oliver Jowett;
> > adnandursun@asrinbilisim.com.tr; Peter Eisentraut; Alvaro Herrera;
> > pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] Feature freeze date for 8.1
> >
> > [...]
> > BTW, the upthread proposal of just dropping the message (which is what
> > O_NONBLOCK would do) doesn't work; it will lose encryption sync on SSL
> > connections.
>
> How about an optional second connection to send keepalive pings?
> It could be unencrypted and non-blocking.  If authentication is
> needed on the ping port (which it doesn't seem like it would need
> to be), it could be very simple, like this:
>
> * client connects to main port
> * server authenticates client normally
> * server sends nonce token for keepalive authentication
> * client connects to keepalive port
> * client sends nonce token on keepalive port
> * server associates matching keepalive connection with main
>     connection
> * if server does not receive matching token within a small
>     timeout, no keepalive support enabled for this session
>

This will not work through firewalls.  Is it not possible for the
server to test the current network connection with the client?


Re: Feature freeze date for 8.1

From
"Dave Held"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, May 03, 2005 4:20 PM
> To: Dave Held
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Feature freeze date for 8.1
>
>
> "Dave Held" <dave.held@arraysg.com> writes:
> > How about an optional second connection to send keepalive pings?
> > It could be unencrypted and non-blocking.  If authentication is
> > needed on the ping port (which it doesn't seem like it would need
> > to be), it could be very simple, like this:
>
> > * client connects to main port
> > * server authenticates client normally
> > * server sends nonce token for keepalive authentication
> > * client connects to keepalive port
> > * client sends nonce token on keepalive port
> > * server associates matching keepalive connection with main
> >     connection
> > * if server does not receive matching token within a small
> >     timeout, no keepalive support enabled for this session
>
>
> This seems to have nothing whatever to do with the stated problem?

I thought the problem was a server process that loses a
connection to a client sticking around and consuming resources.
And then I thought a possible solution was to try to see if
the client is still alive by sending it an occasional packet.
And then I thought a new problem is sending packets to an
unresponsive client and filling up the output buffer and
blocking the server process.

So it seems that a possible solution to that problem is to
have a separate connection for keepalive packets that doesn't
block and doesn't interfere with normal client/server
communication.

Now granted, it is possible that the primary connection could
die and the secondary is still alive.  So let's consider the
likely failure modes:

* physical network failure

In this case, I don't see how the secondary could survive while
the primary dies.

* client hangs or dies

If the client isn't reading keepalives from the server,
eventually the server's send queue will fill and the server
will see that the client is unresponsive.  The only way the
client could fail on the primary while responding on the
secondary is if it makes the connections in different threads,
and the primary thread crashes somehow.  At that point, I would
hope that the user would notice that the client has died and
shut it down completely.  Otherwise, the client should just not
create a separate thread for responding to keepalives.

* transient network congestion

It's possible that a keepalive could be delayed past the
expiration time, and the server would assume that the client
is dead when it's really not.  Then it would close the client's
connection rather rudely.  But then, since there's no reliable
way to tell if a client is dead or not, your other option is to
consume all your connections on maybe-dead clients.

So what am I missing?

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129


Re: Feature freeze date for 8.1

From
Doug McNaught
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> "Dave Held" <dave.held@arraysg.com> writes:
>> How about an optional second connection to send keepalive pings?
>> It could be unencrypted and non-blocking.  If authentication is
>> needed on the ping port (which it doesn't seem like it would need
>> to be), it could be very simple, like this:
>
>
>
> This seems to have nothing whatever to do with the stated problem?

Yeah--one of the original scenarios was "firewall drops DB connection
because it's inactive."  Pinging over a second socket does nothing to
address this.  

If you want to make sure network connection X is up, testing network
connection Y, which happens to be between the same two processes, is
only helpful in a limited set of circumstances.

-Doug


Re: Feature freeze date for 8.1

From
"Chuck McDevitt"
Date:

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Dave Held
> Sent: Tuesday, May 03, 2005 3:41 PM
> To: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Feature freeze date for 8.1
>
> > -----Original Message-----
> > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> > Sent: Tuesday, May 03, 2005 4:20 PM
> > To: Dave Held
> > Cc: pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] Feature freeze date for 8.1
> >
> >
> > "Dave Held" <dave.held@arraysg.com> writes:
> > > How about an optional second connection to send keepalive pings?
> > > It could be unencrypted and non-blocking.  If authentication is
> > > needed on the ping port (which it doesn't seem like it would need
> > > to be), it could be very simple, like this:
> >
> > > * client connects to main port
> > > * server authenticates client normally
> > > * server sends nonce token for keepalive authentication
> > > * client connects to keepalive port
> > > * client sends nonce token on keepalive port
> > > * server associates matching keepalive connection with main
> > >     connection
> > > * if server does not receive matching token within a small
> > >     timeout, no keepalive support enabled for this session
> >
> >
> > This seems to have nothing whatever to do with the stated problem?
>
> I thought the problem was a server process that loses a
> connection to a client sticking around and consuming resources.
> And then I thought a possible solution was to try to see if
> the client is still alive by sending it an occasional packet.
> And then I thought a new problem is sending packets to an
> unresponsive client and filling up the output buffer and
> blocking the server process.
>
> So it seems that a possible solution to that problem is to
> have a separate connection for keepalive packets that doesn't
> block and doesn't interfere with normal client/server
> communication.
>
> Now granted, it is possible that the primary connection could
> die and the secondary is still alive.  So let's consider the
> likely failure modes:
>
> * physical network failure
>
> In this case, I don't see how the secondary could survive while
> the primary dies.
>
> * client hangs or dies
>
> If the client isn't reading keepalives from the server,
> eventually the server's send queue will fill and the server
> will see that the client is unresponsive.  The only way the
> client could fail on the primary while responding on the
> secondary is if it makes the connections in different threads,
> and the primary thread crashes somehow.  At that point, I would
> hope that the user would notice that the client has died and
> shut it down completely.  Otherwise, the client should just not
> create a separate thread for responding to keepalives.
>
> * transient network congestion
>
> It's possible that a keepalive could be delayed past the
> expiration time, and the server would assume that the client
> is dead when it's really not.  Then it would close the client's
> connection rather rudely.  But then, since there's no reliable
> way to tell if a client is dead or not, your other option is to
> consume all your connections on maybe-dead clients.
>
> So what am I missing?
>
> __
> David B. Held
> Software Engineer/Array Services Group
> 200 14th Ave. East,  Sartell, MN 56377
> 320.534.3637 320.253.7800 800.752.8129
>

1)  Adding a separate connection means managing that connection, making
sure it gets connected/disconnected at the right times, and that it can
traverse the same firewalls as the primary connection.

2)  You'd need another process or another thread to respond on the
secondary connection.  If it's another process, the primary process
could die/hang while the keepalive process keeps working (or vice
versa).  If it's another thread, you are forcing all clients to support
multithreading.




Re: Feature freeze date for 8.1

From
Oliver Jowett
Date:
Dave Held wrote:

> So it seems that a possible solution to that problem is to
> have a separate connection for keepalive packets that doesn't
> block and doesn't interfere with normal client/server 
> communication.

What does this do that TCP keepalives don't? (other than add extra
connection management complexity..)

-O


Re: Feature freeze date for 8.1

From
Simon Riggs
Date:
Any chance one of you fine people could start another thread?

This has very little to do with "Feature freeze date for 8.1"...

Thanks,

Best Regards, Simon Riggs



Re: Feature freeze date for 8.1

From
Kaare Rasmussen
Date:
> This has very little to do with "Feature freeze date for 8.1"...

And btw I lost track of the thread. Was any actual feature freeze date for 8.1
approved?


Re: Feature freeze date for 8.1

From
Tom Lane
Date:
Kaare Rasmussen <kar@kakidata.dk> writes:
>> This has very little to do with "Feature freeze date for 8.1"...

> And btw I lost track of the thread. was any actual feature freeze date
> for 8.1 approved?

July 1 is the plan ... subject to change of course ...
        regards, tom lane


Re: Feature freeze date for 8.1

From
Andrew Dunstan
Date:

Tom Lane wrote:

>>And btw I lost track of the thread. was any actual feature freeze date
>>for 8.1 approved?
>>    
>>
>
>July 1 is the plan ... subject to change of course ...
>
>
>  
>

Incidentally, the way this was discussed/announced has been just right, 
IMHO. Big improvement over last year.

cheers

andrew


Re: Feature freeze date for 8.1

From
Bruce Momjian
Date:
Andrew Dunstan wrote:
> 
> 
> Tom Lane wrote:
> 
> >>And btw I lost track of the thread. was any actual feature freeze date
> >>for 8.1 approved?
> >>    
> >>
> >
> >July 1 is the plan ... subject to change of course ...
> >
> >
> >  
> >
> 
> Incidentally, the way this was discussed/announced has been just right, 
> IMHO. Big improvement over last year.

Yea, we're learning.  :-)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Network write errors (was: Re: Feature freeze date for

From
Bruce Momjian
Date:
Andrew - Supernews wrote:
> On 2005-05-01, Peter Eisentraut <peter_e@gmx.net> wrote:
> > The problem, as I understand it, is that if you have a long-running 
> > query and the client process disappears, the query keeps running and 
> > holds whatever resources it may have until it finishes.  In fact, it 
> > keeps sending data to the client and keeps ignoring the SIGPIPE it gets 
> > (in case of a Unix-domain socket connection).
> 
> Ignoring the SIGPIPE is exactly the right thing to do.
> 
> What's _not_ a good idea is ignoring the EPIPE error from write(), which
> seems to currently be reported via ereport(COMMERROR) which doesn't try
> and abort the query as far as I can tell.

Where are you seeing this?  I looked from PostgresMain() to
ReadCommand() to SocketBackend() to pq_getbyte() which returns EOF, and
PostgresMain checks that and does a proc_exit(0).

I think the main problem is that a long-running query never tries to
interact with the client during the query.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Network write errors (was: Re: Feature freeze date for

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Andrew - Supernews wrote:
>> What's _not_ a good idea is ignoring the EPIPE error from write(), which
>> seems to currently be reported via ereport(COMMERROR) which doesn't try
>> and abort the query as far as I can tell.

> Where are you seeing this?  I looked from PostgresMain() to
> ReadCommand() to SocketBackend() to pq_getbyte() which returns EOF, and
> PostgresMain checks that and does a proc_exit(0).

It sounds like you were following the input-from-client logic.  Andrew
is complaining about the output-to-client side.

We deliberately don't abort on write-to-client failure.  There have
been periodic discussions about changing that, but I'm not convinced
that the advocates for a change have made a good case.  Right now,
it's predictable that the backend only fails due to loss of connection
when it waits for a new command.  The behavior would become much less
predictable if we allowed write failure to abort the query.  As an
example: whether an UPDATE command completes might depend on whether
any invoked triggers try to issue NOTICEs.
        regards, tom lane
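
The behaviour described here can be illustrated outside the backend. In this Python sketch (not PostgreSQL code), a write to a vanished client fails, the failure is noted, and the work carries on; loss of the connection only ends the session when the next command is read. All function names are hypothetical. Python already ignores SIGPIPE by default, which is why the failure shows up as an exception (EPIPE) rather than killing the process.

```python
import socket

def send_notice(sock, text):
    """Best-effort write to the client; report failure instead of raising."""
    try:
        sock.sendall(text.encode() + b"\n")
        return True
    except OSError:
        # EPIPE, ECONNRESET and friends: note it and continue, do not abort.
        return False

def run_update(sock, rows):
    """Pretend UPDATE whose triggers emit one NOTICE per row."""
    updated = 0
    for row in rows:
        send_notice(sock, "NOTICE: updating row %d" % row)
        updated += 1   # the work completes whether or not the write succeeded
    return updated

def read_command(sock):
    """Only here does a dead connection end the session (returns None on EOF)."""
    try:
        data = sock.recv(1024)
    except OSError:
        return None
    return data or None
```

If send_notice() raised instead of returning False, whether run_update() finished would depend on whether any row happened to produce a NOTICE, which is the unpredictability being argued against.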