Thread: transaction timeout

transaction timeout

From
Dr NoName
Date:
Hi all,

I have been quite satisfied with the level of support
from the PostgreSQL community, but this time I'm
getting nothing. So, is transaction timeout option
planned at all? What's the alternative solution to a
client that's hung in transaction?

thanks,

Eugene


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: transaction timeout

From
Alvaro Herrera
Date:
On Tue, Jul 26, 2005 at 07:40:30AM -0700, Dr NoName wrote:

> So, is transaction timeout option planned at all?

> What's the alternative solution to a client that's hung in
> transaction?

Forcibly end it?

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)

Re: transaction timeout

From
Dr NoName
Date:
Yeah, that's what we have to resort to now, but that's
not a solution. Until we kill the client, the entire
database is locked (or, at least the tables that other
clients need to write to, which is effectively the
same thing). This is annoying enough during the week
but it's especially a problem on weekends when none of
the developers are in the office.

A single client should not be able to bring the entire
database down. The DB should recognize that the client
went down and roll back the transaction. That would be
the ideal solution. Anything else we can do to remedy
the situation?

thanks,

Eugene


--- Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> On Tue, Jul 26, 2005 at 07:40:30AM -0700, Dr NoName wrote:
>
> > So, is transaction timeout option planned at all?
>
> > What's the alternative solution to a client that's hung in
> > transaction?
>
> Forcibly end it?
>
> --
> Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
> "I dream about dreams about dreams", sang the nightingale
> under the pale moon (Sandman)



Re: transaction timeout

From
Michael Fuhr
Date:
On Tue, Jul 26, 2005 at 07:40:30AM -0700, Dr NoName wrote:
>
> I have been quite satisfied with the level of support
> from the PostgreSQL community, but this time I'm
> getting nothing.

There have been a couple of replies to your post, although perhaps
not what you were hoping for:

http://archives.postgresql.org/pgsql-general/2005-07/msg00984.php
http://archives.postgresql.org/pgsql-general/2005-07/msg00985.php

> So, is transaction timeout option planned at all?

A search through the archives shows that transaction timeout has
been discussed recently in pgsql-hackers, but unless I've missed
something it hasn't been implemented yet (and therefore probably
won't be available in 8.1, since 8.1 is in feature freeze).

> What's the alternative solution to a client that's hung in transaction?

What's the client doing that takes locks strong enough to "lock up
the entire database"?  Why does the client hang?  Since the database
doesn't currently have a way to detect and handle these situations,
it might be worthwhile to find out what's happening to see if it
can be prevented.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: transaction timeout

From
Scott Marlowe
Date:
On Tue, 2005-07-26 at 09:40, Dr NoName wrote:
> Hi all,
>
> I have been quite satisfied with the level of support
> from the PostgreSQL community, but this time I'm
> getting nothing. So, is transaction timeout option
> planned at all? What's the alternative solution to a
> client that's hung in transaction?

The common view on this kind of thing is that if your client is broken,
you need to fix it.

That said, I have seen some folks post about writing a perl or shell
script that runs every x minutes looking for connections that have been
idle for > a certain amount of time and kill the backend associated with
it (sigterm, not -9...)

Generally speaking, I'd rather code my own solution to these problems
than do it the way Oracle does.  Often times the choices someone else
makes for you in fixing these problems are suboptimal for your system.
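The watchdog script Scott describes can be sketched in a few lines of
shell. This is a sketch of the idea, not an established tool: it parses
ps output shaped like the listings later in this thread, and the
DRY_RUN switch is invented here as a safety measure. Whether SIGTERM on
an individual backend is entirely safe is itself debated later in the
thread.

```shell
#!/bin/sh
# Sketch: find backends whose process title says "idle in transaction"
# and send them SIGTERM (never kill -9). Assumes Linux-style process
# titles as shown in this thread; DRY_RUN is a hypothetical safety switch.
DRY_RUN=1

# Extract the PID column from "idle in transaction" lines of `ps -efwww`.
find_idle_in_txn() {
    grep 'idle in transaction' | awk '{print $2}'
}

# Live use would be:  ps -efwww | grep '^postgres' | find_idle_in_txn
# Here a canned sample stands in for the live listing:
sample='postgres 17750 23281  0 Jul20 ?  00:00:00 postgres: siam siam_production 172.16.10.159 idle in transaction
postgres 26504 23281  0 Jul20 ?  00:00:00 postgres: siam siam_production 172.16.11.50 idle'

echo "$sample" | find_idle_in_txn | while read -r pid; do
    if [ "$DRY_RUN" = 1 ]; then
        echo "would kill -TERM $pid"
    else
        kill -TERM "$pid"
    fi
done
```

A real deployment would also check how long the backend has been idle
(e.g. via the etime column of `ps -o`) before killing anything, since
"idle in transaction" is momentarily true for healthy clients too.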

Re: transaction timeout

From
Scott Marlowe
Date:
On Tue, 2005-07-26 at 10:33, Dr NoName wrote:
> Yeah, that's what we have to resort to now, but that's
> not a solution. Until we kill the client, the entire
> database is locked (or, at least the tables that other
> clients need to write to, which is effectively the
> same thing). This is annoying enough during the week
> but it's especially a problem on weekends when none of
> the developers are in the office.

OK, for the third or fourth time, what kind of locks is your application
taking out that can lock the whole database?

>
> A single client should not be able to bring the entire
> database down.

A single client running a large unconstrained join can easily bring
either PostgreSQL or Oracle to its knees.  Their very nature of
handling hundreds of users accessing large amounts of data makes
databases prone to such problems, and requires you to carefully
design your applications so as not to do things that cause the
database to hiccup.

> The DB should recognize that the client
> went down and roll back the transaction.

How, exactly, can PostgreSQL (or any other database) recognize a hung
client versus one that's waiting for an hour on user input?

> That would be
> the ideal solution. Anything else we can do to remedy
> the situation?

Yes, tell us what you're doing that "locks the whole database".

Re: transaction timeout

From
Alvaro Herrera
Date:
On Tue, Jul 26, 2005 at 08:33:19AM -0700, Dr NoName wrote:

> A single client should not be able to bring the entire
> database down. The DB should recognize that the client
> went down and roll back the transaction. That would be
> the ideal solution. Anything else we can do to remedy
> the situation?

Now wait just a second.  The database is not down at all just because
somebody left a transaction open.  The real problem is that that open
transaction is holding locks on some resources, right?

I guess the real answer is not to leave transactions open.  If you do
that by design, say because the app shows a data modification window,
and keeps a transaction open just to be able to save the changes later,
then you really need to rethink your app design.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)

Re: transaction timeout

From
Dr NoName
Date:
> What's the client doing that takes locks strong enough to
> "lock up the entire database"?  Why does the client hang?

yeah, good question. I thought postgres used
better-than-row-level locking? Could the total
deadlock be caused by a combination of an open
transaction and the VACUUM FULL that runs every Sunday?

> Since the database doesn't currently have a way to detect and
> handle these situations, it might be worthwhile to find out
> what's happening to see if it can be prevented.

*Anything* can happen. Like, for instance, last week a
user tried to kill the client and only managed to kill
some of its threads. But since the process was not
fully dead, the socket was not closed, so the transaction
was still in progress. The point is that the client
must not be trusted to always do the right thing.
That's why we have things like protected memory,
pre-emptive multitasking, resource limits, etc.
Similarly, a database must have the ability to detect a
broken client and kick it out.

thanks,

Eugene



____________________________________________________
Start your day with Yahoo! - make it your home page
http://www.yahoo.com/r/hs


Re: transaction timeout

From
Dr NoName
Date:
> On Tue, Jul 26, 2005 at 08:33:19AM -0700, Dr NoName wrote:
>
> > A single client should not be able to bring the entire
> > database down. The DB should recognize that the client
> > went down and roll back the transaction. That would be
> > the ideal solution. Anything else we can do to remedy
> > the situation?
>
> Now wait just a second.  The database is not down at all
> just because somebody left a transaction open.  The real
> problem is that that open transaction is holding locks on
> some resources, right?

right, but that's effectively the same thing: users
cannot write to the database and in some cases can't
even read from it.

> I guess the real answer is not to leave transactions open.
> If you do that by design, say because the app shows a data
> modification window, and keeps a transaction open just to be
> able to save the changes later, then you really need to
> rethink your app design.

There is no user interaction in the middle of a
transaction. But there are other things we have to do
(file system I/O, heavy processing, etc.). Those
operations really do need to be interleaved with the
DB writes.

thanks,

Eugene


Re: transaction timeout

From
Dr NoName
Date:
> OK, for the third or fourth time, what kind of locks is your
> application taking out that can lock the whole database?

I'd like to know that myself. How can a
select/insert/update lock an entire table or even
multiple tables?

> How, exactly, can PostgreSQL (or any other database) recognize
> a hung client versus one that's waiting for an hour on user input?

A transaction timeout. In our application, an hour-long
transaction is not normal. I want it to abort
automatically.

> Yes, tell us what you're doing that "locks the whole database".

I wish I knew. Last Sunday it was locked so badly that
even selects were blocked until we killed the
offending client.

Eugene
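One way to answer this the next time it happens is to snapshot pg_locks
while things are stuck and look at which requests are not granted. A
sketch in shell: the psql invocation in the comment assumes access to
the database, the sample input is a canned stand-in for live output,
and flag_waiters is a name invented here.

```shell
#!/bin/sh
# Sketch: print the lock requests that are NOT granted, i.e. the
# backends stuck waiting. Live use would feed it unaligned psql output:
#   psql -t -A -F'|' -d siam_production \
#       -c "SELECT pid, mode, granted FROM pg_locks" | flag_waiters
flag_waiters() {
    # fields: pid | lock mode | granted (t/f); keep only the waiters
    awk -F'|' '$3 == "f" {print $1, "waiting for", $2}'
}

# Canned sample standing in for the psql output:
sample='17750|AccessShareLock|t
6212|AccessExclusiveLock|f'

echo "$sample" | flag_waiters
```

Cross-referencing the waiting PIDs against ps listings like the ones
later in this thread shows which backend everyone is queued behind.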





Re: transaction timeout

From
Dr NoName
Date:
> The common view on this kind of thing is that if your client
> is broken, you need to fix it.

The problem is, we can't fix the users, nor can we fix
other software that our client has to interact with.
There will always be occasional situations when a
client gets stuck.

> That said, I have seen some folks post about writing a perl
> or shell script that runs every x minutes looking for
> connections that have been idle for > a certain amount of
> time and kill the backend associated with it (sigterm, not -9...)

what are the implications of killing a postmaster
process?

thanks,

Eugene


Re: transaction timeout

From
Scott Marlowe
Date:
On Tue, 2005-07-26 at 11:24, Dr NoName wrote:
> > The common view on this kind of thing is that if your client
> > is broken, you need to fix it.
>
> The problem is, we can't fix the users, nor can we fix
> other software that our client has to interact with.
> There will always be occasional situations when a
> client gets stuck.
>
> > That said, I have seen some folks post about writing a perl
> > or shell script that runs every x minutes looking for
> > connections that have been idle for > a certain amount of
> > time and kill the backend associated with it (sigterm, not -9...)
>
> what are the implications of killing a postmaster
> process?

A SIGTERM is generally considered safe.  It's -9 and its ilk that
you need to be wary of.

I think we and you both need more information about this failure.  Do
you have any logging turned on that could give us a clue to what's
causing this failure?

It sounds to me more like one of the user apps is DOSing the server with
large unconstrained joins or something similarly dangerous to do.
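The logging Scott asks about comes down to a few postgresql.conf
settings. A sketch using 8.0-era parameter names (check the
documentation for your version):

```
# postgresql.conf fragment (8.0-era parameter names)
log_statement = 'all'           # log every statement, to see what ran before the hang
log_connections = true
log_disconnections = true
stats_command_string = true     # show running queries in pg_stat_activity
```

With stats_command_string on, a SELECT * FROM pg_stat_activity during
the next lockup would show what each "SELECT waiting" backend is
actually trying to run.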

Re: transaction timeout

From
"Frank L. Parks"
Date:
TIP 2: Don't 'kill -9' the postmaster



Dr NoName wrote:

> > The common view on this kind of thing is that if your client
> > is broken, you need to fix it.
>
> The problem is, we can't fix the users, nor can we fix
> other software that our client has to interact with.
> There will always be occasional situations when a
> client gets stuck.
>
> > That said, I have seen some folks post about writing a perl
> > or shell script that runs every x minutes looking for
> > connections that have been idle for > a certain amount of
> > time and kill the backend associated with it (sigterm, not -9...)
>
> what are the implications of killing a postmaster
> process?
>
> thanks,
>
> Eugene

Re: transaction timeout

From
"Magnus Hagander"
Date:
> > > That said, I have seen some folks post about writing a perl
> > > or shell script that runs every x minutes looking for
> > > connections that have been idle for > a certain amount of
> > > time and kill the backend associated with it (sigterm, not -9...)
> >
> > what are the implications of killing a postmaster process?
>
> A SIGTERM is generally considered safe.  It's -9 and its ilk
> that you need to be wary of.

No it's not. See the archives.
The only *safe* way to do it ATM is to restart the database. SIGTERM
may leave orphaned locks or such things in the system. (Incidentally,
-9 on a single backend should be safe, I believe. The postmaster will
tell all concurrent connections to abort and restart. It's not nice,
but it should be safe - it should perform normal recovery, same as if
you pull the plug.)


//Magnus

Re: transaction timeout

From
Dr NoName
Date:
> If you have a second database in the cluster, is it still
> operational when the main database locks up?

we don't have a DB cluster. It would be pretty useless
since PostgreSQL doesn't support distributed
transactions.

> Also it seems that some diagnostics are needed in the client
> app to log the crash event so you can determine which SQL
> commands are causing the lock.

I'll try to get that next time it happens. But
regardless of what SQL commands are running, I know
what the root cause is: a client hangs while in a
transaction.

> Despite many years of writing buggy code, I have not yet locked
> a whole DB in the fashion described. I can not see how a simple
> select / insert / update command sequence can achieve it unless
> there is a particular relation between the tables involved.

As I have already said, I suspect this might be caused
by a combination of an open transaction and a weekly
VACUUM FULL. Does that sound right?

> If the tables are related / linked via rules / triggers / keys,
> then perhaps add a test table that bears no relation to the
> others and see if it is locked when the others appear to have
> this problem you are describing.
>
> Perhaps a simple test: when the DB error occurs, can you use
> PGAdmin to read an independent table, or read from another
> database.

thanks, I'll try that.

Eugene






Re: transaction timeout

From
Scott Marlowe
Date:
On Tue, 2005-07-26 at 12:41, Magnus Hagander wrote:
> > > > That said, I have seen some folks post about writing a perl
> > > > or shell script that runs every x minutes looking for
> > > > connections that have been idle for > a certain amount of
> > > > time and kill the backend associated with it (sigterm, not -9...)
> > >
> > > what are the implications of killing a postmaster process?
> >
> > A SIGTERM is generally considered safe.  It's -9 and its ilk
> > that you need to be wary of.
>
> No it's not. See the archives.
> The only *safe* way to do it ATM is to restart the database. SIGTERM
> may leave orphaned locks or such things in the system. (Incidentally,
> -9 on a single backend should be safe, I believe. The postmaster will
> tell all concurrent connections to abort and restart. It's not nice,
> but it should be safe - it should perform normal recovery, same as if
> you pull the plug.)

Really?  I was under the impression that doing a "kill <backendpid>" on
an idle connection would clean up those things.  Was that a discussion
on hackers that brought this up?  And if so, what was the time period,
I'd like to read through it.

Re: transaction timeout

From
"Magnus Hagander"
Date:
> > > > > That said, I have seen some folks post about writing a perl
> > > > > or shell script that runs every x minutes looking for
> > > > > connections that have been idle for > a certain amount of
> > > > > time and kill the backend associated with it (sigterm, not -9...)
> > > >
> > > > what are the implications of killing a postmaster process?
> > >
> > > A SIGTERM is generally considered safe.  It's -9 and its ilk
> > > that you need to be wary of.
> >
> > No it's not. See the archives.
> > The only *safe* way to do it ATM is to restart the database. SIGTERM
> > may leave orphaned locks or such things in the system. (Incidentally,
> > -9 on a single backend should be safe, I believe. The postmaster will
> > tell all concurrent connections to abort and restart. It's not nice,
> > but it should be safe - it should perform normal recovery, same as if
> > you pull the plug.)
>
> Really?  I was under the impression that doing a "kill
> <backendpid>" on an idle connection would clean up those
> things.  Was that a discussion on hackers that brought this up?
> And if so, what was the time period?  I'd like to read through it.

It's been brought up several times on hackers. Once back last autumn
when looking at the pg_terminate_backend function, and it was brought up
again this spring I think during one of the discussions about the
instrumentation patch.

I can't find the actual discussion at a quick look, but the quote from
Tom on http://archives.postgresql.org/pgsql-hackers/2005-06/msg00970.php
certainly shows there is a problem :-) The discussion should be
somewhere around that timeframe (I think before it - before I came up
with the idea to solve it that didn't work)

//Magnus

Re: transaction timeout

From
Scott Marlowe
Date:
On Tue, 2005-07-26 at 12:51, Dr NoName wrote:
> > If you have a second database in the cluster, is it still
> > operational when the main database locks up?
>
> we don't have a DB cluster. It would be pretty useless
> since postgresql doesn't support distributed
> transactions.

You misunderstood his point.  In PostgreSQL parlance, a "cluster" is a
single postmaster running on a single machine, with 1 or more
databases.  So, what he wanted to know was, if your application is
hitting a database called fred, and you have a spare database named
wilma, would "psql wilma" work when the database is "locked up?"

Can you elaborate on what you mean by a database that is "locked up?"

> > Also it seems that some diagnostics are needed in the client
> > app to log the crash event so you can determine which SQL
> > commands are causing the lock.
>
> I'll try to get that next time it happens. But
> regardless of sql commands are running, I know what
> the root cause is: a client hangs while in
> transaction.

Actually, unless the client is holding a table level exclusive lock,
most transactions should continue.

> > Despite many years of writing buggy code, I have not yet locked
> > a whole DB in the fashion described. I can not see how a simple
> > select / insert / update command sequence can achieve it unless
> > there is a particular relation between the tables involved.
>
> As I have already said, I suspect this might be caused
> by a combination of an open transaction and a weekly
> VACUUM FULL. Does that sound right?

No, VACUUM FULL shouldn't cause this kind of issue.  Now, if the
database is just running really slowly, instead of actually locking
up, that's possible with VACUUM FULL.


Re: transaction timeout

From
Dr NoName
Date:
> You misunderstood his point.  In PostgreSQL parlance, a
> "cluster" is a single postmaster running on a single machine,
> with 1 or more databases.  So, what he wanted to know was, if
> your application is hitting a database called fred, and you
> have a spare database named wilma, would "psql wilma" work
> when the database is "locked up?"


ok, I see. That's another thing to try next time.


> Can you elaborate on what you mean by a database that is
> "locked up?"


Here is the ps output from before the offending
process was killed. That one "idle in transaction"
process caused everything to lock.

2005 07 25 6:07:34  s17 79 > ps -efwww | grep postgres
postgres 23281     1  0 Jul18 ?        00:00:29
/usr/bin/postmaster -p 5432
postgres 23285 23281  0 Jul18 ?        00:20:20
postgres: stats buffer process
postgres 23287 23285  0 Jul18 ?        00:18:08
postgres: stats collector process
postgres 12466 23281  0 Jul18 ?        00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres 12474 23281  0 Jul18 ?        00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres 26947 23281  0 Jul19 ?        00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres  3514 23281  0 Jul19 ?        00:00:00
postgres: siam siam_production 172.16.11.50 idle
postgres  6881 23281  0 Jul19 ?        00:00:01
postgres: siam siam_production 172.16.11.71 idle
postgres 17750 23281  0 Jul20 ?        00:00:00
postgres: siam siam_production 172.16.10.159 idle in
transaction
postgres 26504 23281  0 Jul20 ?        00:00:00
postgres: siam siam_production 172.16.11.50 idle
postgres 12284 23281  0 Jul20 ?        00:00:00
postgres: siam siam_production 172.16.10.125 idle
postgres 16026 23281  0 Jul20 ?        00:00:00
postgres: siam siam_production 172.16.10.125 idle
postgres 25709 23281  0 Jul21 ?        00:01:00
postgres: siam siam_production 172.16.1.17 idle
postgres 27980 23281  0 Jul21 ?        00:04:08
postgres: siam siam_production 172.16.1.17 idle
postgres 14854 23281  0 Jul21 ?        00:00:03
postgres: siam siam_production 172.16.11.95 idle
postgres 19531 23281  0 Jul21 ?        00:00:02
postgres: siam siam_production 172.16.11.95 idle
postgres 17590 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres 26917 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26933 23281  0 Jul22 ?        00:06:57
postgres: siam siam_production 172.16.1.17 idle
postgres 26934 23281  0 Jul22 ?        00:00:56
postgres: siam siam_production 172.16.1.17 idle
postgres 26939 23281  0 Jul22 ?        00:01:09
postgres: siam siam_production 172.16.1.17 idle
postgres 27362 23281  0 Jul22 ?        00:01:56
postgres: siam siam_production 172.16.1.17 idle
postgres 27365 23281  0 Jul22 ?        00:01:03
postgres: siam siam_production 172.16.1.17 idle
postgres 27398 23281  0 Jul22 ?        00:00:26
postgres: siam siam_production 172.16.1.17 idle
postgres 27856 23281  0 Jul22 ?        00:01:00
postgres: siam siam_production 172.16.1.17 idle
postgres 27858 23281  0 Jul22 ?        00:05:26
postgres: siam siam_production 172.16.1.17 idle
postgres 27863 23281  0 Jul22 ?        00:00:58
postgres: siam siam_production 172.16.1.17 idle
postgres 27865 23281  0 Jul22 ?        00:01:28
postgres: siam siam_production 172.16.1.17 idle
postgres 27869 23281  0 Jul22 ?        00:00:29
postgres: siam siam_production 172.16.1.17 idle
postgres 28295 23281  0 Jul22 ?        00:00:23
postgres: siam siam_production 172.16.1.17 idle
postgres 28313 23281  0 Jul22 ?        00:00:45
postgres: siam siam_production 172.16.1.17 idle
postgres 28315 23281  0 Jul22 ?        00:01:06
postgres: siam siam_production 172.16.1.17 idle
postgres 28725 23281  0 Jul22 ?        00:05:07
postgres: siam siam_production 172.16.1.17 idle
postgres 13559 23281  0 Jul22 ?        00:00:24
postgres: siam siam_production 172.16.1.17 idle
postgres 13595 23281  0 Jul22 ?        00:00:36
postgres: siam siam_production 172.16.1.17 idle
postgres 14017 23281  0 Jul22 ?        00:00:52
postgres: siam siam_production 172.16.1.17 idle
postgres 25206 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.11.146 idle
postgres  3742 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.10.142 idle
postgres 12016 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.10.106 idle
postgres 13782 23281  0 Jul22 ?        00:00:13
postgres: siam siam_production 172.16.10.106 idle
postgres 13853 23281  0 Jul22 ?        00:00:02
postgres: siam siam_production 172.16.10.106 idle
postgres 14381 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.10.106 idle
postgres 14923 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.10.106 idle
postgres 17181 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.11.59 idle
postgres  6212 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production [local] VACUUM waiting
postgres  5952 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.1.17 SELECT
waiting
postgres 24644 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.100 SELECT
waiting
postgres 26271 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 26720 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 26721 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 27161 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 27162 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 28005 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 28450 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres 28451 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.114 SELECT
waiting
postgres  3049 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.11.134 SELECT
waiting
postgres  3875 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.11.134 SELECT
waiting
postgres  4286 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.11.130 SELECT
waiting
postgres  4700 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.11.185 SELECT
waiting
postgres 13850 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 13851 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 13852 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 13854 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 13855 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 13856 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.145 SELECT
waiting
postgres 14268 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 14269 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 14270 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.138 SELECT
waiting
postgres 14685 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 14686 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.11.24 SELECT
waiting
postgres 15100 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.105 SELECT
waiting
postgres 15951 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.125 SELECT
waiting
postgres 16367 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.11.109 SELECT
waiting
postgres 25054 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.112 SELECT
waiting
postgres 25920 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.112 SELECT
waiting
postgres 25921 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.11.157 SELECT
waiting
postgres 25922 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.112 SELECT
waiting
postgres 26337 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.112 SELECT
waiting
postgres 26338 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.112 SELECT
waiting
postgres 10948 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.11.105 SELECT
waiting
postgres 12195 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.11.187 SELECT
waiting
postgres 12196 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.11.105 SELECT
waiting
postgres 22691 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.1.17 SELECT
waiting
postgres 23059 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.121 SELECT
waiting
postgres 15209 23281  0 04:00 ?        00:00:00
postgres: siam siam_production [local] SELECT waiting


Here is the ps output immediately after the hung
client was killed. As you can see, a whole shitload of
SELECTs suddenly woke up and finished. Unfortunately,
I don't have pg_stat_activity output.


2005 07 25 6:14:41  s17 78 > ps -efwww | grep postgres
postgres 23281     1  0 Jul18 ?        00:00:29
/usr/bin/postmaster -p 5432
postgres 23285 23281  0 Jul18 ?        00:20:21
postgres: stats buffer process
postgres 23287 23285  0 Jul18 ?        00:18:08
postgres: stats collector process
postgres 12466 23281  0 Jul18 ?        00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres 12474 23281  0 Jul18 ?        00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres 26947 23281  0 Jul19 ?        00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres  3514 23281  0 Jul19 ?        00:00:00
postgres: siam siam_production 172.16.11.50 idle
postgres  6881 23281  0 Jul19 ?        00:00:01
postgres: siam siam_production 172.16.11.71 idle
postgres 26504 23281  0 Jul20 ?        00:00:00
postgres: siam siam_production 172.16.11.50 idle
postgres 12284 23281  0 Jul20 ?        00:00:00
postgres: siam siam_production 172.16.10.125 idle
postgres 16026 23281  0 Jul20 ?        00:00:00
postgres: siam siam_production 172.16.10.125 idle
postgres 25709 23281  0 Jul21 ?        00:01:00
postgres: siam siam_production 172.16.1.17 idle
postgres 27980 23281  0 Jul21 ?        00:04:08
postgres: siam siam_production 172.16.1.17 idle
postgres 14854 23281  0 Jul21 ?        00:00:03
postgres: siam siam_production 172.16.11.95 idle
postgres 19531 23281  0 Jul21 ?        00:00:02
postgres: siam siam_production 172.16.11.95 idle
postgres 17590 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.11.95 idle
postgres 26917 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26933 23281  0 Jul22 ?        00:06:57
postgres: siam siam_production 172.16.1.17 idle
postgres 26934 23281  0 Jul22 ?        00:00:56
postgres: siam siam_production 172.16.1.17 idle
postgres 26939 23281  0 Jul22 ?        00:01:09
postgres: siam siam_production 172.16.1.17 idle
postgres 27362 23281  0 Jul22 ?        00:01:56
postgres: siam siam_production 172.16.1.17 idle
postgres 27365 23281  0 Jul22 ?        00:01:03
postgres: siam siam_production 172.16.1.17 idle
postgres 27398 23281  0 Jul22 ?        00:00:26
postgres: siam siam_production 172.16.1.17 idle
postgres 27856 23281  0 Jul22 ?        00:01:00
postgres: siam siam_production 172.16.1.17 idle
postgres 27858 23281  0 Jul22 ?        00:05:26
postgres: siam siam_production 172.16.1.17 idle
postgres 27863 23281  0 Jul22 ?        00:00:58
postgres: siam siam_production 172.16.1.17 idle
postgres 27865 23281  0 Jul22 ?        00:01:28
postgres: siam siam_production 172.16.1.17 idle
postgres 27869 23281  0 Jul22 ?        00:00:29
postgres: siam siam_production 172.16.1.17 idle
postgres 28295 23281  0 Jul22 ?        00:00:23
postgres: siam siam_production 172.16.1.17 idle
postgres 28313 23281  0 Jul22 ?        00:00:45
postgres: siam siam_production 172.16.1.17 idle
postgres 28315 23281  0 Jul22 ?        00:01:06
postgres: siam siam_production 172.16.1.17 idle
postgres 28725 23281  0 Jul22 ?        00:05:07
postgres: siam siam_production 172.16.1.17 idle
postgres 13559 23281  0 Jul22 ?        00:00:24
postgres: siam siam_production 172.16.1.17 idle
postgres 13595 23281  0 Jul22 ?        00:00:36
postgres: siam siam_production 172.16.1.17 idle
postgres 14017 23281  0 Jul22 ?        00:00:52
postgres: siam siam_production 172.16.1.17 idle
postgres 25206 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.11.146 idle
postgres  3742 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.10.142 idle
postgres 12016 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.10.106 idle
postgres 13782 23281  0 Jul22 ?        00:00:13
postgres: siam siam_production 172.16.10.106 idle
postgres 13853 23281  0 Jul22 ?        00:00:02
postgres: siam siam_production 172.16.10.106 idle
postgres 14381 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.10.106 idle
postgres 14923 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.10.106 idle
postgres 17181 23281  0 Jul22 ?        00:00:00
postgres: siam siam_production 172.16.11.59 idle
postgres  6212 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production [local] VACUUM waiting
postgres  5952 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 24644 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.100 idle
postgres 26721 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.114 idle
postgres 27161 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.10.114 idle
postgres 22691 23281  0 Jul24 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 15209 23281  0 03:59 ?        00:00:10
postgres: siam siam_production [local] COPY
postgres 26975 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26976 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26977 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26978 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26979 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26980 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26981 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26982 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26983 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26984 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26985 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26986 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26987 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26988 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26989 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 idle
postgres 26990 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.1.17 SELECT
waiting
postgres 27041 23281  0 06:14 ?        00:00:00
postgres: siam siam_production 172.16.11.130 SELECT
waiting
costa    27091 26473  0 06:14 pts/0    00:00:00 grep
postgres

> No, VAcuum full shouldn't cause this kind of issue.
> Now, if the
> database is just running real slow, instead of
> actually locking up,
> that's possible with vacuum full.

no, there was *zero* load on the server.

thanks,

Eugene

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: transaction timeout

From
Tom Lane
Date:
Scott Marlowe <smarlowe@g2switchworks.com> writes:
> On Tue, 2005-07-26 at 12:41, Magnus Hagander wrote:
>> The only *safe* way to do it ATM is to restart the database. SIGTERM may
>> leave orphaned locks or such things in the system.

> Really?  I was under the impression that doing a "kill <backendpid>" on
> an idle connection would clean up those things.

Well, it ought to, but I for one don't consider that code path
adequately tested --- and we have seen at least one report (from Rod
Taylor if memory serves) suggesting that there are in fact bugs in it.

We know that SIGTERM'ing all the backends at once leaves the database
with a good state on disk; that path is tested every time someone shuts
down Postgres.  It does not follow that SIGTERM'ing a single backend
leaves consistent state in shared memory.  Rod's report suggested a
corrupt lock table in particular.

> Was that a discussion on hackers that brought this up?  And if so,
> what was the time period, I'd like to read through it.

There was a flamewar late during the 8.0 devel cycle where someone was
pushing to add a SIGTERM-one-backend function, and I was demurring that
I didn't think it was adequately tested.  Subsequent events seem to have
proven that fear correct.

Eventually I'm sure we'll find and fix the problem, but at the moment
it's a risky thing to do.

            regards, tom lane

Re: transaction timeout

From
Martijn van Oosterhout
Date:
On Tue, Jul 26, 2005 at 02:33:04PM -0400, Tom Lane wrote:
> Well, it ought to, but I for one don't consider that code path
> adequately tested --- and we have seen at least one report (from Rod
> Taylor if memory serves) suggesting that there are in fact bugs in it.
>
> We know that SIGTERM'ing all the backends at once leaves the database
> with a good state on disk; that path is tested everytime someone shuts
> down Postgres.  It does not follow that SIGTERM'ing a single backend
> leaves consistent state in shared memory.  Rod's report suggested a
> corrupt lock table in particular.

Well, is there debugging you can enable to check for corrupted lock
tables? If so, would it be worthwhile to set up a system with lots of
concurrent transactions, kill processes regularly, and see if
anything strange happens?

Also, I've tended to use -INT to abort the current query, then -TERM to
kill the backend. Would this be safer, given you know exactly where
everything is at that point (aborted transaction)?

Does an aborted transaction release its locks straight away or does it
wait until the client issues a ROLLBACK?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


Re: transaction timeout

From
Dr NoName
Date:
So can anyone suggest a solution that does not involve
killing the client when it hangs?

thanks,

Eugene

--- Dr NoName <spamacct11@yahoo.com> wrote:

> > You misunderstood his point.  In PostgreSQL
> > parlance, a "cluster" is a
> > single postmaster running on a single machine,
> with
> > 1 or more
> > databases.  So, what he wanted to know was, if
> your
> > application is
> > hitting a database called fred, and you have a
> spare
> > database named
> > wilma, would "psql wilma" work when the database
> is
> > "locked up?"
>
>
> ok, I see. That's another thing to try next time.
>
>
> > Can you elaborate on what you mean by a database
> > that is "locked up?"
>
>
> Here is the ps output from before the offending
> process was killed. That one "idle in transaction"
> process caused everything to lock.
>
> 2005 07 25 6:07:34  s17 79  > ps -efwww | grep
> postgres
> postgres 23281     1  0 Jul18 ?        00:00:29
> /usr/bin/postmaster -p 5432
> postgres 23285 23281  0 Jul18 ?        00:20:20
> postgres: stats buffer process
> postgres 23287 23285  0 Jul18 ?        00:18:08
> postgres: stats collector process
> postgres 12466 23281  0 Jul18 ?        00:00:00
> postgres: siam siam_production 172.16.11.95 idle
> postgres 12474 23281  0 Jul18 ?        00:00:00
> postgres: siam siam_production 172.16.11.95 idle
> postgres 26947 23281  0 Jul19 ?        00:00:00
> postgres: siam siam_production 172.16.11.95 idle
> postgres  3514 23281  0 Jul19 ?        00:00:00
> postgres: siam siam_production 172.16.11.50 idle
> postgres  6881 23281  0 Jul19 ?        00:00:01
> postgres: siam siam_production 172.16.11.71 idle
> postgres 17750 23281  0 Jul20 ?        00:00:00
> postgres: siam siam_production 172.16.10.159 idle in
> transaction
> postgres 26504 23281  0 Jul20 ?        00:00:00
> postgres: siam siam_production 172.16.11.50 idle
> postgres 12284 23281  0 Jul20 ?        00:00:00
> postgres: siam siam_production 172.16.10.125 idle
> postgres 16026 23281  0 Jul20 ?        00:00:00
> postgres: siam siam_production 172.16.10.125 idle
> postgres 25709 23281  0 Jul21 ?        00:01:00
> postgres: siam siam_production 172.16.1.17 idle
> postgres 27980 23281  0 Jul21 ?        00:04:08
> postgres: siam siam_production 172.16.1.17 idle
> postgres 14854 23281  0 Jul21 ?        00:00:03
> postgres: siam siam_production 172.16.11.95 idle
> postgres 19531 23281  0 Jul21 ?        00:00:02
> postgres: siam siam_production 172.16.11.95 idle
> postgres 17590 23281  0 Jul22 ?        00:00:00
> postgres: siam siam_production 172.16.11.95 idle
> postgres 26917 23281  0 Jul22 ?        00:00:00
> postgres: siam siam_production 172.16.1.17 idle
> postgres 26933 23281  0 Jul22 ?        00:06:57
> postgres: siam siam_production 172.16.1.17 idle
> postgres 26934 23281  0 Jul22 ?        00:00:56
> postgres: siam siam_production 172.16.1.17 idle
> postgres 26939 23281  0 Jul22 ?        00:01:09
> postgres: siam siam_production 172.16.1.17 idle
> postgres 27362 23281  0 Jul22 ?        00:01:56
> postgres: siam siam_production 172.16.1.17 idle
> postgres 27365 23281  0 Jul22 ?        00:01:03
> postgres: siam siam_production 172.16.1.17 idle
> postgres 27398 23281  0 Jul22 ?        00:00:26
> postgres: siam siam_production 172.16.1.17 idle
> postgres 27856 23281  0 Jul22 ?        00:01:00
> postgres: siam siam_production 172.16.1.17 idle
> postgres 27858 23281  0 Jul22 ?        00:05:26
> postgres: siam siam_production 172.16.1.17 idle
> postgres 27863 23281  0 Jul22 ?        00:00:58
> postgres: siam siam_production 172.16.1.17 idle
> postgres 27865 23281  0 Jul22 ?        00:01:28
> postgres: siam siam_production 172.16.1.17 idle
> postgres 27869 23281  0 Jul22 ?        00:00:29
> postgres: siam siam_production 172.16.1.17 idle
> postgres 28295 23281  0 Jul22 ?        00:00:23
> postgres: siam siam_production 172.16.1.17 idle
> postgres 28313 23281  0 Jul22 ?        00:00:45
> postgres: siam siam_production 172.16.1.17 idle
> postgres 28315 23281  0 Jul22 ?        00:01:06
> postgres: siam siam_production 172.16.1.17 idle
> postgres 28725 23281  0 Jul22 ?        00:05:07
> postgres: siam siam_production 172.16.1.17 idle
> postgres 13559 23281  0 Jul22 ?        00:00:24
> postgres: siam siam_production 172.16.1.17 idle
> postgres 13595 23281  0 Jul22 ?        00:00:36
> postgres: siam siam_production 172.16.1.17 idle
> postgres 14017 23281  0 Jul22 ?        00:00:52
> postgres: siam siam_production 172.16.1.17 idle
> postgres 25206 23281  0 Jul22 ?        00:00:00
> postgres: siam siam_production 172.16.11.146 idle
> postgres  3742 23281  0 Jul22 ?        00:00:00
> postgres: siam siam_production 172.16.10.142 idle
> postgres 12016 23281  0 Jul22 ?        00:00:00
> postgres: siam siam_production 172.16.10.106 idle
> postgres 13782 23281  0 Jul22 ?        00:00:13
> postgres: siam siam_production 172.16.10.106 idle
> postgres 13853 23281  0 Jul22 ?        00:00:02
> postgres: siam siam_production 172.16.10.106 idle
> postgres 14381 23281  0 Jul22 ?        00:00:00
> postgres: siam siam_production 172.16.10.106 idle
> postgres 14923 23281  0 Jul22 ?        00:00:00
> postgres: siam siam_production 172.16.10.106 idle
> postgres 17181 23281  0 Jul22 ?        00:00:00
> postgres: siam siam_production 172.16.11.59 idle
> postgres  6212 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production [local] VACUUM
> waiting
> postgres  5952 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.1.17 SELECT
> waiting
> postgres 24644 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.100 SELECT
> waiting
> postgres 26271 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.114 SELECT
> waiting
> postgres 26720 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.114 SELECT
> waiting
> postgres 26721 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.114 SELECT
> waiting
> postgres 27161 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.114 SELECT
> waiting
> postgres 27162 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.114 SELECT
> waiting
> postgres 28005 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.114 SELECT
> waiting
> postgres 28450 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.114 SELECT
> waiting
> postgres 28451 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.114 SELECT
> waiting
> postgres  3049 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.11.134 SELECT
> waiting
> postgres  3875 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.11.134 SELECT
> waiting
> postgres  4286 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.11.130 SELECT
> waiting
> postgres  4700 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.11.185 SELECT
> waiting
> postgres 13850 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.105 SELECT
> waiting
> postgres 13851 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.105 SELECT
> waiting
> postgres 13852 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.105 SELECT
> waiting
> postgres 13854 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.105 SELECT
> waiting
> postgres 13855 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.105 SELECT
> waiting
> postgres 13856 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.145 SELECT
> waiting
> postgres 14268 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.105 SELECT
> waiting
> postgres 14269 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.105 SELECT
> waiting
> postgres 14270 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.138 SELECT
> waiting
> postgres 14685 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.105 SELECT
> waiting
> postgres 14686 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.11.24 SELECT
> waiting
> postgres 15100 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.105 SELECT
> waiting
> postgres 15951 23281  0 Jul24 ?        00:00:00
> postgres: siam siam_production 172.16.10.125 SELECT
>
=== message truncated ===



Re: transaction timeout

From
Paul Tillotson
Date:
Dr NoName wrote:

>>What's the client doing that takes locks strong
>>enough to "lock up
>>the entire database"?  Why does the client hang?
>>
>>
>
>yeah, good question. I thought postgres uses
>better-than-row-level locking? Could the total
>deadlock be caused by a combination of an open
>transaction and VACUUM FULL that runs every sunday?
>
>
>
Sure.  Like this:

Client A accesses table T, and "hangs."
Client B attempts to get an ACCESS EXCLUSIVE lock on table T in
preparation for VACUUM FULL.
Client C connects to the database and waits for client B to get and
release his lock on table T.
Client D connects to the database and waits for client B to get and
release his lock on table T.
Client E connects to the database and waits for client B to get and
release his lock on table T.
etc...

until all your free connection slots are used up.

This happened to me once, except that client B was trying to rename
table T and create a new table T.

(You might think that clients C, D, and E should bypass client B, since
their access does not conflict with A's.  However, if that were
allowed, then a VACUUM FULL on a busy table would wait forever: client
C would slip in before A finished, and client D before C finished,
etc., leading to a situation called "lock starvation."  This can
really only be prevented by granting locks on a
first-come-first-served basis.)
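The first-come-first-served granting described above can be sketched with a toy queue model. This is illustrative only: it assumes just two lock modes and is in no way PostgreSQL's actual lock manager, but it shows why clients C, D, and E queue up behind B's ACCESS EXCLUSIVE request even though they only want shared access.

```python
# Toy model of first-come-first-served lock granting (a sketch, not
# PostgreSQL's real lock manager).  Requests queue in arrival order; a
# SHARED request is granted only if nothing is queued ahead of it, which
# is what stops later readers from bypassing a waiting EXCLUSIVE request.
from collections import deque

SHARED, EXCLUSIVE = "shared", "exclusive"

class TableLock:
    def __init__(self):
        self.holders = []      # (client, mode) currently granted
        self.queue = deque()   # (client, mode) waiting, FIFO

    def _compatible(self, mode):
        if not self.holders:
            return True
        # Any number of SHARED holders coexist; EXCLUSIVE coexists with nothing.
        return mode == SHARED and all(m == SHARED for _, m in self.holders)

    def request(self, client, mode):
        # FIFO: wait behind anyone already queued, even if the request
        # would be compatible with the current holders.
        if not self.queue and self._compatible(mode):
            self.holders.append((client, mode))
            return "granted"
        self.queue.append((client, mode))
        return "waiting"

    def release(self, client):
        self.holders = [(c, m) for c, m in self.holders if c != client]
        # Grant queued requests in order while they remain compatible.
        while self.queue and self._compatible(self.queue[0][1]):
            self.holders.append(self.queue.popleft())

lock = TableLock()
print(lock.request("A", SHARED))     # granted: table was free
print(lock.request("B", EXCLUSIVE))  # waiting: conflicts with A's shared lock
print(lock.request("C", SHARED))     # waiting: queued behind B, no bypassing
print(lock.request("D", SHARED))     # waiting
lock.release("A")                    # B is granted; C and D still wait
print([c for c, _ in lock.holders])  # ['B']
```

Once B releases, C and D are granted together, since their shared requests are mutually compatible.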

In your case, don't run VACUUM FULL via a cron job (i.e., when you're
not there).  If you need to run it regularly, you're almost certainly
not reserving enough space in the free space map.  VACUUM takes no locks
that conflict with selecting, inserting, updating, or deleting, so that
should be perfectly safe.

Regards,

Paul Tillotson


Re: transaction timeout

From
Dr NoName
Date:
> Sure.  Like this:
>
> Client A accesses table T, and "hangs."
> Client B attempts to get an ACCESS EXCLUSIVE lock on
> table T in
> preparation for VACUUM FULL.
> Client C connects to the database and waits for
> client B to get and
> release his lock on table T.
> Client D connects to the database and waits for
> client B to get and
> release his lock on table T.
> Client E connects to the database and waits for
> client B to get and
> release his lock on table T.
> etc...

oh! my! gawd!
Finally a clear explanation that makes perfect sense.
Now why did it take so long?

So all I need to do is take out the FULL? Is regular
VACUUM sufficient? How often do we need FULL? (I know
it's a stupid question without providing some more
context, but how can I estimate it?)

I suppose the ultimate solution would be a wrapper
script that works as follows:

check if there are any waiting/idle in transaction
processes
if such processes exist, do a regular VACUUM and send
out a warning email
otherwise, do VACUUM FULL.
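The check step of a wrapper like that could be sketched as follows. This is an assumption-laden illustration: the status strings match the process titles in the ps listings quoted earlier, the function name is made up, and in practice the wrapper would feed in live `ps -efwww` output (or query pg_stat_activity) and then issue the VACUUM through psql, which is not shown.

```python
# Sketch of the "check before VACUUM FULL" step: scan ps output for
# backends that are idle in transaction or waiting on a lock.  The
# status strings match the process titles in the ps listings above;
# a real wrapper would pipe in live `ps -efwww` output.
import re

BLOCKER = re.compile(r"idle in\s*transaction|waiting")

def unsafe_backends(ps_lines):
    """Lines for backends that would make a VACUUM FULL queue up behind them."""
    return [line for line in ps_lines
            if "postgres:" in line and BLOCKER.search(line)]

sample = [
    "postgres: siam siam_production 172.16.11.95 idle",
    "postgres: siam siam_production 172.16.10.159 idle in transaction",
    "postgres: siam siam_production [local] VACUUM waiting",
]
print(len(unsafe_backends(sample)))  # 2 -> do plain VACUUM and warn instead
```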

I like this solution a lot more than getting support
calls on weekends.

Out of curiosity, how is lock acquisition implemented
in postgresql? All the processes have to go through
some sort of queue, so that locks are granted in FIFO
order, as you described. Just trying to understand it
better.

thanks a lot,

Eugene


Re: transaction timeout

From
Scott Marlowe
Date:
On Wed, 2005-07-27 at 10:31, Dr NoName wrote:
> > Sure.  Like this:
> >
> > Client A accesses table T, and "hangs."
> > Client B attempts to get an ACCESS EXCLUSIVE lock on
> > table T in
> > preparation for VACUUM FULL.
> > Client C connects to the database and waits for
> > client B to get and
> > release his lock on table T.
> > Client D connects to the database and waits for
> > client B to get and
> > release his lock on table T.
> > Client E connects to the database and waits for
> > client B to get and
> > release his lock on table T.
> > etc...
>
> oh! my! gawd!
> Finally a clear explanation that makes perfect sense.
> Now why did it take so long?

Because your initial definition of the problem kinda led us all in the
wrong direction for 24 hours?  :)  Remember, it took like three times of
folks asking "what's happening that locks your database" before the
vacuum full issue came up.  From there, 24 more hours.  Actually not
bad.

And don't forget, the docs on vacuum pretty clearly state:

"The second form is the VACUUM FULL command. This uses a more aggressive
algorithm for reclaiming the space consumed by expired row versions. Any
space that is freed by VACUUM FULL is immediately returned to the
operating system. Unfortunately, this variant of the VACUUM command
acquires an exclusive lock on each table while VACUUM FULL is processing
it. Therefore, frequently using VACUUM FULL can have an extremely
negative effect on the performance of concurrent database queries."

And then later on:

"VACUUM FULL is recommended for cases where you know you have deleted
the majority of rows in a table, so that the steady-state size of the
table can be shrunk substantially with VACUUM FULL's more aggressive
approach. Use plain VACUUM, not VACUUM FULL, for routine vacuuming for
space recovery."

So, daily vacuum fulls are not recommended.

> So all I need to do is take out the FULL? Is regular
> VACUUM sufficient? How often do we need FULL? (I know
> it's a stupid question without providing some more
> context, but how can I estimate it?)

Please read up on vacuuming in the docs, at:

http://www.postgresql.org/docs/8.0/static/maintenance.html#ROUTINE-VACUUMING

It's quite enlightening about this.  Basically, assuming your fsm
settings are high enough for your update/delete load, yes, plain vacuums
should be enough.

>
> I suppose the ultimate solution would be a wrapper
> script that works as follows:
>
> check if there are any waiting/idle in transaction
> processes
> if such processes exist, do a regular VACUUM and send
> out a warning email
> otherwise, do VACUUM FULL.

Nah, that's probably overkill.  I'd rather just run plain VACUUM
VERBOSE and check the output by hand once a week or so to make sure I'm
reclaiming all the space.

> I like this solution a lot more than getting support
> calls on weekends.

Amen brother, amen...

> Out of curiousity, how is lock acquisition implemented
> in postgresql? All the processes have to go through
> some sort of queue, so that locks are granted in FIFO
> order, as you described. Just trying to understand it
> better.

See here:

http://www.postgresql.org/docs/8.0/static/mvcc.html

PostgreSQL's locking system is quite impressive.  I kinda giggle when
someone says "Well, now MySQL has feature Y, so why bother with
PostgreSQL?" It's pretty obvious they haven't really read up on pgsql
when they say things like that.

Re: transaction timeout

From
Dr NoName
Date:
Thanks a lot, everyone! That solved my problem. But I
still want to be able to set transaction timeout. Any
chance of that in the next release?

Eugene


--- Scott Marlowe <smarlowe@g2switchworks.com> wrote:

> On Wed, 2005-07-27 at 10:31, Dr NoName wrote:
> > > Sure.  Like this:
> > >
> > > Client A accesses table T, and "hangs."
> > > Client B attempts to get an ACCESS EXCLUSIVE
> lock on
> > > table T in
> > > preparation for VACUUM FULL.
> > > Client C connects to the database and waits for
> > > client B to get and
> > > release his lock on table T.
> > > Client D connects to the database and waits for
> > > client B to get and
> > > release his lock on table T.
> > > Client E connects to the database and waits for
> > > client B to get and
> > > release his lock on table T.
> > > etc...
> >
> > oh! my! gawd!
> > Finally a clear explanation that makes perfect
> sense.
> > Now why did it take so long?
>
> Because your initial definition of the problem kinda
> led us all in the
> wrong direction for 24 hours?  :)  Remember, it took
> like three times of
> folks asking "what's happening that locks your
> database" before the
> vacuum full issue came up.  From there, 24 more
> hours.  Actually not
> bad.
>
> And don't forget, the docs on vacuum pretty clearly
> state:
>
> "The second form is the VACUUM FULL command. This
> uses a more aggressive
> algorithm for reclaiming the space consumed by
> expired row versions. Any
> space that is freed by VACUUM FULL is immediately
> returned to the
> operating system. Unfortunately, this variant of the
> VACUUM command
> acquires an exclusive lock on each table while
> VACUUM FULL is processing
> it. Therefore, frequently using VACUUM FULL can have
> an extremely
> negative effect on the performance of concurrent
> database queries."
>
> And then later on:
>
> "VACUUM FULL is recommended for cases where you know
> you have deleted
> the majority of rows in a table, so that the
> steady-state size of the
> table can be shrunk substantially with VACUUM FULL's
> more aggressive
> approach. Use plain VACUUM, not VACUUM FULL, for
> routine vacuuming for
> space recovery."
>
> So, daily vacuum fulls are not recommended.
>
> > So all I need to do is take out the FULL? Is
> regular
> > VACUUM sufficient? How often do we need FULL? (I
> know
> > it's a stupid question without providing some more
> > context, but how can I estimate it?)
>
> Please read up on vacuuming in the docs, at:
>
>
http://www.postgresql.org/docs/8.0/static/maintenance.html#ROUTINE-VACUUMING
>
> It's quite enlightening about this.  Basically,
> assuming your fsm
> settings are high enough for your update/delete
> load, yes, plain vacuums
> should be enough.
>
> >
> > I suppose the ultimate solution would be a wrapper
> > script that works as follows:
> >
> > check if there are any waiting/idle in transaction
> > processes
> > if such processes exist, do a regular VACUUM and
> send
> > out a warning email
> > otherwise, do VACUUM FULL.
>
> Nah, that's probably overkill.  I'd rather just run
> plain vacuum
> verboses and check them by hand once a week or so to
> make sure I'm
> reclaiming all the space.
>
> > I like this solution a lot more than getting
> support
> > calls on weekends.
>
> Amen brother, amen...
>
> > Out of curiousity, how is lock acquisition
> implemented
> > in postgresql? All the processes have to go
> through
> > some sort of queue, so that locks are granted in
> FIFO
> > order, as you described. Just trying to understand
> it
> > better.
>
> See here:
>
> http://www.postgresql.org/docs/8.0/static/mvcc.html
>
> PostgreSQL's locking system is quite impression.  I
> kinda giggle when
> someone says "Well, not MySQL has feature Y, so why
> bother with
> PostgreSQL?" It's pretty obvious they haven't really
> read up on pgsql
> when they say things like that.
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will
> ignore your desire to
>        choose an index scan if your joining column's
> datatypes do not
>        match
>





Re: transaction timeout

From
Paul Tillotson
Date:
Dr NoName wrote:

>>Sure.  Like this:
>>
>>Client A accesses table T, and "hangs."
>>Client B attempts to get an ACCESS EXCLUSIVE lock on
>>table T in
>>preparation for VACUUM FULL.
>>Client C connects to the database and waits for
>>client B to get and
>>release his lock on table T.
>>Client D connects to the database and waits for
>>client B to get and
>>release his lock on table T.
>>Client E connects to the database and waits for
>>client B to get and
>>release his lock on table T.
>>etc...
>>
>>
>
>oh! my! gawd!
>Finally a clear explanation that makes perfect sense.
>Now why did it take so long?
>
>
>
I think you did not get the explanation sooner because you did not
mention that you were doing VACUUM FULL from a cron job, and you got
drawn into an argument about what postgres should do rather than WHY it
did what it did.

I had a lot of sympathy with your position as something similar happened
to me, but you did not give the detail that allowed me to guess (i.e.,
the VACUUM FULL) until several exchanges had taken place.

>So all I need to do is take out the FULL? Is regular
>VACUUM sufficient? How often do we need FULL? (I know
>it's a stupid question without providing some more
>context, but how can I estimate it?)
>
>
>
You never have to run VACUUM FULL.  The only thing that it does that
plain ole VACUUM does not is that it can actually shrink a table.   If
your table doesn't need shrinking you don't need VACUUM FULL. It is
really only for people in desperate straits who let a table get way too
large without running regular VACUUM on it.

As another poster already pointed out, you need to set the free space
map configuration high enough.  The general process is to let the
database go 1 day without VACUUMing, and then run VACUUM VERBOSE.  This
will print a lot of information about each table that you don't really
care about, and then at the end, it will tell you how many pages you
need in the free space map like this:

INFO:  free space map: 248 relations, 242 pages stored; 4032 total pages
needed
DETAIL:  Allocated FSM size: 1000 relations + 20000 pages = 182 kB
shared memory.

This example from my box shows that I have the free space tracking 1000
relations and 20000 pages, but I only need 248 relations and 4000
pages.  In your own case, unless you are short on RAM, multiplying the
amount it says you need by a factor of 4 is probably a good rule of thumb.
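That rule of thumb can be sketched as a small parser of the VACUUM VERBOSE summary line. This is only an illustration of the arithmetic: the function name is made up, and applying the 4x factor to relations as well as pages is an assumption extending the advice above.

```python
# Rough sketch of the FSM sizing rule of thumb: parse the summary line
# printed by VACUUM VERBOSE and pad the reported need by a factor of 4.
import re

def fsm_recommendation(vacuum_output, factor=4):
    m = re.search(r"(\d+) relations, \d+ pages stored; (\d+) total pages",
                  vacuum_output)
    relations, pages_needed = map(int, m.groups())
    return {"max_fsm_relations": relations * factor,
            "max_fsm_pages": pages_needed * factor}

info = ("INFO:  free space map: 248 relations, 242 pages stored; "
        "4032 total pages needed")
print(fsm_recommendation(info))
# for the quoted output: 992 relations, 16128 pages
```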

>Out of curiousity, how is lock acquisition implemented
>in postgresql? All the processes have to go through
>some sort of queue, so that locks are granted in FIFO
>order, as you described. Just trying to understand it
>better.
>
>
>
Not all locks--only locks that conflict with each other must wait on
each other in this fashion.  If every lock did, then you would only need
1 lock in the whole database, as it would protect against any sort of
concurrent access. :)

There are two main kinds of locks--shared locks and exclusive locks.
Multiple shared locks can be granted on the same table or row, but only
one exclusive lock can be.

select, insert, update, and delete, and regular vacuum take no exclusive
locks, hence the excellent general performance of postgres.* (see below)

The important thing to remember is that if 1 process is waiting trying
to get an exclusive lock on some table, then every other process asking
for shared lock on the same table will have to wait.

I know offhand that VACUUM FULL, ALTER TABLE, and REINDEX take exclusive
locks.  These are probably the only commands that people would be
tempted to run via a cron job.

You might find this informative:
http://www.postgresql.org/docs/8.0/interactive/explicit-locking.html

Regards,
Paul Tillotson

(*) Actually, you can get this kind of blocking with just UPDATEs.
Suppose that your web application does:

BEGIN;
UPDATE hits SET count = count + 1 WHERE page = 'somepage.aspx';
[other stuff]
COMMIT;

If you have another transaction that tries to update the SAME ROW, then
it will wait for the first transaction to finish.  Thus, if your client
does the update and then hangs while doing [other stuff], every other
client that tries to update that row will block until the transaction
commits, even though the rest of the database will be unaffected.


Re: transaction timeout

From
Alvaro Herrera
Date:
On Wed, Jul 27, 2005 at 05:12:46PM -0700, Dr NoName wrote:
> Thanks a lot, everyone! That solved my problem. But I
> still want to be able to set transaction timeout. Any
> chance of that in the next release?

No, because feature freeze for the next release is one month past
already.

Anyway, I think it was proposed and shot down several times already for
past releases.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"La Primavera ha venido. Nadie sabe como ha sido" (A. Machado)

Distributed Transactions

From
"Mike Fahrenkrog"
Date:
On Tue, 2005-07-26 at 12:51, Dr NoName wrote:
> postgresql doesn't support distributed transactions.

Hi, I just wanted to confirm this. Is there any reasonable way to configure
distributed transactions involving Postgres?

Thanks very much,
Mike Fahrenkrog



Re: Distributed Transactions

From
Tom Lane
Date:
"Mike Fahrenkrog" <mikef@travelpost.com> writes:
> Hi, I just wanted to confirm this. Is there any reasonable way to configure
> distributed transactions involving Postgres?

Not at the moment.  8.1 will have support for 2-phase commit, which would
allow its use as part of a distributed transaction (you still have to
get your XA control software from somewhere, though).

            regards, tom lane