Thread: Autovacuum vs statement_timeout

Autovacuum vs statement_timeout

From
Tom Lane
Date:
I seem to remember that we'd agreed that autovacuum should ignore any
globally set statement_timeout, on the grounds that a poorly chosen
setting could indefinitely prevent large tables from being vacuumed.
But I do not see anything in autovacuum.c that resets the variable.
Am I just being blind?  (Quite possible, as I'm tired and under the
weather.)

The thing that brought this to mind was the idea that Mark
Shuttleworth's open problem might be triggered in part by a statement
timeout interrupting autovacuum at an inopportune point --- some logs
he sent me offlist show that he is using statement_timeout ...
        regards, tom lane


Re: Autovacuum vs statement_timeout

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> I seem to remember that we'd agreed that autovacuum should ignore any
> globally set statement_timeout, on the grounds that a poorly chosen
> setting could indefinitely prevent large tables from being vacuumed.
> But I do not see anything in autovacuum.c that resets the variable.
> Am I just being blind?  (Quite possible, as I'm tired and under the
> weather.)
> 
> The thing that brought this to mind was the idea that Mark
> Shuttleworth's open problem might be triggered in part by a statement
> timeout interrupting autovacuum at an inopportune point --- some logs
> he sent me offlist show that he is using statement_timeout ...

statement_timeout interrupts seem to go through the PG_CATCH-block and 
clean up the entry from the vacuum cycle array as they should. But a 
SIGINT leading to a "terminating connection due to administrator 
command" error does not.

After the recent change in CVS HEAD, CREATE DATABASE tries to 
kill(SIGINT) any autovacuum process in the template database. That seems 
very dangerous now, it could easily leave stale entries in the cycle id 
array. However, it doesn't explain the Mark Shuttleworth's problem 
because the 8.2 behavior is to throw an "source database is being 
accessed by other users" error instead of killing autovacuum. Maybe 
there's something else killing autovacuum processes?

I think we need to add the xid of the vacuum transaction in the vacuum 
cycle array, and clean up orphaned entries in _bt_start_vacuum. We're 
going to have a hard time plugging every leak one-by-one otherwise.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Autovacuum vs statement_timeout

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> statement_timeout interrupts seem to go through the PG_CATCH-block and 
> clean up the entry from the vacuum cycle array as they should. But a 
> SIGINT leading to a "terminating connection due to administrator 
> command" error does not.

Hm, that's an interesting thought, but there are no "terminating
connection" messages in Shuttleworth's logs either.  So we still lack
the right idea there.  (BTW it would be SIGTERM not SIGINT.)

> I think we need to add the xid of the vacuum transaction in the vacuum 
> cycle array, and clean up orphaned entries in _bt_start_vacuum. We're 
> going to have a hard time plugging every leak one-by-one otherwise.

You're thinking too small --- what this thought actually suggests is
that PG_CATCH can't be used to clean up shared memory at all, and I
don't think we want to accept that.  (I see several other places already
where we assume we can do that.  We could convert each one into an
on_proc_exit cleanup operation, maybe, but that seems messy and not very
scalable.)  I'm thinking we may want to redesign elog(FATAL) processing
so that we escape out to the outer level before calling proc_exit,
thereby allowing CATCH blocks to run first.

Note for the archives: I've argued for some time that SIGTERM'ing
individual backends is an insufficiently tested code path to be exposed
as standard functionality.  Looks like that's still true.  This is not
a bug for database shutdown because we don't really care if we leave
perfectly clean shared memory behind --- it's only a bug if you try to
SIGTERM an individual vacuum process while leaving the system up.
        regards, tom lane


Re: Autovacuum vs statement_timeout

From
Tom Lane
Date:
I wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> statement_timeout interrupts seem to go through the PG_CATCH-block and 
>> clean up the entry from the vacuum cycle array as they should. But a 
>> SIGINT leading to a "terminating connection due to administrator 
>> command" error does not.

> Hm, that's an interesting thought, but there are no "terminating
> connection" messages in Shuttleworth's logs either.  So we still lack
> the right idea there.  (BTW it would be SIGTERM not SIGINT.)

Hold it ... stop the presses ... the reason we saw no "terminating
connection" messages was he was grepping his logs for lines containing
ERROR.  Once we look for FATAL too, there are a pile of 'em.  I'm not
100% convinced that any are from autovacuum processes, but clearly
*something* is throwing SIGTERM around with abandon in his test
environment.  So at this point your theory above looks like a plausible
mechanism for the vacuum cycle array to slowly fill up and eventually
make _bt_start_vacuum fail (or, perhaps, fail sooner than that due to
a repeat vacuum attempt).

>> I think we need to add the xid of the vacuum transaction in the vacuum 
>> cycle array, and clean up orphaned entries in _bt_start_vacuum. We're 
>> going to have a hard time plugging every leak one-by-one otherwise.

> You're thinking too small --- what this thought actually suggests is
> that PG_CATCH can't be used to clean up shared memory at all, and I
> don't think we want to accept that.  (I see several other places already
> where we assume we can do that.  We could convert each one into an
> on_proc_exit cleanup operation, maybe, but that seems messy and not very
> scalable.)  I'm thinking we may want to redesign elog(FATAL) processing
> so that we escape out to the outer level before calling proc_exit,
> thereby allowing CATCH blocks to run first.

I was hoping we could do that just as an 8.3 change, but it's now
starting to look like we might have to back-patch it, depending on how
much we care about surviving random SIGTERM attempts.  I'd like to wait
for some report from Mark about what's causing all the SIGTERMs before
we evaluate that.
        regards, tom lane


Re: Autovacuum vs statement_timeout

From
Alvaro Herrera
Date:
Tom Lane wrote:
> I wrote:
> > Heikki Linnakangas <heikki@enterprisedb.com> writes:
> >> statement_timeout interrupts seem to go through the PG_CATCH-block and 
> >> clean up the entry from the vacuum cycle array as they should. But a 
> >> SIGINT leading to a "terminating connection due to administrator 
> >> command" error does not.
> 
> > Hm, that's an interesting thought, but there are no "terminating
> > connection" messages in Shuttleworth's logs either.  So we still lack
> > the right idea there.  (BTW it would be SIGTERM not SIGINT.)
> 
> Hold it ... stop the presses ... the reason we saw no "terminating
> connection" messages was he was grepping his logs for lines containing
> ERROR.  Once we look for FATAL too, there are a pile of 'em.  I'm not
> 100% convinced that any are from autovacuum processes, but clearly
> *something* is throwing SIGTERM around with abandon in his test
> environment.  So at this point your theory above looks like a plausible
> mechanism for the vacuum cycle array to slowly fill up and eventually
> make _bt_start_vacuum fail (or, perhaps, fail sooner than that due to
> a repeat vacuum attempt).

Hmmm, remember that DatabaseCancelAutovacuumActivity is called on CREATE
DATABASE; but what it does is send SIGINT, not SIGTERM.  Also, it's not
in 8.2.  SIGINT does terminate the autovac process however.

I haven't read the whole problem report completely, so I'm not sure this
has something to do or not.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Autovacuum vs statement_timeout

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Hmmm, remember that DatabaseCancelAutovacuumActivity is called on CREATE
> DATABASE; but what it does is send SIGINT, not SIGTERM.  Also, it's not
> in 8.2.  SIGINT does terminate the autovac process however.
> I haven't read the whole problem report completely, so I'm not sure this
> has something to do or not.

AFAICT, SIGINT should be okay, because it will lead to an ERROR not a
FATAL elog; so control should fall out through the CATCH block before
the autovacuum process quits.  The problem is with FATAL elogs.

Mark reports that the only FATAL lines in his logs are instances ofFATAL:  terminating connection due to administrator
commandFATAL: database "launchpad_ftest" does not exist
 
and the latter presumably isn't coming out from within the btree vacuum
code, so I don't see any other explanation for a FATAL exit than SIGTERM.
        regards, tom lane


Re: Autovacuum vs statement_timeout

From
Peter Eisentraut
Date:
Tom Lane wrote:
> I seem to remember that we'd agreed that autovacuum should ignore any
> globally set statement_timeout, on the grounds that a poorly chosen
> setting could indefinitely prevent large tables from being vacuumed.

On a vaguely related matter, should programs such as pg_dump, vacuumdb, 
and reindexdb disable statement_timeout?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: Autovacuum vs statement_timeout

From
"Jim C. Nasby"
Date:
On Sun, Apr 01, 2007 at 12:36:01AM +0200, Peter Eisentraut wrote:
> Tom Lane wrote:
> > I seem to remember that we'd agreed that autovacuum should ignore any
> > globally set statement_timeout, on the grounds that a poorly chosen
> > setting could indefinitely prevent large tables from being vacuumed.
> 
> On a vaguely related matter, should programs such as pg_dump, vacuumdb, 
> and reindexdb disable statement_timeout?

Youch... yes, they should IMO. Add clusterdb, pg_dumpall and pg_restore
to that list as well (really, pg_dump(all) should output a command to
disable statement_timeout).
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: Autovacuum vs statement_timeout

From
"Joshua D. Drake"
Date:
Jim C. Nasby wrote:
> On Sun, Apr 01, 2007 at 12:36:01AM +0200, Peter Eisentraut wrote:
>> Tom Lane wrote:
>>> I seem to remember that we'd agreed that autovacuum should ignore any
>>> globally set statement_timeout, on the grounds that a poorly chosen
>>> setting could indefinitely prevent large tables from being vacuumed.
>> On a vaguely related matter, should programs such as pg_dump, vacuumdb, 
>> and reindexdb disable statement_timeout?
> 
> Youch... yes, they should IMO. Add clusterdb, pg_dumpall and pg_restore
> to that list as well (really, pg_dump(all) should output a command to
> disable statement_timeout).

I don't know if that should be a default or not. It is certainly easy 
enough to disable it should you want to.

Sincerely,

Joshua D. Drake


-- 
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Autovacuum vs statement_timeout

From
"Jim C. Nasby"
Date:
On Tue, Apr 17, 2007 at 12:51:51PM -0700, Joshua D. Drake wrote:
> Jim C. Nasby wrote:
> >On Sun, Apr 01, 2007 at 12:36:01AM +0200, Peter Eisentraut wrote:
> >>Tom Lane wrote:
> >>>I seem to remember that we'd agreed that autovacuum should ignore any
> >>>globally set statement_timeout, on the grounds that a poorly chosen
> >>>setting could indefinitely prevent large tables from being vacuumed.
> >>On a vaguely related matter, should programs such as pg_dump, vacuumdb, 
> >>and reindexdb disable statement_timeout?
> >
> >Youch... yes, they should IMO. Add clusterdb, pg_dumpall and pg_restore
> >to that list as well (really, pg_dump(all) should output a command to
> >disable statement_timeout).
> 
> I don't know if that should be a default or not. It is certainly easy 
> enough to disable it should you want to.

How would you disable it for those command-line utilities? Or are you
referring to disabling it via an ALTER ROLE SET ... for superusers?

ISTM current behavior is a bit of a foot-gun. These are administrative
shell commands that aren't going to be run by Joe-user.
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: Autovacuum vs statement_timeout

From
Robert Treat
Date:
On Tuesday 17 April 2007 18:38, Jim C. Nasby wrote:
> On Tue, Apr 17, 2007 at 12:51:51PM -0700, Joshua D. Drake wrote:
> > Jim C. Nasby wrote:
> > >On Sun, Apr 01, 2007 at 12:36:01AM +0200, Peter Eisentraut wrote:
> > >>Tom Lane wrote:
> > >>>I seem to remember that we'd agreed that autovacuum should ignore any
> > >>>globally set statement_timeout, on the grounds that a poorly chosen
> > >>>setting could indefinitely prevent large tables from being vacuumed.
> > >>
> > >>On a vaguely related matter, should programs such as pg_dump, vacuumdb,
> > >>and reindexdb disable statement_timeout?
> > >
> > >Youch... yes, they should IMO. Add clusterdb, pg_dumpall and pg_restore
> > >to that list as well (really, pg_dump(all) should output a command to
> > >disable statement_timeout).
> >
> > I don't know if that should be a default or not. It is certainly easy
> > enough to disable it should you want to.
>
> How would you disable it for those command-line utilities? Or are you
> referring to disabling it via an ALTER ROLE SET ... for superusers?
>
> ISTM current behavior is a bit of a foot-gun. These are administrative
> shell commands that aren't going to be run by Joe-user.

I'm with Joshua on this one. Statement_timeout is often used as a means for 
protection from long running statements due to server load and locking and 
all of the above commands can certainly fall into that area. If people feel 
strongly that the command line programs need a way to circumvent it, add 
a --ignore-statement-timeout option or similar mechanism. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: Autovacuum vs statement_timeout

From
Tom Lane
Date:
Robert Treat <xzilla@users.sourceforge.net> writes:
> I'm with Joshua on this one. Statement_timeout is often used as a means for 
> protection from long running statements due to server load and locking and 
> all of the above commands can certainly fall into that area. If people feel 
> strongly that the command line programs need a way to circumvent it, add 
> a --ignore-statement-timeout option or similar mechanism. 

The worst-case scenario here is that your server fails and you discover
that all your backups are corrupt because you didn't notice pg_dump was
failing due to statement_timeout.  (Maybe it just recently started to
fail because your biggest table grew past the point at which the COPY
command exceeded statement_timeout.)

I'm not excited about the other ones but I can see the argument for
making pg_dump force the timeout to 0.
        regards, tom lane


Re: Autovacuum vs statement_timeout

From
"Joshua D. Drake"
Date:
Tom Lane wrote:
> Robert Treat <xzilla@users.sourceforge.net> writes:
>> I'm with Joshua on this one. Statement_timeout is often used as a means for 
>> protection from long running statements due to server load and locking and 
>> all of the above commands can certainly fall into that area. If people feel 
>> strongly that the command line programs need a way to circumvent it, add 
>> a --ignore-statement-timeout option or similar mechanism. 
> 
> The worst-case scenario here is that your server fails and you discover
> that all your backups are corrupt because you didn't notice pg_dump was
> failing due to statement_timeout.  (Maybe it just recently started to
> fail because your biggest table grew past the point at which the COPY
> command exceeded statement_timeout.)
> 
> I'm not excited about the other ones but I can see the argument for
> making pg_dump force the timeout to 0.

I guess my point is, if you are knowledgeable enough to actually set a 
statement_timeout, you are likely knowledgeable enough to know how to 
turn it off for programs like pg_dump.

Sincerely,

Joshua D. Drake

> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 


-- 
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Autovacuum vs statement_timeout

From
Alvaro Herrera
Date:
Joshua D. Drake wrote:
> Tom Lane wrote:
> >Robert Treat <xzilla@users.sourceforge.net> writes:
> >>I'm with Joshua on this one. Statement_timeout is often used as a means 
> >>for protection from long running statements due to server load and 
> >>locking and all of the above commands can certainly fall into that area. 
> >>If people feel strongly that the command line programs need a way to 
> >>circumvent it, add a --ignore-statement-timeout option or similar 
> >>mechanism. 
> >
> >The worst-case scenario here is that your server fails and you discover
> >that all your backups are corrupt because you didn't notice pg_dump was
> >failing due to statement_timeout.  (Maybe it just recently started to
> >fail because your biggest table grew past the point at which the COPY
> >command exceeded statement_timeout.)
> >
> >I'm not excited about the other ones but I can see the argument for
> >making pg_dump force the timeout to 0.
> 
> I guess my point is, if you are knowledgeable enough to actually set a 
> statement_timeout, you are likely knowledgeable enough to know how to 
> turn it off for programs like pg_dump.

I think that is too strong an assumption, which is why I'm planning to
back-patch the change to reset statement_timeout to 0 on autovacuum till
8.0, as discussed.  I think I should also backpatch the change to set
zero_damaged_pages as well (which is not on 8.0 AFAIR).

It's very very easy to change things in postgresql.conf.  Actually
knowing what you are doing (i.e. thinking on the consequences on VACUUM
and such) is a whole another matter.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Autovacuum vs statement_timeout

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> I think that is too strong an assumption, which is why I'm planning to
> back-patch the change to reset statement_timeout to 0 on autovacuum till
> 8.0, as discussed.  I think I should also backpatch the change to set
> zero_damaged_pages as well (which is not on 8.0 AFAIR).
> 
> It's very very easy to change things in postgresql.conf.  Actually
> knowing what you are doing (i.e. thinking on the consequences on VACUUM
> and such) is a whole another matter.

Frankly, setting statement_timeout in postgresql.conf seems so risky in
so many ways, perhaps we just need to document that the parameter
probably should not be set in postgresql.conf, and why.

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Autovacuum vs statement_timeout

From
Magnus Hagander
Date:
On Tue, Apr 17, 2007 at 10:33:21PM -0400, Bruce Momjian wrote:
> Alvaro Herrera wrote:
> > I think that is too strong an assumption, which is why I'm planning to
> > back-patch the change to reset statement_timeout to 0 on autovacuum till
> > 8.0, as discussed.  I think I should also backpatch the change to set
> > zero_damaged_pages as well (which is not on 8.0 AFAIR).
> > 
> > It's very very easy to change things in postgresql.conf.  Actually
> > knowing what you are doing (i.e. thinking on the consequences on VACUUM
> > and such) is a whole another matter.
> 
> Frankly, setting statement_timeout in postgresql.conf seems so risky in
> so many ways, perhaps we just need to document that the parameter
> probably should not be set in postgresql.conf, and why.

I'd suggest doing both. Tell people that it's dangerous (probably to the
point of a comment in the sample config file), but *also* force it in
pg_dump since you can't really expect people to read the documentation.

//MAgnus



Re: Autovacuum vs statement_timeout

From
Robert Treat
Date:
On Tuesday 17 April 2007 21:25, Alvaro Herrera wrote:
> I think that is too strong an assumption, which is why I'm planning to
> back-patch the change to reset statement_timeout to 0 on autovacuum till
> 8.0, as discussed.  I think I should also backpatch the change to set
> zero_damaged_pages as well (which is not on 8.0 AFAIR).

<blinks>  Um, can I get a pointer to that thread?  I can't imagine why we 
would actually want to automatically destroy our data without oversight from 
a DBA... I must be reading that wrong. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: Autovacuum vs statement_timeout

From
Robert Treat
Date:
On Tuesday 17 April 2007 20:54, Tom Lane wrote:
> Robert Treat <xzilla@users.sourceforge.net> writes:
> > I'm with Joshua on this one. Statement_timeout is often used as a means
> > for protection from long running statements due to server load and
> > locking and all of the above commands can certainly fall into that area.
> > If people feel strongly that the command line programs need a way to
> > circumvent it, add a --ignore-statement-timeout option or similar
> > mechanism.
>
> The worst-case scenario here is that your server fails and you discover
> that all your backups are corrupt because you didn't notice pg_dump was
> failing due to statement_timeout.  (Maybe it just recently started to
> fail because your biggest table grew past the point at which the COPY
> command exceeded statement_timeout.)
>

I don't think I recall anyone ever complaining about this, and this scenario 
has been plausible for *years*...

> I'm not excited about the other ones but I can see the argument for
> making pg_dump force the timeout to 0.
>

Allowing pg_dump to run un-checked could also lead to problems such as 
exceeding maintenence windows causing performance issues, or causing trouble 
due to lock contention with ongoing pg_dumps.  I'll grant that the downsides 
aren't as extreme, but the current functionality provides simple work arounds 
(setting up specific dump users for example).  If we force pg_dump to 0 
timeout, what means will be provided for the DBA who doesn't want to let 
pg_dump run unchecked? 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: Autovacuum vs statement_timeout

From
Alvaro Herrera
Date:
Robert Treat wrote:
> On Tuesday 17 April 2007 21:25, Alvaro Herrera wrote:
> > I think that is too strong an assumption, which is why I'm planning to
> > back-patch the change to reset statement_timeout to 0 on autovacuum till
> > 8.0, as discussed.  I think I should also backpatch the change to set
> > zero_damaged_pages as well (which is not on 8.0 AFAIR).
> 
> <blinks>  Um, can I get a pointer to that thread?  I can't imagine why we 
> would actually want to automatically destroy our data without oversight from 
> a DBA... I must be reading that wrong. 

You are -- I intend to set it to _off_ :-)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Autovacuum vs statement_timeout

From
Alvaro Herrera
Date:
Robert Treat wrote:
> On Tuesday 17 April 2007 20:54, Tom Lane wrote:

> > I'm not excited about the other ones but I can see the argument for
> > making pg_dump force the timeout to 0.
> 
> Allowing pg_dump to run un-checked could also lead to problems such as 
> exceeding maintenence windows causing performance issues, or causing trouble 
> due to lock contention with ongoing pg_dumps.

I have never ever seen a request to be able to control pg_dump and have
it stop dumping if the time taken to dump exceeded a threshold.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Autovacuum vs statement_timeout

From
Alvaro Herrera
Date:
> > Tom Lane wrote:
> > > I seem to remember that we'd agreed that autovacuum should ignore any
> > > globally set statement_timeout, on the grounds that a poorly chosen
> > > setting could indefinitely prevent large tables from being vacuumed.

FWIW in testing, I just noticed that autovacuum does not pay attention
to statement_timeout anyway, because it is only set in
start_xact_command, which is used for interactive commands only, not in
autovacuum.  So there's no need to patch anything.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Autovacuum vs statement_timeout

From
"Joshua D. Drake"
Date:
Alvaro Herrera wrote:
> Robert Treat wrote:
>> On Tuesday 17 April 2007 20:54, Tom Lane wrote:
> 
>>> I'm not excited about the other ones but I can see the argument for
>>> making pg_dump force the timeout to 0.
>> Allowing pg_dump to run un-checked could also lead to problems such as 
>> exceeding maintenence windows causing performance issues, or causing trouble 
>> due to lock contention with ongoing pg_dumps.

If you have that problem, you need bigger hardware. pg_dump is a 
priority application. Not to mention, if you *really* want that time of 
behavior it is easy enough to wrap pg_dump in perl or python.

Let the foot guns be available to those that can script them :)

Sincerely,

Joshua D. Drake





-- 
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Autovacuum vs statement_timeout

From
Robert Treat
Date:
On Wednesday 18 April 2007 11:30, Alvaro Herrera wrote:
> Robert Treat wrote:
> > On Tuesday 17 April 2007 20:54, Tom Lane wrote:
> > > I'm not excited about the other ones but I can see the argument for
> > > making pg_dump force the timeout to 0.
> >
> > Allowing pg_dump to run un-checked could also lead to problems such as
> > exceeding maintenence windows causing performance issues, or causing
> > trouble due to lock contention with ongoing pg_dumps.
>
> I have never ever seen a request to be able to control pg_dump and have
> it stop dumping if the time taken to dump exceeded a threshold.

Given that we already have the functionality, I suspect you wouldn't...

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: Autovacuum vs statement_timeout

From
Bruce Momjian
Date:
I have added a TODO:

>       o Set up autovacuum to ignore statement_timeout set in
>         postgresql.conf
>
>         http://archives.postgresql.org/pgsql-hackers/2007-03/msg01753.php

and documented this behavior with the attached patch;  backpatched to 8.3.X.

---------------------------------------------------------------------------

Magnus Hagander wrote:
> On Tue, Apr 17, 2007 at 10:33:21PM -0400, Bruce Momjian wrote:
> > Alvaro Herrera wrote:
> > > I think that is too strong an assumption, which is why I'm planning to
> > > back-patch the change to reset statement_timeout to 0 on autovacuum till
> > > 8.0, as discussed.  I think I should also backpatch the change to set
> > > zero_damaged_pages as well (which is not on 8.0 AFAIR).
> > >
> > > It's very very easy to change things in postgresql.conf.  Actually
> > > knowing what you are doing (i.e. thinking on the consequences on VACUUM
> > > and such) is a whole another matter.
> >
> > Frankly, setting statement_timeout in postgresql.conf seems so risky in
> > so many ways, perhaps we just need to document that the parameter
> > probably should not be set in postgresql.conf, and why.
>
> I'd suggest doing both. Tell people that it's dangerous (probably to the
> point of a comment in the sample config file), but *also* force it in
> pg_dump since you can't really expect people to read the documentation.
>
> //MAgnus

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.170
diff -c -c -r1.170 config.sgml
*** doc/src/sgml/config.sgml    10 Mar 2008 12:55:13 -0000    1.170
--- doc/src/sgml/config.sgml    11 Mar 2008 15:38:41 -0000
***************
*** 3877,3882 ****
--- 3877,3888 ----
          logged.  A value of zero (the default) turns off the
          limitation.
         </para>
+
+        <para>
+         Setting <varname>statement_timeout</> in
+         <filename>postgresql.conf</> is not recommended because it
+         affects all sessions, including autovacuum.
+        </para>
        </listitem>
       </varlistentry>


Re: Autovacuum vs statement_timeout

From
Alvaro Herrera
Date:
Bruce Momjian wrote:
> 
> I have added a TODO:
> 
> >       o Set up autovacuum to ignore statement_timeout set in
> >         postgresql.conf
> >
> >         http://archives.postgresql.org/pgsql-hackers/2007-03/msg01753.php
> 
> and documented this behavior with the attached patch;  backpatched to 8.3.X.

Hmm, AFAIR subsequent investigation led to the discovery that autovacuum
is not affected by statement_timeout.




-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Autovacuum vs statement_timeout

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > 
> > I have added a TODO:
> > 
> > >       o Set up autovacuum to ignore statement_timeout set in
> > >         postgresql.conf
> > >
> > >         http://archives.postgresql.org/pgsql-hackers/2007-03/msg01753.php
> > 
> > and documented this behavior with the attached patch;  backpatched to 8.3.X.
> 
> Hmm, AFAIR subsequent investigation led to the discovery that autovacuum
> is not affected by statement_timeout.

Oh, very good.  I will remove the TODO and doc item then.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Autovacuum vs statement_timeout

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> Hmm, AFAIR subsequent investigation led to the discovery that autovacuum
> is not affected by statement_timeout.

Right -- see
http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/80044/focus=93847

So your documentation changes are incorrect.

-- 
Alvaro Herrera       Valdivia, Chile   ICBM: S 39º 48' 55.3", W 73º 15' 24.7"
"You knock on that door or the sun will be shining on places inside you
that the sun doesn't usually shine" (en Death: "The High Cost of Living")


Re: Autovacuum vs statement_timeout

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Alvaro Herrera wrote:
>> Hmm, AFAIR subsequent investigation led to the discovery that autovacuum
>> is not affected by statement_timeout.

> Right -- see
> http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/80044/focus=93847

Or even more to the point, look into autovacuum.c:
/* * Force statement_timeout to zero to avoid a timeout setting from * preventing regular maintenance from being
executed.*/SetConfigOption("statement_timeout", "0", PGC_SUSET, PGC_S_OVERRIDE);
 

> So your documentation changes are incorrect.

Indeed.  But wasn't the start of this thread a mention that pg_dump
ought to have a similar defense?  AFAIR it does not, so there's still
a small TODO.
        regards, tom lane


Re: Autovacuum vs statement_timeout

From
Bruce Momjian
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Alvaro Herrera wrote:
> >> Hmm, AFAIR subsequent investigation led to the discovery that autovacuum
> >> is not affected by statement_timeout.
>
> > Right -- see
> > http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/80044/focus=93847
>
> Or even more to the point, look into autovacuum.c:
>
>     /*
>      * Force statement_timeout to zero to avoid a timeout setting from
>      * preventing regular maintenance from being executed.
>      */
>     SetConfigOption("statement_timeout", "0", PGC_SUSET, PGC_S_OVERRIDE);
>
> > So your documentation changes are incorrect.
>
> Indeed.  But wasn't the start of this thread a mention that pg_dump
> ought to have a similar defense?  AFAIR it does not, so there's still
> a small TODO.

OK, I have added a documentation mention but removed the mention of
autovacuum.  I have also added this TODO:

        o Prevent pg_dump/pg_restore from being affected by
          statement_timeout

          Using psql to restore a pg_dump dump is also affected.

Particularly consider using psql to restore a pg_dump dump --- are we
going to add "SET statement_timeout=0" to the pg_dump file?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.170
diff -c -c -r1.170 config.sgml
*** doc/src/sgml/config.sgml    10 Mar 2008 12:55:13 -0000    1.170
--- doc/src/sgml/config.sgml    11 Mar 2008 15:38:41 -0000
***************
*** 3877,3882 ****
--- 3877,3888 ----
          logged.  A value of zero (the default) turns off the
          limitation.
         </para>
+
+        <para>
+         Setting <varname>statement_timeout</> in
+         <filename>postgresql.conf</> is not recommended because it
+         affects all sessions.
+        </para>
        </listitem>
       </varlistentry>


Re: Autovacuum vs statement_timeout

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> Particularly consider using psql to restore a pg_dump
> dump --- are we going to add "SET statement_timeout=0"
> to the pg_dump file?

I hope not. That should be the user's choice.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200803111607
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkfW5sMACgkQvJuQZxSWSsjceACgq72VOH210agu3GNn5p4d3L0R
0KUAniYE/TfuBEZXnWEvH7dLGnHMUU04
=TfbX
-----END PGP SIGNATURE-----




Re: Autovacuum vs statement_timeout

From
Bruce Momjian
Date:
Greg Sabino Mullane wrote:
> > Particularly consider using psql to restore a pg_dump
> > dump --- are we going to add "SET statement_timeout=0"
> > to the pg_dump file?
> 
> I hope not. That should be the user's choice.

Would anyone want to limit the load time for pg_dump?   I can hardly see
why.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Autovacuum vs statement_timeout

From
"Joshua D. Drake"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 11 Mar 2008 16:17:53 -0400 (EDT)
Bruce Momjian <bruce@momjian.us> wrote:

> Greg Sabino Mullane wrote:
> > > Particularly consider using psql to restore a pg_dump
> > > dump --- are we going to add "SET statement_timeout=0"
> > > to the pg_dump file?
> > 
> > I hope not. That should be the user's choice.
> 
> Would anyone want to limit the load time for pg_dump?   I can hardly
> see why.

I have to agree with Bruce here. You restore a backup because your
hosed something or you are building a dev environment. These are both
boolean results that should end in TRUE :)

Sincerely,

Joshua D. Drake


- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate     PostgreSQL political pundit | Mocker of
Dolphins

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFH1uxyATb/zqfZUUQRAvyWAKCATrKgRQygjveXSrY0KIptvKYZJQCgg87z
XrXncZ8d2Qyf61Wtc+OtHXg=
=rBaK
-----END PGP SIGNATURE-----

Re: Autovacuum vs statement_timeout

From
Bruce Momjian
Date:
Joshua D. Drake wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On Tue, 11 Mar 2008 16:17:53 -0400 (EDT)
> Bruce Momjian <bruce@momjian.us> wrote:
> 
> > Greg Sabino Mullane wrote:
> > > > Particularly consider using psql to restore a pg_dump
> > > > dump --- are we going to add "SET statement_timeout=0"
> > > > to the pg_dump file?
> > > 
> > > I hope not. That should be the user's choice.
> > 
> > Would anyone want to limit the load time for pg_dump?   I can hardly
> > see why.
> 
> I have to agree with Bruce here. You restore a backup because your
> hosed something or you are building a dev environment. These are both
> boolean results that should end in TRUE :)

But maybe this brings up that people just shouldn't put
statement_timeout in postgresql.conf, and if they do, they deserve what
they get.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Autovacuum vs statement_timeout

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Greg Sabino Mullane wrote:
>>> Particularly consider using psql to restore a pg_dump
>>> dump --- are we going to add "SET statement_timeout=0"
>>> to the pg_dump file?
>> 
>> I hope not. That should be the user's choice.

> Would anyone want to limit the load time for pg_dump?   I can hardly see
> why.

I think we need to be careful to distinguish three situations:

* statement_timeout during pg_dump
* statement_timeout during pg_restore
* statement_timeout during psql reading a pg_dump script file

Only in the third case is it really trivial for the user to set an
appropriate timeout value before starting the script.  (You can
manage it via PGOPTIONS in the other two cases, but it's not exactly
easy nor well-documented.)

To me, the killer reason for statement_timeout = 0 during pg_dump
is that without it, routine cron-driven dumps could fail, and the
user might not notice until he really really needed that dump.
Ooops.

In the second and third cases, you at least have an option to do it
over if a timeout zaps you.  But it'd still likely be unpleasant,
especially if the restore already did a lot of work before getting
killed.

Offhand I don't see an argument in any of these scenarios why
a timeout kill is ever a good idea, but I'm willing to listen.
        regards, tom lane


Re: Autovacuum vs statement_timeout

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


Bruce asks:
>>> Particularly consider using psql to restore a pg_dump
>>> dump --- are we going to add "SET statement_timeout=0"
>>> to the pg_dump file?
>
>> I hope not. That should be the user's choice.

> Would anyone want to limit the load time for pg_dump? I
> can hardly see why.

Not for pg_dump, but for psql, as you stated above. I don't
have a problem adding it to pg_dump or pg_restore. They are
single, atomic actions out of the control of the user. Restoring
a pg_dump'ed file through psql, on the other hand, should not
assume that the user might not want to keep or set their own
timeout, perhaps because they want to limit the load on the
server, or because of vacuuming concerns. Recall that pg_dump
is not just used to restore entire systems: we can dump schemas,
tables, and in the near future may even have the ability to
dump different classes (schema, data, constraints).

Hard-coding a forced option to the top of a potentially ginormous
and hard-to-edit file that really has nothing to do with the data
itself seems the wrong way to do things. It's not as if we've been
inundated on the lists with tales of people getting caught on
custom statement_timeouts when importing dumps.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200803111959
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkfXHqMACgkQvJuQZxSWSsgifQCgthvDCTiKhw/3A4S1na1mvlOB
+MQAn2baL34c8k3FV+f2CUAn7GwDewrN
=x24Q
-----END PGP SIGNATURE-----




Re: Autovacuum vs statement_timeout

From
Peter Eisentraut
Date:
Tom Lane wrote:
> To me, the killer reason for statement_timeout = 0 during pg_dump
> is that without it, routine cron-driven dumps could fail, and the
> user might not notice until he really really needed that dump.

This concrete case if of course valid, but if you take a step back, there are 
about half a dozen ways to configure the server to make pg_dump fail.  A 
misconfigured pg_hba.conf would also do it.  And if the server isn't running, 
should we start it?  If we take a base backup, and archiving isn't enabled, 
should we enable it?

Perhaps we shouldn't paper over configuration problems in inconsistent and ad 
hoc ways, and instead give useful configuration advice and encourage users to 
monitor their background jobs, which they will have to do anyway, even if we 
solve a misconfigured statement_timeout for them.

# DON'T SET THIS PARAMETER IN THE CONFIGURATION FILE OR YOUR BACKUPS WILL FAIL

could be a start.


Re: Autovacuum vs statement_timeout

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> To me, the killer reason for statement_timeout = 0 during pg_dump
>> is that without it, routine cron-driven dumps could fail, and the
>> user might not notice until he really really needed that dump.

> This concrete case if of course valid, but if you take a step back,
> there are about half a dozen ways to configure the server to make
> pg_dump fail.

Sure, but I think in a lot of situations there is a use-case for
a nonzero default statement_timeout, so it seems to me worthwhile
to protect pg_dump from that rather than let DBAs find out the
hard way.  Also, statement_timeout is particularly nasty in that
you may not see any failure when you first set up and test your
dump strategy.  It'll get you after your tables have grown, or
on a day when there's particularly heavy activity concurrent
with the dump.

> # DON'T SET THIS PARAMETER IN THE CONFIGURATION FILE OR YOUR BACKUPS WILL FAIL

If there are any parameters for which that is actually appropriate,
wouldn't it be *more* appropriate to flat-out prevent them from
being set from the config file?  I believe we have a GUC variable
flag already for things that shouldn't be in the file, but it's
not enforced.
        regards, tom lane