Thread: First steps with 8.3 and autovacuum launcher

First steps with 8.3 and autovacuum launcher

From
"Guillaume Smet"
Date:
Hi all,

As we will soon enter beta, I decided to give a try to 8.3devel. The
first step is of course to load a dump from an existing database.

The dump is a text dump of 1.6 GB (database is approximately 4 GB).

The restore is far slower than with 8.2. From time to time ALTER TABLE
queries creating primary keys are waiting for a long time. After a
while, I discovered that that I had three autovacuum processes which
were running to analyze the created tables while the dump was trying
to create primary keys on these very tables.

While I understand that it's important to have a fully analyzed
database, I usually do it at the end of the restore and the fact that
three tables are analyzed concurrently while primary keys are created
is far from optimal IMHO as primary keys creation often (*really*
often in my case) waits for autovacuum to finish its job.

I don't have any magical solution for this problem but I'm pretty sure
we will have feedback from users about it. Perhaps the dump should
disable autovacuum on the database while restoring it?

The good news is that the database size is smaller (3.1 GB with
8.3devel instead of 3.6 with 8.2.4).

My next steps are to compare the performances of 8.1.10, 8.2.5 and
8.3devel on a set of real life read queries (e.g. real queries
executed on this database). I'll let you know if I have interesting
results.

Regards,

-- 
Guillaume


Re: First steps with 8.3 and autovacuum launcher

From
Decibel!
Date:
On Tue, Sep 18, 2007 at 06:51:04PM +0200, Guillaume Smet wrote:
> Hi all,
>
> As we will soon enter beta, I decided to give a try to 8.3devel. The
> first step is of course to load a dump from an existing database.
>
> The dump is a text dump of 1.6 GB (database is approximately 4 GB).
>
> The restore is far slower than with 8.2. From time to time ALTER TABLE
> queries creating primary keys are waiting for a long time. After a
> while, I discovered that that I had three autovacuum processes which
> were running to analyze the created tables while the dump was trying
> to create primary keys on these very tables.
>
> While I understand that it's important to have a fully analyzed
> database, I usually do it at the end of the restore and the fact that
> three tables are analyzed concurrently while primary keys are created
> is far from optimal IMHO as primary keys creation often (*really*
> often in my case) waits for autovacuum to finish its job.

Odd... I'd expect it to actually be beneficial to run analyze on a table
at roughly the same time as PK building, because you'd make better use
of cache. OTOH, unless something's changed, analyze should read at most
30k pages, which I wouldn't expect to take all that long...
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Re: First steps with 8.3 and autovacuum launcher

From
"Guillaume Smet"
Date:
On 9/19/07, Decibel! <decibel@decibel.org> wrote:
> Odd... I'd expect it to actually be beneficial to run analyze on a table
> at roughly the same time as PK building, because you'd make better use
> of cache.

Sure if your database fits entirely in RAM (otherwise if two big
tables are analyzed while we create the primary key for a third one,
it won't help us at all). And even in this case, it's not sure the
time lost by waiting the lock is worth it. It could for sure if the
restore could create the other primary keys while waiting for the lock
on the analyzed tables, which is obviously not the case.
In my particular case, the restore stales a lot of times with status
ALTER TABLE waiting.

--
Guillaume


Re: First steps with 8.3 and autovacuum launcher

From
Decibel!
Date:
On Sep 19, 2007, at 2:08 AM, Guillaume Smet wrote:
> On 9/19/07, Decibel! <decibel@decibel.org> wrote:
>> Odd... I'd expect it to actually be beneficial to run analyze on a  
>> table
>> at roughly the same time as PK building, because you'd make better  
>> use
>> of cache.
>
> Sure if your database fits entirely in RAM (otherwise if two big
> tables are analyzed while we create the primary key for a third one,

You missed my point... what we'd want to happen is for the analyze to  
take place while that table had a good chance of still being in memory.

> it won't help us at all). And even in this case, it's not sure the
> time lost by waiting the lock is worth it. It could for sure if the
> restore could create the other primary keys while waiting for the lock
> on the analyzed tables, which is obviously not the case.
> In my particular case, the restore stales a lot of times with status
> ALTER TABLE waiting.

It might be worth looking into creating a different lock for ALTERs  
that actually change database page layout vs ALTERs that don't, since  
there's no reason you couldn't run ANALYZE while adding a PK (for  
example).
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828




Re: First steps with 8.3 and autovacuum launcher

From
"Guillaume Smet"
Date:
On 9/19/07, Decibel! <decibel@decibel.org> wrote:
> You missed my point... what we'd want to happen is for the analyze to
> take place while that table had a good chance of still being in memory.

It seems to be a bit too intrusive for 8.3 at this point.

> It might be worth looking into creating a different lock for ALTERs
> that actually change database page layout vs ALTERs that don't, since
> there's no reason you couldn't run ANALYZE while adding a PK (for
> example).

It seems to be a good idea.

As I restore a lot my test database (a real life 3.1 GB database) to
test the patches on like/ilike, here are some figures to show the
problem a bit more:

** 8.3 (autovacuum on as it's the default) **
Time taken by psql to load my SQL dump (the database is analyzed at
the end of the restore due to autovacuum launcher launching up to 3
processes in parallel):
real    52m55.398s
user    0m37.727s
sys     0m3.114s

** 8.2 (autovacuum off as it's the default) **
Time taken by psql to load the SQL dump:
real    15m5.032s
user    0m37.881s
sys     0m3.040s
Time to analyze the database:
cityvox=# \timing
Timing is on.
cityvox=# ANALYZE;
ANALYZE
Time: 45882.049 ms
So a total of: 16 minutes for 8.2 compared to 53 minutes for 8.3 to
have the database in the same state.

All settings except autovacuum are identical on both clusters.

--
Guillaume


Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
"Guillaume Smet" <guillaume.smet@gmail.com> writes:
> So a total of: 16 minutes for 8.2 compared to 53 minutes for 8.3 to
> have the database in the same state.

Please try that experiment with all three configurations on both
versions:* autovacuum off* autovacuum on, autovacuum_vacuum_cost_delay = 0* autovacuum on, autovacuum_vacuum_cost_delay
=20
 
Comparing apples and oranges isn't real helpful in determining
what's happening.
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
"Guillaume Smet"
Date:
On 9/22/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Guillaume Smet" <guillaume.smet@gmail.com> writes:
> Please try that experiment with all three configurations on both
> versions:
>         * autovacuum off
>         * autovacuum on, autovacuum_vacuum_cost_delay = 0
>         * autovacuum on, autovacuum_vacuum_cost_delay = 20

I'll do it during the week-end.

> Comparing apples and oranges isn't real helpful in determining
> what's happening.

I'm not exactly comparing apples and oranges, I'm comparing default
configuration of autovacuum for both versions.
IMHO, the point is not to compare both versions but to see what we can
do to improve the fact that 3 autovacuum processes analyzing the data
while restoring them introduces a lot of overhead.

--
Guillaume


Re: First steps with 8.3 and autovacuum launcher

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> "Guillaume Smet" <guillaume.smet@gmail.com> writes:
>> So a total of: 16 minutes for 8.2 compared to 53 minutes for 8.3 to
>> have the database in the same state.
> 
> Please try that experiment with all three configurations on both
> versions:
>     * autovacuum off
>     * autovacuum on, autovacuum_vacuum_cost_delay = 0
>     * autovacuum on, autovacuum_vacuum_cost_delay = 20
> Comparing apples and oranges isn't real helpful in determining
> what's happening.

for the record I'm seeing similiar dramatic slowdowns(ie takes a few
additional hours) during restoration of a 120GB database here - with the
default autovacuum settings the restore is taking ages to go through the
ALTER TABLE foo ADD PRIMARY KEY bar steps because autovacuum is
analyzing those tables at a very slow rate (ie the box is more or less
idle and the ALTER TABLE is waiting for the lock).

I have not yet experimented with different autovacuum_vacuum_cost_delay
settings but this seems like a step backwards :-(


Stefan


Re: First steps with 8.3 and autovacuum launcher

From
"Guillaume Smet"
Date:
On 9/22/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Please try that experiment with all three configurations on both
> versions:
>         * autovacuum off
>         * autovacuum on, autovacuum_vacuum_cost_delay = 0
>         * autovacuum on, autovacuum_vacuum_cost_delay = 20

I've finally found some time to spend on these tests. Here are the results:

* 8.2.5 *
- autovacuum off + ANALYZE: less than 16 minutes (figures previously
posted in this thread) - default configuration of 8.2
- autovacuum on, delay 0: 16m29
- autovacuum on, delay 20: 16m13
(I didn't repeat the run but we can see that autovacuum doesn't
introduce too much slowdown during the restore operation)

* 8.3devel freshly compiled  *
- autovacuum off: 14m39
- autovacuum on, delay 0: 15m32
- autovacuum on, delay 20: 51m37 (the box is idle during a large
amount of this time) - default configuration of 8.3devel

-- 
Guillaume


Re: First steps with 8.3 and autovacuum launcher

From
Stefan Kaltenbrunner
Date:
Guillaume Smet wrote:
> On 9/22/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Please try that experiment with all three configurations on both
>> versions:
>>         * autovacuum off
>>         * autovacuum on, autovacuum_vacuum_cost_delay = 0
>>         * autovacuum on, autovacuum_vacuum_cost_delay = 20
> 
> I've finally found some time to spend on these tests. Here are the results:
> 
> * 8.2.5 *
> - autovacuum off + ANALYZE: less than 16 minutes (figures previously
> posted in this thread) - default configuration of 8.2
> - autovacuum on, delay 0: 16m29
> - autovacuum on, delay 20: 16m13
> (I didn't repeat the run but we can see that autovacuum doesn't
> introduce too much slowdown during the restore operation)
> 
> * 8.3devel freshly compiled  *
> - autovacuum off: 14m39
> - autovacuum on, delay 0: 15m32
> - autovacuum on, delay 20: 51m37 (the box is idle during a large
> amount of this time) - default configuration of 8.3devel

some additional datapoints:

autovacuum on, delay 20: 8h 40min
autovacuum on, delay 0: 4h 23min

for restoring a database of around 120GB (on disk size) ...
In the delay 20 case the restore is more or less waiting hours for
grabbing locks during PK creation held by autovacuum (which tries to
analyze the tables).


Stefan


Re: First steps with 8.3 and autovacuum launcher

From
"Guillaume Smet"
Date:
On 10/1/07, Guillaume Smet <guillaume.smet@gmail.com> wrote:
> * 8.3devel freshly compiled  *
> - autovacuum off: 14m39
> - autovacuum on, delay 0: 15m32
> - autovacuum on, delay 20: 51m37 (the box is idle during a large
> amount of this time) - default configuration of 8.3devel

I made a few more tests with 8.3devel and different
autovacuum_vacuum_cost_delay:
- autovacuum off: 14m39
- autovacuum on, delay 0: 15m32
- autovacuum on, delay 10: 17m19 (*new*)
- autovacuum on, delay 20: 51m37 - default
- autovacuum on, delay 40: 90m46 (*new*)

--
Guillaume


Re: First steps with 8.3 and autovacuum launcher

From
Gregory Stark
Date:
"Stefan Kaltenbrunner" <stefan@kaltenbrunner.cc> writes:

> some additional datapoints:
>
> autovacuum on, delay 20: 8h 40min
> autovacuum on, delay 0: 4h 23min


I realize this isn't directly addressing the problem but perhaps part of the
solution would be to start advocating the use of pg_restore -1 ? That would
solve the problem for the narrow case of pg_restore.

In the long run we could think about exposing some kind of command for
pg_restore to use which would disable autovacuum from touching a table. (Or
take a session-level lock on the table -- shudder)

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> I realize this isn't directly addressing the problem but perhaps part of the
> solution would be to start advocating the use of pg_restore -1 ? That would
> solve the problem for the narrow case of pg_restore.

Well, that would do as a quick workaround, as would disabling autovacuum
during the restore.

> In the long run we could think about exposing some kind of command for
> pg_restore to use which would disable autovacuum from touching a
> table.

Ugh.  I think a real solution probably involves a mechanism that kicks
autovacuum off a table when someone else wants an exclusive lock on it.
This is a little bit worrisome because a steady stream of lock requests
could prevent autovac from ever finishing the table, but it seems clear
that not doing this is going to make autovac a lot more intrusive than
people will stand for.
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Heikki Linnakangas
Date:
Gregory Stark wrote:
> "Stefan Kaltenbrunner" <stefan@kaltenbrunner.cc> writes:
> 
>> some additional datapoints:
>>
>> autovacuum on, delay 20: 8h 40min
>> autovacuum on, delay 0: 4h 23min
> 
> 
> I realize this isn't directly addressing the problem but perhaps part of the
> solution would be to start advocating the use of pg_restore -1 ? That would
> solve the problem for the narrow case of pg_restore.
> 
> In the long run we could think about exposing some kind of command for
> pg_restore to use which would disable autovacuum from touching a table. (Or
> take a session-level lock on the table -- shudder)

In my opinion, CREATE INDEX shouldn't need to wait for autovacuum to
finish, regardless of who issued it. This is like priority inversion;
the autovacuum is not urgent, and runs slowly to avoid disturbing
others. But if it keeps the higher priority CREATE INDEX from starting,
it is disturbing others. Could we arrange things so that the effective
cost delay of the autovacuum process that's in the way gets set to 0
(like priority inheritance)?

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: First steps with 8.3 and autovacuum launcher

From
Alvaro Herrera
Date:
Heikki Linnakangas escribió:

> In my opinion, CREATE INDEX shouldn't need to wait for autovacuum to
> finish, regardless of who issued it. This is like priority inversion;
> the autovacuum is not urgent, and runs slowly to avoid disturbing
> others. But if it keeps the higher priority CREATE INDEX from starting,
> it is disturbing others. Could we arrange things so that the effective
> cost delay of the autovacuum process that's in the way gets set to 0
> (like priority inheritance)?

This is an interesting idea, but I think it's attacking the wrong
problem.  To me, the problem here is that an ANALYZE should not block
CREATE INDEX or certain forms of ALTER TABLE.

We do have a mechanism for changing the cost-delay parameters of an
autovac worker, but even if we made it to work, it would still delay the
other operation needlessly until the end of the analyze.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> This is an interesting idea, but I think it's attacking the wrong
> problem.  To me, the problem here is that an ANALYZE should not block
> CREATE INDEX or certain forms of ALTER TABLE.

I doubt that that will work; in particular I'm pretty dubious that you
can safely make CREATE INDEX and VACUUM run together.  Since they'd be
unlikely to be using the identical OldestXmin horizon, you'd likely end
up with dangling index entries (ie, CREATE INDEX indexes a tuple that
the VACUUM removes shortly afterward).
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
"Matthew T. O'Connor"
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>   
>> This is an interesting idea, but I think it's attacking the wrong
>> problem.  To me, the problem here is that an ANALYZE should not block
>> CREATE INDEX or certain forms of ALTER TABLE.
>>     
>
> I doubt that that will work; in particular I'm pretty dubious that you
> can safely make CREATE INDEX and VACUUM run together.  Since they'd be
> unlikely to be using the identical OldestXmin horizon, you'd likely end
> up with dangling index entries (ie, CREATE INDEX indexes a tuple that
> the VACUUM removes shortly afterward).

I think the main issue is ANALYZE not VACUUM (at least in this thread) 
since it's DB load times that are in question.  Can CREATE INDEX and 
ANALYZE be made to run concurrently?




Re: First steps with 8.3 and autovacuum launcher

From
Alvaro Herrera
Date:
Matthew T. O'Connor escribió:
> Tom Lane wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>   
>>> This is an interesting idea, but I think it's attacking the wrong
>>> problem.  To me, the problem here is that an ANALYZE should not block
>>> CREATE INDEX or certain forms of ALTER TABLE.
>>
>> I doubt that that will work; in particular I'm pretty dubious that you
>> can safely make CREATE INDEX and VACUUM run together.  Since they'd be
>> unlikely to be using the identical OldestXmin horizon, you'd likely end
>> up with dangling index entries (ie, CREATE INDEX indexes a tuple that
>> the VACUUM removes shortly afterward).
>
> I think the main issue is ANALYZE not VACUUM (at least in this thread) 
> since it's DB load times that are in question.

Right.  Autovac will not issue VACUUM against the freshly restored
tables anyway, since there are no deleted tuples.

> Can CREATE INDEX and ANALYZE be made to run concurrently?

I don't see why not (except for the fact that both try to update
reltuples and relpages AFAIR, so we would need to be careful about
that).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Mon, 2007-10-01 at 16:50 -0400, Alvaro Herrera wrote:

> > Can CREATE INDEX and ANALYZE be made to run concurrently?
> 
> I don't see why not (except for the fact that both try to update
> reltuples and relpages AFAIR, so we would need to be careful about
> that).

This seems like the most desirable solution, everything else I've
thought of is really just a kluge.

We could make ANALYZE use optimistic locking, so it doesn't attempt the
catalog lock until later, so ANALYZE would be two internal transactions.
It should not overwrite the reltuples values from a concurrent CREATE
INDEX either.

We should not allow VACUUM to be concurrent with either CREATE INDEX or
ANALYZE, but then thats not the problem here anyway.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> We should not allow VACUUM to be concurrent with either CREATE INDEX or
> ANALYZE, but then thats not the problem here anyway.

I can't believe anyone is short-sighted enough to think that.

The problem here is that autovac takes locks that block foreground
sessions that want exclusive locks.  We've always known this and always
ignored it, but if autovac is on by default then it's going to be in
people's faces a lot more than it was before, and they won't be happy.

If you insist on crafting a solution that only fixes this problem for
pg_restore's narrow usage, you'll be back revisiting it before beta1
has been out a month.
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > We should not allow VACUUM to be concurrent with either CREATE INDEX or
> > ANALYZE, but then thats not the problem here anyway.
> 
> I can't believe anyone is short-sighted enough to think that.
> 
> The problem here is that autovac takes locks that block foreground
> sessions that want exclusive locks.  We've always known this and always
> ignored it, but if autovac is on by default then it's going to be in
> people's faces a lot more than it was before, and they won't be happy.
> 
> If you insist on crafting a solution that only fixes this problem for
> pg_restore's narrow usage, you'll be back revisiting it before beta1
> has been out a month.

So you say we should make any job that needs an exclusive lock on a
table to be able to cancel a running autovac job?  If we did that,
autovac couldn't do very much of anything.

If that's not what you're saying, I'm afraid I'm not getting it.

-- 
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)


Re: First steps with 8.3 and autovacuum launcher

From
"Matthew T. O'Connor"
Date:
Tom Lane wrote:
> If you insist on crafting a solution that only fixes this problem for
> pg_restore's narrow usage, you'll be back revisiting it before beta1
> has been out a month.


I don't know much about what is involved in crafting these solutions, 
but it seems we're close to beta and probably don't want to make drastic 
changes to anything.  As such it seems to me that solving the problem 
for analyze is a nice piece of low-hanging fruit that solves an 
immediate problem that has been reported.  I would think that reducing 
the locking implications of  VACUUM is much more involved, no?  Also, I 
would think that the fix for ANALYZE will be totally different than the 
fix for VACUUM no?

Are you proposing that we solve the VACUUM locking problem before we 
release 8.3?




Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> So you say we should make any job that needs an exclusive lock on a
> table to be able to cancel a running autovac job?

I think we're going to be seeing complaints of this form until we do that.
The only reason this particular discussion is about pg_restore is that
that's the OP's first exposure to 8.3.

> If we did that, autovac couldn't do very much of anything.

In the worst case autovac could be starved out for a long time.
I don't have any immediate good idea about how to fix that, but
the worst consequences could be avoided if we disable the cancellation
ability when running an anti-wraparound vacuum.  Further down the road
(*not* 8.3), when we teach autovac about maintenance windows, it might
also disregard cancels during a maintenance window.
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
"Matthew T. O'Connor" <matthew@zeut.net> writes:
> I don't know much about what is involved in crafting these solutions, 
> but it seems we're close to beta and probably don't want to make drastic 
> changes to anything.  As such it seems to me that solving the problem 
> for analyze is a nice piece of low-hanging fruit that solves an 
> immediate problem that has been reported.

It's not as low-hanging as all that.  I'm not sure if you realized it,
but there was a whole lot of handwaving going in the earlier posts...
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
[ on further thought ]

"Matthew T. O'Connor" <matthew@zeut.net> writes:
>> ... solving the problem 
>> for analyze is a nice piece of low-hanging fruit that solves an 
>> immediate problem that has been reported.

Actually, if you wanted a low-hanging solution to that, it would
probably be to revert this 8.2 patch:
http://archives.postgresql.org/pgsql-committers/2006-09/msg00284.php

We did that because people were complaining of unexpected failures in
manual ANALYZEs, but perhaps the cure is worse than the disease.
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Alvaro Herrera
Date:
Tom Lane escribió:
> [ on further thought ]
> 
> "Matthew T. O'Connor" <matthew@zeut.net> writes:
> >> ... solving the problem 
> >> for analyze is a nice piece of low-hanging fruit that solves an 
> >> immediate problem that has been reported.
> 
> Actually, if you wanted a low-hanging solution to that, it would
> probably be to revert this 8.2 patch:
> http://archives.postgresql.org/pgsql-committers/2006-09/msg00284.php
> 
> We did that because people were complaining of unexpected failures in
> manual ANALYZEs, but perhaps the cure is worse than the disease.

How about getting ShareUpdateExclusiveLock on manual analyze and plain
AccessShareLock on autovacuum-induced analyze?

-- 
Alvaro Herrera                        http://www.advogato.org/person/alvherre
"I must say, I am absolutely impressed with what pgsql's implementation of
VALUES allows me to do. It's kind of ridiculous how much "work" goes away in
my code.  Too bad I can't do this at work (Oracle 8/9)."       (Tom Allison)
http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php


Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> How about getting ShareUpdateExclusiveLock on manual analyze and plain
> AccessShareLock on autovacuum-induced analyze?

Wouldn't fix the original problem because those two lock types don't
conflict; hence might as well keep the behavior simple.
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Michael Paesold
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> How about getting ShareUpdateExclusiveLock on manual analyze and plain
>> AccessShareLock on autovacuum-induced analyze?
> 
> Wouldn't fix the original problem because those two lock types don't
> conflict; hence might as well keep the behavior simple.

What about a new separate lock type for analyze? Couldn't that really 
solve the issue? I know I'm just hand-waving here ;-)

Best Regards
Michael Paesold



Re: First steps with 8.3 and autovacuum launcher

From
Michael Paesold
Date:
Tom Lane wrote:
> In the worst case autovac could be starved out for a long time.
> I don't have any immediate good idea about how to fix that, but
> the worst consequences could be avoided if we disable the cancellation
> ability when running an anti-wraparound vacuum.  Further down the road
> (*not* 8.3), when we teach autovac about maintenance windows, it might
> also disregard cancels during a maintenance window.

During maintenance window, it could instead lower vacuum cost delay in 
the case it would otherwise abort. That would also make sure that the 
task would finally finish at some point (even when the maintenance 
window ends too early... or perhaps any cleanup task started during a 
maintenance window should keep it's "maintenance priority"?)

Best Regards
Michael Paesold




Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
I wrote:
> Actually, if you wanted a low-hanging solution to that, it would
> probably be to revert this 8.2 patch:
> http://archives.postgresql.org/pgsql-committers/2006-09/msg00284.php

Uh ... never mind, that won't help at all.  Reducing the lock taken for
ANALYZE could allow CREATE INDEX (ShareLock) to run concurrently.  But
ALTER TABLE wants AccessExclusive lock, and no amount of fine-tuning of
the conflicting lock will help.

It might be possible to solve this if we reduce the strength of the lock
used for ALTER TABLE.  We'd have to go through all the commands
potentially issued by a pg_dump script and see if they could all be made
to run concurrently with autovac, which is a bit nervous-making but
might be feasible; and I'm afraid tablecmds.c would need some
restructuring to not use the same lock type for every variant of ALTER.
But it seems like a path worth investigating.
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Alvaro Herrera
Date:
Tom Lane escribió:
> I wrote:
> > Actually, if you wanted a low-hanging solution to that, it would
> > probably be to revert this 8.2 patch:
> > http://archives.postgresql.org/pgsql-committers/2006-09/msg00284.php
> 
> Uh ... never mind, that won't help at all.  Reducing the lock taken for
> ANALYZE could allow CREATE INDEX (ShareLock) to run concurrently.  But
> ALTER TABLE wants AccessExclusive lock, and no amount of fine-tuning of
> the conflicting lock will help.
> 
> It might be possible to solve this if we reduce the strength of the lock
> used for ALTER TABLE.  We'd have to go through all the commands
> potentially issued by a pg_dump script and see if they could all be made
> to run concurrently with autovac, which is a bit nervous-making but
> might be feasible; and I'm afraid tablecmds.c would need some
> restructuring to not use the same lock type for every variant of ALTER.
> But it seems like a path worth investigating.

Right.  Are you going to work on it, or do you want me to propose a
patch?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane escribi�:
>> It might be possible to solve this if we reduce the strength of the lock
>> used for ALTER TABLE.  We'd have to go through all the commands
>> potentially issued by a pg_dump script and see if they could all be made
>> to run concurrently with autovac, which is a bit nervous-making but
>> might be feasible; and I'm afraid tablecmds.c would need some
>> restructuring to not use the same lock type for every variant of ALTER.
>> But it seems like a path worth investigating.

> Right.  Are you going to work on it, or do you want me to propose a
> patch?

If you have time to work on it, please do.  But it seems like you first
need to do some investigation --- the idea may not work at all.

In any case, this would still only fix things for pg_restore, and I
remain concerned that people will gripe about autovacuum blocking
locks.  The idea of kicking autovac off tables remains probably more
interesting in the long run.
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Alvaro Herrera
Date:
Tom Lane escribió:

> >> It might be possible to solve this if we reduce the strength of the lock
> >> used for ALTER TABLE.  We'd have to go through all the commands
> >> potentially issued by a pg_dump script and see if they could all be made
> >> to run concurrently with autovac, which is a bit nervous-making but
> >> might be feasible; and I'm afraid tablecmds.c would need some
> >> restructuring to not use the same lock type for every variant of ALTER.
> >> But it seems like a path worth investigating.

I think this is doable.  We would need to add a phase 0 to ALTER TABLE
processing, which grabs a less strong (than AccessExclusive) lock on the
table, then goes over the list of commands and determine if at least one
of them requires exclusive access to the table (I think the criteria
here is whether table rewriting is needed, in which case AccessExclusive
is enough).  If none of them does, then we press on.

If at least one subcommand needs exclusive lock, then it releases the
first lock and grabs AccessExclusive.  Then it rechecks whether the
table is still OK (not dropped, maybe not renamed)

This is on a new phase 0, and not integrated on phase 1, because this
way we get all the permission checks and, more importantly, inheritance
checks after we have grabbed the correct lock.  (Note: GRANT and REVOKE
do not lock the table itself (only pg_class), so an ALTER TABLE could be
still be running when somebody revokes a needed privilege that was
checked at the start.  Surely this is not a concern for this patch.)

Initially I was proposing RowExclusiveLock for the first lock (because
it's the strongest lock that doesn't conflict with ShareUpdateExclusive
which is what ANALYZE uses).  The problem with this approach is that it
leads to two ALTER TABLE commands being able to run concurrently, which
is a disaster.  To solve that, my idea is to create another lock type,
which conflicts with itself but not with ShareUpdateExclusive.  Not sure
what to call it.

Comments?

-- 
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"Cuando mañana llegue pelearemos segun lo que mañana exija" (Mowgli)


Re: First steps with 8.3 and autovacuum launcher

From
Alvaro Herrera
Date:
Alvaro Herrera escribió:

> I think this is doable.  We would need to add a phase 0 to ALTER TABLE
> processing, which grabs a less strong (than AccessExclusive) lock on the
> table, then goes over the list of commands and determine if at least one
> of them requires exclusive access to the table (I think the criteria
> here is whether table rewriting is needed, in which case AccessExclusive
> is enough).  If none of them does, then we press on.

As expected, this idea didn't fly very far.  The first problem I find
is that DefineIndex grabs a new lock by itself; if it's not a concurrent
build, it grabs ShareLock which immediately locks out ANALYZE.

It is worse than it sounds at first, because as soon as we are starting
ALTER TABLE with a less strong lock, then this ShareLock is deemed a
lock upgrade, with the ensuing dangers for deadlocks.

I'm starting to find this area of lock strength reduction a minefield,
one on which it is very easy to step on a mine.

So I'm back to considering a solution along the lines of cancelling a
running autovacuum job.  But I think I would do it only for ANALYZE, not
vacuum.

-- 
Alvaro Herrera       Valdivia, Chile   ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
"La espina, desde que nace, ya pincha" (Proverbio africano)


Re: First steps with 8.3 and autovacuum launcher

From
Alvaro Herrera
Date:
I came up with the following patch.  What this does is cancel any
ANALYZE started by autovacuum, at the top of ALTER TABLE.

There is a new function relation_openrv_cav().  This is the same as
relation_openrv, except that it will also cancel analyzes.  I'm still
wondering if I should merge the two and have a third boolean argument to
specify whether to do the cancel.

Comments?

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Ciencias políticas es la ciencia de entender por qué
 los políticos actúan como lo hacen"  (netfunny.com)

Attachment

Re: First steps with 8.3 and autovacuum launcher

From
Heikki Linnakangas
Date:
Alvaro Herrera wrote:
> I came up with the following patch.  What this does is cancel any
> ANALYZE started by autovacuum, at the top of ALTER TABLE.

There's a small race condition, autoanalyze could start between the
calls autovac_cancel_analyze and relation_open.

And it doesn't solve the problem for autovacuum. Or other commands than
ALTER TABLE, like CLUSTER.

> There is a new function relation_openrv_cav().  This is the same as
> relation_openrv, except that it will also cancel analyzes.  I'm still
> wondering if I should merge the two and have a third boolean argument to
> specify whether to do the cancel.

I like it better the way you have it now.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: First steps with 8.3 and autovacuum launcher

From
"Guillaume Smet"
Date:
Alvaro,

On 10/4/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> I came up with the following patch.  What this does is cancel any
> ANALYZE started by autovacuum, at the top of ALTER TABLE.

It doesn't seem to work for me. I still have my ALTER TABLEs waiting:
\_ postgres: postgres cityvox [local] ALTER TABLE waiting
\_ postgres: autovacuum worker process   cityvox
\_ postgres: autovacuum worker process   cityvox
\_ postgres: autovacuum worker process   cityvox

I have a lot of NOTICE:  cancelling auto-analyze lines in my log but
it doesn't seem to change the overall behaviour (even if it's a bit
better than before). With cost delay of 20:
real    44m20.525s
user    0m35.757s
sys     0m2.985s
compared to 51 minutes before applying your patch.

The box is still idle for a long periods during the restore.

Regards,

--
Guillaume


Re: First steps with 8.3 and autovacuum launcher

From
Alvaro Herrera
Date:
Guillaume Smet escribió:
> Alvaro,
> 
> On 10/4/07, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> > I came up with the following patch.  What this does is cancel any
> > ANALYZE started by autovacuum, at the top of ALTER TABLE.
> 
> It doesn't seem to work for me. I still have my ALTER TABLEs waiting:
> \_ postgres: postgres cityvox [local] ALTER TABLE waiting
> \_ postgres: autovacuum worker process   cityvox
> \_ postgres: autovacuum worker process   cityvox
> \_ postgres: autovacuum worker process   cityvox
> 
> I have a lot of NOTICE:  cancelling auto-analyze lines in my log but
> it doesn't seem to change the overall behaviour (even if it's a bit
> better than before). With cost delay of 20:
> real    44m20.525s
> user    0m35.757s
> sys     0m2.985s
> compared to 51 minutes before applying your patch.

Hmm, it looks like the race condition Heikki mentioned is the culprit.
We need a way to stop future analyzes from starting.  Back to the
drawing board ...

-- 
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
Criptografía: Poderosa técnica algorítmica de codificación que es
empleada en la creación de manuales de computadores.


Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Tue, 2007-10-02 at 11:17 -0400, Tom Lane wrote:

> In any case, this would still only fix things for pg_restore, and I
> remain concerned that people will gripe about autovacuum blocking
> locks.  The idea of kicking autovac off tables remains probably more
> interesting in the long run.

Yes, sounds good.

I'd also like to see vacuum_delay_point() do a test against
CountActiveBackends() to see if anything else is running. If there all
non-autovac processes are idle or waiting, then we should skip the delay
point, this time only. That way a VACUUM can go at full speed on an idle
system and slow down when people get active again. It will also help
when people issue a DDL statement against a table that is currently
being vacuumed. I've got a patch worked out to do this.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> I'd also like to see vacuum_delay_point() do a test against
> CountActiveBackends() to see if anything else is running. If there all
> non-autovac processes are idle or waiting, then we should skip the delay
> point, this time only. That way a VACUUM can go at full speed on an idle
> system and slow down when people get active again. It will also help
> when people issue a DDL statement against a table that is currently
> being vacuumed. I've got a patch worked out to do this.

This is exceedingly Postgres-centric thinking.  The lack of any other
backends does not mean that the system owner wants Postgres to take over
the machine.
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Thu, 2007-10-04 at 10:43 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > I'd also like to see vacuum_delay_point() do a test against
> > CountActiveBackends() to see if anything else is running. If there all
> > non-autovac processes are idle or waiting, then we should skip the delay
> > point, this time only. That way a VACUUM can go at full speed on an idle
> > system and slow down when people get active again. It will also help
> > when people issue a DDL statement against a table that is currently
> > being vacuumed. I've got a patch worked out to do this.
> 
> This is exceedingly Postgres-centric thinking.  

Thanks :-)

> The lack of any other
> backends does not mean that the system owner wants Postgres to take over
> the machine.

Good thought. Sounds like we'd benefit from having another parameter:

autovacuum_vacuum_delay_siblings = -1 (default) 0..INT_MAX

Minimum number of active backends before autovacuum delay becomes
effective. If there are fewer than <param> active backends automatic
maintenance will proceed at full speed when the opportunity arises. The
delay will vary dynamically, thus utilising quiet periods more
effectively as and when they occur. Set to -1 if the database server is
running on a shared system and you do not want quiet periods to be used
for maintenance.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Gregory Stark
Date:
"Alvaro Herrera" <alvherre@commandprompt.com> writes:

> Hmm, it looks like the race condition Heikki mentioned is the culprit.
> We need a way to stop future analyzes from starting.  Back to the
> drawing board ...

A crazy idea I just had -- what if you roll this into the deadlock check? So
after waiting on the lock for 1s it wakes up, finds that the holder it's
waiting on is an autovacuum process and cancels it instead of finding no
deadlock.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Thu, 2007-10-04 at 16:07 +0100, Gregory Stark wrote:
> "Alvaro Herrera" <alvherre@commandprompt.com> writes:
> 
> > Hmm, it looks like the race condition Heikki mentioned is the culprit.
> > We need a way to stop future analyzes from starting.  Back to the
> > drawing board ...
> 
> A crazy idea I just had -- what if you roll this into the deadlock check? So
> after waiting on the lock for 1s it wakes up, finds that the holder it's
> waiting on is an autovacuum process and cancels it instead of finding no
> deadlock.

+1

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Heikki Linnakangas
Date:
Gregory Stark wrote:
> "Alvaro Herrera" <alvherre@commandprompt.com> writes:
> 
>> Hmm, it looks like the race condition Heikki mentioned is the culprit.
>> We need a way to stop future analyzes from starting.  Back to the
>> drawing board ...
> 
> A crazy idea I just had -- what if you roll this into the deadlock check? So
> after waiting on the lock for 1s it wakes up, finds that the holder it's
> waiting on is an autovacuum process and cancels it instead of finding no
> deadlock.

Another idea would be to make no changes to the relation_openrv call,
but have autovacuum periodically check if anyone's blocked waiting on
its locks, and commit suicide (or set cost delay to zero) if so.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: First steps with 8.3 and autovacuum launcher

From
Alvaro Herrera
Date:
Gregory Stark escribió:
> "Alvaro Herrera" <alvherre@commandprompt.com> writes:
> 
> > Hmm, it looks like the race condition Heikki mentioned is the culprit.
> > We need a way to stop future analyzes from starting.  Back to the
> > drawing board ...
> 
> A crazy idea I just had -- what if you roll this into the deadlock check? So
> after waiting on the lock for 1s it wakes up, finds that the holder it's
> waiting on is an autovacuum process and cancels it instead of finding no
> deadlock.

Another crazy idea is to have some sort of "blacklist" of tables in
shared memory.  Any autovacuum process would skip those tables.
My idea is that a would-be locker automatically puts the table in the
blacklist, then kill autovacs, then press on.

My idea is to accompany the relid with the Xid of the locker
transaction, so the worker checks whether the transaction is still
running, and removes the item from the blacklist if not.

(The only problem then is figuring out how large a black list to have,
and how to evict items when it is full and somebody else wants to
blacklist another table.  For pg_dump it is more than enough to have
MaxBackends, since there is always at most one transaction, but I
wouldn't be surprised if I'm overlooking something.)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Gregory Stark escribi�:
>> A crazy idea I just had -- what if you roll this into the deadlock check? So
>> after waiting on the lock for 1s it wakes up, finds that the holder it's
>> waiting on is an autovacuum process and cancels it instead of finding no
>> deadlock.

> Another crazy idea is to have some sort of "blacklist" of tables in
> shared memory.  Any autovacuum process would skip those tables.

The deadlock check idea sounds promising to me, not least because it
avoids adding any cycles in performance-critical paths.  I'm not certain
how easy it'd be to fold the idea into the checker though.  That
logic is pretty complicated :-( and I'm not sure that it makes a
consistent effort to visit every possible blocker.
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Alvaro Herrera
Date:
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Gregory Stark escribi�:
> >> A crazy idea I just had -- what if you roll this into the deadlock check? So
> >> after waiting on the lock for 1s it wakes up, finds that the holder it's
> >> waiting on is an autovacuum process and cancels it instead of finding no
> >> deadlock.
> 
> > Another crazy idea is to have some sort of "blacklist" of tables in
> > shared memory.  Any autovacuum process would skip those tables.
> 
> The deadlock check idea sounds promising to me, not least because it
> avoids adding any cycles in performance-critical paths.  I'm not certain
> how easy it'd be to fold the idea into the checker though.  That
> logic is pretty complicated :-( and I'm not sure that it makes a
> consistent effort to visit every possible blocker.

The idea sounds interesting, but I am not at all sure how to fit it in
the deadlock code.

I am totally uninclined to mess with this stuff.  I am barely aware of
what exactly is it doing; I don't have the slightest idea how to modify
it to cancel autovacs.  Furthermore it sounds very much like a layering
violation (what is deadlock.c doing with autovac processes anyway).

-- 
Alvaro Herrera                        http://www.advogato.org/person/alvherre
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, CSLewis)


Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Thu, 2007-10-04 at 14:33 -0400, Alvaro Herrera wrote:
> Tom Lane escribió:
> > Alvaro Herrera <alvherre@commandprompt.com> writes:
> > > Gregory Stark escribi:
> > >> A crazy idea I just had -- what if you roll this into the deadlock check? So
> > >> after waiting on the lock for 1s it wakes up, finds that the holder it's
> > >> waiting on is an autovacuum process and cancels it instead of finding no
> > >> deadlock.
> > 
> > > Another crazy idea is to have some sort of "blacklist" of tables in
> > > shared memory.  Any autovacuum process would skip those tables.
> > 
> > The deadlock check idea sounds promising to me, not least because it
> > avoids adding any cycles in performance-critical paths.  I'm not certain
> > how easy it'd be to fold the idea into the checker though.  That
> > logic is pretty complicated :-( and I'm not sure that it makes a
> > consistent effort to visit every possible blocker.
> 
> The idea sounds interesting, but I am not at all sure how to fit it in
> the deadlock code.
> 
> I am totally uninclined to mess with this stuff. 

I'll look at it, if Greg isn't already doing so.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I am totally uninclined to mess with this stuff.  I am barely aware of
> what exactly is it doing; I don't have the slightest idea how to modify
> it to cancel autovacs.  Furthermore it sounds very much like a layering
> violation (what is deadlock.c doing with autovac processes anyway).

I think any fix for this at all is going to qualify as a layering
violation, so I'm not sure that that objection has merit.  I hear you on
the first point though.  I used to know how the deadlock checker worked,
let me see if I can swap that knowledge back in.

This change isn't going to make beta1 anyway ...
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Thu, 2007-10-04 at 19:40 +0100, Simon Riggs wrote:
> On Thu, 2007-10-04 at 14:33 -0400, Alvaro Herrera wrote:
> > Tom Lane escribió:
> > > Alvaro Herrera <alvherre@commandprompt.com> writes:
> > > > Gregory Stark escribi:
> > > >> A crazy idea I just had -- what if you roll this into the deadlock check? So
> > > >> after waiting on the lock for 1s it wakes up, finds that the holder it's
> > > >> waiting on is an autovacuum process and cancels it instead of finding no
> > > >> deadlock.
> > > 
> > > > Another crazy idea is to have some sort of "blacklist" of tables in
> > > > shared memory.  Any autovacuum process would skip those tables.
> > > 
> > > The deadlock check idea sounds promising to me, not least because it
> > > avoids adding any cycles in performance-critical paths.  I'm not certain
> > > how easy it'd be to fold the idea into the checker though.  That
> > > logic is pretty complicated :-( and I'm not sure that it makes a
> > > consistent effort to visit every possible blocker.
> > 
> > The idea sounds interesting, but I am not at all sure how to fit it in
> > the deadlock code.
> > 
> > I am totally uninclined to mess with this stuff. 
> 
> I'll look at it, if Greg isn't already doing so.

Seems like we don't need to mess with the deadlock checker itself.

We can rely on the process at the head of the lock wait queue to sort
this out for us. So all we need do is look at the isAutovacuum flag on
the process that is holding the lock we're waiting on. If it isn't an
autoANALYZE we can carry on with the main deadlock check. We just need a
new kind of deadlock state to handle this, then let ProcSleep send
SIGINT to the autoANALYZE and then go back to sleep, waiting to be
reawoken when the auotANALYZE aborts.

So it looks do-able without major horrors.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Alvaro Herrera
Date:
Simon Riggs escribió:

> Seems like we don't need to mess with the deadlock checker itself.
>
> We can rely on the process at the head of the lock wait queue to sort
> this out for us. So all we need do is look at the isAutovacuum flag on
> the process that is holding the lock we're waiting on. If it isn't an
> autoANALYZE we can carry on with the main deadlock check. We just need a
> new kind of deadlock state to handle this, then let ProcSleep send
> SIGINT to the autoANALYZE and then go back to sleep, waiting to be
> reawoken when the auotANALYZE aborts.

Ok, I think this makes sense.

I can offer the following patch -- it makes it possible to determine
whether an autovacuum process is doing analyze or not, by comparing the
PGPROC of the running WorkerInfo list (the list has at most
max_autovacuum_workers entries, so this is better than trolling
ProcGlobal).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment

Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Thu, 2007-10-04 at 17:33 -0400, Alvaro Herrera wrote:
> Simon Riggs escribió:
> 
> > Seems like we don't need to mess with the deadlock checker itself.
> > 
> > We can rely on the process at the head of the lock wait queue to sort
> > this out for us. So all we need do is look at the isAutovacuum flag on
> > the process that is holding the lock we're waiting on. If it isn't an
> > autoANALYZE we can carry on with the main deadlock check. We just need a
> > new kind of deadlock state to handle this, then let ProcSleep send
> > SIGINT to the autoANALYZE and then go back to sleep, waiting to be
> > reawoken when the auotANALYZE aborts.
> 
> Ok, I think this makes sense.
> 
> I can offer the following patch -- it makes it possible to determine
> whether an autovacuum process is doing analyze or not, by comparing the
> PGPROC of the running WorkerInfo list (the list has at most
> max_autovacuum_workers entries, so this is better than trolling
> ProcGlobal).

Looks OK to me, thanks for noticing I glossed over the bit about how to
tell whether it was an auto-ANALYZE.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Thu, 2007-10-04 at 17:33 -0400, Alvaro Herrera wrote:
> Simon Riggs escribió:
> 
> > Seems like we don't need to mess with the deadlock checker itself.
> > 
> > We can rely on the process at the head of the lock wait queue to sort
> > this out for us. So all we need do is look at the isAutovacuum flag on
> > the process that is holding the lock we're waiting on. If it isn't an
> > autoANALYZE we can carry on with the main deadlock check. We just need a
> > new kind of deadlock state to handle this, then let ProcSleep send
> > SIGINT to the autoANALYZE and then go back to sleep, waiting to be
> > reawoken when the auotANALYZE aborts.
> 
> Ok, I think this makes sense.
> 
> I can offer the following patch -- it makes it possible to determine
> whether an autovacuum process is doing analyze or not, by comparing the
> PGPROC of the running WorkerInfo list (the list has at most
> max_autovacuum_workers entries, so this is better than trolling
> ProcGlobal).

OK, I've got this working now. It successfully handles this test case,
which trips up on an auto ANALYZE every time I run it.

--
drop table a;

create table a as select generate_series(1,1000000)::integer as col1;

alter table a alter column col1 type bigint;
alter table a alter column col1 type bigint;
alter table a alter column col1 type bigint;
alter table a alter column col1 type bigint;
alter table a alter column col1 type bigint;
--

I think there may be a cleaner implementation, so I'll clean it up and
post tomorrow.

Few thoughts:

Why do we run all of the ANALYZEs in a single big transaction? That
seems like it could be the cause of many problems. ANALYZE specifically
holds locks until EOXact, so I'd recommend we start a new transaction
for each one. What do you think? 

I notice when we cancel an AV worker it always says "cancelling
autovacuum of table", even when its just an ANALYZE. Wasn't important
before but now looks a little strange.

If you want to commit this patch, I'll layer mine over the top.

Any other input anyone?

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Alvaro Herrera
Date:
Simon Riggs escribió:

> OK, I've got this working now. It successfully handles this test case,
> which trips up on an auto ANALYZE every time I run it.

OK, nice, send your patch.  I'm a bit disconnected these days so I'm not
sure when, but I'll commit mine shortly.

-- 
Alvaro Herrera                  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"On the other flipper, one wrong move and we're Fatal Exceptions"
(T.U.X.: Term Unit X  - http://www.thelinuxreview.com/TUX/)


Re: First steps with 8.3 and autovacuum launcher

From
"Heikki Linnakangas"
Date:
Simon Riggs wrote:
> My thoughts are that it doesn't need to. Typically we create objects and
> then fill them. It isn't that frequent that we would load data, then
> delete or update more than 20% of it, then attempt other DDL.

One scenario that comes to mind is a table that's used in OLTP fashion
during day, but it's taken offline for data loading during night. To
speed up the data loading, indexes are dropped before the load and
recreated afterwards.

Even if there's no dead rows in a table, autovacuum will still kick in
to freeze it at some point.

> If a COPY fails it will create dead rows, which should be cleared up by
> an autoVACUUM. If a COPY fails, the user knows to run a VACUUM or a
> re-TRUNCATE before re-attempting a modified COPY. So there is potential
> for more than one VACUUM to be attempted in that case.

I wish the user didn't have to know to do that.

> So there could be an argument for TRUNCATE causing a cancellation of a
> VACUUM, but I don't see the use case for other DDL. Maybe it would be
> easier to make all conflicting lock requestors cancel VACUUM.

Any VACUUM, or just autovacuum?

The only danger I can see is that the autovacuum is always killed and
never gets to finish, leading to degrading performance at first and
shutdown to prevent xid wraparound at the extreme. Doesn't seem likely
under normal circumstances, though. A scenario that comes to mind is
having very lazy autovacuum settings, so that vacuum of the table takes
longer than 24h, and a daily cron job to run REINDEX.

The "priority inheritance" scheme I proposed earlier would work well
with that: instead of killing the autovacuum, set cost delay to zero to
let it finish out of the way ASAP. It has it's own set of problems,
though. An innocent-looking DROP INDEX would cause the autovacuum to go
full steam ahead, hurting performance for others.

> I think it would be helpful if user-initiated VACUUMs waited behind
> another VACUUM that was already in progress on the table and then
> returned immediately as successful when the first VACUUM finishes. That
> would seem better than queuing up behind the first VACUUM and then
> repeating the process. 

I don't think that's a good idea. The second VACUUM wouldn't be a no-op,
it would clean up any dead rows accumulated during the first VACUUM.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
Heikki,

Thanks for your comments, we do need some review on the expected
behaviour.

On Wed, 2007-10-10 at 11:17 +0100, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > My thoughts are that it doesn't need to. Typically we create objects and
> > then fill them. It isn't that frequent that we would load data, then
> > delete or update more than 20% of it, then attempt other DDL.
> 
> One scenario that comes to mind is a table that's used in OLTP fashion
> during day, but it's taken offline for data loading during night. To
> speed up the data loading, indexes are dropped before the load and
> recreated afterwards.

Yes, delaying the index re-creation could cause an effective outage,
even if it is more efficient to let the VACUUM happen then re-add the
indexes.

> Even if there's no dead rows in a table, autovacuum will still kick in
> to freeze it at some point.

Yeh, but that can wait a while.

> > If a COPY fails it will create dead rows, which should be cleared up by
> > an autoVACUUM. If a COPY fails, the user knows to run a VACUUM or a
> > re-TRUNCATE before re-attempting a modified COPY. So there is potential
> > for more than one VACUUM to be attempted in that case.
> 
> I wish the user didn't have to know to do that.

Yeh, but thats an 8.4 feature now.

> > So there could be an argument for TRUNCATE causing a cancellation of a
> > VACUUM, but I don't see the use case for other DDL. Maybe it would be
> > easier to make all conflicting lock requestors cancel VACUUM.
> 
> Any VACUUM, or just autovacuum?

After some thought, you and Michael have persuaded me that there is
cause to do this for VACUUM as well, but just autovacuum, I think. That
also makes the patch simpler, since we don't need to delve inside the av
worker to see what it is doing.

Alvaro: That means we can just skip your patch altogether, or at least
we can discuss them separately now.

> The only danger I can see is that the autovacuum is always killed and
> never gets to finish, leading to degrading performance at first and
> shutdown to prevent xid wraparound at the extreme. Doesn't seem likely
> under normal circumstances, though. 

Yeh agreed. Table locks aren't that common, so I think we are safe for
100s of millions of transactions. The user has log messages to warn of
that, so I think we're good.

> A scenario that comes to mind is
> having very lazy autovacuum settings, so that vacuum of the table takes
> longer than 24h, and a daily cron job to run REINDEX.

A table that big would have a REINDEX run for a very long time too, so I
hope the user would notice before too long.

> The "priority inheritance" scheme I proposed earlier would work well
> with that: instead of killing the autovacuum, set cost delay to zero to
> let it finish out of the way ASAP. It has it's own set of problems,
> though. An innocent-looking DROP INDEX would cause the autovacuum to go
> full steam ahead, hurting performance for others.

Not very nice performance behaviour. 

> > I think it would be helpful if user-initiated VACUUMs waited behind
> > another VACUUM that was already in progress on the table and then
> > returned immediately as successful when the first VACUUM finishes. That
> > would seem better than queuing up behind the first VACUUM and then
> > repeating the process. 
> 
> I don't think that's a good idea. The second VACUUM wouldn't be a no-op,
> it would clean up any dead rows accumulated during the first VACUUM.

That was my first reaction to that thought too!

In practice, whoever submitted the first VACUUM can re-run it. So that
might be a custom program emitting a stream of VACUUMs or autovacuum
doing the same thing. If we need to VACUUM almost continuously then
autovacuum will realise this and re-submit. Sitting in the locking queue
won't make anything more efficient; like waiting for Rolling Stones
tickets at 4am doesn't make them play any better at the gig.

I'm not proposing to do this latter idea for now though.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Michael Paesold
Date:
Simon Riggs wrote:
> OK, I've got this working now. It successfully handles this test case,
> which trips up on an auto ANALYZE every time I run it.
...
> I notice when we cancel an AV worker it always says "cancelling
> autovacuum of table", even when its just an ANALYZE. Wasn't important
> before but now looks a little strange.
...
> Any other input anyone?

What about VACUUM (not just ANALYZE)? The starter of the thread 
"Possible bugreport 8.3 beta1 on Win32: Looking like a deadlock with 
AutoVacuum" complained about vacuum, not analyze.

It is just as Tom said earlier: it will be before end of beta that 
people will complain about more than just restoring dumps. ;-)

So does this approach work for both analyze as well as vacuum?

Best Regards
Michael Paesold



Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Wed, 2007-10-10 at 11:04 +0200, Michael Paesold wrote:
> Simon Riggs wrote:
> > OK, I've got this working now. It successfully handles this test case,
> > which trips up on an auto ANALYZE every time I run it.
> ...
> > I notice when we cancel an AV worker it always says "cancelling
> > autovacuum of table", even when its just an ANALYZE. Wasn't important
> > before but now looks a little strange.
> ...
> > Any other input anyone?
> 
> What about VACUUM (not just ANALYZE)? The starter of the thread 
> "Possible bugreport 8.3 beta1 on Win32: Looking like a deadlock with 
> AutoVacuum" complained about vacuum, not analyze.
> 
> It is just as Tom said earlier: it will be before end of beta that 
> people will complain about more than just restoring dumps. ;-)

I'm not looking at this from the perspective of how to make restores
work better, I'm looking at the common use cases. Re-adding FKs after a
major data load is part of the Performance Tips section.

> So does this approach work for both analyze as well as vacuum?

It doesn't work with VACUUM.

My thoughts are that it doesn't need to. Typically we create objects and
then fill them. It isn't that frequent that we would load data, then
delete or update more than 20% of it, then attempt other DDL.

If a COPY fails it will create dead rows, which should be cleared up by
an autoVACUUM. If a COPY fails, the user knows to run a VACUUM or a
re-TRUNCATE before re-attempting a modified COPY. So there is potential
for more than one VACUUM to be attempted in that case.

So there could be an argument for TRUNCATE causing a cancellation of a
VACUUM, but I don't see the use case for other DDL. Maybe it would be
easier to make all conflicting lock requestors cancel VACUUM.

I think it would be helpful if user-initiated VACUUMs waited behind
another VACUUM that was already in progress on the table and then
returned immediately as successful when the first VACUUM finishes. That
would seem better than queuing up behind the first VACUUM and then
repeating the process. 

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Gregory Stark
Date:
"Gregory Stark" <stark@enterprisedb.com> writes:

> "Alvaro Herrera" <alvherre@commandprompt.com> writes:
>
>> Hmm, it looks like the race condition Heikki mentioned is the culprit.
>> We need a way to stop future analyzes from starting.  Back to the
>> drawing board ...
>
> A crazy idea I just had -- what if you roll this into the deadlock check? So
> after waiting on the lock for 1s it wakes up, finds that the holder it's
> waiting on is an autovacuum process and cancels it instead of finding no
> deadlock.

Having given this a bit of thought I think it can be made quite general:

I would assign every process a priority which could be just an unsigned
integer from 0..255 stored in PGPROC. By default each process gets 128 except
autovacuum which sets its priority to 64 and for safety I would set bgwriter
and walwriter to 255, though afaik they never acquire any user-level locks..

When DeadLockCheck traverses the waiters it's normally looking for waiters
which are blocked waiting on a lock it holds itself already. I would add that
it should also compare the priority. If any waiter ahead of itself would block
this process but has a lower priority then that should be corrected.

If the process already has acquired the lock then it should be delivered a
SIGINT. I'm not sure if that works on Windows, if not then we would have to
introduce a flag in PGPROC which would also be checked on
CHECK_FOR_INTERRUPTS.

If not then ideally I would think it should try to just push the low priority
waiter ahead of it to the back of the queue. But I'm not sure how easy that
would be. Specifically I'm not sure if we would still reliably detect real
deadlocks. It seems safest to just deliver a SIGINT in that case as well,
since in the case of autovacuum it won't matter.

This also allows for possibly having a guc which allows users to lower their
own priority. In which case their queries would be automatically killed if
they held up someone else at a normal priority.

Having a flag in PGPROC instead of delivering SIGINT is attractive for other
reasons aside from portability. It would open the door to having vacuums
(perhaps just interactive vacuums) continue but reduce or zero their
vacuum_cost_delay for example. Or perhaps they could release their analyze
locks giving up on analyzing the table but continue with the vacuuming.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: First steps with 8.3 and autovacuum launcher

From
Michael Paesold
Date:
Simon Riggs wrote:
> After some thought, you and Michael have persuaded me that there is
> cause to do this for VACUUM as well, but just autovacuum, I think. That
> also makes the patch simpler, since we don't need to delve inside the av
> worker to see what it is doing.
> 
> Alvaro: That means we can just skip your patch altogether, or at least
> we can discuss them separately now.
...
>> The only danger I can see is that the autovacuum is always killed and
>> never gets to finish, leading to degrading performance at first and
>> shutdown to prevent xid wraparound at the extreme. Doesn't seem likely
>> under normal circumstances, though. 
> 
> Yeh agreed. Table locks aren't that common, so I think we are safe for
> 100s of millions of transactions. The user has log messages to warn of
> that, so I think we're good.

Hmm, I am not sure we are there, yet. Autovacuum does take extra care to 
vacuum tables nearing xid wrap-around, right? It even does so when 
autovacuum is disabled in the configuration.

So in case a vacuum is needed for that very reason, the vacuum should *not* 
be canceled, of course. So we don't really need the information, whether 
the AV worker is doing VACUUM or ANALYZE, but whether it is critical 
against xid wrap-around. Could that be done as easily as in Alvaro's patch 
for distinguishing vacuum/analyze? Alvaro?

The other thing I am wondering about is, whether it would be a safer 
approach to let the DBA decide whether to cancel AV vacuums or just disable 
cost-delay, as Heikki suggested. There might be valid work-loads for both 
options...

Btw., I am grateful you took up the work here, Simon.

Best Regards
Michael Paesold


Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Thu, 2007-10-11 at 21:59 +0200, Michael Paesold wrote:

> So in case a vacuum is needed for that very reason, the vacuum should *not* 
> be canceled, of course. So we don't really need the information, whether 
> the AV worker is doing VACUUM or ANALYZE, but whether it is critical 
> against xid wrap-around. Could that be done as easily as in Alvaro's patch 
> for distinguishing vacuum/analyze? Alvaro?

Well, I did think about this.

We probably want to preserve the ability of an autovacuum to be manually
cancelled. So the only way to do this is by letting the would-be
canceller know that they shouldn't cancel that one by marking the
autovacuum to show it is a "compulsory" one. We could change the field
on PGPROC from a boolean isAutovacuum to a status flag, so we have bit
flags for IS_AUTOVACUUM and IS_WRAPAROUND_AVOIDANCE. I think that's
overkill personally, but you might argue me round.

> The other thing I am wondering about is, whether it would be a safer 
> approach to let the DBA decide whether to cancel AV vacuums or just disable 
> cost-delay, as Heikki suggested. There might be valid work-loads for both 
> options...

Cancelling the VACUUM hurts nobody, and allows the DDL to be run now,
not later when the database server gets round to it. Speeding up a
delayed vacuum will hurt everybody. A big VACUUM can last hours, even at
full speed and that is a big beast to let loose during prime time.

BTW I took the liberty of starting a new thread on this.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Alvaro Herrera
Date:
Michael Paesold escribió:
> Simon Riggs wrote:

> Hmm, I am not sure we are there, yet. Autovacuum does take extra care to 
> vacuum tables nearing xid wrap-around, right? It even does so when 
> autovacuum is disabled in the configuration.
>
> So in case a vacuum is needed for that very reason, the vacuum should *not* 
> be canceled, of course. So we don't really need the information, whether 
> the AV worker is doing VACUUM or ANALYZE, but whether it is critical 
> against xid wrap-around. Could that be done as easily as in Alvaro's patch 
> for distinguishing vacuum/analyze? Alvaro?

Yes, I think it is easy to mark the "is for xid wraparound" bit in the
WorkerInfo struct and have the cancel work only if it's off.

However, what I think should happen is that the signal handler for
SIGINT in a worker for xid wraparound should not cancel the current
vacuum.  Instead turn it into a no-op, if possible.  That way we also
disallow a user from cancelling vacuums for xid wraparound.  I think he
can do that with pg_cancel_backend, and it could be dangerous.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Fri, 2007-10-12 at 01:24 -0400, Alvaro Herrera wrote:
> Michael Paesold escribió:
> > Simon Riggs wrote:
> 
> > Hmm, I am not sure we are there, yet. Autovacuum does take extra care to 
> > vacuum tables nearing xid wrap-around, right? It even does so when 
> > autovacuum is disabled in the configuration.
> >
> > So in case a vacuum is needed for that very reason, the vacuum should *not* 
> > be canceled, of course. So we don't really need the information, whether 
> > the AV worker is doing VACUUM or ANALYZE, but whether it is critical 
> > against xid wrap-around. Could that be done as easily as in Alvaro's patch 
> > for distinguishing vacuum/analyze? Alvaro?
> 
> Yes, I think it is easy to mark the "is for xid wraparound" bit in the
> WorkerInfo struct and have the cancel work only if it's off.
> 
> However, what I think should happen is that the signal handler for
> SIGINT in a worker for xid wraparound should not cancel the current
> vacuum.  Instead turn it into a no-op, if possible.  That way we also
> disallow a user from cancelling vacuums for xid wraparound.  I think he
> can do that with pg_cancel_backend, and it could be dangerous.

I think that is dangerous too because the user may have specifically
turned AV off. That anti-wraparound vacuum might spring up right in a
busy period and start working its way through many tables, all of which
cause massive writes to occur. That's about as close to us causing an
outage as I ever want to see. We need a way through that to allow the
user to realise his predicament and find a good time to VACUUM. I never
want to say to anybody "nothing you can do, just sit and watch, your
production system will be working again in no time. Restart? no that
won't work either."

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Fri, 2007-10-12 at 07:17 +0100, Simon Riggs wrote:
> On Fri, 2007-10-12 at 01:24 -0400, Alvaro Herrera wrote:
> > Michael Paesold escribió:
> > > Simon Riggs wrote:
> > 
> > > Hmm, I am not sure we are there, yet. Autovacuum does take extra care to 
> > > vacuum tables nearing xid wrap-around, right? It even does so when 
> > > autovacuum is disabled in the configuration.
> > >
> > > So in case a vacuum is needed for that very reason, the vacuum should *not* 
> > > be canceled, of course. So we don't really need the information, whether 
> > > the AV worker is doing VACUUM or ANALYZE, but whether it is critical 
> > > against xid wrap-around. Could that be done as easily as in Alvaro's patch 
> > > for distinguishing vacuum/analyze? Alvaro?
> > 
> > Yes, I think it is easy to mark the "is for xid wraparound" bit in the
> > WorkerInfo struct and have the cancel work only if it's off.
> > 
> > However, what I think should happen is that the signal handler for
> > SIGINT in a worker for xid wraparound should not cancel the current
> > vacuum.  Instead turn it into a no-op, if possible.  That way we also
> > disallow a user from cancelling vacuums for xid wraparound.  I think he
> > can do that with pg_cancel_backend, and it could be dangerous.
> 
> I think that is dangerous too because the user may have specifically
> turned AV off. That anti-wraparound vacuum might spring up right in a
> busy period and start working its way through many tables, all of which
> cause massive writes to occur. That's about as close to us causing an
> outage as I ever want to see. We need a way through that to allow the
> user to realise his predicament and find a good time to VACUUM. I never
> want to say to anybody "nothing you can do, just sit and watch, your
> production system will be working again in no time. Restart? no that
> won't work either."

I think the best way to handle this is to have two limits.

First limit attempts to autovacuum, but can be cancelled.

When we hit second limit, sometime later, then autovacuum cannot be
cancelled.

That would give us a breathing space if we need it.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Deblauwe Gino
Date:
Simon Riggs schreef: <blockquote cite="mid:1192171225.4233.480.camel@ebony.site" type="cite"><pre wrap="">On Fri,
2007-10-12at 07:17 +0100, Simon Riggs wrote: </pre><blockquote type="cite"><pre wrap="">On Fri, 2007-10-12 at 01:24
-0400,Alvaro Herrera wrote:   </pre><blockquote type="cite"><pre wrap="">Michael Paesold escribió:
</pre><blockquotetype="cite"><pre wrap="">Simon Riggs wrote:       </pre></blockquote><blockquote type="cite"><pre
wrap="">Hmm,I am not sure we are there, yet. Autovacuum does take extra care to 
 
vacuum tables nearing xid wrap-around, right? It even does so when 
autovacuum is disabled in the configuration.

So in case a vacuum is needed for that very reason, the vacuum should *not* 
be canceled, of course. So we don't really need the information, whether 
the AV worker is doing VACUUM or ANALYZE, but whether it is critical 
against xid wrap-around. Could that be done as easily as in Alvaro's patch 
for distinguishing vacuum/analyze? Alvaro?       </pre></blockquote><pre wrap="">Yes, I think it is easy to mark the
"isfor xid wraparound" bit in the
 
WorkerInfo struct and have the cancel work only if it's off.

However, what I think should happen is that the signal handler for
SIGINT in a worker for xid wraparound should not cancel the current
vacuum.  Instead turn it into a no-op, if possible.  That way we also
disallow a user from cancelling vacuums for xid wraparound.  I think he
can do that with pg_cancel_backend, and it could be dangerous.     </pre></blockquote><pre wrap="">I think that is
dangeroustoo because the user may have specifically
 
turned AV off. That anti-wraparound vacuum might spring up right in a
busy period and start working its way through many tables, all of which
cause massive writes to occur. That's about as close to us causing an
outage as I ever want to see. We need a way through that to allow the
user to realise his predicament and find a good time to VACUUM. I never
want to say to anybody "nothing you can do, just sit and watch, your
production system will be working again in no time. Restart? no that
won't work either."   </pre></blockquote><pre wrap="">
I think the best way to handle this is to have two limits.

First limit attempts to autovacuum, but can be cancelled.

When we hit second limit, sometime later, then autovacuum cannot be
cancelled.

That would give us a breathing space if we need it.
 </pre></blockquote> Just a few thoughts: <br /><br /> 1) In the postgresql.conf you can define if you use autovacuum. 
<br/> You make a parameter that states a time of day.  <br /> If autovacuum is canceled once and not performed manually
beforethat time, <br /> then it executes at that time (or just after the next system restart after that time).  <br />
Soyou ensure that it isn't delayed indefinitely and you execute it on a time the database is normally not under a heavy
load.<br/> As a standard value you could take 2am in the morning or so.  <br /><br /> 2) I you can cancel an autovacuum
thatway, could you prevent it by a statement to start executing in the first<br /> place, and then restart execution by
anotherstatement.  There are a few situations where vacuuming is entirely pointless<br /><br /> Example: <br /> a)
Everyonelogs out, upgradeprocedure of db is started<br /> b) drop indexes<br /> c) add tables/change tables/add
columns/changecolumns<br /> d) convert data<br /> e) drop tables/drop columns<br /> f) add indexes<br /> g) vacuum full
analyze<br/> h) Everyone starts new app<br /><br /> BTW: I like pg83, allready looking for implementation when it hits
theshelves...<br /> 

Re: First steps with 8.3 and autovacuum launcher

From
Michael Paesold
Date:
Simon Riggs wrote:
> I think the best way to handle this is to have two limits.
> 
> First limit attempts to autovacuum, but can be cancelled.
> 
> When we hit second limit, sometime later, then autovacuum cannot be
> cancelled.
> 
> That would give us a breathing space if we need it.

Sounds quite reasonable.

Best Regards
Michael Paesold


Re: First steps with 8.3 and autovacuum launcher

From
Michael Paesold
Date:
Simon Riggs wrote:
> On Fri, 2007-10-12 at 01:24 -0400, Alvaro Herrera wrote:
>> Yes, I think it is easy to mark the "is for xid wraparound" bit in the
>> WorkerInfo struct and have the cancel work only if it's off.
>>
>> However, what I think should happen is that the signal handler for
>> SIGINT in a worker for xid wraparound should not cancel the current
>> vacuum.  Instead turn it into a no-op, if possible.  That way we also
>> disallow a user from cancelling vacuums for xid wraparound.  I think he
>> can do that with pg_cancel_backend, and it could be dangerous.
> 
> I think that is dangerous too because the user may have specifically
> turned AV off. That anti-wraparound vacuum might spring up right in a
> busy period and start working its way through many tables, all of which
> cause massive writes to occur. That's about as close to us causing an
> outage as I ever want to see. We need a way through that to allow the
> user to realise his predicament and find a good time to VACUUM. I never
> want to say to anybody "nothing you can do, just sit and watch, your
> production system will be working again in no time. Restart? no that
> won't work either."

You are probably right that VACUUM going full-steam is a bad idea in most 
situations. Except for anti-wraparound vacuum, cancellation seems the most 
reasonable thing to do. Because autovacuum will usually pickup the table in 
time again.

The only problem I would see is if someone has an application that does a 
lot of schema changes (doesn't sound like a good idea anyway). In that case 
they would better issue manual vacuums on such tables.

Best Regards
Michael Paesold


Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Fri, 2007-10-12 at 11:44 +0200, Michael Paesold wrote:
> Simon Riggs wrote:
> > On Fri, 2007-10-12 at 01:24 -0400, Alvaro Herrera wrote:
> >> Yes, I think it is easy to mark the "is for xid wraparound" bit in the
> >> WorkerInfo struct and have the cancel work only if it's off.
> >>
> >> However, what I think should happen is that the signal handler for
> >> SIGINT in a worker for xid wraparound should not cancel the current
> >> vacuum.  Instead turn it into a no-op, if possible.  That way we also
> >> disallow a user from cancelling vacuums for xid wraparound.  I think he
> >> can do that with pg_cancel_backend, and it could be dangerous.
> > 
> > I think that is dangerous too because the user may have specifically
> > turned AV off. That anti-wraparound vacuum might spring up right in a
> > busy period and start working its way through many tables, all of which
> > cause massive writes to occur. That's about as close to us causing an
> > outage as I ever want to see. We need a way through that to allow the
> > user to realise his predicament and find a good time to VACUUM. I never
> > want to say to anybody "nothing you can do, just sit and watch, your
> > production system will be working again in no time. Restart? no that
> > won't work either."
> 
> You are probably right that VACUUM going full-steam is a bad idea in most 
> situations. Except for anti-wraparound vacuum, cancellation seems the most 
> reasonable thing to do. Because autovacuum will usually pickup the table in 
> time again.

Yeh, if we do have to do the second emergency anti-wraparound, then that
should be at full speed, since there's nothing else to do at that point.

> The only problem I would see is if someone has an application that does a 
> lot of schema changes (doesn't sound like a good idea anyway). In that case 
> they would better issue manual vacuums on such tables.

I can't see a use case for regular DDL as part of an application, on an
otherwise integral table (lots of updates and deletes).

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Deblauwe Gino
Date:
Simon Riggs schreef: <blockquote cite="mid:1192186854.4233.508.camel@ebony.site" type="cite"><pre wrap="">On Fri,
2007-10-12at 11:44 +0200, Michael Paesold wrote: </pre><blockquote type="cite"><pre wrap="">Simon Riggs wrote:
</pre><blockquotetype="cite"><pre wrap="">On Fri, 2007-10-12 at 01:24 -0400, Alvaro Herrera wrote:
</pre><blockquotetype="cite"><pre wrap="">Yes, I think it is easy to mark the "is for xid wraparound" bit in the
 
WorkerInfo struct and have the cancel work only if it's off.

However, what I think should happen is that the signal handler for
SIGINT in a worker for xid wraparound should not cancel the current
vacuum.  Instead turn it into a no-op, if possible.  That way we also
disallow a user from cancelling vacuums for xid wraparound.  I think he
can do that with pg_cancel_backend, and it could be dangerous.       </pre></blockquote><pre wrap="">I think that is
dangeroustoo because the user may have specifically
 
turned AV off. That anti-wraparound vacuum might spring up right in a
busy period and start working its way through many tables, all of which
cause massive writes to occur. That's about as close to us causing an
outage as I ever want to see. We need a way through that to allow the
user to realise his predicament and find a good time to VACUUM. I never
want to say to anybody "nothing you can do, just sit and watch, your
production system will be working again in no time. Restart? no that
won't work either."     </pre></blockquote><pre wrap="">You are probably right that VACUUM going full-steam is a bad
ideain most 
 
situations. Except for anti-wraparound vacuum, cancellation seems the most 
reasonable thing to do. Because autovacuum will usually pickup the table in 
time again.   </pre></blockquote><pre wrap="">
Yeh, if we do have to do the second emergency anti-wraparound, then that
should be at full speed, since there's nothing else to do at that point.
 </pre><blockquote type="cite"><pre wrap="">The only problem I would see is if someone has an application that does a 
lot of schema changes (doesn't sound like a good idea anyway). In that case 
they would better issue manual vacuums on such tables.   </pre></blockquote><pre wrap="">
I can't see a use case for regular DDL as part of an application, on an
otherwise integral table (lots of updates and deletes). </pre></blockquote> As part of an application there's no
use.<br/> As part of an upgrade between 2 different versions of that application there is.<br /> And that's exactly the
kindof situation where temporary disabling autovacuum could become handy.<br /> 

Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> I think the best way to handle this is to have two limits.
> First limit attempts to autovacuum, but can be cancelled.
> When we hit second limit, sometime later, then autovacuum cannot be
> cancelled.

This seems like uselessly complex overdesign.

Remember that we still have the 8.1-era mechanism of warning messages
and eventual shutdown of normal backends as the wrap point gets closer.
A DBA who persists in cancelling antiwraparound vacuums will start
getting nasty warnings, and if he still persists will eventually be
forced to do the vacuuming manually in a standalone backend (which will
go at full speed btw).

I concur with Simon's original argument that it's a terrible idea to
make a background vacuum cancel-proof.  Pretty much the entire argument
for having vacuum at all (as opposed to say Oracle-style rollback) is
that it's good to be able to postpone maintenance overhead to a time of
the DBA's choosing.  I don't see a reason why that doesn't apply to
anti-wraparound vacuuming too.  If he postpones it too long, the
recovery will get unpleasant, but that's his choice.

An independent reason for minimizing complexity in this area is that
it's a corner case that (I trust) will never be hit by most
installations, which means it'll never get much field testing.
Poorly tested complicated code is dangerous.

In short: put in the automatic lock cancel for regular vacuums, disable
it for antiwraparound vacuums, but don't disable manual cancels; and
definitely don't invent a complicated new set of behaviors around that.
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Fri, 2007-10-12 at 10:19 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > I think the best way to handle this is to have two limits.
> > First limit attempts to autovacuum, but can be cancelled.
> > When we hit second limit, sometime later, then autovacuum cannot be
> > cancelled.
> 
> This seems like uselessly complex overdesign.

OK

> In short: put in the automatic lock cancel for regular vacuums, disable
> it for antiwraparound vacuums, but don't disable manual cancels; and
> definitely don't invent a complicated new set of behaviors around that.

That seemed more complex when I thought about that, but if we just use
SIGUSR2 for automatic cancels then this would be very simple.

Or did you have another design?

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> That seemed more complex when I thought about that, but if we just use
> SIGUSR2 for automatic cancels then this would be very simple.

Why not SIGINT?
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Fri, 2007-10-12 at 11:26 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > That seemed more complex when I thought about that, but if we just use
> > SIGUSR2 for automatic cancels then this would be very simple.
> 
> Why not SIGINT?

I must be missing something. How would I tell the difference between
manual and automatic cancels if we use SIGINT for both cases?

Or did you think to put another flag on the PGPROC to allow us to tell
whether it isAvoidingWraparound?

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On Fri, 2007-10-12 at 11:26 -0400, Tom Lane wrote:
>> Why not SIGINT?

> I must be missing something. How would I tell the difference between
> manual and automatic cancels if we use SIGINT for both cases?

Why do you need to?  I thought the plan was that DeadlockCheck would
only try to signal autovac workers.
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Fri, 2007-10-12 at 12:42 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Fri, 2007-10-12 at 11:26 -0400, Tom Lane wrote:
> >> Why not SIGINT?
> 
> > I must be missing something. How would I tell the difference between
> > manual and automatic cancels if we use SIGINT for both cases?
> 
> Why do you need to?  I thought the plan was that DeadlockCheck would
> only try to signal autovac workers.

...thinks...

On Fri, 2007-10-12 at 10:19 -0400, Tom Lane wrote:

> In short: put in the automatic lock cancel for regular vacuums, disable
> it for antiwraparound vacuums, but don't disable manual cancels; and
> definitely don't invent a complicated new set of behaviors around that.

So you mean "dont allow automatic cancels of manually submitted
VACUUMs".

I thought you meant don't disable manually-requested cancels of
autovacuums.

Can you explain further what you meant by "don't disable manual
cancels".

(laughs) wish I had a pound for every time we'd misunderstood each
other, or at least, a pound every time I misunderstood you. :-)

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: First steps with 8.3 and autovacuum launcher

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> Can you explain further what you meant by "don't disable manual
> cancels".

I meant that pg_cancel_backend() should still work on autovac workers,
contrary to Alvaro's suggestion that autovac workers should sometimes
ignore SIGINT.

Basically the implementation vision I have is that the SIGINT catcher in
an autovac worker should remain stupid, and any intelligence involved
should be on the side where we're deciding whether to send a signal or
not.  This probably does involve exposing more state in PGPROC but I see
nothing much wrong with that.  (It might be time to merge inVacuum,
isAutovacuum, and the additional state into a bitwise vacuumFlags field.)
        regards, tom lane


Re: First steps with 8.3 and autovacuum launcher

From
Simon Riggs
Date:
On Fri, 2007-10-12 at 13:51 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Can you explain further what you meant by "don't disable manual
> > cancels".
> 
> I meant that pg_cancel_backend() should still work on autovac workers,
> contrary to Alvaro's suggestion that autovac workers should sometimes
> ignore SIGINT.
> 
> Basically the implementation vision I have is that the SIGINT catcher in
> an autovac worker should remain stupid, and any intelligence involved
> should be on the side where we're deciding whether to send a signal or
> not.  This probably does involve exposing more state in PGPROC but I see
> nothing much wrong with that.  (It might be time to merge inVacuum,
> isAutovacuum, and the additional state into a bitwise vacuumFlags field.)

Gotcha

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com