Thread: Problems with autovacuum

From:
Łukasz Jagiełło
Date:

Hi,

Recent change postgresql server from Amazon EC2 small into large one.
That gives me x86_64 arch, two core cpu and 7.5GB ram. Atm got almost
~2000 small databases at that server and autovacuum working hole time
(witch isn't good for performance as I notice at cacti, one core is
busy in 60% hole time). How can I tweak postgresql.conf to get better
performance ? Maybe number of database is huge but most of them are
unused most of time and others (~400-500) do mainly selects only with
small number of inserts or deletes from time to time.

My configuration is Fedora Core 10:
postgresql-libs-8.3.7-1.fc10.x86_64
postgresql-8.3.7-1.fc10.x86_64
postgresql-server-8.3.7-1.fc10.x86_64
postgresql-devel-8.3.7-1.fc10.x86_64

postgresql.conf:
#v+
max_connections = 500
shared_buffers = 200MB
work_mem = 4096
maintenance_work_mem = 256MB
max_fsm_pages = 204800
max_fsm_relations = 4000
vacuum_cost_delay = 0
vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
vacuum_cost_limit = 200
effective_cache_size = 2048MB
logging_collector = on
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
track_activities = off
track_counts = on
log_parser_stats = off
log_planner_stats = off
log_executor_stats = off
log_statement_stats = off
autovacuum = on
log_autovacuum_min_duration = -1
autovacuum_max_workers = 3
autovacuum_naptime = 10min
autovacuum_vacuum_threshold = 10000
autovacuum_analyze_threshold = 10000
autovacuum_vacuum_scale_factor = 0.5
autovacuum_analyze_scale_factor = 0.4
autovacuum_freeze_max_age = 200000000
autovacuum_vacuum_cost_delay = 20
autovacuum_vacuum_cost_limit = -1
#v-

Regards
--
Łukasz Jagiełło
System Administrator
G-Forces Web Management Polska sp. z o.o. (www.gforces.pl)

Ul. Kruczkowskiego 12, 80-288 Gdańsk
Spółka wpisana do KRS pod nr 246596 decyzją Sądu Rejonowego Gdańsk-Północ

From:
Scott Marlowe
Date:

2009/5/24 Łukasz Jagiełło <>:
> Hi,
>
> Recent change postgresql server from Amazon EC2 small into large one.
> That gives me x86_64 arch, two core cpu and 7.5GB ram. Atm got almost
> ~2000 small databases at that server and autovacuum working hole time

> postgresql.conf:
> max_fsm_pages = 204800
> max_fsm_relations = 4000

So, in 2000 databases, there's only an average of 2 relations per db
and 102 dead rows?  Cause that's all you got room for with those
settings.

Whats the last 20 or so lines of vacuum verbose as run by a superuser say?

From:
Grzegorz Jaśkiewicz
Date:

2009/5/25 Scott Marlowe <>:

>
> So, in 2000 databases, there's only an average of 2 relations per db
> and 102 dead rows?  Cause that's all you got room for with those
> settings.
>
> Whats the last 20 or so lines of vacuum verbose as run by a superuser say?

according to http://www.postgresql.org/docs/8.1/interactive/runtime-config-resource.html
max_fsm_relations applies only to tables and indices, and it says "in
database", so I presume that means per database. In which case, those
settings are ok.
It would be nice, to see if vacuum actually complains about it.



--
GJ

From:
Scott Marlowe
Date:

2009/5/25 Grzegorz Jaśkiewicz <>:
> 2009/5/25 Scott Marlowe <>:
>
>>
>> So, in 2000 databases, there's only an average of 2 relations per db
>> and 102 dead rows?  Cause that's all you got room for with those
>> settings.
>>
>> Whats the last 20 or so lines of vacuum verbose as run by a superuser say?
>
> according to http://www.postgresql.org/docs/8.1/interactive/runtime-config-resource.html
> max_fsm_relations applies only to tables and indices, and it says "in
> database", so I presume that means per database. In which case, those
> settings are ok.
> It would be nice, to see if vacuum actually complains about it.

The docs say: "These parameters control the size of the shared free
space map,"  Key word being shared.

From:
Łukasz Jagiełło
Date:

W dniu 25 maja 2009 17:32 użytkownik Scott Marlowe
<> napisał:
>> Recent change postgresql server from Amazon EC2 small into large one.
>> That gives me x86_64 arch, two core cpu and 7.5GB ram. Atm got almost
>> ~2000 small databases at that server and autovacuum working hole time
>
>> postgresql.conf:
>> max_fsm_pages = 204800
>> max_fsm_relations = 4000
>
> So, in 2000 databases, there's only an average of 2 relations per db
> and 102 dead rows?  Cause that's all you got room for with those
> settings.
>
> Whats the last 20 or so lines of vacuum verbose as run by a superuser say?

Guess you was right

#v+
Total free space (including removable row versions) is 2408 bytes.
0 pages are or will become empty, including 0 at the end of the table.
1 pages containing 2092 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_depend_depender_index" now contains 5267 row versions
in 30 pages
SZCZEGÓŁY: 4 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_depend_reference_index" now contains 5267 row
versions in 32 pages
SZCZEGÓŁY: 4 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_depend": moved 0 row versions, truncated 39 to 39 pages
SZCZEGÓŁY: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "pg_catalog.pg_depend"
INFO:  "pg_depend": scanned 39 of 39 pages, containing 5267 live rows
and 0 dead rows; 3000 rows in sample, 5267 estimated total rows
INFO:  free space map contains 3876 pages in 4000 relations
SZCZEGÓŁY: A total of 67824 page slots are in use (including overhead).
67824 page slots are required to track all free space.
Current limits are:  204800 page slots, 4000 relations, using 1612 kB.
NOTICE:  max_fsm_relations(4000) equals the number of relations checked
PODPOWIEDŹ:  You have at least 4000 relations.  Consider increasing
the configuration parameter "max_fsm_relations".
VACUUM
#v-

Change:

max_fsm_pages = 6400000
max_fsm_relations = 400000

--
Łukasz Jagiełło
System Administrator
G-Forces Web Management Polska sp. z o.o. (www.gforces.pl)

Ul. Kruczkowskiego 12, 80-288 Gdańsk
Spółka wpisana do KRS pod nr 246596 decyzją Sądu Rejonowego Gdańsk-Północ

From:
Łukasz Jagiełło
Date:

W dniu 25 maja 2009 17:50 użytkownik Grzegorz Jaśkiewicz
<> napisał:
>> So, in 2000 databases, there's only an average of 2 relations per db
>> and 102 dead rows?  Cause that's all you got room for with those
>> settings.
>>
>> Whats the last 20 or so lines of vacuum verbose as run by a superuser say?
>
> according to http://www.postgresql.org/docs/8.1/interactive/runtime-config-resource.html
> max_fsm_relations applies only to tables and indices, and it says "in
> database", so I presume that means per database. In which case, those
> settings are ok.
> It would be nice, to see if vacuum actually complains about it.

Vacuum did complain about that but still got:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  609 postgres  15   0  537m 105m  676 S 21.5  1.4   1:15.00 postgres:
autovacuum launcher process
  610 postgres  15   0  241m  88m  500 S 20.5  1.1   1:50.87 postgres:
stats collector process

That autovacuum working hole time, shoudn't be run only when db needs ?

--
Łukasz Jagiełło
System Administrator
G-Forces Web Management Polska sp. z o.o. (www.gforces.pl)

Ul. Kruczkowskiego 12, 80-288 Gdańsk
Spółka wpisana do KRS pod nr 246596 decyzją Sądu Rejonowego Gdańsk-Północ

From:
Tom Lane
Date:

=?UTF-8?B?xYF1a2FzeiBKYWdpZcWCxYJv?= <> writes:
> That autovacuum working hole time, shoudn't be run only when db needs ?

With 2000 databases to cycle through, autovac is going to be spending
quite a lot of time just finding out whether it needs to do anything.
I believe the interpretation of autovacuum_naptime is that it should
examine each database that often, ie once a minute by default.  So
it's got more than 30 databases per second to look through.

Maybe it would make more sense to have one database (or at least,
many fewer databases) and 2000 schemas within it?

If you really want to stick with this layout, you're going to have to
increase autovacuum_naptime.

            regards, tom lane

From:
Łukasz Jagiełło
Date:

2009/5/25 Tom Lane <>:
> With 2000 databases to cycle through, autovac is going to be spending
> quite a lot of time just finding out whether it needs to do anything.
> I believe the interpretation of autovacuum_naptime is that it should
> examine each database that often, ie once a minute by default.  So
> it's got more than 30 databases per second to look through.
>
> Maybe it would make more sense to have one database (or at least,
> many fewer databases) and 2000 schemas within it?

It's rather impossible in my case.

> If you really want to stick with this layout, you're going to have to
> increase autovacuum_naptime.

Thanks for hint.

--
Łukasz Jagiełło
System Administrator
G-Forces Web Management Polska sp. z o.o. (www.gforces.pl)

Ul. Kruczkowskiego 12, 80-288 Gdańsk
Spółka wpisana do KRS pod nr 246596 decyzją Sądu Rejonowego Gdańsk-Północ

From:
Grzegorz Jaśkiewicz
Date:

2009/5/25 Łukasz Jagiełło <>:
> W dniu 25 maja 2009 17:32 użytkownik Scott Marlowe
> <> napisał:
>>> Recent change postgresql server from Amazon EC2 small into large one.
>>> That gives me x86_64 arch, two core cpu and 7.5GB ram. Atm got almost
>>> ~2000 small databases at that server and autovacuum working hole time
>>
>>> postgresql.conf:
>>> max_fsm_pages = 204800
>>> max_fsm_relations = 4000
>>
>> So, in 2000 databases, there's only an average of 2 relations per db
>> and 102 dead rows?  Cause that's all you got room for with those
>> settings.
>>
>> Whats the last 20 or so lines of vacuum verbose as run by a superuser say?
>
> Guess you was right
>

For future reference, if you don't log postgresql's messages, please
turn at least basic logging on, and things like that you would find in
$PGDATA/pg_log/ logs. The value suggested by postgresql, is the
minimum. I usually put in 1.5 the suggestion, which covers me from
worse case hopefully.

--
GJ

From:
Alvaro Herrera
Date:

Tom Lane escribió:
> =?UTF-8?B?xYF1a2FzeiBKYWdpZcWCxYJv?= <> writes:
> > That autovacuum working hole time, shoudn't be run only when db needs ?
>
> With 2000 databases to cycle through, autovac is going to be spending
> quite a lot of time just finding out whether it needs to do anything.
> I believe the interpretation of autovacuum_naptime is that it should
> examine each database that often, ie once a minute by default.  So
> it's got more than 30 databases per second to look through.

Note that this is correct in 8.1 and 8.2 but not 8.3 onwards.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:
Tom Lane
Date:

Alvaro Herrera <> writes:
> Tom Lane escribi�:
>> I believe the interpretation of autovacuum_naptime is that it should
>> examine each database that often, ie once a minute by default.  So
>> it's got more than 30 databases per second to look through.

> Note that this is correct in 8.1 and 8.2 but not 8.3 onwards.

Oh?  The current documentation still defines the variable thusly:

    Specifies the minimum delay between autovacuum runs on any given
    database. In each round the daemon examines the database and
    issues VACUUM and ANALYZE commands as needed for tables in that
    database.

I suppose the use of "minimum" means that this is not technically
incorrect, but it's sure not very helpful if there is some other
rule involved that causes it to not behave as I said.  (And if there
is some other rule, what is that?)  Please improve the docs.

            regards, tom lane

From:
Alvaro Herrera
Date:

Tom Lane escribió:
> Alvaro Herrera <> writes:
> > Tom Lane escribi�:
> >> I believe the interpretation of autovacuum_naptime is that it should
> >> examine each database that often, ie once a minute by default.  So
> >> it's got more than 30 databases per second to look through.
>
> > Note that this is correct in 8.1 and 8.2 but not 8.3 onwards.
>
> Oh?  The current documentation still defines the variable thusly:
>
>     Specifies the minimum delay between autovacuum runs on any given
>     database. In each round the daemon examines the database and
>     issues VACUUM and ANALYZE commands as needed for tables in that
>     database.

Sorry, it's the other way around actually -- correct for 8.3 onwards,
wrong for 8.1 and 8.2.  In the earlier versions, it would do one run in
a chosen database, sleep during "naptime", then do another run.

> I suppose the use of "minimum" means that this is not technically
> incorrect, but it's sure not very helpful if there is some other
> rule involved that causes it to not behave as I said.  (And if there
> is some other rule, what is that?)

The word "minimum" is there because it's possible that all workers are
busy with some other database(s).

> Please improve the docs.

I'll see about that.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:
Łukasz Jagiełło
Date:

W dniu 26 maja 2009 20:28 użytkownik Tom Lane <> napisał:
>>> I believe the interpretation of autovacuum_naptime is that it should
>>> examine each database that often, ie once a minute by default.  So
>>> it's got more than 30 databases per second to look through.
>
>> Note that this is correct in 8.1 and 8.2 but not 8.3 onwards.
>
> Oh?  The current documentation still defines the variable thusly:
>
>        Specifies the minimum delay between autovacuum runs on any given
>        database. In each round the daemon examines the database and
>        issues VACUUM and ANALYZE commands as needed for tables in that
>        database.
>
> I suppose the use of "minimum" means that this is not technically
> incorrect, but it's sure not very helpful if there is some other
> rule involved that causes it to not behave as I said.  (And if there
> is some other rule, what is that?)  Please improve the docs.

After change autovacuum_naptime postgresql behave like you wrote before.

--
Łukasz Jagiełło
System Administrator
G-Forces Web Management Polska sp. z o.o. (www.gforces.pl)

Ul. Kruczkowskiego 12, 80-288 Gdańsk
Spółka wpisana do KRS pod nr 246596 decyzją Sądu Rejonowego Gdańsk-Północ

From:
Tom Lane
Date:

Alvaro Herrera <> writes:
> Sorry, it's the other way around actually -- correct for 8.3 onwards,
> wrong for 8.1 and 8.2.  In the earlier versions, it would do one run in
> a chosen database, sleep during "naptime", then do another run.

> Tom Lane escribió:
>> I suppose the use of "minimum" means that this is not technically
>> incorrect, but it's sure not very helpful if there is some other
>> rule involved that causes it to not behave as I said.  (And if there
>> is some other rule, what is that?)

> The word "minimum" is there because it's possible that all workers are
> busy with some other database(s).

>> Please improve the docs.

> I'll see about that.

Hmm, maybe we need to improve the code too.  This example suggests that
there needs to be some limit on the worker launch rate, even if there
are so many databases that that means we don't meet naptime exactly.

            regards, tom lane

From:
Alvaro Herrera
Date:

Tom Lane escribió:

> Hmm, maybe we need to improve the code too.  This example suggests that
> there needs to be some limit on the worker launch rate, even if there
> are so many databases that that means we don't meet naptime exactly.

We already have a 100ms lower bound on the sleep time (see
launcher_determine_sleep()).  Maybe that needs to be increased?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:
Tom Lane
Date:

Alvaro Herrera <> writes:
> Tom Lane escribi�:
>> Hmm, maybe we need to improve the code too.  This example suggests that
>> there needs to be some limit on the worker launch rate, even if there
>> are so many databases that that means we don't meet naptime exactly.

> We already have a 100ms lower bound on the sleep time (see
> launcher_determine_sleep()).  Maybe that needs to be increased?

Maybe.  I hesitate to suggest a GUC variable ;-)

One thought is that I don't trust the code implementing the minimum
too much:

    /* 100ms is the smallest time we'll allow the launcher to sleep */
    if (nap->tv_sec <= 0 && nap->tv_usec <= 100000)
    {
        nap->tv_sec = 0;
        nap->tv_usec = 100000;    /* 100 ms */
    }

What would happen if tv_sec is negative and tv_usec is say 500000?
Maybe negative tv_sec is impossible here, but ...

            regards, tom lane

From:
Alvaro Herrera
Date:

Tom Lane escribió:
> Alvaro Herrera <> writes:
> > Tom Lane escribi�:
> >> Hmm, maybe we need to improve the code too.  This example suggests that
> >> there needs to be some limit on the worker launch rate, even if there
> >> are so many databases that that means we don't meet naptime exactly.
>
> > We already have a 100ms lower bound on the sleep time (see
> > launcher_determine_sleep()).  Maybe that needs to be increased?
>
> Maybe.  I hesitate to suggest a GUC variable ;-)

Heh :-)

> One thought is that I don't trust the code implementing the minimum
> too much:
>
>     /* 100ms is the smallest time we'll allow the launcher to sleep */
>     if (nap->tv_sec <= 0 && nap->tv_usec <= 100000)
>     {
>         nap->tv_sec = 0;
>         nap->tv_usec = 100000;    /* 100 ms */
>     }
>
> What would happen if tv_sec is negative and tv_usec is say 500000?
> Maybe negative tv_sec is impossible here, but ...

I don't think it's possible to get negative tv_sec here currently, but
perhaps you're right that we could make this code more future-proof.

However I think there's a bigger problem here, which is that if the user
has set naptime too low, i.e. to a value lower than
number-of-databases * 100ms, we'll be running the (expensive)
rebuild_database_list function on each iteration ... maybe we oughta put
a lower bound on naptime based on the number of databases to avoid this
problem.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:
Tom Lane
Date:

Alvaro Herrera <> writes:
> However I think there's a bigger problem here, which is that if the user
> has set naptime too low, i.e. to a value lower than
> number-of-databases * 100ms, we'll be running the (expensive)
> rebuild_database_list function on each iteration ... maybe we oughta put
> a lower bound on naptime based on the number of databases to avoid this
> problem.

Bingo, that's surely exactly what was happening to the OP.  He had 2000
databases and naptime at (I assume) the default; so he was rerunning
rebuild_database_list every 100ms.

So that recovery code path needs some more thought.  Maybe a lower bound
on how often to do rebuild_database_list?  And/or don't set adl_next_worker
to less than 100ms in the future to begin with?

            regards, tom lane

From:
Scott Marlowe
Date:

2009/5/26 Tom Lane <>:
> Alvaro Herrera <> writes:
>> However I think there's a bigger problem here, which is that if the user
>> has set naptime too low, i.e. to a value lower than
>> number-of-databases * 100ms, we'll be running the (expensive)
>> rebuild_database_list function on each iteration ... maybe we oughta put
>> a lower bound on naptime based on the number of databases to avoid this
>> problem.
>
> Bingo, that's surely exactly what was happening to the OP.  He had 2000
> databases and naptime at (I assume) the default; so he was rerunning
> rebuild_database_list every 100ms.
>
> So that recovery code path needs some more thought.  Maybe a lower bound
> on how often to do rebuild_database_list?  And/or don't set adl_next_worker
> to less than 100ms in the future to begin with?

I'd be happy with logging telling me when things are getting pathological.

From:
Alvaro Herrera
Date:

Tom Lane escribió:

> Bingo, that's surely exactly what was happening to the OP.  He had 2000
> databases and naptime at (I assume) the default; so he was rerunning
> rebuild_database_list every 100ms.
>
> So that recovery code path needs some more thought.  Maybe a lower bound
> on how often to do rebuild_database_list?  And/or don't set adl_next_worker
> to less than 100ms in the future to begin with?

I've been giving this some thought and tried several approaches.  In the
end the one that I like the most is raising autovacuum_naptime to a
reasonable value for the exiting number of databases.  The only problem
I have with it is that it's trivial to change it in the autovacuum
launcher process and have it stick, but there's no way to propagate the
value out to backends or postmaster to that they SHOW the actual value
in use by the launcher.  The best I can do is emit a WARNING with the
new value.

I have experimented with other choices such as not rebuilding the
database list if the time elapsed since last rebuild is not very long,
but there were small problems with that so I'd prefer to avoid it.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachment
From:
Tom Lane
Date:

Alvaro Herrera <> writes:
> I've been giving this some thought and tried several approaches.  In the
> end the one that I like the most is raising autovacuum_naptime to a
> reasonable value for the exiting number of databases.  The only problem
> I have with it is that it's trivial to change it in the autovacuum
> launcher process and have it stick, but there's no way to propagate the
> value out to backends or postmaster to that they SHOW the actual value
> in use by the launcher.  The best I can do is emit a WARNING with the
> new value.

Well, that code isn't even correct I think; you're not supposed to
modify a GUC variable directly.  I think you should just silently
use a naptime of at least X without changing the nominal GUC variable.
And definitely without the WARNING --- that's nothing but log spam.

            regards, tom lane

From:
Alvaro Herrera
Date:

Tom Lane escribió:

> Well, that code isn't even correct I think; you're not supposed to
> modify a GUC variable directly.  I think you should just silently
> use a naptime of at least X without changing the nominal GUC variable.
> And definitely without the WARNING --- that's nothing but log spam.

Glitches fixed in this version; will apply shortly to 8.3 and HEAD.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment
From:
Alvaro Herrera
Date:

Alvaro Herrera escribió:
> Tom Lane escribió:
>
> > Well, that code isn't even correct I think; you're not supposed to
> > modify a GUC variable directly.  I think you should just silently
> > use a naptime of at least X without changing the nominal GUC variable.
> > And definitely without the WARNING --- that's nothing but log spam.
>
> Glitches fixed in this version; will apply shortly to 8.3 and HEAD.

Committed.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:
Tom Lane
Date:

Alvaro Herrera <> writes:
> Glitches fixed in this version; will apply shortly to 8.3 and HEAD.

Looks sane; one trivial grammar correction:

> + /* the minimum allowed time between two awakening of the launcher */

Should read "two awakenings".

            regards, tom lane