Thread: CPU-intensive autovacuuming
Dear PostgreSQL experts,

I'm trying to work out why my system spends so much time autovacuuming. It looks like pg_autovacuum wakes up every few minutes, does a query on some system tables, and then spins doing no more queries but burning all available CPU cycles for a couple of minutes before sleeping again.

I'm logging all queries to syslog, and the only queries that autovacuum seems to run are about 3 selecting from pg_class and some stats tables. They complete in a couple of seconds. I would see VACUUM commands in there as well (wouldn't I?) if it was actually doing anything, but I don't. Since not much is happening on the system I'm not surprised that it decides that nothing needs vacuuming. But instead I think it "spins"; and it's the pg_autovacuum process, not a postmaster, that is taking all the CPU.

I wonder if this is because I have a lot of tables (about 50,000 I think - I have one schema per user and each schema has a couple of dozen tables). Could it be that there is some code in autovacuum that is O(n^2) in the number of tables, or something like that?

Has anyone seen anything like this before? Any debugging suggestions?

This is with the Debian package of 7.4.2, and all the default autovacuum settings.

Cheers, Phil.
Following up on my own post from last night:

> Could it be that there is some code in autovacuum that is O(n^2) in
> the number of tables?

Browsing the code using webcvs, I have found this:

    for (j = 0; j < PQntuples(res); j++)
    {
        tbl_elem = DLGetHead(dbs->table_list);
        while (tbl_elem != NULL)
        {

I haven't really tried to understand what is going on in here, but it does look like it is getting the result of the "pg_class join stats" query and then matching it up against its internal list of tables using nested loops, which is undoubtedly O(n^2) in the number of tables.

Have I correctly understood what is going on here?

--Phil.
Phil Endecott wrote:
> Following up on my own post from last night:
>
> > Could it be that there is some code in autovacuum that is O(n^2) in
> > the number of tables?
>
> Browsing the code using webcvs, I have found this:
>
>     for (j = 0; j < PQntuples(res); j++)
>     {
>         tbl_elem = DLGetHead(dbs->table_list);
>         while (tbl_elem != NULL)
>         {
>
> I haven't really tried to understand what is going on in here, but it
> does look like it is getting the result of the "pg_class join stats"
> query and then matching it up against its internal list of tables
> using nested loops, which is undoubtedly O(n^2) in the number of tables.
>
> Have I correctly understood what is going on here?

Indeed you have. I have heard a few similar reports, but perhaps none as bad as yours. One person put in a small sleep value so that it doesn't spin so tight. You could also just raise the sleep delay so that it doesn't do this work quite so often. No other quick suggestions.
Matthew T. O'Connor wrote:
> Phil Endecott wrote:
>> > Could it be that there is some code in autovacuum that is O(n^2) in
>> > the number of tables?
>>
>> Browsing the code using webcvs, I have found this:
>>
>>     for (j = 0; j < PQntuples(res); j++)
>>     {
>>         tbl_elem = DLGetHead(dbs->table_list);
>>         while (tbl_elem != NULL)
>>         {
>>
>> Have I correctly understood what is going on here?
>
> Indeed you have. I have heard a few similar reports but perhaps none as
> bad as yours. One person put in a small sleep value so that it doesn't
> spin so tight. You could also just up the sleep delay so that it
> doesn't do this work quite so often. No other quick suggestions.

I do wonder why autovacuum is keeping its table list in memory rather than in the database.

But given that it is keeping it in memory, I think the real fix is to sort that list (or keep it ordered when building or updating it). It is trivial to also get the query results ordered, and they can then be compared in O(n) time.

I notice various other places where there seem to be nested loops, e.g. in the update_table_list function. I'm not sure if they can be fixed by similar means.

--Phil.
Phil Endecott wrote: > Matthew T. O'Connor wrote: > >> Indeed you have. I have head a few similar reports but perhaps none >> as bad as yours. One person put a small sleep value so that it >> doesn't spin so tight. You could also just up the sleep delay so >> that it doesn't do this work quite so often. No other quick >> suggestions. > > > I do wonder why autovacuum is keeping its table list in memory rather > than in the database. For better or worse, this was a conscious design decision that the contrib version of autovacuum be non-invasive to your database. > But given that it is keeping it in memory, I think the real fix is to > sort that list (or keep it ordered when building or updating it). It > is trivial to also get the query results ordered, and they can then be > compared in O(n) time. I'm quite sure there is a better way, please submit a patch if you can. This was never a real concern for most people since the number of tables is typically small enough not to be a problem. The integrated version of autovacuum that didn't make the cut before 8.0 avoids this problem since the autovacuum data is stored in the database. > I notice various other places where there seem to be nested loops, > e.g. in the update_table_list function. I'm not sure if they can be > fixed by similar means. I would think so, they all basically do the same type of loop.
Matthew T. O'Connor wrote: > The integrated version > of autovacuum that didn't make the cut before 8.0 avoids this problem > since the autovacuum data is stored in the database. What is the status of this? Is it something that will be included in 8.1 or 8.0.n? I might be able to patch the current code but that doesn't seem like a useful thing to do if a better solution will arrive eventually. I am currently running vacuums from a cron job and I think I will be happy with that for the time being. (Incidentally, I have also found that the indexes on my pg_attributes table were taking up over half a gigabyte, which came down to less than 40 megs after reindexing them. Is there a case for having autovacuum also call reindex?) --Phil.
Phil Endecott wrote:
> Matthew T. O'Connor wrote:
>
>> The integrated version of autovacuum that didn't make the cut before
>> 8.0 avoids this problem since the autovacuum data is stored in the
>> database.
>
> What is the status of this? Is it something that will be included in
> 8.1 or 8.0.n? I might be able to patch the current code but that
> doesn't seem like a useful thing to do if a better solution will
> arrive eventually. I am currently running vacuums from a cron job and
> I think I will be happy with that for the time being.

This is a good question :-) I have been so busy with work lately that I have not been able to work on it. I am currently trying to resurrect the patch I sent in for 8.0 and update it so that it applies against HEAD. Once that is done, I will need help from someone with the portions of the work that I'm not comfortable with or capable of. The main issue with the version I created during the 8.0 devel cycle is that it used libpq to connect, query and issue commands against the databases. This was deemed bad, and I need help setting up the infrastructure to make this happen without libpq. I hope to have my patch applying against HEAD sometime this week, but it probably won't happen till next week.

So the summary of the autovacuum integration status is that we are fast running out of time (feature freeze July 1), and I have very little time to devote to this task. So you might want to submit your O(n) patch, because unfortunately it looks like integrated autovacuum might slip another release unless someone else steps up to work on it.

> (Incidentally, I have also found that the indexes on my pg_attributes
> table were taking up over half a gigabyte, which came down to less
> than 40 megs after reindexing them. Is there a case for having
> autovacuum also call reindex?)

Yes, there is certainly some merit to having autovacuum or something similar perform other system maintenance tasks such as reindexing. I just haven't taken it there yet.
It does seem strange that your pg_attributes table got that big; anyone have any insight here? You did say you are using 7.4.2 - I forget if that has the index space reclaiming code in vacuum, and there are also some autovacuum bugs in the early 7.4.x releases. You might try to upgrade to either 8.0.x or a later 7.4.x release.

Matthew O'Connor
Phil Endecott wrote: > Matthew T. O'Connor wrote: > > The integrated version > > of autovacuum that didn't make the cut before 8.0 avoids this problem > > since the autovacuum data is stored in the database. > > What is the status of this? Is it something that will be included in > 8.1 or 8.0.n? I might be able to patch the current code but that > doesn't seem like a useful thing to do if a better solution will arrive > eventually. I am currently running vacuums from a cron job and I think > I will be happy with that for the time being. I will post about integrating pg_autovacuum into the backend for 8.1 in a few minutes. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Phil Endecott <spam_from_postgresql_general@chezphil.org> writes: > (Incidentally, I have also found that the indexes on my pg_attributes > table were taking up over half a gigabyte, which came down to less than > 40 megs after reindexing them. Is there a case for having autovacuum > also call reindex?) Lots of temp tables I suppose? If so that's not autovacuum's fault; it wasn't getting told about the activity in pg_attribute until this patch: 2005-03-31 18:20 tgl * src/backend/postmaster/: pgstat.c (REL7_4_STABLE), pgstat.c (REL8_0_STABLE), pgstat.c: Flush any remaining statistics counts out to the collector at process exit. Without this, operations triggered during backend exit (such as temp table deletions) won't be counted ... which given heavy usage of temp tables can lead to pg_autovacuum falling way behind on the need to vacuum pg_class and pg_attribute. Per reports from Steve Crawford and others. Unless the bloat occurred after you updated to 8.0.2, there's no issue. regards, tom lane
Phil,

If you complete this patch, I'm very interested to see it.

I think I'm the person Matthew is talking about who inserted a sleep value. Because of the sheer number of tables involved, even small values of sleep caused pg_autovacuum to iterate too slowly over its table lists to be of use in a production environment (where I still find its behavior to be preferable to a complicated list of manual vacuums performed in cron).

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jun 7, 2005, at 6:16 AM, Phil Endecott wrote:

> Matthew T. O'Connor wrote:
>
>> Phil Endecott wrote:
>>
>>> > Could it be that there is some code in autovacuum that is O(n^2) in
>>> > the number of tables?
>>>
>>> Browsing the code using webcvs, I have found this:
>>>
>>>     for (j = 0; j < PQntuples(res); j++)
>>>     {
>>>         tbl_elem = DLGetHead(dbs->table_list);
>>>         while (tbl_elem != NULL)
>>>         {
>>>
>>> Have I correctly understood what is going on here?
>>
>> Indeed you have. I have heard a few similar reports but perhaps
>> none as bad as yours. One person put in a small sleep value so that
>> it doesn't spin so tight. You could also just up the sleep delay
>> so that it doesn't do this work quite so often. No other quick
>> suggestions.
>
> I do wonder why autovacuum is keeping its table list in memory
> rather than in the database.
>
> But given that it is keeping it in memory, I think the real fix is
> to sort that list (or keep it ordered when building or updating
> it). It is trivial to also get the query results ordered, and they
> can then be compared in O(n) time.
>
> I notice various other places where there seem to be nested loops,
> e.g. in the update_table_list function. I'm not sure if they can
> be fixed by similar means.
>
> --Phil.
--- "Thomas F. O'Connell" <tfo@sitening.com> wrote:
> Phil,
>
> If you complete this patch, I'm very interested to see it.
>
> I think I'm the person Matthew is talking about who inserted a sleep
> value. Because of the sheer number of tables involved, even small
> values of sleep caused pg_autovacuum to iterate too slowly over its
> table lists to be of use in a production environment (where I still
> find its behavior to be preferable to a complicated list of manual
> vacuums performed in cron).

Were you sleeping every time through the loop? How about something like:

    if (j % 500 == 1)
        usleep(100000);

Regards,

Shelby Cain
I was usleeping in tiny increments in each iteration of the loop. I didn't try breaking it into iterative groups like this.

Honestly, I'd prefer to see pg_autovacuum improved to do O(n) rather than O(n^2) table activity. At this point, though, I'm probably not too likely to have much time to hack pg_autovacuum before 8.1 is released, although if it doesn't become integrated by beta feature freeze, I might give it a shot. But I hope if anyone completes the linear improvement, they'll post to the lists.

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jun 10, 2005, at 9:12 AM, Shelby Cain wrote:

> Were you sleeping every time through the loop? How about something like:
>
>     if (j % 500 == 1)
>         usleep(100000);
>
> Regards,
>
> Shelby Cain
Hi everybody
I am trying to write a stored procedure that returns a result set, but it is not working.
this is the function:
///
CREATE OR REPLACE FUNCTION remisiones.fn_get_total_remitidoxprovision1("numeric")
RETURNS SETOF record AS
$BODY$
begin
select rm.provision as provision,
drm.producto as producto,
sum(drm.cantidad) as cantidad
FROM remisiones.remisiones rm, remisiones.detalles_remision drm
WHERE rm.remision = drm.remision and rm.provision = $1
GROUP BY rm.provision, drm.producto
ORDER BY rm.provision, drm.producto;
end;$BODY$
///
If I call this function from the interactive SQL of pgAdmin III I get this result:

    select * from fn_gert_total_remitidosxprovision(1)

     row | refcursor
    -----+-------------------
       1 | <unnamed portal>
Is there a way to display the values of the rows returned? I need it because I have to use it in a DataWindow definition in a PowerBuilder app.
thanks in advance
Hugo
"Thomas F. O'Connell" <tfo@sitening.com> writes: > Honestly, I'd prefer to see pg_autovacuum improved to do O(n) rather > than O(n^2) table activity. At this point, though, I'm probably not > too likely to have much time to hack pg_autovacuum before 8.1 is > released, although if it doesn't become integrated by beta feature > freeze, I might give it a shot. This would be vastly easier to fix if the code were integrated into the backend first. In the backend environment you could just keep the info in a dynahash.c hashtable instead of in a linear list. On the client side, you have to roll your own hashing (or adapt dynahash to life outside the backend environment). regards, tom lane
# htakada@gmail.com / 2005-06-11 08:44:32 -0400: > Hi everybody > > I am trying to write a stored procedure that returns a result set but it is > not working > this is the function: > /// > CREATE OR REPLACE FUNCTION > remisiones.fn_get_total_remitidoxprovision1("numeric") > > RETURNS SETOF record AS > $BODY$ > begin > select rm.provision as provision, > drm.producto as producto, > sum(drm.cantidad) as cantidad > FROM remisiones.remisiones rm, remisiones.detalles_remision drm > WHERE rm.remision = drm.remision and rm.provision = $1 > GROUP BY rm.provision, drm.producto > ORDER BY rm.provision, drm.producto; > end;$BODY$ > > /// http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#AEN32875 -- How many Vietnam vets does it take to screw in a light bulb? You don't know, man. You don't KNOW. Cause you weren't THERE. http://bash.org/?255991
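[Editor's note: the linked docs cover RETURN NEXT, which is what the posted function is missing - a PL/pgSQL SETOF function cannot just contain a bare SELECT. A sketch of how it might be rewritten follows; the column types in the calling definition list are guesses, since the original table definitions aren't shown.]

```
CREATE OR REPLACE FUNCTION remisiones.fn_get_total_remitidoxprovision1(numeric)
RETURNS SETOF record AS
$BODY$
DECLARE
    r record;
BEGIN
    FOR r IN
        SELECT rm.provision, drm.producto, sum(drm.cantidad) AS cantidad
        FROM remisiones.remisiones rm, remisiones.detalles_remision drm
        WHERE rm.remision = drm.remision AND rm.provision = $1
        GROUP BY rm.provision, drm.producto
        ORDER BY rm.provision, drm.producto
    LOOP
        RETURN NEXT r;   -- emit one result row per loop iteration
    END LOOP;
    RETURN;
END;
$BODY$ LANGUAGE plpgsql;

-- A SETOF record function must be called with a column definition list
-- (the types here are assumptions):
SELECT *
FROM remisiones.fn_get_total_remitidoxprovision1(1)
     AS t(provision numeric, producto integer, cantidad numeric);
```

Called this way, the rows come back as an ordinary result set, which a PowerBuilder DataWindow can consume directly instead of an unnamed portal.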