Thread: pg_autovacuum seems to be a neat freak and cleans way too much

pg_autovacuum seems to be a neat freak and cleans way too much

From: Brian Hirt
I'm having a strange issue with pg_autovacuum.  I have a table with
about 4 million rows in 20,000 pages.  autovacuum likes to vacuum
and/or analyze it every 45 minutes or so, but it probably doesn't have
more than a few hundred rows changed every few hours.  When I run
autovacuum with -d3 it says:

[2004-05-18 07:04:26 PM]   table name: basement_nightly."public"."search_words4"
[2004-05-18 07:04:26 PM]      relid: 396238832;   relisshared: 0
[2004-05-18 07:04:26 PM]      reltuples: 4;  relpages: 20013
[2004-05-18 07:04:26 PM]      curr_analyze_count:  0; cur_delete_count:   0
[2004-05-18 07:04:26 PM]      ins_at_last_analyze: 0; del_at_last_vacuum: 0
[2004-05-18 07:04:26 PM]      insert_threshold:    504; delete_threshold    1008

reltuples: 4 seems wrong.  I would expect a table with 4m rows and 20k
pages to have more than 4 tuples.  I think this is why the insert
threshold is all messed up -- which is why it gets analyzed way too
frequently.
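
If I understand the defaults right (analyze threshold = 500 + 1 *
reltuples, vacuum threshold = 1000 + 2 * reltuples -- my assumption,
from the shipped settings), a reltuples of 4 gives exactly the 504 and
1008 shown above, while the real ~4.07 million tuples would put the
thresholds around 4.07 million and 9.14 million.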

This happens with other big tables too.  The autovacuum is from 7.4.2;
some information is below.


Output from vacuum:

basement=# vacuum ANALYZE verbose search_words4;
INFO:  vacuuming "public.search_words4"
INFO:  index "search_words4_data_id" now contains 4069268 row versions in 15978 pages
DETAIL:  479 index row versions were removed.
1 index pages have been deleted, 0 are currently reusable.
CPU 0.42s/0.70u sec elapsed 29.48 sec.
INFO:  index "search_words4_pkey" now contains 4069268 row versions in 17576 pages
DETAIL:  479 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.77s/0.74u sec elapsed 150.19 sec.
INFO:  "search_words4": removed 479 row versions in 6 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "search_words4": found 479 removable, 4069268 nonremovable row versions in 19950 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 1.30s/1.61u sec elapsed 179.96 sec.
INFO:  analyzing "public.search_words4"
INFO:  "search_words4": 19950 pages, 3000 rows sampled, 4069800 estimated total rows
VACUUM
basement=#



Here's the frequency:
[2004-05-18 12:12:54 PM] Performing: ANALYZE "public"."search_words4"
[2004-05-18 01:59:13 PM] Performing: ANALYZE "public"."search_words4"
[2004-05-18 02:05:36 PM] Performing: ANALYZE "public"."search_words4"
[2004-05-18 02:29:25 PM] Performing: VACUUM ANALYZE "public"."search_words4"
[2004-05-18 02:46:09 PM] Performing: ANALYZE "public"."search_words4"
[2004-05-18 03:39:31 PM] Performing: ANALYZE "public"."search_words4"
[2004-05-18 05:20:45 PM] Performing: ANALYZE "public"."search_words4"
[2004-05-18 06:08:03 PM] Performing: VACUUM ANALYZE "public"."search_words4"
[2004-05-18 06:18:34 PM] Performing: ANALYZE "public"."search_words4"
[2004-05-18 07:34:27 PM] Performing: ANALYZE "public"."search_words4"
[2004-05-18 07:43:18 PM] Performing: ANALYZE "public"."search_words4"


Re: pg_autovacuum seems to be a neat freak and cleans way too much

From: Brian Hirt
I'm following up on my own email and cross-posting to hackers, because
there is a bug that needs to be fixed.  I spent some more time digging
into this, and I found the cause of the problem.

reltuples in pg_class is defined as a real; reltuples in pg_autovacuum
is defined as an int.  The query used to get reltuples returns
scientific notation for my large tables, '4.06927e+06' for the one I
mention below.  pg_autovacuum happily converts that to 4 by doing
atoi('4.06927e+06'), which is why it's all fubar for my large tables
with over a million tuples.
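
Here's a minimal standalone illustration of the conversion problem
(just a sketch, not the pg_autovacuum source): atoi() stops parsing at
the first character that isn't a digit, so the '.' in the
scientific-notation string truncates the value, while atof() parses
the whole thing.

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    /* what selecting reltuples returns for a ~4M-row table */
    const char *reltuples = "4.06927e+06";

    printf("atoi: %d\n", atoi(reltuples));    /* stops at '.': prints 4 */
    printf("atof: %.0f\n", atof(reltuples));  /* full parse: prints 4069270 */
    return 0;
}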

My real quick hack of changing the defines in pg_autovacuum.h to cast
reltuples to ::int4 makes it work:

line: 37
#define TABLE_STATS_QUERY \
    "select a.oid,a.relname,a.relnamespace,a.relpages,a.relisshared," \
    "a.reltuples::int4,b.schemaname,b.n_tup_ins,b.n_tup_upd,b.n_tup_del " \
    "from pg_class a, pg_stat_all_tables b " \
    "where a.oid=b.relid and a.relkind = 'r'"

#define PAGES_QUERY \
    "select oid,reltuples::int4,relpages from pg_class where oid=%i"

However, I think a better fix would be to change autovacuum to use a
double instead of an int.  If it's going to stay an int, it should
probably be widened to a long and the casts changed to ::int8.
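
For illustration, a sketch of what the double approach might look like
(the struct and function names here are made up, not the actual
pg_autovacuum source):

#include <stdlib.h>

typedef struct
{
    double reltuples;    /* was: int reltuples */
    int    relpages;
} tbl_info;

static void
set_reltuples(tbl_info *tbl, const char *reltuples_str)
{
    /* strtod() understands scientific notation, so
       '4.06927e+06' becomes 4069270.0 instead of 4 */
    tbl->reltuples = strtod(reltuples_str, NULL);
}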

Any suggestions on the best way to fix this?

I'll supply a patch once the approach is agreed upon and the problem
has been verified.


best regards,

--brian

On May 18, 2004, at 7:37 PM, Brian Hirt wrote:

> I'm having a strange issue with pg_autovacuum.  I have a table with
> about 4 million rows in 20,000 pages.  autovacuum likes to vacuum
> and/or analyze it every 45 minutes or so, but it probably doesn't
> have more than a few hundred rows changed every few hours.  When I
> run autovacuum with -d3 it says:
>
> [2004-05-18 07:04:26 PM]   table name: basement_nightly."public"."search_words4"
> [2004-05-18 07:04:26 PM]      relid: 396238832;   relisshared: 0
> [2004-05-18 07:04:26 PM]      reltuples: 4;  relpages: 20013
> [2004-05-18 07:04:26 PM]      curr_analyze_count:  0; cur_delete_count:   0
> [2004-05-18 07:04:26 PM]      ins_at_last_analyze: 0; del_at_last_vacuum: 0
> [2004-05-18 07:04:26 PM]      insert_threshold:    504; delete_threshold    1008
>
> reltuples: 4 seems wrong.  I would expect a table with 4m rows and 20k
> pages to have more than 4 tuples.  I think this is why the insert
> threshold is all messed up -- which is why it gets analyzed way too
> frequently.
>
> This happens with other big tables too.  The autovacuum is from 7.4.2;
> some information is below.
> [...]


Re: [HACKERS] pg_autovacuum seems to be a neat freak and cleans way too much

From: Bruce Momjian
I think we already fixed that in 7.4.2.  We also have a few bugs still
in 7.4.2 and we need to get those fixed soon and release 7.4.3.

---------------------------------------------------------------------------

Brian Hirt wrote:
> I'm following up on my own email and cross-posting to hackers, because
> there is a bug that needs to be fixed.  I spent some more time digging
> into this, and I found the cause of the problem.
>
> reltuples in pg_class is defined as a real; reltuples in pg_autovacuum
> is defined as an int.  The query used to get reltuples returns
> scientific notation for my large tables, '4.06927e+06' for the one I
> mention below.  pg_autovacuum happily converts that to 4 by doing
> atoi('4.06927e+06'), which is why it's all fubar for my large tables
> with over a million tuples.
> [...]
>
> However, I think a better fix would be to change autovacuum to use a
> double instead of an int.  If it's going to stay an int, it should
> probably be widened to a long and the casts changed to ::int8.
>
> Any suggestions on the best way to fix this?
>
> I'll supply a patch once the approach is agreed upon and the problem
> has been verified.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pg_autovacuum seems to be a neat freak and cleans way too much

From: Bruce Momjian
Brian Hirt wrote:
> I'm having a strange issue with pg_autovacuum.  I have a table with
> about 4 million rows in 20,000 pages.  autovacuum likes to vacuum
> and/or analyze it every 45 minutes or so, but it probably doesn't have
> more than a few hundred rows changed every few hours.
> [...]
>
> This happens with other big tables too.  The autovacuum is from 7.4.2;
> some information is below.

Oh, 7.4.2.  I know we have a known bug there and are waiting on a patch
for it.

Matthew, we need those fixes for pg_autovacuum soon.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pg_autovacuum seems to be a neat freak and cleans way too much

From: Matthew T. O'Connor
On Tue, 2004-05-18 at 22:16, Bruce Momjian wrote:
> Brian Hirt wrote:
> > I'm having a strange issue with pg_autovacuum.  I have a table with
> > about 4 million rows in 20,000 pages.  autovacuum likes to vacuum
> > and/or analyze it every 45 minutes or so, but it probably doesn't have
> > more than a few hundred rows changed every few hours.
> > [...]
> > [2004-05-18 07:04:26 PM]      reltuples: 4;  relpages: 20013
> > [...]
> >
> > reltuples: 4 seems wrong.  I would expect a table with 4m rows and 20k
> > pages to have more than 4 tuples.  I think this is why the insert
> > threshold is all messed up -- which is why it gets analyzed way too
> > frequently.

reltuples: 4 is wrong.  This is a known bug that has been reported
several times lately, and it is already fixed in CVS.  If you want, you
can grab pg_autovacuum.c and .h from CVS and recompile; that will fix it.

> > This happens with other big tables too.  The autovacuum is from 7.4.2;
> > some information is below.
>
> Oh, 7.4.2.  I know we have a known bug there and are waiting on a patch
> for it.
>
> Matthew, we need those fixes for pg_autovacuum soon.

As I said above, this problem is already fixed; unfortunately it was
fixed just after the 7.4.2 release, so it's been sitting in CVS for a
while.

The only outstanding bug I need to fix that I'm aware of is the temp
table issue.  Everything else I'm planning is development work for 7.5,
mostly integrating it directly into the backend.

If anyone is aware of some other bug that I'm not thinking of, please
let me know.

Thanks,

Matthew


Re: pg_autovacuum seems to be a neat freak and cleans way too much

From: Bruce Momjian
Matthew T. O'Connor wrote:
> As I said above, this problem is already fixed; unfortunately it was
> fixed just after the 7.4.2 release, so it's been sitting in CVS for a
> while.
>
> The only outstanding bug I need to fix that I'm aware of is the temp
> table issue.  Everything else I'm planning is development work for 7.5,
> mostly integrating it directly into the backend.
>
> If anyone is aware of some other bug that I'm not thinking of, please
> let me know.

OK, thanks.  Sorry, I got confused; I thought there was more.  Oh, yeah,
there is the issue of sending a query and not checking whether the
return is NULL.
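
The pattern in question looks roughly like this (a minimal libpq sketch
with a made-up helper name, not the actual pg_autovacuum code):

#include <stdio.h>
#include <libpq-fe.h>

/* PQexec() can return NULL (e.g. out of memory, lost connection), so
   the result must be tested before PQresultStatus() is consulted. */
static PGresult *
checked_query(PGconn *conn, const char *query)
{
    PGresult *res = PQexec(conn, query);

    if (res == NULL || PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
        PQclear(res);            /* PQclear(NULL) is a no-op */
        return NULL;
    }
    return res;
}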

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pg_autovacuum seems to be a neat freak and cleans way too much

From: Vivek Khera
>>>>> "BH" == Brian Hirt <bhirt@mobygames.com> writes:

BH> reltuples in pg_class is defined as a real; reltuples in
BH> pg_autovacuum is defined as an int.  The query used to get
BH> reltuples returns scientific notation for my large tables,
BH> '4.06927e+06' for the one I mention below.  pg_autovacuum happily
BH> converts that to 4 by doing atoi('4.06927e+06'), which is why it's
BH> all fubar for my large tables with over a million tuples.

Wow.  What a difference the CVS pg_autovacuum makes in this case.  I
was wondering why my large tables were vacuumed *every* iteration
through, even though I set the churn rate to 3.0.  It thought that my
150 million+ row table had 1 row in it!  With the latest pg_autovacuum
it gets the numbers right.  This one was worse than when we had the
overflow in the time computation causing it to sleep forever between
iterations... :-)

For anyone with large tables using pg_autovacuum: you *have* to update
to the latest version.

Thanks for starting this thread.  It really is making a big difference
in my performance, not having to run vacuum all the time.  I was
beginning to think my table churn was much worse than it really is...

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/