Thread: Re: pg_autovacuum seems to be a neat freak and cleans way too much

Re: pg_autovacuum seems to be a neat freak and cleans way too much

From
Brian Hirt
Date:
I'm following up on my own email and cross-posting to hackers, because
there is a bug that needs to be fixed.  I spent some more time digging into
this, and I found the cause of the problem.

reltuples in pg_class is defined as a real, but reltuples in pg_autovacuum
is defined as an int.  The query used to fetch reltuples returns
scientific notation for my large tables -- '4.06927e+06' for the one I
mention below.  pg_autovacuum happily converts that to 4 by doing
atoi('4.06927e+06'), which is why it's all fubar for my large tables
with over a million tuples.
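
For illustration (this is just a standalone sketch, not code from
pg_autovacuum): atoi() stops at the first character that can't be part of
an integer, so the exponent form collapses to 4, while a floating-point
parse recovers the real value:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    /* the text libpq hands back for reltuples on a large table */
    const char *reltuples = "4.06927e+06";

    int    as_int    = atoi(reltuples);         /* stops at '.', yields 4 */
    double as_double = strtod(reltuples, NULL); /* parses the exponent, yields 4069270 */

    printf("atoi:   %d\n", as_int);
    printf("strtod: %.0f\n", as_double);
    return 0;
}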

My quick hack of changing the defines in pg_autovacuum.h (line 37) to cast
reltuples to ::int4 makes it work:

#define TABLE_STATS_QUERY "select a.oid,a.relname,a.relnamespace,a.relpages,a.relisshared,a.reltuples::int4,b.schemaname,b.n_tup_ins,b.n_tup_upd,b.n_tup_del from pg_class a, pg_stat_all_tables b where a.oid=b.relid and a.relkind = 'r'"

#define PAGES_QUERY "select oid,reltuples::int4,relpages from pg_class where oid=%i"

However, I think a better fix would be to change pg_autovacuum to use a
double instead of an int.  If it's going to stay an integer, it should
probably be widened to a long and the casts changed to ::int8.
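
Roughly what I have in mind -- the names below are hypothetical, just to
sketch the idea; the real change would go into the table-info struct and
the result parsing in pg_autovacuum.c/.h:

#include <stdlib.h>

/* hypothetical stand-in for the per-table struct in pg_autovacuum.h */
typedef struct
{
    double reltuples;   /* was: int reltuples; */
    int    relpages;
    /* ... other fields ... */
} tbl_info;

/*
 * Store the reltuples column from the stats query.  'value' is the text
 * form of the column, e.g. "4.06927e+06".
 */
static void
set_reltuples(tbl_info *tbl, const char *value)
{
    /* strtod handles scientific notation; atoi would truncate it to 4 */
    tbl->reltuples = strtod(value, NULL);
}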

Any suggestions on the best way to fix this?

I'll supply a patch once the approach is agreed upon and the problem
has been verified.


best regards,

--brian

On May 18, 2004, at 7:37 PM, Brian Hirt wrote:

> I'm having a strange issue with pg_autovacuum.  I have a table with
> about 4 million rows in 20,000 pages.  autovacuum likes to vacuum
> and/or analyze it every 45 minutes or so, but it probably doesn't
> have more than a few hundred rows changed every few hours.  When I
> run autovacuum with -d3 it says:
>
> [2004-05-18 07:04:26 PM]   table name:
> basement_nightly."public"."search_words4"
> [2004-05-18 07:04:26 PM]      relid: 396238832;   relisshared: 0
> [2004-05-18 07:04:26 PM]      reltuples: 4;  relpages: 20013
> [2004-05-18 07:04:26 PM]      curr_analyze_count:  0;
> cur_delete_count:   0
> [2004-05-18 07:04:26 PM]      ins_at_last_analyze: 0;
> del_at_last_vacuum: 0
> [2004-05-18 07:04:26 PM]      insert_threshold:    504;
> delete_threshold    1008
>
> reltuples: 4 seems wrong.  I would expect a table with 4M rows and 20k
> pages to have more than 4 tuples.  I think this is why the insert
> threshold is all messed up -- which is why it gets analyzed way too
> frequently.
>
> This happens with other big tables too.  The pg_autovacuum is from
> 7.4.2; some information is below.
>
>
> output from vacuum:
>
> basement=# vacuum ANALYZE verbose search_words4;
> INFO:  vacuuming "public.search_words4"
> INFO:  index "search_words4_data_id" now contains 4069268 row versions
> in 15978 pages
> DETAIL:  479 index row versions were removed.
> 1 index pages have been deleted, 0 are currently reusable.
> CPU 0.42s/0.70u sec elapsed 29.48 sec.
> INFO:  index "search_words4_pkey" now contains 4069268 row versions in
> 17576 pages
> DETAIL:  479 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.77s/0.74u sec elapsed 150.19 sec.
> INFO:  "search_words4": removed 479 row versions in 6 pages
> DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  "search_words4": found 479 removable, 4069268 nonremovable row
> versions in 19950 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 1.30s/1.61u sec elapsed 179.96 sec.
> INFO:  analyzing "public.search_words4"
> INFO:  "search_words4": 19950 pages, 3000 rows sampled, 4069800
> estimated total rows
> VACUUM
> basement=#
>
>
>
> here's the frequency
> [2004-05-18 12:12:54 PM] Performing: ANALYZE "public"."search_words4"
> [2004-05-18 01:59:13 PM] Performing: ANALYZE "public"."search_words4"
> [2004-05-18 02:05:36 PM] Performing: ANALYZE "public"."search_words4"
> [2004-05-18 02:29:25 PM] Performing: VACUUM ANALYZE
> "public"."search_words4"
> [2004-05-18 02:46:09 PM] Performing: ANALYZE "public"."search_words4"
> [2004-05-18 03:39:31 PM] Performing: ANALYZE "public"."search_words4"
> [2004-05-18 05:20:45 PM] Performing: ANALYZE "public"."search_words4"
> [2004-05-18 06:08:03 PM] Performing: VACUUM ANALYZE
> "public"."search_words4"
> [2004-05-18 06:18:34 PM] Performing: ANALYZE "public"."search_words4"
> [2004-05-18 07:34:27 PM] Performing: ANALYZE "public"."search_words4"
> [2004-05-18 07:43:18 PM] Performing: ANALYZE "public"."search_words4"


Re: pg_autovacuum seems to be a neat freak and cleans way too much

From
Bruce Momjian
Date:
I think we already fixed that in 7.4.2.  We also have a few bugs still
in 7.4.2 and we need to get those fixed soon and release 7.4.3.


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pg_autovacuum seems to be a neat freak and cleans way too much

From
Brian Hirt
Date:
There might have been another, similar bug that was fixed in 7.4.2.

I just double-checked the 7.4.2 tarball, and it still has this problem.

You might want to check whether it's fixed in 7.4.3, or I can grab CVS and
check it if you like.


On May 18, 2004, at 8:06 PM, Bruce Momjian wrote:

>
> I think we already fixed that in 7.4.2.  We also have a few bugs still
> in 7.4.2 and we need to get those fixed soon and release 7.4.3.



Re: pg_autovacuum seems to be a neat freak and cleans way too much

From
"Matthew T. O'Connor"
Date:
On Tue, 2004-05-18 at 22:21, Brian Hirt wrote:
> there might be another similar bug that was fixed in 7.4.2

This bug is fixed, but it didn't make it into 7.4.2; it is in CVS (both the
7.4 branch and HEAD).  Please grab pg_autovacuum.c and .h from CVS, and if
that doesn't fix it, please let me know.

Thanks,

Matthew