Thread: Help tuning autovacuum - seeing lots of relation bloat
Hey - I am running into a relation bloat problem which I believe is causing a fairly significant slowdown of my updates. I am using:

                                   version
-----------------------------------------------------------------------------
 PostgreSQL 8.1.4 on i586-trustix-linux-gnu, compiled by GCC gcc (GCC) 3.2.3

After about 12 hours of running, my updates are causing lots of reads and iowait (45%), slowing everything down. The DB bloats from 259MB to 2.4 - 3.4GB. The primary table in trouble is called target; it reaches a size of 834MB from its fresh 'vacuum full analyze' size of 39MB.

qradar=# select * from q_table_size;
           tablename            |  size
--------------------------------+---------
 target                         | 834.496

My configuration includes:

shared_buffers = 32767
work_mem = 20480
maintenance_work_mem = 32768
max_fsm_pages = 4024000
max_fsm_relations = 2000
fsync = false
wal_sync_method = fsync
wal_buffers = 4096
checkpoint_segments = 32
checkpoint_timeout = 1200
checkpoint_warning = 60
commit_delay = 5000
commit_siblings = 5
effective_cache_size = 175000
random_page_cost = 2
autovacuum = true
autovacuum_naptime = 60
autovacuum_vacuum_threshold = 500
autovacuum_analyze_threshold = 250
autovacuum_vacuum_scale_factor = 0.08
autovacuum_analyze_scale_factor = 0.08
#autovacuum_vacuum_cost_delay = 100
#autovacuum_vacuum_cost_limit = 100
default_statistics_target = 40

For the particular table I have pg_autovacuum overrides:

app=# select * from pg_autovacuum where vacrelid = 16603;
 vacrelid | enabled | vac_base_thresh | vac_scale_factor | anl_base_thresh | anl_scale_factor | vac_cost_delay | vac_cost_limit
----------+---------+-----------------+------------------+-----------------+------------------+----------------+----------------
    16603 | t       |             200 |             0.01 |             200 |             0.01 |              0 |            400

What I am seeing is that after about 12 hours, an update of a few thousand records takes 2+ minutes, as opposed to the 100ms it used to take. I can restore performance only by stopping everything, performing a vacuum full analyze and restarting. After the vacuum full, my table returns to the expected 250+ MB from the previous size:

qradar=# select * from q_table_size;
           tablename            |  size
--------------------------------+---------
 target                         | 841.536

I can see autovacuum in top every 60 seconds as configured, but it is there and gone within the 1-second refresh. My table grows consistently with every transaction regardless. To stop the growth, I had to perform a manual vacuum analyze; but at that point, performance is so poor I have to perform a vacuum full analyze.

Anyway, I am totally confused. My first cut at changing the autovacuum configuration followed Jim Nasby's advice of cutting all values in half, leaving my tables at roughly 20% dead space - for this table, that would be just over 50k tuples. This however yields the same results as the above configuration, with continuous bloat. So I went WAY more aggressive, as shown above, with no improvement. By calculation, Jim's advice should suffice for our system.

I just checked a production box which is running 8.1.1 and it is behaving as expected. That configuration only specifies "autovacuum = true"; everything else is left at the defaults.

Is there something whacked about my configuration? Is there a way I can troubleshoot what autovacuum is doing, or why it is not performing the work?

Here is the output for the vacuum full of target...
qradar=# vacuum full analyze verbose target;
INFO:  vacuuming "public.target"
INFO:  "target": found 5048468 removable, 266778 nonremovable row versions in 96642 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 140 to 144 bytes long.
There were 1696 unused item pointers.
Total free space (including removable row versions) is 730074628 bytes.
89347 pages are or will become empty, including 0 at the end of the table.
95261 pages containing 730030436 free bytes are potential move destinations.
CPU 2.31s/1.27u sec elapsed 6.46 sec.
INFO:  index "target_pkey" now contains 266778 row versions in 18991 pages
DETAIL:  5048468 index row versions were removed.
40 index pages have been deleted, 40 are currently reusable.
CPU 0.91s/5.29u sec elapsed 6.24 sec.
INFO:  index "target_network_key" now contains 266778 row versions in 15159 pages
DETAIL:  5048468 index row versions were removed.
30 index pages have been deleted, 30 are currently reusable.
CPU 0.45s/4.96u sec elapsed 5.43 sec.
INFO:  index "target_tulu_idx" now contains 266778 row versions in 19453 pages
DETAIL:  5048468 index row versions were removed.
17106 index pages have been deleted, 17106 are currently reusable.
CPU 0.79s/3.31u sec elapsed 4.10 sec.
INFO:  "target": moved 266719 row versions, truncated 96642 to 4851 pages
DETAIL:  CPU 5.19s/8.86u sec elapsed 14.27 sec.
INFO:  index "target_pkey" now contains 266778 row versions in 18991 pages
DETAIL:  266719 index row versions were removed.
41 index pages have been deleted, 41 are currently reusable.
CPU 0.78s/0.54u sec elapsed 1.32 sec.
INFO:  index "target_network_key" now contains 266778 row versions in 15159 pages
DETAIL:  266719 index row versions were removed.
31 index pages have been deleted, 31 are currently reusable.
CPU 0.49s/0.44u sec elapsed 0.93 sec.
INFO:  index "target_tulu_idx" now contains 266778 row versions in 19453 pages
DETAIL:  266719 index row versions were removed.
16726 index pages have been deleted, 16726 are currently reusable.
CPU 0.33s/0.38u sec elapsed 0.76 sec.
INFO:  analyzing "public.target"
INFO:  "target": scanned 4851 of 4851 pages, containing 266778 live rows and 0 dead rows; 12000 rows in sample, 266778 estimated total rows
VACUUM

A db-wide vacuum full outputs this at the end:

INFO:  free space map contains 32848 pages in 159 relations
DETAIL:  A total of 24192 page slots are in use (including overhead).
24192 page slots are required to track all free space.
Current limits are:  4024000 page slots, 2000 relations, using 23705 KB.

So, it appears my autovacuum is just NOT working... I must have screwed something up, but I cannot see what.

Thanks again.
Jody
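P.S. In case it helps anyone reading along: q_table_size is a custom view of ours. A minimal equivalent using the dbsize functions that became built-ins in 8.1 might look like this (a sketch only - the real view differs, and the names here are mine):

create or replace view q_table_size as
select relname as tablename,
       round(pg_relation_size(oid) / 1024.0 / 1024.0, 3) as size  -- size in MB
from pg_class
where relkind = 'r';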
> So, it appears my autovacuum is just NOT working... I must have screwed something up, but I cannot see what.

Is it possible that you have long running transactions? If yes, VACUUM is simply not efficient, as it won't eliminate the dead space accumulated during the long running transaction. In that case VACUUM FULL won't help you either, as it also can't eliminate dead space still visible to old transactions. But from what you say I guess you really do stop everything before doing VACUUM FULL, so you might as well have stopped the culprit transaction too... that's why the VACUUM FULL worked (if my assumption is correct).

To check if this is the case, look for "idle in transaction" in your process listing (ps auxww | grep "idle in transaction"). If you get one (or more) of those, you've found your problem. If not, hopefully others will help you :-)

Cheers,
Csaba.
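P.S. The same check from SQL, if you prefer it over ps (a sketch - on 8.1 this needs stats_command_string = on in postgresql.conf, otherwise current_query is not populated):

select procpid, usename, current_query
from pg_stat_activity
where current_query like '%IDLE% in transaction%';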
Csaba Nagy <nagy@ecircle-ag.com> writes:
>> So, it appears my autovacuum is just NOT working... I must have screwed something up, but I cannot see what.

> Is it possible that you have long running transactions ?

The other question I was wondering about is if autovacuum is actually choosing to vacuum the target table or not. The only way to check that in 8.1 is to crank log_min_messages up to DEBUG2 and then trawl through the postmaster log looking for "autovac" messages.

			regards, tom lane
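P.S. Concretely, something like this (a sketch - the data directory path is a placeholder, and DEBUG2 is very chatty, so revert it once you have what you need):

# in postgresql.conf
log_min_messages = debug2

$ pg_ctl reload -D /path/to/data
$ grep -i autovac /path/to/postmaster.log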
Our application is broken down quite well. We have two main writing processes writing to two separate sets of tables. No crossing over, nothing to prohibit the vacuuming in the manner which you describe.

My longest transactions on the tables in question are typically quite short, until of course they begin to bloat.

On Wednesday 21 June 2006 11:08, Csaba Nagy wrote:
> > So, it appears my autovacuum is just NOT working... I must have screwed something up, but I cannot see what.
>
> Is it possible that you have long running transactions? If yes, VACUUM
> is simply not efficient, as it won't eliminate the dead space
> accumulated during the long running transaction. In that case VACUUM
> FULL won't help you either, as it also can't eliminate dead space still
> visible to old transactions. But from what you say I guess you really do
> stop everything before doing VACUUM FULL, so you might as well have
> stopped the culprit transaction too... that's why the VACUUM FULL worked
> (if my assumption is correct).
>
> To check if this is the case, look for "idle in transaction" in your
> process listing (ps auxww | grep "idle in transaction"). If you get one
> (or more) of those, you've found your problem. If not, hopefully others
> will help you :-)
>
> Cheers,
> Csaba.
On Wed, 2006-06-21 at 17:27, jody brownell wrote:
> Our application is broken down quite well. We have two main writing processes
> writing to two separate sets of tables. No crossing over, nothing to prohibit the
> vacuuming in the manner which you describe.

It really doesn't matter what tables you are touching, and it doesn't matter if you read or write either; what matters is how long ago the last "begin" without a "commit" or "rollback" was. VACUUM will not touch tuples which were deleted after the oldest not-yet-finished transaction started, regardless of whether that transaction touched the vacuumed table in any way...

> My longest transactions on the tables in question are typically quite short, until
> of course they begin to bloat.

Well, your application might be completely well behaved and still your DBA (or your favorite DB access tool, for that matter) can leave open transactions in an interactive session. It never hurts to check if you actually have "idle in transaction" sessions. It happened a few times to us; some of those were bad coding in ad-hoc tools written by us, others were badly behaved DB access tools opening a transaction immediately after connect and after each successful command, effectively leaving an open transaction while the user was having lunch...

So it might very well be that some interactive or ad-hoc tools you're using to manage the DB are your problem.

Cheers,
Csaba.
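P.S. You can watch this happen with two psql sessions (a sketch - table name and row count are made up):

-- session 1: open a transaction and leave it idle
begin;
select 1;

-- session 2: create some dead rows, then vacuum
create table bloat_test (id int);
insert into bloat_test select generate_series(1, 100000);
delete from bloat_test;
vacuum verbose bloat_test;
-- reports "100000 dead row versions cannot be removed yet" while
-- session 1 stays open; once session 1 commits or rolls back, a
-- second vacuum can reclaim them.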
That is interesting. There is one thread keeping a transaction open, it appears from ps:

postgres: app app xxx(42644) idle in transaction

However, I created a test table "t" not configured in pg_autovacuum. I inserted a whack of rows and saw this:

Jun 21 12:38:45 vanquish postgres[1525]: [8-1] LOG:  autovacuum: processing database "qradar"
Jun 21 12:38:45 vanquish postgres[1525]: [9-1] DEBUG:  autovac: will VACUUM ANALYZE t
Jun 21 12:38:45 vanquish postgres[1525]: [10-1] DEBUG:  vacuuming "public.t"
Jun 21 12:38:48 vanquish postgres[1525]: [11-1] DEBUG:  "t": removed 8104311 row versions in 51620 pages
Jun 21 12:38:48 vanquish postgres[1525]: [11-2] DETAIL:  CPU 0.93s/0.70u sec elapsed 1.70 sec.
Jun 21 12:38:48 vanquish postgres[1525]: [12-1] DEBUG:  "t": found 8104311 removable, 0 nonremovable row versions in 51620 pages
Jun 21 12:38:48 vanquish postgres[1525]: [12-2] DETAIL:  0 dead row versions cannot be removed yet.

This was followed a little later (after I did a similar insert op on target) by this:

Jun 21 13:00:46 vanquish postgres[3311]: [12-1] LOG:  autovacuum: processing database "qradar"
Jun 21 13:00:46 vanquish postgres[3311]: [13-1] DEBUG:  autovac: will VACUUM target
Jun 21 13:00:46 vanquish postgres[3311]: [14-1] DEBUG:  vacuuming "public.target"
Jun 21 13:01:51 vanquish postgres[3311]: [15-1] DEBUG:  index "target_pkey" now contains 1296817 row versions in 25116 pages
Jun 21 13:01:51 vanquish postgres[3311]: [15-2] DETAIL:  5645230 index row versions were removed.
Jun 21 13:01:51 vanquish postgres[3311]: [15-3] ^I116 index pages have been deleted, 60 are currently reusable.
Jun 21 13:01:51 vanquish postgres[3311]: [15-4] ^ICPU 1.29s/7.44u sec elapsed 48.65 sec.
Jun 21 13:02:19 vanquish postgres[3311]: [16-1] DEBUG:  index "target_network_key" now contains 1296817 row versions in 19849 pages
Jun 21 13:02:19 vanquish postgres[3311]: [16-2] DETAIL:  5645230 index row versions were removed.
Jun 21 13:02:19 vanquish postgres[3311]: [16-3] ^I32 index pages have been deleted, 0 are currently reusable.
Jun 21 13:02:19 vanquish postgres[3311]: [16-4] ^ICPU 0.89s/6.61u sec elapsed 27.77 sec.
Jun 21 13:02:47 vanquish postgres[3311]: [17-1] DEBUG:  index "target_network_details_id_idx" now contains 1296817 row versions in 23935 pages
Jun 21 13:02:47 vanquish postgres[3311]: [17-2] DETAIL:  5645230 index row versions were removed.
Jun 21 13:02:47 vanquish postgres[3311]: [17-3] ^I17814 index pages have been deleted, 0 are currently reusable.
Jun 21 13:02:47 vanquish postgres[3311]: [17-4] ^ICPU 0.93s/7.52u sec elapsed 27.36 sec.
Jun 21 13:03:23 vanquish postgres[3311]: [18-1] DEBUG:  index "target_tulu_idx" now contains 1296817 row versions in 24341 pages
Jun 21 13:03:23 vanquish postgres[3311]: [18-2] DETAIL:  5645230 index row versions were removed.
Jun 21 13:03:23 vanquish postgres[3311]: [18-3] ^I18495 index pages have been deleted, 0 are currently reusable.
Jun 21 13:03:23 vanquish postgres[3311]: [18-4] ^ICPU 1.37s/5.38u sec elapsed 36.95 sec.
Jun 21 13:04:04 vanquish postgres[3311]: [19-1] DEBUG:  "target": removed 5645231 row versions in 106508 pages
Jun 21 13:04:04 vanquish postgres[3311]: [19-2] DETAIL:  CPU 3.37s/1.23u sec elapsed 40.63 sec.
Jun 21 13:04:04 vanquish postgres[3311]: [20-1] DEBUG:  "target": found 5645231 removable, 1296817 nonremovable row versions in 114701 pages
Jun 21 13:04:04 vanquish postgres[3311]: [20-2] DETAIL:  0 dead row versions cannot be removed yet.

This was with the "idle in transaction" though.....

Ah HA!
Wondering: my autovacuum naptime is 60 seconds, which is also the interval at which I wake up and begin persistence. Wondering if I am simply locking autovacuum out of the tables because they are on a similar timeline.

I will try a 30 second naptime; if this is it, that should increase the likelihood of falling on the right side of the TX more often.

Make sense?

On Wednesday 21 June 2006 12:42, Csaba Nagy wrote:
> On Wed, 2006-06-21 at 17:27, jody brownell wrote:
> > Our application is broken down quite well. We have two main writing processes
> > writing to two separate sets of tables. No crossing over, nothing to prohibit the
> > vacuuming in the manner which you describe.
>
> It really doesn't matter what tables you are touching, and it doesn't
> matter if you read or write either; what matters is how long ago the
> last "begin" without a "commit" or "rollback" was. VACUUM will not touch
> tuples which were deleted after the oldest not-yet-finished transaction
> started, regardless of whether that transaction touched the vacuumed
> table in any way...
>
> > My longest transactions on the tables in question are typically quite short, until
> > of course they begin to bloat.
>
> Well, your application might be completely well behaved and still your
> DBA (or your favorite DB access tool, for that matter) can leave open
> transactions in an interactive session. It never hurts to check if you
> actually have "idle in transaction" sessions. It happened a few times to
> us; some of those were bad coding in ad-hoc tools written by us, others
> were badly behaved DB access tools opening a transaction immediately
> after connect and after each successful command, effectively leaving an
> open transaction while the user was having lunch...
>
> So it might very well be that some interactive or ad-hoc tools you're
> using to manage the DB are your problem.
>
> Cheers,
> Csaba.
Oops - that was confusing. The idle in transaction was from one box and the autovacuum was from another.

So, one question was answered: autovacuum is running and selecting the tables, but apparently not at the same time as my app, probably due to this "idle in transaction". I will track it down and see what the difference is.

thanks

On Wednesday 21 June 2006 13:21, jody brownell wrote:
> That is interesting. There is one thread keeping a transaction open, it appears from ps:
>
> postgres: app app xxx(42644) idle in transaction
>
> However, I created a test table "t" not configured in pg_autovacuum. I inserted a whack of rows and saw this:
>
> Jun 21 12:38:45 vanquish postgres[1525]: [8-1] LOG:  autovacuum: processing database "qradar"
> Jun 21 12:38:45 vanquish postgres[1525]: [9-1] DEBUG:  autovac: will VACUUM ANALYZE t
> Jun 21 12:38:45 vanquish postgres[1525]: [10-1] DEBUG:  vacuuming "public.t"
> Jun 21 12:38:48 vanquish postgres[1525]: [11-1] DEBUG:  "t": removed 8104311 row versions in 51620 pages
> Jun 21 12:38:48 vanquish postgres[1525]: [11-2] DETAIL:  CPU 0.93s/0.70u sec elapsed 1.70 sec.
> Jun 21 12:38:48 vanquish postgres[1525]: [12-1] DEBUG:  "t": found 8104311 removable, 0 nonremovable row versions in 51620 pages
> Jun 21 12:38:48 vanquish postgres[1525]: [12-2] DETAIL:  0 dead row versions cannot be removed yet.
>
> This was followed a little later (after I did a similar insert op on target) by this:
>
> Jun 21 13:00:46 vanquish postgres[3311]: [12-1] LOG:  autovacuum: processing database "qradar"
> Jun 21 13:00:46 vanquish postgres[3311]: [13-1] DEBUG:  autovac: will VACUUM target
> Jun 21 13:00:46 vanquish postgres[3311]: [14-1] DEBUG:  vacuuming "public.target"
> Jun 21 13:01:51 vanquish postgres[3311]: [15-1] DEBUG:  index "target_pkey" now contains 1296817 row versions in 25116 pages
> Jun 21 13:01:51 vanquish postgres[3311]: [15-2] DETAIL:  5645230 index row versions were removed.
> Jun 21 13:01:51 vanquish postgres[3311]: [15-3] ^I116 index pages have been deleted, 60 are currently reusable.
> Jun 21 13:01:51 vanquish postgres[3311]: [15-4] ^ICPU 1.29s/7.44u sec elapsed 48.65 sec.
> Jun 21 13:02:19 vanquish postgres[3311]: [16-1] DEBUG:  index "target_network_key" now contains 1296817 row versions in 19849 pages
> Jun 21 13:02:19 vanquish postgres[3311]: [16-2] DETAIL:  5645230 index row versions were removed.
> Jun 21 13:02:19 vanquish postgres[3311]: [16-3] ^I32 index pages have been deleted, 0 are currently reusable.
> Jun 21 13:02:19 vanquish postgres[3311]: [16-4] ^ICPU 0.89s/6.61u sec elapsed 27.77 sec.
> Jun 21 13:02:47 vanquish postgres[3311]: [17-1] DEBUG:  index "target_network_details_id_idx" now contains 1296817 row versions in 23935 pages
> Jun 21 13:02:47 vanquish postgres[3311]: [17-2] DETAIL:  5645230 index row versions were removed.
> Jun 21 13:02:47 vanquish postgres[3311]: [17-3] ^I17814 index pages have been deleted, 0 are currently reusable.
> Jun 21 13:02:47 vanquish postgres[3311]: [17-4] ^ICPU 0.93s/7.52u sec elapsed 27.36 sec.
> Jun 21 13:03:23 vanquish postgres[3311]: [18-1] DEBUG:  index "target_tulu_idx" now contains 1296817 row versions in 24341 pages
> Jun 21 13:03:23 vanquish postgres[3311]: [18-2] DETAIL:  5645230 index row versions were removed.
> Jun 21 13:03:23 vanquish postgres[3311]: [18-3] ^I18495 index pages have been deleted, 0 are currently reusable.
> Jun 21 13:03:23 vanquish postgres[3311]: [18-4] ^ICPU 1.37s/5.38u sec elapsed 36.95 sec.
> Jun 21 13:04:04 vanquish postgres[3311]: [19-1] DEBUG:  "target": removed 5645231 row versions in 106508 pages
> Jun 21 13:04:04 vanquish postgres[3311]: [19-2] DETAIL:  CPU 3.37s/1.23u sec elapsed 40.63 sec.
> Jun 21 13:04:04 vanquish postgres[3311]: [20-1] DEBUG:  "target": found 5645231 removable, 1296817 nonremovable row versions in 114701 pages
> Jun 21 13:04:04 vanquish postgres[3311]: [20-2] DETAIL:  0 dead row versions cannot be removed yet.
>
> This was with the "idle in transaction" though.....
>
> Ah HA! Wondering: my autovacuum naptime is 60 seconds, which is also the interval at which I wake up and begin persistence.
> Wondering if I am simply locking autovacuum out of the tables because they are on a similar timeline.
>
> I will try a 30 second naptime; if this is it, that should increase the likelihood of falling on the right side of the TX more often.
>
> Make sense?
>
> On Wednesday 21 June 2006 12:42, Csaba Nagy wrote:
> > [snip]
On Wed, 2006-06-21 at 18:21, jody brownell wrote:
> That is interesting. There is one thread keeping a transaction open, it appears from ps:
>
> postgres: app app xxx(42644) idle in transaction

That shouldn't be a problem in itself; "idle in transaction" happens all the time between 2 commands in the same transaction... you only have a problem if you see the same PID always idle, which means somebody left an open transaction and went for lunch.

[snip]
> This was with the "idle in transaction" though.....

This probably means you don't have long running transactions currently. However, if you happen to have just one such long transaction, the dead space accumulates and a normal vacuum will not be able to clean it anymore. But I guess if you didn't find one now, then you should take a look at Tom's suggestion and bump up the debug level to see if autovacuum picks your table at all...

> Ah HA! Wondering: my autovacuum naptime is 60 seconds, which is also the interval at which I wake up and begin persistence.
> Wondering if I am simply locking autovacuum out of the tables because they are on a similar timeline.
>
> I will try a 30 second naptime; if this is it, that should increase the likelihood of falling on the right side of the TX more often.
>
> Make sense?

I don't think that's your problem... vacuum wouldn't be locked out by any activity which doesn't exclusively lock the table (and I guess you're not doing that). If your persistence finishes quickly, then that's not the problem.

Oh, it just occurred to me... in order to use autovacuum you also need to enable the statistics collector at row level:

stats_start_collector = on
stats_row_level = on

See also:
http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM

These were not mentioned in the settings in your original post, so I guess you didn't touch them, and I think they are disabled by default. If they are disabled, you should enable them and "pg_ctl reload ...."; that should fix the problem.

Cheers,
Csaba.
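P.S. A quick sanity check that the collector is actually feeding autovacuum (a sketch): with stats_row_level on, these counters should climb while your app writes; if they stay flat, autovacuum has nothing to base its decisions on.

select relname, n_tup_ins, n_tup_upd, n_tup_del
from pg_stat_user_tables
where relname = 'target';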
That is exactly what I am seeing: one process, no change, always idle, while the others are constantly changing their state.

Looks like someone opened a tx and is then blocking on a queue lock or something. Dang.

On Wednesday 21 June 2006 13:36, Csaba Nagy wrote:
> On Wed, 2006-06-21 at 18:21, jody brownell wrote:
> > That is interesting. There is one thread keeping a transaction open, it appears from ps:
> >
> > postgres: app app xxx(42644) idle in transaction
>
> That shouldn't be a problem in itself; "idle in transaction" happens all
> the time between 2 commands in the same transaction... you only have a
> problem if you see the same PID always idle, which means somebody left
> an open transaction and went for lunch.
>
> [snip]
> > This was with the "idle in transaction" though.....
>
> This probably means you don't have long running transactions currently.
> However, if you happen to have just one such long transaction, the dead
> space accumulates and a normal vacuum will not be able to clean it
> anymore. But I guess if you didn't find one now, then you should take a
> look at Tom's suggestion and bump up the debug level to see if autovacuum
> picks your table at all...
>
> > Ah HA! Wondering: my autovacuum naptime is 60 seconds, which is also the interval at which I wake up and begin persistence.
> > Wondering if I am simply locking autovacuum out of the tables because they are on a similar timeline.
> >
> > I will try a 30 second naptime; if this is it, that should increase the likelihood of falling on the right side of the TX more often.
> >
> > Make sense?
>
> I don't think that's your problem... vacuum wouldn't be locked out by
> any activity which doesn't exclusively lock the table (and I guess
> you're not doing that). If your persistence finishes quickly, then that's
> not the problem.
>
> Oh, it just occurred to me... in order to use autovacuum you also need to
> enable the statistics collector at row level:
>
> stats_start_collector = on
> stats_row_level = on
>
> See also:
> http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM
>
> These were not mentioned in the settings in your original post, so I guess
> you didn't touch them, and I think they are disabled by default. If they
> are disabled, you should enable them and "pg_ctl reload ...."; that should
> fix the problem.
>
> Cheers,
> Csaba.
On Wed, 2006-06-21 at 18:39, jody brownell wrote:
> That is exactly what I am seeing: one process, no change, always idle, while the others are
> constantly changing their state.
>
> Looks like someone opened a tx and is then blocking on a queue lock or something. Dang.

Don't forget to check the statistics collector settings (see below); if they are not correct, then autovacuum is indeed not working correctly... I should have put that at the beginning of the mail so you wouldn't overlook it ;-)

> > Oh, it just occurred to me... in order to use autovacuum you also need to
> > enable the statistics collector at row level:
> >
> > stats_start_collector = on
> > stats_row_level = on
> >
> > See also:
> > http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM
> >
> > These were not mentioned in the settings in your original post, so I guess
> > you didn't touch them, and I think they are disabled by default. If they
> > are disabled, you should enable them and "pg_ctl reload ...."; that should
> > fix the problem.
> >
> > Cheers,
> > Csaba.
block and row are always configured on - they are my friend :)

thanks

On Wednesday 21 June 2006 13:44, Csaba Nagy wrote:
> On Wed, 2006-06-21 at 18:39, jody brownell wrote:
> > That is exactly what I am seeing: one process, no change, always idle, while the others are
> > constantly changing their state.
> >
> > Looks like someone opened a tx and is then blocking on a queue lock or something. Dang.
>
> Don't forget to check the statistics collector settings (see below); if
> they are not correct, then autovacuum is indeed not working correctly...
> I should have put that at the beginning of the mail so you wouldn't
> overlook it ;-)
>
> > > Oh, it just occurred to me... in order to use autovacuum you also need to
> > > enable the statistics collector at row level:
> > >
> > > stats_start_collector = on
> > > stats_row_level = on
> > >
> > > See also:
> > > http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM
> > >
> > > These were not mentioned in the settings in your original post, so I guess
> > > you didn't touch them, and I think they are disabled by default. If they
> > > are disabled, you should enable them and "pg_ctl reload ...."; that should
> > > fix the problem.
> > >
> > > Cheers,
> > > Csaba.
On Wed, Jun 21, 2006 at 10:52:42AM -0300, jody brownell wrote:
> A db-wide vacuum full outputs this at the end:
>
> INFO:  free space map contains 32848 pages in 159 relations
> DETAIL:  A total of 24192 page slots are in use (including overhead).
> 24192 page slots are required to track all free space.
> Current limits are:  4024000 page slots, 2000 relations, using 23705 KB.

FWIW, the tail end of a db-wide VACUUM FULL doesn't provide useful info about FSM utilization, because it just made everything as compact as possible.

My suspicion is that it's taking too long for autovac to get around to this database/table. Dropping the sleep time might help. I see that this table is vacuumed with a delay setting of 0, but if there are other tables with a high delay, that could pose a problem.

Getting detailed output of what autovac is actually doing, as Tom suggested, would be a good idea.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
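P.S. In postgresql.conf terms, something like this (a sketch - the right values depend on how quickly the dead rows accumulate):

autovacuum_naptime = 30               # wake up and look for work more often
autovacuum_vacuum_cost_delay = 0      # don't throttle the vacuums that do run

together with the vac_cost_delay = 0 you already have in pg_autovacuum for this table.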
On Wed, Jun 21, 2006 at 01:21:05PM -0300, jody brownell wrote:
> Jun 21 13:04:04 vanquish postgres[3311]: [19-1] DEBUG:  "target": removed 5645231 row versions in 106508 pages
> Jun 21 13:04:04 vanquish postgres[3311]: [19-2] DETAIL:  CPU 3.37s/1.23u sec elapsed 40.63 sec.
> Jun 21 13:04:04 vanquish postgres[3311]: [20-1] DEBUG:  "target": found 5645231 removable, 1296817 nonremovable row versions in 114701 pages
> Jun 21 13:04:04 vanquish postgres[3311]: [20-2] DETAIL:  0 dead row versions cannot be removed yet.

So the table contained 5.6M dead rows and 1.3M live rows.

I think you should forget about having autovacuum keep this table in check and add manual vacuum commands to your code. Autovac is intended to deal with 99% of use cases; this is pretty clearly in the 1% it can't handle.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
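P.S. I.e., after each persistence batch commits, have the writer issue something like (a sketch - VACUUM must run outside any transaction block, and a plain VACUUM takes no exclusive lock, so readers and writers keep going):

vacuum analyze target;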
OK... this was over a 12 - 16 hour period of not having anything done with it, though, right?

I am assuming that if autovacuum were active throughout that period, we would be somewhat better off... is that not accurate?

On Wednesday 21 June 2006 16:38, Jim C. Nasby wrote:
> 5
BTW, in production with a similar load, autovacuum with default out-of-the-box settings seems to work quite well...

I double-checked this earlier today.

On Wednesday 21 June 2006 16:38, Jim C. Nasby wrote:
> 5
On Wed, Jun 21, 2006 at 04:41:45PM -0300, jody brownell wrote:
> BTW, in production with a similar load, autovacuum with default out-of-the-box
> settings seems to work quite well...
>
> I double-checked this earlier today.

So what's different between production and the machine with the problem?

The issue with autovac is that it will only vacuum one table at a time, so if it's off vacuuming some other table for a long period of time, it won't be touching this table, which will be a problem. Now, if that's actually what's happening...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Well, for one, we did introduce a TX leak which was preventing autovac from running. I guess that was _the_ issue. I have since fixed it and am now testing... looks much better, nothing concerning... (fingers crossed until morning :)). The debug logs are full of vac/anal of the tables... so, for now, I am back on track moving forward...

Now that autovac is actually running, the box does feel slightly more sluggish.

BTW - as soon as we deliver to QA, I will post the test case for the memory leak I was seeing the other day. (I have not forgotten, I am just swamped.)

Thanks for the help all. Much appreciated. Cheers.

On Wednesday 21 June 2006 19:11, Jim C. Nasby wrote:
> On Wed, Jun 21, 2006 at 04:41:45PM -0300, jody brownell wrote:
> > BTW, in production with a similar load, autovacuum with default out-of-the-box
> > settings seems to work quite well...
> >
> > I double-checked this earlier today.
>
> So what's different between production and the machine with the problem?
>
> The issue with autovac is that it will only vacuum one table at a time,
> so if it's off vacuuming some other table for a long period of time, it
> won't be touching this table, which will be a problem. Now, if that's
> actually what's happening...
Hi, Csaba,

Csaba Nagy wrote:
> Well, your application might be completely well behaved and still your
> DBA (or your favorite DB access tool, for that matter) can leave open
> transactions in an interactive session. It never hurts to check if you
> actually have "idle in transaction" sessions. It happened a few times to
> us; some of those were bad coding in ad-hoc tools written by us, others
> were badly behaved DB access tools opening a transaction immediately
> after connect and after each successful command, effectively leaving an
> open transaction while the user was having lunch...

Some older JDBC driver versions had a bug where they always kept an open transaction, so an application server with some pooled connections lingering around could block vacuum forever.

Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Jim C. Nasby wrote:
> On Wed, Jun 21, 2006 at 01:21:05PM -0300, jody brownell wrote:
>> Jun 21 13:04:04 vanquish postgres[3311]: [19-1] DEBUG:  "target": removed 5645231 row versions in 106508 pages
>> Jun 21 13:04:04 vanquish postgres[3311]: [19-2] DETAIL:  CPU 3.37s/1.23u sec elapsed 40.63 sec.
>> Jun 21 13:04:04 vanquish postgres[3311]: [20-1] DEBUG:  "target": found 5645231 removable, 1296817 nonremovable row versions in 114701 pages
>> Jun 21 13:04:04 vanquish postgres[3311]: [20-2] DETAIL:  0 dead row versions cannot be removed yet.
>
> So the table contained 5.6M dead rows and 1.3M live rows.
>
> I think you should forget about having autovacuum keep this table
> in check and add manual vacuum commands to your code. Autovac is
> intended to deal with 99% of use cases; this is pretty clearly in the 1%
> it can't handle.

Maybe your free space map is configured too small; can you watch out for log messages telling you to increase it?

HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org