Thread: Help tuning autovacuum - seeing lots of relation bloat
Hey - I am running into a relation bloat problem which I believe is causing a fairly significant slowdown of my updates. I am using:

                                   version
-----------------------------------------------------------------------------
 PostgreSQL 8.1.4 on i586-trustix-linux-gnu, compiled by GCC gcc (GCC) 3.2.3

After about 12 hours of running, my updates are causing lots of reads and iowait (45%), slowing everything down. The DB bloats from 259MB to 2.4 - 3.4GB. The primary table in trouble is called target; it reaches a size of 834MB from its fresh 'vacuum full analyze' size of 39MB.

qradar=# select * from q_table_size;
           tablename            |  size
--------------------------------+---------
 target                         | 834.496

My configuration includes:

shared_buffers = 32767
work_mem = 20480
maintenance_work_mem = 32768
max_fsm_pages = 4024000
max_fsm_relations = 2000
fsync = false
wal_sync_method = fsync
wal_buffers = 4096
checkpoint_segments = 32
checkpoint_timeout = 1200
checkpoint_warning = 60
commit_delay = 5000
commit_siblings = 5
effective_cache_size = 175000
random_page_cost = 2
autovacuum = true
autovacuum_naptime = 60
autovacuum_vacuum_threshold = 500
autovacuum_analyze_threshold = 250
autovacuum_vacuum_scale_factor = 0.08
autovacuum_analyze_scale_factor = 0.08
#autovacuum_vacuum_cost_delay = 100
#autovacuum_vacuum_cost_limit = 100
default_statistics_target = 40

For the particular table I have pg_autovacuum overrides:

app=# select * from pg_autovacuum where vacrelid = 16603;
 vacrelid | enabled | vac_base_thresh | vac_scale_factor | anl_base_thresh | anl_scale_factor | vac_cost_delay | vac_cost_limit
----------+---------+-----------------+------------------+-----------------+------------------+----------------+----------------
    16603 | t       |             200 |             0.01 |             200 |             0.01 |              0 |            400

What I am seeing is that after about 12 hours, an update of a few thousand records takes 2+ minutes, as opposed to the 100ms it used to take. I can restore performance only by stopping everything, performing a vacuum full analyze and restarting. After the vacuum full, my table returns to the expected 250+ MB from the previous size:

qradar=# select * from q_table_size;
           tablename            |  size
--------------------------------+---------
 target                         | 841.536

I can see autovacuum in top every 60 seconds as configured, but it is there and gone within the 1-second refresh. My table grows consistently with every transaction regardless. To stop the growth, I had to perform a manual vacuum analyze; but at that point, performance is so poor I have to perform a vacuum full analyze.

Anyway, I am totally confused. My first cut at changing the autovacuum configuration followed Jim Nasby's advice of cutting all values in half, leaving my tables at roughly 20% dead space - for this table, that would be just over 50k tuples. This however yields the same results as the above configuration, with continuous bloat. So I went WAY more aggressive, as shown above, with no improvement. By calculation, Jim's advice should suffice for our system.

I just checked a production box which is running 8.1.1 and it is behaving as expected. That configuration only specifies "autovacuum = true"; everything else is left at the defaults.

Is there something whacked about my configuration? Is there a way I can troubleshoot what autovacuum is doing, or why it is not performing the work?

Here is the output for the vacuum full of target...
qradar=# vacuum full analyze verbose target;
INFO:  vacuuming "public.target"
INFO:  "target": found 5048468 removable, 266778 nonremovable row versions in 96642 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 140 to 144 bytes long.
There were 1696 unused item pointers.
Total free space (including removable row versions) is 730074628 bytes.
89347 pages are or will become empty, including 0 at the end of the table.
95261 pages containing 730030436 free bytes are potential move destinations.
CPU 2.31s/1.27u sec elapsed 6.46 sec.
INFO:  index "target_pkey" now contains 266778 row versions in 18991 pages
DETAIL:  5048468 index row versions were removed.
40 index pages have been deleted, 40 are currently reusable.
CPU 0.91s/5.29u sec elapsed 6.24 sec.
INFO:  index "target_network_key" now contains 266778 row versions in 15159 pages
DETAIL:  5048468 index row versions were removed.
30 index pages have been deleted, 30 are currently reusable.
CPU 0.45s/4.96u sec elapsed 5.43 sec.
INFO:  index "target_tulu_idx" now contains 266778 row versions in 19453 pages
DETAIL:  5048468 index row versions were removed.
17106 index pages have been deleted, 17106 are currently reusable.
CPU 0.79s/3.31u sec elapsed 4.10 sec.
INFO:  "target": moved 266719 row versions, truncated 96642 to 4851 pages
DETAIL:  CPU 5.19s/8.86u sec elapsed 14.27 sec.
INFO:  index "target_pkey" now contains 266778 row versions in 18991 pages
DETAIL:  266719 index row versions were removed.
41 index pages have been deleted, 41 are currently reusable.
CPU 0.78s/0.54u sec elapsed 1.32 sec.
INFO:  index "target_network_key" now contains 266778 row versions in 15159 pages
DETAIL:  266719 index row versions were removed.
31 index pages have been deleted, 31 are currently reusable.
CPU 0.49s/0.44u sec elapsed 0.93 sec.
INFO:  index "target_tulu_idx" now contains 266778 row versions in 19453 pages
DETAIL:  266719 index row versions were removed.
16726 index pages have been deleted, 16726 are currently reusable.
CPU 0.33s/0.38u sec elapsed 0.76 sec.
INFO:  analyzing "public.target"
INFO:  "target": scanned 4851 of 4851 pages, containing 266778 live rows and 0 dead rows; 12000 rows in sample, 266778 estimated total rows
VACUUM

A db-wide vacuum full outputs this at the end:

INFO:  free space map contains 32848 pages in 159 relations
DETAIL:  A total of 24192 page slots are in use (including overhead).
24192 page slots are required to track all free space.
Current limits are:  4024000 page slots, 2000 relations, using 23705 KB.

So, it appears my autovacuum is just NOT working... I must have screwed something up, but I cannot see what.

Thanks again.
Jody
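P.S. In case it helps anyone reading along: q_table_size is a custom view of ours. A minimal equivalent using the dbsize functions that became built-ins in 8.1 might look like this (a sketch only - the real view differs, and the names here are mine):

create or replace view q_table_size as
select relname as tablename,
       round(pg_relation_size(oid) / 1024.0 / 1024.0, 3) as size  -- size in MB
from pg_class
where relkind = 'r';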
> So, it appears my autovacuum is just NOT working... I must have screwed something up, but I cannot see what.

Is it possible that you have long running transactions? If yes, VACUUM is simply not efficient, as it won't eliminate the dead space accumulated during the long running transaction. In that case VACUUM FULL won't help you either, as it also can't eliminate dead space still visible to old transactions. But from what you say I guess you really do stop everything before doing VACUUM FULL, so you might as well have stopped the culprit transaction too... that's why the VACUUM FULL worked (if my assumption is correct).

To check if this is the case, look for "idle in transaction" in your process listing (ps auxww | grep "idle in transaction"). If you get one (or more) of those, you've found your problem. If not, hopefully others will help you :-)

Cheers,
Csaba.
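P.S. The same check from SQL, if you prefer it over ps (a sketch - on 8.1 this needs stats_command_string = on in postgresql.conf, otherwise current_query is not populated):

select procpid, usename, current_query
from pg_stat_activity
where current_query like '%IDLE% in transaction%';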
Csaba Nagy <nagy@ecircle-ag.com> writes:
>> So, it appears my autovacuum is just NOT working... I must have screwed something up, but I cannot see what.

> Is it possible that you have long running transactions ?

The other question I was wondering about is if autovacuum is actually choosing to vacuum the target table or not. The only way to check that in 8.1 is to crank log_min_messages up to DEBUG2 and then trawl through the postmaster log looking for "autovac" messages.

			regards, tom lane
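P.S. Concretely, something like this (a sketch - the data directory path is a placeholder, and DEBUG2 is very chatty, so revert it once you have what you need):

# in postgresql.conf
log_min_messages = debug2

$ pg_ctl reload -D /path/to/data
$ grep -i autovac /path/to/postmaster.log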
Our application is broken down quite well. We have two main writing processes writing to two separate sets of tables. No crossing over, nothing to prohibit the vacuuming in the manner which you describe.

My longest transactions on the tables in question are typically quite short, until of course they begin to bloat.

On Wednesday 21 June 2006 11:08, Csaba Nagy wrote:
> > So, it appears my autovacuum is just NOT working... I must have screwed something up, but I cannot see what.
>
> Is it possible that you have long running transactions? If yes, VACUUM
> is simply not efficient, as it won't eliminate the dead space
> accumulated during the long running transaction. In that case VACUUM
> FULL won't help you either, as it also can't eliminate dead space still
> visible to old transactions. But from what you say I guess you really do
> stop everything before doing VACUUM FULL, so you might as well have
> stopped the culprit transaction too... that's why the VACUUM FULL worked
> (if my assumption is correct).
>
> To check if this is the case, look for "idle in transaction" in your
> process listing (ps auxww | grep "idle in transaction"). If you get one
> (or more) of those, you've found your problem. If not, hopefully others
> will help you :-)
>
> Cheers,
> Csaba.
On Wed, 2006-06-21 at 17:27, jody brownell wrote:
> Our application is broken down quite well. We have two main writing processes
> writing to two separate sets of tables. No crossing over, nothing to prohibit the
> vacuuming in the manner which you describe.

It really doesn't matter what tables you are touching, and it doesn't matter if you read or write either; what matters is how long ago the last "begin" without a "commit" or "rollback" was. VACUUM will not touch tuples which were deleted after the oldest not-yet-finished transaction started, regardless of whether that transaction touched the vacuumed table in any way...

> My longest transactions on the tables in question are typically quite short, until
> of course they begin to bloat.

Well, your application might be completely well behaved and still your DBA (or your favorite DB access tool, for that matter) can leave open transactions in an interactive session. It never hurts to check if you actually have "idle in transaction" sessions. It happened a few times to us; some of those were bad coding in ad-hoc tools written by us, others were badly behaved DB access tools opening a transaction immediately after connect and after each successful command, effectively leaving an open transaction while the user was having lunch...

So it might very well be that some interactive or ad-hoc tools you're using to manage the DB are your problem.

Cheers,
Csaba.
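P.S. You can watch this happen with two psql sessions (a sketch - table name and row count are made up):

-- session 1: open a transaction and leave it idle
begin;
select 1;

-- session 2: create some dead rows, then vacuum
create table bloat_test (id int);
insert into bloat_test select generate_series(1, 100000);
delete from bloat_test;
vacuum verbose bloat_test;
-- reports "100000 dead row versions cannot be removed yet" while
-- session 1 stays open; once session 1 commits or rolls back, a
-- second vacuum can reclaim them.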
That is interesting. There is one thread keeping a transaction open, it appears from ps:

postgres: app app xxx(42644) idle in transaction

However, I created a test table "t" not configured in pg_autovacuum. I inserted a whack of rows and saw this:

Jun 21 12:38:45 vanquish postgres[1525]: [8-1] LOG:  autovacuum: processing database "qradar"
Jun 21 12:38:45 vanquish postgres[1525]: [9-1] DEBUG:  autovac: will VACUUM ANALYZE t
Jun 21 12:38:45 vanquish postgres[1525]: [10-1] DEBUG:  vacuuming "public.t"
Jun 21 12:38:48 vanquish postgres[1525]: [11-1] DEBUG:  "t": removed 8104311 row versions in 51620 pages
Jun 21 12:38:48 vanquish postgres[1525]: [11-2] DETAIL:  CPU 0.93s/0.70u sec elapsed 1.70 sec.
Jun 21 12:38:48 vanquish postgres[1525]: [12-1] DEBUG:  "t": found 8104311 removable, 0 nonremovable row versions in 51620 pages
Jun 21 12:38:48 vanquish postgres[1525]: [12-2] DETAIL:  0 dead row versions cannot be removed yet.

This was followed a little later (after I did a similar insert op on target) by this:

Jun 21 13:00:46 vanquish postgres[3311]: [12-1] LOG:  autovacuum: processing database "qradar"
Jun 21 13:00:46 vanquish postgres[3311]: [13-1] DEBUG:  autovac: will VACUUM target
Jun 21 13:00:46 vanquish postgres[3311]: [14-1] DEBUG:  vacuuming "public.target"
Jun 21 13:01:51 vanquish postgres[3311]: [15-1] DEBUG:  index "target_pkey" now contains 1296817 row versions in 25116 pages
Jun 21 13:01:51 vanquish postgres[3311]: [15-2] DETAIL:  5645230 index row versions were removed.
Jun 21 13:01:51 vanquish postgres[3311]: [15-3] ^I116 index pages have been deleted, 60 are currently reusable.
Jun 21 13:01:51 vanquish postgres[3311]: [15-4] ^ICPU 1.29s/7.44u sec elapsed 48.65 sec.
Jun 21 13:02:19 vanquish postgres[3311]: [16-1] DEBUG:  index "target_network_key" now contains 1296817 row versions in 19849 pages
Jun 21 13:02:19 vanquish postgres[3311]: [16-2] DETAIL:  5645230 index row versions were removed.
Jun 21 13:02:19 vanquish postgres[3311]: [16-3] ^I32 index pages have been deleted, 0 are currently reusable.
Jun 21 13:02:19 vanquish postgres[3311]: [16-4] ^ICPU 0.89s/6.61u sec elapsed 27.77 sec.
Jun 21 13:02:47 vanquish postgres[3311]: [17-1] DEBUG:  index "target_network_details_id_idx" now contains 1296817 row versions in 23935 pages
Jun 21 13:02:47 vanquish postgres[3311]: [17-2] DETAIL:  5645230 index row versions were removed.
Jun 21 13:02:47 vanquish postgres[3311]: [17-3] ^I17814 index pages have been deleted, 0 are currently reusable.
Jun 21 13:02:47 vanquish postgres[3311]: [17-4] ^ICPU 0.93s/7.52u sec elapsed 27.36 sec.
Jun 21 13:03:23 vanquish postgres[3311]: [18-1] DEBUG:  index "target_tulu_idx" now contains 1296817 row versions in 24341 pages
Jun 21 13:03:23 vanquish postgres[3311]: [18-2] DETAIL:  5645230 index row versions were removed.
Jun 21 13:03:23 vanquish postgres[3311]: [18-3] ^I18495 index pages have been deleted, 0 are currently reusable.
Jun 21 13:03:23 vanquish postgres[3311]: [18-4] ^ICPU 1.37s/5.38u sec elapsed 36.95 sec.
Jun 21 13:04:04 vanquish postgres[3311]: [19-1] DEBUG:  "target": removed 5645231 row versions in 106508 pages
Jun 21 13:04:04 vanquish postgres[3311]: [19-2] DETAIL:  CPU 3.37s/1.23u sec elapsed 40.63 sec.
Jun 21 13:04:04 vanquish postgres[3311]: [20-1] DEBUG:  "target": found 5645231 removable, 1296817 nonremovable row versions in 114701 pages
Jun 21 13:04:04 vanquish postgres[3311]: [20-2] DETAIL:  0 dead row versions cannot be removed yet.

This was with the "idle in transaction" though.....

Ah HA!
Wondering: my autovacuum naptime is 60 seconds, which is also the interval at which I wake up and begin persistence. Wondering if I am simply locking autovacuum out of the tables because they are on a similar timeline.

I will try a 30 second naptime; if this is it, that should increase the likelihood of falling on the right side of the TX more often.

Make sense?

On Wednesday 21 June 2006 12:42, Csaba Nagy wrote:
> On Wed, 2006-06-21 at 17:27, jody brownell wrote:
> > Our application is broken down quite well. We have two main writing processes
> > writing to two separate sets of tables. No crossing over, nothing to prohibit the
> > vacuuming in the manner which you describe.
>
> It really doesn't matter what tables you are touching, and it doesn't
> matter if you read or write either; what matters is how long ago the
> last "begin" without a "commit" or "rollback" was. VACUUM will not touch
> tuples which were deleted after the oldest not-yet-finished transaction
> started, regardless of whether that transaction touched the vacuumed
> table in any way...
>
> > My longest transactions on the tables in question are typically quite short, until
> > of course they begin to bloat.
>
> Well, your application might be completely well behaved and still your
> DBA (or your favorite DB access tool, for that matter) can leave open
> transactions in an interactive session. It never hurts to check if you
> actually have "idle in transaction" sessions. It happened a few times to
> us; some of those were bad coding in ad-hoc tools written by us, others
> were badly behaved DB access tools opening a transaction immediately
> after connect and after each successful command, effectively leaving an
> open transaction while the user was having lunch...
>
> So it might very well be that some interactive or ad-hoc tools you're
> using to manage the DB are your problem.
>
> Cheers,
> Csaba.
Oops - that was confusing. The idle in transaction was from one box and the autovacuum was from another.

So, one question was answered: autovacuum is running and selecting the tables, but apparently not at the same time as my app, probably due to this "idle in transaction". I will track it down and see what the difference is.

thanks

On Wednesday 21 June 2006 13:21, jody brownell wrote:
> That is interesting. There is one thread keeping a transaction open, it appears from ps:
>
> postgres: app app xxx(42644) idle in transaction
>
> However, I created a test table "t" not configured in pg_autovacuum. I inserted a whack of rows and saw this:
>
> Jun 21 12:38:45 vanquish postgres[1525]: [8-1] LOG:  autovacuum: processing database "qradar"
> Jun 21 12:38:45 vanquish postgres[1525]: [9-1] DEBUG:  autovac: will VACUUM ANALYZE t
> Jun 21 12:38:45 vanquish postgres[1525]: [10-1] DEBUG:  vacuuming "public.t"
> Jun 21 12:38:48 vanquish postgres[1525]: [11-1] DEBUG:  "t": removed 8104311 row versions in 51620 pages
> Jun 21 12:38:48 vanquish postgres[1525]: [11-2] DETAIL:  CPU 0.93s/0.70u sec elapsed 1.70 sec.
> Jun 21 12:38:48 vanquish postgres[1525]: [12-1] DEBUG:  "t": found 8104311 removable, 0 nonremovable row versions in 51620 pages
> Jun 21 12:38:48 vanquish postgres[1525]: [12-2] DETAIL:  0 dead row versions cannot be removed yet.
>
> This was followed a little later (after I did a similar insert op on target) by this:
>
> Jun 21 13:00:46 vanquish postgres[3311]: [12-1] LOG:  autovacuum: processing database "qradar"
> Jun 21 13:00:46 vanquish postgres[3311]: [13-1] DEBUG:  autovac: will VACUUM target
> Jun 21 13:00:46 vanquish postgres[3311]: [14-1] DEBUG:  vacuuming "public.target"
> Jun 21 13:01:51 vanquish postgres[3311]: [15-1] DEBUG:  index "target_pkey" now contains 1296817 row versions in 25116 pages
> Jun 21 13:01:51 vanquish postgres[3311]: [15-2] DETAIL:  5645230 index row versions were removed.
> Jun 21 13:01:51 vanquish postgres[3311]: [15-3] ^I116 index pages have been deleted, 60 are currently reusable.
> Jun 21 13:01:51 vanquish postgres[3311]: [15-4] ^ICPU 1.29s/7.44u sec elapsed 48.65 sec.
> Jun 21 13:02:19 vanquish postgres[3311]: [16-1] DEBUG:  index "target_network_key" now contains 1296817 row versions in 19849 pages
> Jun 21 13:02:19 vanquish postgres[3311]: [16-2] DETAIL:  5645230 index row versions were removed.
> Jun 21 13:02:19 vanquish postgres[3311]: [16-3] ^I32 index pages have been deleted, 0 are currently reusable.
> Jun 21 13:02:19 vanquish postgres[3311]: [16-4] ^ICPU 0.89s/6.61u sec elapsed 27.77 sec.
> Jun 21 13:02:47 vanquish postgres[3311]: [17-1] DEBUG:  index "target_network_details_id_idx" now contains 1296817 row versions in 23935 pages
> Jun 21 13:02:47 vanquish postgres[3311]: [17-2] DETAIL:  5645230 index row versions were removed.
> Jun 21 13:02:47 vanquish postgres[3311]: [17-3] ^I17814 index pages have been deleted, 0 are currently reusable.
> Jun 21 13:02:47 vanquish postgres[3311]: [17-4] ^ICPU 0.93s/7.52u sec elapsed 27.36 sec.
> Jun 21 13:03:23 vanquish postgres[3311]: [18-1] DEBUG:  index "target_tulu_idx" now contains 1296817 row versions in 24341 pages
> Jun 21 13:03:23 vanquish postgres[3311]: [18-2] DETAIL:  5645230 index row versions were removed.
> Jun 21 13:03:23 vanquish postgres[3311]: [18-3] ^I18495 index pages have been deleted, 0 are currently reusable.
> Jun 21 13:03:23 vanquish postgres[3311]: [18-4] ^ICPU 1.37s/5.38u sec elapsed 36.95 sec.
> Jun 21 13:04:04 vanquish postgres[3311]: [19-1] DEBUG:  "target": removed 5645231 row versions in 106508 pages
> Jun 21 13:04:04 vanquish postgres[3311]: [19-2] DETAIL:  CPU 3.37s/1.23u sec elapsed 40.63 sec.
> Jun 21 13:04:04 vanquish postgres[3311]: [20-1] DEBUG:  "target": found 5645231 removable, 1296817 nonremovable row versions in 114701 pages
> Jun 21 13:04:04 vanquish postgres[3311]: [20-2] DETAIL:  0 dead row versions cannot be removed yet.
>
> This was with the "idle in transaction" though.....
>
> Ah HA! Wondering: my autovacuum naptime is 60 seconds, which is also the interval at which I wake up and begin persistence.
> Wondering if I am simply locking autovacuum out of the tables because they are on a similar timeline.
>
> I will try a 30 second naptime; if this is it, that should increase the likelihood of falling on the right side of the TX more often.
>
> Make sense?
>
> On Wednesday 21 June 2006 12:42, Csaba Nagy wrote:
> > [snip]
On Wed, 2006-06-21 at 18:21, jody brownell wrote:
> That is interesting. There is one thread keeping a transaction open, it appears from ps:
>
> postgres: app app xxx(42644) idle in transaction

That shouldn't be a problem in itself; "idle in transaction" happens all the time between 2 commands in the same transaction... you only have a problem if you see the same PID always idle, which means somebody left an open transaction and went for lunch.

[snip]
> This was with the "idle in transaction" though.....

This probably means you don't have long running transactions currently. However, if you happen to have just one such long transaction, the dead space accumulates and a normal vacuum will not be able to clean it anymore. But I guess if you didn't find one now, then you should take a look at Tom's suggestion and bump up the debug level to see if autovacuum picks your table at all...

> Ah HA! Wondering: my autovacuum naptime is 60 seconds, which is also the interval at which I wake up and begin persistence.
> Wondering if I am simply locking autovacuum out of the tables because they are on a similar timeline.
>
> I will try a 30 second naptime; if this is it, that should increase the likelihood of falling on the right side of the TX more often.
>
> Make sense?

I don't think that's your problem... vacuum wouldn't be locked out by any activity which doesn't exclusively lock the table (and I guess you're not doing that). If your persistence finishes quickly, then that's not the problem.

Oh, it just occurred to me... in order to use autovacuum you also need to enable the statistics collector at row level:

stats_start_collector = on
stats_row_level = on

See also:
http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM

These were not mentioned in the settings in your original post, so I guess you didn't touch them, and I think they are disabled by default. If they are disabled, you should enable them and "pg_ctl reload ...."; that should fix the problem.

Cheers,
Csaba.
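P.S. A quick sanity check that the collector is actually feeding autovacuum (a sketch): with stats_row_level on, these counters should climb while your app writes; if they stay flat, autovacuum has nothing to base its decisions on.

select relname, n_tup_ins, n_tup_upd, n_tup_del
from pg_stat_user_tables
where relname = 'target';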
That is exactly what I am seeing: one process, no change, always idle, while the others are constantly changing their state.

Looks like someone opened a tx and is then blocking on a queue lock or something. Dang.

On Wednesday 21 June 2006 13:36, Csaba Nagy wrote:
> On Wed, 2006-06-21 at 18:21, jody brownell wrote:
> > That is interesting. There is one thread keeping a transaction open, it appears from ps:
> >
> > postgres: app app xxx(42644) idle in transaction
>
> That shouldn't be a problem in itself; "idle in transaction" happens all
> the time between 2 commands in the same transaction... you only have a
> problem if you see the same PID always idle, which means somebody left
> an open transaction and went for lunch.
>
> [snip]
> > This was with the "idle in transaction" though.....
>
> This probably means you don't have long running transactions currently.
> However, if you happen to have just one such long transaction, the dead
> space accumulates and a normal vacuum will not be able to clean it
> anymore. But I guess if you didn't find one now, then you should take a
> look at Tom's suggestion and bump up the debug level to see if autovacuum
> picks your table at all...
>
> > Ah HA! Wondering: my autovacuum naptime is 60 seconds, which is also the interval at which I wake up and begin persistence.
> > Wondering if I am simply locking autovacuum out of the tables because they are on a similar timeline.
> >
> > I will try a 30 second naptime; if this is it, that should increase the likelihood of falling on the right side of the TX more often.
> >
> > Make sense?
>
> I don't think that's your problem... vacuum wouldn't be locked out by
> any activity which doesn't exclusively lock the table (and I guess
> you're not doing that). If your persistence finishes quickly, then that's
> not the problem.
>
> Oh, it just occurred to me... in order to use autovacuum you also need to
> enable the statistics collector at row level:
>
> stats_start_collector = on
> stats_row_level = on
>
> See also:
> http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM
>
> These were not mentioned in the settings in your original post, so I guess
> you didn't touch them, and I think they are disabled by default. If they
> are disabled, you should enable them and "pg_ctl reload ...."; that should
> fix the problem.
>
> Cheers,
> Csaba.
On Wed, 2006-06-21 at 18:39, jody brownell wrote:
> That is exactly what I am seeing: one process, no change, always idle, while the others are
> constantly changing their state.
>
> Looks like someone opened a tx and is then blocking on a queue lock or something. Dang.

Don't forget to check the statistics collector settings (see below); if they are not correct, then autovacuum is indeed not working correctly... I should have put that at the beginning of the mail so you wouldn't overlook it ;-)

> > Oh, it just occurred to me... in order to use autovacuum you also need to
> > enable the statistics collector at row level:
> >
> > stats_start_collector = on
> > stats_row_level = on
> >
> > See also:
> > http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM
> >
> > These were not mentioned in the settings in your original post, so I guess
> > you didn't touch them, and I think they are disabled by default. If they
> > are disabled, you should enable them and "pg_ctl reload ...."; that should
> > fix the problem.
> >
> > Cheers,
> > Csaba.
block and row are always configured on - they are my friend :)

thanks

On Wednesday 21 June 2006 13:44, Csaba Nagy wrote:
> On Wed, 2006-06-21 at 18:39, jody brownell wrote:
> > That is exactly what I am seeing: one process, no change, always idle, while the others are
> > constantly changing their state.
> >
> > Looks like someone opened a tx and is then blocking on a queue lock or something. Dang.
>
> Don't forget to check the statistics collector settings (see below); if
> they are not correct, then autovacuum is indeed not working correctly...
> I should have put that at the beginning of the mail so you wouldn't
> overlook it ;-)
>
> > > Oh, it just occurred to me... in order to use autovacuum you also need to
> > > enable the statistics collector at row level:
> > >
> > > stats_start_collector = on
> > > stats_row_level = on
> > >
> > > See also:
> > > http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM
> > >
> > > These were not mentioned in the settings in your original post, so I guess
> > > you didn't touch them, and I think they are disabled by default. If they
> > > are disabled, you should enable them and "pg_ctl reload ...."; that should
> > > fix the problem.
> > >
> > > Cheers,
> > > Csaba.
On Wed, Jun 21, 2006 at 10:52:42AM -0300, jody brownell wrote:
> A db-wide vacuum full outputs this at the end:
>
> INFO:  free space map contains 32848 pages in 159 relations
> DETAIL:  A total of 24192 page slots are in use (including overhead).
> 24192 page slots are required to track all free space.
> Current limits are:  4024000 page slots, 2000 relations, using 23705 KB.

FWIW, the tail end of a db-wide VACUUM FULL doesn't provide useful info about FSM utilization, because it just made everything as compact as possible.

My suspicion is that it's taking too long for autovac to get around to this database/table. Dropping the sleep time might help. I see that this table is vacuumed with a delay setting of 0, but if there are other tables with a high delay, that could pose a problem.

Getting detailed output of what autovac is actually doing, as Tom suggested, would be a good idea.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
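P.S. In postgresql.conf terms, something like this (a sketch - the right values depend on how quickly the dead rows accumulate):

autovacuum_naptime = 30               # wake up and look for work more often
autovacuum_vacuum_cost_delay = 0      # don't throttle the vacuums that do run

together with the vac_cost_delay = 0 you already have in pg_autovacuum for this table.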
On Wed, Jun 21, 2006 at 01:21:05PM -0300, jody brownell wrote:
> Jun 21 13:04:04 vanquish postgres[3311]: [19-1] DEBUG:  "target": removed 5645231 row versions in 106508 pages
> Jun 21 13:04:04 vanquish postgres[3311]: [19-2] DETAIL:  CPU 3.37s/1.23u sec elapsed 40.63 sec.
> Jun 21 13:04:04 vanquish postgres[3311]: [20-1] DEBUG:  "target": found 5645231 removable, 1296817 nonremovable row versions in 114701 pages
> Jun 21 13:04:04 vanquish postgres[3311]: [20-2] DETAIL:  0 dead row versions cannot be removed yet.

So the table contained 5.6M dead rows and 1.3M live rows.

I think you should forget about having autovacuum keep this table in check and add manual vacuum commands to your code. Autovac is intended to deal with 99% of use cases; this is pretty clearly in the 1% it can't handle.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
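P.S. I.e., after each persistence batch commits, have the writer issue something like (a sketch - VACUUM must run outside any transaction block, and a plain VACUUM takes no exclusive lock, so readers and writers keep going):

vacuum analyze target;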
OK... this was over a 12 - 16 hour period of not having anything done with it, though, right?

I am assuming that if autovacuum were active throughout that period, we would be somewhat better off... is that not accurate?

On Wednesday 21 June 2006 16:38, Jim C. Nasby wrote:
> 5
BTW, in production with a similar load, autovacuum with default out-of-the-box settings seems to work quite well...

I double-checked this earlier today.

On Wednesday 21 June 2006 16:38, Jim C. Nasby wrote:
> 5
On Wed, Jun 21, 2006 at 04:41:45PM -0300, jody brownell wrote:
> BTW, in production with a similar load, autovacuum with default out-of-the-box
> settings seems to work quite well...
>
> I double-checked this earlier today.

So what's different between production and the machine with the problem?

The issue with autovac is that it will only vacuum one table at a time, so if it's off vacuuming some other table for a long period of time, it won't be touching this table, which will be a problem. Now, if that's actually what's happening...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Well, for one, we did introduce a TX leak which was preventing autovac from running. I guess that was _the_ issue. I have since fixed it and am now testing... looks much better, nothing concerning... (fingers crossed until morning :)). The debug logs are full of vac/anal of the tables... so, for now, I am back on track moving forward...

Now that autovac is actually running, the box does feel slightly more sluggish.

BTW - as soon as we deliver to QA, I will post the test case for the memory leak I was seeing the other day. (I have not forgotten, I am just swamped.)

Thanks for the help all. Much appreciated. Cheers.

On Wednesday 21 June 2006 19:11, Jim C. Nasby wrote:
> On Wed, Jun 21, 2006 at 04:41:45PM -0300, jody brownell wrote:
> > BTW, in production with a similar load, autovacuum with default out-of-the-box
> > settings seems to work quite well...
> >
> > I double-checked this earlier today.
>
> So what's different between production and the machine with the problem?
>
> The issue with autovac is that it will only vacuum one table at a time,
> so if it's off vacuuming some other table for a long period of time, it
> won't be touching this table, which will be a problem. Now, if that's
> actually what's happening...
Hi, Csaba,

Csaba Nagy wrote:
> Well, your application might be completely well behaved and still your
> DBA (or your favorite DB access tool, for that matter) can leave open
> transactions in an interactive session. It never hurts to check if you
> actually have "idle in transaction" sessions. It happened a few times to
> us; some of those were bad coding in ad-hoc tools written by us, others
> were badly behaved DB access tools opening a transaction immediately
> after connect and after each successful command, effectively leaving an
> open transaction while the user was having lunch...

Some older JDBC driver versions had a bug where they always kept an open transaction, so an application server with some pooled connections lingering around could block vacuum forever.

Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Jim C. Nasby wrote:
> On Wed, Jun 21, 2006 at 01:21:05PM -0300, jody brownell wrote:
>> Jun 21 13:04:04 vanquish postgres[3311]: [19-1] DEBUG:  "target": removed 5645231 row versions in 106508 pages
>> Jun 21 13:04:04 vanquish postgres[3311]: [19-2] DETAIL:  CPU 3.37s/1.23u sec elapsed 40.63 sec.
>> Jun 21 13:04:04 vanquish postgres[3311]: [20-1] DEBUG:  "target": found 5645231 removable, 1296817 nonremovable row versions in 114701 pages
>> Jun 21 13:04:04 vanquish postgres[3311]: [20-2] DETAIL:  0 dead row versions cannot be removed yet.
>
> So the table contained 5.6M dead rows and 1.3M live rows.
>
> I think you should forget about having autovacuum keep this table
> in check and add manual vacuum commands to your code. Autovac is
> intended to deal with 99% of use cases; this is pretty clearly in the 1%
> it can't handle.

Maybe your free space map is configured too small; can you watch out for log messages telling you to increase it?

HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org