Persistent dead rows - Mailing list pgsql-general

From Malcolm McLean
Subject Persistent dead rows
Date
Msg-id 3823FE6BD7EA064C961B2559C9FED9345DE780@zaidsexch001.interpharm.com
Responses Re: Persistent dead rows  (Richard Huxton <dev@archonet.com>)
List pgsql-general
Hi,
 
We are currently having a problem with one of our tables containing far too many dead rows. The table in question sees a few hundred thousand inserts and deletes per day and usually builds up quite a large dead row count, which starts to affect the performance of queries that select from the table.
 
However, it seems that when the dead row count reaches around 700,000, it drops to 0 again and all is fast once more.
 
I know that vacuuming is supposed to recover these rows, but that doesn't seem to be happening. Here is the output from my ANALYZE, VACUUM, REINDEX and CLUSTER commands:
claim=# ANALYZE VERBOSE trans_queue;
INFO:  analyzing "public.trans_queue"
INFO:  "trans_queue": scanned 1749 of 1749 pages, containing 20383 live rows and 137327 dead rows; 20383 rows in sample, 20383 estimated total rows
ANALYZE
claim=# VACUUM VERBOSE ANALYZE trans_queue;
INFO:  vacuuming "public.trans_queue"
INFO:  index "tq_index_1" now contains 100499 row versions in 307 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.09 sec.
INFO:  index "pk_trans_queue" now contains 157730 row versions in 477 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.00 sec.
INFO:  index "tq_index_2" now contains 157730 row versions in 1316 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.26 sec.
INFO:  index "tq_index_3" now contains 157730 row versions in 641 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.16 sec.
INFO:  "trans_queue": found 0 removable, 157730 nonremovable row versions in 1749 pages
DETAIL:  137344 dead row versions cannot be removed yet.
There were 1 unused item pointers.
0 pages are entirely empty.
CPU 0.02s/0.03u sec elapsed 0.65 sec.
INFO:  analyzing "public.trans_queue"
INFO:  "trans_queue": scanned 1749 of 1749 pages, containing 20386 live rows and 137344 dead rows; 20386 rows in sample, 20386 estimated total rows
VACUUM
claim=# VACUUM FULL VERBOSE ANALYZE trans_queue;
INFO:  vacuuming "public.trans_queue"
INFO:  "trans_queue": found 4 removable, 157808 nonremovable row versions in 1749 pages
DETAIL:  137417 dead row versions cannot be removed yet.
Nonremovable row versions range from 79 to 99 bytes long.
There were 1 unused item pointers.
Total free space (including removable row versions) is 131552 bytes.
0 pages are or will become empty, including 0 at the end of the table.
25 pages containing 8472 free bytes are potential move destinations.
CPU 0.00s/0.02u sec elapsed 0.11 sec.
INFO:  index "tq_index_1" now contains 100509 row versions in 307 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.11 sec.
INFO:  index "pk_trans_queue" now contains 157808 row versions in 477 pages
DETAIL:  4 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.02u sec elapsed 0.02 sec.
INFO:  index "tq_index_2" now contains 157808 row versions in 1316 pages
DETAIL:  4 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.03u sec elapsed 0.26 sec.
INFO:  index "tq_index_3" now contains 157808 row versions in 641 pages
DETAIL:  4 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.01u sec elapsed 0.18 sec.
INFO:  "trans_queue": moved 76 row versions, truncated 1749 to 1749 pages
DETAIL:  CPU 0.00s/0.01u sec elapsed 0.23 sec.
INFO:  index "tq_index_1" now contains 100509 row versions in 308 pages
DETAIL:  64 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pk_trans_queue" now contains 157808 row versions in 478 pages
DETAIL:  76 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.10 sec.
INFO:  index "tq_index_2" now contains 157808 row versions in 1317 pages
DETAIL:  76 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.02u sec elapsed 0.10 sec.
INFO:  index "tq_index_3" now contains 157808 row versions in 641 pages
DETAIL:  76 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.23 sec.
INFO:  analyzing "public.trans_queue"
INFO:  "trans_queue": scanned 1749 of 1749 pages, containing 20391 live rows and 137417 dead rows; 20391 rows in sample, 20391 estimated total rows
VACUUM
claim=# REINDEX TABLE trans_queue;
REINDEX
claim=# ANALYZE VERBOSE trans_queue;
INFO:  analyzing "public.trans_queue"
INFO:  "trans_queue": scanned 1760 of 1760 pages, containing 21390 live rows and 137417 dead rows; 21390 rows in sample, 21390 estimated total rows
ANALYZE
claim=# CLUSTER trans_queue;
CLUSTER
claim=# ANALYZE VERBOSE trans_queue;
INFO:  analyzing "public.trans_queue"
INFO:  "trans_queue": scanned 244 of 244 pages, containing 21390 live rows and 0 dead rows; 21390 rows in sample, 21390 estimated total rows
ANALYZE
claim=#
 
As can be seen, neither a plain vacuum, a full vacuum, nor a reindex was able to remove the dead rows reported by ANALYZE; only a CLUSTER reduced the count to 0 again. In this example the number of dead rows is still low, but once it starts approaching 700,000, selects on the table slow down considerably.
 
What is causing those dead rows not to be cleared, even by a full vacuum? And is there any way to keep them low without having to run CLUSTER? That is a locking statement, so I have to shut down all the Java applications connecting to the table before running it. There is an autovacuum process running with a low scale factor set, so as to force regular vacuuming of this table, but I'm not sure whether that is what makes the dead row count drop to 0. If it is, I will sacrifice more frequent vacuuming of the other 100 or so tables in order to keep this one's dead row count low.
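One thing I plan to rule out is a long-running or idle-in-transaction session holding back the removal horizon, since VACUUM reports the rows "cannot be removed yet". A sketch of the check I have in mind (assuming stats_command_string is on, and using the 8.1 pg_stat_activity column names):

```sql
-- Look for sessions stuck "in transaction" that may be preventing
-- VACUUM from reclaiming dead row versions (PostgreSQL 8.1 columns).
SELECT procpid, usename, backend_start, query_start, current_query
FROM pg_stat_activity
WHERE current_query LIKE '<IDLE> in transaction%'
ORDER BY query_start;
```

I haven't confirmed yet whether any of our Java connections behave this way.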
 
We are currently running version 8.1.3 and are in the process of upgrading to 8.2.3, which is still being tested in development, so I cannot yet see how long it has been since autovacuum last ran on the table.
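Once we are on 8.2 I should be able to check this directly; if I'm reading the 8.2 documentation correctly, pg_stat_all_tables gains last_vacuum and last_autovacuum timestamp columns there (they do not exist in 8.1):

```sql
-- On 8.2: show the most recent manual and automatic vacuum times
-- for the problem table.
SELECT relname, last_vacuum, last_autovacuum
FROM pg_stat_all_tables
WHERE relname = 'trans_queue';
```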
 
Any help would be appreciated.
 
Regards,
Malcolm McLean
 

