Thread: Persistent dead rows
claim=# ANALYZE VERBOSE trans_queue;
INFO: analyzing "public.trans_queue"
INFO: "trans_queue": scanned 1749 of 1749 pages, containing 20383 live rows and 137327 dead rows; 20383 rows in sample, 20383 estimated total rows
ANALYZE
claim=# VACUUM VERBOSE ANALYZE trans_queue;
INFO: vacuuming "public.trans_queue"
INFO: index "tq_index_1" now contains 100499 row versions in 307 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.09 sec.
INFO: index "pk_trans_queue" now contains 157730 row versions in 477 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.00 sec.
INFO: index "tq_index_2" now contains 157730 row versions in 1316 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.26 sec.
INFO: index "tq_index_3" now contains 157730 row versions in 641 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.16 sec.
INFO: "trans_queue": found 0 removable, 157730 nonremovable row versions in 1749 pages
DETAIL: 137344 dead row versions cannot be removed yet.
There were 1 unused item pointers.
0 pages are entirely empty.
CPU 0.02s/0.03u sec elapsed 0.65 sec.
INFO: analyzing "public.trans_queue"
INFO: "trans_queue": scanned 1749 of 1749 pages, containing 20386 live rows and 137344 dead rows; 20386 rows in sample, 20386 estimated total rows
VACUUM
claim=# VACUUM FULL VERBOSE ANALYZE trans_queue;
INFO: vacuuming "public.trans_queue"
INFO: "trans_queue": found 4 removable, 157808 nonremovable row versions in 1749 pages
DETAIL: 137417 dead row versions cannot be removed yet.
Nonremovable row versions range from 79 to 99 bytes long.
There were 1 unused item pointers.
Total free space (including removable row versions) is 131552 bytes.
0 pages are or will become empty, including 0 at the end of the table.
25 pages containing 8472 free bytes are potential move destinations.
CPU 0.00s/0.02u sec elapsed 0.11 sec.
INFO: index "tq_index_1" now contains 100509 row versions in 307 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.11 sec.
INFO: index "pk_trans_queue" now contains 157808 row versions in 477 pages
DETAIL: 4 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.02u sec elapsed 0.02 sec.
INFO: index "tq_index_2" now contains 157808 row versions in 1316 pages
DETAIL: 4 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.03u sec elapsed 0.26 sec.
INFO: index "tq_index_3" now contains 157808 row versions in 641 pages
DETAIL: 4 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.01u sec elapsed 0.18 sec.
INFO: "trans_queue": moved 76 row versions, truncated 1749 to 1749 pages
DETAIL: CPU 0.00s/0.01u sec elapsed 0.23 sec.
INFO: index "tq_index_1" now contains 100509 row versions in 308 pages
DETAIL: 64 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "pk_trans_queue" now contains 157808 row versions in 478 pages
DETAIL: 76 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.10 sec.
INFO: index "tq_index_2" now contains 157808 row versions in 1317 pages
DETAIL: 76 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.02u sec elapsed 0.10 sec.
INFO: index "tq_index_3" now contains 157808 row versions in 641 pages
DETAIL: 76 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.23 sec.
INFO: analyzing "public.trans_queue"
INFO: "trans_queue": scanned 1749 of 1749 pages, containing 20391 live rows and 137417 dead rows; 20391 rows in sample, 20391 estimated total rows
VACUUM
claim=# REINDEX TABLE trans_queue;
REINDEX
claim=# ANALYZE VERBOSE trans_queue;
INFO: analyzing "public.trans_queue"
INFO: "trans_queue": scanned 1760 of 1760 pages, containing 21390 live rows and 137417 dead rows; 21390 rows in sample, 21390 estimated total rows
ANALYZE
claim=# CLUSTER trans_queue;
CLUSTER
claim=# ANALYZE VERBOSE trans_queue;
INFO: analyzing "public.trans_queue"
INFO: "trans_queue": scanned 244 of 244 pages, containing 21390 live rows and 0 dead rows; 21390 rows in sample, 21390 estimated total rows
ANALYZE
claim=#
This information is intended only for the person or entity to which it is addressed and may contain private, confidential, proprietary and/or privileged material and may be subject to confidentiality agreements. If you are not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this e-mail is strictly prohibited. If you have received this e-mail in error, please notify us immediately by return email or telephone +27 12 673-1400 and destroy the original message.
Any review, retransmission, dissemination, or any other use of or taking of any action in reliance upon this information, by persons or entities other than the intended recipient, is prohibited.
The company is neither liable for proper, complete transmission of the information contained in this communication, any delay in its receipt or that the mail is virus-free.
Malcolm McLean wrote:
> Hi,
>
> We are currently having a problem with one of our tables containing far
> too many dead rows. The table in question will have a few hundred
> thousand inserts and deletes per day and usually builds up quite a
> large dead row count that starts to affect the performance of the
> queries that select from the table.
>
> However, it seems that when the dead row count reaches around 700000
> it drops to 0 again and all is fast once more.
>
> I know that vacuuming is supposed to recover these, but it doesn't
> seem to be happening. Here is output from my analyze, vacuum, reindex
> and cluster commands:
>
> claim=# ANALYZE VERBOSE trans_queue;
> INFO: analyzing "public.trans_queue"
> INFO: "trans_queue": scanned 1749 of 1749 pages, containing 20383 live
> rows and 137327 dead rows; 20383 rows in sample, 20383 estimated total
> rows
> ANALYZE
> claim=# VACUUM VERBOSE ANALYZE trans_queue;
> INFO: "trans_queue": found 0 removable, 157730 nonremovable row
> versions in 1749 pages
> DETAIL: 137344 dead row versions cannot be removed yet.

This is usually because a transaction is hanging around that might be
able to see them. The vacuum can't recover them until that transaction
has completed.

> What is causing those dead rows to not get cleared even by a full
> vacuum? Is there any way to keep them low without having to run a
> cluster command, as that is a locking statement and requires me to
> close all java applications that are connecting to that table before
> running the cluster?

Aha! I'll bet your java app (or something in the stack) is issuing a
BEGIN and just sitting there. Try disconnecting the apps and seeing if
vacuum recovers rows then. If so, you'll need to get your java code to
stop sitting on open transactions.

--
  Richard Huxton
  Archonet Ltd
Richard Huxton wrote:
>> What is causing those dead rows to not get cleared even by a full
>> vacuum? Is there any way to keep them low without having to run a
>> cluster command, as that is a locking statement and requires me to
>> close all java applications that are connecting to that table before
>> running the cluster?
>
> Aha! I'll bet your java app (or something in the stack) is issuing a
> BEGIN and just sitting there. Try disconnecting the apps and seeing if
> vacuum recovers rows then. If so, you'll need to get your java code to
> stop sitting on open transactions.

I tested this theory by stopping the java applications that were
connected to the database and all other connections that were using
transactions, and the full vacuum was still unable to remove the dead
rows.

What I'm still wondering about is why the dead row count rises
incredibly high, then all of a sudden drops to 0 again, when the java
apps never stop running.

Malcolm
Malcolm McLean wrote:
> I tested this theory by stopping the java applications that were
> connected to the database and all other connections that were using
> transactions, and the full vacuum was still unable to remove the dead
> rows.
>
> What I'm still wondering about is why the dead row count rises
> incredibly high, then all of a sudden drops to 0 again, when the java
> apps never stop running.

Are you certain there's no open transaction? Perhaps keep an eye on
SELECT * FROM pg_stat_activity - there might be something you don't
know about.

If it was the autovacuum interfering, I'd expect a lock failure.

--
  Richard Huxton
  Archonet Ltd
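Richard's SELECT can be narrowed to show only the suspect sessions. A
sketch against the 8.1-era catalogs (the pid column was still called
procpid in that release, and current_query is only populated when
stats_command_string is on):

```sql
-- Backends sitting inside an open transaction report the literal
-- string '<IDLE> in transaction' in current_query.
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY query_start;
```

Any row here that lingers for minutes is a candidate for the
transaction pinning the dead rows.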
Richard Huxton wrote:
> Are you certain there's no open transaction? Perhaps keep an eye on
> SELECT * FROM pg_stat_activity - there might be something you don't
> know about.

Okay, I just stopped all java processes again, and all pg_stat_activity
returned were <IDLE> entries and no <IDLE> in transaction entries. The
strange thing is that a cluster command removes the dead rows, and this
can only be run when all the java apps have been stopped.

> If it was the autovacuum interfering, I'd expect a lock failure.

I doubt autovacuum is interfering; I think it is the process that is
clearing the dead rows. If that is the case, then why is only
autovacuum able to clear them and not a manual vacuum?

Malcolm
Malcolm McLean wrote:
> Okay, I just stopped all java processes again, and all
> pg_stat_activity returned were <IDLE> entries and no <IDLE> in
> transaction entries. The strange thing is that a cluster command
> removes the dead rows, and this can only be run when all the java
> apps have been stopped.

Well, both cluster and vacuum full will require a lock on the table.
But they're more or less doing the same thing, so why the one should
work and the other not I don't know.

> I doubt autovacuum is interfering; I think it is the process that is
> clearing the dead rows. If that is the case, then why is only
> autovacuum able to clear them and not a manual vacuum?

Makes no sense to me - autovacuum has no magic about it, it just runs
vacuum. You've got the permissions to vacuum the table, otherwise you'd
get an error. You've checked for transactions in pg_stat_activity.
Don't see how it's worth checking pg_locks if you don't have any
transactions.

This one's puzzling me. I've had a quick look at the release notes for
8.1.x and can't see anything obvious on this issue, but it might be
worth upgrading to 8.1.8 to see if that just makes it go away.

--
  Richard Huxton
  Archonet Ltd
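Even when pg_stat_activity looks clean, pg_locks gives an independent
view: every open transaction holds an exclusive lock on its own
transaction ID, so joining the two views lists exactly which backends
have a transaction open (again a sketch against the 8.1 catalogs):

```sql
-- One row per backend that is currently inside a transaction: each
-- such backend holds a granted lock of locktype 'transactionid'.
SELECT l.pid, a.usename, a.query_start, a.current_query
FROM pg_locks l
JOIN pg_stat_activity a ON a.procpid = l.pid
WHERE l.locktype = 'transactionid'
  AND l.granted;
```

A backend can hold a transaction open without ever showing
'<IDLE> in transaction', for example while running one long query;
this join catches those sessions too.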
Richard Huxton <dev@archonet.com> writes:
> Well, both cluster and vacuum full will require a lock on the table.
> But they're more or less doing the same thing, so why the one should
> work and the other not I don't know.

CLUSTER isn't MVCC-safe --- it'll drop rows that are committed dead
even if they are potentially still visible to some open transaction.
The OP has *clearly* got an open-transaction problem.

			regards, tom lane
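Tom's point is easy to see for yourself in two psql sessions (a sketch;
the WHERE clause and column name are made up for illustration):

```sql
-- Session A: take a snapshot inside a transaction, then just sit there.
BEGIN;
SELECT count(*) FROM trans_queue;  -- snapshot taken; no COMMIT yet

-- Session B, meanwhile:
DELETE FROM trans_queue WHERE processed;
VACUUM VERBOSE trans_queue;
-- reports "N dead row versions cannot be removed yet", because the
-- deleted rows may still be visible to session A's snapshot.
-- Once session A commits or rolls back, a second VACUUM reclaims them;
-- CLUSTER, not being MVCC-safe, would have discarded them immediately.
```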