Thread: Persistent dead rows


From
"Malcolm McLean"
Date:
Hi,
 
We are currently having a problem with one of our tables containing far too many dead rows. The table in question sees a few hundred thousand inserts and deletes per day and usually builds up quite a large dead row count, which starts to affect the performance of queries selecting from the table.
 
However, it seems that when the dead row count reaches around 700,000, it drops to 0 again and all is fast once more.
 
I know that vacuuming is supposed to reclaim these, but it doesn't seem to be happening. Here is the output from my analyze, vacuum, reindex and cluster commands:
claim=# ANALYZE VERBOSE trans_queue;
INFO:  analyzing "public.trans_queue"
INFO:  "trans_queue": scanned 1749 of 1749 pages, containing 20383 live rows and 137327 dead rows; 20383 rows in sample, 20383 estimated total rows
ANALYZE
claim=# VACUUM VERBOSE ANALYZE trans_queue;
INFO:  vacuuming "public.trans_queue"
INFO:  index "tq_index_1" now contains 100499 row versions in 307 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.09 sec.
INFO:  index "pk_trans_queue" now contains 157730 row versions in 477 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.00 sec.
INFO:  index "tq_index_2" now contains 157730 row versions in 1316 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.26 sec.
INFO:  index "tq_index_3" now contains 157730 row versions in 641 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.16 sec.
INFO:  "trans_queue": found 0 removable, 157730 nonremovable row versions in 1749 pages
DETAIL:  137344 dead row versions cannot be removed yet.
There were 1 unused item pointers.
0 pages are entirely empty.
CPU 0.02s/0.03u sec elapsed 0.65 sec.
INFO:  analyzing "public.trans_queue"
INFO:  "trans_queue": scanned 1749 of 1749 pages, containing 20386 live rows and 137344 dead rows; 20386 rows in sample, 20386 estimated total rows
VACUUM
claim=# VACUUM FULL VERBOSE ANALYZE trans_queue;
INFO:  vacuuming "public.trans_queue"
INFO:  "trans_queue": found 4 removable, 157808 nonremovable row versions in 1749 pages
DETAIL:  137417 dead row versions cannot be removed yet.
Nonremovable row versions range from 79 to 99 bytes long.
There were 1 unused item pointers.
Total free space (including removable row versions) is 131552 bytes.
0 pages are or will become empty, including 0 at the end of the table.
25 pages containing 8472 free bytes are potential move destinations.
CPU 0.00s/0.02u sec elapsed 0.11 sec.
INFO:  index "tq_index_1" now contains 100509 row versions in 307 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.11 sec.
INFO:  index "pk_trans_queue" now contains 157808 row versions in 477 pages
DETAIL:  4 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.02u sec elapsed 0.02 sec.
INFO:  index "tq_index_2" now contains 157808 row versions in 1316 pages
DETAIL:  4 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.03u sec elapsed 0.26 sec.
INFO:  index "tq_index_3" now contains 157808 row versions in 641 pages
DETAIL:  4 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.01u sec elapsed 0.18 sec.
INFO:  "trans_queue": moved 76 row versions, truncated 1749 to 1749 pages
DETAIL:  CPU 0.00s/0.01u sec elapsed 0.23 sec.
INFO:  index "tq_index_1" now contains 100509 row versions in 308 pages
DETAIL:  64 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pk_trans_queue" now contains 157808 row versions in 478 pages
DETAIL:  76 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.10 sec.
INFO:  index "tq_index_2" now contains 157808 row versions in 1317 pages
DETAIL:  76 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.02u sec elapsed 0.10 sec.
INFO:  index "tq_index_3" now contains 157808 row versions in 641 pages
DETAIL:  76 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.23 sec.
INFO:  analyzing "public.trans_queue"
INFO:  "trans_queue": scanned 1749 of 1749 pages, containing 20391 live rows and 137417 dead rows; 20391 rows in sample, 20391 estimated total rows
VACUUM
claim=# REINDEX TABLE trans_queue;
REINDEX
claim=# ANALYZE VERBOSE trans_queue;
INFO:  analyzing "public.trans_queue"
INFO:  "trans_queue": scanned 1760 of 1760 pages, containing 21390 live rows and 137417 dead rows; 21390 rows in sample, 21390 estimated total rows
ANALYZE
claim=# CLUSTER trans_queue;
CLUSTER
claim=# ANALYZE VERBOSE trans_queue;
INFO:  analyzing "public.trans_queue"
INFO:  "trans_queue": scanned 244 of 244 pages, containing 21390 live rows and 0 dead rows; 21390 rows in sample, 21390 estimated total rows
ANALYZE
claim=#
 
As can be seen, neither a plain vacuum, a full vacuum nor a reindex was able to remove the dead rows reported by analyze; only a cluster was able to reduce that count to 0 again. In this example the number of dead rows is still low, but when it starts reaching 700,000 the selects on the table slow down considerably.
 
What is causing those dead rows to not get cleared even by a full vacuum? Is there any way to keep them low without having to run a cluster command, as that is a locking statement and requires me to close all java applications that are connecting to that table before running the cluster. There is an autovacuum process running with a low scale factor set so as to force regular vacuuming of this table, but I'm not sure whether that is what is causing the dead row count to drop to 0. If it is, I will sacrifice more frequent vacuuming of the other 100 or so tables in order to keep this one's dead row count low.
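For what it's worth, the per-table override we set is along these lines, via the pg_autovacuum catalog that 8.1 uses for per-table settings (I'm writing the columns and values from memory, so treat this as a sketch rather than our exact configuration):

    INSERT INTO pg_autovacuum
        (vacrelid, enabled,
         vac_base_thresh, vac_scale_factor,
         anl_base_thresh, anl_scale_factor,
         vac_cost_delay, vac_cost_limit)
    VALUES ('trans_queue'::regclass, true,
            200, 0.01,    -- vacuum once roughly 1% of the table is dead
            100, 0.005,   -- analyze even more often
            -1, -1);      -- -1 means fall back to the global cost settings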
 
We are currently running version 8.1.3 and are in the process of upgrading to 8.2.3, which is still being tested in development, so I cannot see how long it has been since the last autovacuum ran on the table.
 
Any help would be appreciated.
 
Regards,
Malcolm McLean
 

This information is intended only for the person or entity to which it is addressed and may contain private, confidential, proprietary and/or privileged material and may be subject to confidentiality agreements. If you are not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this e-mail is strictly prohibited. If you have received this e-mail in error, please notify us immediately by return email or telephone +27 12 673-1400 and destroy the original message.

Any review, retransmission, dissemination, or any other use of or taking of any action in reliance upon this information, by persons or entities other than the intended recipient, is prohibited.

The company is neither liable for proper, complete transmission of the information contained in this communication, any delay in its receipt or that the mail is virus-free.

Re: Persistent dead rows

From
Richard Huxton
Date:
Malcolm McLean wrote:
> Hi,
>
> We are currently having a problem with one of our tables containing far
> too many dead rows. The table in question will have a few hundred
> thousand inserts and deletes per day and usually builds up quite a large
> dead row count that starts to affect the performance of the queries
> select from the table.
>
> However, it seems that when the dead row count reaches around 700000 it
> drops to 0 again and all is fast once more.
>
> I know that vacuuming is supposed to recover these, but it doesn't seem
> to be happening. Here is output from my analyze, vacuum, reindex and
> cluster commands:
>
> claim=# ANALYZE VERBOSE trans_queue;
> INFO:  analyzing "public.trans_queue"
> INFO:  "trans_queue": scanned 1749 of 1749 pages, containing 20383 live
> rows and 137327 dead rows; 20383 rows in sample, 20383 estimated total
> rows
> ANALYZE
> claim=# VACUUM VERBOSE ANALYZE trans_queue;

> INFO:  "trans_queue": found 0 removable, 157730 nonremovable row
> versions in 1749 pages
> DETAIL:  137344 dead row versions cannot be removed yet.

This is usually because a transaction is hanging around that might be
able to see them. The vacuum can't recover them until that transaction
has completed.

> What is causing those dead rows to not get cleared even by a full
> vacuum? Is there any way keep them low without having to run a cluster
> command as that is a locking statement and requires me to close all java
> applications that are connecting to that table before running the
> cluster.

Aha! I'll bet your java app (or something in the stack) is issuing a
BEGIN and just sitting there. Try disconnecting the apps and seeing if
vacuum recovers rows then. If so, you'll need to get your java code to
stop sitting on open transactions.
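For example, if anything in the stack ever does the equivalent of the
following and then just sits there, that's enough (a sketch, not your
actual code):

    BEGIN;
    SELECT * FROM trans_queue;
    -- ...and then never issues COMMIT or ROLLBACK...
    -- This backend now shows as '<IDLE> in transaction' and pins the
    -- oldest visible transaction, so VACUUM cannot reclaim any row
    -- deleted after this point.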

--
   Richard Huxton
   Archonet Ltd

Re: Persistent dead rows

From
"Malcolm McLean"
Date:
Richard Huxton wrote:
>> What is causing those dead rows to not get cleared even by a full
>> vacuum? Is there any way to keep them low without having to run a
>> cluster command as that is a locking statement and requires me to
>> close all java applications that are connecting to that table before
>> running the cluster.
>
> Aha! I'll bet your java app (or something in the stack) is issuing a
> BEGIN and just sitting there. Try disconnecting the apps and seeing if
> vacuum recovers rows then. If so, you'll need to get your java code to
> stop sitting on open transactions.

I tested this theory by stopping the java applications that were
connected to the database, and all other connections that were using
transactions, and a full vacuum was still unable to remove the dead rows.

What I'm still wondering is why the dead row count rises incredibly
high, then all of a sudden drops to 0 again, when the java apps never
stop running.

Malcolm


Re: Persistent dead rows

From
Richard Huxton
Date:
Malcolm McLean wrote:
> Richard Huxton wrote:
>>> What is causing those dead rows to not get cleared even by a full
>>> vacuum? Is there any way to keep them low without having to run a
>>> cluster command as that is a locking statement and requires me to
>>> close all java applications that are connecting to that table before
>>> running the cluster.
>> Aha! I'll bet your java app (or something in the stack) is issuing a
>> BEGIN and just sitting there. Try disconnecting the apps and seeing if
>> vacuum recovers rows then. If so, you'll need to get your java code to
>> stop sitting on open transactions.
>
> I tested this theory by stopping java applications that were connected
> to the database and all other connections that were using transactions
> and the full vacuum was still unable to remove the dead rows.
>
> What I'm still wondering about, is why the dead row count rises
> incredibly high, then all of a sudden drops to 0 again when the java
> apps never stop running.

Are you certain there's no open transaction? Perhaps keep an eye on
SELECT * FROM pg_stat_activity - there might be something you don't know
about.
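
On 8.1 something like this should show any culprits (column names from
memory for that version, so double-check against your server):

    SELECT procpid, usename, query_start, current_query
    FROM pg_stat_activity
    WHERE current_query = '<IDLE> in transaction'
    ORDER BY query_start;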

If it was the autovacuum interfering, I'd expect a lock failure.

--
   Richard Huxton
   Archonet Ltd

Re: Persistent dead rows

From
"Malcolm McLean"
Date:
Richard Huxton wrote:
> Malcolm McLean wrote:
>> I tested this theory by stopping java applications that were connected
>> to the database and all other connections that were using transactions
>> and the full vacuum was still unable to remove the dead rows.
>>
>> What I'm still wondering about, is why the dead row count rises
>> incredibly high, then all of a sudden drops to 0 again when the java
>> apps never stop running.
>
> Are you certain there's no open transaction? Perhaps keep an eye on
> SELECT * FROM pg_stat_activity - there might be something you don't
> know about.

Okay, I just stopped all java processes again and all pg_stat_activity
returned were <IDLE> entries, with no <IDLE> in transaction. The strange
thing is that a cluster command removes the dead rows, and this can only
be run when all the java apps have been stopped.

> If it was the autovacuum interfering, I'd expect a lock failure.

I doubt autovacuum is interfering; I think it is the process that is
clearing the dead rows. If that is the case, then why is only autovacuum
able to clear them and not a manual vacuum?

Malcolm

Re: Persistent dead rows

From
Richard Huxton
Date:
Malcolm McLean wrote:
> Richard Huxton wrote:
>> Malcolm McLean wrote:
>>> I tested this theory by stopping java applications that were connected
>>> to the database and all other connections that were using transactions
>>> and the full vacuum was still unable to remove the dead rows.
>>>
>>> What I'm still wondering about, is why the dead row count rises
>>> incredibly high, then all of a sudden drops to 0 again when the java
>>> apps never stop running.
>> Are you certain there's no open transaction? Perhaps keep an eye on
>> SELECT * FROM pg_stat_activity - there might be something you don't
>> know about.
>
> Okay, I just stopped all java processes again and all pg_stat_activity
> returned were IDLEs and no IDLE in transactions. The strange thing is
> that a cluster command removes the dead rows, and this can only be run
> when all the java apps have been stopped.

Well, both cluster and vacuum full will require a lock on the table. But
they're more or less doing the same thing, so why the one should work
and the other not I don't know.

>> If it was the autovacuum interfering, I'd expect a lock failure.
>
> I doubt autovacuum is interfering, I think it is the process that is
> clearing the dead rows. If that is the case, then why is only autovacuum
> able to clear them and not a manual vacuum.

Makes no sense to me - autovacuum has no magic about it, it just runs
vacuum.

You've got the permissions to vacuum the table, otherwise you'd get an
error. You've checked for transactions in pg_stat_activity. Don't see
how it's worth checking pg_locks if you don't have any transactions.
This one's puzzling me.

I've had a quick look at the release notes for 8.1.x and can't see
anything obvious on this issue, but it might be worth upgrading to 8.1.8
to see if that just makes it go away.

--
   Richard Huxton
   Archonet Ltd

Re: Persistent dead rows

From
Tom Lane
Date:
Richard Huxton <dev@archonet.com> writes:
> Well, both cluster and vacuum full will require a lock on the table. But
>   they're more or less doing the same thing, so why the one should work
> and the other not I don't know.

CLUSTER isn't MVCC-safe --- it'll drop rows that are committed dead
even if they are potentially still visible to some open transaction.
The OP has *clearly* got an open-transaction problem.
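
This is easy to demonstrate with two psql sessions and a scratch table
(a sketch; the exact VACUUM wording varies by version):

    -- session 1: open a transaction and leave it idle
    BEGIN;
    SELECT 1;

    -- session 2: create some churn and try to clean it up
    CREATE TABLE t (id int);
    INSERT INTO t SELECT generate_series(1, 1000);
    DELETE FROM t;
    VACUUM VERBOSE t;   -- reports dead rows that "cannot be removed yet"

    -- once session 1 commits or rolls back, re-running the VACUUM can
    -- remove them; CLUSTER would have thrown them away even while
    -- session 1 was still open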

            regards, tom lane