Thread: Vacuum and freeing dead rows

Vacuum and freeing dead rows

From: Simon T

Hi,

I have a very heavily updated table in a Postgres 9.6.10 database with
lots of disk bloat. Every row is updated about once a minute, and
little to no inserts. Approx 18k rows total. The table has bloated
from ~1700 KB to about 6 GB over a few weeks' time. I'm trying to
understand why vacuum hasn't made dead rows available for re-use.
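
A quick size check along these lines shows the growth (pgstattuple would
give a more detailed dead/free-space breakdown if the contrib extension
is installed):

SELECT pg_size_pretty(pg_relation_size('app.heartbeat'))       AS table_size,
       pg_size_pretty(pg_indexes_size('app.heartbeat'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('app.heartbeat')) AS total_size;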


appdb=# \d+ app.heartbeat;
                                      Table "app.heartbeat"
       Column        |            Type             |   Modifiers   | Storage  | Stats target | Description
---------------------+-----------------------------+---------------+----------+--------------+-------------
 endpoint            | uuid                        | not null      | plain    |              |
 record_timestamp    | timestamp without time zone | not null      | plain    |              |
 heartbeat_timestamp | timestamp without time zone | not null      | plain    |              |
 outer_ip            | character varying(46)       | not null      | extended |              |
 inner_ip            | character varying(46)       | not null      | extended |              |
 last_notified       | timestamp without time zone | default now() | plain    |              |
Indexes:
    "heartbeat_pkey" PRIMARY KEY, btree (endpoint)
    "heartbeat_endpoint_idx" btree (endpoint)
Options: fillfactor=45


I reset the stats on the table and a few moments later ran:

SELECT pg_stat_reset_single_table_counters(40237);

SELECT c.oid, c.relname, c.relfrozenxid, c.relminmxid, c.relreplident,
       relpages, reltuples,
       pg_stat_get_live_tuples(c.oid) AS n_live_tup,
       pg_stat_get_dead_tuples(c.oid) AS n_dead_tup,
       n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd, n_mod_since_analyze,
       last_analyze, last_autoanalyze, autovacuum_count, autoanalyze_count
FROM pg_class c, pg_stat_all_tables av
WHERE c.oid = av.relid AND c.relname = av.relname AND c.relname = 'heartbeat'
ORDER BY av.relname;

-[ RECORD 1 ]-------+------------
oid                 | 40237
relname             | heartbeat
relfrozenxid        | 3451305052
relminmxid          | 55105127
relreplident        | d
relpages            | 674616
reltuples           | 15494
n_live_tup          | 0
n_dead_tup          | 6234
n_tup_ins           | 0
n_tup_upd           | 6234
n_tup_del           | 0
n_tup_hot_upd       | 3758
n_mod_since_analyze | 6234
last_analyze        |
last_autoanalyze    |
autovacuum_count    | 0
autoanalyze_count   | 0


I also found the gap between n_tup_upd and n_tup_hot_upd interesting:
only 3758 of the 6234 updates were HOT. I was expecting all updates to
be HOT, since the PK isn't changing, only the record values (timestamps,
IPs), and my fillfactor is low. Maybe not relevant, but I thought it
worth mentioning just in case, because both the indexes and the table
are bloated.
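
For reference, that ratio can be read straight from pg_stat_user_tables,
with something like:

SELECT relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_update_pct
FROM pg_stat_user_tables
WHERE relname = 'heartbeat';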


So when I try to vacuum it, the vacuum reports:

DETAIL: 55330122 dead row versions cannot be removed yet.
INFO: "heartbeat": found 0 removable, 55347745 nonremovable row versions in 675230 out of 675230 pages

How can I determine what is preventing the dead rows from being
removed? Is there a way to identify what transaction, if any, is
preventing their re-use? Or am I misinterpreting the results?

Full vacuum log:

appdb=# vacuum verbose app.heartbeat;
INFO: vacuuming "app.heartbeat"
INFO: index "heartbeat_pkey" now contains 23435126 row versions in 112094 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.38s/0.40u sec elapsed 1.07 sec.
INFO: index "heartbeat_endpoint_idx" now contains 23435358 row versions in 112185 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.23s/0.35u sec elapsed 0.70 sec.
INFO: "heartbeat": found 0 removable, 55347745 nonremovable row versions in 675230 out of 675230 pages
DETAIL: 55330122 dead row versions cannot be removed yet.
There were 199391 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 3.40s/11.46u sec elapsed 18.39 sec.
VACUUM

And in case it is relevant:

appdb=# SELECT pid, datname, usename, state, backend_xmin
appdb-# FROM pg_stat_activity
appdb-# WHERE backend_xmin IS NOT NULL
appdb-# ORDER BY age(backend_xmin) DESC;
  pid | datname | usename | state | backend_xmin
-------+---------------+----------+---------------------+--------------
 10921 | appdb | app | idle in transaction | 3501305052
 10919 | appdb | app | idle in transaction | 3501305052
 10916 | appdb | app | idle in transaction | 3501305052
 27935 | appdb | app | idle in transaction | 3501305052
 24500 | appdb | postgres | active | 3501305052
 10914 | appdb | app | active | 3501305052
 20671 | appdb | postgres | active | 3501305052
 11817 | appdb | app | active | 3501305052
  1988 | appdb | app | active | 3501305052
 15041 | appdb | postgres | active | 3501305052
  9916 | appdb | postgres | active | 3501305052
 10912 | appdb | app | idle in transaction | 3501305052
 10909 | appdb | app | idle in transaction | 3501305052
(13 rows)

appdb=#
appdb=# SELECT slot_name, slot_type, database, xmin
appdb-# FROM pg_replication_slots
appdb-# ORDER BY age(xmin) DESC;
 slot_name | slot_type | database | xmin
-----------+-----------+----------+------
(0 rows)

appdb=#
appdb=# SELECT gid, prepared, owner, database, transaction AS xmin
appdb-# FROM pg_prepared_xacts
appdb-# ORDER BY age(transaction) DESC;
 gid | prepared | owner | database | xmin
-----+----------+-------+----------+------
(0 rows)

Looking through my autovacuum logs, autovacuum is only triggering on
this table once every 20-40 minutes, which for a table that is
essentially rewritten in full every 60 seconds explains part of the
problem. So I am also looking to tweak my autovacuum settings to be
more aggressive for this table. However, the vacuum above not freeing
up any dead rows has me worried that alone won't be sufficient.
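
What I have in mind is per-table storage parameters roughly like the
following (the exact numbers are a guess on my part and not tested yet):

-- Trigger autovacuum after a fixed number of dead rows rather than a
-- fraction of this (small but write-heavy) table, and throttle it less.
ALTER TABLE app.heartbeat SET (
    autovacuum_vacuum_scale_factor = 0.0,
    autovacuum_vacuum_threshold    = 1000,
    autovacuum_vacuum_cost_delay   = 5
);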

One other thing to add: we had a similar problem earlier in the year,
and a vacuum full apparently didn't clear up the disk space, which
seems very odd to me, but unfortunately I wasn't there to check in more
detail. We ended up copying the data into a temporary table, truncating
the original table, and inserting the data back into the original.
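
Roughly, that workaround looked like the following (reconstructed from
memory, so treat it as a sketch; it also loses any updates that land
between the copy and the truncate unless the table is locked first):

BEGIN;
-- Copy the live rows out, empty the bloated table, and reload it.
CREATE TEMP TABLE heartbeat_copy AS SELECT * FROM app.heartbeat;
TRUNCATE app.heartbeat;
INSERT INTO app.heartbeat SELECT * FROM heartbeat_copy;
COMMIT;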

Any advice is much appreciated!

Kind regards,

Simon



Re: Vacuum and freeing dead rows

From: Ron

On 7/5/19 3:16 AM, Simon T wrote:
Hi,

I have a very heavily updated table in a Postgres 9.6.10 database with
lots of disk bloat. Every row is updated about once a minute, and
little to no inserts. Approx 18k rows total. The table has bloated
from ~1700 KB to about 6 GB over a few weeks' time. I'm trying to
understand why vacuum hasn't made dead rows available for re-use.

[snip]
And in case it is relevant:

appdb=# SELECT pid, datname, usename, state, backend_xmin
appdb-# FROM pg_stat_activity
appdb-# WHERE backend_xmin IS NOT NULL
appdb-# ORDER BY age(backend_xmin) DESC;
  pid | datname | usename | state | backend_xmin
-------+---------------+----------+---------------------+--------------
 10921 | appdb | app | idle in transaction | 3501305052

"idle IN TRANSACTION" is never good.  Transactions should always be as short as possible.

 10919 | appdb | app | idle in transaction | 3501305052
 10916 | appdb | app | idle in transaction | 3501305052
 27935 | appdb | app | idle in transaction | 3501305052
 24500 | appdb | postgres | active | 3501305052
 10914 | appdb | app | active | 3501305052
 20671 | appdb | postgres | active | 3501305052
 11817 | appdb | app | active | 3501305052
  1988 | appdb | app | active | 3501305052
 15041 | appdb | postgres | active | 3501305052
  9916 | appdb | postgres | active | 3501305052
 10912 | appdb | app | idle in transaction | 3501305052
 10909 | appdb | app | idle in transaction | 3501305052
(13 rows)

Add backend_start to that query.  I'd kill any idle transactions that are more than 30 minutes old.  (Of course, since they're "idle IN TRANSACTION", you'd lose whatever those transactions were doing.)
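
Something along these lines would do it, assuming you're comfortable
forcibly rolling those sessions back (the 30-minute cutoff is arbitrary):

SELECT pid, usename, backend_start, xact_start,
       pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - xact_start > interval '30 minutes';

Since you're on 9.6 you can also set idle_in_transaction_session_timeout
so the server does this for you.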

--
Angular momentum makes the world go 'round.