Vacuum and freeing dead rows - Mailing list pgsql-general

From: Simon T
Subject: Vacuum and freeing dead rows
List: pgsql-general
Hi,

I have a very heavily updated table in a Postgres 9.6.10 database with
lots of disk bloat. Every row is updated about once a minute, with few
or no inserts. There are approximately 18k rows in total. The table has
bloated from ~1700 KB to about 6 GB over a few weeks. I'm trying to
understand why vacuum hasn't made the dead rows available for re-use.
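
For what it's worth, I've been tracking the size roughly like this
(pg_table_size is the heap plus TOAST; pg_total_relation_size also
includes the indexes):

SELECT pg_size_pretty(pg_table_size('app.heartbeat'))          AS table_size,
       pg_size_pretty(pg_total_relation_size('app.heartbeat')) AS total_size;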


appdb=# \d+ app.heartbeat;
                                         Table "app.heartbeat"
       Column        |            Type             |   Modifiers   | Storage  | Stats target | Description
---------------------+-----------------------------+---------------+----------+--------------+-------------
 endpoint            | uuid                        | not null      | plain    |              |
 record_timestamp    | timestamp without time zone | not null      | plain    |              |
 heartbeat_timestamp | timestamp without time zone | not null      | plain    |              |
 outer_ip            | character varying(46)       | not null      | extended |              |
 inner_ip            | character varying(46)       | not null      | extended |              |
 last_notified       | timestamp without time zone | default now() | plain    |              |
Indexes:
    "heartbeat_pkey" PRIMARY KEY, btree (endpoint)
    "heartbeat_endpoint_idx" btree (endpoint)
Options: fillfactor=45


I reset the stats on the table and a few moments later ran:

SELECT pg_stat_reset_single_table_counters(40237);
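
For reference, 40237 is the table's OID, which can be looked up with
something like:

SELECT 'app.heartbeat'::regclass::oid;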

SELECT c.oid, c.relname, c.relfrozenxid, c.relminmxid, c.relreplident,
       relpages, reltuples,
       pg_stat_get_live_tuples(c.oid) AS n_live_tup,
       pg_stat_get_dead_tuples(c.oid) AS n_dead_tup,
       n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd,
       n_mod_since_analyze, last_analyze, last_autoanalyze,
       autovacuum_count, autoanalyze_count
FROM pg_class c, pg_stat_all_tables av
WHERE c.oid = av.relid
  AND c.relname = av.relname
  AND c.relname = 'heartbeat'
ORDER BY av.relname;

-[ RECORD 1 ]-------+-----------
oid                 | 40237
relname             | heartbeat
relfrozenxid        | 3451305052
relminmxid          | 55105127
relreplident        | d
relpages            | 674616
reltuples           | 15494
n_live_tup          | 0
n_dead_tup          | 6234
n_tup_ins           | 0
n_tup_upd           | 6234
n_tup_del           | 0
n_tup_hot_upd       | 3758
n_mod_since_analyze | 6234
last_analyze        |
last_autoanalyze    |
autovacuum_count    | 0
autoanalyze_count   | 0


I also found the difference between n_tup_upd and n_tup_hot_upd
interesting: only 3758 of the 6234 updates were HOT. I was expecting
all updates to be HOT, since the PK isn't changing, only the record
values (timestamps, IPs), and my fillfactor is low. Maybe not
relevant, but I thought it worth mentioning just in case, because both
the index and the table are bloated.
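
Those counters come from pg_stat_user_tables; I'm computing the ratio
with roughly this:

SELECT n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
WHERE relname = 'heartbeat';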


So when I try to vacuum it, the vacuum reports:

DETAIL: 55330122 dead row versions cannot be removed yet.
INFO: "heartbeat": found 0 removable, 55347745 nonremovable row
versions in 675230 out of 675230 pages

How can I determine what is preventing the dead rows from being
removed? Is there a way to identify which transaction, if any, is
preventing their re-use? Or am I misinterpreting the results?

Full vacuum log:

appdb=# vacuum verbose app.heartbeat;
INFO: vacuuming "app.heartbeat"
INFO: index "heartbeat_pkey" now contains 23435126 row versions in 112094 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.38s/0.40u sec elapsed 1.07 sec.
INFO: index "heartbeat_endpoint_idx" now contains 23435358 row versions
in 112185 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.23s/0.35u sec elapsed 0.70 sec.
INFO: "heartbeat": found 0 removable, 55347745 nonremovable row
versions in 675230 out of 675230 pages
DETAIL: 55330122 dead row versions cannot be removed yet.
There were 199391 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 3.40s/11.46u sec elapsed 18.39 sec.
VACUUM

And in case it is relevant:

appdb=# SELECT pid, datname, usename, state, backend_xmin
appdb-# FROM pg_stat_activity
appdb-# WHERE backend_xmin IS NOT NULL
appdb-# ORDER BY age(backend_xmin) DESC;
  pid  | datname | usename  |        state        | backend_xmin
-------+---------+----------+---------------------+--------------
 10921 | appdb   | app      | idle in transaction |   3501305052
 10919 | appdb   | app      | idle in transaction |   3501305052
 10916 | appdb   | app      | idle in transaction |   3501305052
 27935 | appdb   | app      | idle in transaction |   3501305052
 24500 | appdb   | postgres | active              |   3501305052
 10914 | appdb   | app      | active              |   3501305052
 20671 | appdb   | postgres | active              |   3501305052
 11817 | appdb   | app      | active              |   3501305052
  1988 | appdb   | app      | active              |   3501305052
 15041 | appdb   | postgres | active              |   3501305052
  9916 | appdb   | postgres | active              |   3501305052
 10912 | appdb   | app      | idle in transaction |   3501305052
 10909 | appdb   | app      | idle in transaction |   3501305052
(13 rows)

appdb=#
appdb=# SELECT slot_name, slot_type, database, xmin
appdb-# FROM pg_replication_slots
appdb-# ORDER BY age(xmin) DESC;
 slot_name | slot_type | database | xmin
-----------+-----------+----------+------
(0 rows)

appdb=#
appdb=# SELECT gid, prepared, owner, database, transaction AS xmin
appdb-# FROM pg_prepared_xacts
appdb-# ORDER BY age(transaction) DESC;
 gid | prepared | owner | database | xmin
-----+----------+-------+----------+------
(0 rows)
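
Those "idle in transaction" backends all share the same backend_xmin,
so my suspicion is that they are what is holding back the xmin
horizon. If that turns out to be the case, I'm considering terminating
the long-lived ones, along these lines (the 10-minute cutoff is an
arbitrary guess):

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - xact_start > interval '10 minutes';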

Looking through my autovacuum logs, autovacuum only triggers on this
table once every 20-40 minutes, which, for a table that is essentially
rewritten in full every 60 seconds, explains part of the problem. So I
am also looking to tweak my vacuum settings to be more aggressive for
this table. However, the vacuum above not freeing any dead rows has me
worried that this alone won't be sufficient.
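
Concretely, I had per-table storage parameters in mind, something
along these lines (the numbers are guesses I would still need to
tune):

ALTER TABLE app.heartbeat SET (
    autovacuum_vacuum_scale_factor = 0.0,  -- don't scale the threshold with table size
    autovacuum_vacuum_threshold    = 2000, -- vacuum once ~2000 rows are dead
    autovacuum_vacuum_cost_delay   = 0     -- don't throttle autovacuum on this table
);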

One other thing to add: we had a similar problem earlier in the year,
and a VACUUM FULL apparently didn't reclaim the disk space, which
seems very odd to me, but unfortunately I wasn't there to check in
more detail. We ended up copying the data into a temporary table,
truncating the original table, and inserting the data back into the
original.
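
From memory, that workaround looked roughly like this:

BEGIN;
CREATE TEMP TABLE heartbeat_copy AS SELECT * FROM app.heartbeat;
TRUNCATE app.heartbeat;
INSERT INTO app.heartbeat SELECT * FROM heartbeat_copy;
COMMIT;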

Any advice is much appreciated!

Kind regards,

Simon


