Thread: BUG #15995: VACUUM not working after setting/unsetting of archive_mode = on, wal_level = replica

BUG #15995: VACUUM not working after setting/unsetting of archive_mode = on, wal_level = replica

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15995
Logged by:          Daniel Joseph
Email address:      jo.raj.postgres.dba@gmail.com
PostgreSQL version: 10.8
Operating system:   RHEL 7.5
Description:

Hi Postgres Support Team,

I have a huge problem with Postgres, initially I build a database with 9.6,
upgraded to 10.8 after 3 months, during upgraded I took the
archive_mode=off(default) and left

1. Build a brand new postgres cluster in 9.6 version in RHEL 7.5 (no
external extensions/software been installed)
2. Vacuum worked full, freeze etc.
3. After 3 months upgraded the postgres database cluster from 9.6 to 10.8.
4. Database is running fine after the upgrade.
5. During upgrade, have unset archive_mode = on, wal_level = replica, to
leave as default.
6. We don't have any type of streaming/HA environment.
7. Upgrade was successful, database is operational.
8. Again set archive_mode = on, wal_level = replica
9. VACUUM is not working, even after upgrades vacuum is not working.  If I
unset archive_mode = on, wal_level = replica and leave to default, vacuum
not working.  Following is the message I get, no dead rows removed, but lot
of delete/update has happeded to the table.

test1=# vacuum (full, verbose, analyze ) ;
INFO:  vacuuming "pg_catalog.pg_statistic"
INFO:  "pg_statistic": found 136 removable, 426 nonremovable row versions in
27 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s.
INFO:  vacuuming "pg_catalog.pg_type"
INFO:  "pg_type": found 9 removable, 380 nonremovable row versions in 9
pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  analyzing "pg_catalog.pg_type"
INFO:  "pg_type": scanned 9 of 9 pages, containing 378 live rows and 2 dead
rows; 378 rows in sample, 378 estimated total rows
INFO:  vacuuming "pg_catalog.pg_policy"
INFO:  "pg_policy": found 0 removable, 0 nonremovable row versions in 0
pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  analyzing "pg_catalog.pg_policy"
INFO:  "pg_policy": scanned 0 of 0 pages, containing 0 live rows and 0 dead
rows; 0 rows in sample, 0 estimated total rows
INFO:  vacuuming "test1.table1"
INFO:  "table1": found 0 removable, 619476 nonremovable row versions in
15644 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.38 s, system: 0.25 s, elapsed: 3.16 s.
INFO:  analyzing "test1.table1"
INFO:  "table1": scanned 15644 of 15644 pages, containing 619476 live rows
and 0 dead rows; 30000 rows in sample, 619476 estimated total rows
INFO:  vacuuming "pg_catalog.pg_authid"

10. Even if I create a brand new postgres cluster + database in 10.8 and
delete/update 1million rows in 10 million row table, vacuum is not removing
the dead rows.

Any help in resolving the issue is appreciated.  If this is a know bug, do
you have any workaround.

Thanks,
Daniel


PG Bug reporting form <noreply@postgresql.org> writes:
> 9. VACUUM is not working, even after upgrades vacuum is not working.  If I
> unset archive_mode = on, wal_level = replica and leave to default, vacuum
> not working.  Following is the message I get, no dead rows removed, but lot
> of delete/update has happeded to the table.

There are basically two possibilities here:

1. You have an old open transaction somewhere.  Possibly an
uncommitted two-phase transaction.  The pg_prepared_xacts view
would help identify the latter.

2. You have a replication slot that's holding back global xmin.
Look in pg_replication_slots.

            regards, tom lane





On Sun, Sep 8, 2019, 9:13 PM PG Bug reporting form <noreply@postgresql.org> wrote:

DETAIL:  0 dead row versions cannot be removed yet.

How are you determining that vacuum is not working?  It looks like vacuum has already worked, just not at the spot in the log file where you expected it to.

Cheers,

Jeff