On Sat, Mar 26, 2022 at 7:13 AM PG Bug reporting form <noreply@postgresql.org> wrote: > > The following bug has been logged on the website: > > Bug reference: 17449 > Logged by: Giorgio Saviane > Email address: gsaviane@gmail.com > PostgreSQL version: 11.13 > Operating system: Linux 5.8 > Description: > > Hello, I noticed an uncontrolled disk occupation growth caused by a Postgres > database in one of my deployments. > The actual database size took more than 500Gb (checked with select > pg_size_pretty(pg_database_size('dbname')) although tables accounted for a > total of ~ 50Gb (checked with pg_total_relation_size()).
So, the output of pg_database_size is 500GB and pg_total_relation_size for all the tables is 50GB right? pg_database_size includes table space size as well, were there any table spaces and files under it? BTW, pg_total_relation_size also includes all the files of a relation irrespective of whether the vacuum happened or not.
The database was created in a single tablespace. Tables did not hold any other
additional files than those inherently used for storing their data and indexes.
> Despite any attempt > of full vacuum the discrepancy remained the same. I suspect that Postgres > started leaking disk space. I could see many 1Gb files with a timestamp of > two months back in time in the postgres data folder. > Restarting the server did not have any effect, so I decided to pg_dump the > database and pg_restore the backup in a new instance. That worked, the new > database is now ~ 50 Gb and dropping the old one released that 500Gb of disk > space. > The database was under streaming replication and I noticed the postgres log > reporting many of these messages > > requested WAL segment 0000000100000000000000E3 has already been removed > > Could be that the leak started in correspondence of that error? > If so, is there anything we can do to prevent it? I already set > wal_keep_segments = 100, but I'm not sure it is enough and how to tune it. > Is there any possible workaround to release the leaked space without going > through a backup? (It took two hours)
Which all other directories in the data directory took more space? Is it pg_wal or base or some other? Were there any inactive replication slots? Was the archiving happening properly?
The content of /var/lib/postoresql/11/main/base dir was as following: