>>>
>>> > Despite every attempt
>>> > at a full vacuum, the discrepancy remained the same. I suspect that Postgres
>>> > started leaking disk space. I could see many 1 GB files with timestamps from
>>> > two months back in the postgres data folder.
>
>
> If the database suffers a crash (or immediate shutdown) in the middle of something like VACUUM FULL or CLUSTER, it
> might leave behind orphaned in-process files such as the ones you describe, with no way to know to clean them up.
> The knowledge of what it was working on just before the crash was lost in the crash.
>
> Files not touched in 2 months and also not referenced in pg_class.relfilenode are almost certainly such orphaned
> files and could, with extreme nervousness, be cleaned up by hand. Especially if the human-readable log files support
> a crash having happened at that time.
That was not the case. The server has been running seamlessly since I
rebuilt the master.
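
For anyone following the thread, a rough sketch of the kind of check described above. It only looks at main-fork files (no _fsm/_vm forks or .N segment suffixes), mapped catalogs can show up as false positives, and pg_ls_dir needs superuser or equivalent privileges, so treat the output as hints rather than a delete list:

  -- Sketch only: main-fork file names in this database's directory that no
  -- pg_class.relfilenode accounts for; mapped catalogs may appear as false
  -- positives, so verify before touching anything.
  SELECT f AS possibly_orphaned_file
  FROM pg_ls_dir((SELECT 'base/' || oid::text
                  FROM pg_database
                  WHERE datname = current_database())) AS f
  WHERE f ~ '^[0-9]+$'
    AND f::oid NOT IN (SELECT relfilenode FROM pg_class
                       WHERE relfilenode <> 0);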
>
>>>
>>> > Restarting the server did not have any effect, so I decided to pg_dump the
>>> > database and pg_restore the backup into a new instance. That worked: the new
>>> > database is now ~50 GB, and dropping the old one released that 500 GB of disk
>>> > space.
>>> > The database was under streaming replication and I noticed the postgres log
>>> > reporting many of these messages
>>> >
>>> > requested WAL segment 0000000100000000000000E3 has already been removed
>
>
> When did those start? Before you rebuilt the master? Was your replica using, or attempting to use, replication
> slots?
They showed up after I rebuilt the master and re-enabled the replica. No,
the replica is not using any slot, but I gather a slot would help in
case of unstable networking between the slave and the master.
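
For the archives, a minimal sketch of what a slot setup would look like (the
slot name 'replica_1' is made up, and the master needs max_replication_slots
set above zero):

  -- On the master (PostgreSQL 9.4 or later):
  SELECT pg_create_physical_replication_slot('replica_1');

  -- On the replica, in recovery.conf (or postgresql.conf from version 12 on):
  -- primary_slot_name = 'replica_1'

With a slot the master retains WAL until the replica confirms it has received
it, so the "requested WAL segment ... has already been removed" errors should
not recur after a network hiccup; the trade-off is that a replica that stays
down for a long time will make the master accumulate WAL on disk.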
Kind regards
Giorgio