> Despite any attempt of full vacuum the discrepancy remained the same.
> I suspect that Postgres started leaking disk space. I could see many
> 1Gb files with a timestamp of two months back in time in the postgres
> data folder.

If the database suffers a crash (or an immediate shutdown) in the middle of something like VACUUM FULL or CLUSTER, it can leave behind orphaned in-progress files such as the ones you describe, and it has no way to know that it should clean them up. The knowledge of what it was working on just before the crash was lost in the crash.
Files that have not been touched in 2 months and are also not referenced in pg_class.relfilenode are almost certainly such orphaned files and could, with extreme nervousness, be cleaned up by hand, especially if the human-readable log files show that a crash happened around that time.
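
If it helps, here is a rough, read-only sketch of how that check could be scripted in Python. It is an illustration under assumptions, not a tested tool: the function name `find_orphan_candidates`, its parameters, and the 60-day threshold are all made up for this example. It assumes you have already collected the referenced filenodes by running the query in `FILENODE_QUERY` while connected to the affected database (pg_relation_filenode() is used there because pg_class.relfilenode is zero for mapped catalogs), and that `db_dir` is that database's directory under `base/` in the data folder. It only lists candidates and never deletes anything.

```python
import os
import re
import time

# Relation files are named by filenode, optionally with a fork suffix and a
# 1 GB segment number: 16384, 16384.1, 16384_fsm, 16384_vm.2
RELFILE_RE = re.compile(r"^(\d+)(?:_(?:fsm|vm|init))?(?:\.\d+)?$")

# Run this in the affected database and pass the results in as
# referenced_filenodes. It returns NULL for relations without storage.
FILENODE_QUERY = (
    "SELECT pg_relation_filenode(oid) FROM pg_class "
    "WHERE pg_relation_filenode(oid) IS NOT NULL"
)


def find_orphan_candidates(db_dir, referenced_filenodes, min_age_days=60, now=None):
    """Return sorted names of files in db_dir that look like relation files,
    are not in referenced_filenodes, and are older than min_age_days.

    Hypothetical helper for illustration; it reports candidates only.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    referenced = {int(n) for n in referenced_filenodes}

    candidates = []
    with os.scandir(db_dir) as entries:
        for entry in entries:
            match = RELFILE_RE.match(entry.name)
            # Skip PG_VERSION, pg_filenode.map, temp relations (t3_16384), etc.
            if match is None or not entry.is_file():
                continue
            # Skip anything the catalog still knows about.
            if int(match.group(1)) in referenced:
                continue
            # Skip files modified more recently than the cutoff.
            if entry.stat().st_mtime > cutoff:
                continue
            candidates.append(entry.name)
    return sorted(candidates)
```

Anything this reports is still only a candidate. I would cross-check the file timestamps against a crash in the server log, and move the files aside rather than delete them, before trusting the result.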

> Restarting the server did not have any effect, so I decided to pg_dump the
> database and pg_restore the backup in a new instance. That worked, the new
> database is now ~ 50 Gb and dropping the old one released that 500Gb of disk
> space.
> The database was under streaming replication and I noticed the postgres log
> reporting many of these messages
>
> requested WAL segment 0000000100000000000000E3 has already been removed

When did those start? Before you rebuilt the master? Was your replica using, or attempting to use, replication slots?