On Wed, Jun 8, 2022 at 8:46 AM Bruce Momjian <bruce@momjian.us> wrote:
> We currently can check for missing heap/index files by comparing
> pg_class with the database directory files. However, I am not clear if
> this is safe during concurrent DDL. I assume we create the file before
> the update to pg_class is visible, but do we always delete the file
> after the update to pg_class is visible? I assume any external checking
> tool would need to lock the relation to prevent concurrent DDL.
If you see an entry in pg_class, then there should definitely be a
file present on disk. The reverse is not true: just because you don't
see an entry in pg_class for a file that's on disk doesn't mean it's
safe to remove that file.
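As a sketch of what an external checker could do with that asymmetry (helper names are hypothetical; it assumes you've already pulled the relfilenode list from pg_class and listed the filenames in the database directory, and relies on the usual main-fork/segment file naming):

```python
import re

# Segment files ("16384.1") and extra forks ("16384_fsm", "16384_vm",
# "16384_init") belong to the same relfilenode as the main file "16384".
REL_FILE = re.compile(r"^(\d+)(_fsm|_vm|_init)?(\.\d+)?$")

def missing_files(relfilenodes, filenames):
    """Relfilenodes from pg_class whose main-fork file is absent.
    Per the rule above, these are definite problems."""
    present = {name for name in filenames if name.isdigit()}
    return sorted(n for n in relfilenodes if str(n) not in present)

def orphan_candidates(relfilenodes, filenames):
    """Relation-shaped files with no pg_class entry. Only *candidates*:
    as noted above, no entry does not mean the file is safe to remove."""
    orphans = []
    for name in filenames:
        m = REL_FILE.match(name)
        if m and int(m.group(1)) not in relfilenodes:
            orphans.append(name)
    return sorted(orphans)
```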
> Also, how would it check if the number of extents is correct? Seems we
> would need this value to be in pg_class, and have the same update
> protections outlined above. Seems that would require heavier locking.
Yeah, and it's not just the number of extents but the length of the
last one. If the last extent is supposed to be 700MB and it gets
truncated to 200MB, it would be nice if we could notice that.
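Some of the segment structure is checkable even today without any new metadata, under the standard build defaults (8kB blocks, 1GB segments); what can't be caught this way is exactly the truncated-last-segment case above. A rough sketch:

```python
BLCKSZ = 8192                   # default PostgreSQL block size
SEG_BYTES = 1024 * 1024 * 1024  # default segment size (1GB)

def segment_problems(sizes):
    """Given the byte sizes of a relation's segment files in order
    (16384, 16384.1, ...), report what's detectable without knowing the
    true length: every segment before the last must be exactly 1GB, and
    every segment must be a whole number of blocks. A last segment
    truncated from 700MB to 200MB looks perfectly valid here -- that's
    the case a stored length (or upper bound) would catch."""
    problems = []
    for i, size in enumerate(sizes):
        if size % BLCKSZ != 0:
            problems.append((i, "not a multiple of BLCKSZ"))
        if i < len(sizes) - 1 and size != SEG_BYTES:
            problems.append((i, "non-final segment is not 1GB"))
    return problems
```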
One idea might be for each heap table to have a metapage and store the
length - or an upper bound on the length - in the metapage. That'd
probably be cheaper than updating pg_class, but might still be
expensive in some scenarios, and it's a fairly large amount of
engineering.
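To make the cost argument concrete: the reason a metapage upper bound could be cheap is that it only needs rewriting when the relation grows past the recorded bound, and rounding the bound up buys slack so most extensions touch nothing. A toy model of that idea (all names and the chunk size are hypothetical, nothing here is a PostgreSQL API):

```python
class MetapageBound:
    """Toy model: keep a high-water mark that is only rewritten when the
    relation grows past it, rounding up in chunks so that most relation
    extensions require no metapage write at all."""
    CHUNK = 128  # blocks of slack added per metapage update (assumed knob)

    def __init__(self):
        self.upper_bound = 0  # blocks; what the metapage would persist
        self.writes = 0       # metapage writes incurred

    def extend_to(self, nblocks):
        # Only write when the actual length exceeds the recorded bound.
        if nblocks > self.upper_bound:
            self.upper_bound = (nblocks // self.CHUNK + 1) * self.CHUNK
            self.writes += 1

    def check(self, actual_blocks):
        # A relation longer than its recorded upper bound is corrupt;
        # shorter is inconclusive (the bound is only an upper bound).
        return actual_blocks <= self.upper_bound
```

Growing a relation one block at a time to 999 blocks costs only 8 metapage writes in this model, which is the sense in which it could beat updating pg_class on every extension.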
> Is this something anyone has even needed or had requested?
Definitely. And also the reverse: figuring out which files on disk are
old garbage that can be safely nuked.
--
Robert Haas
EDB: http://www.enterprisedb.com