Thread: Extra files in "base" dir not seen in relfilenodes

Extra files in "base" dir not seen in relfilenodes

From
Daniel Farina
Date:
I am looking at a database with a wide (~500G) divergence between the total space expended by the database directory and the result of select sum(pg_relation_size(oid)) from pg_class;.

I located about 280G of apparent extra space by performing an anti-join between files on disk and files in the catalog via the pg_class.relfilenode field.

What should I do to get rid of the data, if it is, in fact, "extra"? Is there a reasonable bug report to file? Can I independently compute the entire itemization of files that belong in the data directory?

The cluster was pg_upgrade'd, in link mode, a while ago, to 10.1 from 9.6. This is not necessarily relevant, though.
 
Thanks,
Daniel

Re: Extra files in "base" dir not seen in relfilenodes

From
Tom Lane
Date:
Daniel Farina <daniel@fdr.io> writes:
> I am looking at a database with a wide (~500G) divergence between the total
> space expended by the database directory and the result of select
> sum(pg_relation_size(oid)) from pg_class;.

Odd.

> I located about 280G of apparent extra space by performing an anti-join
> between files on disk and files in the catalog via the pg_class.relfilenode
> field.

Umm ... are you accounting for catalogs that have zeroes in
pg_class.relfilenode?  It's generally better to rely on the
pg_relation_filenode(oid) function than the raw column contents.

            regards, tom lane


Re: Extra files in "base" dir not seen in relfilenodes

From
Daniel Farina
Date:

On Tue, Jan 16, 2018 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Daniel Farina <daniel@fdr.io> writes:
> I am looking at a database with a wide (~500G) divergence between the total
> space expended by the database directory and the result of select
> sum(pg_relation_size(oid)) from pg_class;.

Odd.

> I located about 280G of apparent extra space by performing an anti-join
> between files on disk and files in the catalog via the pg_class.relfilenode
> field.

Umm ... are you accounting for catalogs that have zeroes in
pg_class.relfilenode?  It's generally better to rely on the
pg_relation_filenode(oid) function than the raw column contents.

Yeah, the catalogs are not considered here (oids < 10000). The oids in question are rather high. Let me re-run the antijoin with the function though....

Indeed, same result.

Re: Extra files in "base" dir not seen in relfilenodes

From
Tom Lane
Date:
Daniel Farina <daniel@fdr.io> writes:
> On Tue, Jan 16, 2018 at 3:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Umm ... are you accounting for catalogs that have zeroes in
>> pg_class.relfilenode?  It's generally better to rely on the
>> pg_relation_filenode(oid) function than the raw column contents.

> Yeah, the catalogs are not considered here (oids < 10000). The oids in
> question are rather high. Let me re-run the antijoin with the function
> though....

> Indeed, same result.

Hmm, you should have gotten a result that was different by the size
of the bootstrap catalogs (pg_class, pg_attribute, pg_proc, pg_type,
plus their indexes).  I'm worried that you're going to accidentally
delete those critical catalogs.

However, assuming you've gotten that detail right, then any file
you can't match up with a relfilenode value must be an orphan you
can just "rm".

Do you have any theories about how the DB got like this?  Were there
system crashes or anything like that recently?  Is there any pattern
to the file access or mod dates on the putatively-orphaned files?

            regards, tom lane


Re: Extra files in "base" dir not seen in relfilenodes

From
Alvaro Herrera
Date:
Tom Lane wrote:

> However, assuming you've gotten that detail right, then any file
> you can't match up with a relfilenode value must be an orphan you
> can just "rm".

Maybe look in pg_buffercache for entries referencing those files before
deleting.  It would be surprising to see any if no catalog points to
those files, but who knows ...  If you break checkpointing, you're not
going to be pleased.

If it turns out that a shared buffer exists for any of those files, what
would be a way to evict them without pain?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Extra files in "base" dir not seen in relfilenodes

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> However, assuming you've gotten that detail right, then any file
>> you can't match up with a relfilenode value must be an orphan you
>> can just "rm".

> Maybe look in pg_buffercache for entries referencing those files before
> deleting.  It would be surprising to see any if no catalog points to
> those files, but who knows ...  If you break checkpointing, you're not
> going to be pleased.

Even if there are any such entries, surely they ought to be clean by
now and thus not a hazard.  Not that a manual "CHECKPOINT" might not
be a good idea.

            regards, tom lane