Thread: Orphaned Files in PostgreSQL

Orphaned Files in PostgreSQL

From
Ashutosh Sharma
Date:
Hi All,

While investigating one of our customer issues, we discovered several
orphaned data files on the disk that do not have corresponding entries
in the pg_class table. Upon further analysis, we identified specific
scenarios in PostgreSQL where this issue can occur. One such scenario
is as follows:

Consider a situation where a table is being created within a
transaction, and data is being loaded into it. If PostgreSQL
unexpectedly crashes while the transaction is still in progress, an
orphaned file may be left behind on the disk. In cases where multiple
such transactions occur, this can lead to the accumulation of numerous
orphaned files, resulting in significant disk space consumption.
Unfortunately, these files are not cleared during PostgreSQL's restart
process.

We have discussed this issue internally, and one proposed solution
involves adding a marker file to the disk for any table created within
a transaction, immediately upon its creation. This marker file would
then be removed during the commit process. If the transaction is
aborted due to a server crash, the marker file and the corresponding
disk file would be cleared at the end of the recovery process during
server startup.

I would appreciate your thoughts on this solution. Should you have any
suggestions or alternative approaches, I would be grateful to hear
them.

Additionally, I am unsure if this issue has already been reported or
if it is currently being addressed. If that is the case, I would be
grateful if you could point me to the relevant discussion thread so I
can follow the progress and contribute if needed.

Thank you for your time and assistance.

--
With Regards,
Ashutosh Sharma.



Re: Orphaned Files in PostgreSQL

From
Bertrand Drouvot
Date:
Hi,

On Tue, Feb 18, 2025 at 04:16:02PM +0530, Ashutosh Sharma wrote:
> Additionally, I am unsure if this issue has already been reported or
> if it is currently being addressed. If that is the case, I would be
> grateful if you could point me to the relevant discussion thread so I
> can follow the progress and contribute if needed.

I think that it is a known issue (see [1]). The thread links to a discussion that
could provide a fix (If I understood correctly) and introduces a way/extension to
clean those orphaned files in a clean way (using a dirty snapshot). Idea was to
gauge if there is interest to add this extension in contrib (would certainly
need some polishing and code work to meet the "contrib" expectations though).

[1]: https://www.postgresql.org/message-id/flat/7ff08868-843a-c39c-c96d-7e7f77fe5f5c%40amazon.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com