On Mon, Feb 11, 2019 at 10:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ah, so Andrew was correct: we panicked due to lack of WAL space, and
> that explains why the vacuuming process didn't have an opportunity
> to delete the files belonging to the uncommitted new relation.
> It's a pretty well-understood dynamic, I believe. Perhaps we should
> try harder to recover cleanly, but I don't know of anyone putting
> effort into the case.
FTR I am working on a PG13 patch that records relfilenodes of
uncommitted transactions in undo logs, so it can unlink them reliably,
even if you crash (at the cost of introducing a WAL flush before
creating files). I haven't specifically studied the VACUUM FULL case
yet, but in principle this is exactly what my project aims to fix.
It's mostly intended as example code to demonstrate the undo log
machinery (defining undo record types, registering undo log action
functions that are invoked during rollback, rollback of aborted but
not yet rolled back transactions at startup, ...) without having to
understand the whole zheap sandwich at once, but it's also a solution
to an age-old problem. More on that soon.
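
To make the ordering concrete, here is a toy model of the idea. It is not
the patch and not PostgreSQL code: ToyUndoLog, its JSON record format, and
demo() are all made up for illustration, and the fsync merely stands in
for the WAL flush. The point is only that the record is made durable
before the file is created, so recovery can always find and unlink files
left behind by transactions that never committed.

```python
import json
import os
import tempfile


class ToyUndoLog:
    """Hypothetical append-only log of 'created file' records."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # stand-in for the flush before file creation

    def create_file(self, xid, file_path):
        # Log first, then create. A crash between the two steps leaves at
        # most a record for a missing file, never an untracked file.
        self._append({"op": "create", "xid": xid, "file": file_path})
        with open(file_path, "w", encoding="utf-8"):
            pass

    def commit(self, xid):
        self._append({"op": "commit", "xid": xid})

    def recover(self):
        """Unlink files created by transactions that have no commit record."""
        if not os.path.exists(self.path):
            return []
        created, committed = [], set()
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                if rec["op"] == "commit":
                    committed.add(rec["xid"])
                else:
                    created.append((rec["xid"], rec["file"]))
        removed = []
        for xid, file_path in created:
            if xid not in committed and os.path.exists(file_path):
                os.unlink(file_path)
                removed.append(file_path)
        return removed


def demo():
    with tempfile.TemporaryDirectory() as d:
        log = ToyUndoLog(os.path.join(d, "undo.log"))
        kept = os.path.join(d, "rel_1")
        orphan = os.path.join(d, "rel_2")
        log.create_file(1, kept)
        log.commit(1)
        log.create_file(2, orphan)  # simulated crash: xid 2 never commits
        removed = ToyUndoLog(log.path).recover()  # fresh object, as after restart
        return removed == [orphan], os.path.exists(kept), os.path.exists(orphan)
```

Calling demo() exercises one committed and one uncommitted transaction.
The real design of course has to deal with much more than this toy does
(rollback without a crash, concurrency, discarding old undo data).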
--
Thomas Munro
http://www.enterprisedb.com