Re: Orphaned relations after crash/sigkill during CREATE TABLE - Mailing list pgsql-general

From Tom Lane
Subject Re: Orphaned relations after crash/sigkill during CREATE TABLE
Msg-id 1751191.1597959995@sss.pgh.pa.us
In response to Re: Orphaned relations after crash/sigkill during CREATE TABLE  (Jason Myers <j.myers@brstrat.com>)
Responses Re: Orphaned relations after crash/sigkill during CREATE TABLE  (Jason Myers <j.myers@brstrat.com>)
Re: Orphaned relations after crash/sigkill during CREATE TABLE  (Jeremy Schneider <schneider@ardentperf.com>)
List pgsql-general
Jason Myers <j.myers@brstrat.com> writes:
> However we were still seeing orphaned files on crash, and I believe I
> tracked it down to subsequent CREATE INDEX statements also creating these
> orphaned files (if they are running during a crash).
> Is that issue known as well?  I don't believe I can use the same trick to
> sidestep that one...

Yeah, it's entirely intentional that we don't try to clean up orphaned
disk files after a database crash.  There's a long discussion of this and
related topics in src/backend/access/transam/README.  What that says about
why not is that such files' contents might be useful for forensic analysis
of the crash, and anyway "Orphan files are harmless --- at worst they
waste a bit of disk space".  A point not made in that text, but true
anyway, is that it'd also be quite expensive to search a large database
for orphaned files, so people would likely not want to pay that price
on the way to getting their database back up.

There might be value in a user-invokable tool that runs in an existing
non-crashed database and looks for orphan files, but I'm not aware that
anyone has written one.  (Race conditions against concurrent table
creation would be a problem; but probably that can be finessed somehow,
maybe by noting the file's creation time.)
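
[Editorial sketch, not part of the original message.] The idea above, a scan of a running database's directory that ignores recently created files to sidestep races with concurrent table creation, might look roughly like the following. This is a minimal sketch under stated assumptions: `find_orphan_candidates`, `known_filenodes` and `min_age_seconds` are hypothetical names; the set of live filenodes is assumed to have been collected separately (for example with `pg_relation_filenode(oid)` over `pg_class`) for relations in the database's default tablespace; and modification time stands in for creation time, which is not portably available. It only reports candidates and deletes nothing.

```python
import os
import re
import time

# Relation file names: "<filenode>", optional fork suffix, optional segment number,
# e.g. "16384", "16384.1", "16384_fsm", "16384_vm".
_REL_FILE = re.compile(r"^(\d+)(?:_(?:fsm|vm|init))?(?:\.\d+)?$")


def find_orphan_candidates(db_dir, known_filenodes, min_age_seconds=3600, now=None):
    """Return relation-like file names in db_dir whose filenode is not known.

    known_filenodes: set of ints the caller gathered from the catalogs (assumption).
    min_age_seconds: files modified more recently than this are skipped, as a crude
    guard against tables or indexes being created while the scan runs.
    """
    now = time.time() if now is None else now
    candidates = []
    for name in sorted(os.listdir(db_dir)):
        match = _REL_FILE.match(name)
        if match is None:
            continue  # not a relation file (PG_VERSION, pg_filenode.map, temp files, ...)
        if int(match.group(1)) in known_filenodes:
            continue  # file belongs to a relation the catalogs know about
        age = now - os.stat(os.path.join(db_dir, name)).st_mtime
        if age >= min_age_seconds:
            candidates.append(name)
    return candidates
```

Any file this reports would still need manual verification before removal; the age threshold reduces, but does not eliminate, the race mentioned above.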

In the meantime I've got to say that routinely kill -9'ing database
processes just doesn't seem like a very good idea.  Yeah, we do our best
to ensure that there won't be data loss, but you're really doubling down
on a hard assumption that Postgres contains zero bugs when you operate
that way.  I'd suggest reconfiguring things to avoid the OOM kill hazard;
or if your cloud provider makes that effectively impossible, maybe you
need another provider.  But on most systems I'd think you could use ulimit
or the like even if you don't have root privileges.
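
[Editorial sketch, not part of the original message.] As one illustration of the "ulimit or the like" suggestion, a launcher can cap a child process's address space so that an oversized allocation fails inside the process instead of attracting the OOM killer. This is a sketch assuming a Linux-like system; `run_with_memory_cap` and `max_bytes` are hypothetical names, and a suitable limit for a real server would need tuning, since shared memory also counts toward the address-space limit.

```python
import resource
import subprocess


def run_with_memory_cap(argv, max_bytes):
    """Run argv with its soft address-space limit (RLIMIT_AS) set to max_bytes.

    The limit is applied in the child just before exec, so the parent is unaffected.
    No root privileges are needed to lower a soft limit.
    """
    def apply_limit():
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        # The soft limit may not exceed the existing hard limit.
        soft = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(argv, preexec_fn=apply_limit, capture_output=True, text=True)
```

This is the programmatic equivalent of running `ulimit -v` in the shell that starts the process; the shell builtin takes its value in kilobytes, whereas `setrlimit` takes bytes.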

            regards, tom lane
