Re: [HACKERS] Orphaned files in base/[oid] - Mailing list pgsql-hackers

From Chris Travers
Subject Re: [HACKERS] Orphaned files in base/[oid]
Msg-id CAN-RpxAAjKwCMb6iQ7uvOReZOXqTzPs3jT_3zo1xmWXBerWM7g@mail.gmail.com
In response to Re: [HACKERS] Orphaned files in base/[oid]  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Orphaned files in base/[oid]
List pgsql-hackers


On Tue, Aug 15, 2017 at 3:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chris Travers <chris.travers@adjust.com> writes:
> I wonder about a different solution.  Would it be possible to special case
> vacuum to check for and remove (or just move to where they can be removed)
> files when vacuuming pg_class?  At the point we are vacuuming pg_class, we
> ought to be able to know that a relfilenode shouldn't be used anymore,
> right?

I don't think so.  It's not clear to me whether you have in mind "scan
pg_class, collect relfilenodes from all live tuples, then zap all files
not in that set" or "when removing a dead tuple, zap the relfilenode
it mentions".  But neither one works.  The first case has a race condition
against new pg_class entries.  As for the second, the existence of a dead
tuple bearing relfilenode N isn't evidence that some other live tuple
can't have relfilenode N.

Ah, because if the file never made it onto disk, the number could be reused.

Another problem for the second solution is that in the case you're worried
about (ie, PANIC due to out-of-WAL-space during relation's creating
transaction), there's no very good reason to expect that the relation's
pg_class tuple ever made it to disk at all.

A traditional low-tech answer to this has been to keep the WAL on a
separate volume from the main data store, so that it's protected from
out-of-space conditions in the main store and temp areas.  The space
needs for WAL proper are generally much more predictable than the main
store, so it's easier to keep the dedicated space from overflowing.
(Stalled replication/archiving processes can be hazardous to your
health in this scenario, though, if they prevent prompt recycling of
WAL files.)

Yeah, most of our databases here have WAL on a separate volume, but not this system.  This system is also unusual in that disk usage varies wildly.  I am not 100% sure that this is the only case which causes the problem, but I can reproduce it consistently when the WAL writer runs out of disk space, with symptoms exactly matching what I found.
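
For anyone wanting to check their own systems, here is a minimal sketch of a test for whether the WAL directory shares a filesystem with the rest of the data directory.  The function name same_volume is made up for illustration, and the comparison of the device and mount point reported by df -P is only an approximation (it can be fooled by bind mounts or mount points containing spaces).  The WAL directory is assumed to be pg_wal (pg_xlog before Postgres 10).

```shell
# same_volume is a hypothetical helper: it succeeds (exit 0) when both
# paths report the same filesystem and mount point in "df -P" output.
same_volume() {
    # Line 2 of "df -P" is the entry for the path; field 1 is the
    # filesystem, the last field is the mount point.
    a=$(df -P "$1" | awk 'NR==2 {print $1 " " $NF}')
    b=$(df -P "$2" | awk 'NR==2 {print $1 " " $NF}')
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example use (the paths are assumptions about the local layout):
#   if same_volume "$datadir" "$datadir/pg_wal"; then
#       echo "WARNING: WAL shares a volume with the main data store"
#   fi
```

A symlinked WAL directory is resolved by df itself, so the usual setup of linking pg_wal to another mount should be reported as a separate volume.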

So for now that leaves my fallback approach as a way to fix it when I see it.

I have written a shell script which does the following:
1.  starts Postgres in single user mode with a data directory or dies (won't run if Postgres seems to be already running)
2.  gets the oid of the current database
3.  lists all files consisting of only digits in the  base/[dboid] directory
4.  asks Postgres (in single user mode again) for all relfilenodes and oids of tables (in my testing both were required because there were some cases where relfilenodes were not set in some system 
5.  loops through the file names gathered in step 3, checks each against the relfilenode and oid entries from step 4, and zaps $f, $f_*, and $f.* for any file with no match.  For testing, "zaps" currently means moving the files to a lostnfound folder for inspection after the script finishes.  The logic here is not perfect and is very slightly under-inclusive, but better that than the other way.
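
The comparison in step 5 can also be sketched in isolation as a dry run that only prints candidates and moves nothing.  This is an illustration rather than what the full script does: find_orphans is a hypothetical helper, and it assumes the two lists have already been written to files, one entry per line.

```shell
# find_orphans is a hypothetical dry-run helper; it only prints names.
#   $1 = file listing on-disk file names (digits only), one per line
#   $2 = file listing known relfilenodes and oids, one per line
find_orphans() {
    # Refuse to run with an empty catalog list; otherwise every file
    # on disk would be reported as orphaned.
    [ -s "$2" ] || { echo "FATAL: no relfilenodes given" >&2; return 2; }
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file:
    # print lines of $1 that match no line of $2.
    # grep exits 1 when nothing is printed; treat that as success.
    grep -v -x -F -f "$2" "$1" || [ "$?" -eq 1 ]
}
```

Matching whole lines avoids treating 1638 as known just because 16384 is.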

Then we can start Postgres again.  I cannot find a better way to avoid race conditions, I guess.  At any rate, it sounds like preventing the problem more generally may be beyond what I would feel comfortable attempting as a patch at my current level of familiarity with the source code.

The full script is included inline below my signature in case it is of interest to anyone on the list.


                        regards, tom lane



--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 

Saarbrücker Straße 37a, 10405 Berlin

---

#!/bin/bash

datadir=$1
database=$2

# Stop the server first; single-user mode needs the data directory to itself
pg_ctl -D "$datadir" stop

dboidfile="$PWD/cleanupdb.oid"
reloidfile="$PWD/refilenodes.list"

echo "COPY (select oid from pg_database where datname = current_database()) TO '$dboidfile'" | postgres --single -D "$datadir" "$database" > /dev/null


if (($?)) 
then
   echo "FATAL: Could not start Postgres in single user mode"
   exit 1
fi

dboid=$(cat "$dboidfile")
# List only all-digit file names in this database's directory under $datadir
filenodes=$(cd "$datadir/base/$dboid" && ls | grep -E '^[0-9]+$' | sort -n)
#echo $filenodes

echo "COPY (select relfilenode from pg_class union select oid as relfilenode from pg_class) TO '$reloidfile'" | postgres --single -D "$datadir" "$database" > /dev/null
relfilenodes=`cat $reloidfile`
#echo $relfilenodes
if [[ -z "$relfilenodes" ]]
then
   echo "FATAL: did not get any relfilenodes"
   exit 2
fi

mkdir -p lostnfound
for f in $filenodes
do
  if [[ -z `echo $relfilenodes | grep -w $f` ]]
  then
      echo moving $f to lostnfound
      mv "$datadir/base/$dboid/$f" lostnfound
      mv "$datadir/base/$dboid/${f}"_* lostnfound 2> /dev/null
      mv "$datadir/base/$dboid/${f}".* lostnfound 2> /dev/null
  fi
done
rm $dboidfile
rm $reloidfile
