Thread: leaking lots of unreferenced inodes (pg_xlog files?), maybe after moving tables and indexes to tablespace on different volume

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Running postgresql-9.2.2 on FreeBSD 9.1 using vanilla ufs file system.

I have the postgresql base/ on the /usr disk, and a separate volume /opt
where the default tablespace resides. I found many databases that had
not used the default tablespace, an ran many hundreds of

ALTER TABLE foobar SET TABLESPACE foo;
ALTER INDEX foobar_idx SET TABLESPACE foo;

a while back to free up disk space on /usr. I got lots of space freed
up, but it seems that after that the disk usage grows linearly (it seems
to leave many inodes unreferenced).

The the discrepancy between df and du is now huge:

# du -sxh /usr; df -h /usr
4,6G    /usr
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/da0s1f    104G     88G    8.0G    92%    /usr

4,6G vs 88GB, that must be more than a rounding error?

Strange thing is I cannot find any open files.

# lsof /usr| awk '{print $9}'|xargs ls -l > /dev/null

returns no errors (a missing file would render an error with ls). If
there where open files not referenced in any directory, they should be
found.

Next thing is fsck, and yes, there are plenty of unreferenced files.

I ran fsck while system is running (i.e. read only) to get a grip oif
the amount of lost inodes:

fsck /usr | awk '{print $1}'|cut -f 2 -d=| perl -e '$i = 0; while (<>) {
$i += $_;}; print $i / 1024 / 1024; print "\n";'
85223.3530330658

~85 GB gone, that's 80% of the disk, and it accounts fo all the missing
space.

MTIME for the inodes are pretty evenly spread over time since the
machine was updated to FreeBSD 9.1, rebooted, and PostgreSQL was updated
to 9.2. All was done at the same time, so I can't really tell who's to
blaim, but this is the only server out of a dozen where we did all these
ALTER table/index SET TABLEPSPACE... and it is the only server, out of a
dozen that where updated to exactly the same versions, that has this
problem. All other servers have their base/ disk as horisontal the
horizone (since all data resides on a separate tablespace). All servers
where updated from 8.4 or 9.0 using pg_update. Only this server has
problems. This is the only server with a multitude (~30) active
databases, and the only one where we ran the SET TABLESPACE stuff.

The unreferenced inodes are almost exclusively around 16 MB in size, so
i.e. they would most probably all be pg_xlog files.

config in postgresql.conf is checkpoint_segments = 40

Could there be anything fishy in the code that moves the indexes or
tables between different tablespaces? How could it possibly be able to
leave unreferenced inodes around like this? Is the culprit a combination
of postgresql and file system code?

checkpoints seems to happen approximately every three minutes:

Mar 13 00:39:08 dbserver postgres[5298]: [48-1] db=,user= LOG:
checkpoint starting: time
Mar 13 00:41:38 dbserver postgres[5298]: [49-1] db=,user= LOG:
checkpoint complete: wrote 2542 buffers (0.3%); 0 transaction log
file(s) added, 0 removed, 1 recycled; write=149.667 s, sync=0.101 s,
total=149.770 s; sync files=628, longest=0.021 s, average=0.000 s
Mar 13 00:44:08 dbserver postgres[5298]: [50-1] db=,user= LOG:
checkpoint starting: time
Mar 13 00:46:38 dbserver postgres[5298]: [51-1] db=,user= LOG:
checkpoint complete: wrote 3996 buffers (0.4%); 0 transaction log
file(s) added, 0 removed, 1 recycled; write=149.438 s, sync=0.111 s,
total=149.551 s; sync files=823, longest=0.006 s, average=0.000 s
Mar 13 00:49:08 dbserver postgres[5298]: [52-1] db=,user= LOG:
checkpoint starting: time
Mar 13 00:51:38 dbserver postgres[5298]: [53-1] db=,user= LOG:
checkpoint complete: wrote 13736 buffers (1.4%); 0 transaction log
file(s) added, 0 removed, 2 recycled; write=149.958 s, sync=0.311 s,
total=150.271 s; sync files=1335, longest=0.079 s, average=0.000 s
Mar 13 00:54:08 dbserver postgres[5298]: [54-1] db=,user= LOG:
checkpoint starting: time
Mar 13 00:56:38 dbserver postgres[5298]: [55-1] db=,user= LOG:
checkpoint complete: wrote 14638 buffers (1.5%); 0 transaction log
file(s) added, 0 removed, 17 recycled; write=149.330 s, sync=0.271 s,
total=149.603 s; sync files=1363, longest=0.017 s, average=0.000 s
Mar 13 00:59:08 dbserver postgres[5298]: [56-1] db=,user= LOG:
checkpoint starting: time
Mar 13 01:01:38 dbserver postgres[5298]: [57-1] db=,user= LOG:
checkpoint complete: wrote 8035 buffers (0.8%); 0 transaction log
file(s) added, 0 removed, 21 recycled; write=149.285 s, sync=0.146 s,
total=149.433 s; sync files=1160, longest=0.003 s, average=0.000 s
Mar 13 01:04:08 dbserver postgres[5298]: [58-1] db=,user= LOG:
checkpoint starting: time
Mar 13 01:06:37 dbserver postgres[5298]: [59-1] db=,user= LOG:
checkpoint complete: wrote 2156 buffers (0.2%); 0 transaction log
file(s) added, 0 removed, 9 recycled; write=149.402 s, sync=0.057 s,
total=149.461 s; sync files=610, longest=0.000 s, average=0.000 s
Mar 13 01:09:08 dbserver postgres[5298]: [60-1] db=,user= LOG:
checkpoint starting: time


I'm pretty certain that unmounting the file system and running fsck will
regain the lost space, but will it stop there? Has it got to do with the
ALTER table/index SET TABLESPACE, or is that a smoke screen, and it is
really something else?

Stopping postgresql briefly did not help, I tried that.

The server has about 30 databases and ~127 concurrent connections (not
all beeing active simultaneously, though), so it is fair to say it is
pretty active, but nothing extreme.

Hardware is HP DL360, using their HT Smart Array P410i.

Any ideas how to debug this? Or shall I just reboot, fsck, hope the
problem will go away, and when it does, forget about it?

Thanks,
Palle
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJRP8o5AAoJEIhV+7FrxBJDTl0H/2JWb3lFuaL+QYDe9+Le8zhu
7Q1KEYeY91Noq88rLa+eM3ENPibrsJYGYWYaQb9U2PSFXvud5/+czWMZrcVx+3H4
+tFME+FO+f9kasNrGA++LmRwlXlSWYZj9oZXumdIbmqX0Nvne0nvMc34Qyg8yqB5
l3pMFRdMEO4kM9n/pJkQf/7nuWZDEvmNTlkL1wARjGyTa8GYQa1a4pBDq/ovXuE8
+6NF64X8Mbm55geEHIybop72BTcfQmGMpYaeInjH2V2zbaqC6HV3eqfTv+CGT8op
tAxRYKa/9brYj9IIcaLn+zQvBTxbklZVON6BrlKI6+z144nXOdH+Ny4Zs4upP+Y=
=J6T7
-----END PGP SIGNATURE-----



Palle Girgensohn <girgen@FreeBSD.org> writes:
> ... I got lots of space freed
> up, but it seems that after that the disk usage grows linearly (it seems
> to leave many inodes unreferenced).

Hm.  We've seen issues in the past with PG processes failing to close
no-longer-useful files promptly, but ...

> Strange thing is I cannot find any open files.

... that suggests there's something else going on.

> The unreferenced inodes are almost exclusively around 16 MB in size, so
> i.e. they would most probably all be pg_xlog files.

Have you got any sort of WAL archiving active, and if so maybe that's
holding onto WAL files?  Not that it's clear how come lsof wouldn't
tattle on an archiving process either.

> Stopping postgresql briefly did not help, I tried that.

That seems to point the finger at some non-postgres cause.  I confess
I can't guess what.
        regards, tom lane



<p dir="ltr"><br /> On Mar 13, 2013 3:04 AM, "Tom Lane" <<a
href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>wrote:<br /> ><br /> > Palle Girgensohn
<girgen@FreeBSD.org>writes:<br /> > > ... I got lots of space freed<br /> > > up, but it seems that
afterthat the disk usage grows linearly (it seems<br /> > > to leave many inodes unreferenced).<br /> ><br />
>Hm.  We've seen issues in the past with PG processes failing to close<br /> > no-longer-useful files promptly,
but...<br /> ><br /> > > Strange thing is I cannot find any open files.<br /> ><br /> > ... that
suggeststhere's something else going on.<br /> ><br /> > > The unreferenced inodes are almost exclusively
around16 MB in size, so<br /> > > i.e. they would most probably all be pg_xlog files.<br /> ><br /> > Have
yougot any sort of WAL archiving active, and if so maybe that's<br /> > holding onto WAL files?  Not that it's clear
howcome lsof wouldn't<br /> > tattle on an archiving process either.<br /> ><br /> > > Stopping postgresql
brieflydid not help, I tried that.<br /> ><br /> > That seems to point the finger at some non-postgres cause.  I
confess<br/> > I can't guess what.<br /> ><p dir="ltr">Yeah, unreferenced inodes with no open files, and only
discoverablewith fsck sounds like a filsystem bug to me. Particularly since it showed up just after a operating system
upgrade,and doesn't go away with a postgres restart... <p dir="ltr">/Magnus  
We're seeing similar behaviour on several of our FreeBSD servers too.
It doesn't look like open files, or filesystem snapshots. Rebooting
does reset it, but restarting PG makes no difference.

We've got an assortment of different versions of both FreeBSD and
PostgreSQL, some of which are demonstrating this behaviour, some
aren't. Here's a quick breakdown of versions and what we've got
running:

FreeBSD   PostgreSQL   Leaking?
8.0       8.4.4        no
8.2       9.0.4        no
8.3       9.1.4        yes
8.3       9.2.3        yes
9.1       9.2.3        yes

All of these machines are under similar load patterns and (apart from
the version differences), are set up essentially the same and are
doing the same job. They all have hot standbys yet this problem
doesn't exist on any of the standby servers. We haven't done anything
with tablespaces, the database has its own dedicated partition
(although pg_log/pg_xlog are both symlinked out to /usr).

However (just to throw a spanner in the works) we do have another
server running fbsd8.3/pg9.1.4 which ISN'T showing this behaviour -
although its load patterns are quite different.

I'm not sure if this is going to help, but here's a graph of this disk
space disparity over the last few days (Y axis is in gigabytes). The
flat-ish part in the middle is the weekend where we have little
traffic, so we can at least say it's not constant:
http://i.imgur.com/jlbgzNI.png

Up until now we've been upgrading things in the hope that the problem
will go away, but since we've got one server up to fbsd9.1/pg9.2.3 and
still seeing the problem we're a little stumped. Any ideas about how
we can go about debugging this would be appreciated.

Thanks,

Dan

On 13 March 2013 07:39, Magnus Hagander <magnus@hagander.net> wrote:
>
> On Mar 13, 2013 3:04 AM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>>
>> Palle Girgensohn <girgen@FreeBSD.org> writes:
>> > ... I got lots of space freed
>> > up, but it seems that after that the disk usage grows linearly (it seems
>> > to leave many inodes unreferenced).
>>
>> Hm.  We've seen issues in the past with PG processes failing to close
>> no-longer-useful files promptly, but ...
>>
>> > Strange thing is I cannot find any open files.
>>
>> ... that suggests there's something else going on.
>>
>> > The unreferenced inodes are almost exclusively around 16 MB in size, so
>> > i.e. they would most probably all be pg_xlog files.
>>
>> Have you got any sort of WAL archiving active, and if so maybe that's
>> holding onto WAL files?  Not that it's clear how come lsof wouldn't
>> tattle on an archiving process either.
>>
>> > Stopping postgresql briefly did not help, I tried that.
>>
>> That seems to point the finger at some non-postgres cause.  I confess
>> I can't guess what.
>>
>
> Yeah, unreferenced inodes with no open files, and only discoverable with
> fsck sounds like a filsystem bug to me. Particularly since it showed up just
> after a operating system upgrade, and doesn't go away with a postgres
> restart...
>
> /Magnus