Thread: hidden junk files in ...data/base/oid/

hidden junk files in ...data/base/oid/

From
Andrej Vanek
Date:
Hello,

I ran upgrade from 9.1 to 9.3 on CentOS using pg_upgrade and database cluster size dropped from 77GB to 4.3GB. I wonder how this could happen. No data lost. This means I had about 70GB junk files in my database...

I checked top 10 biggest database tables: nearly same size on old/new cluster, the biggest table about 2GB, next 1GB, all other tables smaller and smaller..
I checked files of biggest database (/var/lib/pgsql/9.1/data/base/27610):
# du  -sm * |awk '{s+=$1} END {print s}'
7107
# du  -sm . |sort -n
75264   .
So size of non-hidden database files is 1/10th of the whole database size.
Question: Where does the 70GB hidden files in ...data/base/oid/ come from? Which postgres process could generate them? Some missed maintenance from my side? A bug? Anybody else experienced such issue?
Thanks, Andrej
---------details: ...data/base/oid file listing shortened:
# ls -la
total 77069960
drwx------. 2 postgres postgres      32768 May 12 18:50 .
drwxr-xr-x. 5 postgres postgres       4096 May 12 11:47 ..
-rw-------. 1 postgres postgres  354156544 Feb 25 12:06 .27623.17rLmT
-rw-------. 1 postgres postgres  338952192 Feb 25 14:21 .27623.6dH1b6
-rw-------. 1 postgres postgres    5767168 Mar  7 16:00 ..27623.6dH1b6.6PrU4B
-rw-------. 1 postgres postgres  411041792 Feb 25 15:07 .27623.aN42DG
-rw-------. 1 postgres postgres  342884352 Mar  8 15:16 ..27623.aN42DG.0U5xfj
-rw-------. 1 postgres postgres  343146496 Mar  8 13:13 ..27623.aN42DG.2WFmNo
-rw-------. 1 postgres postgres  343408640 Mar  8 10:43 ..27623.aN42DG.384SXU
-rw-------. 1 postgres postgres  357302272 Mar  8 05:26 ..27623.aN42DG.3hHjZ8
-rw-------. 1 postgres postgres  360185856 Mar  7 18:19 ..27623.aN42DG.5lWta4
-rw-------. 1 postgres postgres  343146496 Mar  8 10:12 ..27623.aN42DG.64lNVQ
...shortened...
-rw-------. 1 postgres postgres 1005322240 Feb 25 15:38 .27731.2.JKYXGW
-rw-------. 1 postgres postgres  359661568 Mar  9 14:52 ..27731.2.JKYXGW.3h8RuF
-rw-------. 1 postgres postgres  331087872 Mar  9 07:37 ..27731.2.JKYXGW.3hK5aF
-rw-------. 1 postgres postgres  359923712 Mar  9 09:29 ..27731.2.JKYXGW.3KA5Cq
-rw-------. 1 postgres postgres  359923712 Mar  9 16:55 ..27731.2.JKYXGW.45nQei
-rw-------. 1 postgres postgres  137363456 Mar  9 04:47 ..27731.2.JKYXGW.4zya2Z
...shortened...
-rw-------. 1 postgres postgres  769916928 Feb 25 15:53 .27902.YboxvS
-rw-------. 1 postgres postgres  671612928 Feb 20 10:01 .27902.YMEtoS
-rw-------. 1 postgres postgres  159645696 Feb 25 16:24 .59866.Lkyxgs
-rw-------. 1 postgres postgres  272629760 Feb 20 18:37 .59866.RTcUkC
-rw-------. 1 postgres postgres  505151488 Feb 25 16:40 .59961.5BcZpK
-rw-------. 1 postgres postgres   91750400 Feb 25 16:55 .60194.gUqSdJ
-rw-------. 1 postgres postgres       8192 Apr  7 05:20 60592
-rw-------. 1 postgres postgres       8192 Jan 31 13:03 60594
-rw-------. 1 postgres postgres       8192 Apr  7 02:01 60596
-rw-------. 1 postgres postgres       8192 Feb 28 14:44 60598
-rw-------. 1 postgres postgres       8192 Apr  7 11:55 60600
...shortened...
-rw-------. 1 postgres postgres     139264 May 12 12:08 702364
-rw-------. 1 postgres postgres      24576 May  9 12:42 702364_fsm
-rw-------. 1 postgres postgres       8192 May  9 12:40 702364_vm
-rw-------. 1 postgres postgres          0 May  9 10:10 702369
-rw-------. 1 postgres postgres     860160 May 12 12:08 702372
-rw-------. 1 postgres postgres      24576 May  9 12:37 702372_fsm
-rw-------. 1 postgres postgres       8192 May  9 12:42 702372_vm
-rw-------. 1 postgres postgres       8192 May  9 10:10 702377
-rw-------. 1 postgres postgres     499712 May 12 12:08 702381
...shortened...
-rw-------. 1 postgres postgres      16384 May  9 14:34 704207
-rw-------. 1 postgres postgres      16384 May  9 14:34 704208
-rw-------. 1 postgres postgres       8192 May  9 14:34 704209
-rw-------. 1 postgres postgres      16384 May  9 14:34 704210
-rw-------. 1 postgres postgres      16384 May  9 14:34 704211
-rw-------. 1 postgres postgres        512 May  9 14:34 pg_filenode.map
-rw-------. 1 postgres postgres     106804 May 12 18:50 pg_internal.init
-rw-------. 1 postgres postgres          4 Jan 28 13:52 PG_VERSION


Re: hidden junk files in ...data/base/oid/

From
Andrej Vanek
Date:
Hello,

solved.
This is not a postgres issue.

The system was used in HA-cluster with streaming replications. 
The hidden files I asked for were created probably by broken (killed) rsync. It uses such file-format for temporary files used during copying.

This rsync is used by master to slave database synchronization (full on-line backup of master database to slave node) before starting postgres in hot-standby mode on slave the node...

Best Regards, Andrej

Re: hidden junk files in ...data/base/oid/

From
Alvaro Herrera
Date:
Andrej Vanek wrote:
> Hello,
>
> solved.
> This is not a postgres issue.
>
> The system was used in HA-cluster with streaming replications.
> The hidden files I asked for were created probably by broken (killed)
> rsync. It uses such file-format for temporary files used during copying.
>
> This rsync is used by master to slave database synchronization (full
> on-line backup of master database to slave node) before starting postgres
> in hot-standby mode on slave the node...

You not only have leftover first-order rsync temp files (.NNNNN.uvwxyz)
-- but also when those temp files were being copied over by another
rsync run, which created temp files for the first-order temp files,
leaving you with second-order temp files (..NNNNN.uvwxyz.opqrst).  Not
nice.  I wonder if this is anywhere near sanity -- it looks like you're
copying stuff from one direction first, then failed over, then copied in
the opposite direction.  I would have your setup reviewed real closely,
to avoid data-corrupting configuration mistakes.  I have seen people
make subtle mistakes in their configuration, causing their whole HA
setups to be completely broken.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: hidden junk files in ...data/base/oid/

From
Andrej Vanek
Date:
Hello,

thanks for your answer.

I've identified problems in my cluster agent script. It is a custom written script with built in automated recovery of failed slave. It was written in time when postgres 9.1 streaming replications feature was just in beta release and there was no postgres agent for streaming replications available out there. 
The problem was that the failed slave recovery was hardcoded into start operation. But this start operation was aborted by pacemaker due to startup operation timeout. This occured before having finished backup from master to failed slave (in case of bigger database). This is the point where rsync could be aborted and left over temporary junk files. There was no cleanup before re-running the backup from master (using rsync). This may be the reason why there may be left rsync temporary files.
Second problem identified is what you write: copying stuff from one direction first, then failed over, then copied in the opposite direction.
This was caused because my agent was missing the lock file that standard clusterlabs pgsql agent uses to avoid starting failed master in case of double failure followed by reboot.

Now I'm migrating to the standard pacemaker's postgres cluster agent provided by clusterlabs.org to avoid such issues. It is surely much better tested by plenty of installations worldwide with community feedback.

In addition I need to automate single (master or slave) failure recovery as much as possible. For this purpose I plan to introduce a new resource on top of pgsql resource which would recover failed pgsql slave(or master) in case master is active on another node (I use only two node cluster). Manual recovery by operator would be needed for cases when postgres on both nodes is down to avoid accidental data loss.
Do you know whether there is such cluster agent already available?

Best Regards, Andrej



2014-05-27 16:09 GMT+02:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Andrej Vanek wrote:
> Hello,
>
> solved.
> This is not a postgres issue.
>
> The system was used in HA-cluster with streaming replications.
> The hidden files I asked for were created probably by broken (killed)
> rsync. It uses such file-format for temporary files used during copying.
>
> This rsync is used by master to slave database synchronization (full
> on-line backup of master database to slave node) before starting postgres
> in hot-standby mode on slave the node...

You not only have leftover first-order rsync temp files (.NNNNN.uvwxyz)
-- but also when those temp files were being copied over by another
rsync run, which created temp files for the first-order temp files,
leaving you with second-order temp files (..NNNNN.uvwxyz.opqrst).  Not
nice.  I wonder if this is anywhere near sanity -- it looks like you're
copying stuff from one direction first, then failed over, then copied in
the opposite direction.  I would have your setup reviewed real closely,
to avoid data-corrupting configuration mistakes.  I have seen people
make subtle mistakes in their configuration, causing their whole HA
setups to be completely broken.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services