Re: [ADMIN] Strange times in WAL files in archive directory (9.3) - Mailing list pgsql-admin

From Achilleas Mantzios
Subject Re: [ADMIN] Strange times in WAL files in archive directory (9.3)
Date
Msg-id 58873609.4030004@matrix.gatewaynet.com
Whole thread Raw
In response to [ADMIN] Strange times in WAL files in archive directory (9.3)  (Achilleas Mantzios <achill@matrix.gatewaynet.com>)
Responses Re: [ADMIN] Strange times in WAL files in archive directory (9.3)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-admin
On 24/01/2017 11:12, Achilleas Mantzios wrote:
> Hello,
> we are running PostgreSQL 9.3 on a new VM running Linux 3.16.0-4-amd64 #1 SMP, Debian 8.6, and we setup WAL archiving
yesterday.We were planing to keep a window of 15 days worth of WAL files, so we  
> thought of writing a script to delete files older than 15 days. The settings used were rather common :
> archive_mode = on
> archive_command = '/usr/bin/scp %p sma:/smadb/pgsql/pitr/%f'
> archive_timeout=240
> However, this morning, to my surprise I saw this by doing ls on the archive directory :
> # ls -l /smadb/pgsql/pitr/ | less
> total 12680296
> -rw------- 1 postgres postgres 16777216 Jan 24 03:44 000000020000070100000000
> -rw------- 1 postgres postgres 16777216 Jan 24 03:50 000000020000070100000001
> -rw------- 1 postgres postgres 16777216 Jan 24 03:54 000000020000070100000002
> -rw------- 1 postgres postgres 16777216 Jan 24 04:00 000000020000070100000003
> -rw------- 1 postgres postgres 16777216 Jan 24 04:00 000000020000070100000004
> -rw------- 1 postgres postgres 16777216 Jan 24 04:04 000000020000070100000005
> -rw------- 1 postgres postgres 16777216 Jan 24 04:09 000000020000070100000006
> -rw------- 1 postgres postgres 16777216 Jan 24 04:14 000000020000070100000007
> -rw------- 1 postgres postgres 16777216 Jan 24 04:20 000000020000070100000008
> -rw------- 1 postgres postgres 16777216 Jan 24 04:20 000000020000070100000009
> -rw------- 1 postgres postgres 16777216 Jan 24 04:25 00000002000007010000000A
> -rw------- 1 postgres postgres 16777216 Jan 24 04:29 00000002000007010000000B
> -rw------- 1 postgres postgres 16777216 Jan 24 04:34 00000002000007010000000C
> -rw------- 1 postgres postgres 16777216 Jan 24 04:39 00000002000007010000000D
> -rw------- 1 postgres postgres 16777216 Jan 24 04:39 00000002000007010000000E
> -rw------- 1 postgres postgres 16777216 Jan 24 04:44 00000002000007010000000F
> ...........
> -rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000EF
> -rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000F0
> -rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000F1  <------
> -rw------- 1 postgres postgres 16777216 Jan 23 13:04 0000000200000701000000F2 <------ time jumps
> -rw------- 1 postgres postgres 16777216 Jan 23 13:05 0000000200000701000000F3
> -rw------- 1 postgres postgres 16777216 Jan 23 13:06 0000000200000701000000F4
>
> So IMHO either the naming of the WAL filename got damaged somehow or the one of the servers got wrong time or
somethingI miss. "Jan 23 13:04" is indeed the time that we started the server for WAL  
> archiving.
> By sorting by time I get :
> ls -ltr /smadb/pgsql/pitr/ | less
> -rw------- 1 postgres postgres 16777216 Jan 23 13:04 0000000200000701000000F2
> -rw------- 1 postgres postgres 16777216 Jan 23 13:05 0000000200000701000000F3
> -rw------- 1 postgres postgres 16777216 Jan 23 13:06 0000000200000701000000F4
> -rw------- 1 postgres postgres 16777216 Jan 23 13:07 0000000200000701000000F5
> -rw------- 1 postgres postgres 16777216 Jan 23 13:08 0000000200000701000000F6
> -rw------- 1 postgres postgres 16777216 Jan 23 13:09 0000000200000701000000F7
> -rw------- 1 postgres postgres 16777216 Jan 23 13:09 0000000200000701000000F8
> -rw------- 1 postgres postgres 16777216 Jan 23 13:10 0000000200000701000000F9
> -rw------- 1 postgres postgres 16777216 Jan 23 13:11 0000000200000701000000FA
> -rw------- 1 postgres postgres 16777216 Jan 23 13:12 0000000200000701000000FB
> -rw------- 1 postgres postgres 16777216 Jan 23 13:13 0000000200000701000000FC
> ........
> -rw------- 1 postgres postgres 16777216 Jan 24 03:15 0000000200000702000000F9
> -rw------- 1 postgres postgres 16777216 Jan 24 03:19 0000000200000702000000FA
> -rw------- 1 postgres postgres 16777216 Jan 24 03:23 0000000200000702000000FB
> -rw------- 1 postgres postgres 16777216 Jan 24 03:27 0000000200000702000000FC
> -rw------- 1 postgres postgres 16777216 Jan 24 03:31 0000000200000702000000FD
> -rw------- 1 postgres postgres 16777216 Jan 24 03:35 0000000200000702000000FE
> -rw------- 1 postgres postgres 16777216 Jan 24 03:39 0000000200000702000000FF
> -rw------- 1 postgres postgres 16777216 Jan 24 03:43 000000020000070300000000
> -rw------- 1 postgres postgres 16777216 Jan 24 03:44 000000020000070100000000 <----- problem begins
> -rw------- 1 postgres postgres 16777216 Jan 24 03:47 000000020000070300000001
> -rw------- 1 postgres postgres 16777216 Jan 24 03:50 000000020000070100000001
> -rw------- 1 postgres postgres 16777216 Jan 24 03:51 000000020000070300000002
> -rw------- 1 postgres postgres 16777216 Jan 24 03:54 000000020000070100000002
> ........ (same pattern goes on 0000000200000703 mixing with 0000000200000701 )
> -rw------- 1 postgres postgres 16777216 Jan 24 09:42 0000000200000701000000EE
> -rw------- 1 postgres postgres 16777216 Jan 24 09:45 0000000200000703000000EF
> -rw------- 1 postgres postgres 16777216 Jan 24 09:47 0000000200000703000000F0
> -rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000703000000F1
> -rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000EF
> -rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000F0
> -rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000F1 <----- problem ends
> -rw------- 1 postgres postgres 16777216 Jan 24 09:50 0000000200000703000000F2
> -rw------- 1 postgres postgres 16777216 Jan 24 09:52 0000000200000703000000F3
> -rw------- 1 postgres postgres 16777216 Jan 24 09:56 0000000200000703000000F4
>
> Any thoughts about any explanation on this?
>
By scrolling up the terminal on the database server (cause the said file is gone now from pg_xlog) I saw this :
$ ls -ltr ~/data/pg_xlog/0000000200000701000000F0
-rw------- 1 postgres postgres 16777216 Jan 23 13:01 /var/lib/pgsql/data/pg_xlog/0000000200000701000000F0
$ md5sum ~/data/pg_xlog/0000000200000701000000F0
8ce1917c86d3c9e41455ec13659c2689 /var/lib/pgsql/data/pg_xlog/0000000200000701000000F0

In the archive's host I see this :
# ls -ltr /smadb/pgsql/pitr/0000000200000701000000F0
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 /smadb/pgsql/pitr/0000000200000701000000F0
# md5sum /smadb/pgsql/pitr/0000000200000701000000F0
8ce1917c86d3c9e41455ec13659c2689 /smadb/pgsql/pitr/0000000200000701000000F0

Same file, different mod times, so, this makes me think that maybe there was some flushing that the postgresql server
did,prior to deleting or recycling some old WAL files. Also I see this : 
# ls -ltr /smadb/pgsql/pitr/0000000200000701000000*
-rw------- 1 postgres postgres 16777216 Jan 23 13:04 /smadb/pgsql/pitr/0000000200000701000000F2
-rw------- 1 postgres postgres 16777216 Jan 23 13:05 /smadb/pgsql/pitr/0000000200000701000000F3
-rw------- 1 postgres postgres 16777216 Jan 23 13:06 /smadb/pgsql/pitr/0000000200000701000000F4
-rw------- 1 postgres postgres 16777216 Jan 23 13:07 /smadb/pgsql/pitr/0000000200000701000000F5
-rw------- 1 postgres postgres 16777216 Jan 23 13:08 /smadb/pgsql/pitr/0000000200000701000000F6
-rw------- 1 postgres postgres 16777216 Jan 23 13:09 /smadb/pgsql/pitr/0000000200000701000000F7
-rw------- 1 postgres postgres 16777216 Jan 23 13:09 /smadb/pgsql/pitr/0000000200000701000000F8
-rw------- 1 postgres postgres 16777216 Jan 23 13:10 /smadb/pgsql/pitr/0000000200000701000000F9
-rw------- 1 postgres postgres 16777216 Jan 23 13:11 /smadb/pgsql/pitr/0000000200000701000000FA
-rw------- 1 postgres postgres 16777216 Jan 23 13:12 /smadb/pgsql/pitr/0000000200000701000000FB
-rw------- 1 postgres postgres 16777216 Jan 23 13:13 /smadb/pgsql/pitr/0000000200000701000000FC
-rw------- 1 postgres postgres 16777216 Jan 23 13:14 /smadb/pgsql/pitr/0000000200000701000000FD
-rw------- 1 postgres postgres 16777216 Jan 23 13:15 /smadb/pgsql/pitr/0000000200000701000000FE
-rw------- 1 postgres postgres 16777216 Jan 23 13:16 /smadb/pgsql/pitr/0000000200000701000000FF
-rw------- 1 postgres postgres 16777216 Jan 24 03:44 /smadb/pgsql/pitr/000000020000070100000000 <---- time jump
-rw------- 1 postgres postgres 16777216 Jan 24 03:50 /smadb/pgsql/pitr/000000020000070100000001
-rw------- 1 postgres postgres 16777216 Jan 24 03:54 /smadb/pgsql/pitr/000000020000070100000002
-rw------- 1 postgres postgres 16777216 Jan 24 04:00 /smadb/pgsql/pitr/000000020000070100000003
-rw------- 1 postgres postgres 16777216 Jan 24 04:00 /smadb/pgsql/pitr/000000020000070100000004
......
-rw------- 1 postgres postgres 16777216 Jan 24 09:42 /smadb/pgsql/pitr/0000000200000701000000EE
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 /smadb/pgsql/pitr/0000000200000701000000EF
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 /smadb/pgsql/pitr/0000000200000701000000F0
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 /smadb/pgsql/pitr/0000000200000701000000F1

So this makes me believe that all the files before 0000000200000701000000F2 were present before WAL archiving was
setup.So I want to ask : 

1) Do you find this behavior normal?
2) From now on, can we assume a monotonic relation between file names and creation times in order to base our
maintenancescripts on this? 

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



pgsql-admin by date:

Previous
From: Achilleas Mantzios
Date:
Subject: [ADMIN] Strange times in WAL files in archive directory (9.3)
Next
From: Tom Lane
Date:
Subject: Re: [ADMIN] Strange times in WAL files in archive directory (9.3)