Re: .ready files appearing on slaves - Mailing list pgsql-hackers

From Jehan-Guillaume de Rorthais
Subject Re: .ready files appearing on slaves
Date
Msg-id 20140915173724.2dae0dcd@erg
In response to .ready files appearing on slaves  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Responses BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
List pgsql-hackers
Hi hackers,

An issue that seems related to this has been posted on pgsql-admin. See:
 http://www.postgresql.org/message-id/CAAS3tyLnXaYDZ0+zhXLPdVtOvHQOvR+jSPhp30o8kvWqQs0Tqw@mail.gmail.com

How can we help with this issue?

Cheers,

On Thu, 4 Sep 2014 17:50:36 +0200
Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:

> Hi hackers,
> 
> For a few months now, we have occasionally seen .ready files appearing on some
> slave instances in various contexts. The two I have in mind run 9.2.x.
> 
> I tried to investigate a bit. These .ready files are created when a WAL file
> from pg_xlog has no corresponding file in pg_xlog/archive_status. I could
> easily reproduce this by deleting such a status file: it is created again at
> the next restartpoint or checkpoint received from the master.
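
To check a standby for this symptom, a minimal sketch along the following lines could list each WAL segment in pg_xlog together with the state of its status file. It assumes the 9.2 layout described above (segments in pg_xlog, status files named <segment>.ready or <segment>.done in pg_xlog/archive_status); the function name archive_status is only illustrative and not part of PostgreSQL.

```python
import os
import re

# WAL segment file names are 24 hexadecimal characters.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def archive_status(pg_xlog):
    """Map each WAL segment found in pg_xlog to 'ready', 'done' or 'none'.

    Hypothetical helper: it only inspects file names, it does not talk
    to the server and does not modify anything.
    """
    status_dir = os.path.join(pg_xlog, "archive_status")
    result = {}
    for name in sorted(os.listdir(pg_xlog)):
        if not WAL_NAME.match(name):
            continue  # skip archive_status itself, .history files, etc.
        if os.path.exists(os.path.join(status_dir, name + ".ready")):
            result[name] = "ready"
        elif os.path.exists(os.path.join(status_dir, name + ".done")):
            result[name] = "done"
        else:
            result[name] = "none"
    return result
```

Segments reported as 'none' are the ones for which, per the observation above, a .ready file would be expected to appear at the next restartpoint.
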
> 
> Looking at the WAL files in pg_xlog that correspond to these .ready files,
> they are all much older than the current WAL "cycle", judging by both their
> mtime and their position in the file name sequence. For instance, on one of
> these boxes we currently have 6 of those "ghost" WALs:
> 
>   0000000200001E53000000FF
>   0000000200001F18000000FF
>   0000000200002047000000FF
>   00000002000020BF000000FF
>   0000000200002140000000FF
>   0000000200002370000000FF
>   000000020000255D000000A8
>   000000020000255D000000A9
>   [...normal WAL sequence...]
>   000000020000255E0000009D
> 
> And on another box:
> 
>   000000010000040E000000FF
>   0000000100000414000000DA
>   000000010000046E000000FF
>   0000000100000470000000FF
>   00000001000004850000000F
>   000000010000048500000010
>   [...normal WAL sequence...]
>   000000010000048500000052
> 
> So it seems that, for some reason, these old WALs were "forgotten" by the
> restartpoint mechanism when they should have been recycled/deleted.
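
To spot such "ghost" segments automatically, one rough heuristic is to sort the file names and flag everything that sits before the last large gap in the sequence. The sketch below is only that, a heuristic under stated assumptions: all names are on the same timeline, a log is treated as 256 segment slots, and a gap of up to 2 is tolerated because releases before 9.3 skip the FF segment at the end of each log. The names parse_wal_name, find_stragglers and max_gap are hypothetical.

```python
def parse_wal_name(name):
    """Split a 24-hex-digit WAL file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("not a WAL segment name: %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def find_stragglers(names, max_gap=2):
    """Return the names located before the last gap larger than max_gap.

    Assumption: every name belongs to the same timeline, so the
    timeline field is ignored when ordering the segments.
    """
    def position(name):
        _, log, seg = parse_wal_name(name)
        return log * 256 + seg

    ordered = sorted(names, key=position)
    cut = 0
    # Walk back from the newest segment until the sequence breaks.
    for i in range(len(ordered) - 1, 0, -1):
        if position(ordered[i]) - position(ordered[i - 1]) > max_gap:
            cut = i
            break
    return ordered[:cut]
```

A ghost segment that happens to be adjacent to the live sequence would not be flagged, so the result should be cross-checked against mtime as well.
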
> 
> For one of these servers, I could correlate this with an abrupt
> disconnection of the streaming replication recorded in its logs. But there
> was no known SR disconnection on the second one.
> 
> Any idea about this weird behaviour? What can we do to help you investigate
> further?
> 
> Regards,



-- 
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com


