Hi all,
Thank you all; I sincerely appreciate your feedback.
I have done a fair amount of testing on the solution you proposed (not removing backup_label), and it seems to have completely addressed the issue.
The removal of backup_label was actually introduced some time back, and I am not certain how it crept into our codebase. As for why the problem only surfaces now, I think at least part of the explanation is that database size and usage are growing considerably on some of our installations. That could be why extensive testing did not reveal the issue back then and why we are seeing it now.
Would it make sense to log a warning when the backup_label file is missing, or would that situation be difficult to identify in the code? I would be happy to dig in and develop a patch.
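Independent of any server-side change, a restore script could guard against this before starting the server. Below is a minimal sketch in Python, assuming the base backup has been unpacked into a data directory whose root should contain backup_label. The function name `has_backup_label` and the logger name are hypothetical and not part of PostgreSQL.

```python
import logging
from pathlib import Path

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("restore_check")


def has_backup_label(data_dir):
    """Return True if backup_label exists in the root of data_dir.

    Logs a warning and returns False when the file is missing, so the
    calling restore script can stop before starting the server.
    """
    label = Path(data_dir) / "backup_label"
    if label.is_file():
        return True
    logger.warning(
        "backup_label not found in %s; starting recovery without it "
        "may leave the data directory corrupted",
        data_dir,
    )
    return False
```

A restore script would call `has_backup_label()` on the restored data directory and abort if it returns False, rather than starting PostgreSQL.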
With regard to the package version: we *are* working with a few "stock" scenarios, one of which is a fairly old RHEL installation. We also have CentOS versions that are much more up to date.
Best regards, and thank you all again,
Fredrik
On 20 October 2016 at 22:38:26 +02:00, Andres Freund <andres@anarazel.de> wrote:
On 2016-10-20 22:37:15 +0900, Michael Paquier wrote:
- remove a file called backup_label, but I am not certain that this file is
in fact there (any more).
It is never a good idea when you are trying to restore from a backup,
backup_label contains critical information when restoring from a
backup, so you may finish with a corrupted data folder.
And this actually seems like a likely source of these errors. Removing
a backup label unfortunately causes hard to diagnose errors, because
everything appears to be ok as long as there's no checkpoints while
taking the base backups (or when the control file was copied early
enough). But as soon as a second checkpoint happens before the control
file is copied...
Fredrik, how did you end up removing the label?
Greetings,
Andres Freund