Re: Hard limit on WAL space used (because PANIC sucks) - Mailing list pgsql-hackers
From | Josh Berkus |
---|---|
Subject | Re: Hard limit on WAL space used (because PANIC sucks) |
Date | |
Msg-id | 51B6220A.9070103@agliodbs.com |
In response to | Hard limit on WAL space used (because PANIC sucks) (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses | Re: Hard limit on WAL space used (because PANIC sucks) |
List | pgsql-hackers |
Josh, Daniel,

>> Right now, what we're telling users is "You can have continuous backup
>> with Postgres, but you'd better hire an expensive consultant to set it
>> up for you, or use this external tool of dubious provenance which
>> there's no packages for, or you might accidentally cause your database
>> to shut down in the middle of the night."
>
> This is an outright falsehood. We are telling them, "You better know
> what you are doing" or "You should call a consultant". This is no
> different than, "You better know what you are doing" or "You should take
> driving lessons".

What I'm pointing out is that there is no "simple case" for archiving
the way we have it set up. That is, every possible way to deploy PITR
for Postgres involves complex, error-prone configuration, setup, and
monitoring. I don't think that's necessary; simple cases should have
simple solutions.

If you do a quick survey of pgsql-general, you will see that databases
shutting down unexpectedly because archiving ran them out of disk space
is a very common problem. People shouldn't be afraid of their backup
solutions.

I'd agree that one possible answer for this is to just get one of the
external tools simplified, well-packaged, distributed, instrumented for
common monitoring systems, and referenced in our main documentation.
I'd say Barman is the closest to "a simple solution for the simple
common case", at least for PITR. I've been able to give some clients
Barman and have them deploy it themselves. This isn't true of the other
tools I've tried. Too bad it's GPL, and doesn't do
archiving-for-streaming.

> I have a clear bias in experience here, but I can't relate to someone
> who sets up archives but is totally okay losing a segment unceremoniously,
> because it only takes one of those once in a while to make a really,
> really bad day. Who is this person that lackadaisically archives, and
> are they just fooling themselves? And where are these archivers that

If WAL archiving is your *second* level of redundancy, you will
generally be willing to have it break rather than interfere with the
production workload. This is particularly the case if you're using
archiving just as a backup for streaming replication. Heck, I've had
one client where archiving was being used *only* to spin up staging
servers, and not for production at all; do you think they wanted
production to shut down if they ran out of archive space (which it did)?

I'll also point out that archiving can silently fail for a number of
reasons having nothing to do with "safety" options, such as an NFS mount
in Linux silently going away (I've also had this happen), or network
issues causing file corruption. Which just points out that we need
better ways to detect gaps/corruption in archiving.

Anyway, what I'm pointing out is that this is a business decision, and
there is no way that we can decide for users what to do when we run out
of WAL space. The "stop archiving" option needs to be there for users,
as well as the "shut down" option, *without* requiring users to learn
the internals of the archiving system to implement it, or to know the
implied effects of non-obvious PostgreSQL settings.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
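
To make the gap-detection point above concrete, here is a minimal sketch of what such a check might look like. It is not an existing PostgreSQL tool. It assumes the default 16 MB segment size and the 9.3-style naming in which every 24-hex-digit segment name is used (older releases skipped the segment ending in FF), and it checks each timeline separately. The function names `parse_segment`, `format_segment`, and `find_gaps` are made up for this illustration.

```python
import re

# Assumption: default 16 MB WAL segments, so 256 segments per "log" file.
SEGMENTS_PER_LOG = 0x100000000 // (16 * 1024 * 1024)

# A WAL segment file name is 24 uppercase hex digits:
# 8 for the timeline, 8 for the log number, 8 for the segment number.
_WAL_NAME = re.compile(r"[0-9A-F]{24}")


def parse_segment(name):
    """Return (timeline, linear segment number), or None if not a segment."""
    if not _WAL_NAME.fullmatch(name):
        return None
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log * SEGMENTS_PER_LOG + seg


def format_segment(timeline, number):
    """Inverse of parse_segment: build the 24-character file name."""
    log, seg = divmod(number, SEGMENTS_PER_LOG)
    return "%08X%08X%08X" % (timeline, log, seg)


def find_gaps(names):
    """Return names of segments missing between the oldest and newest
    archived segment of each timeline (hypothetical helper)."""
    by_timeline = {}
    for name in names:
        parsed = parse_segment(name)
        if parsed is None:
            continue  # ignore .history, .backup and other non-segment files
        timeline, number = parsed
        by_timeline.setdefault(timeline, set()).add(number)

    missing = []
    for timeline in sorted(by_timeline):
        present = by_timeline[timeline]
        for number in range(min(present), max(present) + 1):
            if number not in present:
                missing.append(format_segment(timeline, number))
    return missing
```

In practice one would feed it something like `os.listdir()` of the archive directory from a cron job and alert on a non-empty result. It only detects missing files; it says nothing about corruption inside a segment, and it cannot tell whether the newest segment is itself stale.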