Re: Better handling of archive_command problems - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Better handling of archive_command problems
Msg-id: CA+Tgmobu5AkOoDv4iSkPd4-+jZ_+j74rvArQz2=yqQPyCvzDpQ@mail.gmail.com
In response to: Re: Better handling of archive_command problems (Peter Geoghegan <pg@heroku.com>)
Responses:
  Re: Better handling of archive_command problems
  Re: Better handling of archive_command problems
List: pgsql-hackers
On Thu, May 16, 2013 at 2:42 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, May 16, 2013 at 11:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Well, I think it IS a Postgres precept that interrupts should get a
>> timely response. You don't have to agree, but I think that's
>> important.
>
> Well, yes, but the fact of the matter is that it is taking high single
> digit numbers of seconds to get a response at times, so I don't think
> that there is any reasonable expectation that that be almost
> instantaneous. I don't want to make that worse, but then it might be
> worth it in order to ameliorate a particular pain point for users.

At times, like when the system is under really heavy load? Or at times,
like depending on what the backend is doing? We can't do a whole lot
about the fact that it's possible to beat a system to death so that, at
the OS level, it stops responding. Linux is unfriendly enough to put
processes into non-interruptible kernel wait states when they're
waiting on the disk, a decision that I suspect to have been made by a
sadomasochist. But if a system that is not particularly heavily loaded
sometimes fails to respond to a cancel in under a second, I would
consider that a bug, and we should fix it.

>>> There is a setting called zero_damaged_pages, and enabling it causes
>>> data loss. I've seen cases where it was enabled within
>>> postgresql.conf for years.
>>
>> That is both true and bad, but it is not a reason to do more bad
>> things.
>
> I don't think it's bad. I think that we shouldn't be paternalistic
> towards our users. If anyone enables a setting like zero_damaged_pages
> (or, say, wal_write_throttle) within their postgresql.conf
> indefinitely for no good reason, then they're incompetent. End of
> story.

That's a pretty user-hostile attitude. Configuration mistakes are a
very common user error. If those configurations hose the system, users
expect to be able to change them back, hit reload, and get things back
on track. But you're proposing a GUC that, if set to a bad value, will
very plausibly cause the entire system to freeze up in such a way that
it won't respond to a reload request - or, for that matter, a fast
shutdown request. I think that's 100% unacceptable. Despite what you
seem to think, we've put a lot of work into ensuring interruptibility,
and it does not make sense to abandon that principle for this or any
other feature.

> Would you feel better about it if the setting had a time-out? Say, the
> user had to explicitly re-enable it after one hour at the most?

No, but I'd feel better about it if you figured out a way to avoid
creating a scenario where it might lock up the entire database cluster.
I am convinced that that is possible, and that without it this is not a
feature worthy of being included in PostgreSQL. Yeah, it's more work
that way. But that's the difference between "a quick hack that is
useful in our shop" and "a production-quality feature ready for a
general audience".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
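
For concreteness, here is one way the interruptible behavior being asked
for above could look. This is only a sketch under stated assumptions,
not code from any patch on this thread: it uses the modern
(PostgreSQL 10+) four-argument WaitLatch() signature, wal_write_throttle
is merely the placeholder name mentioned in the discussion, and
throttle_sleep() is a hypothetical helper.

/*
 * Hypothetical sketch: throttle in short, interruptible increments
 * instead of one long, uninterruptible sleep, so that query cancels,
 * fast shutdowns, and reloads are not blocked for the full duration.
 */
#include "postgres.h"

#include "miscadmin.h"          /* CHECK_FOR_INTERRUPTS(), MyLatch */
#include "storage/ipc.h"        /* proc_exit() */
#include "storage/latch.h"      /* WaitLatch(), ResetLatch() */

static void
throttle_sleep(int total_ms)    /* e.g. the wal_write_throttle value */
{
    int         remaining = total_ms;

    while (remaining > 0)
    {
        int         chunk = Min(remaining, 100);    /* wake every 100ms */
        int         rc;

        /* Service any pending cancel or die (fast shutdown) request. */
        CHECK_FOR_INTERRUPTS();

        /*
         * Sleep on the process latch so that a signal directed at this
         * backend ends the wait immediately rather than after the full
         * timeout.  (wait_event_info is left as 0 in this sketch.)
         */
        rc = WaitLatch(MyLatch,
                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                       chunk,
                       0);

        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);       /* don't linger if the postmaster dies */
        if (rc & WL_LATCH_SET)
            ResetLatch(MyLatch);

        remaining -= chunk;
    }
}

Because no single wait exceeds 100ms and every iteration re-checks for
interrupts, a backend throttled this way still responds promptly to a
cancel, a reload, or a fast shutdown even if the setting is badly
misconfigured - which is the difference Robert is drawing above between
a quick hack and a production-quality feature.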