Thread: Better handling of archive_command problems
The documentation says of continuous archiving:

"While designing your archiving setup, consider what will happen if the archive command fails repeatedly because some aspect requires operator intervention or the archive runs out of space. For example, this could occur if you write to tape without an autochanger; when the tape fills, nothing further can be archived until the tape is swapped. You should ensure that any error condition or request to a human operator is reported appropriately so that the situation can be resolved reasonably quickly. The pg_xlog/ directory will continue to fill with WAL segment files until the situation is resolved. (If the file system containing pg_xlog/ fills up, PostgreSQL will do a PANIC shutdown. No committed transactions will be lost, but the database will remain offline until you free some space.)"

I think that it is not uncommon for archiving to fall seriously behind, risking a serious loss of availability. When this happens, the DBA has to fight against the clock to fix whatever problem there is with continuous archiving, hoping to catch up and prevent a PANIC shutdown. This is a particularly unpleasant problem to have.

At Heroku, we naturally monitor the state of continuous archiving on all clusters under our control. However, when faced with this situation, sometimes the least-worst option to buy time is to throttle Postgres using a crude mechanism: issuing repeated SIGSTOP and SIGCONT signals to all Postgres processes, with the exception of the archiver auxiliary process. Obviously this is a terrible thing to have to do, principally because it slows almost everything right down.

It would be far preferable to just slow down the writing of WAL segments when these emergencies arise, since that alone is what risks causing a PANIC shutdown when XLogWrite() cannot write WAL. Even if the pg_xlog directory is on the same filesystem as database heap files, throttling WAL will still have the effect of throttling the operations that might cause those heap files to be enlarged. Reads (including the writes that enable reads, like those performed by the background writer and backends to clean dirty buffers) and checkpointing are not affected (though of course checkpointing does have to write checkpoint WAL records, so perhaps not quite).

What I'd like to propose is that we simply sit on WALWriteLock for a configured delay in order to throttle the writing (though not the insertion) of WAL records. I've drafted a patch that does just that - it has the WAL Writer optionally sleep on the WALWriteLock for some period of time once per activity cycle (avoiding WAL Writer hibernation). If this sounds similar to commit_delay, that's because it is almost exactly the same. We just sleep within the WAL Writer rather than in a group commit leader backend, because that way we don't depend upon some backend hitting the XLogFlush()/commit_delay codepath. In a bulk loading situation, it's perfectly possible for no backend to actually hit XLogFlush() with any sort of regularity, so commit_delay cannot really be abused to do what I describe here. Besides, right now commit_delay is capped so that it isn't possible to delay for more than 1/10th of a second.

What I've proposed here has the disadvantage of making activity rounds of the WAL Writer take longer, thus considerably increasing the window in which any asynchronous commits will actually make it out to disk. However, that's a problem that's inherent in any throttling of WAL writing as described here (XLogBackgroundFlush() itself acquires WALWriteLock anyway), so I don't imagine that there's anything that can be done about it other than having a clear warning. I envisage this feature as very much a sharp tool to be used by the DBA only when they are in a very tight bind. Better to at least be able to handle read queries in the event of having this problem, and to not throttle longer-running transactions whose writes don't need to make it out to disk right away.

I also have a notion that we can usefully throttle WAL writing less aggressively than by almost or entirely preventing it. For example, a third-party monitoring daemon could scale the throttling delay up or down as a function of how full the pg_xlog filesystem is. It might be better to modestly throttle WAL writing for two hours in order to allow continuous archiving to catch up, rather than sharply curtailing WAL writing for a shorter period.

Has anyone else thought about approaches to mitigating the problems that arise when an archive_command continually fails, and the DBA must manually clean up the mess?

--
Peter Geoghegan
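[For illustration, a minimal sketch of the kind of throttle being described above, assuming a hypothetical wal_write_throttle GUC measured in milliseconds; the function name and exact placement are assumptions, not details taken from the draft patch:]

    /*
     * Hypothetical sketch (not the actual patch): once per WAL Writer
     * activity cycle, sit on WALWriteLock for the configured delay, so
     * that XLogWrite()/XLogFlush() callers are held back while WAL
     * insertion into the buffers can continue until they fill.
     */
    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/lwlock.h"

    int wal_write_throttle = 0;     /* hypothetical GUC: delay in ms, 0 = off */

    static void
    ThrottleWALWrites(void)
    {
        if (wal_write_throttle <= 0)
            return;

        LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
        pg_usleep(wal_write_throttle * 1000L);      /* ms -> microseconds */
        LWLockRelease(WALWriteLock);
    }

[The point is that only the write/flush side is delayed; backends start blocking only once they need WALWriteLock themselves, for example to flush a synchronous commit or to write out full WAL buffers.]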
On Mon, May 13, 2013 at 3:02 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Has anyone else thought about approaches to mitigating the problems
> that arise when an archive_command continually fails, and the DBA must
> manually clean up the mess?

Notably, the most common problem in this vein suffered at Heroku has nothing to do with archive_command failing, and everything to do with the ratio of block device write performance (hence, backlog) versus the archiving performance. When CPU is uncontended it's not a huge deficit, but it is there and it causes quite a bit of stress.

Archive commands failing are definitely a special case there, where it might be nice to bring write traffic to exactly zero for a time.
On Tue, May 14, 2013 at 12:23 AM, Daniel Farina <daniel@heroku.com> wrote:
> On Mon, May 13, 2013 at 3:02 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Has anyone else thought about approaches to mitigating the problems
>> that arise when an archive_command continually fails, and the DBA must
>> manually clean up the mess?
>
> Notably, the most common problem in this vein suffered at Heroku has
> nothing to do with archive_command failing, and everything to do with
> the ratio of block device write performance (hence, backlog) versus
> the archiving performance. When CPU is uncontended it's not a huge
> deficit, but it is there and it causes quite a bit of stress.
>
> Archive commands failing are definitely a special case there, where it
> might be nice to bring write traffic to exactly zero for a time.

One possible objection to this line of attack is that, IIUC, waits to acquire a LWLock are non-interruptible. If someone tells PostgreSQL to wait for some period of time before performing each WAL write, other backends that grab the WALWriteLock will not respond to query cancels during that time.

Worse, the locks have a tendency to back up. What I have observed is that if WAL isn't flushed in a timely fashion, someone will try to grab WALWriteLock while holding WALInsertLock. Now anyone who attempts to insert WAL is in a non-interruptible wait. If the system is busy, it won't be long before someone tries to extend pg_clog, and to do that they'll try to grab WALInsertLock while holding CLogControlLock. At that point, any CLOG lookup that misses in the already-resident pages will send that backend into a non-interruptible wait. I have seen cases where this pile-up occurs during a heavy pgbench workload and paralyzes the entire system, including any read-only queries, until the WAL write completes.

Now despite all that, I can see this being useful enough that Heroku might want to insert a very small patch into their version of PostgreSQL to do it this way, and just live with the downsides. But anything that can propagate non-interruptible waits across the entire system does not sound to me like a feature that is sufficiently polished that we want to expose it to users less sophisticated than Heroku (i.e. nearly all of them). If we do this, I think we ought to find a way to make the waits interruptible, and to insert them in places where they really don't interfere with read-only backends. I'd probably also argue that we ought to try to design it such that the GUC can be in MB/s rather than delay/WAL writer cycle.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, May 15, 2013 at 3:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> One possible objection to this line of attack is that, IIUC, waits to
> acquire a LWLock are non-interruptible. If someone tells PostgreSQL
> to wait for some period of time before performing each WAL write,
> other backends that grab the WALWriteLock will not respond to query
> cancels during that time.

I don't see any reasonable way to make LWLocks care about interrupts (using all 3 possible underlying semaphore implementations, no less). As it says within LWLockAcquire:

    /*
     * Lock out cancel/die interrupts until we exit the code section protected
     * by the LWLock.  This ensures that interrupts will not interfere with
     * manipulations of data structures in shared memory.
     */
    HOLD_INTERRUPTS();

We've been pretty judicious about placing CHECK_FOR_INTERRUPTS() calls in the right places, but it's still quite possible to see the server take multiple seconds - perhaps even as many as 10 - to respond to an interrupt (by psql SIGINT). Now, I didn't have enough of an interest at the times I noticed this to figure out exactly why that may have been or to somehow characterize it, but I don't accept that it's a violation of some Postgres precept that this setting could result in interrupts taking multiple seconds, and maybe even as many as 10. I'd go so far as to let the user make the throttling sleep take as long as they like, even though this admittedly would sort of break such a precept. There is a setting called zero_damaged_pages, and enabling it causes data loss. I've seen cases where it was enabled within postgresql.conf for years.

> Worse, the locks have a tendency to back up.

Well, yes, sleeping on WALWriteLock has some fairly bad consequences for performance. That's sort of the idea. The bar is rather low here, to my mind (the state of the art currently is to deal with it/monitor it yourself, and risk a PANIC shutdown if you fail). I think it's worth noting that at least one other "Enterprise" RDBMS has a similar feature.

The only situation that I can see where a backend would acquire WALWriteLock alongside WALInsertLock is when a segment switch needs to occur (i.e. an XLOG_SWITCH record is being inserted). If that's the case, no one is going to be able to insert WAL for as long as the sleep to throttle occurs anyway, so there is no additional harm done.

> Now despite all that, I can see this being useful enough that Heroku
> might want to insert a very small patch into their version of
> PostgreSQL to do it this way, and just live with the downsides. But
> anything that can propagate non-interruptible waits across the entire
> system does not sound to me like a feature that is sufficiently
> polished that we want to expose it to users less sophisticated than
> Heroku (i.e. nearly all of them). If we do this, I think we ought to
> find a way to make the waits interruptible, and to insert them in
> places where they really don't interfere with read-only backends.

It would be nice to be able to be sure that CLogControlLock could not be held for multiple seconds as a result of this. However, I don't see any reason to let the perfect be the enemy of the good, or at least the better. Just how likely is it that the scenario you describe will affect reads in the real world? In any case, this is a problem in its own right. I don't intend to make any promises about how this throttling will affect read queries, except perhaps something vague and loose.

> I'd probably also argue that we ought to try to design it such that the
> GUC can be in MB/s rather than delay/WAL writer cycle.

That'd certainly be better, but I can't see a way of doing it without adding a whole bunch of mechanism to some important codepaths, like within XLogWrite(), which would be quite a hard sell.

--
Peter Geoghegan
On Wed, May 15, 2013 at 6:40 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, May 15, 2013 at 3:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> One possible objection to this line of attack is that, IIUC, waits to
>> acquire a LWLock are non-interruptible. If someone tells PostgreSQL
>> to wait for some period of time before performing each WAL write,
>> other backends that grab the WALWriteLock will not respond to query
>> cancels during that time.
>
> I don't see any reasonable way to make LWLocks care about interrupts
> (using all 3 possible underlying semaphore implementations, no less).

It couldn't be done across the board, but that doesn't mean that certain cases couldn't get special treatment.

> As it says within LWLockAcquire:
>
>     /*
>      * Lock out cancel/die interrupts until we exit the code section protected
>      * by the LWLock.  This ensures that interrupts will not interfere with
>      * manipulations of data structures in shared memory.
>      */
>     HOLD_INTERRUPTS();
>
> We've been pretty judicious about placing CHECK_FOR_INTERRUPTS() calls
> in the right places, but it's still quite possible to see the server
> take multiple seconds - perhaps even as many as 10 - to respond to an
> interrupt (by psql SIGINT). Now, I didn't have enough of an interest
> at the times I noticed this to figure out exactly why that may have
> been or to somehow characterize it, but I don't accept that it's a
> violation of some Postgres precept that this setting could result in
> interrupts taking multiple seconds, and maybe even as many as 10. I'd
> go so far as to let the user make the throttling sleep take as long as
> they like, even though this admittedly would sort of break such a
> precept.

Well, I think it IS a Postgres precept that interrupts should get a timely response. You don't have to agree, but I think that's important.

> There is a setting called zero_damaged_pages, and enabling it causes
> data loss. I've seen cases where it was enabled within postgresql.conf
> for years.

That is both true and bad, but it is not a reason to do more bad things.

>> Now despite all that, I can see this being useful enough that Heroku
>> might want to insert a very small patch into their version of
>> PostgreSQL to do it this way, and just live with the downsides. But
>> anything that can propagate non-interruptible waits across the entire
>> system does not sound to me like a feature that is sufficiently
>> polished that we want to expose it to users less sophisticated than
>> Heroku (i.e. nearly all of them). If we do this, I think we ought to
>> find a way to make the waits interruptible, and to insert them in
>> places where they really don't interfere with read-only backends.
>
> It would be nice to be able to be sure that CLogControlLock could not
> be held for multiple seconds as a result of this. However, I don't see
> any reason to let the perfect be the enemy of the good, or at least
> the better. Just how likely is it that the scenario you describe will
> affect reads in the real world? In any case, this is a problem in its
> own right.

That's true, but you're proposing to add a knob which would make it much easier for users to notice the bad behavior that already exists, and to prolong it for unbounded periods of time even when the system is not under load.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, May 16, 2013 at 11:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, I think it IS a Postgres precept that interrupts should get a
> timely response. You don't have to agree, but I think that's
> important.

Well, yes, but the fact of the matter is that it is taking high single digit numbers of seconds to get a response at times, so I don't think that there is any reasonable expectation that that be almost instantaneous. I don't want to make that worse, but then it might be worth it in order to ameliorate a particular pain point for users.

>> There is a setting called zero_damaged_pages, and enabling it causes
>> data loss. I've seen cases where it was enabled within postgresql.conf
>> for years.
>
> That is both true and bad, but it is not a reason to do more bad things.

I don't think it's bad. I think that we shouldn't be paternalistic towards our users. If anyone enables a setting like zero_damaged_pages (or, say, wal_write_throttle) within their postgresql.conf indefinitely for no good reason, then they're incompetent. End of story.

Would you feel better about it if the setting had a time-out? Say, the user had to explicitly re-enable it after one hour at the most?

--
Peter Geoghegan
On Thu, May 16, 2013 at 2:42 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, May 16, 2013 at 11:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Well, I think it IS a Postgres precept that interrupts should get a
>> timely response. You don't have to agree, but I think that's
>> important.
>
> Well, yes, but the fact of the matter is that it is taking high single
> digit numbers of seconds to get a response at times, so I don't think
> that there is any reasonable expectation that that be almost
> instantaneous. I don't want to make that worse, but then it might be
> worth it in order to ameliorate a particular pain point for users.

At times, like when the system is under really heavy load? Or at times, like depending on what the backend is doing? We can't do a whole lot about the fact that it's possible to beat a system to death so that, at the OS level, it stops responding. Linux is unfriendly enough to put processes into non-interruptible kernel wait states when they're waiting on the disk, a decision that I suspect to have been made by a sadomasochist. But if there are times when a system that is not particularly heavily loaded fails to respond to cancels in under a second, I would consider that a bug, and we should fix it.

>>> There is a setting called zero_damaged_pages, and enabling it causes
>>> data loss. I've seen cases where it was enabled within postgresql.conf
>>> for years.
>>
>> That is both true and bad, but it is not a reason to do more bad things.
>
> I don't think it's bad. I think that we shouldn't be paternalistic
> towards our users. If anyone enables a setting like zero_damaged_pages
> (or, say, wal_write_throttle) within their postgresql.conf
> indefinitely for no good reason, then they're incompetent. End of
> story.

That's a pretty user-hostile attitude. Configuration mistakes are a very common user error. If those configuration mistakes hose the system, users expect to be able to change them back, hit reload, and get things back on track. But you're proposing a GUC that, if set to a bad value, will very plausibly cause the entire system to freeze up in such a way that it won't respond to a reload request - or for that matter a fast shutdown request. I think that's 100% unacceptable. Despite what you seem to think, we've put a lot of work into ensuring interruptibility, and it does not make sense to abandon that principle for this or any other feature.

> Would you feel better about it if the setting had a time-out? Say, the
> user had to explicitly re-enable it after one hour at the most?

No, but I'd feel better about it if you figured out a way to avoid creating a scenario where it might lock up the entire database cluster. I am convinced that it is possible to avoid that, and that without that this is not a feature worthy of being included in PostgreSQL. Yeah, it's more work that way. But that's the difference between "a quick hack that is useful in our shop" and "a production-quality feature ready for a general audience".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, May 16, 2013 at 5:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> At times, like when the system is under really heavy load? Or at
> times, like depending on what the backend is doing? We can't do a
> whole lot about the fact that it's possible to beat a system to death
> so that, at the OS level, it stops responding. Linux is unfriendly
> enough to put processes into non-interruptible kernel wait states when
> they're waiting on the disk, a decision that I suspect to have been
> made by a sadomasochist.

I find it more plausible that the decision was made by someone making an engineering trade-off.

> But if there are times when a system that is
> not particularly heavily loaded fails to respond to cancels in under a
> second, I would consider that a bug, and we should fix it.

It's not as if the DBA is going to have a hard time figuring out why that is. It's taking a long time to respond because they've throttled the entire server. Clearly, if that's something they're doing very casually, they have bigger problems.

>> I don't think it's bad. I think that we shouldn't be paternalistic
>> towards our users. If anyone enables a setting like zero_damaged_pages
>> (or, say, wal_write_throttle) within their postgresql.conf
>> indefinitely for no good reason, then they're incompetent. End of
>> story.
>
> That's a pretty user-hostile attitude.

I think paternalism is user-hostile. Things should be easy to use correctly and hard to use incorrectly. I certainly think we should be novice friendly, but not if that implies being expert hostile. The fact that a PANIC shutdown can occur when the pg_xlog filesystem runs out of space is pretty user-hostile. It's hostile to both novices and experts.

> Configuration mistakes are a
> very common user error. If those configuration mistakes hose the system,
> users expect to be able to change them back, hit reload, and get things
> back on track. But you're proposing a GUC that, if set to a bad value,
> will very plausibly cause the entire system to freeze up in such a way
> that it won't respond to a reload request - or for that matter a fast
> shutdown request. I think that's 100% unacceptable. Despite what you
> seem to think, we've put a lot of work into ensuring interruptibility,
> and it does not make sense to abandon that principle for this or any
> other feature.
>
>> Would you feel better about it if the setting had a time-out? Say, the
>> user had to explicitly re-enable it after one hour at the most?
>
> No, but I'd feel better about it if you figured out a way to avoid
> creating a scenario where it might lock up the entire database
> cluster. I am convinced that it is possible to avoid that, and that
> without that this is not a feature worthy of being included in
> PostgreSQL.

What if the WALWriter slept on its proc latch within XLogBackgroundFlush(), rather than calling pg_usleep? That way, WalSigHupHandler() will set the process latch on a reload, and the sleep will end immediately if the user determines that they've made a mistake in setting the sleep. Ditto all other signals. As with all extant latch sleeps, we wake on postmaster death, so that an inordinately long sleep doesn't create a denial-of-service that prevents a restart if, say, the postmaster receives SIGKILL.

Maybe it wouldn't even be much additional work to figure out a way of making LWLocks care about interrupts. I think an upper limit on the relevant GUC is sufficient given the nature of what I propose to do, but I might be convinced if a better approach came to light. Do you have one?
> Yeah, it's more work that way. But that's the difference
> between "a quick hack that is useful in our shop" and "a
> production-quality feature ready for a general audience".

It is certainly the case that it would be far easier for me to just deploy this on our own customer instances. If I'm expected to solve the problem of this throttling conceivably affecting backends executing read queries due to the CLogControlLock scenario you describe, just so users can have total assurance read queries are unaffected for a couple of hours or less once in a blue moon when they're fighting off a PANIC shutdown, then the bar is set almost impossibly high. This is unfortunate, because there is plenty of evidence that archive_command issues cause serious user pain all the time.

--
Peter Geoghegan
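[A rough sketch of the latch-based sleep Peter floats above might look something like this, reusing the hypothetical wal_write_throttle GUC from the earlier sketch; the helper name and the details are illustrative only, not the draft patch:]

    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/latch.h"
    #include "storage/lwlock.h"
    #include "storage/proc.h"

    extern int wal_write_throttle;  /* hypothetical GUC: delay in ms, 0 = off */

    static void
    ThrottleWALWrites(void)
    {
        if (wal_write_throttle <= 0)
            return;

        LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);

        /*
         * Sleep on the WAL Writer's process latch instead of pg_usleep().
         * WalSigHupHandler() sets the latch on SIGHUP, so lowering the
         * setting and reloading cuts the sleep short, and
         * WL_POSTMASTER_DEATH means we don't keep sleeping after the
         * postmaster is gone.  (A real patch would need to be careful not
         * to eat latch wakeups meant for the main walwriter loop.)
         */
        ResetLatch(&MyProc->procLatch);
        (void) WaitLatch(&MyProc->procLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                         (long) wal_write_throttle);    /* timeout in ms */

        LWLockRelease(WALWriteLock);
    }

[Note that the sleep still happens with WALWriteLock held, as in the original proposal; only the WAL Writer's own responsiveness to a reload improves, which is the limitation Robert raises in his reply below.]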
On Thu, May 16, 2013 at 5:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, May 16, 2013 at 2:42 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Thu, May 16, 2013 at 11:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Well, I think it IS a Postgres precept that interrupts should get a
>>> timely response. You don't have to agree, but I think that's
>>> important.
>>
>> Well, yes, but the fact of the matter is that it is taking high single
>> digit numbers of seconds to get a response at times, so I don't think
>> that there is any reasonable expectation that that be almost
>> instantaneous. I don't want to make that worse, but then it might be
>> worth it in order to ameliorate a particular pain point for users.
>
> At times, like when the system is under really heavy load? Or at
> times, like depending on what the backend is doing? We can't do a
> whole lot about the fact that it's possible to beat a system to death
> so that, at the OS level, it stops responding. Linux is unfriendly
> enough to put processes into non-interruptible kernel wait states when
> they're waiting on the disk, a decision that I suspect to have been
> made by a sadomasochist. But if there are times when a system that is
> not particularly heavily loaded fails to respond to cancels in under a
> second, I would consider that a bug, and we should fix it.
>
>>>> There is a setting called zero_damaged_pages, and enabling it causes
>>>> data loss. I've seen cases where it was enabled within postgresql.conf
>>>> for years.
>>>
>>> That is both true and bad, but it is not a reason to do more bad things.
>>
>> I don't think it's bad. I think that we shouldn't be paternalistic
>> towards our users. If anyone enables a setting like zero_damaged_pages
>> (or, say, wal_write_throttle) within their postgresql.conf
>> indefinitely for no good reason, then they're incompetent. End of
>> story.
>
> That's a pretty user-hostile attitude. Configuration mistakes are a
> very common user error. If those configuration mistakes hose the system,
> users expect to be able to change them back, hit reload, and get things
> back on track. But you're proposing a GUC that, if set to a bad value,
> will very plausibly cause the entire system to freeze up in such a way
> that it won't respond to a reload request - or for that matter a fast
> shutdown request. I think that's 100% unacceptable. Despite what you
> seem to think, we've put a lot of work into ensuring interruptibility,
> and it does not make sense to abandon that principle for this or any
> other feature.

The inability to shut down in such a situation is not a happy thing at all, as you say, and the problems with whacking the GUC around due to non-interruptibility are pretty bad too.

>> Would you feel better about it if the setting had a time-out? Say, the
>> user had to explicitly re-enable it after one hour at the most?
>
> No, but I'd feel better about it if you figured out a way to avoid
> creating a scenario where it might lock up the entire database
> cluster. I am convinced that it is possible to avoid that

Do you have a sketch of a mechanism that avoids that problem?

> and that without that this is not a feature worthy of being included
> in PostgreSQL. Yeah, it's more work that way. But that's the
> difference between "a quick hack that is useful in our shop" and "a
> production-quality feature ready for a general audience".

However little it may matter, I would like to disagree with your opinion on this one: the current situation as I imagine encountered by *all* users of archiving is really unpleasant, 'my' shop or no. It would probably not be inaccurate to say that 99.9999% of archiving users have to live with a hazy control over the amount of data loss, only bounded by how long it takes for the system to fill up the WAL file system and then for PostgreSQL to PANIC and crash (hence, no more writes are processed, and no more data can be lost).

Once one factors in the human cost of having to deal with that downtime, or of monitoring to circumvent it, I feel as though the bar for quality should be lowered. As you see, we've had to resort to horrific techniques to get around this problem. I think this is something serious enough that it is worth doing better, but the bind that people doing archiving find themselves in is much worse at the margins -- involving data loss and loss of availability -- and accordingly, I think the bar for some kind of solution should be lowered, insofar as at least the interface should be right enough to not be an albatross later (a bar this proposal may not meet).

That said, there is probably a way to please everyone and do something better. Any ideas?
On Thu, May 16, 2013 at 10:05 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>> I don't think it's bad. I think that we shouldn't be paternalistic
>>> towards our users. If anyone enables a setting like zero_damaged_pages
>>> (or, say, wal_write_throttle) within their postgresql.conf
>>> indefinitely for no good reason, then they're incompetent. End of
>>> story.
>>
>> That's a pretty user-hostile attitude.
>
> I think paternalism is user-hostile. Things should be easy to use
> correctly and hard to use incorrectly. I certainly think we should be
> novice friendly, but not if that implies being expert hostile.

I don't disagree with that. I am arguing that your proposal, as it stands, is far too easy to use incorrectly.

> The fact that a PANIC shutdown can occur when the pg_xlog filesystem
> runs out of space is pretty user-hostile. It's hostile to both novices
> and experts.

+1.

>> No, but I'd feel better about it if you figured out a way to avoid
>> creating a scenario where it might lock up the entire database
>> cluster. I am convinced that it is possible to avoid that, and that
>> without that this is not a feature worthy of being included in
>> PostgreSQL.
>
> What if the WALWriter slept on its proc latch within
> XLogBackgroundFlush(), rather than calling pg_usleep? That way,
> WalSigHupHandler() will set the process latch on a reload, and the
> sleep will end immediately if the user determines that they've made a
> mistake in setting the sleep. Ditto all other signals. As with all
> extant latch sleeps, we wake on postmaster death, so that an
> inordinately long sleep doesn't create a denial-of-service that
> prevents a restart if, say, the postmaster receives SIGKILL.

That seems like a step in the right direction. I'm not sure it's enough: it makes that particular backend a lot more responsive to interrupts, but other backends that may get caught waiting for the LWLocks it holds are still going to freeze up. Now, that no longer prevents you from reloading the configuration and getting unstuck (which is good), but it's still a lot of non-interruptible stuff (which is not good).

I would also suggest considering other possible places to insert the sleep. One idea that occurs to me off-hand is to do it just before or just after each XLogInsert(). We might be holding a buffer lwlock at that point, though. That's probably not SO bad, but what is bad is that we might be extending pg_clog or pg_subtrans while sitting on the relevant SLRU lock, and that's the quintessentially wrong time to go into the tank.

Actually, I wonder if we couldn't tie this into ProcessInterrupts(). If this feature is enabled, then after each XLogInsert(), we increment XLogBytesWrittenSinceLastSleep and set InterruptPending if it crosses some threshold (8kB?). In ProcessInterrupts(), we add a new stanza at the bottom of the function. If that finds that XLogBytesWrittenSinceLastSleep exceeds the same threshold, then it does gettimeofday(), computes a sleep time based on the specified WAL write rate, the time since we last slept, and the number of WAL bytes written. It then resets XLogBytesWrittenSinceLastSleep to 0 and sleeps on the process latch for the calculated sleep time.

Under this system, we never sleep while holding an LWLock (because LWLockAcquire calls HOLD_INTERRUPTS(), to be matched by LWLockRelease's call to RESUME_INTERRUPTS()), we can throttle the WAL write rate of each backend in bytes/second, and the overhead when the feature's not in use should be negligible.
> Maybe it wouldn't even be much additional work to figure out a way of
> making LWLocks care about interrupts. I think an upper limit on the
> relevant GUC is sufficient given the nature of what I propose to do,
> but I might be convinced if a better approach came to light. Do you
> have one?

I was vaguely thinking about something like LWLockAcquireWithoutHoldingOffInterrupts() that could be used in select locations. But we'd have to think about how safe that really is, and whether it is actually useful.

> It is certainly the case that it would be far easier for me to just
> deploy this on our own customer instances. If I'm expected to solve
> the problem of this throttling conceivably affecting backends
> executing read queries due to the CLogControlLock scenario you
> describe, just so users can have total assurance read queries are
> unaffected for a couple of hours or less once in a blue moon when
> they're fighting off a PANIC shutdown, then the bar is set almost
> impossibly high. This is unfortunate, because there is plenty of
> evidence that archive_command issues cause serious user pain all the
> time.

Maybe so, but I have trouble believing that very many users are sophisticated enough to use a feature like this unless it's pretty well-cooked.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
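[To make the ProcessInterrupts() idea above a little more concrete, a rough sketch might look like the following; the GUC name, the helper names, and the hook points are illustrative assumptions, not a worked-out design:]

    #include "postgres.h"
    #include "miscadmin.h"
    #include "portability/instr_time.h"
    #include "storage/latch.h"
    #include "storage/proc.h"

    #define XLOG_THROTTLE_THRESHOLD 8192    /* WAL bytes between sleep checks */

    extern int wal_write_rate_limit;        /* hypothetical GUC: kB/s, 0 = off */

    static uint64     XLogBytesWrittenSinceLastSleep = 0;
    static instr_time LastThrottleSleep;    /* zeroed at process start */

    /* Hypothetical hook, called after each XLogInsert() with the record size. */
    static void
    XLogThrottleAccount(uint32 nbytes)
    {
        if (wal_write_rate_limit <= 0)
            return;
        XLogBytesWrittenSinceLastSleep += nbytes;
        if (XLogBytesWrittenSinceLastSleep >= XLOG_THROTTLE_THRESHOLD)
            InterruptPending = true;    /* noticed at next CHECK_FOR_INTERRUPTS() */
    }

    /* Hypothetical stanza at the bottom of ProcessInterrupts(): no LWLocks held. */
    static void
    XLogThrottleSleep(void)
    {
        instr_time  now;
        instr_time  since_last;
        double      budget_ms;

        if (wal_write_rate_limit <= 0 ||
            XLogBytesWrittenSinceLastSleep < XLOG_THROTTLE_THRESHOLD)
            return;

        INSTR_TIME_SET_CURRENT(now);
        since_last = now;
        INSTR_TIME_SUBTRACT(since_last, LastThrottleSleep);

        /* How long these bytes "should" have taken at the configured rate. */
        budget_ms = (XLogBytesWrittenSinceLastSleep * 1000.0) /
                    (wal_write_rate_limit * 1024.0);
        budget_ms -= INSTR_TIME_GET_MILLISEC(since_last);

        if (budget_ms > 0)
            (void) WaitLatch(&MyProc->procLatch,
                             WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                             (long) budget_ms);

        XLogBytesWrittenSinceLastSleep = 0;
        INSTR_TIME_SET_CURRENT(LastThrottleSleep);
    }

[The appeal of this shape is that each sleep happens outside of any LWLock and the GUC is expressed as a rate rather than a per-cycle delay, as suggested above.]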
On Thu, May 16, 2013 at 10:06 PM, Daniel Farina <daniel@heroku.com> wrote:
> Do you have a sketch of a mechanism that avoids that problem?

I didn't until just now, but see my email to Peter. That idea might be all wet, but off-hand it seems like it might work...

> However little it may matter, I would like to disagree with your
> opinion on this one: the current situation as I imagine encountered by
> *all* users of archiving is really unpleasant, 'my' shop or no. It
> would probably not be inaccurate to say that 99.9999% of archiving
> users have to live with a hazy control over the amount of data loss,
> only bounded by how long it takes for the system to fill up the WAL
> file system and then for PostgreSQL to PANIC and crash (hence, no more
> writes are processed, and no more data can be lost).

I'm really not trying to pick a fight here. What I am saying is that the problem is not so bad that we should accept the first design proposed, despite the obvious problems with it, without trying to find a better one. The first post to -hackers on this was 4 days ago. Feature freeze for the first release that could conceivably contain this feature will most likely occur about 8 months from now. We can take a few more days or weeks to explore alternatives, and I believe it will be possible to find good ones. If I were trying to kill this proposal, I could find better ways to do it than clearly articulating my concerns at the outset.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, May 16, 2013 at 9:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, May 16, 2013 at 10:06 PM, Daniel Farina <daniel@heroku.com> wrote:
>> Do you have a sketch of a mechanism that avoids that problem?
>
> I didn't until just now, but see my email to Peter. That idea might
> be all wet, but off-hand it seems like it might work...
>
>> However little it may matter, I would like to disagree with your
>> opinion on this one: the current situation as I imagine encountered by
>> *all* users of archiving is really unpleasant, 'my' shop or no. It
>> would probably not be inaccurate to say that 99.9999% of archiving
>> users have to live with a hazy control over the amount of data loss,
>> only bounded by how long it takes for the system to fill up the WAL
>> file system and then for PostgreSQL to PANIC and crash (hence, no more
>> writes are processed, and no more data can be lost).
>
> I'm really not trying to pick a fight here. What I am saying is that
> the problem is not so bad that we should accept the first design
> proposed, despite the obvious problems with it, without trying to find
> a better one. The first post to -hackers on this was 4 days ago.
> Feature freeze for the first release that could conceivably contain
> this feature will most likely occur about 8 months from now. We can
> take a few more days or weeks to explore alternatives, and I believe
> it will be possible to find good ones. If I were trying to kill this
> proposal, I could find better ways to do it than clearly articulating
> my concerns at the outset.

I know you are not trying to kill it or anything, and I didn't mean to pick a fight either. My point is, without rancor, that I think the state of affairs as-is fails the sniff test on dimensions like availability and exposure to data loss, much more badly than some of the negative effects of hobbling cancellations would. And I completely agree with you that, given the timing, this issue can marinate for a while. And, I hope it goes without saying, I am grateful for your noodling on the matter, as always.