Thread: WAL Rate Limiting
We've discussed previously the negative impact of large bulk operations, especially wrt WAL writes. Patch here allows maintenance operations to have their WAL generation slowed down as a replication lag prevention feature.

I believe there was originally intended to be some work on I/O rate limiting, but that hasn't happened and is in some ways orthogonal to this patch and we will likely eventually want both.

Single new parameter works very similarly to vacuum_cost_delay

wal_rate_limit_delay = Xms

slows down CLUSTER, VACUUM FULL, ALTER TABLE (rewrite & set tablespace), CREATE INDEX

so basically same things we optimise WAL for and the same places where we honour maintenance_work_mem

(discuss: should we add COPY, CTAS etc also?)

(discuss: do we need another parameter to specify "cost"? Currently patch uses "sleep every 64kB of WAL")

VACUUM is not included, since we already have controls for that - honouring two controls would be complex and weird.

Uses GetCurrentTransactionWALVolume() patch, which is included within the patch to make it easier to review as a whole.

Technically, we can't simply wait before/after WAL inserts because these typically occur while holding buffer locks. So we need to put the waits at a higher level, notably in safe places that currently do CHECK_FOR_INTERRUPTS(). Doing that during query execution might make locking of blocks for nested loops joins much worse.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
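[Editor's note: for readers following the thread, here is a minimal sketch of the mechanism described above, assuming GetCurrentTransactionWALVolume() from the included patch returns the number of WAL bytes written so far by the current transaction. The other function and variable names are illustrative only, not the patch's actual code.]

    /* Hypothetical sketch -- not the actual patch code. */
    #include "postgres.h"
    #include "miscadmin.h"

    int wal_rate_limit_delay = 0;               /* ms of sleep per chunk; 0 disables */

    #define WAL_RATE_LIMIT_CHUNK    (64 * 1024) /* sleep once per 64kB of WAL */

    static uint64 last_seen_wal_volume = 0;

    /*
     * Called only from safe points (no buffer locks held, not inside a
     * critical section), alongside existing CHECK_FOR_INTERRUPTS() sites.
     */
    static void
    CheckForWALBudget(void)
    {
        uint64      written;

        if (wal_rate_limit_delay <= 0)
            return;

        written = GetCurrentTransactionWALVolume();     /* from the included patch */

        if (written - last_seen_wal_volume >= WAL_RATE_LIMIT_CHUNK)
        {
            last_seen_wal_volume = written;
            pg_usleep(wal_rate_limit_delay * 1000L);    /* ms -> usec */
        }
    }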
On 01/15/2014 10:20 AM, Simon Riggs wrote:
> (discuss: do we need another parameter to specify "cost"? Currently patch uses "sleep every 64kB of WAL")

It might be nicer to express this as "up to n MB of WAL per second", but otherwise seems reasonable, and it's easy enough to convert WAL MB/s into a delay by hand.

Initial tests of this show that it does limit index creation rates as expected, while continuing to allow WAL generation at normal rates for other concurrent work.

--
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
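[Editor's note: a rough sketch of the by-hand conversion Craig mentions, assuming the patch's behaviour of sleeping once per 64kB of WAL. The helper name and the example rate are purely illustrative, and the calculation ignores the time spent actually producing the WAL, so the achieved rate lands somewhat below the target.]

    /* Illustrative only: turn a WAL budget in MB/s into the per-64kB sleep
     * (in milliseconds) that the proposed wal_rate_limit_delay expects. */
    #include <stdio.h>

    static double
    delay_ms_for_rate(double mb_per_sec)
    {
        const double chunk_mb = 64.0 / 1024.0;      /* the 64kB chunk, in MB */

        return (chunk_mb / mb_per_sec) * 1000.0;    /* ms of sleep per chunk */
    }

    int
    main(void)
    {
        /* e.g. capping a CREATE INDEX at roughly 16 MB/s of WAL */
        printf("wal_rate_limit_delay ~= %.2f ms\n", delay_ms_for_rate(16.0));
        return 0;
    }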
On 1/14/14, 8:20 PM, Simon Riggs wrote:
> We've discussed previously the negative impact of large bulk operations, especially wrt WAL writes. Patch here allows maintenance operations to have their WAL generation slowed down as a replication lag prevention feature.
>
> I believe there was originally intended to be some work on I/O rate limiting, but that hasn't happened and is in some ways orthogonal to this patch and we will likely eventually want both.
>
> Single new parameter works very similarly to vacuum_cost_delay
>
> wal_rate_limit_delay = Xms

From a replication standpoint, I think it would be a lot nicer if this also (or instead) considered the replication delay we were currently experiencing.

Related to that... it would be extremely useful to us if we could similarly rate limit certain backend operations, for the purpose of not overloading the database or replication. That's fairly unrelated except that sometimes you might want to WAL rate limit a backend operation, besides just COPY.

--
Jim C. Nasby, Data Architect                  jim@nasby.net
512.569.9461 (cell)                           http://jim.nasby.net
On Wed, Jan 15, 2014 at 7:50 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> We've discussed previously the negative impact of large bulk
> operations, especially wrt WAL writes. Patch here allows maintenance
> operations to have their WAL generation slowed down as a replication
> lag prevention feature.
>
> I believe there was originally intended to be some work on I/O rate
> limiting, but that hasn't happened and is in some ways orthogonal to
> this patch and we will likely eventually want both.
>
> Single new parameter works very similarly to vacuum_cost_delay
>
> wal_rate_limit_delay = Xms
>
> slows down CLUSTER, VACUUM FULL, ALTER TABLE (rewrite & set
> tablespace), CREATE INDEX
>
> so basically same things we optimise WAL for and the same places where
> we honour maintenance_work_mem
>
> (discuss: should we add COPY, CTAS etc also?)
>
> (discuss: do we need another parameter to specify "cost"? Currently
> patch uses "sleep every 64kB of WAL")
>
> VACUUM is not included, since we already have controls for that -
> honouring two controls would be complex and weird.
I wonder if we should replace vacuum_cost/delay with, say, maintenance_cost/delay and use it in all maintenance activities, including what you just listed out above. While the exact semantics of vacuum and other maintenance activities may differ, ISTM the final goal is just the same, i.e. reduce load on the master.
Thanks,
On Wed, Jan 15, 2014 at 3:20 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> We've discussed previously the negative impact of large bulk
> operations, especially wrt WAL writes. Patch here allows maintenance
> operations to have their WAL generation slowed down as a replication
> lag prevention feature.
>
> I believe there was originally intended to be some work on I/O rate
> limiting, but that hasn't happened and is in some ways orthogonal to
> this patch and we will likely eventually want both.
>
> Single new parameter works very similarly to vacuum_cost_delay
>
> wal_rate_limit_delay = Xms
Seems like a really bad name if we are only slowing down some commands - that seems to indicate we're slowing down all of them. I think it should be something that indicates that it only affects the maintenance commands.
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Wed, Jan 15, 2014 at 7:54 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Jan 15, 2014 at 3:20 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> We've discussed previously the negative impact of large bulk operations, especially wrt WAL writes. Patch here allows maintenance operations to have their WAL generation slowed down as a replication lag prevention feature.
>>
>> I believe there was originally intended to be some work on I/O rate limiting, but that hasn't happened and is in some ways orthogonal to this patch and we will likely eventually want both.
>>
>> Single new parameter works very similarly to vacuum_cost_delay
>>
>> wal_rate_limit_delay = Xms
>
> Seems like a really bad name if we are only slowing down some commands - that seems to indicate we're slowing down all of them. I think it should be something that indicates that it only affects the maintenance commands.

And why should it only affect the maintenance commands anyway, and who decides what's a maintenance command? I thought Heroku suggested something like this previously, and their use case was something along the lines of "we need to slow the system down enough to do a backup so we can delete some stuff before the disk fills". For that, it seems likely to me that you would just want to slow everything down.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2014-01-16 09:06:30 -0500, Robert Haas wrote:
> > Seems like a really bad name if we are only slowing down some commands - that seems to indicate we're slowing down all of them. I think it should be something that indicates that it only affects the maintenance commands.
>
> And why should it only affect the maintenance commands anyway, and who decides what's a maintenance command?

I think implementing it for everything might have some use, but it's a much, much more complex task. You can't simply do rate limiting in XLogInsert() or somesuch - we're holding page locks, buffer pins, other locks... I don't see why that needs to be done in the same feature.

I don't really see much difficulty in determining what's a utility command and what not for the purpose of this? All utility commands which create WAL in O(table_size) or worse.

> I thought Heroku suggested something like this previously, and their use case was something along the lines of "we need to slow the system down enough to do a backup so we can delete some stuff before the disk fills". For that, it seems likely to me that you would just want to slow everything down.

I think the usecase for this is more along the lines of not slowing normal operations or causing replication delays to standbys unduly, while performing maintenance operations on relations.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> I don't really see much difficulty in determining what's a utility command and what not for the purpose of this? All utility commands which create WAL in O(table_size) or worse.

By that definition, INSERT, UPDATE, and DELETE can all be "utility commands". So would a full-table-scan SELECT, if it's unfortunate enough to run into a lot of hint-setting or HOT-pruning work.

I think possibly a more productive approach to this would be to treat it as a session-level GUC setting, rather than hard-wiring it to affect certain commands and not others.

			regards, tom lane
On 2014-01-16 10:35:20 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > I don't really see much difficulty in determining what's a utility command and what not for the purpose of this? All utility commands which create WAL in O(table_size) or worse.
>
> By that definition, INSERT, UPDATE, and DELETE can all be "utility commands". So would a full-table-scan SELECT, if it's unfortunate enough to run into a lot of hint-setting or HOT-pruning work.

Well, I said *utility* command. So everything processed by ProcessUtility() generating WAL like that.

> I think possibly a more productive approach to this would be to treat it as a session-level GUC setting, rather than hard-wiring it to affect certain commands and not others.

Do you see a reasonable way to implement this generically for all commands? I don't. We shouldn't let the need for generic resource control stop us from providing some form of resource control for a consistent subset of commands.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 01/16/2014 05:39 PM, Andres Freund wrote:
> On 2014-01-16 10:35:20 -0500, Tom Lane wrote:
>> I think possibly a more productive approach to this would be to treat it as a session-level GUC setting, rather than hard-wiring it to affect certain commands and not others.
>
> Do you see a reasonable way to implement this generically for all commands? I don't.

In suitable safe places, check if you've written too much WAL, and sleep if so. Call it CHECK_FOR_WAL_BUDGET(), along the lines of CHECK_FOR_INTERRUPTS(), but called less frequently. Or maybe vacuum_delay_point() is a better analogy. Hopefully you don't need to sprinkle them in too many places to be useful.

- Heikki
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 01/16/2014 05:39 PM, Andres Freund wrote:
>> Do you see a reasonable way to implement this generically for all commands? I don't.

> In suitable safe places, check if you've written too much WAL, and sleep if so. Call it CHECK_FOR_WAL_BUDGET(), along the lines of CHECK_FOR_INTERRUPTS(), but called less frequently. Or maybe vacuum_delay_point() is a better analogy. Hopefully you don't need to sprinkle them in too many places to be useful.

We probably don't really need to implement it for "all" commands; I think everyone can agree that, say, ALTER TABLE RENAME COLUMN isn't going to emit enough WAL to really matter. My point was that we should try to hit everything that potentially *could* generate lots of WAL, rather than arbitrarily deciding that some are utility commands and some are not.

For INSERT/UPDATE/DELETE, likely you could do this with a single CHECK_FOR_WAL_BUDGET() added at a strategic spot in nodeModifyTable.c.

			regards, tom lane
On 2014-01-16 17:47:51 +0200, Heikki Linnakangas wrote:
> On 01/16/2014 05:39 PM, Andres Freund wrote:
>> On 2014-01-16 10:35:20 -0500, Tom Lane wrote:
>>> I think possibly a more productive approach to this would be to treat it as a session-level GUC setting, rather than hard-wiring it to affect certain commands and not others.
>>
>> Do you see a reasonable way to implement this generically for all commands? I don't.
>
> In suitable safe places, check if you've written too much WAL, and sleep if so. Call it CHECK_FOR_WAL_BUDGET(), along the lines of CHECK_FOR_INTERRUPTS(), but called less frequently. Or maybe vacuum_delay_point() is a better analogy. Hopefully you don't need to sprinkle them in too many places to be useful.

That'd be most places doing XLogInsert() if you want to be consistent. Each needing to be analyzed whether we're blocking important resources, determining where in the callstack we can do the CHECK_FOR_WAL_BUDGET().

I don't think the added power of allowing bulk DML to be metered is worth the increased implementation cost.

I think the usecases that would want this for DML probably also want this to work for unlogged, temp tables. That'd require a much more generic resource control framework.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-01-16 17:47:51 +0200, Heikki Linnakangas wrote:
>> In suitable safe places, check if you've written too much WAL, and sleep if so.

> That'd be most places doing XLogInsert() if you want to be consistent.

See my other response. There's no need for "consistency", only to be sure that code paths that might generate a *lot* of WAL include such checks. We've gotten along fine without any formal methodology for where to put CHECK_FOR_INTERRUPTS() or vacuum_delay_point() calls, and this seems like just more of the same.

> I think the usecases that would want this for DML probably also wan this to work for unlogged, temp tables.

Huh? Unlogged tables generate *zero* WAL, by definition. If your point is that WAL creation isn't a particularly good resource consumption metric, that's an argument worth considering, but it seems quite orthogonal to whether we can implement such throttling reasonably. And in any case, you didn't provide such an argument.

			regards, tom lane
On 16 January 2014 16:56, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-01-16 17:47:51 +0200, Heikki Linnakangas wrote:
>> On 01/16/2014 05:39 PM, Andres Freund wrote:
>>> On 2014-01-16 10:35:20 -0500, Tom Lane wrote:
>>>> I think possibly a more productive approach to this would be to treat it as a session-level GUC setting, rather than hard-wiring it to affect certain commands and not others.
>>>
>>> Do you see a reasonable way to implement this generically for all commands? I don't.
>>
>> In suitable safe places, check if you've written too much WAL, and sleep if so. Call it CHECK_FOR_WAL_BUDGET(), along the lines of CHECK_FOR_INTERRUPTS(), but called less frequently. Or maybe vacuum_delay_point() is a better analogy. Hopefully you don't need to sprinkle them in too many places to be useful.
>
> That'd be most places doing XLogInsert() if you want to be consistent. Each needing to be analyzed whether we're blocking important resources, determining where in the callstack we can do the CHECK_FOR_WAL_BUDGET().
>
> I don't think the the add power by allowing bulk DML to be metered is worth the increased implementation cost.
>
> I think the usecases that would want this for DML probably also wan this to work for unlogged, temp tables. That'd require a much more generic resource control framework.

Thank you everyone for responding so positively to this proposal. It is something many users have complained about.

In time, I think we might want both WAL Rate Limiting and I/O Rate Limiting. IMHO I/O rate limiting is harder and so this proposal is restricted solely to WAL rate limiting.

I'm comfortable with a session level parameter. I was mulling over a wal_rate_limit_scope parameter but I think that is too much.

I will follow Craig's proposal to define this in terms of MB/s, adding that I would calc from beginning of statement to now, so it is averaged rate. Setting of zero means unlimited. The rate would be per-session (in this release) rather than a globally calculated setting.

I would like to call it CHECK_FOR_WAL_RATE_LIMIT()

That seems like a cheap enough call to allow it to be added in more places, so my expanded list of commands touched are
  CLUSTER/VACUUM FULL
  ALTER TABLE (only in phase 3 table rewrite)
  ALTER TABLE SET TABLESPACE
  CREATE INDEX
which are already there, plus new ones suggested/implied
  COPY
  CREATE TABLE AS SELECT
  INSERT/UPDATE/DELETE

Please let me know if I missed something (rather than debating what is or is not a "maintenance" command).

Any other thoughts before I cut v2 ?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
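[Editor's note: a hedged sketch of the statement-averaged behaviour Simon describes (rate measured from the start of the statement, zero meaning unlimited). All names here - wal_rate_limit, statement_start_time, wal_written_bytes - are assumptions for illustration, not the patch's code.]

    /* Illustrative sketch of statement-averaged WAL rate limiting. */
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/time.h>

    static int      wal_rate_limit = 0;       /* MB/s; 0 = unlimited */
    static double   statement_start_time;     /* seconds, set when the statement starts */
    static uint64_t wal_written_bytes;        /* WAL written since statement start */

    static double
    now_seconds(void)
    {
        struct timeval tv;

        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    static void
    check_for_wal_rate_limit(void)
    {
        double      bytes_per_sec;
        double      allowed;

        if (wal_rate_limit <= 0)
            return;                 /* unlimited */

        bytes_per_sec = (double) wal_rate_limit * 1024.0 * 1024.0;
        allowed = bytes_per_sec * (now_seconds() - statement_start_time);

        if ((double) wal_written_bytes > allowed)
        {
            /* sleep just long enough for the average to drop back under the limit */
            double  excess = (double) wal_written_bytes - allowed;

            usleep((useconds_t) (excess / bytes_per_sec * 1e6));
        }
    }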
On 2014-01-16 11:19:28 -0500, Tom Lane wrote:
> > I think the usecases that would want this for DML probably also wan this to work for unlogged, temp tables.
>
> Huh? Unlogged tables generate *zero* WAL, by definition.

Yes. That's my point. If we provide it as a generic resource control - which is what's being discussed here, it sounds to me - it should be generic.

If we provide it as a measure to prevent standbys from getting out of date due to maintenance commands, then it only needs to cover those.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2014-01-16 17:20:19 +0100, Simon Riggs wrote:
> I'm comfortable with a session level parameter. I was mulling over a wal_rate_limit_scope parameter but I think that is too much.
>
> I will follow Craig's proposal to define this in terms of MB/s, adding that I would calc from beginning of statement to now, so it is averaged rate. Setting of zero means unlimited. The rate would be per-session (in this release) rather than a globally calculated setting.
>
> I would like to call it CHECK_FOR_WAL_RATE_LIMIT()
>
> That seems like a cheap enough call to allow it to be added in more places, so my expanded list of commands touched are CLUSTER/VACUUM FULL, ALTER TABLE (only in phase 3 table rewrite), ALTER TABLE SET TABLESPACE, CREATE INDEX, which are already there, plus new ones suggested/implied: COPY, CREATE TABLE AS SELECT, INSERT/UPDATE/DELETE.
>
> Please let me know if I missed something (rather than debating what is or is not a "maintenance" command).

If we're going to do this for DML - which I am far from convinced of - we also should do it for SELECT, since that can generate significant amounts of WAL with checksums turned on.

Otherwise stuff like INSERT ... SELECT, UPDATE FROM et al. will behave very confusingly since suddenly they will not only block the WAL generated by the INSERT but also the SELECT.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 16 January 2014 17:22, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-01-16 11:19:28 -0500, Tom Lane wrote:
>> > I think the usecases that would want this for DML probably also wan this to work for unlogged, temp tables.
>>
>> Huh? Unlogged tables generate *zero* WAL, by definition.
>
> Yes. That's my point. If we provide it as a generic resource control - which what's being discussed here sounds to me - it should be generic.
>
> If we provide as a measure to prevent standbys from getting out of date due to maintenance commands, then it only needs to cover those.

Agreed, but it won't happen in this release. I/O resource control to follow in later releases.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 16 January 2014 17:29, Andres Freund <andres@2ndquadrant.com> wrote:
>> Please let me know if I missed something (rather than debating what is or is not a "maintenance" command).
>
> If we're going to do this for DML - which I am far from convinced of - we also should do it for SELECT, since that can generate significant amounts of WAL with checksums turned on.
>
> Otherwise stuff like INSERT ... SELECT, UPDATE FROM et al. will behave very confusingly since suddenly they will not only block the WAL generated by the INSERT but also the SELECT.

Good point, but rather happily I can say "thought of that" and refer you to the other patch which limits SELECT's ability to dirty pages, and thus, with checksums enabled, will limit the generation of WAL. (And no, they aren't the same thing).

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes:
> Agreed, but it won't happen in this release. I/O resource control to follow in later releases.

"This release"? TBH, I think this patch has arrived too late to be considered for 9.4. It's a fairly major feature and clearly not without controversy, so I don't think that posting it for the first time the day before CF4 starts meets the guidelines we all agreed to back in Ottawa.

			regards, tom lane
On 16 January 2014 17:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Agreed, but it won't happen in this release. I/O resource control to follow in later releases.
>
> "This release"? TBH, I think this patch has arrived too late to be considered for 9.4. It's a fairly major feature and clearly not without controversy, so I don't think that posting it for the first time the day before CF4 starts meets the guidelines we all agreed to back in Ottawa.

I think it's majorly useful, but there really isn't much to the implementation and I expect it's beneath the level of being defined as a major feature. (I think it may have taken less time to write than this current conversation has lasted).

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 16, 2014 at 8:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think the usecases that would want this for DML probably also wan this
>> to work for unlogged, temp tables.
>
> Huh? Unlogged tables generate *zero* WAL, by definition.
Transactions that only change unlogged tables still generate commit records to WAL.
I don't think that amount of WAL is particularly relevant to this discussion, but I was recently surprised by it, so wanted to publicize it. (It was causing a lot of churn in my WAL due to interaction with archive_timeout)
Cheers,
Jeff
The documentation doesn't build:

openjade:config.sgml:1416:11:E: end tag for "VARIABLELIST" omitted, but OMITTAG NO was specified
openjade:config.sgml:1390:5: start tag was here
On 01/16/2014 11:52 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> On 01/16/2014 05:39 PM, Andres Freund wrote:
>>> Do you see a reasonable way to implement this generically for all commands? I don't.
>
>> In suitable safe places, check if you've written too much WAL, and sleep if so. Call it CHECK_FOR_WAL_BUDGET(), along the lines of CHECK_FOR_INTERRUPTS(), but called less frequently. Or maybe vacuum_delay_point() is a better analogy. Hopefully you don't need to sprinkle them in too many places to be useful.
>
> We probably don't really need to implement it for "all" commands; I think everyone can agree that, say, ALTER TABLE RENAME COLUMN isn't going to emit enough WAL to really matter. My point was that we should try to hit everything that potentially *could* generate lots of WAL, rather than arbitrarily deciding that some are utility commands and some are not.

Then you land up needing a configuration mechanism to control *which* commands get affected, too, to handle the original use case of "I want to rate limit vacuum and index rebuilds, while everything else runs full tilt".

Or do you propose to just do this per-session? If so, what about autovacuum?

--
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jan 16, 2014 at 10:56 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> That'd be most places doing XLogInsert() if you want to be consistent. Each needing to be analyzed whether we're blocking important resources, determining where in the callstack we can do the CHECK_FOR_WAL_BUDGET().

And doing that probably wouldn't be a problem except for the fact that this patch is showing up at the absolute very last minute with multiple unresolved design considerations. I'm with Tom: this should be postponed to 9.5.

That having been said, I bet it could be done at the tail of XLogInsert(). If there are cases where that's not desirable, then add macros HOLDOFF_WAL_THROTTLING() and RESUME_WAL_THROTTLING() that bump a counter up and down. When the counter is >0, XLogInsert() doesn't sleep; when RESUME_WAL_THROTTLING() drops the counter to 0, it also considers sleeping. I suspect only a few places would need to do this, like where we're holding one of the SLRU locks. Some testing would be needed to figure out exactly which places cause problems, but that's why it's a good idea to start discussion of your proposed feature before January 14th.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
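[Editor's note: a minimal sketch of the holdoff-counter idea above. The macro names follow Robert's suggestion, while ConsiderWALThrottleSleep() is a hypothetical stand-in for whatever would check the budget and sleep; none of this is committed code.]

    /* Sketch only -- all names are hypothetical. */
    static int wal_throttle_holdoff_count = 0;

    /* placeholder: would check the WAL budget and sleep if it is exceeded */
    static void
    ConsiderWALThrottleSleep(void)
    {
    }

    #define HOLDOFF_WAL_THROTTLING()    (wal_throttle_holdoff_count++)

    #define RESUME_WAL_THROTTLING() \
        do { \
            if (--wal_throttle_holdoff_count == 0) \
                ConsiderWALThrottleSleep(); \
        } while (0)

    /*
     * At the tail of XLogInsert(), per the suggestion, the sleep would simply
     * be skipped while the counter is held:
     *
     *      if (wal_throttle_holdoff_count == 0)
     *          ConsiderWALThrottleSleep();
     */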
On 2014-01-17 09:04:54 -0500, Robert Haas wrote:
> That having been said, I bet it could be done at the tail of XLogInsert(). If there are cases where that's not desirable, then add macros HOLDOFF_WAL_THROTTLING() and RESUME_WAL_THROTTLING() that bump a counter up and down. When the counter is >0, XLogInsert() doesn't sleep; when RESUME_WAL_THROTTLING() drops the counter to 0, it also considers sleeping. I suspect only a few places would need to do this, like where we're holding one of the SLRU locks.

I don't think there are many locations where this would be ok. Sleeping while holding exclusive buffer locks? Quite possibly inside a critical section? Surely not.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-01-17 09:04:54 -0500, Robert Haas wrote:
>> That having been said, I bet it could be done at the tail of XLogInsert().

> I don't think there are many locations where this would be ok. Sleeping while holding exclusive buffer locks? Quite possibly inside a criticial section?

More or less by definition, you're always doing both when you call XLogInsert.

> Surely not.

I agree. It's got to be somewhere further up the call stack.

I'm inclined to think that what we ought to do is reconceptualize vacuum_delay_point() as something a bit more generic, and sprinkle calls to it in a few more places than now.

It's also interesting to wonder about the relationship to CHECK_FOR_INTERRUPTS --- although I think that currently, we assume that that's *cheap* (1 test and branch) as long as nothing is pending. I don't want to see a bunch of arithmetic added to it.

			regards, tom lane
On 16 January 2014 17:20, Simon Riggs <simon@2ndquadrant.com> wrote:
> Thank you everyone for responding so positively to this proposal. It is something many users have complained about.
>
> In time, I think we might want both WAL Rate Limiting and I/O Rate Limiting. IMHO I/O rate limiting is harder and so this proposal is restricted solely to WAL rate limiting.
>
> I'm comfortable with a session level parameter. I was mulling over a wal_rate_limit_scope parameter but I think that is too much.
>
> I will follow Craig's proposal to define this in terms of MB/s, adding that I would calc from beginning of statement to now, so it is averaged rate. Setting of zero means unlimited. The rate would be per-session (in this release) rather than a globally calculated setting.

Parameter renamed to wal_rate_limit. Not yet modified to produce this in MB/s.

> I would like to call it CHECK_FOR_WAL_RATE_LIMIT()

Used CHECK_WAL_BUDGET() as proposed.

> That seems like a cheap enough call to allow it to be added in more places, so my expanded list of commands touched are
>   CLUSTER/VACUUM FULL
>   ALTER TABLE (only in phase 3 table rewrite)
>   ALTER TABLE SET TABLESPACE
>   CREATE INDEX
> which are already there, plus new ones suggested/implied
>   COPY
>   CREATE TABLE AS SELECT
>   INSERT/UPDATE/DELETE

Added, no problems or technical difficulties encountered.

> Please let me know if I missed something (rather than debating what is or is not a "maintenance" command).
>
> Any other thoughts before I cut v2 ?

v2 attached

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
It occurs to me that this is very similar to the method I proposed in June to enforce a hard limit on WAL usage, to avoid PANIC caused by running out of disk space when writing WAL:

http://www.postgresql.org/message-id/51B095FE.6050106@vmware.com

Enforcing a global limit needs more book-keeping than rate limiting individual sessions. But I'm hoping that the CHECK_WAL_BUDGET() calls could be used for both purposes.

- Heikki
On 01/17/2014 05:20 PM, Simon Riggs wrote:
> +	if (RelationNeedsWAL(indexrel))
> +		CHECK_FOR_WAL_BUDGET();

I don't think we need the RelationNeedsWAL tests. If the relation is not WAL-logged, you won't write much WAL, so you should easily stay under the limit. And CHECK_FOR_WAL_BUDGET() better be cheap when you're below the limit.

- Heikki
On 17 January 2014 16:34, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 01/17/2014 05:20 PM, Simon Riggs wrote:
>> +	if (RelationNeedsWAL(indexrel))
>> +		CHECK_FOR_WAL_BUDGET();
>
> I don't think we need the RelationNeedsWAL tests. If the relation is not WAL-logged, you won't write much WAL, so you should easily stay under the limit. And CHECK_FOR_WAL_BUDGET() better be cheap when you're below the limit.

OK, I'll remove them, thanks.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 17 January 2014 16:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>> On 2014-01-17 09:04:54 -0500, Robert Haas wrote:
>>> That having been said, I bet it could be done at the tail of XLogInsert().
>
>> I don't think there are many locations where this would be ok. Sleeping while holding exclusive buffer locks? Quite possibly inside a criticial section?
>
> More or less by definition, you're always doing both when you call XLogInsert.
>
>> Surely not.
>
> I agree. It's got to be somewhere further up the call stack.

Definitely.

> I'm inclined to think that what we ought to do is reconceptualize vacuum_delay_point() as something a bit more generic, and sprinkle calls to it in a few more places than now.

Agreed; that was the original plan, but implementation delays prevented the whole vision/discussion/implementation. Requirements from various areas include WAL rate limiting for replication, I/O rate limiting, hard CPU and I/O limits for security and mixed workload coexistence.

I'd still like to get something on this in 9.4 that alleviates the replication issues, leaving wider changes for later releases.

The vacuum_* parameters don't allow any control over WAL production, which is often the limiting factor. I could, for example, introduce a new parameter for vacuum_cost_delay that provides a weighting for each new BLCKSZ chunk of WAL, then rename all parameters to a more general form. Or I could forget that and just press ahead with the patch as is, providing a cleaner interface in next release.

> It's also interesting to wonder about the relationship to CHECK_FOR_INTERRUPTS --- although I think that currently, we assume that that's *cheap* (1 test and branch) as long as nothing is pending. I don't want to see a bunch of arithmetic added to it.

Good point. I'll call these new calls CHECK_FOR_RESOURCES() to allow us to implement various kinds of resource checking in future.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 17 January 2014 16:30, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> It occurs to me that this is very similar to the method I proposed in June to enforce a hard limit on WAL usage, to avoid PANIC caused by running out of disk space when writing WAL:
>
> http://www.postgresql.org/message-id/51B095FE.6050106@vmware.com
>
> Enforcing a global limit needs more book-keeping than rate limiting individual sessions. But I'm hoping that the CHECK_WAL_BUDGET() calls could be used for both purposes.

We can't set a useful hard limit on WAL by default without that being a bigger foot gun than what we have now.

If we did have a limit, I would look to make it less exact and take it out of the main path, for example by checkpointer - or perhaps only make the check when we switch xlog files. So I don't want to include that requirement into this current work.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 20, 2014 at 5:37 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Agreed; that was the original plan, but implementation delays prevented the whole vision/discussion/implementation. Requirements from various areas include WAL rate limiting for replication, I/O rate limiting, hard CPU and I/O limits for security and mixed workload coexistence.
>
> I'd still like to get something on this in 9.4 that alleviates the replication issues, leaving wider changes for later releases.

My first reaction was that we should just have a generic I/O resource throttling. I was only convinced this was a reasonable idea by the replication use case. It would help me to understand the specific situations where replication breaks down due to WAL bandwidth starvation. Heroku has had some problems with slaves falling behind, though the immediate problem that causes is the slave filling up disk, which we could solve more directly by switching to archive mode rather than slowing down the master.

But I would suggest you focus on a specific use case that's problematic so we can judge better if the implementation is really fixing it.

> The vacuum_* parameters don't allow any control over WAL production, which is often the limiting factor. I could, for example, introduce a new parameter for vacuum_cost_delay that provides a weighting for each new BLCKSZ chunk of WAL, then rename all parameters to a more general form. Or I could forget that and just press ahead with the patch as is, providing a cleaner interface in next release.
>
>> It's also interesting to wonder about the relationship to CHECK_FOR_INTERRUPTS --- although I think that currently, we assume that that's *cheap* (1 test and branch) as long as nothing is pending. I don't want to see a bunch of arithmetic added to it.
>
> Good point.

I think it should be possible to actually merge it into CHECK_FOR_INTERRUPTS. Have a single global flag io_done_since_check_for_interrupts which is set to 0 after each CHECK_FOR_INTERRUPTS and set to 1 whenever any wal is written. Then CHECK_FOR_INTERRUPTS turns into two tests and branches instead of one in the normal case.

In fact you could do all the arithmetic when you do the wal write. Only set the flag if the bandwidth consumed is above the budget. Then the flag should only ever be set when you're about to sleep.

I would dearly love to see generic I/O bandwidth limits so it would be nice to see a nicely general pattern here that could be extended even if we only target wal this release.

I'm going to read the existing patch now, do you think it's ready to go or did you want to do more work based on the feedback?
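[Editor's note: a sketch of how the flag-based variant described above might look. The flag follows Greg's io_done_since_check_for_interrupts idea, renamed here to reflect that it would only be set once the budget is exceeded, and ProcessWALThrottle() is a hypothetical stand-in for the sleep logic; InterruptPending and ProcessInterrupts() are the existing interrupt-handling hooks. This is not what the macro in miscadmin.h actually looks like.]

    /* Hypothetical sketch of folding WAL throttling into CHECK_FOR_INTERRUPTS().
     * The flag is only set at WAL-write time, and only when the session is over
     * its budget, so the common path stays at two cheap tests and branches. */
    #include <stdbool.h>

    extern volatile bool InterruptPending;      /* existing, from miscadmin.h */
    extern void ProcessInterrupts(void);        /* existing, from miscadmin.h */

    static volatile bool wal_over_budget = false;

    /* placeholder: would sleep long enough to drop back under the WAL budget */
    static void
    ProcessWALThrottle(void)
    {
        wal_over_budget = false;
    }

    #define CHECK_FOR_INTERRUPTS() \
        do { \
            if (InterruptPending) \
                ProcessInterrupts(); \
            if (wal_over_budget) \
                ProcessWALThrottle(); \
        } while (0)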
On Wed, Feb 19, 2014 at 8:28 AM, Greg Stark <stark@mit.edu> wrote:
> On Mon, Jan 20, 2014 at 5:37 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Agreed; that was the original plan, but implementation delays prevented the whole vision/discussion/implementation. Requirements from various areas include WAL rate limiting for replication, I/O rate limiting, hard CPU and I/O limits for security and mixed workload coexistence.
>>
>> I'd still like to get something on this in 9.4 that alleviates the replication issues, leaving wider changes for later releases.
>
> My first reaction was that we should just have a generic I/O resource throttling. I was only convinced this was a reasonable idea by the replication use case. It would help me to understand the specific situations where replication breaks down due to WAL bandwidth starvation. Heroku has had some problems with slaves falling behind though the immediate problems that causes is the slave filling up disk which we could solve more directly by switching to archive mode rather than slowing down the master.
>
> But I would suggest you focus on a specific use case that's problematic so we can judge better if the implementation is really fixing it.
>
>> The vacuum_* parameters don't allow any control over WAL production, which is often the limiting factor. I could, for example, introduce a new parameter for vacuum_cost_delay that provides a weighting for each new BLCKSZ chunk of WAL, then rename all parameters to a more general form. Or I could forget that and just press ahead with the patch as is, providing a cleaner interface in next release.
>>
>>> It's also interesting to wonder about the relationship to CHECK_FOR_INTERRUPTS --- although I think that currently, we assume that that's *cheap* (1 test and branch) as long as nothing is pending. I don't want to see a bunch of arithmetic added to it.
>>
>> Good point.
>
> I think it should be possible to actually merge it into CHECK_FOR_INTERRUPTS. Have a single global flag io_done_since_check_for_interrupts which is set to 0 after each CHECK_FOR_INTERRUPTS and set to 1 whenever any wal is written. Then CHECK_FOR_INTERRUPTS turns into two tests and branches instead of one in the normal case.
>
> In fact you could do all the arithmetic when you do the wal write. Only set the flag if the bandwidth consumed is above the budget. Then the flag should only ever be set when you're about to sleep.
>
> I would dearly love to see a generic I/O bandwidth limits so it would be nice to see a nicely general pattern here that could be extended even if we only target wal this release.
>
> I'm going to read the existing patch now, do you think it's ready to go or did you want to do more work based on the feedback?

Well, *I* don't think this is ready to go. A WAL rate limit that only limits WAL sometimes still doesn't impress me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 19 February 2014 16:04, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, *I* don't think this is ready to go. A WAL rate limit that only limits WAL sometimes still doesn't impress me.

Could you be specific in your criticism? "Sometimes" wouldn't impress anybody, but we need to understand whether you are referring to a bug or just making a personal assessment of the patch features. Thanks.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Feb 20, 2014 at 1:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 19 February 2014 16:04, Robert Haas <robertmhaas@gmail.com> wrote:
>> Well, *I* don't think this is ready to go. A WAL rate limit that only limits WAL sometimes still doesn't impress me.
>
> Could you be specific in your criticism? "Sometimes" wouldn't impress anybody, but we need to understand whether you are referring to a bug or just making a personal assessment of the patch features. Thanks.

I think it's clear what the question here is. The patch represents a quick easy win for certain cases but isn't the nice broad solution we would prefer to have. We face this decision a lot and it's a tough one. On the one hand we want to set a high standard for ourselves and not load up the Postgres source with lots of half-measures; on the other hand we don't want to let "perfect be the enemy of the good" and have development stall while we refuse easy wins.

I don't think it would be a terrible thing if this patch went in. And I also don't think it would be a huge loss if it didn't, since I think we will need IO resource management down the road anyways. I was looking for a few non-controversial patches to work on at least at the moment to establish some history before I take on more controversial patches, so I don't think committing this is really a great idea for me right now.

If you wanted to make this a more general system I have some ideas and I think we could do it fairly easily. We could make CHECK_FOR_INTERRUPTS have a single flag but if it's set check the "interrupt type" and know that some types can be handled by simply sleeping and not throwing an error. There are already some cases that are handled specially so it wouldn't really even change the code much. I would be happy to help get something like that committed.

--
greg
On 19 February 2014 13:28, Greg Stark <stark@mit.edu> wrote:
> On Mon, Jan 20, 2014 at 5:37 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Agreed; that was the original plan, but implementation delays prevented the whole vision/discussion/implementation. Requirements from various areas include WAL rate limiting for replication, I/O rate limiting, hard CPU and I/O limits for security and mixed workload coexistence.
>>
>> I'd still like to get something on this in 9.4 that alleviates the replication issues, leaving wider changes for later releases.
>
> My first reaction was that we should just have a generic I/O resource throttling. I was only convinced this was a reasonable idea by the replication use case. It would help me to understand the specific situations where replication breaks down due to WAL bandwidth starvation.

The various bulk operations mentioned in the OP can dump GBs of data into WAL. That causes replication to immediately fall behind, which slows everything down if you are using synchronous replication. This has been discussed before on list over last few years (and Heroku staff have been involved in that, fyi).

General I/O limiting is a nice-to-have but what is suggested here is an essential aspect of running bulk commands when using replication, especially synchronous replication. The use cases for I/O limiting and replication limiting are different in both their effects and their criticality.

The design choice of making the limit only apply to bulk ops is because that is where the main problem lies. Rate limiting will cause a loss of performance in the main line for non-bulk operations, so adding tests there will not be valuable.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Feb 20, 2014 at 2:43 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The design choice of making the limit only apply to bulk ops is because that is where the main problem lies. Rate limiting will cause a loss of performance in the main line for non-bulk operations, so adding tests there will not be valuable.

This isn't a bad point. If you do a massive UPDATE or INSERT you probably do want the ability to limit the damage to your replication lag, but if you're an OLTP worker and you just happen to handle a disproportionate amount of the incoming requests you wouldn't. But then presumably you would just set that GUC that way where your OLTP workers would run without any limits and your batch jobs would set a limit.

As I said it wouldn't be terrible if the patch went in and could only handle cluster/vacuum/create index. They're by far the most common causes of problems and most people who do massive UPDATES or INSERTS quickly discover they want to do it in batches of a few thousand rows anyways so they can limit the speed themselves.

There's a decent argument to be made for incremental changes too. If we add a feature like this for these operations then we'll find out how well it works in practice and whether users need finer control or what other sources become the next problem before we've invested a whole lot in a general solution.

--
greg
On Thu, Feb 20, 2014 at 9:43 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The design choice of making the limit only apply to bulk ops is because that is where the main problem lies. Rate limiting will cause a loss of performance in the main line for non-bulk operations, so adding tests there will not be valuable.

That's pure sophistry. Of course rate limiting will cause "a loss of performance in the main line for non-bulk operations". Rate limiting will also cause a loss of performance for bulk operations. The whole point of rate limiting is to cause a loss of performance. That's not a bug; that's what the feature is explicitly designed to do.

So-called "non-bulk operations", which seems to be defined as more or less "anything where actually making this feature work seemed hard", can include huge inserts, updates, or deletes that greatly depress performance for other parts of the system, either individually or in aggregate. And it's just as legitimate to want to tamp down that activity as it is to want to slow down CLUSTER. Remember, this is a GUC, so it need not be set identically for every user session. It is not hard at all to imagine a situation where the user wishes to limit the rate at which some sessions can write WAL so as to avoid interfering with other, higher-priority tasks happening in other sessions. That is hardly a niche use case; I think I've seen it reported, if anything, even more frequently than problems with what you're calling bulk operations.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Feb 20, 2014 at 10:10 AM, Greg Stark <stark@mit.edu> wrote:
> On Thu, Feb 20, 2014 at 2:43 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The design choice of making the limit only apply to bulk ops is because that is where the main problem lies. Rate limiting will cause a loss of performance in the main line for non-bulk operations, so adding tests there will not be valuable.
>
> This isn't a bad point. If you do a massive UPDATE or INSERT you probably do want the ability to limit the damage to your replication lag but if you're an OLTP worker and you just happen to handle a disproportionate amount of the incoming requests you wouldn't. But then presumably you would just set that GUC that way where your OLTP workers would run without any limits and your batch jobs would set a limit.
>
> As I said it wouldn't be terrible if the patch went in and could only handle cluster/vacuum/create index. They're by far the most common causes of problems and most people who do massive UPDATES or INSERTS quickly discover they want to do it in batches of a few thousand rows anyways so they can limit the speed themselves.
>
> There's a decent argument to be made for incremental changes too. If we add a feature like this for these operations then we'll find out how well it works in practice and whether users need finer control or what other sources become the next problem before we've invested a whole lot in a general solution.

What will more likely happen is that we won't be able to change the definition in future releases because it breaks backward compatibility. If we implement this now for only a subset of commands, and then later someone proposes a patch to expand it to more commands, then there's going to be someone who shows up and complains about that definitional change. And for good reason.

I'm heartily in favor of incremental development of features, but each incremental addition needs to stand on its own two feet, not be something that we can already foresee needing to break or throw out later. There are way, way too many PostgreSQL users now for us to be able to get away with that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 02/20/2014 04:16 PM, Robert Haas wrote:
> On Thu, Feb 20, 2014 at 9:43 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The design choice of making the limit only apply to bulk ops is because that is where the main problem lies. Rate limiting will cause a loss of performance in the main line for non-bulk operations, so adding tests there will not be valuable.
>
> That's pure sophistry. Of course rate limiting will cause "a loss of performance in the main line for non-bulk operations".

I think he meant that *adding code for* rate limiting there will cause loss of performance even if not used.

> Rate limiting will also cause a loss of performance for bulk operations. The whole point of rate limiting is to cause a loss of performance.

Not when it is not enabled, it should not.

> That's not a bug; that's what the feature is explicitly designed to do.
>
> So-called "non-bulk operations", which seems to be defined as more or less "anything where actually making this feature work seemed hard",

NO, it is anything that a decent DBA could not script to run in chunks, which he would do anyway for other reasons, like not to lock out OLTP.

> can include huge inserts, updates, or deletes that greatly depress performance for other parts of the system, either individually or in aggregate. And it's just as legitimate to want to tamp down that activity as it is to want to slow down CLUSTER. Remember, this is a GUC, so it need not be set identically for every user session. It is not hard at all to imagine a situation where the user wishes to limit the rate at which some sessions can write WAL so as to avoid interfering with other, higher-priority tasks happening in other sessions.

It is hard to imagine, but it is still much more infrequent than needing to do concurrent ops like VACUUM and CREATE INDEX CONCURRENTLY.

> That is hardly a niche use case; I think I've seen it reported, if anything, even more frequently than problems with what you're calling bulk operations.

Could be, but they are two separate cases. One helps the DBA get stuff done without stepping on users' toes, the other is about protecting one user from the other.

It is arguable that the first is a sub-case of the other, but I don't think these should be controlled by the same GUC, though you may want to have some checking for sane values between the two.

Maybe call this one `maintenance_wal_rate_limit_delay` the same way as we have `maintenance_work_mem`.

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
Robert Haas wrote:
> On Thu, Feb 20, 2014 at 9:43 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The design choice of making the limit only apply to bulk ops is because that is where the main problem lies. Rate limiting will cause a loss of performance in the main line for non-bulk operations, so adding tests there will not be valuable.
>
> That's pure sophistry. Of course rate limiting will cause "a loss of performance in the main line for non-bulk operations". Rate limiting will also cause a loss of performance for bulk operations. The whole point of rate limiting is to cause a loss of performance.

I think he's saying that it will degrade performance even in the case where the feature is disabled, which would be undesirable. I don't necessarily agree with that assertion, but at least it's something to consider rather than to laugh at.

> And it's just as legitimate to want to tamp down that activity as it is to want to slow down CLUSTER. Remember, this is a GUC, so it need not be set identically for every user session. It is not hard at all to imagine a situation where the user wishes to limit the rate at which some sessions can write WAL so as to avoid interfering with other, higher-priority tasks happening in other sessions.

I don't disagree with this either, but I don't think it necessarily has to be the same feature. Most likely we don't want to have users setting the GUC each time they want to run vacuum; rather they will want to be able to set one line in postgresql.conf and have it affect all vacuuming activity. However, if that same config tweak affects all their UPDATE statements regardless of them being single-row updates or bulk updates, I'm pretty sure it's not going to be welcome.

I think "bulk" (maintenance) operations should legitimately be configured separately from regular UPDATE etc operations. For the latter I think it's pretty reasonable to assume that users are going to want to tweak the GUC for each individual callsite in the application, rather than wholesale in postgresql.conf.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Feb 20, 2014 at 12:07 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I think "bulk" (maintenance) operations should legitimately be configured separately from regular UPDATE etc operations. For the latter I think it's pretty reasonable to assume that users are going to want to tweak the GUC for each individual callsite in the application, rather than wholesale in postgresql.conf.

There's an argument to be made for that design, but the discussion upthread doesn't support the contention that the patch makes a clean and well-defined division between those things. The initial patch left VACUUM out on the dubious theory that since we have an existing system to limit its throughput by pages dirtied we don't need a way to limit the rate at which it generates WAL, and left as an open question whether this ought to apply to COPY and CREATE TABLE AS SELECT (but apparently not UPDATE or DELETE, or for that matter just plain SELECT, when it does a lot of pruning). A subsequent version of the patch changed the list of commands affected, but it's not at all clear to me that we have something as tidy as "only commands where X and never commands where Y".

More broadly, three committers (myself, Tom, Heikki) expressed the desire for this to be made into a more generic mechanism, and two of those people (myself and Tom) said that this was too late for 9.4 and should be postponed to 9.5. Then a month went by and Greg Stark showed up and made noises about pushing this forward as-is. So I don't want it to be forgotten that those objections were made and have not been withdrawn.

I'm not dead set against changing my position on this patch, but that should happen because of work to improve the patch - which was last posted on January 17th and did not at that time even include the requested and agreed fix to measure the limit in MB/s rather than some odd unit - not because of the simple passage of time. If anything, the passage of 5 weeks without meaningful progress ought to strengthen rather than weaken the argument that this is far too late for this release.

Of course, if we're talking about 9.5, then disregard the previous paragraph. But in that case perhaps we could postpone this until we don't have 40+ patches needing review and a dozen marked ready for committer.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
It's clear now that if what Greg wants is an uncontroversial patch to commit, this is not it.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 1/16/14, 9:19 PM, Craig Ringer wrote:
> On 01/16/2014 11:52 PM, Tom Lane wrote:
>> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>>> On 01/16/2014 05:39 PM, Andres Freund wrote:
>>>> Do you see a reasonable way to implement this generically for all commands? I don't.
>>
>>> In suitable safe places, check if you've written too much WAL, and sleep if so. Call it CHECK_FOR_WAL_BUDGET(), along the lines of CHECK_FOR_INTERRUPTS(), but called less frequently. Or maybe vacuum_delay_point() is a better analogy. Hopefully you don't need to sprinkle them in too many places to be useful.
>>
>> We probably don't really need to implement it for "all" commands; I think everyone can agree that, say, ALTER TABLE RENAME COLUMN isn't going to emit enough WAL to really matter. My point was that we should try to hit everything that potentially *could* generate lots of WAL, rather than arbitrarily deciding that some are utility commands and some are not.
>
> Then you land up needing a configuration mechanism to control *which* commands get affected, too, to handle the original use case of "I want to rate limit vacuum and index rebuilds, while everything else runs full tilt".
>
> Or do you propose to just do this per-session? If so, what about autovacuum?

Tom was proposing per-session. For autovac, don't we already have some method of changing the GUCs it uses? If not, we should...

--
Jim C. Nasby, Data Architect                  jim@nasby.net
512.569.9461 (cell)                           http://jim.nasby.net