Thread: FSM search modes
Just been looking again at the way FSM works. In fsm_search_avail() we essentially have just a single way of working out how to search the tree.

Seems like it would be good to abstract this so that we can implement a number of FSM search strategies:

* (current) randomize - page selection encourages different backends to access different blocks, thus reducing block contention

* cluster - page selection is based on selecting the block with freespace nearest the current block; we prefer keep-in-sequence to avoid-contention

* compact - page selection specifically attempts to find the lowest numbered blocks, so that the table will naturally shrink over time.

These are not all mutually exclusive; suggested combinations would be

  randomize, randomize | cluster, randomize | compact

so we don't give up the load-spreading behaviour, we just apply the logic at lower levels of the tree only.

VACUUM could set the FSM into FSMstrategy = compact when it notices that most of the free blocks are at the lower end of the table. Or it could be set explicitly during the VACUUM FULL replacement utility.

FSMstrategy = cluster would be the default if clustering is enabled on a table.

FSMstrategy can be changed via ALTER TABLE ... WITH (fsm_strategy = ...)

-- 
Simon Riggs www.2ndQuadrant.com
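A minimal sketch of what the strategy flags might look like, purely for illustration -- none of these identifiers exist in freespace.c today; they are invented here to make the proposal concrete:

    /*
     * Hypothetical sketch of the proposed FSM search strategies.
     * None of these names exist in the current sources.
     */
    typedef int FSMStrategy;

    #define FSM_RANDOMIZE  0x01  /* spread backends out to reduce block contention */
    #define FSM_CLUSTER    0x02  /* prefer blocks near the current block */
    #define FSM_COMPACT    0x04  /* prefer the lowest-numbered blocks */

    /*
     * Suggested combinations from the proposal:
     *   FSM_RANDOMIZE                  (current behaviour)
     *   FSM_RANDOMIZE | FSM_CLUSTER
     *   FSM_RANDOMIZE | FSM_COMPACT
     */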
Simon Riggs <simon@2ndQuadrant.com> wrote:

> * compact - page selection specifically attempts to find the lowest
> numbered blocks, so that the table will naturally shrink over time.

We cannot shrink the table if one tuple remains at the end of the table and the tuple is always HOT-updated, because we put a new tuple into the same page where the old tuple is placed, if possible.

In addition to your intelligent FSM search modes, do we need another algorithm to make the compaction work better?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
On Fri, 2009-09-18 at 10:47 +0900, Itagaki Takahiro wrote:

> Simon Riggs <simon@2ndQuadrant.com> wrote:
>
> > * compact - page selection specifically attempts to find the lowest
> > numbered blocks, so that the table will naturally shrink over time.
>
> We cannot shrink the table if one tuple remains at the end of the table
> and the tuple is always HOT-updated, because we put a new tuple into
> the same page where the old tuple is placed, if possible.
>
> In addition to your intelligent FSM search modes,
> do we need another algorithm to make the compaction work better?

Perhaps we can have an additional piece of information about a table, something like target_size, so that normal updaters that attempt HOT updates on blocks greater than target_size would know to avoid that block and to seek a new location for the row lower in the table. Perhaps such information could be reset and then sent via invalidation mechanisms.

I'm thinking along the lines of a "fire alarm": an occasional mechanism by which we can inform users of the need to evacuate certain blocks.

-- 
Simon Riggs www.2ndQuadrant.com
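To make the idea concrete, a rough sketch of where such a check might sit in heap_update() -- target_size and RelationGetTargetSize() are invented here; BufferGetBlockNumber() and InvalidBlockNumber are real, and use_hot_update is (I believe) the existing local variable, but the whole placement is an assumption:

    /*
     * Hypothetical sketch: if the old tuple lives beyond the table's
     * advertised target size, decline the HOT update so that the new
     * version is placed lower in the table via the FSM.
     * RelationGetTargetSize() does not exist; it stands in for however
     * the "fire alarm" value would be delivered (e.g. by relcache
     * invalidation).
     */
    BlockNumber target_size = RelationGetTargetSize(relation);

    if (target_size != InvalidBlockNumber &&
        BufferGetBlockNumber(buffer) >= target_size)
        use_hot_update = false;   /* evacuate: seek space elsewhere */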
On Sep 18, 2009, at 1:09 AM, Simon Riggs wrote:

> On Fri, 2009-09-18 at 10:47 +0900, Itagaki Takahiro wrote:
>> Simon Riggs <simon@2ndQuadrant.com> wrote:
>>
>>> * compact - page selection specifically attempts to find the lowest
>>> numbered blocks, so that the table will naturally shrink over time.
>>
>> We cannot shrink the table if one tuple remains at the end of the table
>> and the tuple is always HOT-updated, because we put a new tuple into
>> the same page where the old tuple is placed, if possible.
>>
>> In addition to your intelligent FSM search modes,
>> do we need another algorithm to make the compaction work better?
>
> Perhaps we can have an additional piece of information about a table,
> something like target_size, so that normal updaters that attempt HOT
> updates on blocks greater than target_size would know to avoid that
> block and to seek a new location for the row lower in the table.
> Perhaps such information could be reset and then sent via invalidation
> mechanisms.
>
> I'm thinking along the lines of a "fire alarm": an occasional mechanism
> by which we can inform users of the need to evacuate certain blocks.

It might be better to not beat around the bush and provide a vacuum mode that explicitly tries to free the last X percent of the table. That's especially handy for a very large table, because you probably don't want to be forced into scanning the whole thing in vacuum just to free some space at the end. This mode could also be more aggressive about trying to acquire the lock that's needed to trim the file on disk.

That said, *any* step that improves dealing with table bloat is extremely welcome, as right now you're basically stuck rebuilding the table.

-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
On Thu, 2009-09-17 at 16:26 +0100, Simon Riggs wrote:

> Just been looking again at the way FSM works. In fsm_search_avail() we
> essentially have just a single way of working out how to search the
> tree.
>
> Seems like it would be good to abstract this so that we can implement a
> number of FSM search strategies:
>
> * (current) randomize - page selection encourages different backends to
> access different blocks, thus reducing block contention
>
> * cluster - page selection is based on selecting the block with
> freespace nearest the current block; we prefer keep-in-sequence to
> avoid-contention

Is the information about the "current block" available to the FSM search API, or do we need to change the API for that?

> * compact - page selection specifically attempts to find the lowest
> numbered blocks, so that the table will naturally shrink over time.
>
> These are not all mutually exclusive; suggested combinations would be
>
>   randomize, randomize | cluster, randomize | compact
>
> so we don't give up the load-spreading behaviour, we just apply the
> logic at lower levels of the tree only.
>
> VACUUM could set the FSM into FSMstrategy = compact when it notices that
> most of the free blocks are at the lower end of the table. Or it could
> be set explicitly during the VACUUM FULL replacement utility.
>
> FSMstrategy = cluster would be the default if clustering is enabled on a
> table.
>
> FSMstrategy can be changed via ALTER TABLE ... WITH (fsm_strategy = ...)
>
> --
> Simon Riggs www.2ndQuadrant.com

-- 
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability Services, Consulting and Training
On Fri, 2009-09-18 at 15:43 +0300, Hannu Krosing wrote:

> > * cluster - page selection is based on selecting the block with
> > freespace nearest the current block; we prefer keep-in-sequence to
> > avoid-contention
>
> Is the information about the "current block" available to the FSM search
> API, or do we need to change the API for that?

Need to change the API, but it's a small code impact.

-- 
Simon Riggs www.2ndQuadrant.com
decibel <decibel@decibel.org> wrote:

> *any* step that improves dealing with table bloat is extremely
> welcome, as right now you're basically stuck rebuilding the table.

+1

Although, possibly more irritating than actually rebuilding it is evaluating borderline bloat situations to determine if they will "work themselves out" over time or whether you need to bite the bullet to do aggressive maintenance. Having some way for routine vacuums (or any other routine process, currently available or that could be scheduled) to "nibble away" at moderate bloat without significant performance or usability impact would make life easier for at least *some* DBAs.

-Kevin
On Sep 30, 2009, at 5:13 PM, Kevin Grittner wrote:

> decibel <decibel@decibel.org> wrote:
>
>> *any* step that improves dealing with table bloat is extremely
>> welcome, as right now you're basically stuck rebuilding the table.
>
> +1
>
> Although, possibly more irritating than actually rebuilding it is
> evaluating borderline bloat situations to determine if they will "work
> themselves out" over time or whether you need to bite the bullet to do
> aggressive maintenance. Having some way for routine vacuums (or any
> other routine process, currently available or that could be scheduled)
> to "nibble away" at moderate bloat without significant performance or
> usability impact would make life easier for at least *some* DBAs.

Kevin, do you have tools that allow you to clear out the end of a table? That part is at least mostly possible from userland (get list of ctids from end of table, update those records to move them, rinse, repeat), but even if you do all that there's no guarantee that a vacuum will get the exclusive lock it needs to truncate the table.

So while something that makes it easier to clean out the end of a table would be good, I think the critical need is a way to make vacuum more aggressive about obtaining the exclusive lock.

-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
decibel wrote:

> So while something that makes it easier to clean out the end of a
> table would be good, I think the critical need is a way to make
> vacuum more aggressive about obtaining the exclusive lock.

I wonder if we should have a different mode of operation that only attempted the truncate (say VACUUM TRUNCATE), optionally being non-conditional about obtaining the required lock. That said, I wonder even more whether any such hacks are still needed after the visibility map that changed the landscape for vacuum so dramatically.

-- 
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:

> I wonder if we should have a different mode of operation that only
> attempted the truncate (say VACUUM TRUNCATE), optionally being
> non-conditional about obtaining the required lock. That said, I wonder
> even more whether any such hacks are still needed after the visibility
> map that changed the landscape for vacuum so dramatically.

Yeah. The one thing in this thread that seemed like a good idea to me was to bias the FSM code a little bit towards returning space near the start of the relation, rather than its current behavior of treating all the free space equally. The current arrangement pretty much guarantees that you'll never recover from a bloating episode without taking special manual action. (This is not new to 8.4; the previous FSM code behaved the same.)

The analogy in the back of my mind is the btree code that decides whether to split the current page or move to the next page when it has a full page and a new key that could go to either page. We found out that randomizing that choice made a *huge* improvement in the average behavior, even with the probabilities set up as 99-to-1. I'm thinking that just a small chance of resetting the search to the start of the relation might do the trick for FSM.

			regards, tom lane
On Thu, 2009-10-01 at 12:05 -0400, Tom Lane wrote:

> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I wonder if we should have a different mode of operation that only
> > attempted the truncate (say VACUUM TRUNCATE), optionally being
> > non-conditional about obtaining the required lock. That said, I wonder
> > even more whether any such hacks are still needed after the visibility
> > map that changed the landscape for vacuum so dramatically.
>
> Yeah. The one thing in this thread that seemed like a good idea to me
> was to bias the FSM code a little bit towards returning space near the
> start of the relation, rather than its current behavior of treating all
> the free space equally. The current arrangement pretty much guarantees
> that you'll never recover from a bloating episode without taking special
> manual action. (This is not new to 8.4; the previous FSM code behaved
> the same.)
>
> The analogy in the back of my mind is the btree code that decides
> whether to split the current page or move to the next page when it has a
> full page and a new key that could go to either page. We found out that
> randomizing that choice made a *huge* improvement in the average
> behavior, even with the probabilities set up as 99-to-1. I'm thinking
> that just a small chance of resetting the search to the start of the
> relation might do the trick for FSM.

No real need to be random, is there? In the bloated space scenario, VACUUM will be triggered but will be unable to remove the empty blocks. So in that case VACUUM can hint the FSM to perform "start from beginning of relation" behaviour. When a searcher does that and can't usefully find space quickly, then we can reset the hint back to normal. So it's an automated mode change in both directions.

I think we can use the N-to-1 idea to define what we mean by "quickly", but that should be in proportion to exactly how many free blocks there were in the table, not a fixed N. It would be a shame to make this work poorly for very large tables.

It would be more useful to think of this as "look for huge chunks of space and fill them" rather than "start at beginning", since the space won't always be right at the start.

-- 
Simon Riggs www.2ndQuadrant.com
Simon Riggs <simon@2ndQuadrant.com> writes:

> No real need to be random, is there? In the bloated space scenario,
> VACUUM will be triggered but will be unable to remove the empty blocks.
> So in that case VACUUM can hint the FSM to perform "start from beginning
> of relation" behaviour.

No, that's an entirely unrelated issue. My point is that there won't *be* any empty end blocks unless we discourage FSM from handing out those pages as insertion targets.

			regards, tom lane
Simon Riggs <simon@2ndQuadrant.com> wrote:

> It would be more useful to think of this as "look for huge chunks of
> space and fill them" rather than "start at beginning", since the space
> won't always be right at the start.

Either I misunderstand you or I disagree. If there's a huge chunk of space near the end, and many many smaller spaces spread throughout, what I'd like is for rows to be placed in those small ones. This would minimize the number of pages to read for queries, and would present some possibility that the rows past the huge chunk might eventually be deleted or non-HOT updated, allowing the bloat to eventually be cleaned up with minimal pain.

-Kevin
On Thu, 2009-10-01 at 11:32 -0500, Kevin Grittner wrote:

> Either I misunderstand you or I disagree.

That does seem to be a common stance, though I will read on. :-)

-- 
Simon Riggs www.2ndQuadrant.com
On Thu, 2009-10-01 at 11:32 -0500, Kevin Grittner wrote:

> If there's a huge chunk of
> space near the end, and many many smaller spaces spread throughout,
> what I'd like is for rows to be placed in those small ones. This
> would minimize the number of pages to read for queries, and would
> present some possibility that the rows past the huge chunk might
> eventually be deleted or non-HOT updated, allowing the bloat to
> eventually be cleaned up with minimal pain.

Yes, as Tom points out, this must be done with bias away from the very end of the table.

I meant that we should start from the beginning of large spaces and that we shouldn't assume that all space worth filling is at the start of the relation.

-- 
Simon Riggs www.2ndQuadrant.com
Simon Riggs <simon@2ndQuadrant.com> writes:

> Yes, as Tom points out, this must be done with bias away from the very
> end of the table.
>
> I meant that we should start from the beginning of large spaces and that
> we shouldn't assume that all space worth filling is at the start of the
> relation.

Right. One of the goals that FSM is trying to meet is ensuring that different backends aren't trying to insert into the same page concurrently, so as to reduce page-level contention. So for example it'd be a bad idea to adopt the really-dumb strategy of searching from the start of the table for each request --- it'd funnel all the backends to the same target page. What we want is a behavior that is randomized but tends to prefer pages towards the start rather than hitting all free pages equally.

The btree precedent makes me suspect that quite a weak preference would be sufficient. So for example we might try resetting the search to the start of the relation with probability 0.01.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Yes, as Tom points out, this must be done with bias away from the
>> very end of the table.
>
>> I meant that we should start from the beginning of large spaces and
>> that we shouldn't assume that all space worth filling is at the start
>> of the relation.

OK, so I did misunderstand you; we agree after all. :-)

> So for example we might try resetting the search to the start of the
> relation with probability 0.01.

If I understand the heuristic you propose, and my math skills haven't eroded too badly from lack of use, every 229 spots considered would cause a 90% chance of reset. That means that the odds of getting past 50,000 spots (the number of pages with available free space at which I generally start to get worried) without resetting are about 1 in 10^218 -- which is a risk I'm willing to accept. ;-)

-Kevin
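(For reference, the arithmetic behind those numbers checks out, with reset probability p = 0.01 per spot considered:)

  \[ (1 - p)^{229} = 0.99^{229} \approx 0.10, \qquad
     0.99^{50000} = e^{50000 \ln 0.99} \approx e^{-502.5} \approx 10^{-218} \]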
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So for example we might try resetting the search to the start of the >> relation with probability 0.01. > If I understand the heuristic you propose, and my math skill haven't > eroded too badly from lack of use, every 229 spots considered would > cause a 90% chance of reset. Sorry, I wasn't clear. What I was thinking of was that we'd consider resetting the search position once, upon entry to fsm_search, and then search normally thereafter. Some experimentation would be needed to choose the right probability of course. A number like 0.01 might seem too small to affect the behavior at all, but that's what we thought about the btree case too. A very light thumb upon the scales may be sufficient. regards, tom lane
Kevin Grittner wrote:

> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>> Yes, as Tom points out, this must be done with bias away from the
>>> very end of the table.
>>> I meant that we should start from the beginning of large spaces and
>>> that we shouldn't assume that all space worth filling is at the start
>>> of the relation.
>
> OK, so I did misunderstand you; we agree after all. :-)
>
>> So for example we might try resetting the search to the start of the
>> relation with probability 0.01.
>
> If I understand the heuristic you propose, and my math skills haven't
> eroded too badly from lack of use, every 229 spots considered would
> cause a 90% chance of reset. That means that the odds of getting past
> 50,000 spots (the number of pages with available free space at which I
> generally start to get worried) without resetting are about 1 in 10^218
> -- which is a risk I'm willing to accept. ;-)

The FSM currently uses a clock sweep kind of algorithm to hand out pages, and the clock hand is reset to 0 at every vacuum. The purpose of resetting the clock hand is precisely to bias towards lower-numbered pages, to allow truncation later on.

That's a bit of an over-simplification, though; there's actually a separate "clock hand" on each FSM page (the next_slot pointer).

We probably could do with more bias. For example, we still prefer pages close to the page we last inserted to, by searching for free space in the same FSM page first, before starting the search from the top of the tree. And of course, each backend tries first to insert to the last page it inserted to before even consulting the FSM. A mode to disable those behaviors might indeed be useful, along with the random reset.

-- 
Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
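For readers following along in the source, a condensed view of that per-page clock hand, paraphrased from memory of fsm_search_avail() in src/backend/storage/freespace/fsmpage.c -- this is not a verbatim excerpt, and the details should be checked against the actual code:

    /* Condensed paraphrase of fsm_search_avail(); details omitted. */
    int     target = fsmpage->fp_next_slot;   /* per-page "clock hand" */

    if (target < 0 || target >= LeafNodesPerPage)
        target = 0;                 /* hand was reset, e.g. by VACUUM */

    /*
     * ... descend the binary tree looking for a slot with at least
     * minvalue free space, starting the sweep near target ...
     */

    if (advancenext)
        fsmpage->fp_next_slot = slot + 1;  /* advance past the chosen slot */
    else
        fsmpage->fp_next_slot = slot;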
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> Kevin Grittner wrote:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> So for example we might try resetting the search to the start of the
>>> relation with probability 0.01.
>>
>> If I understand the heuristic you propose, and my math skills
>> haven't eroded too badly from lack of use, every 229 spots
>> considered would cause a 90% chance of reset. That means that the
>> odds of getting past 50,000 spots (the number of pages with
>> available free space at which I generally start to get worried)
>> without resetting are about 1 in 10^218 -- which is a risk I'm
>> willing to accept. ;-)
>
> The FSM currently uses a clock sweep kind of algorithm to hand out
> pages, and the clock hand is reset to 0 at every vacuum. The purpose
> of resetting the clock hand is precisely to bias towards
> lower-numbered pages, to allow truncation later on.
>
> That's a bit of an over-simplification, though; there's actually a
> separate "clock hand" on each FSM page (the next_slot pointer).
>
> We probably could do with more bias. For example, we still prefer
> pages close to the page we last inserted to, by searching for free
> space in the same FSM page first, before starting the search from
> the top of the tree. And of course, each backend tries first to
> insert to the last page it inserted to before even consulting the
> FSM. A mode to disable those behaviors might indeed be useful, along
> with the random reset.

That fleshes out the descriptions given by Tom, but doesn't really make me think I misunderstood the heuristic or miscalculated the odds of getting far from the start with a 1% probability of a reset on each advance of the sweep. (Of course, it's possible I'm being dense and taking statements to mean what I expect, despite evidence to the contrary.)

The way I figure it, if there is a 0.01 chance to reset the sweep, then there's a 0.99 percent chance to continue the sweep from the last position. 0.99^229 is about 0.1, which means there is a 10% chance not to have reset after that many attempts to advance. Every 229 attempts after that multiplies by 0.1, too. So, after 229*6 attempts (for example) the odds of not having reset are about one in a million.

Am I saying something stupid here? (I used to be good at these sorts of calculations....)

-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > 0.99 percent chance to continue the sweep Make that a 99% chance, or a 0.99 chance (in case the typo was not apparent). > Am I saying something stupid here? Well, besides that line? -Kevin
On Thu, Oct 1, 2009 at 3:23 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> We probably could do with more bias. For example, we still prefer pages
> close to the page we last inserted to, by searching for free space in
> the same FSM page first, before starting the search from the top of the
> tree. And of course, each backend tries first to insert to the last page
> it inserted to before even consulting the FSM. A mode to disable those
> behaviors might indeed be useful, along with the random reset.

The elephant in the room here is that if the relation is a million pages of which 1-100,000 and 1,000,000 are in use, no amount of bias is going to help us truncate the relation unless every tuple on page 1,000,000 gets updated or deleted. If the user doesn't have any interest in doing anything to those tuples, nothing will ever happen. And this is likely to come up a lot, because many tables have a core of data that is rarely or never updated.

For example, imagine that 90% of pages contain data that is updated regularly and 10% of pages contain data that is rarely or never updated. If these are distributed uniformly, we figure to hit one of the latter type within 10 pages or so, and then we're stymied.

...Robert
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > The way I figure it, if there is a 0.01 chance to reset the sweep, > then there's a 0.99 percent chance to continue the sweep from the last > position. 0.99^229 is about 0.1, which means there is a 10% chance > not to have reset after that many attempts to advance. Right, so the odds would be that a backend will confine its insertion attempts to the first 229 pages containing a usable amount of free space. As long as the number of backends concurrently inserting into the relation is well under 229, this seems perfectly fine. (Hm, so we might want to make the probability depend on max_connections?) A possible downside of keeping things compact this way is that you'd probably get a longer average search distance because of all the early pages tending to remain full. Maybe what we want is some bias against inserting in the last half or quarter of the table, or some such rule, rather than necessarily heading for the start of the relation. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes:

> The elephant in the room here is that if the relation is a million
> pages of which 1-100,000 and 1,000,000 are in use, no amount of bias
> is going to help us truncate the relation unless every tuple on page
> 1,000,000 gets updated or deleted.

Well, there is no way to move a tuple across pages in a user-invisible, non-blocking fashion, so our ability to do something automatic about the above scenario is limited. The discussion at the moment is about ways of reducing the probability of getting into that situation in the first place. That doesn't preclude also providing some more-invasive tools that people can use when they do get into that situation; but let's not let I-want-a-magic-pony syndrome prevent us from doing anything at all.

			regards, tom lane
On Thu, Oct 1, 2009 at 4:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> A possible downside of keeping things compact this way is that you'd
> probably get a longer average search distance because of all the early
> pages tending to remain full. Maybe what we want is some bias against
> inserting in the last half or quarter of the table, or some such rule,
> rather than necessarily heading for the start of the relation.

Yeah, that seems better. Avoiding the last quarter seems like it would likely be sufficient. If it's not too hard, it might make sense to only apply this bias when we know that the table is already somewhat bloated.

...Robert
On Thu, Oct 1, 2009 at 5:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Robert Haas <robertmhaas@gmail.com> writes:
>> The elephant in the room here is that if the relation is a million
>> pages of which 1-100,000 and 1,000,000 are in use, no amount of bias
>> is going to help us truncate the relation unless every tuple on page
>> 1,000,000 gets updated or deleted.
>
> Well, there is no way to move a tuple across pages in a user-invisible,
> non-blocking fashion, so our ability to do something automatic about the
> above scenario is limited. The discussion at the moment is about ways
> of reducing the probability of getting into that situation in the first
> place. That doesn't preclude also providing some more-invasive tools
> that people can use when they do get into that situation; but let's
> not let I-want-a-magic-pony syndrome prevent us from doing anything
> at all.

That's fair enough, but it's our usual practice to consider, before implementing a feature or code change, what fraction of the people it will actually help and by how much. If there's a way that we can improve the behavior of the system in this area, I am all in favor of it, but I have pretty modest expectations for how much real-world benefit will ensue. I suspect that it's pretty common for large tables to contain a core of infrequently-updated records, and even a very light smattering of those, distributed randomly, will be enough to stop table shrinkage before it can get very far.

...Robert
Robert Haas <robertmhaas@gmail.com> wrote:

> The elephant in the room here is that if the relation is a million
> pages of which 1-100,000 and 1,000,000 are in use, no amount of bias
> is going to help us truncate the relation unless every tuple on page
> 1,000,000 gets updated or deleted.

Perhaps bias, combined with a client utility to force non-HOT updates of some rows at the end of the table, would provide the desired behavior. (It'd be nice if that could be built in to vacuum, but if it's not feasible, a separate utility is workable.)

Off the top of my head, I might set up a routine crontab job for most databases to do that to the lesser of 1% of the rows or a number of rows which matches how far the pages with free space exceed 10% of total pages. That seems like it should contain things for most circumstances without getting too extreme. One could always run the utility manually to correct more extreme bloat. Some sort of delay (similar to what vacuum can do) to prevent tanking performance would be good.

We wouldn't care about the occasional update conflict -- we expect that using a relational database means dealing with serialization failures. We'd be more than happy to accept a few of these in exchange for keeping performance optimal. If your software is designed to reschedule transactions which are rolled back for serialization failures, they are just another performance cost, so it's pretty easy to balance one cost against another.

-Kevin
Tom Lane <tgl@sss.pgh.pa.us> writes:

> That doesn't preclude also providing some more-invasive tools
> that people can use when they do get into that situation;

About this side of things, what about having any query which asks for system columns (any of them) take a specific (new) exclusive row-level lock? The tuple-moving utility would try to take the same lock before moving any tuple, thus allowing those who need it to continue relying on the ctid / xmin queries trick.

If that idea has any chance to fly, the tuple-moving facility could even be integrated into autovacuum for actively self-healing PostgreSQL.

Regards,
-- 
dim

Hoping this won't be some more hand waving.
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> (Hm, so we might want to make the probability depend on
> max_connections?)

Without doing rigorous math on it, I'd guess that to prevent contention among n connections you'd want the probability of resetting the sweep to be about 1 / (n * 2). That would mean you'd advance to the nth page about 60.6% of the time without resetting the sweep. For less contention, 1 / (n * 4) would let you get to the nth page about 77.9% of the time.

> Maybe what we want is some bias against inserting in the last half
> or quarter of the table, or some such rule, rather than necessarily
> heading for the start of the relation.

I think it would make sense to just start using this once you get into the last half or quarter of the free pages. If you go with the last quarter, then you might want to use a higher probability than I suggested above, although that would tend to leave you with contention when all the free space was in the last quarter. I'd be inclined to use something like the above probability and start using it at 50%.

-Kevin
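(The approximation behind those percentages, for n connections:)

  \[ \left(1 - \frac{1}{2n}\right)^{n} \approx e^{-1/2} \approx 0.606, \qquad
     \left(1 - \frac{1}{4n}\right)^{n} \approx e^{-1/4} \approx 0.779 \]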
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > I think it would make sense to just start using this once you get > into the last half or quarter of the free pages. If you go with the > last quarter, then you might want to use a higher probability than I > suggested above, although that would tend to leave you with > contention when all the free space was in the last quarter. I'd be > inclined to use something like the above probability and start using > it at 50%. Fuzzy thinking there -- if it's the last quarter of the *free* pages, the suggested probabilities should be fine. (Somehow I got to thinking, for a moment, that it would be the last quarter of the relation's overall pages....) -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Fuzzy thinking there -- if it's the last quarter of the *free* pages, > the suggested probabilities should be fine. (Somehow I got to > thinking, for a moment, that it would be the last quarter of the > relation's overall pages....) It's going to be the latter --- we do not know, and are *not* going to invest the cycles to find out, how many pages have a useful amount of free space. Even finding out the relation physical length might be more cycles than we want to spend here ... regards, tom lane
On Thu, 2009-10-01 at 17:08 -0400, Tom Lane wrote:

> The discussion at the moment is about ways
> of reducing the probability of getting into that situation in the
> first place.

Definitely.

> That doesn't preclude also providing some more-invasive tools that
> people can use when they do get into that situation

Yes, this is a separate thread specifically so we can get both; the VACUUM FULL replacement discussion was progressing well, so we ought not need to bring it up again here.

-- 
Simon Riggs www.2ndQuadrant.com
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> [pages with free space or total pages in relation?]
>
> It's going to be the latter --- we do not know, and are *not* going
> to invest the cycles to find out, how many pages have a useful
> amount of free space. Even finding out the relation's physical length
> might be more cycles than we want to spend here ...

OK, after mulling this over for a while, I suspect we'd do pretty well with starting to consider resetting the sweep after hitting 50% of the last known relation size (or whatever best approximation is available at low cost), and using a reset probability of 1 / (max_connections * 4). That gives about a 77.8% chance of getting to at least max_connections pages before resetting the sweep. Since it leaves about an 8.2% chance of getting to at least ten times max_connections pages before resetting the sweep, I'm inclined to think we'd want to start that at 50% of the relation rather than waiting until we get to the last quarter.

As one more data point to consider, if someone has 2000 connections configured (which I've seen mentioned in many posts), you would get to 50,000 pages past the point where you start using this technique one time out of 500.

-Kevin
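As a sketch, the whole heuristic might reduce to a couple of lines at the entry to the search. MaxConnections is the real backend global; cur_block, rel_size and the placement are hypothetical and untuned:

    /*
     * Hypothetical: once the search position is past 50% of the last
     * known relation size, restart from block 0 with probability
     * 1 / (4 * MaxConnections) on each entry to the search.
     */
    if (cur_block > rel_size / 2 &&
        random() % (4 * MaxConnections) == 0)
        cur_block = 0;    /* reset the sweep to the start */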
Kevin Grittner wrote:

> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > (Hm, so we might want to make the probability depend on
> > max_connections?)
>
> Without doing rigorous math on it, I'd guess that to prevent
> contention among n connections you'd want the probability of resetting
> the sweep to be about 1 / (n * 2). That would mean you'd advance to
> the nth page about 60.6% of the time without resetting the sweep. For
> less contention, 1 / (n * 4) would let you get to the nth page about
> 77.9% of the time.
>
> > Maybe what we want is some bias against inserting in the last half
> > or quarter of the table, or some such rule, rather than necessarily
> > heading for the start of the relation.
>
> I think it would make sense to just start using this once you get into
> the last half or quarter of the free pages. If you go with the last
> quarter, then you might want to use a higher probability than I
> suggested above, although that would tend to leave you with contention
> when all the free space was in the last quarter. I'd be inclined to
> use something like the above probability and start using it at 50%.

Two things that might not have been mentioned: First, only reset if you are given a page in the last 1/4 of the table; that way, if there is no free space in the last 1/4 of the table, you will not be resetting. A second idea is to heavily bias against using the last table page with data in it; that should help bias toward empty pages at the end of the table.

-- 
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Oct 1, 2009, at 4:18 PM, Robert Haas wrote:

> On Thu, Oct 1, 2009 at 5:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> The elephant in the room here is that if the relation is a million
>>> pages of which 1-100,000 and 1,000,000 are in use, no amount of bias
>>> is going to help us truncate the relation unless every tuple on page
>>> 1,000,000 gets updated or deleted.
>>
>> Well, there is no way to move a tuple across pages in a user-invisible,
>> non-blocking fashion, so our ability to do something automatic about the
>> above scenario is limited. The discussion at the moment is about ways
>> of reducing the probability of getting into that situation in the first
>> place. That doesn't preclude also providing some more-invasive tools
>> that people can use when they do get into that situation; but let's
>> not let I-want-a-magic-pony syndrome prevent us from doing anything
>> at all.
>
> That's fair enough, but it's our usual practice to consider, before
> implementing a feature or code change, what fraction of the people it
> will actually help and by how much. If there's a way that we can
> improve the behavior of the system in this area, I am all in favor of
> it, but I have pretty modest expectations for how much real-world
> benefit will ensue. I suspect that it's pretty common for large

Speaking of helping other cases... Something else that's been talked about is biasing FSM searches in order to try and keep a table clustered. If it doesn't add a lot of overhead, it would be nice to keep that in mind. I don't know where something like randomly resetting the search would go in the code, but I suspect it wouldn't be very expandable in the future.

But like Tom said, the top goal here is to help deal with bloat, not other fanciness.

-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828