Thread: FSM search modes

FSM search modes

From
Simon Riggs
Date:
Just been looking again at the way the FSM works. In fsm_search_avail() we
essentially have just a single way of working out how to search the
tree.

Seems like it would be good to abstract this so that we can implement a
number of FSM search strategies:

* (current) randomize - page selection encourages different backends to
access different blocks, thus reducing block contention

* cluster - page selection based on the block with free space nearest
the current block; we prefer keep-in-sequence over avoid-contention

* compact - page selection specifically attempts to find the lowest
numbered blocks, so that the table will naturally shrink over time.

These are not all mutually exclusive, suggested combinations would be

randomize, randomize | cluster, randomize | compact

So we don't give up the load spreading behaviour, we just apply the
logic at lower levels of the tree only.

VACUUM could set the FSM into FSMstrategy = compact when it notices that
most of the free blocks are at the lower end of the table. Or it could be
set explicitly by a VF replacement utility.

FSMstrategy = cluster would be the default if clustering is enabled on a
table.

FSMstrategy can change via ALTER TABLE ... WITH (fsm_strategy = ...)
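In toy form, the combinable strategies might steer the descent of the FSM tree like this (the flag names, tree layout, and function are illustrative sketches, not the actual freespace.c code):

```c
#include <stdlib.h>

/* Hypothetical strategy flags; names are illustrative only. */
#define FSM_RANDOMIZE (1 << 0)   /* spread backends across blocks */
#define FSM_COMPACT   (1 << 1)   /* prefer low-numbered blocks */

/* Toy FSM: a complete binary tree in an array.  Leaves hold per-block
 * free space; each internal node holds the max of its two children. */
#define LEAVES 8
static unsigned char node[2 * LEAVES];   /* node[1] is the root */

static void
build_tree(const unsigned char *freespace)
{
    for (int i = 0; i < LEAVES; i++)
        node[LEAVES + i] = freespace[i];
    for (int i = LEAVES - 1; i >= 1; i--)
        node[i] = node[2 * i] > node[2 * i + 1] ? node[2 * i] : node[2 * i + 1];
}

/* Descend toward a leaf with >= need free space.  Under FSM_COMPACT we
 * always take the left (lowest-numbered) qualifying child; with
 * FSM_RANDOMIZE alone we flip a coin when both children qualify, which
 * keeps the load-spreading behaviour at the lower tree levels. */
static int
fsm_search(unsigned char need, int strategy)
{
    if (node[1] < need)
        return -1;                       /* no block has enough space */
    int i = 1;
    while (i < LEAVES)
    {
        int l = 2 * i, r = 2 * i + 1;
        if (node[l] >= need && node[r] >= need && !(strategy & FSM_COMPACT))
            i = (rand() & 1) ? r : l;    /* randomize among candidates */
        else if (node[l] >= need)
            i = l;                       /* compact: prefer lower blocks */
        else
            i = r;
    }
    return i - LEAVES;                   /* block number */
}
```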

-- Simon Riggs           www.2ndQuadrant.com



Re: FSM search modes

From
Itagaki Takahiro
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote:

> * compact - page selection specifically attempts to find the lowest
> numbered blocks, so that the table will naturally shrink over time.

We cannot shrink the table if one tuple remains at the end of the table
and that tuple is always HOT-updated, because we put a new tuple on
the same page as the old tuple if possible.

In addition to your intelligent FSM search modes,
do we need another algorithm to make compaction work better?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: FSM search modes

From
Simon Riggs
Date:
On Fri, 2009-09-18 at 10:47 +0900, Itagaki Takahiro wrote:
> Simon Riggs <simon@2ndQuadrant.com> wrote:
> 
> > * compact - page selection specifically attempts to find the lowest
> > numbered blocks, so that the table will naturally shrink over time.
> 
> We cannot shrink the table if one tuple remains at the end of table
> and the tuple is always HOT-updated, because we put a new tuple into
> the same page where the old tuple is placed if possible.
> 
> In addition to your intelligent FSM search modes,
> do we need another algorithm to make the compaction to work better?

Perhaps we can have an additional piece of information about a table.
Something like target_size, so that normal updaters that attempt HOT
updates on blocks greater than target_size would know to avoid that
block and to seek a new location for the row lower in the table. Perhaps
such information could be reset and then sent via invalidation
mechanisms.

I'm thinking along the lines of a "fire alarm". An occasional mechanism
by which we can inform users of the need to evacuate certain blocks.
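As a minimal sketch of the check (target_size and the helper name are hypothetical, purely to illustrate the proposed rule):

```c
/* Hypothetical per-table hint: updaters avoid placing new tuple
 * versions on blocks at or beyond target_size, even when a HOT update
 * on the old block would otherwise be possible.  INVALID_TARGET means
 * "no hint set". */
#define INVALID_TARGET ((unsigned) ~0u)

static int
block_needs_evacuation(unsigned blockno, unsigned target_size)
{
    return target_size != INVALID_TARGET && blockno >= target_size;
}
```

A HOT updater would consult this before reusing its current block, falling back to an FSM search lower in the table when it fires.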

-- Simon Riggs           www.2ndQuadrant.com



Re: FSM search modes

From
decibel
Date:
On Sep 18, 2009, at 1:09 AM, Simon Riggs wrote:
> On Fri, 2009-09-18 at 10:47 +0900, Itagaki Takahiro wrote:
>> Simon Riggs <simon@2ndQuadrant.com> wrote:
>>
>>> * compact - page selection specifically attempts to find the lowest
>>> numbered blocks, so that the table will naturally shrink over time.
>>
>> We cannot shrink the table if one tuple remains at the end of table
>> and the tuple is always HOT-updated, because we put a new tuple into
>> the same page where the old tuple is placed if possible.
>>
>> In addition to your intelligent FSM search modes,
>> do we need another algorithm to make the compaction to work better?
>
> Perhaps we can have an additional piece of information about a table.
> Something like target_size, so that normal updaters that attempt HOT
> updates on blocks greater than target_size would know to avoid that
> block and to seek a new location for the row lower in the table.  
> Perhaps
> such information could be reset and then sent via invalidation
> mechanisms.
>
> I'm thinking along the lines of a "fire alarm". An occasional  
> mechanism
> by which we can inform users of the need to evacuate certain blocks.


It might be better to not beat around the bush and provide a vacuum  
mode that explicitly tries to free the last X percent of the table.  
That's especially handy for a very large table, because you probably  
don't want to be forced into scanning the whole thing in vacuum just  
to free some space at the end. This mode could also be more  
aggressive about trying to acquire the lock that's needed to trim the  
file on disk.

That said, *any* step that improves dealing with table bloat is  
extremely welcome, as right now you're basically stuck rebuilding the  
table.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828




Re: FSM search modes

From
Hannu Krosing
Date:
On Thu, 2009-09-17 at 16:26 +0100, Simon Riggs wrote:
> Just been looking again at the way FSM works. In fsm_search_avail() we
> essentially have just a single way for working out how to search the
> tree.
> 
> Seems like it would be good to abstract this so that we can implement a
> number of FSM search strategies
> 
> * (current) randomize - page selection encourages different backends to
> access different blocks, thus reducing block contention
> 
> * cluster - page selection made based around selecting block with
> freespace nearest current block and we prefer keep-in-sequence to
> avoid-contention

Is the information about "current block" available to FSM search API, or
do we need to change the API for that ?

> * compact - page selection specifically attempts to find the lowest
> numbered blocks, so that the table will naturally shrink over time.
> 
> These are not all mutually exclusive, suggested combinations would be
> 
> randomize, randomize | cluster, randomize | compact
> 
> So we don't give up the load spreading behaviour, we just apply the
> logic at lower levels of the tree only.
> 
> VACUUM could set the FSM into FSMstrategy = compact when it notices that
> most of the free blocks are at lower end of table. Or explicitly set
> during VF replacement utility.
> 
> FSMstrategy = cluster would be the default if clustering is enabled on a
> table.
> 
> FSMstrategy can change via ALTER TABLE ... WITH (fsm_strategy = ...)
> 
> -- 
>  Simon Riggs           www.2ndQuadrant.com
> 

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability   Services, Consulting and Training




Re: FSM search modes

From
Simon Riggs
Date:
On Fri, 2009-09-18 at 15:43 +0300, Hannu Krosing wrote:
> > 
> > * cluster - page selection made based around selecting block with
> > freespace nearest current block and we prefer keep-in-sequence to
> > avoid-contention
> 
> Is the information about "current block" available to FSM search API, or
> do we need to change the API for that ?

Need to change the API, but it's a small code impact.

-- Simon Riggs           www.2ndQuadrant.com



Re: FSM search modes

From
"Kevin Grittner"
Date:
decibel <decibel@decibel.org> wrote: 
> *any* step that improves dealing with table bloat is extremely
> welcome, as right now you're basically stuck rebuilding the table.
+1
Although, possibly more irritating than actually rebuilding it is
evaluating borderline bloat situations to determine if they will "work
themselves out" over time or whether you need to bite the bullet to do
aggressive maintenance.  Having some way for routine vacuums (or any
other routine process, currently available or that could be scheduled)
to "nibble away" at moderate bloat without significant performance or
usability impact would make life easier for at least *some* DBAs.
-Kevin


Re: FSM search modes

From
decibel
Date:
On Sep 30, 2009, at 5:13 PM, Kevin Grittner wrote:
> decibel <decibel@decibel.org> wrote:
>
>> *any* step that improves dealing with table bloat is extremely
>> welcome, as right now you're basically stuck rebuilding the table.
>
> +1
>
> Although, possibly more irritating than actually rebuilding it is
> evaluating borderline bloat situations to determine if they will "work
> themselves out" over time or whether you need to bite the bullet to do
> aggressive maintenance.  Having some way for routine vacuums (or any
> other routine process, currently available or that could be scheduled)
> to "nibble away" at moderate bloat without significant performance or
> usability impact would make life easier for at least *some* DBAs.


Kevin, do you have tools that allow you to clear out the end of a  
table? That part is at least mostly possible from userland (get list  
of ctids from end of table, update those records to move them, rinse,  
repeat) but even if you do all that there's no guarantee that a  
vacuum will get the exclusive lock it needs to truncate the table.

So while something that makes it easier to clean out the end of a  
table would be good, I think the critical need is a way to make  
vacuum more aggressive about obtaining the exclusive lock.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828




Re: FSM search modes

From
Alvaro Herrera
Date:
decibel wrote:

> So while something that makes it easier to clean out the end of a
> table would be good, I think the critical need is a way to make
> vacuum more aggressive about obtaining the exclusive lock.

I wonder if we should have a different mode of operation that only
attempted the truncate (say VACUUM TRUNCATE), optionally being
non-conditional about obtaining the required lock.  That said, I wonder
even more whether any such hacks are still needed after the visibility
map that changed the landscape for vacuum so dramatically.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: FSM search modes

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I wonder if we should have a different mode of operation that only
> attempted the truncate (say VACUUM TRUNCATE), optionally being
> non-conditional about obtaining the required lock.  That said, I wonder
> even more whether any such hacks are still needed after the visibility
> map that changed the landscape for vacuum so dramatically.

Yeah.  The one thing in this thread that seemed like a good idea to me
was to bias the FSM code a little bit towards returning space near the
start of the relation, rather than its current behavior of treating all
the free space equally.  The current arrangement pretty much guarantees
that you'll never recover from a bloating episode without taking special
manual action.  (This is not new to 8.4, the previous FSM code behaved
the same.)

The analogy in the back of my mind is the btree code that decides
whether to split the current page or move to the next page when it has a
full page and a new key that could go to either page.  We found out that
randomizing that choice made a *huge* improvement in the average
behavior, even with the probabilities set up as 99-to-1.  I'm thinking
that just a small chance of resetting the search to the start of the
relation might do the trick for FSM.
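A sketch of that idea (the toy PRNG and the 1-in-100 figure are illustrative; real code would use PostgreSQL's own random facilities):

```c
/* Tiny deterministic LCG, just to make the sketch self-contained. */
static unsigned rng_state = 2009;

static unsigned
toy_rand(void)
{
    rng_state = rng_state * 1103515245u + 12345u;
    return (rng_state >> 16) & 0x7fff;
}

/* On entry to the search, reset to block 0 with probability ~1/100;
 * otherwise continue the sweep from the remembered position.  Even a
 * small reset probability adds a persistent bias toward the start of
 * the relation. */
static unsigned
choose_search_start(unsigned remembered)
{
    if (toy_rand() % 100 == 0)
        return 0;               /* weak bias toward the relation's start */
    return remembered;
}
```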
        regards, tom lane


Re: FSM search modes

From
Simon Riggs
Date:
On Thu, 2009-10-01 at 12:05 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I wonder if we should have a different mode of operation that only
> > attempted the truncate (say VACUUM TRUNCATE), optionally being
> > non-conditional about obtaining the required lock.  That said, I wonder
> > even more whether any such hacks are still needed after the visibility
> > map that changed the landscape for vacuum so dramatically.
> 
> Yeah.  The one thing in this thread that seemed like a good idea to me
> was to bias the FSM code a little bit towards returning space near the
> start of the relation, rather than its current behavior of treating all
> the free space equally.  The current arrangement pretty much guarantees
> that you'll never recover from a bloating episode without taking special
> manual action.  (This is not new to 8.4, the previous FSM code behaved
> the same.)
> 
> The analogy in the back of my mind is the btree code that decides
> whether to split the current page or move to the next page when it has a
> full page and a new key that could go to either page.  We found out that
> randomizing that choice made a *huge* improvement in the average
> behavior, even with the probabilities set up as 99-to-1.  I'm thinking
> that just a small chance of resetting the search to the start of the
> relation might do the trick for FSM.

No real need to be random is there? In the bloated space scenario,
VACUUM will be triggered but will be unable to remove the empty blocks.
So in that case VACUUM can hint the FSM to perform "start from beginning
of relation" behaviour. When a searcher does that and can't usefully
find space quickly, then we can reset the hint back to normal. So it's
an automated mode change in both directions.

I think we can use the N-to-1 idea to define what we mean by "quickly",
but that should be in proportion to exactly how many free blocks there
were in the table, not a fixed N. It would be a shame to make this work
poorly for very large tables.

It would be more useful to think of this as "look for huge chunks of
space and fill them" rather than "start at beginning", since space won't
always be right at start.

-- Simon Riggs           www.2ndQuadrant.com



Re: FSM search modes

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> No real need to be random is there? In the bloated space scenario,
> VACUUM will be triggered but will be unable to remove the empty blocks.
> So in that case VACUUM can hint the FSM to perform "start from beginning
> of relation" behaviour.

No, that's an entirely unrelated issue.  My point is that there won't
*be* any empty end blocks unless we discourage FSM from handing out
those pages as insertion targets.
        regards, tom lane


Re: FSM search modes

From
"Kevin Grittner"
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote:
> It would be more useful to think of this as "look for huge chunks of
> space and fill them" rather than "start at beginning", since space
> won't always be right at start.
Either I misunderstand you or I disagree.  If there's a huge chunk of
space near the end, and many many smaller spaces spread throughout,
what I'd like is for rows to be placed in those small ones.  This
would minimize the number of pages to read for queries, and would
present some possibility that the rows past the huge chunk might
eventually be deleted or non-HOT updated, allowing the bloat to
eventually be cleaned up with minimal pain.
-Kevin


Re: FSM search modes

From
Simon Riggs
Date:
On Thu, 2009-10-01 at 11:32 -0500, Kevin Grittner wrote:

> Either I misunderstand you or I disagree.

That does seem to be a common stance, though I will read on. :-)

-- Simon Riggs           www.2ndQuadrant.com



Re: FSM search modes

From
Simon Riggs
Date:
On Thu, 2009-10-01 at 11:32 -0500, Kevin Grittner wrote:
> If there's a huge chunk of
> space near the end, and many many smaller spaces spread throughout,
> what I'd like is for rows to be placed in those small ones.  This
> would minimize the number of pages to read for queries, and would
> present some possibility that the rows past the huge chunk might
> eventually be deleted or non-HOT updated, allowing the bloat to
> eventually be cleaned up with minimal pain.

Yes, as Tom points out, this must be done with bias away from the very
end of the table.

I meant that we should start from the beginning of large spaces and that
we shouldn't assume that all space worth filling is at start of
relation.

-- Simon Riggs           www.2ndQuadrant.com



Re: FSM search modes

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> Yes, as Tom points out, this must be done with bias away from the very
> end of the table.

> I meant that we should start from the beginning of large spaces and that
> we shouldn't assume that all space worth filling is at start of
> relation.

Right.  One of the goals that FSM is trying to meet is ensuring that
different backends aren't trying to insert into the same page
concurrently, so as to reduce page-level contention.  So for example
it'd be a bad idea to adopt the really-dumb strategy of searching from
the start of the table for each request --- it'd funnel all the backends
to the same target page.  What we want is a behavior that is randomized
but tends to prefer pages towards the start rather than hitting all free
pages equally.  The btree precedent makes me suspect that quite a weak
preference would be sufficient.  So for example we might try resetting
the search to the start of the relation with probability 0.01.
        regards, tom lane


Re: FSM search modes

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> Yes, as Tom points out, this must be done with bias away from the
>> very end of the table.
> 
>> I meant that we should start from the beginning of large spaces and
>> that we shouldn't assume that all space worth filling is at start
>> of relation.
OK, so I did misunderstand you; we agree after all.  :-)
> So for example we might try resetting the search to the start of the
> relation with probability 0.01.
If I understand the heuristic you propose, and my math skills haven't
eroded too badly from lack of use, every 229 spots considered would
cause a 90% chance of reset.  That means that the odds of getting past
50,000 spots (the number of pages with available free space at which I
generally start to get worried) without resetting is about 1 in 10^218
-- which is a risk I'm willing to accept.  ;-)
-Kevin


Re: FSM search modes

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So for example we might try resetting the search to the start of the
>> relation with probability 0.01.
> If I understand the heuristic you propose, and my math skill haven't
> eroded too badly from lack of use, every 229 spots considered would
> cause a 90% chance of reset.

Sorry, I wasn't clear.  What I was thinking of was that we'd consider
resetting the search position once, upon entry to fsm_search, and then
search normally thereafter.  Some experimentation would be needed to
choose the right probability of course.  A number like 0.01 might seem
too small to affect the behavior at all, but that's what we thought
about the btree case too.  A very light thumb upon the scales may be
sufficient.
        regards, tom lane


Re: FSM search modes

From
Heikki Linnakangas
Date:
Kevin Grittner wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>> Yes, as Tom points out, this must be done with bias away from the
>>> very end of the table.
>>> I meant that we should start from the beginning of large spaces and
>>> that we shouldn't assume that all space worth filling is at start
>>> of relation.
>  
> OK, so I did misunderstand you; we agree after all.  :-)
>  
>> So for example we might try resetting the search to the start of the
>> relation with probability 0.01.
>  
> If I understand the heuristic you propose, and my math skill haven't
> eroded too badly from lack of use, every 229 spots considered would
> cause a 90% chance of reset.  That means that the odds of getting past
> 50,000 spots (the number of pages with available free space at which I
> generally start to get worried) without resetting is about 1 in 10^218
> -- which is a risk I'm willing to accept.  ;-)

The FSM currently uses a clock sweep kind of algorithm to hand out
pages, and the clock hand is reset to 0 at every vacuum. The purpose of
resetting the clock hand is precisely to bias towards lower-numbered
pages, to allow truncation later on.

That's a bit of an over-simplification, though, there's actually a
separate "clock hand" on each FSM page (next_slot pointer).

We probably could do with more bias. For example, we still prefer pages
close to the page we last inserted to, by searching for free space in
the same FSM page first, before starting the search from the top of the
tree. And of course, each backend tries first to insert to the last page
it inserted to before even consulting the FSM. A mode to disable those
behaviors might indeed be useful, along with the random reset.
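A toy model of the clock-sweep behaviour described here (the flat array and names are illustrative; the real structure in freespace.c keeps a next_slot per FSM page):

```c
/* Toy clock sweep over per-block free-space values. */
#define NBLOCKS 8
static unsigned char fs[NBLOCKS];   /* free space per block */
static int next_slot = 0;           /* the clock hand */

/* Return the first block at or after the hand with >= need free space,
 * wrapping around once; advance the hand past the returned block. */
static int
sweep_search(unsigned char need)
{
    for (int i = 0; i < NBLOCKS; i++)
    {
        int slot = (next_slot + i) % NBLOCKS;
        if (fs[slot] >= need)
        {
            next_slot = (slot + 1) % NBLOCKS;
            return slot;
        }
    }
    return -1;
}

/* VACUUM resets the hand to 0, biasing subsequent searches toward
 * low-numbered blocks so the tail can eventually be truncated. */
static void
vacuum_reset_hand(void)
{
    next_slot = 0;
}
```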

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: FSM search modes

From
"Kevin Grittner"
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: 
> Kevin Grittner wrote:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> So for example we might try resetting the search to the start of the
>>> relation with probability 0.01.
>>  
>> If I understand the heuristic you propose, and my math skill
>> haven't eroded too badly from lack of use, every 229 spots
>> considered would cause a 90% chance of reset.  That means that the
>> odds of getting past 50,000 spots (the number of pages with
>> available free space at which I generally start to get worried)
>> without resetting is about 1 in 10^218 -- which is a risk I'm
>> willing to accept.  ;-)
> 
> The FSM currently uses a clock sweep kind of algorithm to hand out
> pages, and the clock hand is reset to 0 at every vacuum. The purpose
> of resetting the clock hand is precisely to bias towards
> lower-numbered pages, to allow truncation later on.
> 
> That's a bit of an over-simplification, though, there's actually a
> separate "clock hand" on each FSM page (next_slot pointer).
> 
> We probably could do with more bias. For example, we still prefer
> pages close to the page we last inserted to, by searching for free
> space in the same FSM page first, before starting the search from
> the top of the tree. And of course, each backend tries first to
> insert to the last page it inserted to before even consulting the
> FSM. A mode to disable those behaviors might indeed be useful, along
> with the random reset.
That fleshes out the descriptions given by Tom, but doesn't really
make me think I misunderstood the heuristic or miscalculated the odds
of getting far from the start with a 1% probability of a reset on each
advance of the sweep.  (Of course, it's possible I'm being dense and
taking statements to mean what I expect, despite evidence to the
contrary.)
The way I figure it, if there is a 0.01 chance to reset the sweep,
then there's a 0.99 percent chance to continue the sweep from the last
position.  0.99^229 is about 0.1, which means there is a 10% chance
not to have reset after that many attempts to advance.  Every 229
attempts after that multiplies by 0.1, too.  So, after 229*6 attempts
(for example) the odds of not having reset are about one in a million.
Am I saying something stupid here?  (I used to be good at these sorts
of calculations....)
-Kevin


Re: FSM search modes

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: 
> 0.99 percent chance to continue the sweep
Make that a 99% chance, or a 0.99 chance (in case the typo was not
apparent).
> Am I saying something stupid here?
Well, besides that line?
-Kevin


Re: FSM search modes

From
Robert Haas
Date:
On Thu, Oct 1, 2009 at 3:23 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> We probably could do with more bias. For example, we still prefer pages
> close to the page we last inserted to, by searching for free space in
> the same FSM page first, before starting the search from the top of the
> tree. And of course, each backend tries first to insert to the last page
> it inserted to before even consulting the FSM. A mode to disable those
> behaviors might indeed be useful, along with the random reset.

The elephant in the room here is that if the relation is a million
pages of which 1-100,000 and 1,000,000 are in use, no amount of bias
is going to help us truncate the relation unless every tuple on page
1,000,000 gets updated or deleted.  If the user doesn't have any
interest in doing anything to those tuples, nothing will ever happen.

And this is likely to come up a lot, because many tables have a core
of data that is rarely or never updated.  For example, imagine that
90% of pages contain data that is updated regularly and 10% of pages
contain data that is rarely or never updated.  If these are
distributed uniformly, we figure to hit one of the latter type within
10 pages or so, and then we're stymied.

...Robert


Re: FSM search modes

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> The way I figure it, if there is a 0.01 chance to reset the sweep,
> then there's a 0.99 percent chance to continue the sweep from the last
> position.  0.99^229 is about 0.1, which means there is a 10% chance
> not to have reset after that many attempts to advance.

Right, so the odds would be that a backend will confine its insertion
attempts to the first 229 pages containing a usable amount of free
space.  As long as the number of backends concurrently inserting
into the relation is well under 229, this seems perfectly fine.
(Hm, so we might want to make the probability depend on
max_connections?)

A possible downside of keeping things compact this way is that you'd
probably get a longer average search distance because of all the early
pages tending to remain full.  Maybe what we want is some bias against
inserting in the last half or quarter of the table, or some such rule,
rather than necessarily heading for the start of the relation.
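The rule Tom sketches could be as simple as this (the one-quarter fraction is his example, not a settled choice):

```c
/* Accept a candidate block only if it lies outside the last quarter of
 * the relation; rel_nblocks is the relation's length in blocks. */
static int
block_is_acceptable(unsigned blockno, unsigned rel_nblocks)
{
    return blockno < rel_nblocks - rel_nblocks / 4;
}
```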
        regards, tom lane


Re: FSM search modes

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> The elephant in the room here is that if the relation is a million
> pages of which 1-100,000 and 1,000,000 are in use, no amount of bias
> is going to help us truncate the relation unless every tuple on page
> 1,000,000 gets updated or deleted.

Well, there is no way to move a tuple across pages in a user-invisible,
non-blocking fashion, so our ability to do something automatic about the
above scenario is limited.  The discussion at the moment is about ways
of reducing the probability of getting into that situation in the first
place.  That doesn't preclude also providing some more-invasive tools
that people can use when they do get into that situation; but let's
not let I-want-a-magic-pony syndrome prevent us from doing anything
at all.
        regards, tom lane


Re: FSM search modes

From
Robert Haas
Date:
On Thu, Oct 1, 2009 at 4:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A possible downside of keeping things compact this way is that you'd
> probably get a longer average search distance because of all the early
> pages tending to remain full.  Maybe what we want is some bias against
> inserting in the last half or quarter of the table, or some such rule,
> rather than necessarily heading for the start of the relation.

Yeah, that seems better.  Avoiding the last quarter seems like it
would likely be sufficient.

If it's not too hard, it might make sense to only apply this bias when
we know that the table is already somewhat bloated.

...Robert


Re: FSM search modes

From
Robert Haas
Date:
On Thu, Oct 1, 2009 at 5:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> The elephant in the room here is that if the relation is a million
>> pages of which 1-100,000 and 1,000,000 are in use, no amount of bias
>> is going to help us truncate the relation unless every tuple on page
>> 1,000,000 gets updated or deleted.
>
> Well, there is no way to move a tuple across pages in a user-invisible,
> non-blocking fashion, so our ability to do something automatic about the
> above scenario is limited.  The discussion at the moment is about ways
> of reducing the probability of getting into that situation in the first
> place.  That doesn't preclude also providing some more-invasive tools
> that people can use when they do get into that situation; but let's
> not let I-want-a-magic-pony syndrome prevent us from doing anything
> at all.

That's fair enough, but it's our usual practice to consider, before
implementing a feature or code change, what fraction of the people it
will actually help and by how much.  If there's a way that we can
improve the behavior of the system in this area, I am all in favor of
it, but I have pretty modest expectations for how much real-world
benefit will ensue.  I suspect that it's pretty common for large
tables to contain a core of infrequently-updated records, and even a
very light smattering of those, distributed randomly, will be enough
to stop table shrinkage before it can get very far.

...Robert


Re: FSM search modes

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote: 
> The elephant in the room here is that if the relation is a million
> pages of which 1-100,000 and 1,000,000 are in use, no amount of bias
> is going to help us truncate the relation unless every tuple on page
> 1,000,000 gets updated or deleted.
Perhaps bias, combined with a client utility to force non-HOT updates
of some rows at the end of the table would provide the desired
behavior.  (It'd be nice if that could be built in to vacuum, but if
it's not feasible, a separate utility is workable.)  Off the top of my
head, I might set up a routine crontab job for most databases to do
that to the lesser of 1% of the rows or enough rows to cover the amount
by which pages with free space exceed 10% of total pages.  That
seems like it should contain things for most circumstances without
getting too extreme.  One could always run the utility manually to
correct more extreme bloat.
Some sort of delay (similar to what vacuum can do) to prevent tanking
performance would be good.  We wouldn't care about the occasional
update conflict -- we expect that using a relational database means
dealing with serialization failures.  We'd be more than happy to
accept a few of these in exchange for keeping performance optimal.  If
your software is designed to reschedule transactions which are rolled
back for serialization failures, they are just another performance
cost, so it's pretty easy to balance one cost against another.
-Kevin


Re: FSM search modes

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> That doesn't preclude also providing some more-invasive tools
> that people can use when they do get into that situation; 

About this side of things, what about having any query which asks for
system columns (any of them) take a specific (new) exclusive row level
lock. The tuple moving utility would try to take the same lock before
moving any tuple, thus allowing those who need it to continue relying on
the ctid / xmin queries trick.

If that idea has any chance to fly, the tuple moving facility could even
be integrated into autovacuum for actively self-healing PostgreSQL.

Regards,
-- 
dim

Hoping this won't be some more hand waving.


Re: FSM search modes

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> (Hm, so we might want to make the probability depend on
> max_connections?)
Without doing rigorous math on it, I'd guess that to prevent
contention among n connections you'd want the probability of resetting
the sweep to be about 1 / (n * 2).  That would mean you'd advance to
the nth page about 60.6% of the time without resetting the sweep.  For
less contention, 1 / (n * 4) would let you get to the nth page about
77.9% of the time.
> Maybe what we want is some bias against inserting in the last half
> or quarter of the table, or some such rule, rather than necessarily
> heading for the start of the relation.
I think it would make sense to just start using this once you get into
the last half or quarter of the free pages.  If you go with the last
quarter, then you might want to use a higher probability than I
suggested above, although that would tend to leave you with contention
when all the free space was in the last quarter.  I'd be inclined to
use something like the above probability and start using it at 50%.
-Kevin
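[Editorial note: Kevin's percentages follow from the geometric survival probability: if each page advance resets the sweep with probability p, the chance of advancing n pages without a reset is (1 - p)^n, which for p = 1/(kn) approaches e^(-1/k). A quick check in Python (the function name is mine, not from any patch):]

```python
def survival(n, divisor):
    """Probability of advancing n pages when each advance resets the
    sweep with probability 1 / (n * divisor)."""
    p = 1.0 / (n * divisor)
    return (1.0 - p) ** n

# With p = 1/(2n) the survival probability approaches e^(-1/2), about 60.6%;
# with p = 1/(4n) it approaches e^(-1/4), about 77.9%, for any realistic n.
print(survival(100, 2))   # roughly 0.606
print(survival(100, 4))   # roughly 0.779
```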


Re: FSM search modes

From
"Kevin Grittner"
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: 
> I think it would make sense to just start using this once you get
> into the last half or quarter of the free pages.  If you go with the
> last quarter, then you might want to use a higher probability than I
> suggested above, although that would tend to leave you with
> contention when all the free space was in the last quarter.  I'd be
> inclined to use something like the above probability and start using
> it at 50%.
Fuzzy thinking there -- if it's the last quarter of the *free* pages,
the suggested probabilities should be fine.  (Somehow I got to
thinking, for a moment, that it would be the last quarter of the
relation's overall pages....)
-Kevin


Re: FSM search modes

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Fuzzy thinking there -- if it's the last quarter of the *free* pages,
> the suggested probabilities should be fine.  (Somehow I got to
> thinking, for a moment, that it would be the last quarter of the
> relation's overall pages....)

It's going to be the latter --- we do not know, and are *not* going to
invest the cycles to find out, how many pages have a useful amount of
free space.  Even finding out the relation physical length might be
more cycles than we want to spend here ...
        regards, tom lane


Re: FSM search modes

From
Simon Riggs
Date:
On Thu, 2009-10-01 at 17:08 -0400, Tom Lane wrote:
> The discussion at the moment is about ways
> of reducing the probability of getting into that situation in the
> first place.  

Definitely. 

> That doesn't preclude also providing some more-invasive tools that
> people can use when they do get into that situation

Yes, this is a separate thread specifically so we can get both; the
VACUUM FULL replacement discussion was progressing well, so there ought
to be no need to bring it up again here.

-- Simon Riggs           www.2ndQuadrant.com



Re: FSM search modes

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> [pages with free space or total pages in relation?]
> 
> It's going to be the latter --- we do not know, and are *not* going
> to invest the cycles to find out, how many pages have a useful
> amount of free space.  Even finding out the relation physical length
> might be more cycles than we want to spend here ...
OK, after mulling this over for a while, I suspect we'd do pretty well
with starting to consider resetting the sweep after hitting 50% of the
last known relation size (or whatever best approximation is available
at low cost), and using a reset probability of 1 / (max_connections *
4).  That gives about a 77.8% chance of getting to at least
max_connections before resetting the sweep.  Since it leaves about an
8.2% chance of getting to at least ten times max_connections pages
before resetting the sweep, I'm inclined to think we'd want to start
that at 50% of the relation rather than waiting until we get to the
last quarter.  As one more data point to consider, if someone has 2000
connections configured (which I've seen mentioned in many posts), you
would get to 50,000 pages past the point where you start using this
technique one time out of 500.
-Kevin
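[Editorial note: the concrete figures in this message are easy to verify. With reset probability p = 1/(max_connections * 4), the chance of surviving k advances is (1 - p)^k; a sketch with a hypothetical helper name:]

```python
def survive(pages, max_connections):
    """Chance of advancing `pages` pages before the sweep resets, when
    each advance resets with probability 1 / (max_connections * 4)."""
    p = 1.0 / (max_connections * 4)
    return (1.0 - p) ** pages

n = 2000                          # the heavily-configured case cited above
print(survive(n, n))              # ~0.779: reach max_connections pages
print(survive(10 * n, n))         # ~0.082: reach ten times that
print(survive(50000, n))          # ~0.002, i.e. about one time out of 500
```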


Re: FSM search modes

From
Bruce Momjian
Date:
Kevin Grittner wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
>  
> > (Hm, so we might want to make the probability depend on
> > max_connections?)
>  
> Without doing rigorous math on it, I'd guess that to prevent
> contention among n connections you'd want the probability of resetting
> the sweep to be about 1 / (n * 2).  That would mean you'd advance to
> the nth page about 60.6% of the time without resetting the sweep.  For
> less contention, 1 / (n * 4) would let you get to the nth page about
> 77.9% of the time.
>  
> > Maybe what we want is some bias against inserting in the last half
> > or quarter of the table, or some such rule, rather than necessarily
> > heading for the start of the relation.
>  
> I think it would make sense to just start using this once you get into
> the last half or quarter of the free pages.  If you go with the last
> quarter, then you might want to use a higher probability than I
> suggested above, although that would tend to leave you with contention
> when all the free space was in the last quarter.  I'd be inclined to
> use something like the above probability and start using it at 50%.

Two things that might not have been mentioned:  First, only reset if you
are given a page in the last 1/4 of the table;  that way, if there is no
free space in the last 1/4 of the table, you will not be resetting.  A
second idea is to heavily bias against using the last table page with
data in it; that should help bias toward empty pages on the end of the
table.
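[Editorial note: Bruce's first idea combines naturally with the reset probability discussed upthread. A rough sketch, with entirely hypothetical names; the real logic would be C inside fsm_search_avail() and would work on FSM category values rather than raw block numbers:]

```python
import random

def maybe_reset_sweep(candidate, rel_pages, max_connections, rng=random):
    # Bruce's idea: only consider resetting when the candidate page
    # falls in the last quarter of the relation, so a table whose free
    # space is all near the start never pays the reset cost.
    in_last_quarter = candidate >= (rel_pages * 3) // 4
    # Kevin's probability: reset about once per 4 * max_connections
    # advances, to keep concurrent backends off the same pages.
    if in_last_quarter and rng.random() < 1.0 / (4 * max_connections):
        return 0          # restart the sweep at the head of the relation
    return candidate      # otherwise keep the page the FSM offered
```

With the default `rng` a reset is rare by design; the branch only matters over many allocations against a mostly-tail-loaded table.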

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: FSM search modes

From
decibel
Date:
On Oct 1, 2009, at 4:18 PM, Robert Haas wrote:

> On Thu, Oct 1, 2009 at 5:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> The elephant in the room here is that if the relation is a million
>>> pages of which 1-100,000 and 1,000,000 are in use, no amount of bias
>>> is going to help us truncate the relation unless every tuple on page
>>> 1,000,000 gets updated or deleted.
>>
>> Well, there is no way to move a tuple across pages in a user-invisible,
>> non-blocking fashion, so our ability to do something automatic about the
>> above scenario is limited.  The discussion at the moment is about ways
>> of reducing the probability of getting into that situation in the first
>> place.  That doesn't preclude also providing some more-invasive tools
>> that people can use when they do get into that situation; but let's
>> not let I-want-a-magic-pony syndrome prevent us from doing anything
>> at all.
>
> That's fair enough, but it's our usual practice to consider, before
> implementing a feature or code change, what fraction of the people it
> will actually help and by how much.  If there's a way that we can
> improve the behavior of the system in this area, I am all in favor of
> it, but I have pretty modest expectations for how much real-world
> benefit will ensue.  I suspect that it's pretty common for large


Speaking of helping other cases...

Something else that's been talked about is biasing FSM searches in
order to try and keep a table clustered. If it doesn't add a lot of
overhead, it would be nice to keep that in mind. I don't know where
something like randomly resetting the search would go in the code, but
I suspect it wouldn't be very expandable in the future.

But like Tom said, the top goal here is to help deal with bloat, not
other fanciness.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828