Thread: 'Real' auto vacuum?

'Real' auto vacuum?

From
Ralph Mason
Date:
This is a wild and crazy thought which I am sure is invalid for some
good reason.

But why can't postgres just vacuum itself as it goes along?

When a row is orphaned it's added to a list of possibly available rows.
When a new row is needed the list of possible rows is examined and the
first one with a transaction id less then the lowest running transaction
id is chosen to be the new row?  These rows can be in a heap so it's
really fast to find one.

Like magic - no more vacuuming.  No more holes for people to fall into.

Is this an oversimplification of the problem?

Ralph

Re: 'Real' auto vacuum?

From
mark@mark.mielke.cc
Date:
On Wed, Aug 31, 2005 at 10:21:20AM +1200, Ralph Mason wrote:
> This is a wild and crazy thought which I am sure is invalid for some
> good reason.
>
> But why can't postgres just vacuum itself as it goes along?
>
> When a row is orphaned it's added to a list of possibly available rows.
> When a new row is needed the list of possible rows is examined and the
> first one with a transaction id less then the lowest running transaction
> id is chosen to be the new row?  These rows can be in a heap so it's
> really fast to find one.
>
> Like magic - no more vacuuming.  No more holes for people to fall into.

Yes please. :-)

> Is this an oversimplification of the problem?

But, yeah. It's probably not that easy, especially with really big
databases. Where is this free list stored? How efficient is it to keep
track of the lowest running transaction at all times? How does one
synchronize access to this free list, to ensure that processes don't
block up waiting for access to the free list? Is the fre list
journalled to prevent corruption, and the accidental re-use of a still
in use row? And, there would be a cost to scanning this list on every
insert or update.

As an outsider (like you?) I see the current model as a design flaw as
well. A neat and tidy model on paper. Not so nice in real life. The
need to vacuum in batch mode, to keep the database from dying, seems
intuitively bad.

I think there must be answers to this problem. Even simple
optimizations, such as defining a table such that any delete or update
within a table, upon commit, will attempt to vacuum just the rows that
should not be considered free for any new transactions. If it's in
use by an active transaction, oh well. It can be picked up by a batch
run of vacuum. If it's free though - let's do it now.

I think any optimizations we come up with, will be more happily accepted
with a working patch that causes no breakage... :-)

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   |
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/


Re: 'Real' auto vacuum?

From
Josh Berkus
Date:
Ralph,

> When a row is orphaned it's added to a list of possibly available rows.
> When a new row is needed the list of possible rows is examined and the
> first one with a transaction id less then the lowest running transaction
> id is chosen to be the new row?  These rows can be in a heap so it's
> really fast to find one.

This is the long-term plan.    However, it's actually a lot harder than it
sounds.  Patches welcome.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: 'Real' auto vacuum?

From
Ralph Mason
Date:
mark@mark.mielke.cc wrote:

>But, yeah. It's probably not that easy, especially with really big
>databases. Where is this free list stored? How efficient is it to keep
>track of the lowest running transaction at all times? How does one
>synchronize access to this free list, to ensure that processes don't
>block up waiting for access to the free list? Is the fre list
>journalled to prevent corruption, and the accidental re-use of a still
>in use row? And, there would be a cost to scanning this list on every
>insert or update.
>
>
I suspect the freelist could be stored as an index, and just handily
postgres supports those out of the box.    There would be a cost yes,
but then what is the cost of adding pages to the file all the time?  I
guess as with all things there is no one size fits all, so perhaps you
could turn it off - although I expect for 99.9% of the cases 'on' would
be the better choice. If it gets broken there is already the reindex
code that can fix it. A coherency / fixing / recover of a table command
would probably be a useful tool anyway.

>As an outsider (like you?) I see the current model as a design flaw as
>well. A neat and tidy model on paper. Not so nice in real life. The
>need to vacuum in batch mode, to keep the database from dying, seems
>intuitively bad.
>
>
We have a script that vacuums the database every 5 minutes, excessive -
yes, but turns out that any less is no good really.  I think that this
is sub optimal, the DB work keeps running, but the vacuum can slow down
other tasks.  It also probably flushes data that we would need out of
the page cache so it can look at data that isn't used often as the
vacuum runs. Not the most optimal data access pattern I could imagine.

>I think there must be answers to this problem. Even simple
>optimizations, such as defining a table such that any delete or update
>within a table, upon commit, will attempt to vacuum just the rows that
>should not be considered free for any new transactions. If it's in
>use by an active transaction, oh well. It can be picked up by a batch
>run of vacuum. If it's free though - let's do it now.
>
>
Anything would be good - I think it's the achilles  heel of postgres.
Perhaps there is something simple like that could fix 95% of the problem.

>I think any optimizations we come up with, will be more happily accepted
>with a working patch that causes no breakage... :-)
>
>
>
I am sure they would.

Cheers
Ralph


Re: 'Real' auto vacuum?

From
"Mindaugas Riauba"
Date:
> > When a row is orphaned it's added to a list of possibly available rows.
> > When a new row is needed the list of possible rows is examined and the
> > first one with a transaction id less then the lowest running transaction
> > id is chosen to be the new row?  These rows can be in a heap so it's
> > really fast to find one.
>
> This is the long-term plan.    However, it's actually a lot harder than it
> sounds.  Patches welcome.

  Some ETA? Since that would be the most welcome addition for us. We
have few very heavily updated databases where table bloat and constant
vacuuming is killing performance.

  Mindaugas


Re: 'Real' auto vacuum?

From
Brad Nicholson
Date:
Mindaugas Riauba wrote:

>>>When a row is orphaned it's added to a list of possibly available rows.
>>>When a new row is needed the list of possible rows is examined and the
>>>first one with a transaction id less then the lowest running transaction
>>>id is chosen to be the new row?  These rows can be in a heap so it's
>>>really fast to find one.
>>>
>>>
>>This is the long-term plan.    However, it's actually a lot harder than it
>>sounds.  Patches welcome.
>>
>>
>
>  Some ETA? Since that would be the most welcome addition for us. We
>have few very heavily updated databases where table bloat and constant
>vacuuming is killing performance.
>
>
>
How often are you vacuuming (the definition of 'constantly' tends to
vary)?  Are you vacuuming the whole database each time?  If so, identify
which tables are being updated frequently, and vacuum those often.
Vacuum other tables less frequently.

Also, are you you using VACUUM FULL (if so, you certainly don't want to be).

--
Brad Nicholson  416-673-4106    bnichols@ca.afilias.info
Database Administrator, Afilias Canada Corp.