Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From A.M.
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 54624.216.41.12.254.1151072992.squirrel@webmail.webopticon.org
Whole thread Raw
In response to Re: vacuum, performance, and MVCC  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
On Fri, June 23, 2006 9:56 am, Martijn van Oosterhout wrote:
> On Fri, Jun 23, 2006 at 03:30:45PM +0200, Csaba Nagy wrote:
>
>>> It sounds like you have a "big" problem and you need a "big"
>>> solution.
>>
>> Well, Postgres does a decent job as it is. The problem is under peek
>> load, sometimes it gets bogged down and the usual things like vacuum will
>> not help immediately. I think a few more features like the dead space
>> map for quick vacuum and even something like the original post's
>> proposition would make postgres fly under heavy load too...
>
> I know there have a been a number of suggestions in the past to deal
> with this thing. Some I don't remember being mentioned in this thread are:
>
>
> - Once a tuple has been determined to be invisible to everyone,
> truncate it to just the header. This would probably work wonders for
> frequently updated wide tables. However, this required keeping track of
> the oldest active xact, I'm not sure how that works at the moment.
>
> - Have the bgwriter do cleanup work before writing out a block. It
> could probably do the truncation bit above, but totally removing old tuples
> requires cleaning out the indexes too, which AIUI is the hard part of
> vacuuming.
>
> One totally whacked out idea I just thought of: Instead of just
> truncating tuples when they're invisible, mark them "dying" and make the
> data section store an array of CTIDs pointing to the index tuples pointing
> to it. Lookups that find the tuple via an index could store the CTID of
> the index tuple before continuing. If the bgwriter sees it has a full set,
> it can efficiently remove the tuple straight away.
>
> There are ofcourse drawbacks to this approach, you'd probably need
> something like the half-dirty pages to avoid a large increase in write
> load. If it's even beneficial at all given concurrency issues.

A lot of these recommendations sound like garbage collection ideas found
in modern programming languages. Perhaps it would be worth considering
allowing sessions to keep track of which pages they alter and spawn a
separate process per connection to sweep up slowly along the way.

Also, it's nice that vacuum now has slow-down settings, but why isn't
there a option to autovacuum during periods of idleness and pause when
busy?

-M




pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: xlog viewer proposal
Next
From: "Larry Rosenman"
Date:
Subject: Re: Anyone still care about Cygwin? (was Re: [CORE] GPL Source and Copyright Questions)