
From: Jan Wieck
Subject: Re: Why frequently updated tables are an issue
Date:
Msg-id: 40CB515F.3050303@Yahoo.com
In response to: Re: Why frequently updated tables are an issue (Shridhar Daithankar <shridhar@frodo.hserus.net>)
Responses: Re: Why frequently updated tables are an issue
List: pgsql-hackers
On 6/10/2004 10:37 AM, Shridhar Daithankar wrote:

> pgsql@mohawksoft.com wrote:
>> The session table is a different issue, but has the same problems. You
>> have an active website, hundreds or thousands of hits a second, and you
>> want to manage sessions for this site. Sessions are created, updated many
>> times, and deleted. Performance degrades steadily until a vacuum. Vacuum
>> has to be run VERY frequently. Prior to lazy vacuum, this was impossible.
>> 
>> Both session tables and summary tables have another thing in common: they
>> are not vital data, they hold transient state information. Yeah, sure,
>> data integrity is important, but if you lose these values, you can either
>> recreate them or they aren't too important.
>> 
>> Why put that in a database at all? Because, in the case of sessions
>> especially, you need to access this information for other operations. In
>> the case of summary tables, OLAP usually needs to join or include this
>> info.
>> 
>> PostgreSQL's behavior in these cases is poor. I don't think anyone who has
>> tried to use PG for this sort of thing will disagree, and yes, it is
>> getting better. Does anyone else consider this to be a problem? If so, I'm
>> open to suggestions on what can be done. I've suggested a number of things,
>> and admittedly they have all been pretty weak ideas, but they were
>> potentially workable.
> 
> There is another approach, so far considered infeasible and hence rejected.
> Vacuum in postgresql is tied to entire relations/objects since indexes do
> not have transaction visibility information.
> 
> It has been suggested in the past to add such visibility information to the
> index tuple header so that index and heap can be cleaned out of order. In
> such a case other background processes, such as the background writer and
> the soon-to-be-integrated autovacuum daemon, could vacuum pages/buffers
> rather than relations. That way the most used things would remain clean and
> the cost of cleanup would stay outside the critical transaction processing
> path.

This is not feasible because at the time you update or delete a row you 
would have to visit all of its index entries. The performance impact of 
that would be immense.

But a per-relation bitmap that tells whether a block is a) free of dead 
tuples and b) contains only frozen tuples could be used to let vacuum skip 
such blocks entirely (there can't be anything for it to do there). The bit 
would get reset whenever the block is marked dirty. This would cause vacuum 
to look mainly at recently touched blocks, which are likely to be found in 
the buffer cache anyway, and would thus dramatically reduce the amount of 
IO, making high-frequency vacuuming much less expensive.
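To make the idea concrete, here is a minimal sketch of such a bitmap in C. 
Everything here is illustrative: CleanMap and the function names are made 
up, not actual backend code, and a real implementation would have to live 
in shared memory and survive crashes (be WAL-logged or rebuilt on recovery).

#include <stdlib.h>

/* One bit per heap block; a set bit means "no dead tuples and all
 * remaining tuples frozen", so vacuum may skip the block. */
typedef struct CleanMap
{
    size_t         nblocks; /* number of heap blocks tracked */
    unsigned char *bits;    /* bitmap, 1 bit per block */
} CleanMap;

static CleanMap *
cleanmap_create(size_t nblocks)
{
    CleanMap *map = malloc(sizeof(CleanMap));

    map->nblocks = nblocks;
    map->bits = calloc((nblocks + 7) / 8, 1);  /* all blocks start unset */
    return map;
}

/* Vacuum sets the bit after it has cleaned and frozen a block. */
static void
cleanmap_set(CleanMap *map, size_t blk)
{
    map->bits[blk / 8] |= (unsigned char) (1 << (blk % 8));
}

/* Clearing piggybacks on marking the buffer dirty, so ordinary
 * updates pay almost nothing to maintain the map. */
static void
cleanmap_clear(CleanMap *map, size_t blk)
{
    map->bits[blk / 8] &= (unsigned char) ~(1 << (blk % 8));
}

/* Vacuum's scan loop visits only blocks whose bit is unset. */
static int
cleanmap_needs_vacuum(const CleanMap *map, size_t blk)
{
    return (map->bits[blk / 8] & (1 << (blk % 8))) == 0;
}

int
main(void)
{
    CleanMap *map = cleanmap_create(1024);

    cleanmap_set(map, 7);       /* vacuum cleaned block 7 */
    cleanmap_clear(map, 7);     /* a later UPDATE dirtied it again */

    return cleanmap_needs_vacuum(map, 7) ? 0 : 1;
}

The effect on the session table case above should be obvious: vacuum would 
keep revisiting the same handful of hot blocks, which sit in the buffer 
cache anyway, instead of scanning gigabytes of clean heap every time.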


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


