Re: really lazy vacuums? - Mailing list pgsql-hackers

From Jesper Krogh
Subject Re: really lazy vacuums?
Date
Msg-id 4D8268F2.7010303@krogh.cc
Whole thread Raw
In response to Re: really lazy vacuums?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: really lazy vacuums?
List pgsql-hackers
On 2011-03-17 15:02, Robert Haas wrote:
> On Thu, Mar 17, 2011 at 4:17 AM, Jesper Krogh<jesper@krogh.cc>  wrote:
>> Is it obvious that the visibillity map bits should track complete
>> pages and not individual tuples? If the visibillity map tracks at
>> page-level the benefit would fall on "slim tables" where you squeeze
>> 200 tuples into each page and having an update rate of 1% would
>> lower the likelyhood even more. (it may be that for slim tables the
>> index-only-scans are not as benefitial as to wide tables).
> I'm not sure exactly what MaxHeapTuplesPerPage works out to be, but
> say it's 200.  If you track visibility info per tuple rather than per
> page, then the size of the visibility map is going to expand by a
> factor of 200.  That might decrease contention, but otherwise it's a
> bad thing - the whole point of having the visibility map in the first
> place is that it's much, much smaller than the heap.  If it were the
> same size as the heap, we could just read the heap.  What the map
> attempts to accomplish is to allow us, by reading a small number of
> pages, to check whether the tuples we're thinking of reading are
> likely to be all-visible without actually looking at them.
Yes, that was sort of the math I was trying to make. I do allthough
belive that you have a way better feeling about it. But according
to this: 
http://wiki.postgresql.org/wiki/FAQ#How_much_database_disk_space_is_required_to_store_data_from_a_typical_text_file.3F


The bulk row-overhead is around 24bytes, which will with 1 bit per row 
give a
size reduction of 1:(24x8) ~1:192, worstcase... that gives at best 341 
tuples/page
(where each tuple, does not contain any data at all). With that ratio, the
visibillitymap of a relation of 10GB would fill 52MB on disk (still 
worst case)
and that by itself would by all means be awesome. (with that small tuples a
10GB relation would have around 42 billion tuples).

On the 1 bit per page the "best case" would be 341 times better than above
reducing the size of the visibiility map on a 10GB table to around 152KB 
which
is extremely small (and thus also awesome) But the consequenses of a single
update would mean that you loose visibilllity map benefit on 341 tuples in
one shot.

Worst case situations are, where we approach the 4 tuples per page, before
we hit toast where the ratio of space reduction in 1 bit per tuple would 
be:
1:(2048x8) ~ 1:16384 and the 1 bit per page is 4 times better.
In the 1 bit per tuple a visibillity map of a 10GB relation would be 
around 610KB
1 bit per page would then drop it to around 160KB.


Can we drag out some average-case numbers on row-size in the heap
from some real production systems?

I may have gotten something hugely wrong in above calculations and/or
have missed some important points.

-- 
Jesper


pgsql-hackers by date:

Previous
From: Radosław Smogura
Date:
Subject: 2nd Level Buffer Cache
Next
From: "Kevin Grittner"
Date:
Subject: Re: 2nd Level Buffer Cache