Re: really lazy vacuums? - Mailing list pgsql-hackers
From | Jesper Krogh
---|---
Subject | Re: really lazy vacuums?
Date |
Msg-id | 4D8268F2.7010303@krogh.cc
In response to | Re: really lazy vacuums? (Robert Haas <robertmhaas@gmail.com>)
Responses | Re: really lazy vacuums?
List | pgsql-hackers
On 2011-03-17 15:02, Robert Haas wrote:
> On Thu, Mar 17, 2011 at 4:17 AM, Jesper Krogh <jesper@krogh.cc> wrote:
>> Is it obvious that the visibility map bits should track complete
>> pages and not individual tuples? If the visibility map tracks at
>> page level, the benefit would fall on "slim tables" where you
>> squeeze 200 tuples into each page; an update rate of 1% would lower
>> the likelihood even more. (It may be that for slim tables,
>> index-only scans are not as beneficial as for wide tables.)
>
> I'm not sure exactly what MaxHeapTuplesPerPage works out to be, but
> say it's 200. If you track visibility info per tuple rather than per
> page, then the size of the visibility map is going to expand by a
> factor of 200. That might decrease contention, but otherwise it's a
> bad thing - the whole point of having the visibility map in the first
> place is that it's much, much smaller than the heap. If it were the
> same size as the heap, we could just read the heap. What the map
> attempts to accomplish is to allow us, by reading a small number of
> pages, to check whether the tuples we're thinking of reading are
> likely to be all-visible without actually looking at them.

Yes, that was sort of the math I was trying to make. I do believe,
though, that you have a much better feeling for it than I do. But
according to this:

http://wiki.postgresql.org/wiki/FAQ#How_much_database_disk_space_is_required_to_store_data_from_a_typical_text_file.3F

the bulk row overhead is around 24 bytes, which, with 1 bit per row,
gives a size reduction of 1:(24x8) ~ 1:192, worst case. That gives at
best 341 tuples/page (where each tuple contains no data at all). With
that ratio, the visibility map of a 10GB relation would fill 52MB on
disk (still worst case), and that by itself would by all means be
awesome. (With tuples that small, a 10GB relation would hold roughly
450 million tuples.)
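The per-tuple arithmetic above can be sketched as follows. This is only a back-of-the-envelope check, assuming an 8KB page size and the 24-byte minimum row overhead from the FAQ; using a binary 10GB the map comes out slightly above the 52MB quoted:

```python
# Rough size of a per-tuple visibility map for a 10GB relation,
# assuming 8KB heap pages and a 24-byte minimum row overhead.
PAGE_SIZE = 8192               # bytes per heap page
ROW_OVERHEAD = 24              # bytes per tuple (FAQ figure)
RELATION_SIZE = 10 * 1024**3   # 10GB relation, binary

tuples_per_page = PAGE_SIZE // ROW_OVERHEAD   # best case: empty tuples
total_tuples = RELATION_SIZE // ROW_OVERHEAD
map_bytes = total_tuples // 8                 # one bit per tuple

print(tuples_per_page)                 # 341
print(round(map_bytes / 1024**2, 1))   # 53.3 (MB; ~52MB with decimal GB)
print(total_tuples)                    # ~447 million tuples
```

The 1:192 reduction ratio falls straight out of 24 bytes x 8 bits per byte, which is why the map stays small even in this worst case.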
With 1 bit per page, the "best case" would be 341 times better than
the above, reducing the visibility map of a 10GB table to around
152KB, which is extremely small (and thus also awesome). But the
consequence of a single update would be that you lose the visibility
map benefit for 341 tuples in one shot.

Worst-case situations are where we approach 4 tuples per page (2048
bytes each), just before we hit TOAST. There the space reduction of 1
bit per tuple would be 1:(2048x8) ~ 1:16384, and 1 bit per page is
only 4 times better: a per-tuple visibility map of a 10GB relation
would be around 610KB, and 1 bit per page would drop it to around
152KB.

Can we drag out some average-case numbers on row size in the heap from
some real production systems? I may have gotten something hugely wrong
in the above calculations and/or missed some important points.

-- Jesper
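The per-page and wide-row numbers can be sketched the same way. Same assumptions as before (8KB pages, binary 10GB, so the absolute figures land a few percent above the decimal-GB values in the text); the wide-row worst case takes 2048-byte tuples, four per page:

```python
# Per-page visibility map vs. per-tuple map in the wide-row worst case,
# for a 10GB relation with 8KB pages.
PAGE_SIZE = 8192
RELATION_SIZE = 10 * 1024**3   # 10GB, binary

pages = RELATION_SIZE // PAGE_SIZE
per_page_map = pages // 8                  # one bit per page
print(per_page_map // 1024)                # 160 (KB; ~152KB decimal)

# Worst case for per-tuple tracking: 2048-byte rows, 4 per page,
# just before TOAST kicks in.
wide_tuples = pages * 4
per_tuple_map = wide_tuples // 8           # one bit per tuple
print(per_tuple_map // 1024)               # 640 (KB; ~610KB decimal)
print(per_tuple_map // per_page_map)       # 4 -- per-page is 4x smaller
```

The factor of 4 is exact regardless of the GB convention, since it is just the tuples-per-page count; only the absolute KB figures shift between binary and decimal gigabytes.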