Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD
Date
Msg-id 4DDFB105.9010806@enterprisedb.com
Whole thread Raw
In response to Vacuum, visibility maps and SKIP_PAGES_THRESHOLD  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD  (Greg Stark <gsstark@mit.edu>)
Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On 27.05.2011 16:52, Pavan Deolasee wrote:
> On closer inspection, I realized that we have
> deliberately put in this hook to ensure that we use visibility maps
> only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
> sequential pages to take advantage of possible OS seq scan
> optimizations.

That, and the fact that if you skip any page, you can't advance 
relfrozenxid.

> My statistical skills are limited, but wouldn't that mean that for a
> fairly well distributed write activity across a large table, if there
> are even 3-4% update/deletes, we would most likely hit a
> not-all-visible page for every 32 pages scanned ? That would mean that
> almost entire relation will be scanned even if the visibility map
> tells us that only 3-4% pages require scanning ?  And the probability
> will increase with the increase in the percentage of updated/deleted
> tuples. Given that the likelihood of anyone calling VACUUM (manually
> or through autovac settings) on a table which has less than 3-4%
> updates/deletes is very low, I am worried that might be loosing all
> advantages of visibility maps for a fairly common use case.

Well, as with normal queries, it's usually faster to just seqscan the 
whole table if you need to access more than a few percent of the pages, 
because sequential I/O is so much faster than random I/O. The visibility 
map really only helps if all the updates are limited to some part of the 
table. For example, if you only recent records are updated frequently, 
and old ones are almost never touched.

> Do we have any numbers to prove what we have today is good ? Sorry, I
> may not have followed the discussions very closely in the past and not
> sure if this has been debated/tested already.

I think that number was chosen quite arbitrary. When you consider 
updating relfrozenxid, it's a bit difficult to decide what the optimal 
value would be; if you decide to skip pages you might have to perform an 
extra anti-wraparound somewhere down the line.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Cédric Villemain
Date:
Subject: Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD
Next
From: Tom Lane
Date:
Subject: Re: What is the best and easiest implementation to reliably wait for the completion of startup?