Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD
Date
Msg-id 201106031916.p53JGTC27199@momjian.us
In response to Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD
List pgsql-hackers
Heikki Linnakangas wrote:
> On 27.05.2011 16:52, Pavan Deolasee wrote:
> > On closer inspection, I realized that we have
> > deliberately put in this hook to ensure that we use visibility maps
> > only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
> > sequential pages to take advantage of possible OS seq scan
> > optimizations.
> 
> That, and the fact that if you skip any page, you can't advance 
> relfrozenxid.
> 
> > My statistical skills are limited, but wouldn't that mean that for a
> > fairly well distributed write activity across a large table, if there
> > are even 3-4% updates/deletes, we would most likely hit a
> > not-all-visible page for every 32 pages scanned? That would mean that
> > almost the entire relation will be scanned even if the visibility map
> > tells us that only 3-4% of pages require scanning? And the probability
> > will increase with the increase in the percentage of updated/deleted
> > tuples. Given that the likelihood of anyone calling VACUUM (manually
> > or through autovac settings) on a table which has less than 3-4%
> > updates/deletes is very low, I am worried that we might be losing all
> > advantages of visibility maps for a fairly common use case.
> 
> Well, as with normal queries, it's usually faster to just seqscan the 
> whole table if you need to access more than a few percent of the pages, 
> because sequential I/O is so much faster than random I/O. The visibility 
> map really only helps if all the updates are limited to some part of the 
> table. For example, if only recent records are updated frequently, 
> and old ones are almost never touched.
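
The estimate quoted above can be checked with a rough simulation. The sketch below is not PostgreSQL code; it assumes a simplified model in which vacuum skips a run of all-visible pages only when the run is at least SKIP_PAGES_THRESHOLD (32) pages long, and in which not-all-visible pages are scattered uniformly at random. The function names and the table size are hypothetical and chosen only for illustration.

```python
import random

SKIP_PAGES_THRESHOLD = 32  # run length discussed in the thread


def pages_scanned(all_visible, threshold=SKIP_PAGES_THRESHOLD):
    """Count pages read when only runs of >= threshold all-visible pages are skipped.

    all_visible is a sequence of booleans, one per page (simplified model).
    """
    scanned = 0
    run = 0  # length of the current run of all-visible pages
    for visible in all_visible:
        if visible:
            run += 1
        else:
            if run < threshold:
                scanned += run  # run too short to skip, so it is read anyway
            scanned += 1  # the not-all-visible page itself
            run = 0
    if run < threshold:
        scanned += run  # trailing short run
    return scanned


def simulate(n_pages, dirty_fraction, seed=0):
    """Fraction of pages read when dirty pages are uniformly scattered."""
    rng = random.Random(seed)
    all_visible = [rng.random() >= dirty_fraction for _ in range(n_pages)]
    return pages_scanned(all_visible) / n_pages


if __name__ == "__main__":
    for f in (0.01, 0.03, 0.04, 0.10):
        # Chance that a given window of 32 consecutive pages is entirely all-visible.
        window_clean = (1 - f) ** SKIP_PAGES_THRESHOLD
        print(f"{f:.0%} dirty: P(32-page window all-visible) = {window_clean:.3f}, "
              f"fraction scanned = {simulate(200_000, f):.1%}")
```

Passing a list in which the not-all-visible pages are confined to one end of the table, for example `[True] * 9000 + [False] * 1000`, illustrates the clustered case described above: under this model only the 1000 dirty pages are read.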

I realize we just read the pages from the kernel to maintain sequential
I/O, but do we actually read the contents of the page if we know it
doesn't need vacuuming?  If so, do we need to?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

