Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD
Date
Msg-id BANLkTinOg-q-r7Zfvsdgs+h7OruK5nAV3Q@mail.gmail.com
Whole thread Raw
In response to Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Fri, May 27, 2011 at 7:41 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 27.05.2011 16:52, Pavan Deolasee wrote:
>>
>> On closer inspection, I realized that we have
>> deliberately put in this hook to ensure that we use visibility maps
>> only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
>> sequential pages to take advantage of possible OS seq scan
>> optimizations.
>
> That, and the fact that if you skip any page, you can't advance
> relfrozenxid.

Hmm. For a significantly large table, wouldn't it be the case that we
would most likely skip one page somewhere ? Would it be better that we
instead do a full scan every once in a while instead of relying on a
not-so-well-understood heuristic ?

>
>> My statistical skills are limited, but wouldn't that mean that for a
>> fairly well distributed write activity across a large table, if there
>> are even 3-4% update/deletes, we would most likely hit a
>> not-all-visible page for every 32 pages scanned ? That would mean that
>> almost entire relation will be scanned even if the visibility map
>> tells us that only 3-4% pages require scanning ?  And the probability
>> will increase with the increase in the percentage of updated/deleted
>> tuples. Given that the likelihood of anyone calling VACUUM (manually
>> or through autovac settings) on a table which has less than 3-4%
>> updates/deletes is very low, I am worried that might be loosing all
>> advantages of visibility maps for a fairly common use case.
>
> Well, as with normal queries, it's usually faster to just seqscan the whole
> table if you need to access more than a few percent of the pages, because
> sequential I/O is so much faster than random I/O. The visibility map really
> only helps if all the updates are limited to some part of the table.

The vacuum scan is not a complete random scan. So I am not sure how
effective a complete seq scan be. May be we need to run some tests to
measure that too before we choose one over the other.

Thanks,
Pavan


--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD
Next
From: "Kevin Grittner"
Date:
Subject: Re: dblink crash on PPC