Thread: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Pavan Deolasee

I wonder if we have tested the reasoning behind having
SKIP_PAGES_THRESHOLD and the magic number of 32 assigned to it
currently. While looking at the code after a long time and doing some
tests, I realized that a manual VACUUM would always scan the first 31
pages of a relation which has not received any write activity since
the last VACUUM. On closer inspection, I realized that we have
deliberately put in this hook to ensure that we use visibility maps
only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
sequential pages to take advantage of possible OS seq scan
optimizations.

My statistical skills are limited, but wouldn't that mean that for a
fairly well distributed write activity across a large table, if there
are even 3-4% update/deletes, we would most likely hit a
not-all-visible page for every 32 pages scanned ? That would mean that
almost the entire relation will be scanned even if the visibility map
tells us that only 3-4% of the pages require scanning ? And the probability
will increase with the increase in the percentage of updated/deleted
tuples. Given that the likelihood of anyone calling VACUUM (manually
or through autovac settings) on a table which has less than 3-4%
updates/deletes is very low, I am worried that we might be losing all
the advantages of visibility maps for a fairly common use case.
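
As a quick sanity check on my own arithmetic, here is a toy standalone
simulation of the skip rule as I understand it: each page is
not-all-visible independently with probability p, and a page is only
skipped once SKIP_PAGES_THRESHOLD consecutive all-visible pages have
been seen. This is just a model (and the 3% here is a fraction of
pages, not tuples), not the actual vacuumlazy.c code:

#include <stdio.h>
#include <stdlib.h>

#define SKIP_PAGES_THRESHOLD 32

int
main(void)
{
    const long  npages = 1000000;   /* pages in the table */
    const double p = 0.03;          /* fraction of not-all-visible pages */
    long        scanned = 0;
    long        skipped = 0;
    int         streak = 0;         /* consecutive all-visible pages seen */
    long        i;

    srand(42);
    for (i = 0; i < npages; i++)
    {
        int all_visible = ((double) rand() / ((double) RAND_MAX + 1.0)) >= p;

        if (all_visible)
        {
            streak++;
            if (streak >= SKIP_PAGES_THRESHOLD)
            {
                skipped++;          /* the VM lets us skip this page */
                continue;
            }
        }
        else
            streak = 0;

        scanned++;                  /* VACUUM reads this page */
    }

    /* Under this model, the chance of skipping a given page is (1 - p)^32. */
    printf("scanned %ld, skipped %ld of %ld pages (%.1f%% scanned)\n",
           scanned, skipped, npages, 100.0 * scanned / npages);
    return 0;
}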

Do we have any numbers to prove what we have today is good ? Sorry, I
may not have followed the discussions very closely in the past and not
sure if this has been debated/tested already.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com


Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Tom Lane

Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> My statistical skills are limited, but wouldn't that mean that for a
> fairly well distributed write activity across a large table, if there
> are even 3-4% update/deletes, we would most likely hit a
> not-all-visible page for every 32 pages scanned ?

Huh?  With a typical table density of several dozen tuples per page, an
update ratio in that range would mean that just about every page would
have something for VACUUM to do, if the modified tuples are evenly
distributed.  The case where the skip optimization has some use is where
there are large "cold" sections that have no changes at all.
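
(For instance, at 60 tuples per page with 3% of the tuples modified and
spread uniformly, the chance that a given page has no modified tuple is
roughly 0.97^60, i.e. about 16%, so something like 84% of the pages
would have work for VACUUM.)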

Having said that, I don't know how carefully we tested different values
for SKIP_PAGES_THRESHOLD.
        regards, tom lane


Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Cédric Villemain

2011/5/27 Pavan Deolasee <pavan.deolasee@gmail.com>:
> I wonder if we have tested the reasoning behind having
> SKIP_PAGES_THRESHOLD and the magic number of 32 assigned to it
> currently. While looking at the code after a long time and doing some
> tests, I realized that a manual VACUUM would always scan the first 31
> pages of a relation which has not received any write activity since
> the last VACUUM. On closer inspection, I realized that we have
> deliberately put in this hook to ensure that we use visibility maps
> only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
> sequential pages to take advantage of possible OS seq scan
> optimizations.
>
> My statistical skills are limited, but wouldn't that mean that for a
> fairly well distributed write activity across a large table, if there
> are even 3-4% update/deletes, we would most likely hit a
> not-all-visible page for every 32 pages scanned ? That would mean that

The page skip is still based on the VM.
So you are wondering what the chances are of the VM not being up to date when we access it?

> almost the entire relation will be scanned even if the visibility map
> tells us that only 3-4% of the pages require scanning ? And the probability
> will increase with the increase in the percentage of updated/deleted
> tuples. Given that the likelihood of anyone calling VACUUM (manually
> or through autovac settings) on a table which has less than 3-4%
> updates/deletes is very low, I am worried that we might be losing all
> the advantages of visibility maps for a fairly common use case.
>
> Do we have any numbers to prove what we have today is good ? Sorry, I
> may not have followed the discussions very closely in the past and not
> sure if this has been debated/tested already.
>
> Thanks,
> Pavan



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Heikki Linnakangas

On 27.05.2011 16:52, Pavan Deolasee wrote:
> On closer inspection, I realized that we have
> deliberately put in this hook to ensure that we use visibility maps
> only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
> sequential pages to take advantage of possible OS seq scan
> optimizations.

That, and the fact that if you skip any page, you can't advance 
relfrozenxid.

> My statistical skills are limited, but wouldn't that mean that for a
> fairly well distributed write activity across a large table, if there
> are even 3-4% update/deletes, we would most likely hit a
> not-all-visible page for every 32 pages scanned ? That would mean that
> almost the entire relation will be scanned even if the visibility map
> tells us that only 3-4% of the pages require scanning ? And the probability
> will increase with the increase in the percentage of updated/deleted
> tuples. Given that the likelihood of anyone calling VACUUM (manually
> or through autovac settings) on a table which has less than 3-4%
> updates/deletes is very low, I am worried that we might be losing all
> the advantages of visibility maps for a fairly common use case.

Well, as with normal queries, it's usually faster to just seqscan the 
whole table if you need to access more than a few percent of the pages, 
because sequential I/O is so much faster than random I/O. The visibility 
map really only helps if all the updates are limited to some part of the 
table. For example, if only recent records are updated frequently,
and old ones are almost never touched.

> Do we have any numbers to prove what we have today is good ? Sorry, I
> may not have followed the discussions very closely in the past and not
> sure if this has been debated/tested already.

I think that number was chosen quite arbitrarily. When you consider
updating relfrozenxid, it's a bit difficult to decide what the optimal
value would be; if you decide to skip pages you might have to perform an
extra anti-wraparound vacuum somewhere down the line.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Cédric Villemain

2011/5/27 Cédric Villemain <cedric.villemain.debian@gmail.com>:
> 2011/5/27 Pavan Deolasee <pavan.deolasee@gmail.com>:
>> I wonder if we have tested the reasoning behind having
>> SKIP_PAGES_THRESHOLD and the magic number of 32 assigned to it
>> currently. While looking at the code after a long time and doing some
>> tests, I realized that a manual VACUUM would always scan the first 31
>> pages of a relation which has not received any write activity since
>> the last VACUUM. On closer inspection, I realized that we have
>> deliberately put in this hook to ensure that we use visibility maps
>> only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
>> sequential pages to take advantage of possible OS seq scan
>> optimizations.
>>
>> My statistical skills are limited, but wouldn't that mean that for a
>> fairly well distributed write activity across a large table, if there
>> are even 3-4% update/deletes, we would most likely hit a
>> not-all-visible page for every 32 pages scanned ? That would mean that
>
> The page skip is still based on the VM.
> So you are wondering what the chances are of the VM not being up to date when we access it?

Re-reading the mails and answers, I see that I misunderstood the case you described.

>
>> almost the entire relation will be scanned even if the visibility map
>> tells us that only 3-4% of the pages require scanning ? And the probability
>> will increase with the increase in the percentage of updated/deleted
>> tuples. Given that the likelihood of anyone calling VACUUM (manually
>> or through autovac settings) on a table which has less than 3-4%
>> updates/deletes is very low, I am worried that we might be losing all
>> the advantages of visibility maps for a fairly common use case.
>>
>> Do we have any numbers to prove what we have today is good ? Sorry, I
>> may not have followed the discussions very closely in the past and not
>> sure if this has been debated/tested already.
>>
>> Thanks,
>> Pavan



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Pavan Deolasee

On Fri, May 27, 2011 at 7:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pavan Deolasee <pavan.deolasee@gmail.com> writes:
>> My statistical skills are limited, but wouldn't that mean that for a
>> fairly well distributed write activity across a large table, if there
>> are even 3-4% update/deletes, we would most likely hit a
>> not-all-visible page for every 32 pages scanned ?
>
> Huh?  With a typical table density of several dozen tuples per page, an
> update ratio in that range would mean that just about every page would
> have something for VACUUM to do, if the modified tuples are evenly
> distributed.  The case where the skip optimization has some use is where
> there are large "cold" sections that have no changes at all.
>

I was pretty sure that I would have done my maths wrong :-) So that
means, even for a far lower update ratio, we would pretty much scan
every block and vacuum many of them for typical well-distributed
updates. Hmm. That means the idea of a single-pass vacuum is
interesting even with visibility maps.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com


Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Greg Stark

On Fri, May 27, 2011 at 7:11 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Well, as with normal queries, it's usually faster to just seqscan the whole
> table if you need to access more than a few percent of the pages, because
> sequential I/O is so much faster than random I/O.

Well, it's not strictly random access: you're still reading
sequentially, you're just skipping some pages. It'll never be slower
than a sequential scan; it just might not be any faster. In my testing,
reading every 8th page took exactly as long as reading every page,
which makes sense, as the drive still has to seek to every track
exactly as if you were reading sequentially. IIRC a speedup only
started to appear when reading less than one page in every eight.
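
For anyone who wants to repeat that sort of test, a rough standalone
sketch is below (the file path and 8 kB block size are arbitrary, and
you would want to drop the OS cache between runs; this is not a
rigorous benchmark):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BLCKSZ 8192

int
main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/tmp/bigfile";
    int         step = (argc > 2) ? atoi(argv[2]) : 8;  /* read every Nth block */
    char        buf[BLCKSZ];
    struct timespec t0, t1;
    off_t       off = 0;
    long        nread = 0;
    int         fd = open(path, O_RDONLY);

    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (pread(fd, buf, BLCKSZ, off) == BLCKSZ)
    {
        nread++;
        off += (off_t) step * BLCKSZ;   /* skip ahead to the next sampled block */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("read %ld blocks (every %dth), %.2f s\n", nread, step,
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    close(fd);
    return 0;
}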

>> Do we have any numbers to prove what we have today is good ? Sorry, I
>> may not have followed the discussions very closely in the past and not
>> sure if this has been debated/tested already.
>
> I think that number was chosen quite arbitrarily. When you consider updating
> relfrozenxid, it's a bit difficult to decide what the optimal value would
> be; if you decide to skip pages you might have to perform an extra
> anti-wraparound vacuum somewhere down the line.

It would be nice if the VM had a bit for "all-frozen", but that
wouldn't help much except in the case of truly cold data. We could
perhaps keep the frozen state per segment or per VM page (which covers
a large section of the table), which would at least mean that only a
fixed amount of data becomes vacuum-dirty when a tuple is updated,
rather than a whole table which could be arbitrarily large.

Separately, it's a bit strange that we actually have to visit the
pages. We have all the information we need in the VM to determine
whether there's a run of 32 vacuum-clean pages. Why can't we look at
the next 32 pages and, if they're all vacuum-clean, skip looking at
the heap for them entirely? What we do now is run the regular vacuum
algorithm and only after we've processed 32 pages in a row realize
that it was a waste of effort.



-- 
greg


Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Pavan Deolasee

On Fri, May 27, 2011 at 7:41 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 27.05.2011 16:52, Pavan Deolasee wrote:
>>
>> On closer inspection, I realized that we have
>> deliberately put in this hook to ensure that we use visibility maps
>> only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
>> sequential pages to take advantage of possible OS seq scan
>> optimizations.
>
> That, and the fact that if you skip any page, you can't advance
> relfrozenxid.

Hmm. For a significantly large table, wouldn't it be the case that we
would most likely skip at least one page somewhere? Would it be better
to do a full scan every once in a while instead of relying on a
not-so-well-understood heuristic?

>
>> My statistical skills are limited, but wouldn't that mean that for a
>> fairly well distributed write activity across a large table, if there
>> are even 3-4% update/deletes, we would most likely hit a
>> not-all-visible page for every 32 pages scanned ? That would mean that
>> almost the entire relation will be scanned even if the visibility map
>> tells us that only 3-4% of the pages require scanning ? And the probability
>> will increase with the increase in the percentage of updated/deleted
>> tuples. Given that the likelihood of anyone calling VACUUM (manually
>> or through autovac settings) on a table which has less than 3-4%
>> updates/deletes is very low, I am worried that we might be losing all
>> the advantages of visibility maps for a fairly common use case.
>
> Well, as with normal queries, it's usually faster to just seqscan the whole
> table if you need to access more than a few percent of the pages, because
> sequential I/O is so much faster than random I/O. The visibility map really
> only helps if all the updates are limited to some part of the table.

The vacuum scan is not a completely random scan, so I am not sure how
effective a complete seq scan would be. Maybe we need to run some tests
to measure that too before we choose one over the other.

Thanks,
Pavan


--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com


Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Robert Haas

On Fri, May 27, 2011 at 11:10 AM, Greg Stark <gsstark@mit.edu> wrote:
> It would be nice if the VM had a bit for "all-frozen", but that
> wouldn't help much except in the case of truly cold data. We could
> perhaps keep the frozen state per segment or per VM page (which covers
> a large section of the table), which would at least mean that only a
> fixed amount of data becomes vacuum-dirty when a tuple is updated,
> rather than a whole table which could be arbitrarily large.

Instead of just having one bit, it might be useful to have a
relfrozenxid counter for each, say, 64MB chunk, rather than just one
for the whole table.  At least in theory, that would give us the
possibility of freezing only portions of the table that were most
urgently in need of it.  I'm not sure exactly what algorithm we'd
want to apply, though.
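
Purely as an illustration of the bookkeeping (plain C stand-ins, and
ignoring the fact that real XID comparisons have to be
wraparound-aware), something along these lines:

#include <stdint.h>
#include <stdio.h>

#define PAGES_PER_CHUNK 8192        /* 64 MB of 8 kB pages */

typedef struct
{
    uint32_t    oldest_unfrozen_xid;    /* approximate, per chunk */
} ChunkFreezeInfo;

int
main(void)
{
    /* hypothetical per-chunk freeze horizons for a 4-chunk table */
    ChunkFreezeInfo chunks[] = {{1000}, {40}, {520}, {990}};
    int         nchunks = 4;
    uint32_t    current_xid = 1100;
    uint32_t    freeze_age = 500;   /* freeze chunks older than this */
    int         i;

    for (i = 0; i < nchunks; i++)
    {
        uint32_t    age = current_xid - chunks[i].oldest_unfrozen_xid;

        if (age > freeze_age)
            printf("chunk %d (pages %d..%d) needs freezing, age %u\n",
                   i, i * PAGES_PER_CHUNK, (i + 1) * PAGES_PER_CHUNK - 1,
                   (unsigned) age);
    }
    return 0;
}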

In general, ISTM that the problem with VACUUM is that we don't know
whether we're "keeping up" or "getting behind".  For
checkpoint_completion_target, we measure how fast we're writing pages
relative to when the checkpoint needs to be done.  We write faster if
we get behind, where behind can mean either that checkpoint_segments
is going to expire too soon, or that checkpoint_timeout is going to
expire too soon.  VACUUM has a very similar problem: operations that
use XIDs or create dead tuples create the need for maintenance which
VACUUM then performs.  We want to vacuum fast enough to keep up with
the work, but not so fast that we tax the I/O subsystem more than
necessary.  But unlike the checkpoint process, vacuum's
decision-making is all local: it has no idea whether it's keeping up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Pavan Deolasee

On Fri, May 27, 2011 at 8:40 PM, Greg Stark <gsstark@mit.edu> wrote:
>
> Separately, it's a bit strange that we actually have to visit the
> pages. We have all the information we need in the VM to determine
> whether there's a run of 32 vacuum-clean pages. Why can't we look at
> the next 32 pages and, if they're all vacuum-clean, skip looking at
> the heap for them entirely? What we do now is run the regular vacuum
> algorithm and only after we've processed 32 pages in a row realize
> that it was a waste of effort.
>

Maybe we want to have visibility map APIs to test whether a chunk of
pages is all-visible or not. We can then use that API on every chunk
of N blocks (where N is the size at which a continuous sequential scan
would still be better than a sequential scan with gaps) and either
read all of them sequentially or just skip all of them.
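
Something like the following toy loop is what I have in mind. Here
vm[] is just a byte per heap page standing in for the visibility map
(nonzero = all-visible), and chunk_all_visible() is a stand-in for the
hypothetical range-test API; N is 32 only as an example:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 32               /* N: blocks tested and skipped together */

/* Stand-in for a range test on the visibility map: is every page in
 * [start, start + n) marked all-visible? */
static bool
chunk_all_visible(const unsigned char *vm, long start, long n)
{
    long    i;

    for (i = start; i < start + n; i++)
        if (!vm[i])
            return false;
    return true;
}

int
main(void)
{
    long            npages = 256;
    unsigned char   vm[256];
    long            scanned = 0;
    long            blkno;

    memset(vm, 1, sizeof(vm));      /* everything all-visible ... */
    vm[70] = 0;                     /* ... except a couple of pages */
    vm[200] = 0;

    for (blkno = 0; blkno < npages; blkno += CHUNK_SIZE)
    {
        long    n = (npages - blkno < CHUNK_SIZE) ? npages - blkno : CHUNK_SIZE;

        if (chunk_all_visible(vm, blkno, n))
            continue;               /* skip the whole chunk: no heap reads */

        scanned += n;               /* otherwise read the chunk sequentially */
    }

    printf("scanned %ld of %ld pages\n", scanned, npages);
    return 0;
}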

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com


Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Bruce Momjian

Heikki Linnakangas wrote:
> On 27.05.2011 16:52, Pavan Deolasee wrote:
> > On closer inspection, I realized that we have
> > deliberately put in this hook to ensure that we use visibility maps
> > only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
> > sequential pages to take advantage of possible OS seq scan
> > optimizations.
> 
> That, and the fact that if you skip any page, you can't advance 
> relfrozenxid.
> 
> > My statistical skills are limited, but wouldn't that mean that for a
> > fairly well distributed write activity across a large table, if there
> > are even 3-4% update/deletes, we would most likely hit a
> > not-all-visible page for every 32 pages scanned ? That would mean that
> > almost the entire relation will be scanned even if the visibility map
> > tells us that only 3-4% of the pages require scanning ? And the probability
> > will increase with the increase in the percentage of updated/deleted
> > tuples. Given that the likelihood of anyone calling VACUUM (manually
> > or through autovac settings) on a table which has less than 3-4%
> > updates/deletes is very low, I am worried that we might be losing all
> > the advantages of visibility maps for a fairly common use case.
> 
> Well, as with normal queries, it's usually faster to just seqscan the 
> whole table if you need to access more than a few percent of the pages, 
> because sequential I/O is so much faster than random I/O. The visibility 
> map really only helps if all the updates are limited to some part of the 
> table. For example, if only recent records are updated frequently,
> and old ones are almost never touched.

I realize we just read the pages from the kernel to maintain sequential
I/O, but do we actually read the contents of the page if we know it
doesn't need vacuuming?  If so, do we need to?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Heikki Linnakangas

On 03.06.2011 22:16, Bruce Momjian wrote:
> I realize we just read the pages from the kernel to maintain sequential
> I/O, but do we actually read the contents of the page if we know it
> doesn't need vacuuming?

Yes.

>  If so, do we need to?

Not necessarily, but it allows us to freeze old tuples, and doesn't cost 
much anyway.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD

From: Greg Stark

On Jun 3, 2011 8:38 PM, "Bruce Momjian" <bruce@momjian.us> wrote:
> I realize we just read the pages from the kernel to maintain sequential
> I/O, but do we actually read the contents of the page if we know it
> doesn't need vacuuming?  If so, do we need to?

I don't follow. What's your question?

Tom's final version does basically the optimal combination of the above, I think.