Thread: Vacuum, visibility maps and SKIP_PAGES_THRESHOLD
I wonder if we have tested the reasoning behind having SKIP_PAGES_THRESHOLD
and the magic number of 32 currently assigned to it. While looking at the
code after a long time and doing some tests, I realized that a manual VACUUM
will always scan the first 31 pages of a relation that has not received any
write activity since the last VACUUM. On closer inspection, I realized that
this hook was put in deliberately, so that we use the visibility map only
when we see at least SKIP_PAGES_THRESHOLD worth of consecutive all-visible
pages, to take advantage of possible OS sequential-scan optimizations.

My statistical skills are limited, but wouldn't that mean that for fairly
well distributed write activity across a large table, even 3-4%
updates/deletes would most likely put a not-all-visible page in every run of
32 pages? That would mean that almost the entire relation gets scanned even
though the visibility map tells us that only 3-4% of the pages require
scanning. And the probability increases with the percentage of
updated/deleted tuples. Given that the likelihood of anyone running VACUUM
(manually or through autovacuum settings) on a table with less than 3-4%
updates/deletes is very low, I am worried that we might be losing all the
advantages of visibility maps for a fairly common use case.

Do we have any numbers to prove that what we have today is good? Sorry, I
may not have followed the discussions very closely in the past, and I am not
sure if this has been debated/tested already.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> My statistical skills are limited, but wouldn't that mean that for a
> fairly well distributed write activity across a large table, if there
> are even 3-4% update/deletes, we would most likely hit a
> not-all-visible page for every 32 pages scanned?

Huh?  With a typical table density of several dozen tuples per page, an
update ratio in that range would mean that just about every page would
have something for VACUUM to do, if the modified tuples are evenly
distributed.  The case where the skip optimization has some use is where
there are large "cold" sections that have no changes at all.

Having said that, I don't know how carefully we tested different values
for SKIP_PAGES_THRESHOLD.

			regards, tom lane
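[Tom's point can be made concrete with a little arithmetic. The sketch below is a back-of-the-envelope model, not anything from the PostgreSQL source: it assumes modified tuples are spread independently and uniformly, and uses a hypothetical density of 50 tuples per page.]

```python
# Model: if a fraction u of tuples has been modified, a page holding
# t tuples remains all-visible with probability (1 - u)**t.

def all_visible_fraction(tuples_per_page: int, update_ratio: float) -> float:
    """Expected fraction of pages with nothing for VACUUM to do."""
    return (1.0 - update_ratio) ** tuples_per_page

# With ~50 tuples/page and only 3% of tuples updated, barely a fifth
# of the pages stay all-visible -- VACUUM visits almost everything,
# matching Tom's observation.
for u in (0.01, 0.03, 0.05):
    frac = all_visible_fraction(50, u)
    print(f"update ratio {u:.0%}: {frac:.1%} of pages all-visible")
```

With a 3% update ratio the model gives roughly 22% all-visible pages, and at 5% under 8%, so even before considering SKIP_PAGES_THRESHOLD, evenly distributed updates leave little for the visibility map to skip.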
2011/5/27 Pavan Deolasee <pavan.deolasee@gmail.com>:
> I wonder if we have tested the reasoning behind having
> SKIP_PAGES_THRESHOLD and the magic number of 32 assigned to it
> currently. While looking at the code after a long time and doing some
> tests, I realized that a manual VACUUM would always scan the first 31
> pages of a relation which has not received any write activity since
> the last VACUUM. On closer inspection, I realized that we have
> deliberately put in this hook to ensure that we use visibility maps
> only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
> sequential pages to take advantage of possible OS seq scan
> optimizations.
>
> My statistical skills are limited, but wouldn't that mean that for a
> fairly well distributed write activity across a large table, if there
> are even 3-4% update/deletes, we would most likely hit a
> not-all-visible page for every 32 pages scanned? That would mean that

The page skip is still based on the VM. So you are wondering what the
chances are of the VM not being up to date when we access it?

> almost the entire relation will be scanned even if the visibility map
> tells us that only 3-4% of pages require scanning? And the probability
> will increase with the increase in the percentage of updated/deleted
> tuples. Given that the likelihood of anyone calling VACUUM (manually
> or through autovac settings) on a table which has less than 3-4%
> updates/deletes is very low, I am worried that we might be losing all
> the advantages of visibility maps for a fairly common use case.
>
> Do we have any numbers to prove that what we have today is good? Sorry, I
> may not have followed the discussions very closely in the past and am not
> sure if this has been debated/tested already.
>
> Thanks,
> Pavan
>
> --
> Pavan Deolasee
> EnterpriseDB     http://www.enterprisedb.com
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
On 27.05.2011 16:52, Pavan Deolasee wrote:
> On closer inspection, I realized that we have
> deliberately put in this hook to ensure that we use visibility maps
> only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
> sequential pages to take advantage of possible OS seq scan
> optimizations.

That, and the fact that if you skip any page, you can't advance
relfrozenxid.

> My statistical skills are limited, but wouldn't that mean that for a
> fairly well distributed write activity across a large table, if there
> are even 3-4% update/deletes, we would most likely hit a
> not-all-visible page for every 32 pages scanned? That would mean that
> almost the entire relation will be scanned even if the visibility map
> tells us that only 3-4% of pages require scanning? And the probability
> will increase with the increase in the percentage of updated/deleted
> tuples. Given that the likelihood of anyone calling VACUUM (manually
> or through autovac settings) on a table which has less than 3-4%
> updates/deletes is very low, I am worried that we might be losing all
> the advantages of visibility maps for a fairly common use case.

Well, as with normal queries, it's usually faster to just seqscan the
whole table if you need to access more than a few percent of the pages,
because sequential I/O is so much faster than random I/O. The visibility
map really only helps if all the updates are limited to some part of the
table. For example, if only recent records are updated frequently, and
old ones are almost never touched.

> Do we have any numbers to prove that what we have today is good? Sorry, I
> may not have followed the discussions very closely in the past and am not
> sure if this has been debated/tested already.

I think that number was chosen quite arbitrarily. When you consider
advancing relfrozenxid, it's a bit difficult to decide what the optimal
value would be; if you decide to skip pages, you might have to perform
an extra anti-wraparound vacuum somewhere down the line.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
2011/5/27 Cédric Villemain <cedric.villemain.debian@gmail.com>:
> 2011/5/27 Pavan Deolasee <pavan.deolasee@gmail.com>:
>> My statistical skills are limited, but wouldn't that mean that for a
>> fairly well distributed write activity across a large table, if there
>> are even 3-4% update/deletes, we would most likely hit a
>> not-all-visible page for every 32 pages scanned? That would mean that
>
> The page skip is still based on the VM. So you are wondering what the
> chances are of the VM not being up to date when we access it?

Re-reading the mails and answers, I see that I misunderstood the case
you described.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
On Fri, May 27, 2011 at 7:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pavan Deolasee <pavan.deolasee@gmail.com> writes:
>> My statistical skills are limited, but wouldn't that mean that for a
>> fairly well distributed write activity across a large table, if there
>> are even 3-4% update/deletes, we would most likely hit a
>> not-all-visible page for every 32 pages scanned?
>
> Huh?  With a typical table density of several dozen tuples per page, an
> update ratio in that range would mean that just about every page would
> have something for VACUUM to do, if the modified tuples are evenly
> distributed.  The case where the skip optimization has some use is where
> there are large "cold" sections that have no changes at all.

I was pretty sure that I would have done my maths wrong :-) So that
means that even for a far smaller update ratio, we would pretty much
scan every block and vacuum many of them for typical well-distributed
updates. Hmm. That means the idea of a single-pass vacuum is
interesting even after visibility maps.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
On Fri, May 27, 2011 at 7:11 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Well, as with normal queries, it's usually faster to just seqscan the whole
> table if you need to access more than a few percent of the pages, because
> sequential I/O is so much faster than random I/O.

Well, it's not strictly random access; you're still reading
sequentially, you're just skipping some pages. It'll never be slower
than a sequential scan, it just might not be any faster. In my testing,
reading every 8th page took exactly as long as reading every page,
which makes sense because the drive still has to seek to every track
exactly as if you were reading sequentially. IIRC, reading less than
every 8th page started seeing a speedup.

>> Do we have any numbers to prove that what we have today is good? Sorry, I
>> may not have followed the discussions very closely in the past and am not
>> sure if this has been debated/tested already.
>
> I think that number was chosen quite arbitrarily. When you consider advancing
> relfrozenxid, it's a bit difficult to decide what the optimal value would
> be; if you decide to skip pages you might have to perform an extra
> anti-wraparound vacuum somewhere down the line.

It would be nice if the VM had a bit for "all-frozen", but that
wouldn't help much except in the case of truly cold data. We could
perhaps keep the frozen data per segment or per VM page (which covers
a large section of the table), which would at least mean that we would
have a fixed amount of data become vacuum-dirty when a tuple is
updated, rather than a whole table which could be arbitrarily large.

Separately, it's a bit strange that we actually have to visit the
pages. We have all the information we need in the VM to determine
whether there's a run of 32 vacuum-clean pages. Why can't we look at
the next 32 pages, and if they're all vacuum-clean, skip looking at
the heap at all for them? What we do now is run the regular vacuum
algorithm, and only after we've processed 32 pages in a row realize
that it was a waste of effort.

--
greg
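[The effect Greg describes can be illustrated with a simplified model of the skip rule. This is a sketch, not the actual vacuumlazy.c logic: it models only the net behavior that a run of all-visible pages is skipped if and only if the run is at least SKIP_PAGES_THRESHOLD pages long.]

```python
SKIP_PAGES_THRESHOLD = 32  # the constant under discussion

def pages_scanned(all_visible: list[bool],
                  threshold: int = SKIP_PAGES_THRESHOLD) -> int:
    """Count heap pages VACUUM reads under a simplified model of the
    skip rule: a run of all-visible pages is skipped only if the run
    is at least `threshold` pages long; shorter runs are read anyway,
    and every not-all-visible page is of course read."""
    scanned = i = 0
    n = len(all_visible)
    while i < n:
        if all_visible[i]:
            run = i
            while run < n and all_visible[run]:
                run += 1            # measure the all-visible run
            if run - i < threshold:
                scanned += run - i  # run too short: read it anyway
            i = run
        else:
            scanned += 1            # page needs vacuuming
            i += 1
    return scanned

# One dirty page in every 32: no run ever reaches the threshold,
# so the entire table is read even though ~97% of it is all-visible.
vm = [(i % 32) != 31 for i in range(3200)]
print(pages_scanned(vm))  # prints 3200
```

This is exactly Pavan's worry: evenly spaced dirty pages defeat the threshold entirely, while the same number of dirty pages clustered at the end of the table would let almost everything be skipped.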
On Fri, May 27, 2011 at 7:41 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 27.05.2011 16:52, Pavan Deolasee wrote:
>> On closer inspection, I realized that we have
>> deliberately put in this hook to ensure that we use visibility maps
>> only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
>> sequential pages to take advantage of possible OS seq scan
>> optimizations.
>
> That, and the fact that if you skip any page, you can't advance
> relfrozenxid.

Hmm. For a significantly large table, wouldn't it be the case that we
would most likely skip a page somewhere? Would it be better to instead
do a full scan every once in a while, rather than relying on a
not-so-well-understood heuristic?

> Well, as with normal queries, it's usually faster to just seqscan the whole
> table if you need to access more than a few percent of the pages, because
> sequential I/O is so much faster than random I/O. The visibility map really
> only helps if all the updates are limited to some part of the table.

The vacuum scan is not a completely random scan either, so I am not
sure how much a complete seq scan would win. Maybe we need to run some
tests to measure that too before we choose one over the other.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
On Fri, May 27, 2011 at 11:10 AM, Greg Stark <gsstark@mit.edu> wrote:
> It would be nice if the VM had a bit for "all-frozen", but that
> wouldn't help much except in the case of truly cold data. We could
> perhaps keep the frozen data per segment or per VM page (which covers
> a large section of the table), which would at least mean that we would
> have a fixed amount of data become vacuum-dirty when a tuple is
> updated, rather than a whole table which could be arbitrarily large.

Instead of just having one bit, it might be useful to have a
relfrozenxid counter for each, say, 64MB chunk, rather than just one
for the whole table. At least in theory, that would give us the
possibility of freezing only the portions of the table that were most
urgently in need of it. I'm not sure exactly what algorithm we'd want
to apply, though.

In general, ISTM that the problem with VACUUM is that we don't know
whether we're "keeping up" or "getting behind". For
checkpoint_completion_target, we measure how fast we're writing pages
relative to when the checkpoint needs to be done. We write faster if
we get behind, where "behind" can mean either that checkpoint_segments
is going to expire too soon, or that checkpoint_timeout is going to
expire too soon. VACUUM has a very similar problem: operations that
use XIDs or create dead tuples create the need for maintenance which
VACUUM then performs. We want to vacuum fast enough to keep up with
the work, but not so fast that we tax the I/O subsystem more than
necessary. But unlike the checkpoint process, vacuum's decision-making
is all local: it has no idea whether it's keeping up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
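[Robert's per-chunk relfrozenxid idea could be sketched roughly as below. All names here are hypothetical; nothing like this exists in PostgreSQL, and the plain integer comparison deliberately ignores XID wraparound arithmetic.]

```python
CHUNK_PAGES = 8192  # 64 MB of 8 KB pages -- the chunk size Robert suggests

class ChunkFreezeMap:
    """Sketch of per-chunk relfrozenxid bookkeeping: the table-wide
    relfrozenxid is the minimum over all chunks, and only chunks whose
    oldest unfrozen XID has fallen behind the freeze cutoff need an
    anti-wraparound scan."""

    def __init__(self) -> None:
        # chunk number -> oldest unfrozen xid seen in that chunk
        self.oldest_xid: dict[int, int] = {}

    def note_tuple(self, page: int, xmin: int) -> None:
        """Record a tuple's xmin against its chunk."""
        chunk = page // CHUNK_PAGES
        cur = self.oldest_xid.get(chunk)
        if cur is None or xmin < cur:
            self.oldest_xid[chunk] = xmin

    def chunks_needing_freeze(self, cutoff_xid: int) -> list[int]:
        """Only these chunks would need to be scanned for freezing."""
        return sorted(c for c, xid in self.oldest_xid.items()
                      if xid < cutoff_xid)

    def relfrozenxid(self) -> int:
        """Table-wide value: the minimum across chunks."""
        return min(self.oldest_xid.values())
```

The design point is the one Robert names: a hot update dirties (for freezing purposes) only its 64MB chunk, not the whole table, so anti-wraparound work becomes proportional to how much of the table actually aged.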
On Fri, May 27, 2011 at 8:40 PM, Greg Stark <gsstark@mit.edu> wrote:
> Separately, it's a bit strange that we actually have to visit the
> pages. We have all the information we need in the VM to determine
> whether there's a run of 32 vacuum-clean pages. Why can't we look at
> the next 32 pages, and if they're all vacuum-clean, skip looking at
> the heap at all for them? What we do now is run the regular vacuum
> algorithm, and only after we've processed 32 pages in a row realize
> that it was a waste of effort.

Maybe we want to have visibility map APIs to test whether a chunk of
pages is all-visible or not. We could then use that API to test every
N blocks (where N is the number at which a continuous sequential scan
would still be better than a sequential scan with gaps) and either
read all of them sequentially or skip all of them.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
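[The range-test API Pavan proposes might look like the sketch below. Both function names are hypothetical illustrations (no such VM API existed at the time): consult only the VM bits, chunk by chunk, and emit the heap ranges that actually need reading.]

```python
def vm_all_visible_range(vm_bits: list[bool], start: int, count: int) -> bool:
    """Hypothetical VM API: True iff pages [start, start+count) are all
    marked all-visible -- decided from the VM alone, without touching
    the heap pages themselves."""
    return all(vm_bits[start:start + count])

def plan_scan(vm_bits: list[bool], chunk: int = 32) -> list[tuple[int, int]]:
    """Walk the VM in chunks of `chunk` pages and return (start, count)
    ranges of heap pages that need to be read; fully all-visible chunks
    are skipped without ever visiting the heap."""
    ranges = []
    for start in range(0, len(vm_bits), chunk):
        count = min(chunk, len(vm_bits) - start)
        if not vm_all_visible_range(vm_bits, start, count):
            ranges.append((start, count))
    return ranges

# One dirty page in the middle chunk: only that chunk is read.
vm = [True] * 32 + [False] + [True] * 63
print(plan_scan(vm))  # prints [(32, 32)]
```

Reading whole chunks (or skipping them wholesale) preserves the sequential-I/O property that SKIP_PAGES_THRESHOLD was aiming for, while avoiding Greg's complaint that the current code only discovers a skippable run after processing it.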
Heikki Linnakangas wrote:
> On 27.05.2011 16:52, Pavan Deolasee wrote:
> > On closer inspection, I realized that we have
> > deliberately put in this hook to ensure that we use visibility maps
> > only when we see at least SKIP_PAGES_THRESHOLD worth of all-visible
> > sequential pages to take advantage of possible OS seq scan
> > optimizations.
>
> That, and the fact that if you skip any page, you can't advance
> relfrozenxid.
>
> Well, as with normal queries, it's usually faster to just seqscan the
> whole table if you need to access more than a few percent of the pages,
> because sequential I/O is so much faster than random I/O. The visibility
> map really only helps if all the updates are limited to some part of the
> table. For example, if only recent records are updated frequently,
> and old ones are almost never touched.

I realize we just read the pages from the kernel to maintain sequential
I/O, but do we actually read the contents of a page if we know it
doesn't need vacuuming? If so, do we need to?

--
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ It's impossible for everything to be true. +
On 03.06.2011 22:16, Bruce Momjian wrote:
> I realize we just read the pages from the kernel to maintain sequential
> I/O, but do we actually read the contents of a page if we know it
> doesn't need vacuuming?

Yes.

> If so, do we need to?

Not necessarily, but it allows us to freeze old tuples, and doesn't
cost much anyway.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Jun 3, 2011 8:38 PM, "Bruce Momjian" <bruce@momjian.us> wrote:
>
> I realize we just read the pages from the kernel to maintain sequential
> I/O, but do we actually read the contents of the page if we know it
> doesn't need vacuuming? If so, do we need to?

I don't follow. What's your question?

Tom's final version does basically the optimal combination of the
above, I think.