Thread: vacuum analyze again...
Hi,

In Oracle, there are two ways to do the equivalent of vacuum analyze:

  * analyze table xx compute statistics
  * analyze table xx estimate statistics

In the second form, you can tell what percentage of the table your stats
will be computed on. This is useful for a quick analyze on huge tables
that have a homogeneous dispersion.

Is there a way to "estimate" the statistics that vacuum analyze will
use instead of "computing" them?

--
Jean-Christophe Boggio                       cat@thefreecat.org
Independent Consultant and Developer
Delphi, Linux, Perl, PostgreSQL
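[For reference, the Oracle commands being described look roughly like this;
the 10 percent sample size is only an illustration, not a figure from the post:]

    analyze table xx compute statistics;
    analyze table xx estimate statistics sample 10 percent;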
> Hi,
>
> In Oracle, there are two ways to do the equivalent of vacuum analyze:
>
> * analyze table xx compute statistics
> * analyze table xx estimate statistics
>
> In the second form, you can tell what percentage of the table your stats
> will be computed on. This is useful for a quick analyze on huge tables
> that have a homogeneous dispersion.
>
> Is there a way to "estimate" the statistics that vacuum analyze will
> use instead of "computing" them?

No, we have no ability to randomly pick rows to use for estimating
statistics. Should we have this ability?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> No, we have no ability to randomly pick rows to use for estimating
> statistics. Should we have this ability?

That would be really slick, especially given the fact that VACUUM runs
much faster than VACUUM ANALYZE for a lot of PG users. I could change
my daily maintenance scripts to do a VACUUM of everything, followed by
a VACUUM ANALYZE of the small tables, followed by a VACUUM ANALYZE
ESTIMATE (or whatever) of the large tables.

Even cooler would be the ability to set a table size threshold, so
that VACUUM ANALYZE would automatically choose the appropriate method
based on the table size.

Chris

--
chris@mt.sri.com
-----------------------------------------------------
Chris Jones                          SRI International, Inc.
                                     www.sri.com
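[A sketch of the maintenance pass being described, assuming the proposed
ESTIMATE keyword (not existing syntax) and placeholder table names:]

    VACUUM;                             -- everything: reclaims space, runs fast
    VACUUM ANALYZE small_table;         -- full statistics on the small tables
    VACUUM ANALYZE ESTIMATE big_table;  -- proposed sampled statistics on big tables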
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
> > No, we have no ability to randomly pick rows to use for estimating
> > statistics. Should we have this ability?
>
> That would be really slick, especially given the fact that VACUUM runs
> much faster than VACUUM ANALYZE for a lot of PG users. I could change
> my daily maintenance scripts to do a VACUUM of everything, followed by
> a VACUUM ANALYZE of the small tables, followed by a VACUUM ANALYZE
> ESTIMATE (or whatever) of the large tables.
>
> Even cooler would be the ability to set a table size threshold, so
> that VACUUM ANALYZE would automatically choose the appropriate method
> based on the table size.

Added to TODO:

    * Allow ANALYZE to process a certain random percentage of rows

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian writes:

> No, we have no ability to randomly pick rows to use for estimating
> statistics. Should we have this ability?

How's reading a sufficiently large fraction of random rows going to be
significantly faster than reading all rows? If you're just going to read
the first n rows then that isn't really random, is it?

--
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/
> Bruce Momjian writes:
>
> > No, we have no ability to randomly pick rows to use for estimating
> > statistics. Should we have this ability?
>
> How's reading a sufficiently large fraction of random rows going to be
> significantly faster than reading all rows? If you're just going to read
> the first n rows then that isn't really random, is it?

Ingres did this too, I thought. You could specify a certain percentage
of random rows, perhaps 10%. On a large table, that is often good enough
and much faster. Often 2% is enough.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:

>> How's reading a sufficiently large fraction of random rows going to be
>> significantly faster than reading all rows? If you're just going to read
>> the first n rows then that isn't really random, is it?

> Ingres did this too, I thought. You could specify a certain percentage
> of random rows, perhaps 10%. On a large table, that is often good enough
> and much faster. Often 2% is enough.

Peter's got a good point though. Even 2% is going to mean fetching most
or all of the blocks in the table, for typical-size rows. Furthermore,
fetching (say) every second or third block is likely to be actually
slower than a straight sequential read, because you're now fighting the
kernel's readahead policy instead of working with it.

To get a partial VACUUM ANALYZE that was actually usefully faster than
the current code, I think you'd have to read just a few percent of the
blocks, which means much less than a few percent of the rows ... unless
maybe you picked selected blocks but then used all the rows in those
blocks ... but is that a random sample? It's debatable.

I find it hard to believe that VAC ANALYZE is all that much slower than
plain VACUUM anyway; fixing the indexes is the slowest part of VACUUM in
my experience. It would be useful to know exactly what the columns are
in a table where VAC ANALYZE is considered unusably slow.

			regards, tom lane
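[To put a rough number on that point: if a table packs about 20 rows into each
block, a uniform 2% row sample is still expected to touch roughly a third of
the blocks. The figures below are illustrative only, not from the thread:]

    -- a block is left completely untouched with probability (1 - 0.02)^20,
    -- so the expected fraction of blocks that must be read is
    SELECT 1 - power(1 - 0.02, 20) AS expected_block_fraction;
    -- => about 0.33, i.e. roughly a third of the table's blocks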
> To get a partial VACUUM ANALYZE that was actually usefully faster than
> the current code, I think you'd have to read just a few percent of the
> blocks, which means much less than a few percent of the rows ... unless
> maybe you picked selected blocks but then used all the rows in those
> blocks ... but is that a random sample? It's debatable.
>
> I find it hard to believe that VAC ANALYZE is all that much slower than
> plain VACUUM anyway; fixing the indexes is the slowest part of VACUUM in
> my experience. It would be useful to know exactly what the columns are
> in a table where VAC ANALYZE is considered unusably slow.

VACUUM ANALYZE does a huge number of adt/ function calls. It must be
those calls that make ANALYZE slower. People report ANALYZE is
certainly slower, and that is the only difference.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:

>> I find it hard to believe that VAC ANALYZE is all that much slower than
>> plain VACUUM anyway; fixing the indexes is the slowest part of VACUUM in
>> my experience. It would be useful to know exactly what the columns are
>> in a table where VAC ANALYZE is considered unusably slow.

> VACUUM ANALYZE does a huge number of adt/ function calls. It must be
> those calls that make ANALYZE slower. People report ANALYZE is
> certainly slower, and that is the only difference.

That's why I'm asking what the data is. The function calls per se can't
be that slow; I think there must be some datatype-specific issue.

With TOAST in the mix, TOAST fetches could very well be an issue, but
I didn't think 7.1 was being discussed ...

			regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
> >> I find it hard to believe that VAC ANALYZE is all that much slower than
> >> plain VACUUM anyway; fixing the indexes is the slowest part of VACUUM in
> >> my experience. It would be useful to know exactly what the columns are
> >> in a table where VAC ANALYZE is considered unusably slow.
>
> > VACUUM ANALYZE does a huge number of adt/ function calls. It must be
> > those calls that make ANALYZE slower. People report ANALYZE is
> > certainly slower, and that is the only difference.
>
> That's why I'm asking what the data is. The function calls per se can't
> be that slow; I think there must be some datatype-specific issue.
>
> With TOAST in the mix, TOAST fetches could very well be an issue, but
> I didn't think 7.1 was being discussed ...

I would love to hear what the issue is with ANALYZE. There isn't much
going on with ANALYZE except the function calls.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian writes:

> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >
> > > No, we have no ability to randomly pick rows to use for
> > > estimating statistics. Should we have this ability?
> >
> > That would be really slick, especially given the fact that VACUUM
> > runs much faster than VACUUM ANALYZE for a lot of PG users. I
> > could change my daily maintenance scripts to do a VACUUM of
> > everything, followed by a VACUUM ANALYZE of the small tables,
> > followed by a VACUUM ANALYZE ESTIMATE (or whatever) of the large
> > tables.
> >
> > Even cooler would be the ability to set a table size threshold,
> > so that VACUUM ANALYZE would automatically choose the appropriate
> > method based on the table size.
>
> Added to TODO:
>
>     * Allow ANALYZE to process a certain random percentage of rows

Does this reduced analysis need to be random? Why not allow the DBA
to specify which rows or blocks to use in some way?

--
Pete Forman                       -./\.-   Disclaimer: This post is originated
WesternGeco                         -./\.-   by myself and does not represent
pete.forman@westerngeco.com          -./\.-   opinion of Schlumberger, Baker
http://www.crosswinds.net/~petef      -./\.-   Hughes or their divisions.
> > Added to TODO:
> >
> >     * Allow ANALYZE to process a certain random percentage of rows
>
> Does this reduced analysis need to be random? Why not allow the DBA
> to specify which rows or blocks to use in some way?

No, we are not about to add the kitchen sink here. If you really want
to control the statistics values, just update pg_attribute and
pg_statistic directly.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
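[A rough sketch of what that manual tweak might look like; attdispersion is
assumed from the 7.0/7.1-era catalogs and may differ by version, and the table
and column names are placeholders:]

    -- hypothetical example only: 'bigtable' and 'somecol' are placeholders,
    -- and attdispersion is the assumed 7.0/7.1 pg_attribute statistics column
    UPDATE pg_attribute
       SET attdispersion = 0.01
     WHERE attrelid = (SELECT oid FROM pg_class WHERE relname = 'bigtable')
       AND attname = 'somecol';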