Thread: vacuum analyze again...

vacuum analyze again...

From: Jean-Christophe Boggio
Hi,

In Oracle, there are two ways to do the equivalent of vacuum analyze:

* analyze table xx compute statistics
* analyze table xx estimate statistics

With the second form, you can specify what percentage of the table the
statistics will be computed from.  This is useful for a quick analyze of
huge tables whose data is homogeneously distributed.
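
For example, to sample 10% of the table (Oracle syntax, from memory;
big_table is a placeholder name):

    analyze table big_table compute statistics;
    analyze table big_table estimate statistics sample 10 percent;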

Is there a way to "estimate" the statistics that vacuum analyze will
use instead of "computing" them?

--
Jean-Christophe Boggio
cat@thefreecat.org
Independent Consultant and Developer
Delphi, Linux, Perl, PostgreSQL



Re: vacuum analyze again...

From: Bruce Momjian
> Hi,
>
> In Oracle, there are two ways to do the equivalent of vacuum analyze:
>
> * analyze table xx compute statistics
> * analyze table xx estimate statistics
>
> With the second form, you can specify what percentage of the table the
> statistics will be computed from.  This is useful for a quick analyze of
> huge tables whose data is homogeneously distributed.
>
> Is there a way to "estimate" the statistics that vacuum analyze will
> use instead of "computing" them?

No, we have no ability to randomly pick rows to use for estimating
statistics.  Should we have this ability?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: vacuum analyze again...

From: Chris Jones
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> No, we have no ability to randomly pick rows to use for estimating
> statistics.  Should we have this ability?

That would be really slick, especially given the fact that VACUUM runs
much faster than VACUUM ANALYZE for a lot of PG users.  I could change
my daily maintenance scripts to do a VACUUM of everything, followed by
a VACUUM ANALYZE of the small tables, followed by a VACUUM ANALYZE
ESTIMATE (or whatever) of the large tables.

Even cooler would be the ability to set a table size threshold, so
that VACUUM ANALYZE would automatically choose the appropriate method
based on the table size.
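
Something like this hypothetical sequence is what I have in mind (the
ESTIMATE syntax is made up, and the table names and the relpages
threshold are placeholders):

    -- everything gets a plain VACUUM
    VACUUM;
    -- small tables get full statistics
    VACUUM ANALYZE small_table;
    -- large tables get sampled statistics (hypothetical syntax)
    VACUUM ANALYZE ESTIMATE big_table;

    -- a size threshold could come from pg_class.relpages,
    -- which plain VACUUM keeps up to date:
    SELECT relname FROM pg_class
     WHERE relkind = 'r' AND relpages > 10000;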

Chris

--
chris@mt.sri.com -----------------------------------------------------
Chris Jones                                    SRI International, Inc.
                                                           www.sri.com

Re: vacuum analyze again...

From: Bruce Momjian
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
> > No, we have no ability to randomly pick rows to use for estimating
> > statistics.  Should we have this ability?
>
> That would be really slick, especially given the fact that VACUUM runs
> much faster than VACUUM ANALYZE for a lot of PG users.  I could change
> my daily maintenance scripts to do a VACUUM of everything, followed by
> a VACUUM ANALYZE of the small tables, followed by a VACUUM ANALYZE
> ESTIMATE (or whatever) of the large tables.
>
> Even cooler would be the ability to set a table size threshold, so
> that VACUUM ANALYZE would automatically choose the appropriate method
> based on the table size.

Added to TODO:

        * Allow ANALYZE to process a certain random percentage of rows

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: vacuum analyze again...

From: Peter Eisentraut
Bruce Momjian writes:

> No, we have no ability to randomly pick rows to use for estimating
> statistics.  Should we have this ability?

How's reading a sufficiently large fraction of random rows going to be
significantly faster than reading all rows?  If you're just going to read
the first n rows then that isn't really random, is it?

--
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/


Re: vacuum analyze again...

From: Bruce Momjian
> Bruce Momjian writes:
>
> > No, we have no ability to randomly pick rows to use for estimating
> > statistics.  Should we have this ability?
>
> How's reading a sufficiently large fraction of random rows going to be
> significantly faster than reading all rows?  If you're just going to read
> the first n rows then that isn't really random, is it?

Ingres did this too, I thought.  You could specify a certain percentage
of random rows, say 10%.  On a large table, that is often good enough
and much faster.  Often 2% is enough.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: vacuum analyze again...

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> How's reading a sufficiently large fraction of random rows going to be
>> significantly faster than reading all rows?  If you're just going to read
>> the first n rows then that isn't really random, is it?

> Ingres did this too, I thought.  You could specify a certain percentage
> of random rows, say 10%.  On a large table, that is often good enough
> and much faster.  Often 2% is enough.

Peter's got a good point though.  Even 2% is going to mean fetching most
or all of the blocks in the table, for typical-size rows.  Furthermore,
fetching (say) every second or third block is likely to be actually
slower than a straight sequential read, because you're now fighting the
kernel's readahead policy instead of working with it.
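
To put illustrative numbers on it: with 100-byte rows you get roughly 80
rows per 8K block, so the chance that a given block contains none of a
2% random row sample is about 0.98^80, or roughly 20%; you would still
end up reading about 80% of the blocks.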

To get a partial VACUUM ANALYZE that was actually usefully faster than
the current code, I think you'd have to read just a few percent of the
blocks, which means much less than a few percent of the rows ... unless
maybe you picked selected blocks but then used all the rows in those
blocks ... but is that a random sample?  It's debatable.
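
In SQL terms, the pick-blocks-then-take-all-their-rows idea looks like
the TABLESAMPLE SYSTEM clause that much later PostgreSQL releases grew;
nothing like it exists in the code under discussion, so this is a sketch
of the concept only:

    -- choose ~2% of the table's blocks at random and read
    -- every row in each chosen block
    SELECT count(*) FROM bigtab TABLESAMPLE SYSTEM (2);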

I find it hard to believe that VAC ANALYZE is all that much slower than
plain VACUUM anyway; fixing the indexes is the slowest part of VACUUM in
my experience.  It would be useful to know exactly what the columns are
in a table where VAC ANALYZE is considered unusably slow.

            regards, tom lane

Re: vacuum analyze again...

From: Bruce Momjian
> To get a partial VACUUM ANALYZE that was actually usefully faster than
> the current code, I think you'd have to read just a few percent of the
> blocks, which means much less than a few percent of the rows ... unless
> maybe you picked selected blocks but then used all the rows in those
> blocks ... but is that a random sample?  It's debatable.
>
> I find it hard to believe that VAC ANALYZE is all that much slower than
> plain VACUUM anyway; fixing the indexes is the slowest part of VACUUM in
> my experience.  It would be useful to know exactly what the columns are
> in a table where VAC ANALYZE is considered unusably slow.

VACUUM ANALYZE makes a huge number of calls to the data-type support
functions (the utils/adt/ routines).  It must be those calls that make
ANALYZE slower.  People report ANALYZE is certainly slower, and that is
the only difference.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: vacuum analyze again...

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> I find it hard to believe that VAC ANALYZE is all that much slower than
>> plain VACUUM anyway; fixing the indexes is the slowest part of VACUUM in
>> my experience.  It would be useful to know exactly what the columns are
>> in a table where VAC ANALYZE is considered unusably slow.

> VACUUM ANALYZE makes a huge number of calls to the data-type support
> functions (the utils/adt/ routines).  It must be those calls that make
> ANALYZE slower.  People report ANALYZE is certainly slower, and that is
> the only difference.

That's why I'm asking what the data is.  The function calls per se can't
be that slow; I think there must be some datatype-specific issue.

With TOAST in the mix, TOAST fetches could very well be an issue, but
I didn't think 7.1 was being discussed ...

            regards, tom lane

Re: vacuum analyze again...

From: Bruce Momjian
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> I find it hard to believe that VAC ANALYZE is all that much slower than
> >> plain VACUUM anyway; fixing the indexes is the slowest part of VACUUM in
> >> my experience.  It would be useful to know exactly what the columns are
> >> in a table where VAC ANALYZE is considered unusably slow.
>
> > VACUUM ANALYZE makes a huge number of calls to the data-type support
> > functions (the utils/adt/ routines).  It must be those calls that make
> > ANALYZE slower.  People report ANALYZE is certainly slower, and that
> > is the only difference.
>
> That's why I'm asking what the data is.  The function calls per se can't
> be that slow; I think there must be some datatype-specific issue.
>
> With TOAST in the mix, TOAST fetches could very well be an issue, but
> I didn't think 7.1 was being discussed ...
>

I would love to hear what the issue is with ANALYZE.  There isn't much
going on with ANALYZE except the function calls.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: vacuum analyze again...

From: Pete Forman
Bruce Momjian writes:
 > > Bruce Momjian <pgman@candle.pha.pa.us> writes:
 > >
 > > > No, we have no ability to randomly pick rows to use for
 > > > estimating statistics.  Should we have this ability?
 > >
 > > That would be really slick, especially given the fact that VACUUM
 > > runs much faster than VACUUM ANALYZE for a lot of PG users.  I
 > > could change my daily maintenance scripts to do a VACUUM of
 > > everything, followed by a VACUUM ANALYZE of the small tables,
 > > followed by a VACUUM ANALYZE ESTIMATE (or whatever) of the large
 > > tables.
 > >
 > > Even cooler would be the ability to set a table size threshold,
 > > so that VACUUM ANALYZE would automatically choose the appropriate
 > > method based on the table size.
 >
 > Added to TODO:
 >
 >         * Allow ANALYZE to process a certain random percentage of
 >           rows

Does this reduced analysis need to be random?  Why not allow the DBA to
specify in some way which rows or blocks to use?
--
Pete Forman                 -./\.- Disclaimer: This post is originated
WesternGeco                   -./\.-  by myself and does not represent
pete.forman@westerngeco.com     -./\.-  opinion of Schlumberger, Baker
http://www.crosswinds.net/~petef  -./\.-  Hughes or their divisions.

Re: vacuum analyze again...

From: Bruce Momjian
>  > Added to TODO:
>  >
>  >         * Allow ANALYZE to process a certain random percentage of
>  >           rows
>
> Does this reduced analysis need to be random?  Why not allow the DBA to
> specify in some way which rows or blocks to use?

No, we are not about to add the kitchen sink here.  If you really want
to control the statistics values, just update pg_attribute and
pg_statistic directly.
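
For instance, something like this sketch against the 7.x catalogs (if
memory serves, the per-column dispersion the planner reads lives in
pg_attribute.attdispersion there; table and column names are
placeholders, and a bad value can easily make plans worse):

    -- hand-set the dispersion for one column of one table
    UPDATE pg_attribute
       SET attdispersion = 0.01
     WHERE attrelid = (SELECT oid FROM pg_class WHERE relname = 'bigtab')
       AND attname = 'somecol';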

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026