Re: pg_upgrade and statistics - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: pg_upgrade and statistics
Msg-id 20120315000908.GB26534@momjian.us
In response to Re: pg_upgrade and statistics  (Bruce Momjian <bruce@momjian.us>)
Responses Re: pg_upgrade and statistics
List pgsql-hackers
On Tue, Mar 13, 2012 at 09:15:52PM -0400, Bruce Momjian wrote:
> On Tue, Mar 13, 2012 at 08:22:51PM -0400, Bruce Momjian wrote:
> > On Tue, Mar 13, 2012 at 05:33:29PM -0500, Kevin Grittner wrote:
> > > Bruce Momjian <bruce@momjian.us> wrote:
> > >
> > > > What is the target=10 duration?  I think 10 is as low as we can
> > > > acceptably recommend.  Should we recommend they run vacuumdb
> > > > twice, once with default_statistics_target = 4, and another with
> > > > the default?
> > >
> > > Here are the results at various settings.
> > >
> > > 1   :  172198.892 ms
> > > 2   :  295536.814 ms
> > > 4   :  474319.826 ms
> > > 10  :  750458.312 ms
> > > 100 :  3433794.609 ms
> >
> > Thanks, good numbers to know.
>
> OK, new crazy idea.  Kevin has shown that his database can get a single
> bucket in 2.8 minutes.  What if we have pg_upgrade create a vacuumdb
> script that generates increasingly accurate statistics, e.g. it runs for
> default_statistics_target values of 1, 10, and default (100).  That
> would give basic statistics quickly (2.8 minutes), and full statistics
> in an hour, for Kevin's database.
>
>     PGOPTIONS='-c default_statistics_target=1' vacuumdb --all --analyze-only
>     PGOPTIONS='-c default_statistics_target=10' vacuumdb --all --analyze-only
>     vacuumdb --all --analyze-only
>
> The only problem I see is that users who use non-default statistics per
> table would not be affected by the increasing default_statistics_target
> values.
>
> The upside is this would work for all releases of Postgres.

OK, I have modified pg_upgrade with the attached patch to do exactly
this.  I have also attached the script pg_upgrade creates that should be
run instead of vacuumdb.
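
For illustration only, here is a minimal sketch of what a staged script of this kind might look like. It is not the attached script. The function names stage_command and run_stage and the DRY_RUN switch are assumptions made up for this example; the vacuumdb invocations are the ones quoted above. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: staged ANALYZE after pg_upgrade, coarse statistics first.
# DRY_RUN is a hypothetical switch for this example; with the default of 1
# the vacuumdb commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

# Build the command line for one stage. An empty argument means "use the
# server's configured default_statistics_target".
stage_command() {
    if [ -n "$1" ]; then
        echo "PGOPTIONS='-c default_statistics_target=$1' vacuumdb --all --analyze-only"
    else
        echo "vacuumdb --all --analyze-only"
    fi
}

# Print a heading, then print or run the command for that stage.
run_stage() {
    echo "$1"
    cmd=$(stage_command "$2")
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        # The command string is built only from the literals below.
        eval "$cmd"
    fi
}

run_stage "Generating minimal optimizer statistics (1 target)" 1
run_stage "Generating medium optimizer statistics (10 targets)" 10
run_stage "Generating default (full) optimizer statistics (100 targets?)" ""
echo "Done"
```

Running it with DRY_RUN=0 would execute the three passes in order, so the server has coarse statistics after the first pass and full statistics once the last one completes.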

Based on Kevin's numbers above, the first vacuumdb pass (1 target) will
finish in about 2.8 minutes, the 10-target pass after about 15 minutes,
and the 100-target pass after about 72 minutes (times are cumulative).
Here is what the output looks like:

    Generating minimal optimizer statistics (1 target)
    --------------------------------------------------
    vacuumdb: vacuuming database "postgres"
    vacuumdb: vacuuming database "template1"
    vacuumdb: vacuuming database "test"

    The server is now available with minimal optimizer statistics.
    Query performance will be optimal once this script completes.

    Generating medium optimizer statistics (10 targets)
    ---------------------------------------------------
    vacuumdb: vacuuming database "postgres"
    vacuumdb: vacuuming database "template1"
    vacuumdb: vacuuming database "test"

    Generating default (full) optimizer statistics (100 targets?)
    -------------------------------------------------------------
    vacuumdb: vacuuming database "postgres"
    vacuumdb: vacuuming database "template1"
    vacuumdb: vacuuming database "test"

    Done
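
As a quick check on the cumulative times quoted above, the three measured durations for targets 1, 10, and 100 can be summed and converted to minutes. The helper name cumulative_minutes is an assumption for this sketch; the millisecond figures are Kevin's.

```shell
# Print running totals, in minutes, of durations given in milliseconds.
cumulative_minutes() {
    printf '%s\n' "$@" |
        LC_ALL=C awk '{ total += $1; printf "%.1f\n", total / 60000 }'
}

# Kevin's timings for default_statistics_target = 1, 10, and 100.
cumulative_minutes 172198.892 750458.312 3433794.609
# prints:
# 2.9
# 15.4
# 72.6
```

These round to the 2.8, 15, and 72 minute figures given in the text.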

If we don't want to try migrating the statistics from the old system,
this seems like the best approach.

Does anyone know how bad the queries will be with only one target?

I did check whether vacuumdb --analyze-only was somehow being throttled
by the vacuum settings, but I saw the drive at 100% utilization while
analyzing a 36GB table on a server with 24GB of RAM, so it seems to be
I/O bound.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
