Thread: Re: [GENERAL] Idea for the statistics collector

Re: [GENERAL] Idea for the statistics collector

From
Bruce Momjian
Date:
Martijn van Oosterhout wrote:
> Since it's currently all for collecting statistics on tables, why can't it
> collect another type of statistic, like:
> 
> - How often the estimator gets it wrong?
> 
> At the end of an index scan, the executor could compare the number of rows
> returned against what was estimated, and if it falls outside a certain
> range, flag it.
> 
> Also, the average ratio of rows coming out of a distinct node vs the number
> going in.
> 
> For a join clause, the amount of correlation between two columns (hard).
> 
> etc
> 
> Ideally, the planner could then use this info to make better plans.
> Eventually, the whole system could become somewhat self-tuning.
> 
> Does anyone see any problems with this?

[ Discussion moved to hackers.]

I have thought that some type of feedback from the executor back into
the optimizer would be a good feature.  Not sure how to do it, but your
idea makes sense.  It certainly could update the table statistics after
a sequential scan.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [GENERAL] Idea for the statistics collector

From
Neil Conway
Date:
On Thu, 20 Jun 2002 22:50:04 -0400 (EDT)
"Bruce Momjian" <pgman@candle.pha.pa.us> wrote:
> I have thought that some type of feedback from the executor back into
> the optimizer would be a good feature.  Not sure how to do it, but your
> idea makes sense.  It certainly could update the table statistics after
> a sequential scan.

Search the archives for a thread I started on -hackers called "self-tuning
histograms", which talks about a pretty similar idea. The technique there
applies only to histograms, and builds the histogram based *only* upon
the data provided by the executor.

Tom commented that it's probably a better idea to concentrate on more
elementary techniques, like multi-dimensional histograms, before starting
on ST histograms. I agree, and plan to look at multi-dimensional histograms
when I get some spare time.

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC


Re: [GENERAL] Idea for the statistics collector

From
Bruce Momjian
Date:
Neil Conway wrote:
> On Thu, 20 Jun 2002 22:50:04 -0400 (EDT)
> "Bruce Momjian" <pgman@candle.pha.pa.us> wrote:
> > I have thought that some type of feedback from the executor back into
> > the optimizer would be a good feature.  Not sure how to do it, but your
> > idea makes sense.  It certainly could update the table statistics after
> > a sequential scan.
> 
> Search the archives for a thread I started on -hackers called "self-tuning
> histograms", which talks about a pretty similar idea. The technique there
> applies only to histograms, and builds the histogram based *only* upon
> the data provided by the executor.
> 
> Tom commented that it's probably a better idea to concentrate on more
> elementary techniques, like multi-dimensional histograms, before starting
> on ST histograms. I agree, and plan to look at multi-dimensional histograms
> when I get some spare time.

I was thinking of something much more elementary, like a table that
reports to have 50 blocks but an executor sequential scan shows 500
blocks.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [GENERAL] Idea for the statistics collector

From
"Christopher Kings-Lynne"
Date:
I was thinking of writing a command line tool like 'pgtune' that looks at
the stats views and will generate SQL code for, or do automatically the
following:

* Dropping indices that are never used
* Creating appropriate indices to avoid large, expensive sequential scans.

This would put us in the 'mysql makes my indices for me by magic' league -
but would be far more powerful and flexible.  How to do multikey indices is
beyond me tho.

*sigh* I'm recovering from a septoplasty on my nose atm, so I might have
some time to do some coding!

Chris

----- Original Message -----
From: "Bruce Momjian" <pgman@candle.pha.pa.us>
To: "Martijn van Oosterhout" <kleptog@svana.org>
Cc: "PostgreSQL-development" <pgsql-hackers@postgresql.org>
Sent: Friday, June 21, 2002 10:50 AM
Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector


> Martijn van Oosterhout wrote:
> > Since it's currently all for collecting statistics on tables, why can't
it
> > collect another type of statistic, like:
> >
> > - How often the estimator gets it wrong?
> >
> > At the end of an index scan, the executor could compare the number of
rows
> > returned against what was estimated, and if it falls outside a certain
> > range, flag it.
> >
> > Also, the average ratio of rows coming out of a distinct node vs the
number
> > going in.
> >
> > For a join clause, the amount of correlation between two columns (hard).
> >
> > etc
> >
> > Ideally, the planner could then use this info to make better plans.
> > Eventually, the whole system could become somewhat self-tuning.
> >
> > Does anyone see any problems with this?
>
> [ Discussion moved to hackers.]
>
> I have thought that some type of feedback from the executor back into
> the optimizer would be a good feature.  Not sure how to do it, but your
> idea makes sense.  It certainly could update the table statistics after
> a sequential scan.
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>





Re: [GENERAL] Idea for the statistics collector

From
Bruce Momjian
Date:
Christopher Kings-Lynne wrote:
> I was thinking of writing a command line tool like 'pgtune' that looks at
> the stats views and will generate SQL code for, or do automatically the
> following:
> 
> * Dropping indices that are never used
> * Creating appropriate indices to avoid large, expensive sequential scans.
> 
> This would put us in the 'mysql makes my indices for me by magic' league -
> but would be far more powerful and flexible.  How to do multikey indices is
> beyond me tho.

This is a great idea.  I have been wanting to do something like this
myself but probably won't get the time.

Does MySQL really make indexes by magic?

Also, I had to look up the contraction for "will not" because I always
get that confused (won't).  I just found a web page on it:
http://www.straightdope.com/mailbag/mwont.html

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 




Re: [GENERAL] Idea for the statistics collector

From
Tom Lane
Date:
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> I was thinking of writing a command line tool like 'pgtune' that looks at
> the stats views and will generate SQL code for, or do automatically the
> following:

> * Dropping indices that are never used
> * Creating appropriate indices to avoid large, expensive sequential scans.

Dropping unused indices sounds good --- but beware of dropping unique
indexes; they may be there to enforce a constraint, and not because of
any desire to use them in queries.

I'm not sure how you're going to automatically intuit appropriate
indexes to add, though.  You'd need to look at a suitable workload
(ie, a representative set of queries) which is not data that's readily
available from the stats views.  Perhaps we could expect the DBA to
provide a segment of log output that includes debug_print_query
and show_query_stats results.
        regards, tom lane




Re: [GENERAL] Idea for the statistics collector

From
Bruce Momjian
Date:
Added to TODO:
* Add tool to query pg_stat_* tables and report indexes that aren't needed  or tables that might need indexes

---------------------------------------------------------------------------

Christopher Kings-Lynne wrote:
> I was thinking of writing a command line tool like 'pgtune' that looks at
> the stats views and will generate SQL code for, or do automatically the
> following:
> 
> * Dropping indices that are never used
> * Creating appropriate indices to avoid large, expensive sequential scans.
> 
> This would put us in the 'mysql makes my indices for me by magic' league -
> but would be far more powerful and flexible.  How to do multikey indices is
> beyond me tho.
> 
> *sigh* I'm recovering from a septoplasty on my nose atm, so I might have
> some time to do some coding!
> 
> Chris
> 
> ----- Original Message -----
> From: "Bruce Momjian" <pgman@candle.pha.pa.us>
> To: "Martijn van Oosterhout" <kleptog@svana.org>
> Cc: "PostgreSQL-development" <pgsql-hackers@postgresql.org>
> Sent: Friday, June 21, 2002 10:50 AM
> Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector
> 
> 
> > Martijn van Oosterhout wrote:
> > > Since it's currently all for collecting statistics on tables, why can't
> it
> > > collect another type of statistic, like:
> > >
> > > - How often the estimator gets it wrong?
> > >
> > > At the end of an index scan, the executor could compare the number of
> rows
> > > returned against what was estimated, and if it falls outside a certain
> > > range, flag it.
> > >
> > > Also, the average ratio of rows coming out of a distinct node vs the
> number
> > > going in.
> > >
> > > For a join clause, the amount of correlation between two columns (hard).
> > >
> > > etc
> > >
> > > Ideally, the planner could then use this info to make better plans.
> > > Eventually, the whole system could become somewhat self-tuning.
> > >
> > > Does anyone see any problems with this?
> >
> > [ Discussion moved to hackers.]
> >
> > I have thought that some type of feedback from the executor back into
> > the optimizer would be a good feature.  Not sure how to do it, but your
> > idea makes sense.  It certainly could update the table statistics after
> > a sequential scan.
> >
> > --
> >   Bruce Momjian                        |  http://candle.pha.pa.us
> >   pgman@candle.pha.pa.us               |  (610) 853-3000
> >   +  If your life is a hard drive,     |  830 Blythe Avenue
> >   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: Have you searched our list archives?
> >
> > http://archives.postgresql.org
> >
> 
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 
> 
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: [GENERAL] Idea for the statistics collector

From
Christopher Kings-Lynne
Date:
Good god - how old was that email? 2002???

Chris

Bruce Momjian wrote:
> Added to TODO:
> 
>     * Add tool to query pg_stat_* tables and report indexes that aren't needed
>       or tables that might need indexes
> 
> ---------------------------------------------------------------------------
> 
> Christopher Kings-Lynne wrote:
> 
>>I was thinking of writing a command line tool like 'pgtune' that looks at
>>the stats views and will generate SQL code for, or do automatically the
>>following:
>>
>>* Dropping indices that are never used
>>* Creating appropriate indices to avoid large, expensive sequential scans.
>>
>>This would put us in the 'mysql makes my indices for me by magic' league -
>>but would be far more powerful and flexible.  How to do multikey indices is
>>beyond me tho.
>>
>>*sigh* I'm recovering from a septoplasty on my nose atm, so I might have
>>some time to do some coding!
>>
>>Chris
>>
>>----- Original Message -----
>>From: "Bruce Momjian" <pgman@candle.pha.pa.us>
>>To: "Martijn van Oosterhout" <kleptog@svana.org>
>>Cc: "PostgreSQL-development" <pgsql-hackers@postgresql.org>
>>Sent: Friday, June 21, 2002 10:50 AM
>>Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector
>>
>>
>>
>>>Martijn van Oosterhout wrote:
>>>
>>>>Since it's currently all for collecting statistics on tables, why can't
>>
>>it
>>
>>>>collect another type of statistic, like:
>>>>
>>>>- How often the estimator gets it wrong?
>>>>
>>>>At the end of an index scan, the executor could compare the number of
>>
>>rows
>>
>>>>returned against what was estimated, and if it falls outside a certain
>>>>range, flag it.
>>>>
>>>>Also, the average ratio of rows coming out of a distinct node vs the
>>
>>number
>>
>>>>going in.
>>>>
>>>>For a join clause, the amount of correlation between two columns (hard).
>>>>
>>>>etc
>>>>
>>>>Ideally, the planner could then use this info to make better plans.
>>>>Eventually, the whole system could become somewhat self-tuning.
>>>>
>>>>Does anyone see any problems with this?
>>>
>>>[ Discussion moved to hackers.]
>>>
>>>I have thought that some type of feedback from the executor back into
>>>the optimizer would be a good feature.  Not sure how to do it, but your
>>>idea makes sense.  It certainly could update the table statistics after
>>>a sequential scan.
>>>
>>>--
>>>  Bruce Momjian                        |  http://candle.pha.pa.us
>>>  pgman@candle.pha.pa.us               |  (610) 853-3000
>>>  +  If your life is a hard drive,     |  830 Blythe Avenue
>>>  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
>>>
>>>---------------------------(end of broadcast)---------------------------
>>>TIP 6: Have you searched our list archives?
>>>
>>>http://archives.postgresql.org
>>>
>>
>>
>>
>>
>>---------------------------(end of broadcast)---------------------------
>>TIP 4: Don't 'kill -9' the postmaster
>>
>>
>>
> 
> 


Re: [GENERAL] Idea for the statistics collector

From
Bruce Momjian
Date:
Christopher Kings-Lynne wrote:
> Good god - how old was that email? 2002???

Yep, and been in my mailbox since then, waiting for me to process it
into a TODO entry.

---------------------------------------------------------------------------


> 
> Chris
> 
> Bruce Momjian wrote:
> > Added to TODO:
> > 
> >     * Add tool to query pg_stat_* tables and report indexes that aren't needed
> >       or tables that might need indexes
> > 
> > ---------------------------------------------------------------------------
> > 
> > Christopher Kings-Lynne wrote:
> > 
> >>I was thinking of writing a command line tool like 'pgtune' that looks at
> >>the stats views and will generate SQL code for, or do automatically the
> >>following:
> >>
> >>* Dropping indices that are never used
> >>* Creating appropriate indices to avoid large, expensive sequential scans.
> >>
> >>This would put us in the 'mysql makes my indices for me by magic' league -
> >>but would be far more powerful and flexible.  How to do multikey indices is
> >>beyond me tho.
> >>
> >>*sigh* I'm recovering from a septoplasty on my nose atm, so I might have
> >>some time to do some coding!
> >>
> >>Chris
> >>
> >>----- Original Message -----
> >>From: "Bruce Momjian" <pgman@candle.pha.pa.us>
> >>To: "Martijn van Oosterhout" <kleptog@svana.org>
> >>Cc: "PostgreSQL-development" <pgsql-hackers@postgresql.org>
> >>Sent: Friday, June 21, 2002 10:50 AM
> >>Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector
> >>
> >>
> >>
> >>>Martijn van Oosterhout wrote:
> >>>
> >>>>Since it's currently all for collecting statistics on tables, why can't
> >>
> >>it
> >>
> >>>>collect another type of statistic, like:
> >>>>
> >>>>- How often the estimator gets it wrong?
> >>>>
> >>>>At the end of an index scan, the executor could compare the number of
> >>
> >>rows
> >>
> >>>>returned against what was estimated, and if it falls outside a certain
> >>>>range, flag it.
> >>>>
> >>>>Also, the average ratio of rows coming out of a distinct node vs the
> >>
> >>number
> >>
> >>>>going in.
> >>>>
> >>>>For a join clause, the amount of correlation between two columns (hard).
> >>>>
> >>>>etc
> >>>>
> >>>>Ideally, the planner could then use this info to make better plans.
> >>>>Eventually, the whole system could become somewhat self-tuning.
> >>>>
> >>>>Does anyone see any problems with this?
> >>>
> >>>[ Discussion moved to hackers.]
> >>>
> >>>I have thought that some type of feedback from the executor back into
> >>>the optimizer would be a good feature.  Not sure how to do it, but your
> >>>idea makes sense.  It certainly could update the table statistics after
> >>>a sequential scan.
> >>>
> >>>--
> >>>  Bruce Momjian                        |  http://candle.pha.pa.us
> >>>  pgman@candle.pha.pa.us               |  (610) 853-3000
> >>>  +  If your life is a hard drive,     |  830 Blythe Avenue
> >>>  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> >>>
> >>>---------------------------(end of broadcast)---------------------------
> >>>TIP 6: Have you searched our list archives?
> >>>
> >>>http://archives.postgresql.org
> >>>
> >>
> >>
> >>
> >>
> >>---------------------------(end of broadcast)---------------------------
> >>TIP 4: Don't 'kill -9' the postmaster
> >>
> >>
> >>
> > 
> > 
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: [GENERAL] Idea for the statistics collector

From
Andreas Pflug
Date:
Bruce Momjian wrote:
> Christopher Kings-Lynne wrote:
> 
>>Good god - how old was that email? 2002???
> 
> 
> Yep, and been in my mailbox since then, waiting for me to process it
> into a TODO entry.

Exciting what one can find wiping the floor of the mailbox :-)

Regards,
Andreas