Thread: Re: [GENERAL] Idea for the statistics collector
Martijn van Oosterhout wrote: > Since it's currently all for collecting statistics on tables, why can't it > collect another type of statistic, like: > > - How often the estimator gets it wrong? > > At the end of an index scan, the executor could compare the number of rows > returned against what was estimated, and if it falls outside a certain > range, flag it. > > Also, the average ratio of rows coming out of a distinct node vs the number > going in. > > For a join clause, the amount of correlation between two columns (hard). > > etc > > Ideally, the planner could then use this info to make better plans. > Eventually, the whole system could become somewhat self-tuning. > > Does anyone see any problems with this? [ Discussion moved to hackers.] I have thought that some type of feedback from the executor back into the optimizer would be a good feature. Not sure how to do it, but your idea makes sense. It certainly could update the table statistics after a sequential scan. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Thu, 20 Jun 2002 22:50:04 -0400 (EDT) "Bruce Momjian" <pgman@candle.pha.pa.us> wrote: > I have thought that some type of feedback from the executor back into > the optimizer would be a good feature. Not sure how to do it, but your > idea makes sense. It certainly could update the table statistics after > a sequential scan. Search the archives for a thread I started on -hackers called "self-tuning histograms", which talks about a pretty similar idea. The technique there applies only to histograms, and builds the histogram based *only* upon the data provided by the executor. Tom commented that it's probably a better idea to concentrate on more elementary techniques, like multi-dimensional histograms, before starting on ST histograms. I agree, and plan to look at multi-dimensional histograms when I get some spare time. Cheers, Neil -- Neil Conway <neilconway@rogers.com> PGP Key ID: DB3C29FC
Neil Conway wrote: > On Thu, 20 Jun 2002 22:50:04 -0400 (EDT) > "Bruce Momjian" <pgman@candle.pha.pa.us> wrote: > > I have thought that some type of feedback from the executor back into > > the optimizer would be a good feature. Not sure how to do it, but your > > idea makes sense. It certainly could update the table statistics after > > a sequential scan. > > Search the archives for a thread I started on -hackers called "self-tuning > histograms", which talks about a pretty similar idea. The technique there > applies only to histograms, and builds the histogram based *only* upon > the data provided by the executor. > > Tom commented that it's probably a better idea to concentrate on more > elementary techniques, like multi-dimensional histograms, before starting > on ST histograms. I agree, and plan to look at multi-dimensional histograms > when I get some spare time. I was thinking of something much more elementary, like a table that reports to have 50 blocks but an executor sequential scan shows 500 blocks. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
I was thinking of writing a command line tool like 'pgtune' that looks at the stats views and will generate SQL code for, or do automatically the following: * Dropping indices that are never used * Creating appropriate indices to avoid large, expensive sequential scans. This would put us in the 'mysql makes my indices for me by magic' league - but would be far more powerful and flexible. How to do multikey indices is beyond me tho. *sigh* I'm recovering from a septoplasty on my nose atm, so I might have some time to do some coding! Chris ----- Original Message ----- From: "Bruce Momjian" <pgman@candle.pha.pa.us> To: "Martijn van Oosterhout" <kleptog@svana.org> Cc: "PostgreSQL-development" <pgsql-hackers@postgresql.org> Sent: Friday, June 21, 2002 10:50 AM Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector > Martijn van Oosterhout wrote: > > Since it's currently all for collecting statistics on tables, why can't it > > collect another type of statistic, like: > > > > - How often the estimator gets it wrong? > > > > At the end of an index scan, the executor could compare the number of rows > > returned against what was estimated, and if it falls outside a certain > > range, flag it. > > > > Also, the average ratio of rows coming out of a distinct node vs the number > > going in. > > > > For a join clause, the amount of correlation between two columns (hard). > > > > etc > > > > Ideally, the planner could then use this info to make better plans. > > Eventually, the whole system could become somewhat self-tuning. > > > > Does anyone see any problems with this? > > [ Discussion moved to hackers.] > > I have thought that some type of feedback from the executor back into > the optimizer would be a good feature. Not sure how to do it, but your > idea makes sense. It certainly could update the table statistics after > a sequential scan. > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 853-3000 > + If your life is a hard drive, | 830 Blythe Avenue > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
Christopher Kings-Lynne wrote: > I was thinking of writing a command line tool like 'pgtune' that looks at > the stats views and will generate SQL code for, or do automatically the > following: > > * Dropping indices that are never used > * Creating appropriate indices to avoid large, expensive sequential scans. > > This would put us in the 'mysql makes my indices for me by magic' league - > but would be far more powerful and flexible. How to do multikey indices is > beyond me tho. This is a great idea. I have been wanting to do something like this myself but probably won't get the time. Does MySQL really make indexes by magic? Also, I had to look up the contraction for "will not" because I always get that confused (won't). I just found a web page on it: http://www.straightdope.com/mailbag/mwont.html -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: > I was thinking of writing a command line tool like 'pgtune' that looks at > the stats views and will generate SQL code for, or do automatically the > following: > * Dropping indices that are never used > * Creating appropriate indices to avoid large, expensive sequential scans. Dropping unused indices sounds good --- but beware of dropping unique indexes; they may be there to enforce a constraint, and not because of any desire to use them in queries. I'm not sure how you're going to automatically intuit appropriate indexes to add, though. You'd need to look at a suitable workload (ie, a representative set of queries) which is not data that's readily available from the stats views. Perhaps we could expect the DBA to provide a segment of log output that includes debug_print_query and show_query_stats results. regards, tom lane
Added to TODO: * Add tool to query pg_stat_* tables and report indexes that aren't needed or tables that might need indexes --------------------------------------------------------------------------- Christopher Kings-Lynne wrote: > I was thinking of writing a command line tool like 'pgtune' that looks at > the stats views and will generate SQL code for, or do automatically the > following: > > * Dropping indices that are never used > * Creating appropriate indices to avoid large, expensive sequential scans. > > This would put us in the 'mysql makes my indices for me by magic' league - > but would be far more powerful and flexible. How to do multikey indices is > beyond me tho. > > *sigh* I'm recovering from a septoplasty on my nose atm, so I might have > some time to do some coding! > > Chris > > ----- Original Message ----- > From: "Bruce Momjian" <pgman@candle.pha.pa.us> > To: "Martijn van Oosterhout" <kleptog@svana.org> > Cc: "PostgreSQL-development" <pgsql-hackers@postgresql.org> > Sent: Friday, June 21, 2002 10:50 AM > Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector > > > > Martijn van Oosterhout wrote: > > > Since it's currently all for collecting statistics on tables, why can't > it > > > collect another type of statistic, like: > > > > > > - How often the estimator gets it wrong? > > > > > > At the end of an index scan, the executor could compare the number of > rows > > > returned against what was estimated, and if it falls outside a certain > > > range, flag it. > > > > > > Also, the average ratio of rows coming out of a distinct node vs the > number > > > going in. > > > > > > For a join clause, the amount of correlation between two columns (hard). > > > > > > etc > > > > > > Ideally, the planner could then use this info to make better plans. > > > Eventually, the whole system could become somewhat self-tuning. > > > > > > Does anyone see any problems with this? > > > > [ Discussion moved to hackers.] > > > > I have thought that some type of feedback from the executor back into > > the optimizer would be a good feature. Not sure how to do it, but your > > idea makes sense. It certainly could update the table statistics after > > a sequential scan. > > > > -- > > Bruce Momjian | http://candle.pha.pa.us > > pgman@candle.pha.pa.us | (610) 853-3000 > > + If your life is a hard drive, | 830 Blythe Avenue > > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 6: Have you searched our list archives? > > > > http://archives.postgresql.org > > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > > > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Good god - how old was that email? 2002??? Chris Bruce Momjian wrote: > Added to TODO: > > * Add tool to query pg_stat_* tables and report indexes that aren't needed > or tables that might need indexes > > --------------------------------------------------------------------------- > > Christopher Kings-Lynne wrote: > >>I was thinking of writing a command line tool like 'pgtune' that looks at >>the stats views and will generate SQL code for, or do automatically the >>following: >> >>* Dropping indices that are never used >>* Creating appropriate indices to avoid large, expensive sequential scans. >> >>This would put us in the 'mysql makes my indices for me by magic' league - >>but would be far more powerful and flexible. How to do multikey indices is >>beyond me tho. >> >>*sigh* I'm recovering from a septoplasty on my nose atm, so I might have >>some time to do some coding! >> >>Chris >> >>----- Original Message ----- >>From: "Bruce Momjian" <pgman@candle.pha.pa.us> >>To: "Martijn van Oosterhout" <kleptog@svana.org> >>Cc: "PostgreSQL-development" <pgsql-hackers@postgresql.org> >>Sent: Friday, June 21, 2002 10:50 AM >>Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector >> >> >> >>>Martijn van Oosterhout wrote: >>> >>>>Since it's currently all for collecting statistics on tables, why can't >> >>it >> >>>>collect another type of statistic, like: >>>> >>>>- How often the estimator gets it wrong? >>>> >>>>At the end of an index scan, the executor could compare the number of >> >>rows >> >>>>returned against what was estimated, and if it falls outside a certain >>>>range, flag it. >>>> >>>>Also, the average ratio of rows coming out of a distinct node vs the >> >>number >> >>>>going in. >>>> >>>>For a join clause, the amount of correlation between two columns (hard). >>>> >>>>etc >>>> >>>>Ideally, the planner could then use this info to make better plans. >>>>Eventually, the whole system could become somewhat self-tuning. >>>> >>>>Does anyone see any problems with this? >>> >>>[ Discussion moved to hackers.] >>> >>>I have thought that some type of feedback from the executor back into >>>the optimizer would be a good feature. Not sure how to do it, but your >>>idea makes sense. It certainly could update the table statistics after >>>a sequential scan. >>> >>>-- >>> Bruce Momjian | http://candle.pha.pa.us >>> pgman@candle.pha.pa.us | (610) 853-3000 >>> + If your life is a hard drive, | 830 Blythe Avenue >>> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 >>> >>>---------------------------(end of broadcast)--------------------------- >>>TIP 6: Have you searched our list archives? >>> >>>http://archives.postgresql.org >>> >> >> >> >> >>---------------------------(end of broadcast)--------------------------- >>TIP 4: Don't 'kill -9' the postmaster >> >> >> > >
Christopher Kings-Lynne wrote: > Good god - how old was that email? 2002??? Yep, and been in my mailbox since then, waiting for me to process it into a TODO entry. --------------------------------------------------------------------------- > > Chris > > Bruce Momjian wrote: > > Added to TODO: > > > > * Add tool to query pg_stat_* tables and report indexes that aren't needed > > or tables that might need indexes > > > > --------------------------------------------------------------------------- > > > > Christopher Kings-Lynne wrote: > > > >>I was thinking of writing a command line tool like 'pgtune' that looks at > >>the stats views and will generate SQL code for, or do automatically the > >>following: > >> > >>* Dropping indices that are never used > >>* Creating appropriate indices to avoid large, expensive sequential scans. > >> > >>This would put us in the 'mysql makes my indices for me by magic' league - > >>but would be far more powerful and flexible. How to do multikey indices is > >>beyond me tho. > >> > >>*sigh* I'm recovering from a septoplasty on my nose atm, so I might have > >>some time to do some coding! > >> > >>Chris > >> > >>----- Original Message ----- > >>From: "Bruce Momjian" <pgman@candle.pha.pa.us> > >>To: "Martijn van Oosterhout" <kleptog@svana.org> > >>Cc: "PostgreSQL-development" <pgsql-hackers@postgresql.org> > >>Sent: Friday, June 21, 2002 10:50 AM > >>Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector > >> > >> > >> > >>>Martijn van Oosterhout wrote: > >>> > >>>>Since it's currently all for collecting statistics on tables, why can't > >> > >>it > >> > >>>>collect another type of statistic, like: > >>>> > >>>>- How often the estimator gets it wrong? > >>>> > >>>>At the end of an index scan, the executor could compare the number of > >> > >>rows > >> > >>>>returned against what was estimated, and if it falls outside a certain > >>>>range, flag it. > >>>> > >>>>Also, the average ratio of rows coming out of a distinct node vs the > >> > >>number > >> > >>>>going in. > >>>> > >>>>For a join clause, the amount of correlation between two columns (hard). > >>>> > >>>>etc > >>>> > >>>>Ideally, the planner could then use this info to make better plans. > >>>>Eventually, the whole system could become somewhat self-tuning. > >>>> > >>>>Does anyone see any problems with this? > >>> > >>>[ Discussion moved to hackers.] > >>> > >>>I have thought that some type of feedback from the executor back into > >>>the optimizer would be a good feature. Not sure how to do it, but your > >>>idea makes sense. It certainly could update the table statistics after > >>>a sequential scan. > >>> > >>>-- > >>> Bruce Momjian | http://candle.pha.pa.us > >>> pgman@candle.pha.pa.us | (610) 853-3000 > >>> + If your life is a hard drive, | 830 Blythe Avenue > >>> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 > >>> > >>>---------------------------(end of broadcast)--------------------------- > >>>TIP 6: Have you searched our list archives? > >>> > >>>http://archives.postgresql.org > >>> > >> > >> > >> > >> > >>---------------------------(end of broadcast)--------------------------- > >>TIP 4: Don't 'kill -9' the postmaster > >> > >> > >> > > > > > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian wrote: > Christopher Kings-Lynne wrote: > >>Good god - how old was that email? 2002??? > > > Yep, and been in my mailbox since then, waiting for me to process it > into a TODO entry. Exciting what one can find wiping the floor of the mailbox :-) Regards, Andreas