Thread: Use of index in 7.0 vs 6.5
Tom (or anyone else who is good with PostgreSQL statistics),

I am in the process of transitioning from PostgreSQL 6.5.3 to PostgreSQL 7.0.
I ran into an issue where a sequential scan is being chosen on PostgreSQL 7.0
where an index scan was chosen on PostgreSQL 6.5.3.

Note: All tables have been freshly vacuumed and analyzed.

procman=# select version();
                              version
--------------------------------------------------------------------
 PostgreSQL 7.0.0 on hppa2.0w-hp-hpux11.00, compiled by gcc 2.95.2
(1 row)

procman=# explain select count(catagory) from medusa
procman-#  where host_id = 404 and catagory like 'A%';
NOTICE:  QUERY PLAN:

Aggregate  (cost=189546.19..189546.19 rows=1 width=12)
  ->  Seq Scan on medusa  (cost=0.00..189529.43 rows=6704 width=12)

EXPLAIN

Note: The above query produces an index scan on PostgreSQL 6.5.3.

procman=# set enable_seqscan = off;
SET VARIABLE
procman=# explain select count(catagory) from medusa
procman-#  where host_id = 404 and catagory like 'A%';
NOTICE:  QUERY PLAN:

Aggregate  (cost=207347.36..207347.36 rows=1 width=12)
  ->  Index Scan using medusa_host_id_key on medusa
      (cost=0.00..207330.60 rows=6704 width=12)

EXPLAIN

Here are the statistics:

procman=# select attname, attdisbursion, s.*
procman-#   from pg_statistic s, pg_attribute a, pg_class c
procman-#  where starelid = c.oid and attrelid = c.oid and staattnum = attnum
procman-#    and relname = 'medusa';

 attname | attdisbursion | starelid | staattnum | staop | stanullfrac
---------+---------------+----------+-----------+-------+-------------
 host_id |    0.00621312 | 30874288 |         1 |    97 |           0

 stacommonfrac | stacommonval | staloval | stahival
---------------+--------------+----------+----------
     0.0279425 |          446 |        0 |    11011
(1 row)

Here is my analysis of the statistics (based on the examples in the archive):
The most common value of host_id in the table is 446, with a row fraction of
~2.8%.  The estimated number of rows in the index is 6704.  This table has
4,630,229 entries.  Hopefully this analysis is correct; if not, please
correct me :)

I do not understand why the planner would choose a seqscan over the
index scan, because 6704 / 4,630,229 is ~0.15%.

Thanks for your time,

- Ryan
Ryan Bradetich <ryan_bradetich@hp.com> writes:
> I am in the process of transitioning from PostgreSQL 6.5.3 to
> PostgreSQL 7.0.  I ran into an issue where a sequential scan
> is being chosen on PostgreSQL 7.0 where an index scan was
> chosen on PostgreSQL 6.5.3.

Since you're complaining, I assume the seqscan is slower ;-).
But you didn't say how much slower --- what are the actual timings?

Basically what's going on here is that we need to tune the fudge-factor
constants in the cost model so that they have something to do with reality
on as wide a variety of systems as possible.  You did an excellent job of
showing the estimates the planner computed --- but what we really need
here is to see how those relate to reality.

> I do not understand why the planner would choose a seqscan over the
> index scan because 6704/4,630,229 is ~ 0.15%.

I'm a bit surprised too.  What is the average tuple width on this table?
(Actually, probably a better question is how many pages and tuples are in
the relation according to its pg_class entry.  Try
"select * from pg_class where relname = 'medusa'".)

			regards, tom lane
Tom Lane wrote:
> Ryan Bradetich <ryan_bradetich@hp.com> writes:
> > I am in the process of transitioning from PostgreSQL 6.5.3 to
> > PostgreSQL 7.0.  I ran into an issue where a sequential scan
> > is being chosen on PostgreSQL 7.0 where an index scan was
> > chosen on PostgreSQL 6.5.3.
>
> Since you're complaining, I assume the seqscan is slower ;-).
> But you didn't say how much slower --- what are the actual timings?

Oops... I had them written down, just forgot to include them in the email :)

with enable_seqscan = on:

	real    18.05
	sys      0.01
	user     0.02

with enable_seqscan = off:

	real     0.08
	sys      0.01
	user     0.02

I stopped and restarted the postmaster daemon between these timings to
flush the cache.

> Basically what's going on here is that we need to tune the fudge-factor
> constants in the cost model so that they have something to do with
> reality on as wide a variety of systems as possible.  You did an
> excellent job of showing the estimates the planner computed --- but
> what we really need here is to see how those relate to reality.
>
> > I do not understand why the planner would choose a seqscan over the
> > index scan because 6704/4,630,229 is ~ 0.15%.
>
> I'm a bit surprised too.  What is the average tuple width on this table?
> (Actually, probably a better question is how many pages and tuples
> are in the relation according to its pg_class entry.  Try "select * from
> pg_class where relname = 'medusa'".)
>
> 			regards, tom lane

procman=# select * from pg_class where relname = 'medusa';

 relname | reltype | relowner | relam | relpages | reltuples | rellongrelid
---------+---------+----------+-------+----------+-----------+--------------
 medusa  |       0 |    36000 |     0 |   120076 |   4630229 |            0

 relhasindex | relisshared | relkind | relnatts | relchecks | reltriggers
-------------+-------------+---------+----------+-----------+-------------
 t           | f           | r       |        6 |         0 |           0

 relukeys | relfkeys | relrefs | relhaspkey | relhasrules | relacl
----------+----------+---------+------------+-------------+--------
        0 |        0 |       0 | f          | f           |
(1 row)

procman=# \d medusa
         Table "medusa"
 Attribute |   Type    | Modifier
-----------+-----------+----------
 host_id   | integer   |
 timestamp | timestamp |
 current   | integer   |
 catagory  | text      |
 cat_desc  | text      |
 anomaly   | text      |

This table has two fairly large text fields, cat_desc and anomaly.  The
catagory field is very short and in the format: [ABC][0-9][0-9].

Thanks for the help,

- Ryan
Ryan Bradetich <ryan_bradetich@hp.com> writes:
> procman=# explain select count(catagory) from medusa where host_id = 404
> and catagory like 'A%';

> Here is my analysis of the statistics (based on the examples in the
> archive):
> The most common value of host_id in the table is 446, with a row fraction
> of ~2.8%.  The estimated number of rows in the index is 6704.  This
> table has 4,630,229 entries.

> I do not understand why the planner would choose a seqscan over the
> index scan, because 6704 / 4,630,229 is ~0.15%.

I see at least part of the answer.  The 'rows=' number in the EXPLAIN
output is the planner's estimate of the net number of rows out after
applying all available WHERE restrictions.  In this case we've got a clause
"host_id = 404" which can be used with the index on host_id, and then we
have another clause "catagory like 'A%'" which will be applied on-the-fly
to the tuples returned by the indexscan.  The rows number tells you about
the estimated number of rows out after that second filter step.  However,
the cost of the indexscan depends mainly on the number of tuples that have
to be fetched, and that is determined by the selectivity of just the
"host_id = 404" clause.

I made a dummy table with the schema you showed and then inserted the
statistics you reported into the system tables (who's afraid of
"update pg_class ..." ;-)).  If I didn't mess up, you should be able to
reproduce these EXPLAIN results:

set enable_seqscan = off;
explain select count(catagory) from medusa where host_id = 404;
NOTICE:  QUERY PLAN:

Aggregate  (cost=206943.69..206943.69 rows=1 width=12)
  ->  Index Scan using medusa_host_id_key on medusa
      (cost=0.00..206781.97 rows=64690 width=12)

set enable_seqscan = on;
explain select count(catagory) from medusa where host_id = 404;
NOTICE:  QUERY PLAN:

Aggregate  (cost=178115.59..178115.59 rows=1 width=12)
  ->  Seq Scan on medusa  (cost=0.00..177953.86 rows=64690 width=12)

This shows that the planner is actually estimating that the indexscan will
fetch about 64690 rows (of which it guesses only 6704 will remain after
applying the catagory clause, but that's not really relevant to the cost
estimate).  Since there are 120076 pages in the table, that would mean
pretty nearly one separate page fetch for each retrieved tuple, if the
matching tuples are randomly distributed --- and that very probably *would*
take more time than reading the whole table sequentially.  So the planner's
chain of logic holds up if all these assumptions are correct.

Since you find that in reality the indexscan method is very quick, I'm
guessing that there are actually fairly few tuples matching host_id = 404.
Could you run a quick "select count(*)" to check?

This seems to point up (once again) the deficiency of assuming that the
most-common value in the table is a good guide to the frequency of typical
values.  You showed that host_id = 446 occurs in 2.8% of the rows in this
table; a search for 446 very probably would be faster as a seqscan than as
an indexscan (you might care to try it and see).  But that's probably a
statistical outlier that's not got much to do with the frequency of
typical values in the table.

The only really good answer to this problem is to collect more-detailed
statistics in VACUUM ANALYZE, which I hope to see us doing in a release or
so.  In the meantime I am wondering about deliberately skewing the cost
model in favor of indexscans, because I sure haven't heard many complaints
about erroneous selection of indexscans...
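(For reference, a back-of-the-envelope version of the comparison described
above, using the figures quoted in this thread and the 7.0 default cost
constants.  This is only a rough sketch --- the real cost model also charges
per-tuple CPU costs, so it will not reproduce the EXPLAIN numbers exactly.)

	seqscan:    120076 pages  * 1.0 (sequential page cost)      ~= 120,000
	indexscan:   64690 tuples * 4.0 (default random_page_cost)  ~= 259,000

With the 64690-row estimate the seqscan looks cheaper on paper; with the
few hundred rows that actually match (as Ryan reports later in the thread),
the indexscan wins by a wide margin.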
One way to put a thumb on the scales is to reduce the value of the SET variable random_page_cost. The default value is 4.0, which seems to correspond more or less to reality, but reducing it to 3 or so would shift the planner pretty nicely in the direction of indexscans. regards, tom lane
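(A sketch of trying that suggestion per-session against the example query
from this thread; the value 3 is just the figure Tom mentions, not a
recommendation, and RESET restores the compiled-in default.)

	SET random_page_cost = 3;   -- lower the penalty per nonsequential page fetch
	EXPLAIN SELECT count(catagory) FROM medusa
	 WHERE host_id = 404 AND catagory LIKE 'A%';
	RESET random_page_cost;     -- back to the default of 4.0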
> -----Original Message-----
> From: pgsql-sql-owner@hub.org [mailto:pgsql-sql-owner@hub.org]
> On Behalf Of Tom Lane
[snip]

> This seems to point up (once again) the deficiency of assuming that
> the most-common value in the table is a good guide to the frequency
> of typical values.  You showed that host_id = 446 occurs in 2.8% of
> the rows in this table; a search for 446 very probably would be faster
> as a seqscan than as an indexscan (you might care to try it and see).
> But that's probably a statistical outlier that's not got much to do
> with the frequency of typical values in the table.
>
> The only really good answer to this problem is to collect more-detailed
> statistics in VACUUM ANALYZE, which I hope to see us doing in a release
> or so.

For example, we could count up distinct values for the first column of an
index by scanning its index relation.

> In the meantime I am wondering about deliberately skewing the
> cost model in favor of indexscans, because I sure haven't heard many
> complaints about erroneous selection of indexscans...
>
> One way to put a thumb on the scales is to reduce the value of the SET
> variable random_page_cost.  The default value is 4.0, which seems to
> correspond more or less to reality, but reducing it to 3 or so would
> shift the planner pretty nicely in the direction of indexscans.

Or how about changing the current fudge factor?  For example, from 0.5 to
0.2, which is the fudge factor of the attdisbursion calculation.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
> For example we could count up distinct values for the first
> column of an index by scanning its index relation.

Oracle has an index keys distribution page...  And we have a near-empty
meta-page for each index AM...  And the backend reads this page for
each insert...

Vadim
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: >> One way to put a thumb on the scales is to reduce the value of the SET >> variable random_page_cost. The default value is 4.0, which seems to >> correspond more or less to reality, but reducing it to 3 or so would >> shift the planner pretty nicely in the direction of indexscans. > Or how about changing current fudge factor ? > For example,from 0.5 to 0.2 which is the fudge factor of attdisbursion > calculation. Yes, that's another way --- and probably more defensible than changing random_page_cost, now that I think about it. Unfortunately it's a hardwired constant and so not as easily experimented with :-(. regards, tom lane
> > For example we could count up distinct values for the first
> > column of an index by scanning its index relation.
>
> Oracle has an index keys distribution page...  And we have a near-empty
> meta-page for each index AM...  And the backend reads this page for
> each insert...

That certainly would make sense.  We have hesitated to gather more
statistics because of the time involved.  Fuller statistics on just the
indexed columns could be a big win and be done fairly quickly because the
rows are already sorted in the index.

--
  Bruce Momjian                     |  http://www.op.net/~candle
  pgman@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,  |  830 Blythe Avenue
  +  Christ can be your backup.     |  Drexel Hill, Pennsylvania 19026
Tom Lane wrote:
> One way to put a thumb on the scales is to reduce the value of the SET
> variable random_page_cost.  The default value is 4.0, which seems to
> correspond more or less to reality, but reducing it to 3 or so would
> shift the planner pretty nicely in the direction of indexscans.

As you may know, I've gone through a lot of things related to cost
estimates, and I've convinced myself that (on paper) they're pretty good
(other than selectivity estimates, of course).  This may be heavily
influenced by my choice of OS and the corresponding disk cache management
scheme, but it seems to me that the sequential scan cost is a lot closer
to reality because the cache doesn't come into play as much, compared to
indexed scans, which (probably) are heavily influenced by the disk cache.
Unfortunately, you'd have to simulate some very high load conditions to
negate the disk cache effect and convince people that the numbers are
actually fairly correct.

Even though I like to think I have very important and demanding, end-all,
be-all applications and that PostgreSQL should cater to my situation, I,
like almost all of the other users, operate in a very low load environment
for the most part.  Therefore, you may want to consider placing the
defaults where they are more advantageous to the casual user (or at least
to the user who has a small number of backends and is fairly dedicated to
operating as a database server).

Anyway, it seems to me that if you're going to fudge a number,
random_page_cost is probably the best place to do it, because it factors
so heavily into the total cost.  Fudging the selectivity can be
frustrating because you can fudge for a certain table distribution, but
who's to say that the same distribution will hold for all tables?  Since
the selectivity doesn't have a lot of real-world relevance, what do you
gain in the general case by fudging it?  At least by adjusting
random_page_cost, you know that on paper the value of 4 is reasonably
close to reality and that you're factoring in the disk cache, which should
be fairly constant on your system (assuming a similar volume of disk
accesses).

On another topic, is there a list somewhere of the variables you can
adjust with a SET command?  Some of the variables have been invaluable in
learning about cost estimates, but I can't remember seeing them documented
anywhere --- I've just stumbled onto them, more or less.

Jeff
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> That certainly would make sense.  We have hesitated to gather more
> statistics because of the time involved.  Fuller statistics on just the
> indexed columns could be a big win and be done fairly quickly because
> the rows are already sorted in the index.

Yeah, a scan over just the index itself would be a perfect way to gather
stats.  The normal objection to it (can't tell whether entries correspond
to currently-valid tuples) doesn't apply, because we don't really care
whether the stats are perfectly accurate.

Should put this in TODO, along with something about splitting the ANALYZE
function out of VACUUM and making it invokable as a separate statement.

			regards, tom lane
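(At the time of this thread the combined command is the only way to refresh
statistics; the standalone statement Tom proposes is shown below purely as
hypothetical syntax, not something 7.0 accepts.)

	VACUUM ANALYZE medusa;   -- existing form: vacuums and gathers stats,
	                         -- taking VACUUM's locks
	ANALYZE medusa;          -- proposed standalone form (hypothetical;
	                         -- not available in 7.0)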
Jeff Hoffmann <jeff@propertykey.com> writes:
> on another topic, is there a list somewhere of the variables that you
> can adjust with a SET command?

The SET command reference page is fairly complete, I think (but if you want
The Whole Truth And Nothing But, see src/backend/commands/variable.c).

			regards, tom lane
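(For quick reference, the two planner variables that appear in this thread,
with their SHOW/RESET companions --- a sketch only; the SET reference page
has the complete list.)

	SET enable_seqscan = off;    -- forbid sequential scans for this session
	SET random_page_cost = 3;    -- relative cost of a nonsequential page fetch (default 4.0)
	SHOW random_page_cost;       -- report the current setting
	RESET enable_seqscan;        -- restore the compiled-in default
	RESET random_page_cost;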
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > That certainly would make sense.  We have hesitated to gather more
> > statistics because of the time involved.  Fuller statistics on just the
> > indexed columns could be a big win and be done fairly quickly because
> > the rows are already sorted in the index.
>
> Yeah, a scan over just the index itself would be a perfect way to
> gather stats.  The normal objection to it (can't tell whether entries
> correspond to currently-valid tuples) doesn't apply, because we don't
> really care whether the stats are perfectly accurate.
>
> Should put this in TODO, along with something about splitting the
> ANALYZE function out of VACUUM and making it invokable as a separate
> statement.

Added:

	* Remove ANALYZE from VACUUM so it can be run separately without locks
	* Gather more accurate statistics using indexes

--
  Bruce Momjian                     |  http://www.op.net/~candle
  pgman@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,  |  830 Blythe Avenue
  +  Christ can be your backup.     |  Drexel Hill, Pennsylvania 19026
> > Oracle has an index keys distribution page...  And we have a near-empty
> > meta-page for each index AM...  And the backend reads this page for
> > each insert...
>
> That certainly would make sense.  We have hesitated to gather more
> statistics because of the time involved.  Fuller statistics on just the
> indexed columns could be a big win and be done fairly quickly because
> the rows are already sorted in the index.

So, add a new item to TODO -:)
(I'm sure I've mentioned the index keys distribution page already --- I'm
getting tired of saying the same thing over and over -:))

Vadim
> Yeah, a scan over just the index itself would be a perfect way to
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I believe that Oracle updates index statistics on the fly...  The meta-page
is always in cache for inserts, so there will be no additional reads.

> gather stats.  The normal objection to it (can't tell whether entries
> correspond to currently-valid tuples) doesn't apply, because we don't
> really care whether the stats are perfectly accurate.

Vadim
Tom Lane wrote:
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> One way to put a thumb on the scales is to reduce the value of the SET
> >> variable random_page_cost.  The default value is 4.0, which seems to
> >> correspond more or less to reality, but reducing it to 3 or so would
> >> shift the planner pretty nicely in the direction of indexscans.

This worked great!  Is there a place I can change the default to 3?
I do not want to change all the scripts to include this :)

> > Or how about changing the current fudge factor?
> > For example, from 0.5 to 0.2, which is the fudge factor of the
> > attdisbursion calculation.
>
> Yes, that's another way --- and probably more defensible than changing
> random_page_cost, now that I think about it.  Unfortunately it's a
> hardwired constant and so not as easily experimented with :-(.
>
> 			regards, tom lane

Can you give me more information about this?  I do not have a problem
re-compiling the database and performing more testing if you would like.

Tom, to answer your question from a previous post:

> Since you find that in reality the indexscan method is very quick, I'm
> guessing that there are actually fairly few tuples matching host_id = 404.
> Could you run a quick "select count(*)" to check?

procman=# select count(*) from medusa where host_id = 404;
 count
-------
   680
(1 row)

procman=# select count(catagory) from medusa
procman-#  where host_id = 404 and catagory like 'A%';
 count
-------
     4
(1 row)

Thanks again everyone for all the help!  Now that I am finished with school
for the semester, I should have time to make contributions again ... :)

Ryan
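(Tom's earlier suggestion of trying the common value host_id = 446 can be
checked the same way --- a sketch; the planner's rows= estimate from EXPLAIN
lines up against the actual count.)

	EXPLAIN SELECT count(*) FROM medusa WHERE host_id = 446;  -- planner's rows= estimate
	SELECT count(*) FROM medusa WHERE host_id = 446;          -- actual number of matching rows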
> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
>
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > That certainly would make sense.  We have hesitated to gather more
> > > statistics because of the time involved.  Fuller statistics on just
> > > the indexed columns could be a big win and be done fairly quickly
> > > because the rows are already sorted in the index.
> >
> > Yeah, a scan over just the index itself would be a perfect way to
> > gather stats.  The normal objection to it (can't tell whether entries
> > correspond to currently-valid tuples) doesn't apply, because we don't
> > really care whether the stats are perfectly accurate.
> >
> > Should put this in TODO, along with something about splitting the
> > ANALYZE function out of VACUUM and making it invokable as a separate
> > statement.
>
> Added:
>
> 	* Remove ANALYZE from VACUUM so it can be run separately without locks
> 	* Gather more accurate statistics using indexes

Gathering statistics using indexes on the fly is best.  However, VACUUM
(without ANALYZE) already scans all indexes using
vc_scanoneind()/vc_vaconeind().  Isn't that available anyway?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
Ryan Bradetich <ryan_bradetich@hp.com> writes:
> This worked great!  Is there a place I can change the default to 3?
> I do not want to change all the scripts to include this :)

See src/include/optimizer/cost.h.  However, I am currently thinking of
taking Hiroshi's advice instead.  Lowering RANDOM_PAGE_COST seems like a
bad idea --- if anything, we might want to raise it ;-)

>>>> Or how about changing the current fudge factor?
>>>> For example, from 0.5 to 0.2, which is the fudge factor of the
>>>> attdisbursion calculation.
>>
>> Yes, that's another way --- and probably more defensible than changing
>> random_page_cost, now that I think about it.  Unfortunately it's a
>> hardwired constant and so not as easily experimented with :-(.

> Can you give me more information about this?  I do not have a problem
> re-compiling the database and performing more testing if you would like.

The fudge factor in question is currently 0.5, and is used in two places
in src/backend/utils/adt/selfuncs.c (looks like lines 193 and 212 in 7.0
sources).  I was thinking of dropping it to 0.25 or 0.1.

			regards, tom lane
Tom Lane wrote:
> Ryan Bradetich <ryan_bradetich@hp.com> writes:
> > This worked great!  Is there a place I can change the default to 3?
> > I do not want to change all the scripts to include this :)
>
> See src/include/optimizer/cost.h.  However, I am currently thinking of
> taking Hiroshi's advice instead.  Lowering RANDOM_PAGE_COST seems like
> a bad idea --- if anything, we might want to raise it ;-)
>
> >>>> Or how about changing the current fudge factor?
> >>>> For example, from 0.5 to 0.2, which is the fudge factor of the
> >>>> attdisbursion calculation.
> >>
> >> Yes, that's another way --- and probably more defensible than changing
> >> random_page_cost, now that I think about it.  Unfortunately it's a
> >> hardwired constant and so not as easily experimented with :-(.
>
> > Can you give me more information about this?  I do not have a problem
> > re-compiling the database and performing more testing if you would like.
>
> The fudge factor in question is currently 0.5, and is used in two places
> in src/backend/utils/adt/selfuncs.c (looks like lines 193 and 212 in 7.0
> sources).  I was thinking of dropping it to 0.25 or 0.1.
>
> 			regards, tom lane

Tom and Hiroshi,

I modified the backend to use 0.1 and this has been working great!  Thanks
again for the suggestion, and I'll let you know if we run into a problem.

Thanks again!

Ryan