Re: Seq scan on join, not on subselect? analyze this - Mailing list pgsql-sql
| From | Bryce Nesbitt |
|---|---|
| Subject | Re: Seq scan on join, not on subselect? analyze this |
| Date | |
| Msg-id | 490E0A33.3060403@obviously.com |
| In response to | Re: Seq scan on join, not on subselect? analyze this (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-sql |
Tom Lane wrote:
> Why are the rowcount estimates so far off?  Maybe you need to increase
> the statistics target for this table.
>
>             regards, tom lane

Tom,
How does one tell the row count is off in a query plan? I've never found a
great reference on interpreting EXPLAIN ANALYZE output!

Upping the stats target made *little* difference in the estimated row count,
but a *huge* difference in query runtime. It also mattered only on the
context_key column, not the other columns. Why would that be?

lyell5=> \pset format wrapped
lyell5=> select * from pg_stats where tablename='article_words';

| schemaname | tablename     | attname     | null_frac | avg_width | n_distinct | most_common_vals                                                       | most_common_freqs                                                                                            | histogram_bounds                                                         | correlation |
|------------|---------------|-------------|-----------|-----------|------------|------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|-------------|
| public     | article_words | word_key    | 0         | 4         | 6361       | {382,226,1155,2130,2246,3328,8999,733,1760,1796}                       | {0.003,0.00233333,0.00233333,0.00233333,0.00233333,0.00233333,0.00233333,0.002,0.002,0.002}                  | {3,623,1376,2074,3154,4601,7269,12770,30017,212496,5820680}              | 0.0292482   |
| public     | article_words | context_key | 0         | 4         | 4671       | {639302,113013,133052,211978,508496,545123,590829,598595,649645,37658} | {0.00233333,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00133333} | {28,42838,92697,140684,202950,248442,338954,403025,498952,584048,654070} | 0.867505    |

lyell5=> analyze verbose article_words;
INFO:  "article_words": scanned 3000 of 1125489 pages, containing 555000 live rows
and 0 dead rows; 3000 rows in sample, 208215465 estimated total rows

lyell5=> set default_statistics_target to 500;
lyell5=> analyze verbose article_words;
INFO:  "article_words": scanned 150000 of 1125489 pages, containing 27749820 live rows
and 180 dead rows; 150000 rows in sample, 208214114 estimated total rows

But it makes a huge impact on this query:

stats target | rows sampled | query duration after analyze
-------------+--------------+-----------------------------
          10 |         3000 | 80600 ms
          15 |         4500 | 64000 ms
          20 |         6000 | 4.2 ms
          30 |         9000 | 4.2 ms
         250 |        75000 | 4.2 ms
         500 |            ? | 4.2 ms
        1000 |       300000 | 4.2 ms

Is there any good reason not to set the stats target to 1000 always?

lyell5=> alter table article_words alter word_key set statistics 30; analyze verbose article_words;
lyell5=> explain analyze select words.* from article_words join words using (word_key) where context_key=535462;
INFO:  "article_words": scanned 9000 of 1125489 pages, containing 1665000 live rows and 0 dead rows; 9000 rows in sample, 208215465 estimated total rows

                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..160395.31 rows=16170 width=13) (actual time=0.034..4.025 rows=777 loops=1)
   ->  Index Scan using article_word_idx on article_words  (cost=0.00..18258.96 rows=16170 width=4) (actual time=0.023..0.344 rows=777 loops=1)
         Index Cond: (context_key = 535462)
   ->  Index Scan using words_pkey on words  (cost=0.00..8.78 rows=1 width=13) (actual time=0.003..0.004 rows=1 loops=777)
         Index Cond: (words.word_key = article_words.word_key)
 Total runtime: 4.231 ms

lyell5=> alter table article_words alter word_key set statistics 10; analyze verbose article_words;
lyell5=> explain analyze select words.* from article_words join words using (word_key) where context_key=535462;
INFO:  "article_words": scanned 3000 of 1125489 pages, containing 555000 live rows and 0 dead rows; 3000 rows in sample, 208215465 estimated total rows

                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=192089.54..265653.38 rows=44589 width=13) (actual time=18809.184..80685.239 rows=777 loops=1)
   Hash Cond: (article_words.word_key = words.word_key)
   ->  Index Scan using article_word_idx on article_words  (cost=0.00..44717.05 rows=44589 width=4) (actual time=0.025..0.597 rows=777 loops=1)
         Index Cond: (context_key = 535462)
   ->  Hash  (cost=93818.13..93818.13 rows=5653313 width=13) (actual time=18808.872..18808.872 rows=5651551 loops=1)
         ->  Seq Scan on words  (cost=0.00..93818.13 rows=5653313 width=13) (actual time=0.007..7845.824 rows=5651551 loops=1)
 Total runtime: 80686.217 ms
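On the question of telling when the row count is off: each plan node prints two `rows=` figures, the planner's estimate in the cost block and the measured count in the `(actual ...)` block. A minimal sketch (the helper name and regex are illustrative, not from any PostgreSQL tool) that pulls both numbers out of one EXPLAIN ANALYZE line and computes the misestimate factor:

```python
import re

def estimate_ratio(plan_line):
    """Return (estimated_rows, actual_rows, estimate/actual) for one
    EXPLAIN ANALYZE plan node line."""
    m = re.search(r"rows=(\d+).*\(actual time=[\d.]+ rows=(\d+)", plan_line)
    if not m:
        raise ValueError("no estimated/actual row counts found")
    est, act = int(m.group(1)), int(m.group(2))
    return est, act, est / max(act, 1)

# The misestimated index scan from the statistics-10 plan above:
line = ("->  Index Scan using article_word_idx on article_words "
        "(cost=0.00..44717.05 rows=44589 width=4) "
        "(actual time=0.025..0.597 rows=777 loops=1)")
est, act, ratio = estimate_ratio(line)
print(est, act, round(ratio))   # 44589 777 57 -- a ~57x overestimate
```

A large ratio (or its inverse) on a node is the signal that the statistics for the column in its filter or index condition are stale or too coarse.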