Re: Seq scan on join, not on subselect? analyze this - Mailing list pgsql-sql
| From | Bryce Nesbitt |
|---|---|
| Subject | Re: Seq scan on join, not on subselect? analyze this |
| Date | |
| Msg-id | 490E0A33.3060403@obviously.com Whole thread Raw |
| In response to | Re: Seq scan on join, not on subselect? analyze this (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-sql |
Tom Lane wrote:<br /><blockquote cite="mid:4529.1225651101@sss.pgh.pa.us" type="cite"><pre wrap="">Why are the
rowcountestimates so far off? Maybe you need to increase
the statistics target for this table. regards, tom lane </pre></blockquote> Tom,<br /> How does one tell the
rowcountis off in a query plan? I've never found a great reference on interpreting the query analyze output!<br /><br
/>Upping the stats target made <i>little</i> difference in the estimated row count, but a <i>huge</i> difference in the
query. It also mattered<br /> only on the context_key column, not the other columns. Why would that be?<br /><br
/><fontsize="-1"><tt><br /> lyell5=> \pset format wrapped<br /> lyell5=> select * from pg_stats where
tablename='article_words';<br/>
+------------+-------------+----------+-----------+-----------+------------+--------------------+-------------------+---------------------+-------------+<br
/>| schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs
| histogram_bounds | correlation |<br />
+------------+-------------+----------+-----------+-----------+------------+--------------------+-------------------+---------------------+-------------+<br
/>| public | article_wor | word_key | 0 | 4 | 6361 | {382,226,1155,2130 | {0.003,0.00233333 |
{3,623,1376,2074,31| 0.0292482 |<br /> | ; ds ;
,2246,3328,8999,73; ,0.00233333,0.002 ; 54,4601,7269,12770, |<br />
| ; 3,1760,1796} ; 33333,0.00233333, ;
30017,212496,582068 |<br />
| ; 0.00233333,0.0023 ;
0} |<br />
| ;
3333,0.002,0.002, |<br />
| ;
0.002} |<br /> | public | article_wor | context_ | 0
| 4 | 4671 | {639302,113013,133 | {0.00233333,0.001 | {28,42838,92697,140 | 0.867505 |<br />
| ; ds ; key ; 052,211978,508496, ; 66667,0.00166667, ;
684,202950,248442,3 |<br /> | ;
545123,590829,5985; 0.00166667,0.0016 ; 38954,403025,498952 |<br />
| ; 95,649645,37658} ; 6667,0.00166667,0 ;
,584048,654070} |<br />
| ;
.00166667,0.00166 |<br />
| ;
667,0.00166667,0. |<br />
| ;
00133333} |<br />
+------------+-------------+----------+-----------+-----------+------------+--------------------+-------------------+---------------------+-------------+<br
/>lyell5=> analyze verbose article_words;<br /> INFO: "article_words": scanned 3000 of 1125489 pages, containing
555000live rows<br /> and 0 dead rows; 3000 rows in sample, 208,215,465 estimated total rows<br /><br /> lyell5=>
setdefault_statistics_target to 500;<br /> lyell5=> analyze verbose article_words;<br /> INFO: "article_words":
scanned150000 of 1125489 pages, containing 27749820 live rows<br /> and 180 dead rows; 150000 rows in sample,
208,214,114estimated total rows<br /><br /><br /><br /> But it makes a huge impact on this query:<br /><br
/></tt></font><tableborder="1" cellpadding="2" cellspacing="2" width="30%"><tbody><tr><td bgcolor="#cccccc"
valign="top">statstarget<font color="#333333"><br /></font></td><td bgcolor="#cccccc" valign="top">sampled<br
/></td><tdbgcolor="#cccccc" valign="top">query duration after analyze<br /></td></tr><tr><td valign="top">10<br
/></td><tdvalign="top">3000<br /></td><td valign="top">80600ms<br /></td></tr><tr><td valign="top">15<br /></td><td
valign="top">4500<br/></td><td valign="top">64000ms</td></tr><tr><td valign="top">20<br /></td><td valign="top">6000<br
/></td><tdvalign="top">4.2ms<br /></td></tr><tr><td valign="top">30<br /></td><td valign="top">9000<br /></td><td
valign="top">4.2ms<br/></td></tr><tr><td valign="top">250<br /></td><td valign="top">75000<br /></td><td
valign="top">4.2ms<br/></td></tr><tr><td valign="top">500<br /></td><td valign="top">?<br /></td><td
valign="top">4.2ms<br/></td></tr><tr><td valign="top">1000<br /></td><td valign="top">300000<br /></td><td
valign="top">4.2ms<br/></td></tr></tbody></table><font size="-1"><tt><br /> Is there any good reason not to set stats
targetto 1000 always?<br /><br /><br /><br /><br /> lyell5=> alter table article_words alter word_key set statistics
30;analyzeverbose article_words;<br /> explain analyze select words.* from article_words join words using (word_key)
wherecontext_key=535462;<br /> INFO: "article_words": scanned 9000 of 1125489 pages, containing 1665000 live rows and
0dead rows; 9000 rows in sample, 208215465 estimated total rows<br />
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/>| QUERY
PLAN |<br />
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/>| Nested Loop (cost=0.00..160395.31 rows=16170 width=13) (actual time=0.034..4.025 rows=777
loops=1) |<br /> | -> Index Scan using article_word_idx on
article_words (cost=0.00..18258.96 rows=16170 width=4) (actual time=0.023..0.344 rows=777 loops=1) |<br /> |
IndexCond: (context_key =
535462) |<br /> |
-> Index Scan using words_pkey on words (cost=0.00..8.78 rows=1 width=13) (actual time=0.003..0.004 rows=1
loops=777) |<br /> | Index Cond: (words.word_key =
article_words.word_key) |<br /> | Total
runtime:4.231
ms
|<br/>
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/><br/><br /> lyell5=> alter table article_words alter word_key set statistics 10;analyze verbose article_words;<br
/>explain analyze select words.* from article_words join words using (word_key) where context_key=535462;<br /> INFO:
"article_words":scanned 3000 of 1125489 pages, containing 555000 live rows and 0 dead rows; 3000 rows in sample,
208215465estimated total rows<br />
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/>| QUERY
PLAN |<br />
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/>| Hash Join (cost=192089.54..265653.38 rows=44589 width=13) (actual time=18809.184..80685.239 rows=777
loops=1) |<br /> | Hash Cond: (article_words.word_key =
words.word_key) |<br /> |
-> Index Scan using article_word_idx on article_words (cost=0.00..44717.05 rows=44589 width=4) (actual
time=0.025..0.597rows=777 loops=1) |<br /> | Index Cond: (context_key =
535462) |<br /> |
-> Hash (cost=93818.13..93818.13 rows=5653313 width=13) (actual time=18808.872..18808.872 rows=5651551
loops=1) |<br /> | -> Seq Scan on words (cost=0.00..93818.13 rows=5653313
width=13)(actual time=0.007..7845.824 rows=5651551 loops=1) |<br /> | Total runtime: 80686.217
ms
|<br/>
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/><br/></tt></font>