Re: Seq scan on join, not on subselect? analyze this - Mailing list pgsql-sql

From Bryce Nesbitt
Subject Re: Seq scan on join, not on subselect? analyze this
Date
Msg-id 490E0A33.3060403@obviously.com
Whole thread Raw
In response to Re: Seq scan on join, not on subselect? analyze this  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
Tom Lane wrote:<br /><blockquote cite="mid:4529.1225651101@sss.pgh.pa.us" type="cite"><pre wrap="">Why are the
rowcountestimates so far off?  Maybe you need to increase
 
the statistics target for this table.        regards, tom lane </pre></blockquote> Tom,<br /> How does one tell the
rowcountis off in a query plan?  I've never found a great reference on interpreting the query analyze output!<br /><br
/>Upping the stats target made <i>little</i> difference in the estimated row count, but a <i>huge</i> difference in the
query. It also mattered<br /> only on the context_key column, not the other columns.  Why would that be?<br /><br
/><fontsize="-1"><tt><br /> lyell5=> \pset format wrapped<br /> lyell5=> select * from pg_stats where
tablename='article_words';<br/>
+------------+-------------+----------+-----------+-----------+------------+--------------------+-------------------+---------------------+-------------+<br
/>| schemaname |  tablename  | attname  | null_frac | avg_width | n_distinct |  most_common_vals  | most_common_freqs
| histogram_bounds   | correlation |<br />
+------------+-------------+----------+-----------+-----------+------------+--------------------+-------------------+---------------------+-------------+<br
/>| public     | article_wor | word_key |         0 |         4 |       6361 | {382,226,1155,2130 | {0.003,0.00233333 |
{3,623,1376,2074,31|   0.0292482 |<br /> |            ; ds                                                          ;
,2246,3328,8999,73; ,0.00233333,0.002 ; 54,4601,7269,12770,               |<br />
|                                                                         ; 3,1760,1796}       ; 33333,0.00233333, ;
30017,212496,582068              |<br />
|                                                                                              ; 0.00233333,0.0023 ;
0}                               |<br />
|                                                                                              ;
3333,0.002,0.002,                                    |<br />
|                                                                                              ;
0.002}                                               |<br /> | public     | article_wor | context_ |         0
|        4 |       4671 | {639302,113013,133 | {0.00233333,0.001 | {28,42838,92697,140 |    0.867505 |<br />
|           ; ds          ; key                                           ; 052,211978,508496, ; 66667,0.00166667, ;
684,202950,248442,3              |<br /> |                                                                          ;
545123,590829,5985; 0.00166667,0.0016 ; 38954,403025,498952               |<br />
|                                                                         ; 95,649645,37658}   ; 6667,0.00166667,0 ;
,584048,654070}                  |<br />
|                                                                                              ;
.00166667,0.00166                                    |<br />
|                                                                                              ;
667,0.00166667,0.                                    |<br />
|                                                                                              ;
00133333}                                            |<br />
+------------+-------------+----------+-----------+-----------+------------+--------------------+-------------------+---------------------+-------------+<br
/>lyell5=> analyze verbose article_words;<br /> INFO:  "article_words": scanned 3000 of 1125489 pages, containing
555000live rows<br /> and 0 dead rows; 3000 rows in sample, 208,215,465 estimated total rows<br /><br /> lyell5=>
setdefault_statistics_target to 500;<br /> lyell5=> analyze verbose article_words;<br /> INFO:  "article_words":
scanned150000 of 1125489 pages, containing 27749820 live rows<br /> and 180 dead rows; 150000 rows in sample,
208,214,114estimated total rows<br /><br /><br /><br /> But it makes a huge impact on this query:<br /><br
/></tt></font><tableborder="1" cellpadding="2" cellspacing="2" width="30%"><tbody><tr><td bgcolor="#cccccc"
valign="top">statstarget<font color="#333333"><br /></font></td><td bgcolor="#cccccc" valign="top">sampled<br
/></td><tdbgcolor="#cccccc" valign="top">query duration after analyze<br /></td></tr><tr><td valign="top">10<br
/></td><tdvalign="top">3000<br /></td><td valign="top">80600ms<br /></td></tr><tr><td valign="top">15<br /></td><td
valign="top">4500<br/></td><td valign="top">64000ms</td></tr><tr><td valign="top">20<br /></td><td valign="top">6000<br
/></td><tdvalign="top">4.2ms<br /></td></tr><tr><td valign="top">30<br /></td><td valign="top">9000<br /></td><td
valign="top">4.2ms<br/></td></tr><tr><td valign="top">250<br /></td><td valign="top">75000<br /></td><td
valign="top">4.2ms<br/></td></tr><tr><td valign="top">500<br /></td><td valign="top">?<br /></td><td
valign="top">4.2ms<br/></td></tr><tr><td valign="top">1000<br /></td><td valign="top">300000<br /></td><td
valign="top">4.2ms<br/></td></tr></tbody></table><font size="-1"><tt><br /> Is there any good reason not to set stats
targetto 1000 always?<br /><br /><br /><br /><br /> lyell5=> alter table article_words alter word_key set statistics
30;analyzeverbose article_words;<br /> explain analyze select words.* from article_words join words using (word_key)
wherecontext_key=535462;<br /> INFO:  "article_words": scanned 9000 of 1125489 pages, containing 1665000 live rows and
0dead rows; 9000 rows in sample, 208215465 estimated total rows<br />
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/>|                                                                   QUERY
PLAN                                                                  |<br />
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/>| Nested Loop  (cost=0.00..160395.31 rows=16170 width=13) (actual time=0.034..4.025 rows=777
loops=1)                                           |<br /> |   ->  Index Scan using article_word_idx on
article_words (cost=0.00..18258.96 rows=16170 width=4) (actual time=0.023..0.344 rows=777 loops=1) |<br /> |        
IndexCond: (context_key =
535462)                                                                                                    |<br /> |  
-> Index Scan using words_pkey on words  (cost=0.00..8.78 rows=1 width=13) (actual time=0.003..0.004 rows=1
loops=777)                     |<br /> |         Index Cond: (words.word_key =
article_words.word_key)                                                                                 |<br /> | Total
runtime:4.231
ms                                                                                                                       
|<br/>
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/><br/><br /> lyell5=> alter table article_words alter word_key set statistics 10;analyze verbose article_words;<br
/>explain analyze select words.* from article_words join words using (word_key) where context_key=535462;<br /> INFO: 
"article_words":scanned 3000 of 1125489 pages, containing 555000 live rows and 0 dead rows; 3000 rows in sample,
208215465estimated total rows<br />
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/>|                                                                   QUERY
PLAN                                                                  |<br />
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/>| Hash Join  (cost=192089.54..265653.38 rows=44589 width=13) (actual time=18809.184..80685.239 rows=777
loops=1)                                |<br /> |   Hash Cond: (article_words.word_key =
words.word_key)                                                                                        |<br /> |  
-> Index Scan using article_word_idx on article_words  (cost=0.00..44717.05 rows=44589 width=4) (actual
time=0.025..0.597rows=777 loops=1) |<br /> |         Index Cond: (context_key =
535462)                                                                                                    |<br /> |  
-> Hash  (cost=93818.13..93818.13 rows=5653313 width=13) (actual time=18808.872..18808.872 rows=5651551
loops=1)                           |<br /> |         ->  Seq Scan on words  (cost=0.00..93818.13 rows=5653313
width=13)(actual time=0.007..7845.824 rows=5651551 loops=1)                  |<br /> | Total runtime: 80686.217
ms                                                                                                                   
|<br/>
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/><br/></tt></font> 

pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: Seq scan on join, not on subselect? analyze this
Next
From: Sebastian Böhm
Date:
Subject: reliable lock inside stored procedure