Home > mailing lists

Re: Seq scan on join, not on subselect? analyze this - Mailing list pgsql-sql

From	Helio Campos Mello de Andrade
Subject	Re: Seq scan on join, not on subselect? analyze this
Date	November 10, 2008 10:28:44
Msg-id	29e3942f0811100328s438d9595h61ae4ad714fe08c5@mail.gmail.com Whole thread Raw
In response to	Seq scan on join, not on subselect? analyze this (Bryce Nesbitt <bryce2@obviously.com>)
List	pgsql-sql

Tree view

Bryce,<br /><br />  - I think that the difference between the two queries has to do with the way postgresql execute
them.<br/><br />    In the first the SGDB does:<br />         1º Creates a temporary table with "m" X "n" rows where
the"m" and "n" are the number of the rows in the tables been joined.<br />         2º Take only the rows that has the
same"work_key"<br />         3º It restricts using the where clause.<br /><br />         OBS: Maybe It use the where
clausefirst on the tables just to minimize the "m" and "n". I not sure about that. Still it creates and "m" X "n"
temporarytable with lots of bad rows.<br /><br />   In the second query the SGDB:<br />      1º Select in
"article_words"only the rows that correspond with the restriction to that "context_key". It results in a much smaller
numberof rows. "k" <<< "n".<br />      2º It uses "k-results" and look for the for the rows where "word_key"
isin the group created by the INNER Query.<br /><br />That's why you have the difference between the query's "Total
runtime".<br/><br />Regards<br /><br />--<br />Helio Campos Mello de Andrade<br /><br /><br /><br /><div
class="gmail_quote">OnSun, Nov 2, 2008 at 3:51 PM, Bryce Nesbitt <span dir="ltr"><<a
href="mailto:bryce2@obviously.com">bryce2@obviously.com</a>></span>wrote:<br /><blockquote class="gmail_quote"
style="border-left:1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">I'm a bit confused why
thequery planner is not restricting my join, and<br /> not using the index.  Two explain analyze statements follow.<br
/>Why is the second so much better?<br /><br /> lyell5=> select version();<br /> PostgreSQL 8.3.4 on
x86_64-pc-linux-gnu,compiled by GCC cc (GCC) 4.1.2<br /> 20061115 (prerelease) (Debian 4.1.1-21)<br /><br />
lyell5=>explain analyze select * from article_words join words using<br /> (word_key) where context_key=535462;<br
/>
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/>|<br /> QUERY<br /> PLAN                                                                   |<br />
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/>| Hash Join  (cost=192092.90..276920.93 rows=45327 width=17) (actual<br /> time=6020.932..60084.817 rows=777<br />
loops=1)                                 |<br /> |   Hash Cond: (article_words.word_key =<br /> words.word_key)<br />
|<br/> |   ->  Index Scan using article_word_idx on article_words<br /> (cost=0.00..55960.50 rows=45327 width=8)
(actualtime=0.031..0.547<br /> rows=777 loops=1) |<br /> |         Index Cond: (context_key =<br /> 535462)<br /> |<br
/>|   ->  Hash  (cost=93819.62..93819.62 rows=5653462 width=13) (actual<br /> time=6020.605..6020.605 rows=5651551
loops=1)                             |<br /> |         ->  Seq Scan on words  (cost=0.00..93819.62 rows=5653462<br
/>width=13) (actual time=0.006..2010.962 rows=5651551<br /> loops=1)                  |<br /> | Total runtime:
60085.616<br/> ms<br /> |<br />
+------------------------------------------------------------------------------------------------------------------------------------------------+<br
/><br/><br /> lyell5=> explain analyze select * from words where word_key in (select<br /> word_key from
article_wordswhere context_key=535462);<br />
+------------------------------------------------------------------------------------------------------------------------------------------------------+<br
/>|<br /> QUERY<br /> PLAN                                                                      |<br />
+------------------------------------------------------------------------------------------------------------------------------------------------------+<br
/>| Nested Loop  (cost=56073.81..56091.41 rows=2 width=13) (actual<br /> time=0.808..4.723 rows=777<br /> loops=1)    
                                             |<br /> |   ->  HashAggregate  (cost=56073.81..56073.83 rows=2 width=4)
(actual<br/> time=0.795..1.072 rows=777<br /> loops=1)                                            |<br /> |        
-> Index Scan using article_word_idx on article_words<br /> (cost=0.00..55960.50 rows=45327 width=4) (actual
time=0.030..0.344<br/> rows=777 loops=1) |<br /> |               Index Cond: (context_key =<br /> 535462)<br /> |<br />
|  ->  Index Scan using words_pkey on words  (cost=0.00..8.78 rows=1<br /> width=13) (actual time=0.003..0.004
rows=1<br/> loops=777)                            |<br /> |         Index Cond: (words.word_key =<br />
article_words.word_key)<br/> |<br /> | Total runtime: 4.936<br /> ms<br /> |<br />
+------------------------------------------------------------------------------------------------------------------------------------------------------+<br
/><fontcolor="#888888"><br /><br /> --<br /> Sent via pgsql-sql mailing list (<a
href="mailto:pgsql-sql@postgresql.org">pgsql-sql@postgresql.org</a>)<br/> To make changes to your subscription:<br /><a
href="http://www.postgresql.org/mailpref/pgsql-sql"target="_blank">http://www.postgresql.org/mailpref/pgsql-sql</a><br
/></font></blockquote></div>

pgsql-sql by date:

From: Richard Huxton
Date: 10 November 2008, 10:05:37
Subject: Re: Query optimizing

From: prakash
Date: 10 November 2008, 10:38:44
Subject: [PERFORM] Can we activate WAL runtime?

Re: Seq scan on join, not on subselect? analyze this - Mailing list pgsql-sql

Previous

Next