Re: Query optimization problem - Mailing list pgsql-hackers

From Zotov
Subject Re: Query optimization problem
Date
Msg-id 4C4839D3.2020805@oe-it.ru
Whole thread Raw
In response to Query optimization problem  (Zotov <zotov@oe-it.ru>)
List pgsql-hackers
 20.07.2010 18:31, Robert Haas: <blockquote cite="mid:AANLkTi=S6GjWxeSnoHOeL4ciBA2LcbR6eEZAyEUMFGLM@mail.gmail.com"
type="cite"><prewrap="">According to the EXPLAIN ANALYZE output, your "slow" query is
 
executing in 0.007 ms, and your "fast" query is executing in 0.026 ms
(i.e. not as quickly as the slow query).  Since you mention that it
takes 7 s further down, I suspect this is not the real EXPLAIN ANALYZE
output on the real data that you're having a problem with.  You might
have better luck if you post the actual EXPLAIN ANALYZE output here.
Incidentally, sorry for not responding sooner to your private email -
I was on vacation last week.  But please do keep all replies on-list
so that everyone can comment.

All that having been said, I think the issue here is that the query
planner isn't inferring that d1.ID=<some constant> implies d2.ID=<some
constant>, even though there's a join clause d1.ID=d2.ID.  I'm not
really sure why it isn't doing that...  I suspect Tom Lane is the only
person who can comment intelligently on that, and he's away this week
(but if anyone else has an idea, feel free to jump in...). </pre></blockquote> Yes, I have a mistake when EXPLAIN
ANALYZEwithout data.. It create another plan, because seq scan were faster. Now I send results on real data (1 million
rows)<br/><br /><b>Slow Query:</b><br /> -------------------------------------------------<br /> test=# EXPLAIN
(ANALYZEon, VERBOSE on, COSTS on, BUFFERS off )SELECT d1.ID, d2.ID<br /> test-# FROM DocPrimary d1<br /> test-#  JOIN
DocPrimaryd2 ON d2.BasedOn=d1.ID<br /> test-# WHERE (d1.ID=234409763<br /> test(# ) OR (d2.ID=234409763);<br />
                                                                       QUERY PLAN<br />
----------------------------------------------------------------------------------------------------------------------------------------------------------<br
/> Merge Join  (cost=2222.72..53967.30 rows=1 width=8) (actual time=6697.847..6697.847 rows=0 loops=1)<br />    Output:
d1.id,d2.id<br />    Merge Cond: (d1.id = d2.basedon)<br />    Join Filter: ((d1.id = 234409763) OR (d2.id =
234409763))<br/>    ->  Index Scan using id_pk on public.docprimary d1  (cost=0.00..37224.48 rows=1076842 width=4)
(actualtime=0.016..3184.474 rows=1076795 loops=1)<br />          Output: d1.id, d1.basedon<br />    ->  Index Scan
usingbasedon_idx on public.docprimary d2  (cost=0.00..46245.14 rows=1076842 width=8) (actual time=0.011..1861.570
rows=235362loops=1)<br />          Output: d2.id, d2.basedon<br />  Total runtime: 6697.968 ms<br /> (9 rows)<br />
-----------------------------------------------<br/><br /><b>Fast Query:</b><br />
----------------------------------------------<br/> test=# EXPLAIN (ANALYZE on, VERBOSE on, COSTS on, BUFFERS off
)SELECTd1.ID, d2.ID<br /> test-# FROM DocPrimary d1<br /> test-#  JOIN DocPrimary d2 ON d2.BasedOn=d1.ID<br /> test-#
WHERE(d1.ID=234409763 and d2.BasedOn=234409763<br /> test(# ) OR (d2.ID=234409763);<br />
                                                          QUERY PLAN<br />
---------------------------------------------------------------------------------------------------------------------------------<br
/> Nested Loop  (cost=9.01..422.70 rows=1 width=8) (actual time=0.145..0.145 rows=0 loops=1)<br />    Output: d1.id,
d2.id<br/>    Join Filter: (((d1.id = 234409763) AND (d2.basedon = 234409763)) OR (d2.id = 234409763))<br />    -> 
BitmapHeap Scan on public.docprimary d2  (cost=9.01..136.90 rows=34 width=8) (actual time=0.141..0.141 rows=0
loops=1)<br/>          Output: d2.id, d2.basedon<br />          Recheck Cond: ((d2.basedon = 234409763) OR (d2.id =
234409763))<br/>          ->  BitmapOr  (cost=9.01..9.01 rows=34 width=0) (actual time=0.136..0.136 rows=0
loops=1)<br/>                ->  Bitmap Index Scan on basedon_idx  (cost=0.00..4.62 rows=33 width=0) (actual
time=0.078..0.078rows=0 loops=1)<br />                      Index Cond: (d2.basedon = 234409763)<br />               
-> Bitmap Index Scan on id_pk  (cost=0.00..4.38 rows=1 width=0) (actual time=0.051..0.051 rows=0 loops=1)<br />
                    Index Cond: (d2.id = 234409763)<br />    ->  Index Scan using id_pk on public.docprimary d1 
(cost=0.00..8.39rows=1 width=4) (never executed)<br />          Output: d1.id, d1.basedon<br />          Index Cond:
(d1.id= d2.basedon)<br />  Total runtime: 0.233 ms<br /> (15 rows)<br /> ------------------------------------------<br
/><br/> I use another fast query:<br /> SELECT d1.ID, d2.ID<br /> FROM DocPrimary d1<br />  JOIN DocPrimary d2 ON
d2.BasedOn=d1.ID<br/> WHERE (<b>d1.ID=234409763 and d2.BasedOn=234409763</b>) OR (d2.ID=234409763)<br /><br /> Bolded
partof query was d2.BasedOn=234409763 I replace it because it can help find way to optimize it automaticaly<br /><br />
Sosorry, but i can`t give programmer to do something in Postgres, because<br /> we don`t use it now as supported DB, we
thinkabout it and do some tests. <br /> It`s very hard and slow task (support another DB, now we use FireBird, and plan
useanother DB, and look for Postgres and MSSQL, maybe support it both as free and commercial DB solution)<br /> And in
ourdepartment only 4 (with me) programmers who can programm on "System Level", and only one of us (doesn`t me) know
C/C++<br /> We all programming on Delphi... If we choose Postgres as free DB platform then I can think about give
programmersfor Postgress development.<br /><br /> In so large letters my English stay more bad :)<br /><br /><br /><br
/>

pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: multibyte charater set in levenshtein function
Next
From: Alexander Korotkov
Date:
Subject: Re: multibyte charater set in levenshtein function