Give me a HINT or I'll got crazy ;) - Mailing list pgsql-general

From wstrzalka
Subject Give me a HINT or I'll got crazy ;)
Date
Msg-id a4f02dc6-7e2f-45a0-88a7-4b95915e66b3@e34g2000vbm.googlegroups.com
Whole thread Raw
Responses Re: Give me a HINT or I'll got crazy ;)
List pgsql-general
So the query is:

SELECT m.messageid, mad.destination
FROM messages AS m
      LEFT JOIN message_address_link AS mal ON (mal.message_id =
m.messageid)
      JOIN message_address_data AS mad ON (mad.id = mal.address_id)
WHERE  delete_status <> 1
AND  folderid=E'200702210742181172061738846603000'


the structure doesn't really matter - it's intuitive m:n relation with
primary key's indexes on join columns and on 'folderid' column
and the plan is like:

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=317.55..17771.38 rows=95249 width=36) (actual
time=1116.358..1356.393 rows=152 loops=1)
   Hash Cond: (mal.address_id = mad.id)
   ->  Hash Join  (cost=101.53..15650.39 rows=95249 width=8) (actual
time=1102.977..1342.675 rows=152 loops=1)
         Hash Cond: (mal.message_id = m.messageid)
         ->  Seq Scan on message_address_link mal
(cost=0.00..11738.90 rows=761990 width=8) (actual time=0.059..666.597
rows=761990 loops=1)
         ->  Hash  (cost=101.22..101.22 rows=25 width=4) (actual
time=0.207..0.207 rows=39 loops=1)
               ->  Bitmap Heap Scan on messages m  (cost=4.46..101.22
rows=25 width=4) (actual time=0.064..0.163 rows=39 loops=1)
                     Recheck Cond: (folderid =
'200702210742181172061738846603000'::text)
                     Filter: (delete_status <> 1)
                     ->  Bitmap Index Scan on messages_folderid_idx
(cost=0.00..4.46 rows=25 width=0) (actual time=0.047..0.047 rows=39
loops=1)
                           Index Cond: (folderid =
'200702210742181172061738846603000'::text)
   ->  Hash  (cost=133.23..133.23 rows=6623 width=36) (actual
time=13.353..13.353 rows=6623 loops=1)
         ->  Seq Scan on message_address_data mad  (cost=0.00..133.23
rows=6623 width=36) (actual time=0.008..6.443 rows=6623 loops=1)
 Total runtime: 1356.600 ms


Prior to the playing with statistics target (it was 100 by default) I
was able to go with the time to 30ms by adding to the query such a
condition:

--------- AND m.messageid BETWEEN 1 AND 1000000000 -----------

which was more like a hint then a real condition in this case.
After some stats playing i'm not able to reproduce this, however I
remember the messages table was scanned with index first ( it returns
only several tuples) and the joined with message_address_link &
message_address_data - while the current version performs full scan on
the message_address_link table where there is over 700k tuples and
then plays with it.

messages 75k tuples
message_address_link 750k tuples
message_address_data - 6k tuples


I know PG community don't want hints - so how would you bite this
case. I saw it's possible that this query will be fast. The question
is - how ?





pgsql-general by date:

Previous
From: Sam Mason
Date:
Subject: Re: Query inside RTF
Next
From: Daniel Shane
Date:
Subject: Tips/Hacks to create minial DB from the execution of several (simple) SQL requests.