Thread: Give me a HINT or I'll go crazy ;)
So the query is:

SELECT m.messageid, mad.destination
FROM messages AS m
LEFT JOIN message_address_link AS mal ON (mal.message_id = m.messageid)
JOIN message_address_data AS mad ON (mad.id = mal.address_id)
WHERE delete_status <> 1
  AND folderid = E'200702210742181172061738846603000'

The structure doesn't really matter - it's an intuitive m:n relation with primary-key indexes on the join columns and an index on the 'folderid' column. The plan looks like this:

                                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=317.55..17771.38 rows=95249 width=36) (actual time=1116.358..1356.393 rows=152 loops=1)
   Hash Cond: (mal.address_id = mad.id)
   ->  Hash Join  (cost=101.53..15650.39 rows=95249 width=8) (actual time=1102.977..1342.675 rows=152 loops=1)
         Hash Cond: (mal.message_id = m.messageid)
         ->  Seq Scan on message_address_link mal  (cost=0.00..11738.90 rows=761990 width=8) (actual time=0.059..666.597 rows=761990 loops=1)
         ->  Hash  (cost=101.22..101.22 rows=25 width=4) (actual time=0.207..0.207 rows=39 loops=1)
               ->  Bitmap Heap Scan on messages m  (cost=4.46..101.22 rows=25 width=4) (actual time=0.064..0.163 rows=39 loops=1)
                     Recheck Cond: (folderid = '200702210742181172061738846603000'::text)
                     Filter: (delete_status <> 1)
                     ->  Bitmap Index Scan on messages_folderid_idx  (cost=0.00..4.46 rows=25 width=0) (actual time=0.047..0.047 rows=39 loops=1)
                           Index Cond: (folderid = '200702210742181172061738846603000'::text)
   ->  Hash  (cost=133.23..133.23 rows=6623 width=36) (actual time=13.353..13.353 rows=6623 loops=1)
         ->  Seq Scan on message_address_data mad  (cost=0.00..133.23 rows=6623 width=36) (actual time=0.008..6.443 rows=6623 loops=1)
 Total runtime: 1356.600 ms

Before playing with the statistics target (it was 100 by default) I was able to get the time down to 30 ms by adding this condition to the query:

---------
AND m.messageid BETWEEN 1 AND 1000000000
-----------

which was more like a hint than a real condition in this case. After some stats playing I'm not able to reproduce it, but I remember that the messages table was scanned with the index first (it returns only a few tuples) and then joined with message_address_link & message_address_data - while the current version performs a full scan on the message_address_link table, which has over 700k tuples, and then works from that.

Table sizes:
messages               75k tuples
message_address_link  750k tuples
message_address_data    6k tuples

I know the PG community doesn't want hints - so how would you attack this case? I've seen that this query can be fast. The question is: how?
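For reference, a minimal sketch of what the schema described above might look like - the table names, column names and indexes are taken from the post and the plan, while the exact column types and constraint definitions are assumptions:

CREATE TABLE messages (
    messageid     integer PRIMARY KEY,
    folderid      text,
    delete_status integer
    -- other columns omitted
);
CREATE INDEX messages_folderid_idx ON messages (folderid);

CREATE TABLE message_address_data (
    id          integer PRIMARY KEY,
    destination text
);

-- m:n link table; the primary key provides the index on the join columns
CREATE TABLE message_address_link (
    message_id integer REFERENCES messages (messageid),
    address_id integer REFERENCES message_address_data (id),
    PRIMARY KEY (message_id, address_id)
);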
wstrzalka <wstrzalka@gmail.com> writes:
> Before playing with the statistics target (it was 100 by default) I
> was able to get the time down to 30 ms by adding this condition to
> the query:

So what sort of "playing" did you do?

It looks to me like the core of the problem is the sucky join size
estimate here:

>   ->  Hash Join  (cost=101.53..15650.39 rows=95249 width=8) (actual
> time=1102.977..1342.675 rows=152 loops=1)
>         Hash Cond: (mal.message_id = m.messageid)

If it were correctly estimating that only a few message_address_link
rows would join to each messages row, it'd probably do the right thing.
But it seems to think there will be thousands of joins for each one...

			regards, tom lane
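One way to see why the planner expects thousands of link rows per message is to look at what the collected statistics say about the join column. A sketch of the kind of check that could be done here, using the table and column names from the query above:

SELECT tablename, attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'message_address_link'
  AND attname = 'message_id';

-- With ~750k rows in message_address_link, a badly underestimated
-- n_distinct (say, a few hundred instead of tens of thousands of
-- distinct message_ids) makes the planner assume each messages row
-- joins to thousands of link rows - which is where an estimate like
-- rows=95249 against an actual rows=152 can come from.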
In my madness I did the following:
- set the statistics target to 1000 for all join & filter columns
- clustered the tables involved
- reindexed the tables involved
- analyzed the tables involved

and it helped: I'm now at ~50 ms, which satisfies me completely.

If there are no hints, some debugging output for EXPLAIN would be great, so that lame developers like me can track down what's wrong ;)

The problem is solved, but I can't say I understand why it was wrong before - and why it's OK now :(

> wstrzalka <wstrzalka@gmail.com> writes:
>> Before playing with the statistics target (it was 100 by default) I
>> was able to get the time down to 30 ms by adding this condition to
>> the query:

> So what sort of "playing" did you do?  It looks to me like the core of
> the problem is the sucky join size estimate here:

>>   ->  Hash Join  (cost=101.53..15650.39 rows=95249 width=8) (actual
>> time=1102.977..1342.675 rows=152 loops=1)
>>         Hash Cond: (mal.message_id = m.messageid)

> If it were correctly estimating that only a few message_address_link
> rows would join to each messages row, it'd probably do the right thing.
> But it seems to think there will be thousands of joins for each one...

> 			regards, tom lane

--
Pozdrowienia,
 Wojciech Strzałka
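A minimal sketch of the commands described in that last post, assuming the column names from the query and a primary-key index name of message_address_link_pkey (the index name and the CLUSTER ... USING syntax depend on the PostgreSQL version in use); note that the raised statistics target only takes effect once ANALYZE has been run again:

-- Raise the statistics target on the join and filter columns
ALTER TABLE message_address_link ALTER COLUMN message_id SET STATISTICS 1000;
ALTER TABLE message_address_link ALTER COLUMN address_id SET STATISTICS 1000;
ALTER TABLE messages ALTER COLUMN folderid SET STATISTICS 1000;

-- Physically reorder the link table along an index, then rebuild its indexes
CLUSTER message_address_link USING message_address_link_pkey;
REINDEX TABLE message_address_link;

-- Refresh the statistics so the planner sees the new, larger samples
ANALYZE messages;
ANALYZE message_address_link;
ANALYZE message_address_data;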