Re: Cost Issue - How do I force a Hash Join - Mailing list pgsql-performance

From Virag Saksena
Subject Re: Cost Issue - How do I force a Hash Join
Date
Msg-id 02a101c636b0$c08e19c0$3100000a@demo1
Whole thread Raw
In response to Cost Issue - How do I force a Hash Join  ("Virag Saksena" <virag@auptyma.com>)
List pgsql-performance
Tables are analyzed, though I would love to find a way to increase it's
accuracy of statistics
Tried raising the statistics target upto 100, but it did not help. Should I
bump it even more

However I found that if I add depth to the group by clauses, it somehow
tells the optimizer that it would get more than 1 row
and it goes to a Hash Join ....
For this query, only rows with one value of depth are accessed, so we are
okay ... but I would like to see if there is some other
way I can get a better approximation for the costs

 Sort (cost=25214.36..25214.39 rows=10 width=958) (actual
time=9798.860..9815.670 rows=6115 loops=1)
  Sort Key: (o1.totmem - COALESCE(o2.totmem, 0::bigint))
  ->Hash Left Join (cost=25213.83..25214.19 rows=10 width=958) (actual
time=8526.248..9755.721 rows=6115 loops=1)
    Hash Cond: ((("outer".objsig)::text = ("inner".objsig)::text) AND
(("outer".objtype)::text = ("inner".objtype)::text) AND
(("outer".fieldname)::text = ("inner".fieldname)::text))
    ->Subquery Scan o1 (cost=18993.48..18993.66 rows=10 width=990) (actual
time=6059.880..6145.223 rows=6115 loops=1)
      ->HashAggregate (cost=18993.48..18993.56 rows=10 width=46) (actual
time=6059.871..6094.897 rows=6115 loops=1)
        ->Nested Loop (cost=0.00..18993.22 rows=15 width=46) (actual
time=45.510..5980.807 rows=6338 loops=1)
          ->Index Scan using jam_heaprel_n1 on jam_heaprel ar
(cost=0.00..18932.01 rows=10 width=19) (actual time=45.374..205.520
rows=6338 loops=1)
            Index Cond: ((heap_id = 1) AND (parentaddr = 0))
            Filter: ((fieldname)::text = 'K'::text)
          ->Index Scan using jam_heapobj_u1 on jam_heapobj ao
(cost=0.00..6.10 rows=2 width=43) (actual time=0.885..0.890 rows=1
loops=6338)
            Index Cond: ((ao.heap_id = 1) AND (ao.objaddr =
"outer".childaddr))
    ->Hash (cost=6220.34..6220.34 rows=2 width=982) (actual
time=2466.178..2466.178 rows=0 loops=1)
      ->Subquery Scan o2 (cost=6220.30..6220.34 rows=2 width=982) (actual
time=2225.242..2433.744 rows=6038 loops=1)
        ->HashAggregate (cost=6220.30..6220.32 rows=2 width=46) (actual
time=2225.233..2366.890 rows=6038 loops=1)
          ->Nested Loop (cost=0.00..6220.27 rows=2 width=46) (actual
time=0.449..2149.257 rows=6259 loops=1)
            ->Index Scan using jam_heaprel_n1 on jam_heaprel br
(cost=0.00..6202.89 rows=4 width=19) (actual time=0.296..51.310 rows=6259
loops=1)
              Index Cond: ((heap_id = 0) AND (parentaddr = 0))
              Filter: ((fieldname)::text = 'K'::text)
            ->Index Scan using jam_heapobj_u1 on jam_heapobj bo
(cost=0.00..4.33 rows=1 width=43) (actual time=0.294..0.300 rows=1
loops=6259)
              Index Cond: ((bo.heap_id = 0) AND (bo.objaddr =
"outer".childaddr))
 Total runtime: 9950.192 ms

Regards,

Virag

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Virag Saksena" <virag@auptyma.com>
Cc: <pgsql-performance@postgresql.org>
Sent: Monday, February 20, 2006 9:35 PM
Subject: Re: [PERFORM] Cost Issue - How do I force a Hash Join


> "Virag Saksena" <virag@auptyma.com> writes:
> > The individual queries run in 50-300 ms. However the optimizer is =
> > choosing a nested loop to join them rather than a Hash join
> > causing the complete query to take 500+ seconds. It expects that it will
=
> > get 1 row out from each of the sources, but here is gets
> > several thousand rows.
>
> The best approach is to see if you can't fix that estimation error.
> Are the stats up to date on these tables?  If so, maybe raising the
> statistics targets would help.
>
> regards, tom lane
>


pgsql-performance by date:

Previous
From: "Craig A. James"
Date:
Subject: Re: Cost Issue - How do I force a Hash Join
Next
From: "Ibrahim Tekin"
Date:
Subject: LIKE query on indexes