Richard Kut <rkut@intelerad.com> writes:
> Here are the results with enable_seqscan = on:
Hmm, there's something pretty odd about your data, it seems like.
In the planner's preferred hash-join plan, it seems to be necessary
to scan through 654414 rows of pv1 to find the first 5324 rows that
join to pid:
> Limit (cost=27704.04..28790.59 rows=5324 width=691) (actual
> time=3865.677..17464.071 rows=5324 loops=1)
> -> Hash Join (cost=27704.04..264408.14 rows=1159822 width=691) (actual
> time=3865.674..17453.938 rows=5324 loops=1)
> Hash Cond: (("outer".patient_id_internal_id)::text =
> ("inner".patient_id_internal_id)::text)
> -> Seq Scan on pv1 (cost=0.00..80441.22 rows=1159822 width=407)
> (actual time=4.907..9361.791 rows=654414 loops=1)
^^^^^^^^^^^
> -> Hash (cost=14375.03..14375.03 rows=328403 width=284) (actual
> time=3853.090..3853.090 rows=328403 loops=1)
> -> Seq Scan on pid (cost=0.00..14375.03 rows=328403
> width=284) (actual time=6.198..2658.306 rows=328403 loops=1)
> Total runtime: 18120.833 ms
This is strange because all the other results look like there's about a
one-to-one correspondence. In the case that is fast for you, the outer
scan appears to have matched every time:
> Limit (cost=0.00..33408.19 rows=5324 width=691) (actual time=0.054..106.972
> rows=5324 loops=1)
> -> Nested Loop (cost=0.00..7277902.27 rows=1159822 width=691) (actual
> time=0.051..99.230 rows=5324 loops=1)
> -> Index Scan using patient_id_internal_id_idx on pv1
> (cost=0.00..290598.21 rows=1159822 width=407) (actual time=0.020..14.995
> rows=5324 loops=1)
^^^^^^^^^
> -> Index Scan using kdx_pid on pid (cost=0.00..6.01 rows=1
> width=284) (actual time=0.009..0.010 rows=1 loops=5324)
> Index Cond: ((pid.patient_id_internal_id)::text =
> ("outer".patient_id_internal_id)::text)
> Total runtime: 111.137 ms
Do you have any idea why the hash case is so unlucky?
BTW, these examples sure look like they are not being done on exactly
the same tables... the names are different.
regards, tom lane