Re: Followup - expression (functional) index use in joins - Mailing list pgsql-performance
From | Roger Ging |
---|---|
Subject | Re: Followup - expression (functional) index use in joins |
Date | |
Msg-id | 3FCB7708.1020305@musicreports.com Whole thread Raw |
In response to | Re: Followup - expression (functional) index use in joins (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-performance |
Tom,
Turning enable_hashjoin off made the query run as it had on v7.3. We have worked around this by changing the index from a function call to a direct index on a new column with the results of the function maintained by a trigger. Would there be performance issues from leaving enable_hashjoin off, or do you recomend enabling it, and working around function calls in indices?
See results below. I was not sure if I was supposed to reply-all, or just to the list. Sorry if the protocol is incorrect.
ppl=# explain analyse select title from music.program p
ppl-# join music.logfile l on
ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no)
ppl-# where l.air_date = '01/30/2001'
ppl-# and l.station = 'KABC';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=69.89..19157.06 rows=2322 width=28) (actual time=500.905..1473.748 rows=242 loops=1)
Hash Cond: ((music.fn_mri_id_no_program("outer".mri_id_no))::text = ("inner".program_id)::text)
-> Seq Scan on program p (cost=0.00..16888.98 rows=173998 width=40) (actual time=98.371..532.184 rows=173998 loops=1)
-> Hash (cost=69.84..69.84 rows=17 width=9) (actual time=65.817..65.817 rows=0 loops=1)
-> Index Scan using idx_logfile_station_air_date on logfile l (cost=0.00..69.84 rows=17 width=9) (actual time=24.499..65.730 rows=32 loops=1)
Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2001-01-30 00:00:00'::timestamp without time zone))
Total runtime: 1474.067 ms
(7 rows)
ppl=# set enable_mergejoin = false;
SET
ppl=# explain analyse select title from music.program p
ppl-# join music.logfile l on
ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no)
ppl-# where l.air_date = '01/30/2001'
ppl-# and l.station = 'KABC';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=69.89..19157.06 rows=2322 width=28) (actual time=444.834..1428.815 rows=242 loops=1)
Hash Cond: ((music.fn_mri_id_no_program("outer".mri_id_no))::text = ("inner".program_id)::text)
-> Seq Scan on program p (cost=0.00..16888.98 rows=173998 width=40) (actual time=105.977..542.870 rows=173998 loops=1)
-> Hash (cost=69.84..69.84 rows=17 width=9) (actual time=1.197..1.197 rows=0 loops=1)
-> Index Scan using idx_logfile_station_air_date on logfile l (cost=0.00..69.84 rows=17 width=9) (actual time=0.574..1.151 rows=32 loops=1)
Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2001-01-30 00:00:00'::timestamp without time zone))
Total runtime: 1429.111 ms
(7 rows)
ppl=# set enable_hashjoin = false;
SET
ppl=# explain analyse select title from music.program p
ppl-# join music.logfile l on
ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no)
ppl-# where l.air_date = '01/30/2001'
ppl-# and l.station = 'KABC';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..58104.34 rows=2322 width=28) (actual time=0.480..5.357 rows=242 loops=1)
-> Index Scan using idx_logfile_station_air_date on logfile l (cost=0.00..69.84 rows=17 width=9) (actual time=0.176..0.754 rows=32 loops=1)
Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2001-01-30 00:00:00'::timestamp without time zone))
-> Index Scan using idx_program_mri_id_no_program on program p (cost=0.00..3400.74 rows=870 width=40) (actual time=0.041..0.127 rows=8 loops=32)
Index Cond: (("outer".program_id)::text = (music.fn_mri_id_no_program(p.mri_id_no))::text)
Total runtime: 5.637 ms
(6 rows)
Tom Lane wrote:
Turning enable_hashjoin off made the query run as it had on v7.3. We have worked around this by changing the index from a function call to a direct index on a new column with the results of the function maintained by a trigger. Would there be performance issues from leaving enable_hashjoin off, or do you recomend enabling it, and working around function calls in indices?
See results below. I was not sure if I was supposed to reply-all, or just to the list. Sorry if the protocol is incorrect.
ppl=# explain analyse select title from music.program p
ppl-# join music.logfile l on
ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no)
ppl-# where l.air_date = '01/30/2001'
ppl-# and l.station = 'KABC';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=69.89..19157.06 rows=2322 width=28) (actual time=500.905..1473.748 rows=242 loops=1)
Hash Cond: ((music.fn_mri_id_no_program("outer".mri_id_no))::text = ("inner".program_id)::text)
-> Seq Scan on program p (cost=0.00..16888.98 rows=173998 width=40) (actual time=98.371..532.184 rows=173998 loops=1)
-> Hash (cost=69.84..69.84 rows=17 width=9) (actual time=65.817..65.817 rows=0 loops=1)
-> Index Scan using idx_logfile_station_air_date on logfile l (cost=0.00..69.84 rows=17 width=9) (actual time=24.499..65.730 rows=32 loops=1)
Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2001-01-30 00:00:00'::timestamp without time zone))
Total runtime: 1474.067 ms
(7 rows)
ppl=# set enable_mergejoin = false;
SET
ppl=# explain analyse select title from music.program p
ppl-# join music.logfile l on
ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no)
ppl-# where l.air_date = '01/30/2001'
ppl-# and l.station = 'KABC';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=69.89..19157.06 rows=2322 width=28) (actual time=444.834..1428.815 rows=242 loops=1)
Hash Cond: ((music.fn_mri_id_no_program("outer".mri_id_no))::text = ("inner".program_id)::text)
-> Seq Scan on program p (cost=0.00..16888.98 rows=173998 width=40) (actual time=105.977..542.870 rows=173998 loops=1)
-> Hash (cost=69.84..69.84 rows=17 width=9) (actual time=1.197..1.197 rows=0 loops=1)
-> Index Scan using idx_logfile_station_air_date on logfile l (cost=0.00..69.84 rows=17 width=9) (actual time=0.574..1.151 rows=32 loops=1)
Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2001-01-30 00:00:00'::timestamp without time zone))
Total runtime: 1429.111 ms
(7 rows)
ppl=# set enable_hashjoin = false;
SET
ppl=# explain analyse select title from music.program p
ppl-# join music.logfile l on
ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no)
ppl-# where l.air_date = '01/30/2001'
ppl-# and l.station = 'KABC';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..58104.34 rows=2322 width=28) (actual time=0.480..5.357 rows=242 loops=1)
-> Index Scan using idx_logfile_station_air_date on logfile l (cost=0.00..69.84 rows=17 width=9) (actual time=0.176..0.754 rows=32 loops=1)
Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2001-01-30 00:00:00'::timestamp without time zone))
-> Index Scan using idx_program_mri_id_no_program on program p (cost=0.00..3400.74 rows=870 width=40) (actual time=0.041..0.127 rows=8 loops=32)
Index Cond: (("outer".program_id)::text = (music.fn_mri_id_no_program(p.mri_id_no))::text)
Total runtime: 5.637 ms
(6 rows)
Tom Lane wrote:
Roger Ging <rging@paccomsys.com> writes:Ran vacuum analyse on both program and logfile tables. Estimates are more in line with reality now,And they are what now? You really can't expect to get useful help here when you're being so miserly with the details ... FWIW, I suspect you could force 7.4 to generate 7.3's plan by setting enable_mergejoin to off (might have to also set enable_hashjoin to off, if it then tries for a hash join). 7.3 could not even consider those join types in this example, while 7.4 can. The interesting question from my perspective is why the planner is guessing wrong about the relative costs of the plans. EXPLAIN ANALYZE results with each type of join forced would be useful to look at. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
pgsql-performance by date: