Performance problems with multiple layers of functions - Mailing list pgsql-performance
From | Svenne Krap |
---|---|
Subject | Performance problems with multiple layers of functions |
Date | |
Msg-id | 4423EACD.80706@krap.dk Whole thread Raw |
Responses |
Re: Performance problems with multiple layers of functions
|
List | pgsql-performance |
Hi there. I have hit a edge in the planning and I hope you can help. The system uses a lot of stored procedures to move as much of the intelligence into the database layer as possible. My (development) query looks like and runs reasonably fast: explain analyze select dataset_id, entity, sum(amount) from entrydata_current where flow_direction in (select * from outflow_direction(dataset_id)) and dataset_id in (122,112,125,89,111) group by dataset_id, entity; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=918171.00..918171.30 rows=24 width=19) (actual time=11533.297..11533.340 rows=50 loops=1) -> Bitmap Heap Scan on entrydata_current (cost=676.72..917736.04 rows=57994 width=19) (actual time=23.921..11425.373 rows=37870 loops=1) Recheck Cond: ((dataset_id = 122) OR (dataset_id = 112) OR (dataset_id = 125) OR (dataset_id = 89) OR (dataset_id = 111)) Filter: (subplan) -> BitmapOr (cost=676.72..676.72 rows=117633 width=0) (actual time=15.765..15.765 rows=0 loops=1) -> Bitmap Index Scan on entrydata_current_dataset_idx (cost=0.00..83.97 rows=14563 width=0) (actual time=1.881..1.881 rows=13728 loops=1) Index Cond: (dataset_id = 122) -> Bitmap Index Scan on entrydata_current_dataset_idx (cost=0.00..156.12 rows=27176 width=0) (actual time=3.508..3.508 rows=25748 loops=1) Index Cond: (dataset_id = 112) -> Bitmap Index Scan on entrydata_current_dataset_idx (cost=0.00..124.24 rows=21498 width=0) (actual time=2.729..2.729 rows=20114 loops=1) Index Cond: (dataset_id = 125) -> Bitmap Index Scan on entrydata_current_dataset_idx (cost=0.00..102.20 rows=17771 width=0) (actual time=2.351..2.351 rows=17344 loops=1) Index Cond: (dataset_id = 89) -> Bitmap Index Scan on entrydata_current_dataset_idx (cost=0.00..210.19 rows=36625 width=0) (actual time=5.292..5.292 rows=37118 loops=1) Index Cond: (dataset_id = 111) SubPlan -> Function Scan on outflow_direction (cost=0.00..12.50 rows=1000 width=4) (actual time=0.093..0.095 rows=4 loops=114052) Total runtime: 11540.506 ms (18 rows) The problem is, that the application should not need to know the five dataset_ids (it will always know one - its own). So I make a function to return the five ids and then the query looks like: explain select dataset_id, entity, sum(amount) from entrydata_current where flow_direction in (select * from outflow_direction(dataset_id)) and dataset_id in (select * from get_dataset_ids(122)) group by dataset_id, entity; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=24672195.68..24672203.88 rows=656 width=19) -> Hash IN Join (cost=15.00..24660005.45 rows=1625364 width=19) Hash Cond: ("outer".dataset_id = "inner".get_dataset_ids) -> Index Scan using entrydata_current_dataset_idx on entrydata_current (cost=0.00..24558405.20 rows=1625364 width=19) Filter: (subplan) SubPlan -> Function Scan on outflow_direction (cost=0.00..12.50 rows=1000 width=4) -> Hash (cost=12.50..12.50 rows=1000 width=4) -> Function Scan on get_dataset_ids (cost=0.00..12.50 rows=1000 width=4) (9 rows) which does not return within 10 minutes - which is unacceptable. Is there any way to get a better plan for the second ? The planner should really see the two queries as equal as there is no dependencies between the outer query and get_dataset_ids (isn't it called constant folding?). Thanks in advance Svenne
Attachment
pgsql-performance by date: