Re: join to view over custom aggregate seems like it should be faster - Mailing list pgsql-performance
From | Merlin Moncure |
---|---|
Subject | Re: join to view over custom aggregate seems like it should be faster |
Date | |
Msg-id | b42b73150704100559n3ccb9d89p1767a0fa94324375@mail.gmail.com Whole thread Raw |
In response to | Re: join to view over custom aggregate seems like it should be faster ("Merlin Moncure" <mmoncure@gmail.com>) |
Responses |
Re: join to view over custom aggregate seems like it should be faster
|
List | pgsql-performance |
On 4/9/07, Merlin Moncure <mmoncure@gmail.com> wrote: > On 4/9/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > "Merlin Moncure" <mmoncure@gmail.com> writes: > > > I have an odd performance issue on 8.2 that I'd thought I'd document > > > here. I have a workaround, but I'm if there is something that I'm not > > > seeing. > > > > It's hard to comment on this without seeing the full details of the view > > and tables. I'm wondering where the SubPlans are coming from, for instance. > > ah, it looks like the aggregate is being re-expanded for each field > returned by the aggregate. I notice this for non-trivial record > returning functions also. standard m.o. is to push into a subquery > and expand afterwords. [sorry for the deluge of info] I cleaned up the view from: create or replace view latest_download as select software_binary_id, host_id, (( select latest_software_download( (bds_status_id, mtime, dl_window_open, dl_window_close, download_start, download_stop, info, userupgradeable, overrideflag, percent_complete)::software_download_data) )::software_download_data).* from software_download group by host_id, software_binary_id; to this: create or replace view latest_download as select software_binary_id, host_id, (v).* from ( select software_binary_id, host_id, latest_software_download( (bds_status_id, mtime, dl_window_open, dl_window_close, download_start, download_stop, info, userupgradeable, overrideflag, percent_complete)::software_download_data) as v from software_download group by host_id, software_binary_id ) q; this cleaned up the odd subplans but is still slow: dev20400=# explain analyze select * from foo join latest_download using (host_id, software_binary_id); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Hash Join (cost=1308.84..1467.81 rows=25 width=40) (actual time=1472.668..1914.799 rows=494 loops=1) Hash Cond: ((q.host_id = foo.host_id) AND (q.software_binary_id = foo.software_binary_id)) -> HashAggregate (cost=1293.48..1350.17 rows=4535 width=94) (actual time=1467.002..1700.388 rows=37247 loops=1) -> Seq Scan on software_download (cost=0.00..953.42 rows=45342 width=94) (actual time=0.014..274.747 rows=45342 loops=1) -> Hash (cost=7.94..7.94 rows=494 width=8) (actual time=5.028..5.028 rows=494 loops=1) -> Seq Scan on foo (cost=0.00..7.94 rows=494 width=8) (actual time=0.022..2.507 rows=494 loops=1) Total runtime: 1918.721 ms compare it to this: dev20400=# explain analyze select * from foo f where exists (select * from latest_download where host_id = f.host_id and software_binary_id = f.software_binary_id); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on foo f (cost=0.00..3122.01 rows=247 width=8) (actual time=0.152..45.941 rows=494 loops=1) Filter: (subplan) SubPlan -> Subquery Scan q (cost=0.00..6.30 rows=1 width=40) (actual time=0.081..0.081 rows=1 loops=494) -> GroupAggregate (cost=0.00..6.29 rows=1 width=94) (actual time=0.065..0.065 rows=1 loops=494) -> Index Scan using software_download_idx on software_download (cost=0.00..6.27 rows=1 width=94) (actual time=0.013..0.021 r Index Cond: ((host_id = $0) AND (software_binary_id = $1)) Total runtime: 48.323 ms (8 rows) Time: 49.851 ms I since I need both sides, I can't figure out a way to force the index to be used during the join except to use a function to look up the view based on the key, which works: dev20400=# explain analyze select latest_download(host_id, software_binary_id) from foo; QUERY PLAN -------------------------------------------------------------------------------------------------- Seq Scan on foo (cost=0.00..9.18 rows=494 width=8) (actual time=0.566..51.605 rows=494 loops=1) Total runtime: 54.290 ms (2 rows) dev20400=# explain analyze select * from latest_download where host_id = 1 and software_binary_id = 12345; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Subquery Scan q (cost=0.00..6.30 rows=1 width=40) (actual time=0.046..0.046 rows=0 loops=1) -> GroupAggregate (cost=0.00..6.29 rows=1 width=94) (actual time=0.035..0.035 rows=0 loops=1) -> Index Scan using software_download_idx on software_download (cost=0.00..6.27 rows=1 width=94) (actual time=0.024..0.024 rows=0 lo Index Cond: ((host_id = 1) AND (software_binary_id = 12345)) Total runtime: 0.134 ms For some reason, I can't get the index to be used on the table sitting under a view during a join, even though it should be, or at least it seems.... merlin
pgsql-performance by date: