query optimization differs between view and explicit query - Mailing list pgsql-performance
| From | Reece Hart |
|---|---|
| Subject | query optimization differs between view and explicit query |
| Date | |
| Msg-id | 1075422699.4062.164.camel@tallac Whole thread Raw |
| Responses |
Re: query optimization differs between view and explicit
Re: query optimization differs between view and explicit |
| List | pgsql-performance |
I have a large query which I would like to place in a view. The explicit query is sufficiently fast, but the same query as a view is much slower and uses a different plan. I would appreciate an explanation of why this is, and, more importantly whether/how I might coax the view to use a different plan.
The original query:
The original query:
rkh@csb-dev=> select distinct on (AH.p2gblataln_id) AH.p2gblataln_id,H.pseq_id,min(H.pstart) as "pstart", max(H.pstop) as "pstop",A.ident,(A.ident/Q.len::float*100)::int as "pct_ident", sum(H.pstop-H.pstart+1) as "aln_length",H.genasm_id,H.chr,H.plus_strand,min(H.gstart) as "gstart", max(H.gstop) as "gstop" from p2gblathsp H join p2gblatalnhsp AH on H.p2gblathsp_id=AH.p2gblathsp_id join p2gblataln A on AH.p2gblataln_id=A.p2gblataln_id join pseq Q on H.pseq_id=Q.pseq_id where H.pseq_id=76 group by AH.p2gblataln_id,H.pseq_id,H.genasm_id,H.chr,H.plus_strand,A.ident,Q.len \g /dev/null Time: 277.804 msNow as a view:
rkh@csb-dev=> create view v1 as
select distinct on (AH.p2gblataln_id) AH.p2gblataln_id,H.pseq_id,min(H.pstart) as "pstart",
max(H.pstop) as "pstop",A.ident,(A.ident/Q.len::float*100)::int as "pct_ident",
sum(H.pstop-H.pstart+1) as "aln_length",H.genasm_id,H.chr,H.plus_strand,min(H.gstart) as "gstart",
max(H.gstop) as "gstop"
from p2gblathsp H
join p2gblatalnhsp AH on H.p2gblathsp_id=AH.p2gblathsp_id
join p2gblataln A on AH.p2gblataln_id=A.p2gblataln_id
join pseq Q on H.pseq_id=Q.pseq_id
group by AH.p2gblataln_id,H.pseq_id,H.genasm_id,H.chr,H.plus_strand,A.ident,Q.len;
CREATE VIEW
Time: 103.041 ms
rkh@csb-dev=> select * from v1 where pseq_id=76 \g /dev/null
Time: 31973.979 ms
Okay, that's ~100x slower. The plans:
rkh@csb-dev=> explain select distinct on <snip... same as the first query above>
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=11157.75..11187.26 rows=454 width=40)
-> GroupAggregate (cost=11157.75..11186.13 rows=454 width=40)
-> Sort (cost=11157.75..11158.89 rows=454 width=40)
Sort Key: ah.p2gblataln_id, h.pseq_id, h.genasm_id, h.chr, h.plus_strand, a.ident, q.len
-> Nested Loop (cost=11125.62..11137.71 rows=454 width=40)
-> Index Scan using pseq_pkey on pseq q (cost=0.00..3.01 rows=2 width=6)
Index Cond: (76 = pseq_id)
-> Materialize (cost=11125.62..11127.89 rows=227 width=38)
-> Nested Loop (cost=546.15..11125.62 rows=227 width=38)
-> Hash Join (cost=546.15..10438.72 rows=227 width=34)
Hash Cond: ("outer".p2gblathsp_id = "inner".p2gblathsp_id)
-> Seq Scan on p2gblatalnhsp ah (cost=0.00..6504.03 rows=451503 width=8)
-> Hash (cost=545.58..545.58 rows=227 width=34)
-> Index Scan using p2gblathsp_p_lookup on p2gblathsp h (cost=0.00..545.58 rows=227 wid Index Cond: (pseq_id = 76)
-> Index Scan using p2gblataln_pkey on p2gblataln a (cost=0.00..3.01 rows=1 width=8)
Index Cond: ("outer".p2gblataln_id = a.p2gblataln_id)
(17 rows)
rkh@csb-dev=> explain select * from v1 where pseq_id=76;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------- Subquery Scan v1 (cost=246907.54..281897.70 rows=2258 width=77)
Filter: (pseq_id = 76)
-> Unique (cost=246907.54..276254.13 rows=451486 width=40)
-> GroupAggregate (cost=246907.54..275125.41 rows=451486 width=40)
-> Sort (cost=246907.54..248036.25 rows=451486 width=40)
Sort Key: ah.p2gblataln_id, h.pseq_id, h.genasm_id, h.chr, h.plus_strand, a.ident, q.len
-> Hash Join (cost=14019.29..204503.24 rows=451486 width=40)
Hash Cond: ("outer".p2gblataln_id = "inner".p2gblataln_id)
-> Hash Join (cost=7632.79..191344.45 rows=451486 width=36)
Hash Cond: ("outer".p2gblathsp_id = "inner".p2gblathsp_id)
-> Merge Join (cost=0.00..176939.38 rows=451486 width=36)
Merge Cond: ("outer".pseq_id = "inner".pseq_id)
-> Index Scan using p2gblathsp_p_lookup on p2gblathsp h (cost=0.00..16102.40 rows=451485 widt
pgsql-performance by date: