query optimization differs between view and explicit query - Mailing list pgsql-performance
From | Reece Hart |
---|---|
Subject | query optimization differs between view and explicit query |
Date | |
Msg-id | 1075422699.4062.164.camel@tallac Whole thread Raw |
Responses |
Re: query optimization differs between view and explicit
Re: query optimization differs between view and explicit |
List | pgsql-performance |
I have a large query which I would like to place in a view. The explicit query is sufficiently fast, but the same query as a view is much slower and uses a different plan. I would appreciate an explanation of why this is, and, more importantly whether/how I might coax the view to use a different plan.
The original query:
The original query:
rkh@csb-dev=> select distinct on (AH.p2gblataln_id) AH.p2gblataln_id,H.pseq_id,min(H.pstart) as "pstart", max(H.pstop) as "pstop",A.ident,(A.ident/Q.len::float*100)::int as "pct_ident", sum(H.pstop-H.pstart+1) as "aln_length",H.genasm_id,H.chr,H.plus_strand,min(H.gstart) as "gstart", max(H.gstop) as "gstop" from p2gblathsp H join p2gblatalnhsp AH on H.p2gblathsp_id=AH.p2gblathsp_id join p2gblataln A on AH.p2gblataln_id=A.p2gblataln_id join pseq Q on H.pseq_id=Q.pseq_id where H.pseq_id=76 group by AH.p2gblataln_id,H.pseq_id,H.genasm_id,H.chr,H.plus_strand,A.ident,Q.len \g /dev/null Time: 277.804 msNow as a view:
rkh@csb-dev=> create view v1 as select distinct on (AH.p2gblataln_id) AH.p2gblataln_id,H.pseq_id,min(H.pstart) as "pstart", max(H.pstop) as "pstop",A.ident,(A.ident/Q.len::float*100)::int as "pct_ident", sum(H.pstop-H.pstart+1) as "aln_length",H.genasm_id,H.chr,H.plus_strand,min(H.gstart) as "gstart", max(H.gstop) as "gstop" from p2gblathsp H join p2gblatalnhsp AH on H.p2gblathsp_id=AH.p2gblathsp_id join p2gblataln A on AH.p2gblataln_id=A.p2gblataln_id join pseq Q on H.pseq_id=Q.pseq_id group by AH.p2gblataln_id,H.pseq_id,H.genasm_id,H.chr,H.plus_strand,A.ident,Q.len; CREATE VIEW Time: 103.041 ms rkh@csb-dev=> select * from v1 where pseq_id=76 \g /dev/null Time: 31973.979 ms Okay, that's ~100x slower. The plans: rkh@csb-dev=> explain select distinct on <snip... same as the first query above> QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=11157.75..11187.26 rows=454 width=40) -> GroupAggregate (cost=11157.75..11186.13 rows=454 width=40) -> Sort (cost=11157.75..11158.89 rows=454 width=40) Sort Key: ah.p2gblataln_id, h.pseq_id, h.genasm_id, h.chr, h.plus_strand, a.ident, q.len -> Nested Loop (cost=11125.62..11137.71 rows=454 width=40) -> Index Scan using pseq_pkey on pseq q (cost=0.00..3.01 rows=2 width=6) Index Cond: (76 = pseq_id) -> Materialize (cost=11125.62..11127.89 rows=227 width=38) -> Nested Loop (cost=546.15..11125.62 rows=227 width=38) -> Hash Join (cost=546.15..10438.72 rows=227 width=34) Hash Cond: ("outer".p2gblathsp_id = "inner".p2gblathsp_id) -> Seq Scan on p2gblatalnhsp ah (cost=0.00..6504.03 rows=451503 width=8) -> Hash (cost=545.58..545.58 rows=227 width=34) -> Index Scan using p2gblathsp_p_lookup on p2gblathsp h (cost=0.00..545.58 rows=227 wid Index Cond: (pseq_id = 76) -> Index Scan using p2gblataln_pkey on p2gblataln a (cost=0.00..3.01 rows=1 width=8) Index Cond: ("outer".p2gblataln_id = a.p2gblataln_id) (17 rows) rkh@csb-dev=> explain select * from v1 where pseq_id=76; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Subquery Scan v1 (cost=246907.54..281897.70 rows=2258 width=77) Filter: (pseq_id = 76) -> Unique (cost=246907.54..276254.13 rows=451486 width=40) -> GroupAggregate (cost=246907.54..275125.41 rows=451486 width=40) -> Sort (cost=246907.54..248036.25 rows=451486 width=40) Sort Key: ah.p2gblataln_id, h.pseq_id, h.genasm_id, h.chr, h.plus_strand, a.ident, q.len -> Hash Join (cost=14019.29..204503.24 rows=451486 width=40) Hash Cond: ("outer".p2gblataln_id = "inner".p2gblataln_id) -> Hash Join (cost=7632.79..191344.45 rows=451486 width=36) Hash Cond: ("outer".p2gblathsp_id = "inner".p2gblathsp_id) -> Merge Join (cost=0.00..176939.38 rows=451486 width=36) Merge Cond: ("outer".pseq_id = "inner".pseq_id) -> Index Scan using p2gblathsp_p_lookup on p2gblathsp h (cost=0.00..16102.40 rows=451485 widt
pgsql-performance by date: