Re: Weird index or sort behaviour - Mailing list pgsql-performance
From | Matthew Wakeling |
---|---|
Subject | Re: Weird index or sort behaviour |
Date | |
Msg-id | alpine.DEB.2.00.0908181901220.19472@aragorn.flymine.org Whole thread Raw |
In response to | Re: Weird index or sort behaviour (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Weird index or sort behaviour
|
List | pgsql-performance |
On Tue, 18 Aug 2009, Tom Lane wrote: >> I would be more curious in the poster's situation to turn off >> enable_seqscan, enable_sort, and/or enable_nestloop see how the index >> scan merge join plan runs. Like this: QUERY PLAN ----------------------------------------------------------------------- Aggregate (cost=2441719.92..2441719.93 rows=1 width=0) (actual time=50087.537..50087.538 rows=1 loops=1) -> HashAggregate (cost=2397366.95..2417079.38 rows=1971243 width=28) (actual time=40462.069..48634.713 rows=17564726 loops=1) -> Merge Join (cost=0.00..2362870.20 rows=1971243 width=28) (actual time=0.095..22041.693 rows=21463106 loops=1) Merge Cond: ((l1.objectid = l2.objectid) AND (l1.bin = l2.bin)) Join Filter: ((l1.intermine_start <= l2.intermine_end) AND (l2.intermine_start <= l1.intermine_end)) -> Index Scan using locationbin8000__subjectobjectbin on locationbin8000 l1 (cost=0.00..71635.23 rows=657430 width=20) (actual time=0.056..170.857 rows=664588 loops=1) Index Cond: (subjecttype = 'GeneFlankingRegion'::text) -> Index Scan using locationbin8000__subjectobjectbin on locationbin8000 l2 (cost=0.00..71635.23 rows=657430 width=20) (actual time=0.020..9594.466 rows=38231659 loops=1) Index Cond: (l2.subjecttype = 'GeneFlankingRegion'::text) Total runtime: 50864.569 ms (10 rows) >> rewinding an index scan is more expensive than rewinding a materialize >> node but would it really be so much expensive that it's worth copying >> the entire table into temporary space? > > Absolutely not, but remember that what we're expecting the Materialize > to do is buffer only as far back as the last Mark, so that it's unlikely > ever to spill to disk. If that's how it works, then that sounds very promising indeed. > In particular, in Matthew's example the sort is being estimated at > significantly higher cost than the indexscan, which presumably means > that we are estimating there will be a *lot* of re-fetches, else we > wouldn't have rejected the indexscan on the inside. select sum(c * c) / sum(c) from (select objectid, bin, count(*) AS c from locationbin8000 where subjecttype = 'GeneFlankingRegion' GROUP BY objectid, bin) as a; ?column? --------------------- 57.5270393085641029 So on average, we will be rewinding by 57 rows each time. A materialise step really does sound like a win in this situation. Matthew -- Patron: "I am looking for a globe of the earth." Librarian: "We have a table-top model over here." Patron: "No, that's not good enough. Don't you have a life-size?" Librarian: (pause) "Yes, but it's in use right now."
pgsql-performance by date: