Re: count(*), EXISTS, indexes - Mailing list pgsql-sql
From | Itai Zukerman |
---|---|
Subject | Re: count(*), EXISTS, indexes |
Date | |
Msg-id | 873ckoqs7m.fsf@matt.w80.math-hat.com Whole thread Raw |
In response to | Re: count(*), EXISTS, indexes (Josh Berkus <josh@agliodbs.com>) |
List | pgsql-sql |
>> They are my own operators and functions. profile is a integer array >> and the ~'s are subset operators. psig is a bit signature, "~" is >> complement, and the ~ operators again are subset operators. > > You're going to have to work on your question-posting skills. *blush*, sorry. > This explains why the planner thinks that the exists clause will return > 255,000 rows instead of the handful it actually does return. Actually: # explain select sum(weight) from rprofile rp where exists (select 1 from rcount_prof rcp where rcp.profile ~<= rp.profileand ~rcp.psig ~<= rp.psig and rcp.filter='{734,1944}'); QUERY PLAN ---------------------------------------------------------------------------------------------------- Aggregate (cost=1544943.75..1544943.75rows=1 width=4) -> Seq Scan on rprofile rp (cost=0.00..1544255.00 rows=275500 width=4) Filter: (subplan) SubPlan -> Seq Scan on rcount_prof rcp (cost=0.00..2.70 rows=1 width=0) Filter: ((profile ~<= $0) AND ((~ psig) ~<= $1) AND (filter = '{734,1944}'::text)) I sort-of don't know what I'm doing reading these query plans, but I think the estimates are more-or-less right: few rows from rcount_prof, many rows from rprofile. 275,500 may be a bit high (it's actually around 24,000 in this case), but def. within an order of magnitude. > Or build your own custom index types to take advantage of your custom > operators. B-tree indexes are optimized for =, LIKE, <, and > queries; I > don't think they know what to do with "~<=" I created GiST indexes for the ~ operators (but, no, I haven't tuned the selectivity estimation functions). The problem was, the query wasn't using the index on rprofile. Stephan Szabo's suggestion worked beautifully, though: dl=# EXPLAIN ANALYZE dl-# SELECT sum(weight) dl-# FROM (SELECT DISTINCT ON(rp.rid) rp.rid, rp.weight dl(# FROM rprofilerp, rcount_prof rcp dl(# WHERE rcp.profile ~<= rp.profile dl(# AND ~rcp.psig ~<= rp.psig dl(# ANDrcp.filter='{734,1944}') AS FOO; QUERYPLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=2245.08..2245.08 rows=1 width=1026) (actual time=954.45..954.45 rows=1 loops=1) -> Subquery Scan foo (cost=2245.07..2245.08 rows=1 width=1026) (actual time=824.85..935.71 rows=23619 loops=1) -> Unique (cost=2245.07..2245.08rows=1 width=1026) (actual time=824.85..898.24 rows=23619 loops=1) -> Sort (cost=2245.07..2245.08rows=1 width=1026) (actual time=824.84..840.52 rows=23619 loops=1) Sort Key: rp.rid -> Nested Loop (cost=0.00..2245.06 rows=1 width=1026) (actual time=0.28..520.41 rows=23619 loops=1) Join Filter: ("outer".profile ~<= "inner".profile) -> SeqScan on rcount_prof rcp (cost=0.00..2.44 rows=1 width=287) (actual time=0.06..0.33 rows=1 loops=1) Filter: (filter = '{734,1944}'::text) -> Index Scan using rprofile_profile_idxon rprofile rp (cost=0.00..2232.98 rows=551 width=739) (actual time=0.14..453.11 rows=23666 loops=1) Index Cond: ((~ "outer".psig) ~<= rp.psig) Total runtime: 958.62 msec The row estimates are way off, though. My fault... -- Itai Zukerman <http://www.math-hat.com/~zukerman/>