On 21/01/10 22:15, Tory M Blue wrote:
> · Data distribution = In the 98mill records, there are 7000 unique
> makeid's, and 21mill unique UID's. About 41mill of the records have
> tagged=true
>
> · Time to execute the following query with indices on makeid and
> tagged = 90-120 seconds. The planner uses the webid index and filters on
> tagged and then rechecks the webid index
>
> * SELECT COUNT(DISTINCT uid ) AS active_users FROM
> pixelpool.userstats WHERE makeid ='bmw-ferman' AND tagged =true*
>
> · Time to execute the the same query with a combined index on makeid
> and tagged = 60-100 seconds. The planner uses the combined index and then
> filters tagged.
Two things:
1. You have got the combined index on (makeid, tagged) and not (tagged,
makeid) haven't you? Just checking.
2. If it's mostly tagged=true you are interested in you can always use a
partial index: CREATE INDEX ... (makeid) WHERE tagged
This might be a win even if you need a second index with WHERE NOT tagged.
Also, either I've not had enough cofee yet, or a bitmap scan is an odd
choice for only ~ 13000 rows out of 100 million.
> * " -> Bitmap Index Scan on
> idx_retargetuserstats_makeidtag (cost=0.00..360.20 rows=13175 width=0)"*
>
> * " Index Cond: ((makeid = 'b1mw-ferman'::text)
> AND (tagged = true))"*
Otherwise, see what Craig said.
I'm assuming this isn't the query that is CPU bound for a long time.
Unless your table is horribly bloated, there's no reason for that
judging by this plan.
--
Richard Huxton
Archonet Ltd