Hello!
There are many questions on internet about whether it is possible to
optimize "Bitmap Heap Scan" somehow without answer, so seems like
problem is rather important.
The query I need to optimize is:
EXPLAIN SELECT date_trunc('day', d.created_at) AS day, COUNT(*) AS
download FROM downloads d WHERE d.file_id in (select id from files
where owner_id = 443) AND d.download_status != 0 AND d.created_at >=
'2009-12-05' AND d.created_at < '2009-12-16' GROUP BY 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=15809.49..17126.20 rows=87781 width=8)
-> Hash Semi Join (cost=5809.51..15368.11 rows=88276 width=8)
Hash Cond: (d.file_id = files.id)
-> Index Scan using idx_downloads_created_at on downloads d
(cost=0.00..7682.73 rows=88276 width=16)
Index Cond: ((created_at >= '2009-12-05
00:00:00'::timestamp without time zone) AND (created_at < '2009-12-16
00:00:00'::timestamp without time zone))
-> Hash (cost=5741.51..5741.51 rows=5440 width=8)
-> Bitmap Heap Scan on files (cost=106.42..5741.51
rows=5440 width=8)
Recheck Cond: (owner_id = 443)
-> Bitmap Index Scan on idx_files_owner
(cost=0.00..105.06 rows=5440 width=0)
Index Cond: (owner_id = 443)
The problem here is that we are forced to fetch "files" in Bitmap Heap Scan.
But actually there is no need for the whole "files" record. The
necessary data is only "files" ids.
The idea is to avoid fetching data from "files" table, and get the ids
from index! (probably it is a little bit tricky, but it is a
performance area...)
I created an index with following command:
create index idx_files_owner_id ON files (owner_id, id);
and even tried to remove old index to enforce postgresql to use newly
created index.
But postgresql still do Bitmap Heap Scan.
(The other idea is to use raw_id as a primary key of "files" table to
don't extend index. But I don't know whether it is possible at all or
this idea have some drawbacks)
I think it worth to learn postgreql to do this trick especially taking
into account there are many questions about whether it is possible to
optimize such a queries.
If there is an known solution to this problem please provide a link to it.
With best regards,
Michael Mikhulya.