"Martijn van Oosterhout" <kleptog@svana.org> writes:
> On Fri, Nov 30, 2007 at 11:27:24AM -0500, Josh Harrison wrote:
>> Thanks for your reply
>> Is there a way to get them not to use the
>> heap for intermediate result and go to heap only for final data? This will
>> drastically improve the performance but Im not sure if postgres can do that?
>> Will creating the index in a different way and/or rewriting the query in a
>> different way achieve this result?
>
> I'm trying to imagine what it would take to avoid the heap access after
> the index scan I don't think it's possible. It would require that the
> bitmaps generated by the bitmap scan have the person_id attached and
> then have the bitmap AND operation only happen if the person IDs match.
> No such machinary currently exists.
I think you're describing a star schema join. This is a common checklist item
for data warehousing databases.
The classic data warehouse has a table like "person" which has the info you're
looking for, and dozens of tables with person_id and possibly some associated
data. In some cases those tables don't even have any other data, the mere
existence of the person_id in that table is enough.
So a typical query could look like something like:
select *
from person
where person_id in (select person_id from people_who_used_service_in_the_past)
and person_id in (select person_id from people_with_big_balances)
and person_id in (select person_id from people_...)
and person_id not in (select person_id from people_who_unsubscribed)
and person_id not in (select person_id from people_who_we_mailed_last_week)
The best plan for this is to gather up the person_ids in a kind of bitmap scan
with a bitmap of ids. And once the bitmap is done scan an index on person for
just the matching records. Postgres doesn't support anything like this (yet:).
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!