many to many performance - Mailing list pgsql-performance

From Chavdar Kopoev
Subject many to many performance
Date
Msg-id 200811261738027964123@kopoev.com
Whole thread Raw
Responses Re: many to many performance
List pgsql-performance
Hello,

I have following common situation:

Category IDs: about 50 000
Document IDs: about 3 000 000
Many to many relationship.
A document id have a relation with 10 up to 1000 category ids
One query, with input set of document ids, resulting set of category ids, having relation with input ids. (very often
thisquery is called with more than 500k of input document ids) 

I use custom written datawarehouse, file storage, and memory kept "id offset" like indecies. So a query for all 3
milliondocument ids, resulting almost all category ids take less than a second on desktop machine. Abstracting from
concreterealization, query plan is like: 
1. for each input document id: look up an array by document id and retrive a list of ralated category ids.
1.1 for each category id in the list: look up an array value by category id and mark it as found
2. traverse category array to extract category ids marked as found

I want to use as a data storage postgresql. Tried several data structures, testing btree, gin, gist indecies over them,
butbest achieved performance for a 10 times smaller dataset (10k cat ids, 100k doc ids, 1m relations) is slower more
than5 times. 

I read about postgresql bitmap indecies and "data lookup" when scanning indecies to get a value for current
transaction.Downloaded latest v8.4 snapshot, compiled it, but as I see there is no bitmap index in it. Maybe if I
downloadHEAD revision I will find them there, dont know. 

Anyway, I want to ask, have anyone faced similar situation, and is there any way to achive closer to optimal
performanceusing postgresql functionality and extensibility? 

Regards,
Chavdar Kopoev


pgsql-performance by date:

Previous
From: Mario Weilguni
Date:
Subject: Re: Increasing pattern index query speed
Next
From: Craig Ringer
Date:
Subject: Re: many to many performance