Re: OK, does anyone have any better ideas? - Mailing list pgsql-hackers

From mlw
Subject Re: OK, does anyone have any better ideas?
Date
Msg-id 3A323112.FCF35B3B@mohawksoft.com
Whole thread Raw
In response to Re: OK, does anyone have any better ideas?  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: OK, does anyone have any better ideas?  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-hackers
Oleg Bartunov wrote:
> 
> We need multi-key B-tree like index for such problem.
> Our full text search engine is based on arrays and we need to find quickly
> is some number exists in array - some kind of index over int array.
> We're currently testing GiST approach and seems will have some conclusions
> soon. I think multi-key B-tree like index would be better in my
> opinion, but this requires to much work and knowledge of postgres's internals.
> Yesterday I read about UBTree, seems like it's good for index and query
> sets. Currently postgres has no set specific methods.

The way I do my search indexing is with bitmap objects and a word
dictionary. One creates a searchable dictionary of words by scanning the
selected records. So, in one query that results in 2 million records,
the total aggregate number of words is about 60,000 depending on how you
parse. For each word, you create a "bitmap object" (in one of a few
forms) where bit '0' represents the first record, bit '1' represents the
second, and so on, until you have 2 million bits.

Set the correct bit in the bitmap for each document that contains that
word. In the end you will have the equivalent 60,000 bitmaps or 2
million bits.

During search time, one creates an empty bitmap of 2 million bits as a
work space. One parses the search term, and performs boolean operation
on the workspace from the bitmap retrieved for each word.

When you are done parsing, you have a bitmap with a bit set for each
document that fits the search criteria. You then enumerate the bits by
bit position, and you now have a list of document numbers.

If only I could get the list of document numbers back into
postgres....... It would be great.
-- 
http://www.mohawksoft.com


pgsql-hackers by date:

Previous
From: Oleg Bartunov
Date:
Subject: ...
Next
From: "Hiroshi Inoue"
Date:
Subject: RE: [COMMITTERS] pgsql/src/backend/commands (command.c vacuum.c)