Thread: Bloom Filter indexes?
Has any thought been given to adding bloom filter indexes to PostgreSQL? A bloom index would be created on a column, and could then be used to accelerate exact matches where it is common that the user may query for a value that doesn't exist. For example, with the query select userid from user_table where name="notauser", the failure could be returned instantly, in most cases. A bloom filter index could be used to accelerate joins, esp full outer joins. Insertions into a bloom filter are very cheap. Updates could be done as an insert. Deletes are expensive (either you make a refcounted filter or you regenerate the filter). However, since bloom filters have false positives, it would be acceptable to regenerate the filter during a vacuum if there have been entries deleted or updated. The filter could be resized at vacuum time based on statistics gathered during execution. It would also be useful to have an array bloom index: store a bloom filter per record for an arrayed field, as well as the bloom filter for all records. This would allow membership tests for a field containing large arrays to happen very quickly. Perhaps useful for GIS and full text indexing applications.
Has any thought been given to adding bloom filter indexes to PostgreSQL? A bloom index would be created on a column, and could then be used to accelerate exact matches where it is common that the user may query for a value that doesn't exist. For example, with the query select userid from user_table where name="notauser", the failure could be returned instantly, in most cases. A bloom filter index could be used to accelerate joins, esp full outer joins. Insertions into a bloom filter are very cheap. Updates could be done as an insert. Deletes are expensive (either you make a refcounted filter or you regenerate the filter). However, since bloom filters have false positives, it would be acceptable to regenerate the filter during a vacuum if there have been entries deleted or updated. The filter could be resized at vacuum time based on statistics gathered during execution. It would also be useful to have an array bloom index: store a bloom filter per record for an arrayed field, as well as the bloom filter for all records. This would allow membership tests for a field containing large arrays to happen very quickly. Perhaps useful for GIS and full text indexing applications.
We use it in contrib/intarray for set operations and contrib/tsearch2 for full text search. The problem is how to store it to provide efficient operations. We use RD-tree as a storage. Oleg On Sat, 28 May 2005, Gregory Maxwell wrote: > Has any thought been given to adding bloom filter indexes to PostgreSQL? > > A bloom index would be created on a column, and could then be used to > accelerate exact matches where it is common that the user may query > for a value that doesn't exist. For example, with the query select > userid from user_table where name="notauser", the failure could be > returned instantly, in most cases. > > A bloom filter index could be used to accelerate joins, esp full outer joins. > > Insertions into a bloom filter are very cheap. Updates could be done > as an insert. Deletes are expensive (either you make a refcounted > filter or you regenerate the filter). However, since bloom filters > have false positives, it would be acceptable to regenerate the filter > during a vacuum if there have been entries deleted or updated. The > filter could be resized at vacuum time based on statistics gathered > during execution. > > It would also be useful to have an array bloom index: store a bloom > filter per record for an arrayed field, as well as the bloom filter > for all records. This would allow membership tests for a field > containing large arrays to happen very quickly. Perhaps useful for GIS > and full text indexing applications. > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83