Home > mailing lists

Re: bloom filter indexes? - Mailing list pgsql-general

From	Mason Hale
Subject	Re: bloom filter indexes?
Date	June 3, 2008 15:06:26
Msg-id	8bca3aa10806031106j67af071bse6397f783fd6991f@mail.gmail.com Whole thread Raw
In response to	Re: bloom filter indexes? (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: bloom filter indexes?
List	pgsql-general

Tree view

On Tue, Jun 3, 2008 at 12:04 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2008-06-03 at 09:43 -0500, Mason Hale wrote:
>> I've been working on partitioning a rather large dataset into multiple
>> tables. One limitation I've run into the lack of cross-partition-table
>> unique indexes. In my case I need to guarantee the uniqueness of a
>> two-column pair across all partitions -- and this value is not used to
>> partition the tables. The table is partitioned based on a insert date
>> timestamp.
>
> You're looking for a constraint across tables.
>

Yes, for this particular case. But I'm also interested in speeding up
cross-partition queries whether it is for a uniqueness check or not.
This uniqueness check is just one (important) instance of a
cross-partition query.

>> To check the uniqueness of this value I've added an insert/update
>> trigger to search for matches in the other partitions. This trigger is
>> adding significant overhead to inserts and updates.
>
> Do you lock all of the tables before doing the check? If not, then you
> have a race condition.
>

Yes, I was concerned about that.

> It's possible this index strategy will be better for your case.
> However, I think what you really want is some kind of multi-table
> primary key. Have you considered storing the key in its own two-column
> table with a UNIQUE index and having the partitions reference it?

Thanks for the suggestion -- I'll explore maintaining the compound key
in its own non-partitioned table. I was trying to avoid any
application-layer code changes. I guess I can still accomplish that by
updating this table via an insert/update trigger.

But to reiterate, having bloom filter-based index would allow constant
time determination of whether a given partition *may* contain the
data. This would be very useful for large partitioned data-sets,
especially in (very common) cases where performance is critical.

This feature would also be useful for applications where data is
partitioned (aka 'federated') across multiple servers.

pgsql-general by date:

From: Jeff Davis
Date: 03 June 2008, 14:21:54
Subject: Re: bloom filter indexes?

From: "Joshua D. Drake"
Date: 03 June 2008, 15:09:16
Subject: Re: bloom filter indexes?

Re: bloom filter indexes? - Mailing list pgsql-general

Previous

Next