Re: bloom filter indexes? - Mailing list pgsql-general

From Mason Hale
Subject Re: bloom filter indexes?
Date
Msg-id 8bca3aa10806031106j67af071bse6397f783fd6991f@mail.gmail.com
Whole thread Raw
In response to Re: bloom filter indexes?  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: bloom filter indexes?  ("Joshua D. Drake" <jd@commandprompt.com>)
List pgsql-general
On Tue, Jun 3, 2008 at 12:04 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2008-06-03 at 09:43 -0500, Mason Hale wrote:
>> I've been working on partitioning a rather large dataset into multiple
>> tables. One limitation I've run into the lack of cross-partition-table
>> unique indexes. In my case I need to guarantee the uniqueness of a
>> two-column pair across all partitions -- and this value is not used to
>> partition the tables. The table is partitioned based on a insert date
>> timestamp.
>
> You're looking for a constraint across tables.
>

Yes, for this particular case. But I'm also interested in speeding up
cross-partition queries whether it is for a uniqueness check or not.
This uniqueness check is just one (important) instance of a
cross-partition query.

>> To check the uniqueness of this value I've added an insert/update
>> trigger to search for matches in the other partitions. This trigger is
>> adding significant overhead to inserts and updates.
>
> Do you lock all of the tables before doing the check? If not, then you
> have a race condition.
>

Yes, I was concerned about that.

> It's possible this index strategy will be better for your case.
> However, I think what you really want is some kind of multi-table
> primary key. Have you considered storing the key in its own two-column
> table with a UNIQUE index and having the partitions reference it?

Thanks for the suggestion -- I'll explore maintaining the compound key
in its own non-partitioned table. I was trying to avoid any
application-layer code changes. I guess I can still accomplish that by
updating this table via an insert/update trigger.

But to reiterate, having bloom filter-based index would allow constant
time determination of whether a given partition *may* contain the
data. This would be very useful for large partitioned data-sets,
especially in (very common) cases where performance is critical.

This feature would also be useful for applications where data is
partitioned (aka 'federated') across multiple servers.

pgsql-general by date:

Previous
From: Jeff Davis
Date:
Subject: Re: bloom filter indexes?
Next
From: "Joshua D. Drake"
Date:
Subject: Re: bloom filter indexes?