Thread: How to use the BRIN index properly?
Hello,
We have large tables with billions of rows in them and want to take advantage of the BRIN index on them.
Issues we are facing:
- as I understand, BRIN index is useful only if the data is stored in index order. As an example we want to create a composite BRIN index on 3 columns - integers and strings (varchar). How can we tell Postgres to store data in index order as new records are inserted into the database?
- i understand that turning on autosummarize will keep the index fresh and up-to-date as new records are inserted. is this correct?
Thanks for your help.
S.
our insertion order is of course != index order otherwise the question would have been trivial.
we use postgres 14
On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain <siddhsql@gmail.com> wrote:
Hello,We have large tables with billions of rows in them and want to take advantage of the BRIN index on them.Issues we are facing:
- as I understand, BRIN index is useful only if the data is stored in index order. As an example we want to create a composite BRIN index on 3 columns - integers and strings (varchar). How can we tell Postgres to store data in index order as new records are inserted into the database?
- i understand that turning on autosummarize will keep the index fresh and up-to-date as new records are inserted. is this correct?
Thanks for your help.S.
Is the data in your tables stored in natural correlation with those three columns? I'm dubious that can even happen.
BRIN is best for range queries on tables who's data is added in the same order as the key in the BRIN index (for example, a BRIN index on a timestamp field in a log table where new records are always being appended in "timestamp" order).
It would also be great for history tables where you can pre-sort the data by, for example, customer_id, and then put the BRIN on customer_id.
BRIN is best for range queries on tables who's data is added in the same order as the key in the BRIN index (for example, a BRIN index on a timestamp field in a log table where new records are always being appended in "timestamp" order).
It would also be great for history tables where you can pre-sort the data by, for example, customer_id, and then put the BRIN on customer_id.
On 2/8/23 13:58, Siddharth Jain wrote:
our insertion order is of course != index order otherwise the question would have been trivial.we use postgres 14On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain <siddhsql@gmail.com> wrote:Hello,We have large tables with billions of rows in them and want to take advantage of the BRIN index on them.Issues we are facing:
- as I understand, BRIN index is useful only if the data is stored in index order. As an example we want to create a composite BRIN index on 3 columns - integers and strings (varchar). How can we tell Postgres to store data in index order as new records are inserted into the database?
- i understand that turning on autosummarize will keep the index fresh and up-to-date as new records are inserted. is this correct?
Thanks for your help.S.
--
Born in Arizona, moved to Babylonia.
Born in Arizona, moved to Babylonia.
As I explained in my question that is indeed our dilemma. Our insertion order will not be equal to index order. i.e., referring to your response:
> who's data is added in the same order as the key in the BRIN index
does NOT hold.
On Wed, Feb 8, 2023 at 12:27 PM Ron <ronljohnsonjr@gmail.com> wrote:
Is the data in your tables stored in natural correlation with those three columns? I'm dubious that can even happen.
BRIN is best for range queries on tables who's data is added in the same order as the key in the BRIN index (for example, a BRIN index on a timestamp field in a log table where new records are always being appended in "timestamp" order).
It would also be great for history tables where you can pre-sort the data by, for example, customer_id, and then put the BRIN on customer_id.On 2/8/23 13:58, Siddharth Jain wrote:our insertion order is of course != index order otherwise the question would have been trivial.we use postgres 14On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain <siddhsql@gmail.com> wrote:Hello,We have large tables with billions of rows in them and want to take advantage of the BRIN index on them.Issues we are facing:
- as I understand, BRIN index is useful only if the data is stored in index order. As an example we want to create a composite BRIN index on 3 columns - integers and strings (varchar). How can we tell Postgres to store data in index order as new records are inserted into the database?
- i understand that turning on autosummarize will keep the index fresh and up-to-date as new records are inserted. is this correct?
Thanks for your help.S.--
Born in Arizona, moved to Babylonia.
> On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote: > > As I explained in my question that is indeed our dilemma. Our insertion order will not be equal to index order. i.e., referringto your response: > > > who's data is added in the same order as the key in the BRIN index > > does NOT hold. A BRIN index is not a good choice in this case. You can CLUSTER the data on an index, but that's a one-time operation: PostgreSQLwill not maintain that order after the CLUSTER. If the number of rows in the table at the time of the CLUSTERis much larger than the number that are inserted between CLUSTER operations, then a BRIN index might be useful, butclustering a very large table is an expensive operation, and requires an exclusive lock on the table while it is beingdone.
OK so in that case we are left with the B-Tree index.
If the B-Tree index will be so large that it cannot fit in memory, then is it worth creating it at all? Are there any established patterns here?
On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus <xof@thebuild.com> wrote:
> On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:
>
> As I explained in my question that is indeed our dilemma. Our insertion order will not be equal to index order. i.e., referring to your response:
>
> > who's data is added in the same order as the key in the BRIN index
>
> does NOT hold.
A BRIN index is not a good choice in this case. You can CLUSTER the data on an index, but that's a one-time operation: PostgreSQL will not maintain that order after the CLUSTER. If the number of rows in the table at the time of the CLUSTER is much larger than the number that are inserted between CLUSTER operations, then a BRIN index might be useful, but clustering a very large table is an expensive operation, and requires an exclusive lock on the table while it is being done.
> On Feb 8, 2023, at 14:14, Siddharth Jain <siddhsql@gmail.com> wrote: > > If the B-Tree index will be so large that it cannot fit in memory, then is it worth creating it at all? Yes. Of course, more memory is better, and more recently versions of PostgreSQL have optimizations that are valuable forlarge B-tree indexes.
1. The whole index does not need to fit in memory, just the parts of it you need at that time.
2. Partition the table by the primary key. Each index will be much smaller, since each child will be smaller.
2. Partition the table by the primary key. Each index will be much smaller, since each child will be smaller.
On 2/8/23 16:14, Siddharth Jain wrote:
OK so in that case we are left with the B-Tree index.If the B-Tree index will be so large that it cannot fit in memory, then is it worth creating it at all? Are there any established patterns here?On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus <xof@thebuild.com> wrote:
> On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:
>
> As I explained in my question that is indeed our dilemma. Our insertion order will not be equal to index order. i.e., referring to your response:
>
> > who's data is added in the same order as the key in the BRIN index
>
> does NOT hold.
A BRIN index is not a good choice in this case. You can CLUSTER the data on an index, but that's a one-time operation: PostgreSQL will not maintain that order after the CLUSTER. If the number of rows in the table at the time of the CLUSTER is much larger than the number that are inserted between CLUSTER operations, then a BRIN index might be useful, but clustering a very large table is an expensive operation, and requires an exclusive lock on the table while it is being done.
--
Born in Arizona, moved to Babylonia.
Born in Arizona, moved to Babylonia.
For membership/equality predicates (also partial, for multiple columns) you could take a look at bloom indexes: they are quite efficient in terms of space footprint, you can even choose how long is the signature for each entry and how is distributed among the columns.
g
On Wed, 8 Feb 2023 at 23:15, Siddharth Jain <siddhsql@gmail.com> wrote:
OK so in that case we are left with the B-Tree index.If the B-Tree index will be so large that it cannot fit in memory, then is it worth creating it at all? Are there any established patterns here?On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus <xof@thebuild.com> wrote:
> On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:
>
> As I explained in my question that is indeed our dilemma. Our insertion order will not be equal to index order. i.e., referring to your response:
>
> > who's data is added in the same order as the key in the BRIN index
>
> does NOT hold.
A BRIN index is not a good choice in this case. You can CLUSTER the data on an index, but that's a one-time operation: PostgreSQL will not maintain that order after the CLUSTER. If the number of rows in the table at the time of the CLUSTER is much larger than the number that are inserted between CLUSTER operations, then a BRIN index might be useful, but clustering a very large table is an expensive operation, and requires an exclusive lock on the table while it is being done.