Thread: How to use the BRIN index properly?

How to use the BRIN index properly?

From
Siddharth Jain
Date:
Hello,

We have large tables with billions of rows in them and want to take advantage of the BRIN index on them.

Issues we are facing:
  • as I understand, BRIN index is useful only if the data is stored in index order. As an example we want to create a composite BRIN index on 3 columns - integers and strings (varchar). How can we tell Postgres to store data in index order as new records are inserted into the database?
  • i understand that turning on autosummarize will keep the index fresh and up-to-date as new records are inserted. is this correct?
Thanks for your help.

S.

Re: How to use the BRIN index properly?

From
Siddharth Jain
Date:
our insertion order is of course != index order otherwise the question would have been trivial.
we use postgres 14

On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain <siddhsql@gmail.com> wrote:
Hello,

We have large tables with billions of rows in them and want to take advantage of the BRIN index on them.

Issues we are facing:
  • as I understand, BRIN index is useful only if the data is stored in index order. As an example we want to create a composite BRIN index on 3 columns - integers and strings (varchar). How can we tell Postgres to store data in index order as new records are inserted into the database?
  • i understand that turning on autosummarize will keep the index fresh and up-to-date as new records are inserted. is this correct?
Thanks for your help.

S.

Re: How to use the BRIN index properly?

From
Ron
Date:
Is the data in your tables stored in natural correlation with those three columns?  I'm dubious that can even happen.

BRIN is best for range queries on tables who's data is added in the same order as the key in the BRIN index (for example, a BRIN index on a timestamp field in a log table where new records are always being appended in "timestamp" order).

It would also be great for history tables where you can pre-sort the data by, for example, customer_id, and then put the BRIN on customer_id.

On 2/8/23 13:58, Siddharth Jain wrote:
our insertion order is of course != index order otherwise the question would have been trivial.
we use postgres 14

On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain <siddhsql@gmail.com> wrote:
Hello,

We have large tables with billions of rows in them and want to take advantage of the BRIN index on them.

Issues we are facing:
  • as I understand, BRIN index is useful only if the data is stored in index order. As an example we want to create a composite BRIN index on 3 columns - integers and strings (varchar). How can we tell Postgres to store data in index order as new records are inserted into the database?
  • i understand that turning on autosummarize will keep the index fresh and up-to-date as new records are inserted. is this correct?
Thanks for your help.

S.

--
Born in Arizona, moved to Babylonia.

Re: How to use the BRIN index properly?

From
Siddharth Jain
Date:
As I explained in my question that is indeed our dilemma. Our insertion order will not be equal to index order. i.e., referring to your response:

> who's data is added in the same order as the key in the BRIN index

does NOT hold. 

On Wed, Feb 8, 2023 at 12:27 PM Ron <ronljohnsonjr@gmail.com> wrote:
Is the data in your tables stored in natural correlation with those three columns?  I'm dubious that can even happen.

BRIN is best for range queries on tables who's data is added in the same order as the key in the BRIN index (for example, a BRIN index on a timestamp field in a log table where new records are always being appended in "timestamp" order).

It would also be great for history tables where you can pre-sort the data by, for example, customer_id, and then put the BRIN on customer_id.

On 2/8/23 13:58, Siddharth Jain wrote:
our insertion order is of course != index order otherwise the question would have been trivial.
we use postgres 14

On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain <siddhsql@gmail.com> wrote:
Hello,

We have large tables with billions of rows in them and want to take advantage of the BRIN index on them.

Issues we are facing:
  • as I understand, BRIN index is useful only if the data is stored in index order. As an example we want to create a composite BRIN index on 3 columns - integers and strings (varchar). How can we tell Postgres to store data in index order as new records are inserted into the database?
  • i understand that turning on autosummarize will keep the index fresh and up-to-date as new records are inserted. is this correct?
Thanks for your help.

S.

--
Born in Arizona, moved to Babylonia.

Re: How to use the BRIN index properly?

From
Christophe Pettus
Date:

> On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:
>
> As I explained in my question that is indeed our dilemma. Our insertion order will not be equal to index order. i.e.,
referringto your response: 
>
> > who's data is added in the same order as the key in the BRIN index
>
> does NOT hold.

A BRIN index is not a good choice in this case.  You can CLUSTER the data on an index, but that's a one-time operation:
PostgreSQLwill not maintain that order after the CLUSTER.  If the number of rows in the table at the time of the
CLUSTERis much larger than the number that are inserted between CLUSTER operations, then a BRIN index might be useful,
butclustering a very large table is an expensive operation, and requires an exclusive lock on the table while it is
beingdone. 


Re: How to use the BRIN index properly?

From
Siddharth Jain
Date:
OK so in that case we are left with the B-Tree index.

If the B-Tree index will be so large that it cannot fit in memory, then is it worth creating it at all? Are there any established patterns here?

On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus <xof@thebuild.com> wrote:


> On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:
>
> As I explained in my question that is indeed our dilemma. Our insertion order will not be equal to index order. i.e., referring to your response:
>
> > who's data is added in the same order as the key in the BRIN index
>
> does NOT hold.

A BRIN index is not a good choice in this case.  You can CLUSTER the data on an index, but that's a one-time operation: PostgreSQL will not maintain that order after the CLUSTER.  If the number of rows in the table at the time of the CLUSTER is much larger than the number that are inserted between CLUSTER operations, then a BRIN index might be useful, but clustering a very large table is an expensive operation, and requires an exclusive lock on the table while it is being done.

Re: How to use the BRIN index properly?

From
Christophe Pettus
Date:

> On Feb 8, 2023, at 14:14, Siddharth Jain <siddhsql@gmail.com> wrote:
>
> If the B-Tree index will be so large that it cannot fit in memory, then is it worth creating it at all?

Yes.  Of course, more memory is better, and more recently versions of PostgreSQL have optimizations that are valuable
forlarge B-tree indexes. 


Re: How to use the BRIN index properly?

From
Ron
Date:
1. The whole index does not need to fit in memory, just the parts of it you need at that time.
2. Partition the table by the primary key.  Each index will be much smaller, since each child will be smaller.

On 2/8/23 16:14, Siddharth Jain wrote:
OK so in that case we are left with the B-Tree index.

If the B-Tree index will be so large that it cannot fit in memory, then is it worth creating it at all? Are there any established patterns here?

On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus <xof@thebuild.com> wrote:


> On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:
>
> As I explained in my question that is indeed our dilemma. Our insertion order will not be equal to index order. i.e., referring to your response:
>
> > who's data is added in the same order as the key in the BRIN index
>
> does NOT hold.

A BRIN index is not a good choice in this case.  You can CLUSTER the data on an index, but that's a one-time operation: PostgreSQL will not maintain that order after the CLUSTER.  If the number of rows in the table at the time of the CLUSTER is much larger than the number that are inserted between CLUSTER operations, then a BRIN index might be useful, but clustering a very large table is an expensive operation, and requires an exclusive lock on the table while it is being done.

--
Born in Arizona, moved to Babylonia.

Re: How to use the BRIN index properly?

From
GF
Date:

For membership/equality predicates (also partial, for multiple columns) you could take a look at bloom indexes: they are quite efficient in terms of space footprint, you can even choose how long is the signature for each entry and how is distributed among the columns.
g



On Wed, 8 Feb 2023 at 23:15, Siddharth Jain <siddhsql@gmail.com> wrote:
OK so in that case we are left with the B-Tree index.

If the B-Tree index will be so large that it cannot fit in memory, then is it worth creating it at all? Are there any established patterns here?

On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus <xof@thebuild.com> wrote:


> On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:
>
> As I explained in my question that is indeed our dilemma. Our insertion order will not be equal to index order. i.e., referring to your response:
>
> > who's data is added in the same order as the key in the BRIN index
>
> does NOT hold.

A BRIN index is not a good choice in this case.  You can CLUSTER the data on an index, but that's a one-time operation: PostgreSQL will not maintain that order after the CLUSTER.  If the number of rows in the table at the time of the CLUSTER is much larger than the number that are inserted between CLUSTER operations, then a BRIN index might be useful, but clustering a very large table is an expensive operation, and requires an exclusive lock on the table while it is being done.