Thread: Do BRIN indexes support MIN/MAX?

Do BRIN indexes support MIN/MAX?

From
Andrey Klochkov
Date:
Hello,
We've started experimenting with using BRIN indexes for some of our large tables, as a replacement for B-Tree on "natural" timestamp columns that seem to be a good case for BRIN. Is it correct that BRIN indexes don't support MIN/MAX operations ?

We see that a query like `SELECT max(timestamp_column) FROM table` does a sequential scan on the table instead of using the index. 

We're on Postgres 10.23 at the moment if that's important.

--
Andrey Klochkov

Re: Do BRIN indexes support MIN/MAX?

From
Vladimir Sitnikov
Date:
> Is it correct that BRIN indexes don't support MIN/MAX operations ?

In theory, it should be possible to implement min/max scan support for BRIN, however it is not implemented yet.

Just in case, min/max query would require to read all BRIN pages, and then it would require to read the corresponding pages in table.

For instance, imagine the table has N pages. Then BRIN would have N/128 pages with the default pages_per_range=128, so your max(..) query would take N/128 + 128 pages to read. In theory it would be sequential, however, under concurrent load it might not be that sequential for the disk.

For instance, 80GiB table would be like 10’000’000 pages, so the default BRIN would take about 78’000 pages (625MiB), so the min/max scan would read 626 MiB
If pages per range is increased to ~3162, then index size would be ~3162 pages (25MiB), and each index entry would cover 25MiB range. Then the query would have to read ~50MiB to fetch min/max. It is not clear if that is really practical though.

What are you data volumes and expectations by the way?

Vladimir

--
Vladimir

Re: Do BRIN indexes support MIN/MAX?

From
Francisco Olarte
Date:
On Wed, 29 Mar 2023 at 22:07, Vladimir Sitnikov
<sitnikov.vladimir@gmail.com> wrote:
> > Is it correct that BRIN indexes don't support MIN/MAX operations ?
> In theory, it should be possible to implement min/max scan support for BRIN, however it is not implemented yet.
>
> Just in case, min/max query would require to read all BRIN pages, and then it would require to read the corresponding
pagesin table. 

> For instance, imagine the table has N pages. Then BRIN would have N/128 pages with the default pages_per_range=128,
soyour max(..) query would take N/128 + 128 pages to read. In theory it would be sequential, however, under concurrent
loadit might not be that sequential for the disk. 

I think BRIN would require N/128 RANGES, not pages, and if I am not
mistaken it fits several ranges in an index page. It talks of summary
tuples, and I suspect a summary tuple for say, an integer, is not
gonna be longer, than 128 bytes, in which case you could fit 64 of
them in a 4k page.

Also, if you account for possible concurrent load disturbing your
index+partial scan, you also have to account for the ( more likely )
disruption on the full scan.

I.e., I have this table
 apc     | apc_cdrs_p2022_12              | table             |
postgres   | permanent   | heap          | 860 MB     |
N/128 pages implies N/128 bytes, so index would be 6.7Mb in your numbers, but
 apc     | apc_cdrs_p2022_12_cuando_idx               | index
   | postgres   | apc_cdrs_p2022_12      | permanent   | brin
| 64 kB      |
 apc     | apc_cdrs_p2022_12_raw_id_idx               | index
   | postgres   | apc_cdrs_p2022_12      | permanent   | brin
| 64 kB      |
1st one is on a timestamp  column, second on an integer. And several
empty partitions hace 48kB indexes, so it seems data is just 16k for
the 860 ranges. That could be about 20 bytes/range which more or less
fits to a couple of values.

In my experience, BRIN are ridiculously small. I use them on that
particular table because both cuando and raw_id correlate with
insertion order and I normally only read several megabytes ranges
indexed on them, so they work very well in limiting the scan range to
nearly what I need.

> For instance, 80GiB table would be like 10’000’000 pages, so the default BRIN would take about 78’000 pages (625MiB),
sothe min/max scan would read 626 MiB 
> If pages per range is increased to ~3162, then index size would be ~3162 pages (25MiB), and each index entry would
cover25MiB range. Then the query would have to read ~50MiB to fetch min/max. It is not clear if that is really
practicalthough. 

If you assume your index fits 64 tuples per page your index read drops
to about 10Mb, plus the 1Mb range.

Also, I suspect you will have to read all unsummarized ranges (
probably before the summarized ones, as unsummarized can discard
summarizeds, but not the other way ).

Francisco Olarte.



Re: Do BRIN indexes support MIN/MAX?

From
Andrey Klochkov
Date:
BRIN indexes seem to work perfectly well for our purposes, and they are so tiny compared to B-Tree. Selecting min/max values is very expensive though. 

In my case the table is ~2.5TB (530M records), while the whole BRIN index is 16MB. I think it'd be totally fine to scan all BRIN pages, it'd be way better than doing table scan.

On Wed, Mar 29, 2023 at 1:47 PM Francisco Olarte <folarte@peoplecall.com> wrote:
On Wed, 29 Mar 2023 at 22:07, Vladimir Sitnikov
<sitnikov.vladimir@gmail.com> wrote:
> > Is it correct that BRIN indexes don't support MIN/MAX operations ?
> In theory, it should be possible to implement min/max scan support for BRIN, however it is not implemented yet.
>
> Just in case, min/max query would require to read all BRIN pages, and then it would require to read the corresponding pages in table.

> For instance, imagine the table has N pages. Then BRIN would have N/128 pages with the default pages_per_range=128, so your max(..) query would take N/128 + 128 pages to read. In theory it would be sequential, however, under concurrent load it might not be that sequential for the disk.

I think BRIN would require N/128 RANGES, not pages, and if I am not
mistaken it fits several ranges in an index page. It talks of summary
tuples, and I suspect a summary tuple for say, an integer, is not
gonna be longer, than 128 bytes, in which case you could fit 64 of
them in a 4k page.

Also, if you account for possible concurrent load disturbing your
index+partial scan, you also have to account for the ( more likely )
disruption on the full scan.

I.e., I have this table
 apc     | apc_cdrs_p2022_12              | table             |
postgres   | permanent   | heap          | 860 MB     |
N/128 pages implies N/128 bytes, so index would be 6.7Mb in your numbers, but
 apc     | apc_cdrs_p2022_12_cuando_idx               | index
   | postgres   | apc_cdrs_p2022_12      | permanent   | brin
| 64 kB      |
 apc     | apc_cdrs_p2022_12_raw_id_idx               | index
   | postgres   | apc_cdrs_p2022_12      | permanent   | brin
| 64 kB      |
1st one is on a timestamp  column, second on an integer. And several
empty partitions hace 48kB indexes, so it seems data is just 16k for
the 860 ranges. That could be about 20 bytes/range which more or less
fits to a couple of values.

In my experience, BRIN are ridiculously small. I use them on that
particular table because both cuando and raw_id correlate with
insertion order and I normally only read several megabytes ranges
indexed on them, so they work very well in limiting the scan range to
nearly what I need.

> For instance, 80GiB table would be like 10’000’000 pages, so the default BRIN would take about 78’000 pages (625MiB), so the min/max scan would read 626 MiB
> If pages per range is increased to ~3162, then index size would be ~3162 pages (25MiB), and each index entry would cover 25MiB range. Then the query would have to read ~50MiB to fetch min/max. It is not clear if that is really practical though.

If you assume your index fits 64 tuples per page your index read drops
to about 10Mb, plus the 1Mb range.

Also, I suspect you will have to read all unsummarized ranges (
probably before the summarized ones, as unsummarized can discard
summarizeds, but not the other way ).

Francisco Olarte.


--
Andrey Klochkov

Re: Do BRIN indexes support MIN/MAX?

From
Vladimir Sitnikov
Date:
Right you are I should have divided it by ranges per page.

Vladimir


--
Vladimir

Re: Do BRIN indexes support MIN/MAX?

From
Julien Rouhaud
Date:
On Thu, 30 Mar 2023, 05:03 Andrey Klochkov, <diggerk@gmail.com> wrote:
BRIN indexes seem to work perfectly well for our purposes, and they are so tiny compared to B-Tree. Selecting min/max values is very expensive though. 

In my case the table is ~2.5TB (530M records), while the whole BRIN index is 16MB. I think it'd be totally fine to scan all BRIN pages, it'd be way better than doing table scan.

brin indexes don't work the way you would hope for.  the stored min/max values per range guarantees that all values in the underlying relation pages are contained in that range, but it doesn't mean that those min/max values are still present in the table, so you can't deduce in which range the current min or max value is from there.

Re: Do BRIN indexes support MIN/MAX?

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> brin indexes don't work the way you would hope for.  the stored min/max
> values per range guarantees that all values in the underlying relation
> pages are contained in that range, but it doesn't mean that those min/max
> values are still present in the table, so you can't deduce in which range
> the current min or max value is from there.

Yeah.  You could for example (when looking for a MAX) skip scanning
block ranges whose indexed MAX is less than the indexed MIN of some
other block range.  But unless the column you are interested in is
pretty well correlated with physical storage order, that seems
unlikely to buy much.

[ ... thinks ... ]  Actually, I'm not sure that even that optimization
works without hackery.  It requires that you know that the "other block
range" is not empty according to your snapshot.  I guess you could work
by scanning block ranges in descending indexed-MIN order, and skipping
ranges only once you have a candidate MAX in hand.  But that sounds
complicated and very prone to nonsequential table access, so I'm not
sure how much of a win it is.  If you need fast min/max access, you
probably need to bite the bullet and maintain a btree index.

            regards, tom lane



Re: Do BRIN indexes support MIN/MAX?

From
Vladimir Sitnikov
Date:
> so you can't deduce in which range the current min or max value is from there.

That is why you select several candidate ranges and scan the table for those ranges.

For instance, if you have ranges
1) 1..4
2) 5..8
3) 6..9

Then you do something like

select x
  from (
    select max(col) x from tab t where rowid in 5..8 or rowid in 6..9
    union all
    selext max(col) x from tab t where rowid in 1..4
  )
 limit 1

If the first two (2 and 3) ranges happen to be empty, then scanning of 1 would be needed.

Of course, it would degrade to scanning all the pages in table if all the ranges intersect or if all the rows are deleted from the table. However, it might work well in timestamp-like cases.

Vladimir

--
Vladimir

Re: Do BRIN indexes support MIN/MAX?

From
David Rowley
Date:
On Thu, 30 Mar 2023 at 17:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > brin indexes don't work the way you would hope for.  the stored min/max
> > values per range guarantees that all values in the underlying relation
> > pages are contained in that range, but it doesn't mean that those min/max
> > values are still present in the table, so you can't deduce in which range
> > the current min or max value is from there.
>
> Yeah.  You could for example (when looking for a MAX) skip scanning
> block ranges whose indexed MAX is less than the indexed MIN of some
> other block range.  But unless the column you are interested in is
> pretty well correlated with physical storage order, that seems
> unlikely to buy much.

I might be missing something obvious here, but as I understand it,
there's not really any API to ask an index AM what the maximum or
minimum indexed value is. There's only amcanorder == true indexes that
could give you values starting with the highest or lowest depending on
if it's a forward or backward index scan.

Tomas is doing some work in https://commitfest.postgresql.org/42/3949/
which I imagined would have allowed index scans of BRIN indexes. Which
I imagined would lead to allowing the MIN/MAX aggregates to
effectively be rewritten to effectively be executed as SELECT <col>
FROM <table> ORDER BY col LIMIT 1, as is currently done in
build_minmax_path().  I'd assume the way to make this work with BRIN
would be to allow ordered scans by first splitting all ranges into
non-overlapping sets and then sorting tuples from each of those range
sets in batches.  Of course, that would only be efficient when ranges
were reasonably not overlapping each other.

(Glancing at Tomas's patch, I was surprised to see it didn't set
amcanorder to true, so I'm a little unsure how that patch is adding
more usable optimisations which the planner can make use of.)

David



Re: Do BRIN indexes support MIN/MAX?

From
Vladimir Sitnikov
Date:
> You could for example (when looking for a MAX) skip scanning
block ranges whose indexed MAX is less than the indexed MIN of some
other block range.

When looking for a max, you could scan the range with the maximal indexed MAX, and then you could skip all the ranges that have MAX less than the already found max value. You can use any found value as a cutoff point.

You do not need to have much hackery as you can scan the most promising range first, and then scan the others if the most promising did not yield value which is known to exceed all the other ranges.

So find_min and find_max can be pretty efficient with BRIN.
I’m not sure regarding the practical use of that though.

Top N can be implemented in the same way by passing all the values to a binary heap and skipping all the ranges that are known to be less than what we already have in a heap.

Andrey, could you clarify the use cases for looking up of min/max record?
I am afraid, PostgreSQL has no way to represent “this access method supports min/max retrieval”, and what currently exists is “this access method supports ordered scans sorted by the indexed column’s value”
Apparently, implementing ordered BRIN scan would be harder (from implementation, costing, and testing point of views), so it would be nice to hear on the use cases for having min/max am scans or for having BRIN ordered scans.

Vladimir

--
Vladimir

Re: Do BRIN indexes support MIN/MAX?

From
Andrey Klochkov
Date:
Vladimir,
Here's the use case description. We have some large tables that are effectively append-only logs with some stats on our systems. The table I was experimenting with recently contains 6 months of data. It has a timestamp column that tells when the stats in the row were collected. The system that consumes the data from this table normally does point queries that use B-Tree indexes which involve more than the timestamp column. Sometimes that system needs to reprocess a few hours to a few days of the most recent data, and that's when the BRIN index is used. In addition, we sometimes check the recency of the data in the table by finding "MAX(timestamp_column)", and that's when the BRIN index is not helpful. For now we modified the query that does this to limit the search to a few days of data as we don't expect the data in the table to ever become more stale than that.

On Wed, Mar 29, 2023 at 9:58 PM Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
> You could for example (when looking for a MAX) skip scanning
block ranges whose indexed MAX is less than the indexed MIN of some
other block range.

When looking for a max, you could scan the range with the maximal indexed MAX, and then you could skip all the ranges that have MAX less than the already found max value. You can use any found value as a cutoff point.

You do not need to have much hackery as you can scan the most promising range first, and then scan the others if the most promising did not yield value which is known to exceed all the other ranges.

So find_min and find_max can be pretty efficient with BRIN.
I’m not sure regarding the practical use of that though.

Top N can be implemented in the same way by passing all the values to a binary heap and skipping all the ranges that are known to be less than what we already have in a heap.

Andrey, could you clarify the use cases for looking up of min/max record?
I am afraid, PostgreSQL has no way to represent “this access method supports min/max retrieval”, and what currently exists is “this access method supports ordered scans sorted by the indexed column’s value”
Apparently, implementing ordered BRIN scan would be harder (from implementation, costing, and testing point of views), so it would be nice to hear on the use cases for having min/max am scans or for having BRIN ordered scans.

Vladimir

--
Vladimir


--
Andrey Klochkov