On 1/12/21 6:28 PM, John Naylor wrote:
> On Sat, Dec 19, 2020 at 8:15 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>>
> wrote:
> > [12-20 version]
>
> Hi Tomas,
>
> The measurements look good. In case it fell through the cracks, my
> earlier review comments for Bloom BRIN indexes regarding minor details
> don't seem to have been addressed in this version. I'll point to earlier
> discussion for convenience:
>
> https://www.postgresql.org/message-id/CACPNZCt%3Dx-fOL0CUJbjR3BFXKgcd9HMPaRUVY9cwRe58hmd8Xg%40mail.gmail.com
> <https://www.postgresql.org/message-id/CACPNZCt%3Dx-fOL0CUJbjR3BFXKgcd9HMPaRUVY9cwRe58hmd8Xg%40mail.gmail.com>
>
> https://www.postgresql.org/message-id/CACPNZCuqpkCGt8%3DcywAk1kPu0OoV_TjPXeV-J639ABQWyViyug%40mail.gmail.com
> <https://www.postgresql.org/message-id/CACPNZCuqpkCGt8%3DcywAk1kPu0OoV_TjPXeV-J639ABQWyViyug%40mail.gmail.com>
>
Whooops :-( I'll go through those again, thanks for reminding me.
> > The improvements are fairly minor:
> >
> > 1) Rejecting bloom filters that are clearly too large (larger than page)
> > early. This is imperfect, as it works for individual index keys, not the
> > whole row. But per discussion it seems useful.
>
> I think this is good enough.
>
> > So based on this I'm tempted to just use the version with two hashes, as
> > implemented in 0005. It's much simpler than the partitioning scheme,
> > does not need any of the logic to generate primes etc.
>
> Sounds like the best engineering decision.
>
> Circling back to multi-minmax build times, I ran a couple quick tests on
> bigger hardware, and found that not only is multi-minmax slower than
> minmax, which is to be expected, but also slower than btree. (unlogged
> table ~12GB in size, maintenance_work_mem = 1GB, median of three runs)
>
> btree 38.3s
> minmax 26.2s
> multi-minmax 110s
>
> Since btree indexes are much larger, I imagine something algorithmic is
> involved. Is it worth digging further to see if some code path is taking
> more time than we would expect?
>
I suspect it'd due to minmax having to decide which "ranges" to merge,
which requires repeated sorting, etc. I certainly don't dare to claim
the current algorithm is perfect. I wouldn't have expected such big
difference, though - so definitely worth investigating.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company