Re: WIP: BRIN multi-range indexes - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: WIP: BRIN multi-range indexes
Date
Msg-id 20200910200137.GA25491@alvherre.pgsql
Whole thread Raw
In response to Re: WIP: BRIN multi-range indexes  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: WIP: BRIN multi-range indexes  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On 2020-Sep-10, Tomas Vondra wrote:

> I've spent a bit of time experimenting with this. My idea was to allow
> keeping an "expanded" version of the summary somewhere. As the addValue
> function only receives BrinValues I guess one option would be to just
> add bv_mem_values field. Or do you have a better idea?

Maybe it's okay to pass the BrinMemTuple to the add_value function, and
keep something there.  Or maybe that's pointless and just a new field in
BrinValues is okay.

> Of course, more would need to be done:
> 
> 1) We'd need to also pass the right memory context (bt_context seems
> like the right thing, but that's not something addValue sees now).

You could use GetMemoryChunkContext() for that.

> 2) We'd also need to specify some sort of callback that serializes the
> in-memory value into bt_values. That's not something addValue can do,
> because it doesn't know whether it's the last value in the range etc. I
> guess one option would be to add yet another support proc, but I guess a
> simple callback would be enough.

Hmm.

> I've hacked together an experimental version of this to see how much
> would it help, and it reduces the duration from ~4.6s to ~3.3s. Which is
> nice, but plain minmax is ~1.1s. I suppose there's room for further
> improvements in compare_combine_ranges/reduce_combine_ranges and so on,
> but I still think there'll always be a gap compared to plain minmax.

The main reason I'm talking about desupporting plain minmax is that,
even if it's amazingly fast, it loses quite quickly in real-world cases
because of loss of correlation.  Minmax's build time is pretty much
determined by speed at which you can seqscan the table.  I don't think
we lose much if we add overhead in order to create an index that is 100x
more useful.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Collation versioning
Next
From: Tom Lane
Date:
Subject: Re: SIGQUIT handling, redux