Re: WIP: BRIN multi-range indexes - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: WIP: BRIN multi-range indexes
Date
Msg-id 20200910183248.eajizgzuypntsjmh@development
In response to Re: WIP: BRIN multi-range indexes  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: WIP: BRIN multi-range indexes  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: WIP: BRIN multi-range indexes  (John Naylor <john.naylor@2ndquadrant.com>)
List pgsql-hackers
On Wed, Sep 09, 2020 at 10:26:00PM +0200, Tomas Vondra wrote:
>On Wed, Sep 09, 2020 at 04:53:30PM -0300, Alvaro Herrera wrote:
>>On 2020-Sep-09, Tomas Vondra wrote:
>>
>>>There are some minor optimizations possible - for example I noticed we
>>>call minmax_multi_get_strategy_procinfo often because it happens in a
>>>loop, and we could easily do it just once. But that saves only about 10%
>>>or so, it's not a ground-breaking optimization.
>>
>>Well, I guess this kind of thing should be fixed regardless while we
>>still know it's there, just to avoid an obvious inefficiency.
>>
>
>Sure. I was just suggesting it's not something that'd make this very
>close to plain minmax opclass.
>
>>>The main reason for the slowness is that we pass the values one by one
>>>to brin_minmax_multi_add_value - and on each call we need to deserialize
>>>(and then sometimes also serialize) the summary, which may be quite
>>>expensive. The regular minmax does not have this issue, it just swaps
>>>the Datum value and that's it.
>>
>>Ah, right, that's more interesting.  The original dumb BRIN code
>>separates BrinMemTuple from BrinTuple so that things can be operated
>>efficiently in memory.  Maybe something similar can be done in this
>>case, which also sounds like your second suggestion:
>>
>>>Another option would be to teach add_value to keep the deserialized
>>>summary somewhere, and then force serialization at the end of the BRIN
>>>page range. The end result would be roughly the same, I think.
>>
>
>Well, the patch already has Ranges (memory) and SerializedRanges (disk)
>but it's not very clear to me where to stash the in-memory data and
>where to make the conversion.
>

I've spent a bit of time experimenting with this. My idea was to allow
keeping an "expanded" version of the summary somewhere. As the addValue
function only receives BrinValues, I guess one option would be to just
add a bv_mem_values field there. Or do you have a better idea?

Of course, more would need to be done:

1) We'd need to also pass the right memory context (bt_context seems
like the right thing, but that's not something addValue sees now).

2) We'd also need to specify some sort of callback that serializes the
in-memory value into bt_values. That's not something addValue can do,
because it doesn't know whether it's the last value in the range, etc.
One option would be to add yet another support proc, but I think a
simple callback would be enough.
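
To make the two points above a bit more concrete, here is a rough
standalone sketch of the idea. It is a toy model, not the patch and not
the actual BRIN code: ToyBrinValues, ToySummary, bv_serialize and all the
toy_* functions are made-up names for illustration, only bv_mem_values is
the field proposed above. Plain malloc/free stands in for allocating in
the right memory context, and the "summary" is just a min/max pair so the
example stays short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy summary; the real multi-range summary is far more complex. */
typedef struct ToySummary
{
    int min;
    int max;
    int empty;              /* 1 until the first value is added */
} ToySummary;

typedef struct ToyBrinValues ToyBrinValues;
typedef void (*toy_serialize_fn) (ToyBrinValues *bv);

/* Hypothetical stand-in for BrinValues with the extra fields. */
struct ToyBrinValues
{
    unsigned char bv_values[sizeof(ToySummary)];   /* serialized form */
    void       *bv_mem_values;     /* expanded form, NULL if not loaded */
    toy_serialize_fn bv_serialize; /* assumed callback, run at range end */
    int         n_deserialize;     /* counters, just for the demo */
    int         n_serialize;
};

/* Write the expanded summary back into bv_values and release it. */
void
toy_serialize(ToyBrinValues *bv)
{
    if (bv->bv_mem_values == NULL)
        return;                     /* nothing was expanded */
    memcpy(bv->bv_values, bv->bv_mem_values, sizeof(ToySummary));
    free(bv->bv_mem_values);        /* real code would use a memory context */
    bv->bv_mem_values = NULL;
    bv->n_serialize++;
}

/* Deserialize only on first use; later calls reuse the expanded copy. */
ToySummary *
toy_expand(ToyBrinValues *bv)
{
    if (bv->bv_mem_values == NULL)
    {
        ToySummary *s = malloc(sizeof(ToySummary));

        assert(s != NULL);
        memcpy(s, bv->bv_values, sizeof(ToySummary));
        bv->bv_mem_values = s;
        bv->n_deserialize++;
    }
    return (ToySummary *) bv->bv_mem_values;
}

/* addValue analogue: touches only the in-memory summary. */
void
toy_add_value(ToyBrinValues *bv, int value)
{
    ToySummary *s = toy_expand(bv);

    if (s->empty)
    {
        s->min = s->max = value;
        s->empty = 0;
    }
    else
    {
        if (value < s->min)
            s->min = value;
        if (value > s->max)
            s->max = value;
    }
}

/* Start with an empty serialized summary and register the callback. */
void
toy_init(ToyBrinValues *bv)
{
    ToySummary empty = {0, 0, 1};

    memset(bv, 0, sizeof(*bv));
    memcpy(bv->bv_values, &empty, sizeof(empty));
    bv->bv_serialize = toy_serialize;
}

/* The caller knows where the page range ends, so it fires the callback. */
void
toy_summarize_range(ToyBrinValues *bv, const int *vals, size_t n)
{
    for (size_t i = 0; i < n; i++)
        toy_add_value(bv, vals[i]);
    if (bv->bv_serialize != NULL)
        bv->bv_serialize(bv);
}
```

The point of the sketch is only that, however many values the range
has, the summary is deserialized once and serialized once, instead of
on (nearly) every add_value call. Where exactly the real code would call
the callback is still the open question.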

I've hacked together an experimental version of this to see how much
it would help, and it reduces the duration from ~4.6s to ~3.3s. Which is
nice, but plain minmax is ~1.1s. I suppose there's room for further
improvements in compare_combine_ranges/reduce_combine_ranges and so on,
but I still think there'll always be a gap compared to plain minmax.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


