Re: PATCH: Using BRIN indexes for sorted output - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: PATCH: Using BRIN indexes for sorted output |
Date | |
Msg-id | 0c331bf8-0cd3-e9fe-ae81-9e3d68f1f95c@enterprisedb.com Whole thread Raw |
In response to | Re: PATCH: Using BRIN indexes for sorted output (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses |
Re: PATCH: Using BRIN indexes for sorted output
|
List | pgsql-hackers |
On 7/10/23 12:22, Matthias van de Meent wrote: > On Fri, 7 Jul 2023 at 20:26, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: >> >> Hi, >> >> I finally had time to look at this patch again. There's a bit of bitrot, >> so here's a rebased version (no other changes). > > Thanks! > >> On 2/27/23 16:40, Matthias van de Meent wrote: >>> Some initial comments on 0004: >>> >>>> +/* >>>> + * brin_minmax_ranges >>>> + * Load the BRIN ranges and sort them. >>>> + */ >>>> +Datum >>>> +brin_minmax_ranges(PG_FUNCTION_ARGS) >>>> +{ >>> >>> Like in 0001, this seems to focus on only single columns. Can't we put >>> the scan and sort infrastructure in brin, and put the weight- and >>> compare-operators in the opclasses? I.e. >>> brin_minmax_minorder(PG_FUNCTION_ARGS=brintuple) -> range.min and >>> brin_minmax_maxorder(PG_FUNCTION_ARGS=brintuple) -> range.max, and a >>> brin_minmax_compare(order, order) -> int? I'm thinking of something >>> similar to GIST's distance operators, which would make implementing >>> ordering by e.g. `pointcolumn <-> (1, 2)::point` implementable in the >>> brin infrastructure. >>> >> >> However, it's not quite clear to me is what you mean by the weight- and >> compare-operators? That is, what are >> >> - brin_minmax_minorder(PG_FUNCTION_ARGS=brintuple) -> range.min >> - brin_minmax_maxorder(PG_FUNCTION_ARGS=brintuple) -> range.max >> - brin_minmax_compare(order, order) -> int >> >> supposed to do? Or what does "PG_FUNCTION_ARGS=brintuple" mean? > > _minorder/_maxorder is for extracting the minimum/maximum relative > order of each range, used for ASC/DESC sorting of operator results > (e.g. to support ORDER BY <->(box_column, '(1,2)'::point) DESC). > PG_FUNCTION_ARGS is mentioned because of PG calling conventions; > though I did forget to describe the second operator argument for the > distance function. We might also want to use only one such "order > extraction function" with DESC/ASC indicated by an argument. > I'm still not sure I understand what "minimum/maximum relative order" is. Isn't it the same as returning min/max value that can appear in the range? Although, that wouldn't work for points/boxes. >> In principle we just need a procedure that tells us min/max for a given >> page range - I guess that's what the minorder/maxorder functions do? But >> why would we need the compare one? We're comparing by the known data >> type, so we can just delegate the comparison to that, no? > > Is there a comparison function for any custom orderable type that we > can just use? GIST distance ordering uses floats, and I don't quite > like that from a user perspective, as it makes ordering operations > imprecise. I'd rather allow (but discourage) any type with its own > compare function. > I haven't really thought about geometric types, just about minmax and minmax-multi. It's not clear to me what the benefit for these types be. I mean, we can probably sort points lexicographically, but is anyone doing that in queries? It seems useless for order by distance. >> Also, the existence of these opclass procedures should be enough to >> identify the opclasses which can support this. > > Agreed. > >>>> +/* >>>> + * XXX Does it make sense (is it possible) to have a sort by more than one >>>> + * column, using a BRIN index? >>>> + */ >>> >>> Yes, even if only one prefix column is included in the BRIN index >>> (e.g. `company` in `ORDER BY company, date`, the tuplesort with table >>> tuples can add additional sorting without first reading the whole >>> table, potentially (likely) reducing the total resource usage of the >>> query. That utilizes the same idea as incremental sorts, but with the >>> caveat that the input sort order is approximately likely but not at >>> all guaranteed. So, even if the range sort is on a single index >>> column, we can still do the table's tuplesort on all ORDER BY >>> attributes, as long as a prefix of ORDER BY columns are included in >>> the BRIN index. >>> >> >> That's now quite what I meant, though. When I mentioned sorting by more >> than one column, I meant using a multi-column BRIN index on those >> columns. Something like this: >> >> CREATE TABLE t (a int, b int); >> INSERT INTO t ... >> CREATE INDEX ON t USING brin (a,b); >> >> SELECT * FROM t ORDER BY a, b; >> >> Now, what I think you described is using BRIN to sort by "a", and then >> do incremental sort for "b". What I had in mind is whether it's possible >> to use BRIN to sort by "b" too. >> >> I was suspecting it might be made to work, but now that I think about it >> again it probably can't - BRIN pretty much sorts the columns separately, >> it's not like having an ordering by (a,b) - first by "a", then "b". > > I imagine it would indeed be limited to an extremely small subset of > cases, and probably not worth the effort to implement in an initial > version. > OK, let's stick to order by a single column. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: