Re: PATCH: Using BRIN indexes for sorted output - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PATCH: Using BRIN indexes for sorted output
Date
Msg-id 825f648f-2a31-1870-942f-03fba2b6b3cf@enterprisedb.com
Whole thread Raw
In response to Re: PATCH: Using BRIN indexes for sorted output  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: PATCH: Using BRIN indexes for sorted output
List pgsql-hackers
Hi,

I finally had time to look at this patch again. There's a bit of bitrot,
so here's a rebased version (no other changes).

[more comments inline]

On 2/27/23 16:40, Matthias van de Meent wrote:
> Hi,
> 
> On Thu, 23 Feb 2023 at 17:44, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
>>
>> I'll see to further reviewing 0004 and 0005 when I have additional time.
> 
> Some initial comments on 0004:
> 
>> +/*
>> + * brin_minmax_ranges
>> + *        Load the BRIN ranges and sort them.
>> + */
>> +Datum
>> +brin_minmax_ranges(PG_FUNCTION_ARGS)
>> +{
> 
> Like in 0001, this seems to focus on only single columns. Can't we put
> the scan and sort infrastructure in brin, and put the weight- and
> compare-operators in the opclasses? I.e.
> brin_minmax_minorder(PG_FUNCTION_ARGS=brintuple) -> range.min and
> brin_minmax_maxorder(PG_FUNCTION_ARGS=brintuple) -> range.max, and a
> brin_minmax_compare(order, order) -> int? I'm thinking of something
> similar to GIST's distance operators, which would make implementing
> ordering by e.g. `pointcolumn <-> (1, 2)::point` implementable in the
> brin infrastructure.
> 
> Note: One big reason I don't really like the current
> brin_minmax_ranges (and the analyze code in 0001) is because it breaks
> the operatorclass-vs-index abstraction layer. Operator classes don't
> (shouldn't) know or care about which attribute number they have, nor
> what the index does with the data.
> Scanning the index is not something that I expect the operator class
> to do, I expect that the index code organizes the scan, and forwards
> the data to the relevant operator classes.
> Scanning the index N times for N attributes can be excused if there
> are good reasons, but I'd rather have that decision made in the
> index's core code rather than at the design level.
> 

I think you're right. It'd be more appropriate to have most of the core
scanning logic in brin.c, and then delegate only some minor decisions to
the opclass. Like, comparisons, extraction of min/max from ranges etc.

However, it's not quite clear to me is what you mean by the weight- and
compare-operators? That is, what are

 - brin_minmax_minorder(PG_FUNCTION_ARGS=brintuple) -> range.min
 - brin_minmax_maxorder(PG_FUNCTION_ARGS=brintuple) -> range.max
 - brin_minmax_compare(order, order) -> int

supposed to do? Or what does "PG_FUNCTION_ARGS=brintuple" mean?

In principle we just need a procedure that tells us min/max for a given
page range - I guess that's what the minorder/maxorder functions do? But
why would we need the compare one? We're comparing by the known data
type, so we can just delegate the comparison to that, no?

Also, the existence of these opclass procedures should be enough to
identify the opclasses which can support this.

>> +/*
>> + * XXX Does it make sense (is it possible) to have a sort by more than one
>> + * column, using a BRIN index?
>> + */
> 
> Yes, even if only one prefix column is included in the BRIN index
> (e.g. `company` in `ORDER BY company, date`, the tuplesort with table
> tuples can add additional sorting without first reading the whole
> table, potentially (likely) reducing the total resource usage of the
> query. That utilizes the same idea as incremental sorts, but with the
> caveat that the input sort order is approximately likely but not at
> all guaranteed. So, even if the range sort is on a single index
> column, we can still do the table's tuplesort on all ORDER BY
> attributes, as long as a prefix of ORDER BY columns are included in
> the BRIN index.
> 

That's now quite what I meant, though. When I mentioned sorting by more
than one column, I meant using a multi-column BRIN index on those
columns. Something like this:

   CREATE TABLE t (a int, b int);
   INSERT INTO t ...
   CREATE INDEX ON t USING brin (a,b);

   SELECT * FROM t ORDER BY a, b;

Now, what I think you described is using BRIN to sort by "a", and then
do incremental sort for "b". What I had in mind is whether it's possible
to use BRIN to sort by "b" too.

I was suspecting it might be made to work, but now that I think about it
again it probably can't - BRIN pretty much sorts the columns separately,
it's not like having an ordering by (a,b) - first by "a", then "b".

>> +            /*
>> +             * XXX We can be a bit smarter for LIMIT queries - once we
>> +             * know we have more rows in the tuplesort than we need to
>> +             * output, we can stop spilling - those rows are not going
>> +             * to be needed. We can discard the tuplesort (no need to
>> +             * respill) and stop spilling.
>> +             */
> 
> Shouldn't that be "discard the tuplestore"?
> 

Yeah, definitely.

>> +#define BRIN_PROCNUM_RANGES         12    /* optional */
> 
> It would be useful to add documentation for this in this patch.
> 

Right, this should be documented in doc/src/sgml/brin.sgml.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Akshat Jaimini
Date:
Subject: Re: Add more sanity checks around callers of changeDependencyFor()
Next
From: Gurjeet Singh
Date:
Subject: Re: Standardize type of variable when extending Buffers