Re: PATCH: Using BRIN indexes for sorted output - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PATCH: Using BRIN indexes for sorted output
Date
Msg-id 065bb749-71ee-6bd2-7a4b-1088cb26a31a@enterprisedb.com
Whole thread Raw
In response to Re: PATCH: Using BRIN indexes for sorted output  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: PATCH: Using BRIN indexes for sorted output
List pgsql-hackers


On 7/11/23 13:20, Matthias van de Meent wrote:
> On Mon, 10 Jul 2023 at 22:04, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> On 7/10/23 18:18, Matthias van de Meent wrote:
>>> On Mon, 10 Jul 2023 at 17:09, Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>> On 7/10/23 14:38, Matthias van de Meent wrote:
>>>>>> I haven't really thought about geometric types, just about minmax and
>>>>>> minmax-multi. It's not clear to me what the benefit for these types be.
>>>>>> I mean, we can probably sort points lexicographically, but is anyone
>>>>>> doing that in queries? It seems useless for order by distance.
>>>>>
>>>>> Yes, that's why you would sort them by distance, where the distance is
>>>>> generated by the opclass as min/max distance between the summary and
>>>>> the distance's origin, and then inserted into the tuplesort.
>>>>>
>>>>
>>>> OK, so the query says "order by distance from point X" and we calculate
>>>> the min/max distance of values in a given page range.
>>>
>>> Yes, and because it's BRIN that's an approximation, which should
>>> generally be fine.
>>>
>>
>> Approximation in what sense? My understanding was we'd get a range of
>> distances that we know covers all rows in that range. So the results
>> should be accurate, no?
> 
> The distance is going to be accurate only to the degree that the
> summary can produce accurate distances for the datapoints it
> represents. That can be quite imprecise due to the nature of the
> contained datapoints: a summary of the points (-1, -1) and (1, 1) will
> have a minimum distance of 0 to the origin, where the summary (-1, 0)
> and (-1, 0.5) would have a much more accurate distance of 1.

Ummm, I'm probably missing something, or maybe my mental model of this
is just wrong, but why would the distance for the second summary be more
accurate? Or what does "more accurate" mean?

Is that about the range of distances for the summary? For the first
range the summary is a bounding box [(-1,1), (1,1)] so all we know the
points may have distance in range [0, sqrt(2)]. While for the second
summary it's [1, sqrt(1.25)].

> The point I was making is that the summary can only approximate the
> distance, and that approximation is fine w.r.t. the BRIN sort
> algoritm.
> 

I think as long as the approximation (whatever it means) does not cause
differences in results (compared to not using an index), it's OK.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: logical decoding and replication of sequences, take 2
Next
From: Farias de Oliveira
Date:
Subject: Re: In Postgres 16 BETA, should the ParseNamespaceItem have the same index as it's RangeTableEntry?