Re: PATCH: Using BRIN indexes for sorted output - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: PATCH: Using BRIN indexes for sorted output |
Date | |
Msg-id | 5c92f99a-e83e-c1b3-2c47-d75857a70b73@enterprisedb.com Whole thread Raw |
In response to | Re: PATCH: Using BRIN indexes for sorted output (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses |
Re: PATCH: Using BRIN indexes for sorted output
|
List | pgsql-hackers |
On 7/10/23 18:18, Matthias van de Meent wrote: > On Mon, 10 Jul 2023 at 17:09, Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> On 7/10/23 14:38, Matthias van de Meent wrote: >>> Kind of. For single-dimensional opclasses (minmax, minmax_multi) we >>> only need to extract the normal min/max values for ASC/DESC sorts, >>> which are readily available in the summary. But for multi-dimensional >>> and distance searches (nearest neighbour) we need to calculate the >>> distance between the indexed value(s) and the origin value to compare >>> the summary against, and the order would thus be asc/desc on distance >>> - a distance which may not be precisely represented by float types - >>> thus 'relative order' with its own order operation. >>> >> >> Can you give some examples of such data / queries, and how would it >> leverage the BRIN sort stuff? > > Order by distance would be `ORDER BY box <-> '(1, 2)'::point ASC`, and > the opclass would then decide that `<->(box, point) ASC` means it has > to return the closest distance from the point to the summary, for some > measure of 'distance' (this case L2, <#> other types, etc.). For DESC, > that would return the distance from `'(1,2)'::point` to the furthest > edge of the summary away from that point. Etc. > Thanks. >> For distance searches, I imagine this as data indexed by BRIN inclusion >> opclass, which creates a bounding box. We could return closest/furthest >> point on the bounding box (from the point used in the query). Which >> seems a bit like a R-tree ... > > Kind of; it would allow us to utilize such orderings without the > expensive 1 tuple = 1 index entry and without scanning the full table > before getting results. No tree involved, just a sequential scan on > the index to allow some sketch-based pre-sort on the data. Again, this > would work similar to how GiST's internal pages work: each downlink in > GiST contains a summary of the entries on the downlinked page, and > distance searches use a priority queue where the priority is the > distance of the opclass-provided distance operator - lower distance > means higher priority. Yes, that's roughly how I understood this too - a tradeoff that won't give the same performance as GiST, but much smaller and cheaper to maintain. > For BRIN, we'd have to build a priority queue > for the whole table at once, but presorting table sections is part of > the design of BRIN sort, right? Yes, that's kinda the whole point of BRIN sort. > >> But I have no idea what would this do for multi-dimensional searches, or >> what would those searches do? How would you sort such data other than >> lexicographically? Which I think is covered by the current BRIN Sort, >> because the data is either stored as multiple columns, in which case we >> use the BRIN on the first column. Or it's indexed using BRIN minmax as a >> tuple of values, but then it's sorted lexicographically. > > Yes, just any BRIN summary that allows distance operators and the like > should be enough MINMAX is easy to understand, and box inclusion are > IMO also fairly easy to understand. > True. If minmax is interpreted as inclusion with a simple 1D points, it kinda does the same thing. (Of course, minmax work with data types that don't have distances, but there's similarity.) >>>> I haven't really thought about geometric types, just about minmax and >>>> minmax-multi. It's not clear to me what the benefit for these types be. >>>> I mean, we can probably sort points lexicographically, but is anyone >>>> doing that in queries? It seems useless for order by distance. >>> >>> Yes, that's why you would sort them by distance, where the distance is >>> generated by the opclass as min/max distance between the summary and >>> the distance's origin, and then inserted into the tuplesort. >>> >> >> OK, so the query says "order by distance from point X" and we calculate >> the min/max distance of values in a given page range. > > Yes, and because it's BRIN that's an approximation, which should > generally be fine. > Approximation in what sense? My understanding was we'd get a range of distances that we know covers all rows in that range. So the results should be accurate, no? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: