Re: PATCH: Using BRIN indexes for sorted output - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: PATCH: Using BRIN indexes for sorted output |
Date | |
Msg-id | CAEze2WjOrBXbMEM-xq8OOPJquNdrTn-t6qX1G7FzBfn9nY_a6w@mail.gmail.com Whole thread Raw |
In response to | Re: PATCH: Using BRIN indexes for sorted output (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: PATCH: Using BRIN indexes for sorted output
|
List | pgsql-hackers |
On Mon, 10 Jul 2023 at 17:09, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > On 7/10/23 14:38, Matthias van de Meent wrote: >> Kind of. For single-dimensional opclasses (minmax, minmax_multi) we >> only need to extract the normal min/max values for ASC/DESC sorts, >> which are readily available in the summary. But for multi-dimensional >> and distance searches (nearest neighbour) we need to calculate the >> distance between the indexed value(s) and the origin value to compare >> the summary against, and the order would thus be asc/desc on distance >> - a distance which may not be precisely represented by float types - >> thus 'relative order' with its own order operation. >> > > Can you give some examples of such data / queries, and how would it > leverage the BRIN sort stuff? Order by distance would be `ORDER BY box <-> '(1, 2)'::point ASC`, and the opclass would then decide that `<->(box, point) ASC` means it has to return the closest distance from the point to the summary, for some measure of 'distance' (this case L2, <#> other types, etc.). For DESC, that would return the distance from `'(1,2)'::point` to the furthest edge of the summary away from that point. Etc. > For distance searches, I imagine this as data indexed by BRIN inclusion > opclass, which creates a bounding box. We could return closest/furthest > point on the bounding box (from the point used in the query). Which > seems a bit like a R-tree ... Kind of; it would allow us to utilize such orderings without the expensive 1 tuple = 1 index entry and without scanning the full table before getting results. No tree involved, just a sequential scan on the index to allow some sketch-based pre-sort on the data. Again, this would work similar to how GiST's internal pages work: each downlink in GiST contains a summary of the entries on the downlinked page, and distance searches use a priority queue where the priority is the distance of the opclass-provided distance operator - lower distance means higher priority. For BRIN, we'd have to build a priority queue for the whole table at once, but presorting table sections is part of the design of BRIN sort, right? > But I have no idea what would this do for multi-dimensional searches, or > what would those searches do? How would you sort such data other than > lexicographically? Which I think is covered by the current BRIN Sort, > because the data is either stored as multiple columns, in which case we > use the BRIN on the first column. Or it's indexed using BRIN minmax as a > tuple of values, but then it's sorted lexicographically. Yes, just any BRIN summary that allows distance operators and the like should be enough MINMAX is easy to understand, and box inclusion are IMO also fairly easy to understand. >>> I haven't really thought about geometric types, just about minmax and >>> minmax-multi. It's not clear to me what the benefit for these types be. >>> I mean, we can probably sort points lexicographically, but is anyone >>> doing that in queries? It seems useless for order by distance. >> >> Yes, that's why you would sort them by distance, where the distance is >> generated by the opclass as min/max distance between the summary and >> the distance's origin, and then inserted into the tuplesort. >> > > OK, so the query says "order by distance from point X" and we calculate > the min/max distance of values in a given page range. Yes, and because it's BRIN that's an approximation, which should generally be fine. Kind regards, Matthias van de Meent Neon (https://neon.tech/)
pgsql-hackers by date: