Re: PATCH: Using BRIN indexes for sorted output - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: PATCH: Using BRIN indexes for sorted output |
Date | |
Msg-id | CAEze2WgN9LhrrZD3kTHw28sTty5Zc6AYMyFqROJ9wmKH0eF6yw@mail.gmail.com Whole thread Raw |
In response to | Re: PATCH: Using BRIN indexes for sorted output (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: PATCH: Using BRIN indexes for sorted output
|
List | pgsql-hackers |
First of all, it's really great to see that this is being worked on. On Sun, 16 Oct 2022 at 16:34, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > Try to formulate the whole algorithm. Maybe I'm missing something. > > The current algorithm is something like this: > > 1. request info about ranges from the BRIN opclass > 2. sort them by maxval and minval Why sort on maxval and minval? That seems wasteful for effectively all sorts, where range sort on minval should suffice: If you find a range that starts at 100 in a list of ranges sorted at minval, you've processed all values <100. You can't make a similar comparison when that range is sorted on maxvals. > 3. NULLS FIRST: read all ranges that might have NULLs => output > 4. read the next range (by maxval) into tuplesort > (if no more ranges, go to (9)) > 5. load all tuples from "splill" tuplestore, compare to maxval Instead of this, shouldn't an update to tuplesort that allows for restarting the sort be better than this? Moving tuples that we've accepted into BRINsort state but not yet returned around seems like a waste of cycles, and I can't think of a reason why it can't work. > 6. load all tuples from no-summarized ranges (first range only) > (into tuplesort/tuplestore, depending on maxval comparison) > 7. load all intersecting ranges (with minval < current maxval) > (into tuplesort/tuplestore, depending on maxval comparison) > 8. sort the tuplesort, output all tuples, then back to (4) > 9. NULLS LAST: read all ranges that might have NULLs => output > 10. done > > For "DESC" ordering the process is almost the same, except that we swap > minval/maxval in most places. When I was thinking about this feature at the PgCon unconference, I was thinking about it more along the lines of the following system (for ORDER BY col ASC NULLS FIRST): 1. prepare tuplesort Rs (for Rangesort) for BRIN tuples, ordered by [has_nulls, min ASC] 2. scan info about ranges from BRIN, store them in Rs. 3. Finalize the sorting of Rs. 4. prepare tuplesort Ts (for Tuplesort) for sorting on the specified column ordering. 5. load all tuples from no-summarized ranges into Ts' 6. while Rs has a block range Rs' with has_nulls: - Remove Rs' from Rs - store the tuples of Rs' range in Ts. We now have all tuples with NULL in our sorted set; max_sorted = (NULL) 7. Finalize the Ts sorted set. 8. While the next tuple Ts' in the Ts tuplesort <= max_sorted - Remove Ts' from Ts - Yield Ts' Now, all tuples up to and including max_sorted are yielded. 9. If there are no more ranges in Rs: - Yield all remaining tuples from Ts, then return. 10. "un-finalize" Ts, so that we can start adding tuples to that tuplesort. This is different from Tomas' implementation, as he loads the tuples into a new tuplestore. 11. get the next item from Rs: Rs' - remove Rs' from Rs - assign Rs' min value to max_sorted - store the tuples of Rs' range in Ts 12. while the next item Rs' from Rs has a min value of max_sorted: - remove Rs' from Rs - store the tuples of Rs' range in Ts 13. The 'new' value from the next item from Rs is stored in max_sorted. If no such item exists, max_sorted is assigned a sentinel value (+INF) 14. Go to Step 7 This set of operations requires a restarting tuplesort for Ts, but I don't think that would result in many API changes for tuplesort. It reduces the overhead of large overlapping ranges, as it doesn't need to copy all tuples that have been read from disk but have not yet been returned. The maximum cost of this tuplesort would be the cost of sorting a seqscanned table, plus sorting the relevant BRIN ranges, plus the 1 extra compare per tuple and range that are needed to determine whether the range or tuple should be extracted from the tuplesort. The minimum cost would be the cost of sorting all BRIN ranges, plus sorting all tuples in one of the index's ranges. Kind regards, Matthias van de Meent PS. Are you still planning on giving the HOT optimization for BRIN a second try? I'm fairly confident that my patch at [0] would fix the issue that lead to the revert of that feature, but it introduced ABI changes after the feature freeze and thus it didn't get in. The patch might need some polishing, but I think it shouldn't take too much extra effort to get into PG16. [0] https://www.postgresql.org/message-id/CAEze2Wi9%3DBay_%3DrTf8Z6WPgZ5V0tDOayszQJJO%3DR_9aaHvr%2BTg%40mail.gmail.com
pgsql-hackers by date: