| From | Tomas Vondra |
|---|---|
| Subject | Re: PATCH: Using BRIN indexes for sorted output |
| Date | |
| Msg-id | 9049670c-41f3-45af-2e6a-64ddde225333@enterprisedb.com |
| In response to | Re: PATCH: Using BRIN indexes for sorted output (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-hackers |
On 10/16/22 16:41, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>> I forgot to mention one important issue in my list yesterday, and
>> that's memory consumption.
>
> TBH, this is all looking like vastly more complexity than benefit.
> It's going to be impossible to produce a reliable cost estimate
> given all the uncertainty, and I fear that will end in picking
> BRIN-based sorting when it's not actually a good choice.
>

Maybe. If it turns out the estimates we have are insufficient to make
good planning decisions, that's life. As I wrote in my message, I know
the BRIN costing is a bit shaky in general (not just for this new
operation), and I intend to propose some improvements in a separate
patch.

I think the main issue with BRIN costing is that we have no stats about
the ranges, so we can't estimate how many ranges we'll really end up
accessing. If a query matches 100 rows, will that be 1 range or 100
ranges? Or for the BRIN Sort, how many overlapping ranges will there be?

I intend to allow index AMs to collect custom statistics, and the BRIN
minmax opfamily would collect e.g. this:

1) number of non-summarized ranges

2) number of all-nulls ranges

3) number of has-nulls ranges

4) average number of overlaps (given a random range, how many other
   ranges intersect with it)

5) how likely it is for a row to hit multiple ranges (cross-check
   sample rows vs. ranges)

I believe this will allow much better / more reliable BRIN costing
(the number of overlaps is particularly useful for this patch).

> The examples you showed initially are cherry-picked to demonstrate
> the best possible case, which I doubt has much to do with typical
> real-world tables. It would be good to see what happens with
> not-perfectly-sequential data before even deciding this is worth
> spending more effort on.

Yes, the example was a trivial "happy case" example. Obviously, the
performance degrades as the data becomes more random (i.e. as the
ranges get wider), forcing the BRIN Sort to read / sort more tuples.
But let's see an example with less correlated data, say, like this:

  create table t (a int) with (fillfactor = 10);

  insert into t select i + 10000 * random()
    from generate_series(1, 10000000) s(i);

With fillfactor = 10, there are ~2500 values per 1MB range, so this
means each range overlaps with ~4 more. The results then look like
this:

1) select * from t order by a;

   seqscan+sort: 4437 ms
   brinsort:     4233 ms

2) select * from t order by a limit 10;

   seqscan+sort: 1859 ms
   brinsort:        4 ms

If you increase the random factor from 10000 to 100000 (so ~40
overlapping ranges), the seqscan timings remain about the same, while
brinsort gets to 5200 ms and 20 ms. And with 1M, it's ~6000 ms and
300 ms. Only at 5000000, where we read pretty much 1/2 of the table
because the ranges intersect, do we get the same timing as the seqscan
(for the LIMIT query). The "full sort" query is more like 5000 vs.
6600 ms, so slower but not by a huge amount.

Yes, this is a very simple example. I can do more tests with other
datasets (larger/smaller, different distributions, ...).

> It also seems kind of unfair to decide
> that the relevant comparison point is a seqscan rather than a
> btree indexscan.
>

I don't think it's all that unfair. How likely is it to have both a
BRIN and a btree index on the same column? And even if you do have
such indexes (say, on different sets of keys), we kinda already have
this costing issue with index scans and bitmap index scans.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
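
PS: To make (4) a bit more concrete -- this is not what the patch does,
just a rough sketch of how the "average number of overlaps" could be
computed today for an existing BRIN index using the pageinspect
extension (superuser only). It assumes a BRIN index t_a_brin on t(a)
with the default pages_per_range = 128 (i.e. 1MB ranges) and parses the
textual minmax values, so take it as an illustration, not as how the
proposed AM statistics would be gathered:

  create extension if not exists pageinspect;
  create index t_a_brin on t using brin (a);

  -- extract (min, max) for each summarized range, then count how many
  -- other ranges each range intersects with, and average that
  with pages as (
    select get_raw_page('t_a_brin', p::int) as raw
      from generate_series(0, pg_relation_size('t_a_brin') / 8192 - 1) p
  ),
  ranges as (
    select i.blknum,
           ((regexp_match(i.value, '\{(-?\d+) \.\. (-?\d+)\}'))[1])::int as minval,
           ((regexp_match(i.value, '\{(-?\d+) \.\. (-?\d+)\}'))[2])::int as maxval
      from pages,
           lateral brin_page_items(pages.raw, 't_a_brin') i
     where brin_page_type(pages.raw) = 'regular'   -- skip meta/revmap pages
       and i.value is not null                     -- skip items with no stored min/max
  )
  select round(avg(overlaps), 1) as avg_overlaps
    from (select r1.blknum, count(*) - 1 as overlaps
            from ranges r1
            join ranges r2
              on r1.minval <= r2.maxval and r2.minval <= r1.maxval
           group by r1.blknum) s;

The point of the AM-level statistics is to give the planner this kind
of number without having to do anything nearly this expensive.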