Re: BRIN index which is much faster never chosen by planner - Mailing list pgsql-hackers

From:            David Rowley
Subject:         Re: BRIN index which is much faster never chosen by planner
Msg-id:          CAKJS1f8XM4WckdOQ_RDp8MSt2O1BkktW1EGe7MZZ31svr90Obg@mail.gmail.com
In response to:  Re: BRIN index which is much faster never chosen by planner (Jeremy Finzel <finzelj@gmail.com>)
List:            pgsql-hackers
On Tue, 15 Oct 2019 at 08:43, Jeremy Finzel <finzelj@gmail.com> wrote:
> I wanted to follow up on this specific issue. Isn't this the heart of the matter and a fundamental problem? If I want to rely on BRIN indexes in a straightforward case as explained in the OP, but I don't know if the planner will be nearly reliable enough, how can I depend on them in production? Is this not considered a planner bug, or should this kind of case be documented as problematic for BRIN? As another way to look at it: is there a configuration parameter that could be added, specific to BRIN or bitmapscan, to provide help in cases like this?
>
> On freshly analyzed tables, I tried my original query again and found that even with now() - 3 days it does not choose the BRIN index. In fact, it chose another btree on the table like (id1, id2, rec_insert_time). With warm cache, the pg-chosen plan takes 40 seconds to execute, whereas when I force a BRIN scan it takes only 4 seconds.

Another thing you might want to look at is the correlation column in the pg_stats view for the rec_insert_time column. Prior to 7e534adcd, BRIN indexes were costed based on the selectivity estimate alone; there was no accounting for the fact that the pages for those records might be spread out over the entire table. Post 7e534adcd, we use the correlation estimate together with the selectivity estimate to work out how many pages (more specifically, "ranges") we're likely to hit. That commit was intended to fix the problem of BRIN indexes being selected far too often.

Of course, the correlation is based on the entire table. If there are subsets of the table that are, say, perfectly correlated, the planner is not going to know about that. It's possible that some of your older rec_insert_time values are spread out far more than the newer ones. As a test, you could try creating a new table, copying the records over to it in rec_insert_time order, and seeing if the BRIN index is selected for that table (after having performed an ANALYZE). It would also be interesting if you could show the pg_stats row for the column so that we can see if the correlation is low.

You can see from the code below (from brincostestimate) that the final selectivity is strongly influenced by the correlation value:

    qualSelectivity = clauselist_selectivity(root, indexQuals,
                                             baserel->relid,
                                             JOIN_INNER, NULL);

    /* work out the actual number of ranges in the index */
    indexRanges = Max(ceil((double) baserel->pages /
                           statsData.pagesPerRange), 1.0);

    /*
     * Now calculate the minimum possible ranges we could match with if all of
     * the rows were in the perfect order in the table's heap.
     */
    minimalRanges = ceil(indexRanges * qualSelectivity);

    /*
     * Now estimate the number of ranges that we'll touch by using the
     * indexCorrelation from the stats.  Careful not to divide by zero (note
     * we're using the absolute value of the correlation).
     */
    if (*indexCorrelation < 1.0e-10)
        estimatedRanges = indexRanges;
    else
        estimatedRanges = Min(minimalRanges / *indexCorrelation, indexRanges);

    /* we expect to visit this portion of the table */
    selec = estimatedRanges / indexRanges;

    CLAMP_PROBABILITY(selec);

To put numbers on that: with a qual selectivity of 0.001 and 10,000 ranges, a perfectly correlated table would match just 10 ranges, but at a correlation of 0.05 the estimate becomes Min(10 / 0.05, 10000) = 200 ranges, i.e. 2% of the table rather than 0.1%.

My overall view on this is that the BRIN index is not that great here, since using it doesn't eliminate that many rows.
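In case it's useful, here's a rough sketch of that experiment in SQL. I'm taking the foo.log_table and rec_insert_time names from your EXPLAIN output; the names of the objects being created are made up, and this is untested, so adjust to suit your schema:

    -- How correlated does the planner think rec_insert_time is?
    SELECT attname, n_distinct, correlation
    FROM pg_stats
    WHERE schemaname = 'foo'
      AND tablename = 'log_table'
      AND attname = 'rec_insert_time';

    -- Build a physically ordered copy, index it, and re-run the query
    -- against it to see whether the BRIN index gets chosen.
    CREATE TABLE foo.log_table_ordered AS
        SELECT * FROM foo.log_table
        ORDER BY rec_insert_time;

    CREATE INDEX log_table_ordered_rec_insert_time_brin
        ON foo.log_table_ordered
        USING brin (rec_insert_time);

    ANALYZE foo.log_table_ordered;

If the ordered copy's correlation comes out at or very near 1.0 and the planner then picks the BRIN index, that would support the theory that it's the older, more scattered rows dragging the whole-table correlation down.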
From above we see:

> Bitmap Heap Scan on foo.log_table l (cost=2391.71..24360848.29 rows=735 width=99) (actual time=824.133..21329.054 rows=466 loops=1)
>   Output: <hidden>
>   Recheck Cond: (l.rec_insert_time >= (now() - '10 days'::interval))
>   Rows Removed by Index Recheck: 8187584
>   Filter: ((l.field1 IS NOT NULL) AND (l.category = 'music'::name))
>   Rows Removed by Filter: 19857107
>   Heap Blocks: lossy=1509000

So you have just 466 rows matching these quals, but to get them the executor had to scan 1.5 million pages, filtering out 8.1 million rows on the recheck and another 19.8 million on the filter. You've mentioned that the table's heap is 139 GB, which is about 18 million pages. It seems your query would perform much better with a btree index such as (category, rec_insert_time) WHERE field1 IS NOT NULL (see the sketch below my signature). Of course, you've mentioned that the plan which uses the BRIN index executes more quickly, but I think you're going to find BRIN unreliable for anything other than INSERT-only tables in which records are always inserted with an ever-increasing or ever-decreasing value in the BRIN-indexed column. If you start performing UPDATEs, that's going to create holes that new records will fill, causing the correlation to drop and the cost of BRIN index scans to go up.

On the other hand, if you think you can do better than what was done in 7e534adcd, it would be good to see someone work on that. I'm sure something better can be done; it's just not that easy with the scant correlation data we have for the column.

As for whether this is a bug or something missing from the documentation: the docs do mention "BRIN stands for Block Range Index. BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table." https://www.postgresql.org/docs/current/brin-intro.html

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
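PS: for concreteness, the partial index I'm suggesting above would be something along these lines (the index name is made up, and this is untested):

    CREATE INDEX log_table_category_rec_insert_time_idx
        ON foo.log_table (category, rec_insert_time)
        WHERE field1 IS NOT NULL;

A btree like that lets the planner apply the category = 'music' and rec_insert_time >= now() - '10 days'::interval quals as index bounds, and the partial predicate means rows with a NULL field1 are never visited at all.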