RE: Adding skip scan (including MDAM style range skip scan) to nbtree - Mailing list pgsql-hackers
From | |
---|---|
Subject | RE: Adding skip scan (including MDAM style range skip scan) to nbtree |
Date | |
Msg-id | TYWPR01MB10982A413E0EC4088E78C0E11B1A62@TYWPR01MB10982.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Adding skip scan (including MDAM style range skip scan) to nbtree (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: Adding skip scan (including MDAM style range skip scan) to nbtree
|
List | pgsql-hackers |
Hi, Since I'd like to understand the skip scan to improve the EXPLAIN output for multicolumn B-Tree Index[1], I began to try the skip scan with some queries and look into the source code. I have some feedback and comments. (1) At first, I was surprised to look at your benchmark result because the skip scan index can improve much performance. I agree that there are many users to be happy with the feature for especially OLAP use-case. I expected to use v18. (2) I found the cost is estimated to much higher if the number of skipped attributes is more than two. Is it expected behavior? # Test result. The attached file is the detail of tests. -- Index Scan -- The actual time is low since the skip scan works well -- But the cost is higher than one of seqscan EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM test WHERE id3 = 101; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Index Scan using idx_id1_id2_id3 on public.test (cost=0.42..26562.77 rows=984 width=20) (actual time=0.051..15.533 rows=991loops=1) Output: id1, id2, id3, value Index Cond: (test.id3 = 101) Buffers: shared hit=4402 Planning: Buffers: shared hit=7 Planning Time: 0.234 ms Execution Time: 15.711 ms (8 rows) -- Seq Scan -- actual time is high, but the cost is lower than one of the above Index Scan. EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM test WHERE id3 = 101; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..12676.73 rows=984 width=20) (actual time=0.856..113.861 rows=991 loops=1) Output: id1, id2, id3, value Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=6370 -> Parallel Seq Scan on public.test (cost=0.00..11578.33 rows=410 width=20) (actual time=0.061..102.016 rows=330 loops=3) Output: id1, id2, id3, value Filter: (test.id3 = 101) Rows Removed by Filter: 333003 Buffers: shared hit=6370 Worker 0: actual time=0.099..98.014 rows=315 loops=1 Buffers: shared hit=2066 Worker 1: actual time=0.054..97.162 rows=299 loops=1 Buffers: shared hit=1858 Planning: Buffers: shared hit=19 Planning Time: 0.194 ms Execution Time: 114.129 ms (18 rows) I look at btcostestimate() to find the reason and found the bound quals and cost.num_sa_scans are different from my expectation. My assumption is * bound quals is id3=XXX (and id1 and id2 are skipped attributes) * cost.num_sa_scans = 100 (=10*10 because assuming 10 primitive index scans per skipped attribute) But it's wrong. The above index scan result is * bound quals is NULL * cost.num_sa_scans = 1 As I know you said the below, but I'd like to know the above is expected or not. > That approach seems far more practicable than preempting the problem > during planning or during nbtree preprocessing. It seems like it'd be > very hard to model the costs statistically. We need revisions to > btcostestimate, of course, but the less we can rely on btcostestimate > the better. As I said, there are no new index paths generated by the > optimizer for any of this. I couldn't understand why there is the below logic well. btcostestimate() (...omit...) if (indexcol != iclause->indexcol) { /* no quals at all for indexcol */ found_skip = true; if (index->pages < 100) break; num_sa_scans += 10 * (indexcol - iclause->indexcol); // why add minus value? continue; // why skip to add bound quals? } (3) Currently, there is an assumption that "there will be 10 primitive index scans per skipped attribute". Is any chance to use pg_stats.n_distinct? [1] Improve EXPLAIN output for multicolumn B-Tree Index https://www.postgresql.org/message-id/flat/TYWPR01MB1098260B694D27758FE2BA46FB1C92%40TYWPR01MB10982.jpnprd01.prod.outlook.com Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
pgsql-hackers by date: