RE: Adding skip scan (including MDAM style range skip scan) to nbtree - Mailing list pgsql-hackers

From
Subject RE: Adding skip scan (including MDAM style range skip scan) to nbtree
Date
Msg-id TYWPR01MB10982A413E0EC4088E78C0E11B1A62@TYWPR01MB10982.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Adding skip scan (including MDAM style range skip scan) to nbtree  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Adding skip scan (including MDAM style range skip scan) to nbtree
List pgsql-hackers
Hi,

Since I'd like to understand the skip scan to improve the EXPLAIN output
for multicolumn B-Tree Index[1], I began to try the skip scan with some
queries and look into the source code.

I have some feedback and comments.

(1)

At first, I was surprised to look at your benchmark result because the skip scan
index can improve much performance. I agree that there are many users to be
happy with the feature for especially OLAP use-case. I expected to use v18.


(2)

I found the cost is estimated to much higher if the number of skipped attributes
is more than two. Is it expected behavior?

# Test result. The attached file is the detail of tests.

-- Index Scan
-- The actual time is low since the skip scan works well
-- But the cost is higher than one of seqscan
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM test WHERE id3 = 101;
                                                              QUERY PLAN
               

---------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_id1_id2_id3 on public.test  (cost=0.42..26562.77 rows=984 width=20) (actual time=0.051..15.533
rows=991loops=1) 
   Output: id1, id2, id3, value
   Index Cond: (test.id3 = 101)
   Buffers: shared hit=4402
 Planning:
   Buffers: shared hit=7
 Planning Time: 0.234 ms
 Execution Time: 15.711 ms
(8 rows)

-- Seq Scan
-- actual time is high, but the cost is lower than one of the above Index Scan.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT * FROM test WHERE id3 = 101;
                                                          QUERY PLAN
       

-------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..12676.73 rows=984 width=20) (actual time=0.856..113.861 rows=991 loops=1)
   Output: id1, id2, id3, value
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=6370
   ->  Parallel Seq Scan on public.test  (cost=0.00..11578.33 rows=410 width=20) (actual time=0.061..102.016 rows=330
loops=3)
         Output: id1, id2, id3, value
         Filter: (test.id3 = 101)
         Rows Removed by Filter: 333003
         Buffers: shared hit=6370
         Worker 0:  actual time=0.099..98.014 rows=315 loops=1
           Buffers: shared hit=2066
         Worker 1:  actual time=0.054..97.162 rows=299 loops=1
           Buffers: shared hit=1858
 Planning:
   Buffers: shared hit=19
 Planning Time: 0.194 ms
 Execution Time: 114.129 ms
(18 rows)


I look at btcostestimate() to find the reason and found the bound quals
and cost.num_sa_scans are different from my expectation.

My assumption is
* bound quals is id3=XXX (and id1 and id2 are skipped attributes)
* cost.num_sa_scans = 100 (=10*10 because assuming 10 primitive index scans
                          per skipped attribute)

But it's wrong. The above index scan result is
* bound quals is NULL
* cost.num_sa_scans = 1


As I know you said the below, but I'd like to know the above is expected or not.

> That approach seems far more practicable than preempting the problem
> during planning or during nbtree preprocessing. It seems like it'd be
> very hard to model the costs statistically. We need revisions to
> btcostestimate, of course, but the less we can rely on btcostestimate
> the better. As I said, there are no new index paths generated by the
> optimizer for any of this.

I couldn't understand why there is the below logic well.

  btcostestimate()
  (...omit...)
              if (indexcol != iclause->indexcol)
              {
                  /* no quals at all for indexcol */
                  found_skip = true;
                  if (index->pages < 100)
                      break;
                  num_sa_scans += 10 * (indexcol - iclause->indexcol);     // why add minus value?
                  continue;                                                  // why skip to add bound quals?
              }


(3)

Currently, there is an assumption that "there will be 10 primitive index scans
per skipped attribute". Is any chance to use pg_stats.n_distinct?

[1] Improve EXPLAIN output for multicolumn B-Tree Index

https://www.postgresql.org/message-id/flat/TYWPR01MB1098260B694D27758FE2BA46FB1C92%40TYWPR01MB10982.jpnprd01.prod.outlook.com

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION


Attachment

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Test to dump and restore objects left behind by regression
Next
From: Michael Paquier
Date:
Subject: Re: Allow non-superuser to cancel superuser tasks.