Hi Peter,
> Attached is a POC patch that adds skip scan to nbtree. The patch
> teaches nbtree index scans to efficiently use a composite index on
> '(a, b)' for queries with a predicate such as "WHERE b = 5". This is
> feasible in cases where the total number of distinct values in the
> column 'a' is reasonably small (think tens or hundreds, perhaps even
> thousands for very large composite indexes).
>
> [...]
>
> Thoughts?
Many thanks for working on this. I believe it is an important feature
and it would be great to deliver it during the PG18 cycle.
I experimented with the patch and here are the results I got so far.
Firstly, it was compiled on Intel MacOS and ARM Linux. All the tests
pass just fine.
Secondly, I tested the patch manually using a release build on my
Raspberry Pi 5 and the GUCs that can be seen in [1].
Test 1 - simple one.
```
CREATE TABLE test1(c char, n bigint);
CREATE INDEX test1_idx ON test1 USING btree(c,n);
INSERT INTO test1
SELECT chr(ascii('a') + random(0,2)) AS c,
random(0, 1_000_000_000) AS n
FROM generate_series(0, 1_000_000);
EXPLAIN [ANALYZE] SELECT COUNT(*) FROM test1 WHERE n > 900_000_000;
```
Test 2 - a more complicated one.
```
CREATE TABLE test2(c1 char, c2 char, n bigint);
CREATE INDEX test2_idx ON test2 USING btree(c1,c2,n);
INSERT INTO test2
SELECT chr(ascii('a') + random(0,2)) AS c1,
chr(ascii('a') + random(0,2)) AS c2,
random(0, 1_000_000_000) AS n
FROM generate_series(0, 1_000_000);
EXPLAIN [ANALYZE] SELECT COUNT(*) FROM test2 WHERE n > 900_000_000;
```
Test 3 - to see how it works with covering indexes.
```
CREATE TABLE test3(c char, n bigint, s text DEFAULT 'text_value' || n);
CREATE INDEX test3_idx ON test3 USING btree(c,n) INCLUDE(s);
INSERT INTO test3
SELECT chr(ascii('a') + random(0,2)) AS c,
random(0, 1_000_000_000) AS n,
'text_value_' || random(0, 1_000_000_000) AS s
FROM generate_series(0, 1_000_000);
EXPLAIN [ANALYZE] SELECT s FROM test3 WHERE n < 1000;
```
In all the cases the patch worked as expected.
I noticed that with the patch we choose Index Only Scans for Test 1
and without the patch - Parallel Seq Scan. However the Parallel Seq
Scan is 2.4 times faster. Before the patch the query takes 53 ms,
after the patch - 127 ms. I realize this could be just something
specific to my hardware and/or amount of data.
Do you think this is something that was expected or something worth
investigating further?
I haven't looked at the code yet.
[1]: https://github.com/afiskon/pgscripts/blob/master/single-install-meson.sh
--
Best regards,
Aleksander Alekseev