Re: Adding skip scan (including MDAM style range skip scan) to nbtree - Mailing list pgsql-hackers

From Aleksander Alekseev
Subject Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Date
Msg-id CAJ7c6TP2N9Dv06NkbFXYKOTaZuQcdiSf6PSKjbAPizzV3p1pfA@mail.gmail.com
Whole thread Raw
In response to Re: Adding skip scan (including MDAM style range skip scan) to nbtree  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Adding skip scan (including MDAM style range skip scan) to nbtree
List pgsql-hackers
Hi Peter,

> It looks like the queries you posted have a kind of adversarial
> quality to them, as if they were designed to confuse the
> implementation. Was it intentional?

To some extent. I merely wrote several queries that I would expect
should benefit from skip scans. Since I didn't look at the queries you
used there was a chance that I will hit something interesting.

> Attached v2 fixes this bug. The problem was that the skip support
> function used by the "char" opclass assumed signed char comparisons,
> even though the authoritative B-Tree comparator (support function 1)
> uses signed comparisons (via uint8 casting). A simple oversight. Your
> test cases will work with this v2, provided you use "char" (instead of
> unadorned char) in the create table statements.

Thanks for v2.

> If you change your table definition to CREATE TABLE test1(c "char", n
> bigint), then your example queries can use the optimization. This
> makes a huge difference.

You are right, it does.

Test1 takes 33.7 ms now (53 ms before the path, x1.57)

Test3 I showed before contained an error in the table definition
(Postgres can't do `n bigint, s text DEFAULT 'text_value' || n`). Here
is the corrected test:

```
CREATE TABLE test3(c "char", n bigint, s text);
CREATE INDEX test3_idx ON test3 USING btree(c,n) INCLUDE(s);

INSERT INTO test3
  SELECT chr(ascii('a') + random(0,2)) AS c,
         random(0, 1_000_000_000) AS n,
         'text_value_' || random(0, 1_000_000_000) AS s
  FROM generate_series(0, 1_000_000);

EXPLAIN ANALYZE SELECT s FROM test3 WHERE n < 10_000;
```

It runs fast (< 1 ms) and uses the index, as expected.

Test2 with "char" doesn't seem to benefit from the patch anymore
(pretty sure it did in v1). It always chooses Parallel Seq Scans even
if I change the condition to `WHERE n > 999_995_000` or `WHERE n =
999_997_362`. Is it an expected behavior?

I also tried Test4 and Test5.

In Test4 I was curious if scip scans work properly with functional indexes:

```
CREATE TABLE test4(d date, n bigint);
CREATE INDEX test4_idx ON test4 USING btree(extract(year from d),n);

INSERT INTO test4
  SELECT ('2024-' || random(1,12) || '-' || random(1,28)) :: date AS d,
         random(0, 1_000_000_000) AS n
  FROM generate_series(0, 1_000_000);

EXPLAIN ANALYZE SELECT COUNT(*) FROM test4 WHERE n > 900_000_000;
```

The query uses Index Scan, however the performance is worse than with
Seq Scan chosen before the patch. It doesn't matter if I choose '>' or
'=' condition.

Test5 checks how skip scans work with partial indexes:

```
CREATE TABLE test5(c "char", n bigint);
CREATE INDEX test5_idx ON test5 USING btree(c, n) WHERE n > 900_000_000;

INSERT INTO test5
  SELECT chr(ascii('a') + random(0,2)) AS c,
         random(0, 1_000_000_000) AS n
  FROM generate_series(0, 1_000_000);

EXPLAIN ANALYZE SELECT COUNT(*) FROM test5 WHERE n > 950_000_000;
```

It runs fast and choses Index Only Scan. But then I discovered that
without the patch Postgres also uses Index Only Scan for this query. I
didn't know it could do this - what is the name of this technique? The
query takes 17.6 ms with the patch, 21 ms without the patch. Not a
huge win but still.

That's all I have for now.

--
Best regards,
Aleksander Alekseev



pgsql-hackers by date:

Previous
From: Alexander Lakhin
Date:
Subject: Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Next
From: vignesh C
Date:
Subject: Re: Improving the latch handling between logical replication launcher and worker processes.