Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 - Mailing list pgsql-hackers
From | Japin Li |
---|---|
Subject | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 |
Date | |
Msg-id | ME0P300MB0445827B6E9CC04E0FAEE446B624A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM Whole thread Raw |
Responses |
Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 |
List | pgsql-hackers |
On Tue, 29 Jul 2025 at 06:57, Peter Smith <smithpb2250@gmail.com> wrote: > Here are the latest v15 patches. > > Changes include: > > PATCH 0002. > - README now says user should not tamper with VCI internal relations > - fixes/test the VACUUM bug -- fix provided by Japin [1] > - fixes/tests the reported segv for attempted REFRESH of VCI internal > relation -- see [2 comment#1] > - fixes/tests VCI internal relation dependency on the indexed table > - simplifies code for PG_TEMP_FILES_DIR -- see [2 comment#2] > Hi Peter, Thanks for updating the patches. 1. I've identified another TRAP failure. Here are the reproduction steps: rm -rf demo initdb -D demo cat <<EOF >>demo/postgresql.auto.conf shared_preload_libraries = 'vci' max_worker_processes = '20' EOF pg_ctl -D demo start cat <<EOF | psql postgres CREATE EXTENSION vci; CREATE TABLE t (id int, info text); CREATE INDEX ON t USING vci (id); INSERT INTO t SELECT id, md5(random()::text) FROM generate_series(1, 1000) id; REINDEX TABLE t; REINDEX TABLE t; EOF The current VCI design doesn't support REINDEX, which is expected. But I've discovered an unexpected issue: running REINDEX on a table a second time causes an assertion failure. 2. +Internal Relation Types: +- -1: TID relation (maps CRID to original TID) +- -2: NULL vector (bit array for NULL values) +- -3: Delete vector (bit array for deleted records) +- -5: TID-CRID mapping table +- -9: Data WOS (buffered row data) +- -10: Whiteout WOS (deletion markers) +- 0-N: ROS column data relations (one per indexed column) + +Example: +For a VCI index on sales(customer_id, amount, date): + +Generated relations include: +vci_0000001234_00000_d → Column 0 data (customer_id) +vci_0000001234_00001_d → Column 1 data (amount) +vci_0000001234_00002_d → Column 2 data (date) +vci_0000001234_65535_d → TID relation +vci_0000001234_65534_d → NULL vector +vci_0000001234_65533_d → Delete vector +vci_0000001234_65531_m → TID-CRID mapping +vci_0000001234_65527_d → Data WOS +vci_0000001234_65526_d → Whiteout WOS The README states that it generates the above relations, but there are additional internal relations that are not mentioned. SELECT relname, relkind FROM pg_class WHERE relname ~ 'vci*' ORDER BY relname; relname | relkind ------------------------+--------- vci_0000016578_00000_d | m vci_0000016578_00000_m | m vci_0000016578_00001_d | m vci_0000016578_00001_m | m vci_0000016578_65526_d | m vci_0000016578_65527_d | m vci_0000016578_65530_0 | m vci_0000016578_65530_1 | m vci_0000016578_65531_d | m vci_0000016578_65531_m | m vci_0000016578_65533_d | m vci_0000016578_65533_m | m vci_0000016578_65534_d | m vci_0000016578_65534_m | m vci_0000016578_65535_d | m vci_0000016578_65535_m | m Based on the above, are the following materialized views unused, or is their use just undocumented? - vci_0000016578_00000_m - vci_0000016578_00001_m - vci_0000016491_65530_0 - vci_0000016578_65530_1 - vci_0000016578_65531_d - vci_0000016578_65534_m - vci_0000016578_65535_m What is the purpose of the '0' and '1' suffixes? 3. I've also found that the VCI index is not working. Is this the expected behavior? [local]:3209161 postgres=# \d+ t Table "public.t" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+---------+-----------+----------+---------+----------+-------------+--------------+------------- id | integer | | | | plain | | | info | text | | | | extended | | | Indexes: "t_id_idx" vci (id) Access method: heap [local]:3209161 postgres=# SET enable_seqscan TO off; SET [local]:3209161 postgres=# EXPLAIN SELECT * FROM t WHERE id = 1000; QUERY PLAN ----------------------------------------------------- Seq Scan on t (cost=0.00..2084.00 rows=1 width=37) Disabled: true Filter: (id = 1000) (3 rows) -- Regards, Japin Li
pgsql-hackers by date: