Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 - Mailing list pgsql-hackers

From Alexandre Felipe
Subject Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Date
Msg-id CAE8JnxPY-x0hSAQe8cf=qo+BoFYZtD707y_zdQRH3M-EQeGSmA@mail.gmail.com
In response to Re: [WIP]Vertical Clustered Index (columnar store extension) - take2  (Álvaro Herrera <alvherre@kurilemu.de>)
Responses RE: [WIP]Vertical Clustered Index (columnar store extension) - take2
List pgsql-hackers
Thank you for your attention Peter, Japin, and everyone involved in this work.

Alvaro,
Please review and let us know what you think

I reiterate what Álvaro said
> good columnar store is a very important piece that
> Postgres is missing.  This project appears to be a good way to get that,
> so I'd be very happy if it gets done, so I estimate the reward to be
> high.

And taking from his comment

>> So, IMO the discussion we need to have is about setting a development
>> direction so that this overall project takes a form that can be 
>> accepted.

DESIGN PRINCIPLES

1. {,un}pluggable:
 - Available to those who ask for it, unnoticeable to everyone else.
 - Core changes are kept to a minimum.
 - Absolutely no references to the extension in the core.
2. useful: 
 - It should be able to scan a table faster than any other method
3. simple: 
 - It should be easy to get more people onboard.
4. limited as necessary:
 - This is not all or nothing; it is about getting something.

Principle 1 is for reviewers: it is much more comfortable to accept a patch like that.
Principle 2 is for users: it will make the extension desirable.
Principle 3 is for contributors: open source that everyone is afraid to touch doesn't take us very far.
Principle 4 is for the core developers: it will be rewarding to see progress.

DESIGN PHASES

I don't know the details involved in each of these phases; I am pretty new to Postgres and just landed on this thread. Hopefully everyone here agrees that this project can be split into smaller milestones. Still, I am risking a proposal: a plan with three clear phases.


Phase 1: PREPARATORY WORK
This phase should be finished and committed first; forever rebasing is a huge waste of time, and I am not trying to be original here [1].
- 3.3.1 Ad-hoc hooks: these should be addressed as part of the preparatory work; as long as they are not on a hot path, there should be little resistance to getting them accepted.
- 3.3.2 Embedded code: should these be turned into additional hooks?
- Can this interface be simplified, reducing the number of hooks?


Phase 2: CORE IMPLEMENTATION
Add support for the most useful use case (scans with aggregations, in my opinion), and try to get it right.
> there is no need to be super fast. There is the need to be remarkably faster than it is now [2]
I would say that we should focus on fixed-size data types first, e.g. dates, timestamps, and numeric types (maybe geolocation?). Indexes should be built manually; no automatic ROS/WOS at this stage. Users can request that the index be updated manually via REINDEX (or any other preferred method). Admins could set up nightly cron jobs to keep it reasonably up to date.
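As a sketch, the manual-maintenance workflow could look like the following. The USING vci syntax is an assumption modeled on how index access methods are normally exposed, not a confirmed interface of the patch; the index name matches the one in the plans below:

```sql
-- Hypothetical: build a VCI on fixed-size columns only
-- (exact CREATE INDEX syntax to be confirmed against the patch).
CREATE INDEX metadata_vci ON metadata USING vci (department_id, created);

-- Refresh explicitly; no automatic ROS/WOS conversion at this stage.
REINDEX INDEX metadata_vci;
```

A nightly cron job running the REINDEX through psql would be enough to keep the index reasonably fresh between manual refreshes.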
Current issues listed in the /contrib/vci/README that should be addressed
- 8.1.1 DROP EXTENSION vci: dropping the extension should not force anyone to restart the server, say, to recover from a memory leak.
- 8.2.4 Performance Testing: it will be important to have at least one example showcasing the performance gains at this point. This will be the selling point, showing that it is useful enough to ship with version 19.
 - I think this should consider sorting, or at least clustering the outputs per page (note at the end).


Phase 3: OPPORTUNISTIC ROS UPDATE
It should run with low priority and in small chunks (configurable, of course); the focus of Postgres continues to be transactions. Chunks should be small enough that they don't impact transaction latency. Could it be more performant with larger chunks? In that case we should consider an approach that can be easily interrupted and won't impact the available working memory. The extension should work with the default Postgres parameters: currently the max_worker_processes default is 8, while VCI requires at least 10.
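Until the worker requirement is reduced, the workaround is a configuration bump; the value below simply reflects VCI's stated minimum and would need tuning per workload:

```
# postgresql.conf -- VCI currently needs more background workers
# than the stock default of 8 (changing this requires a restart).
max_worker_processes = 10
```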


Phase X: OPTIMIZE AND GENERALIZE
This item is where we should list the things that we MUST NOT ATTEMPT to get done during the previous phases. They are going to be necessary, but they must be built on top of a foundation yet to be built. At this point we could consider things like adding support for variable-length types, arrays, JSON, text, inverted indices, extensibility for custom types, etc. Compressed JSON is very promising: if you give developers the ability to write arbitrary JSON, they will use a very nice, readable, low-entropy format. Instead of TOAST tables, external types could be stored with VCIs.

Looking forward to the refinements.





Note:
I had different expectations: the first query uses two columns and filters over a 1-year period, the second query uses only one column and filters over a single day.
My first expectation was that if only one column is used, less data is accessed.
In my experience most queries care about a timestamp column, so it would be nice to be able to cluster the data by certain columns (in my case, the created column).
Maybe even clustering by multiple columns independently, mediated by CRID permutations. If you think this is easy enough, we could insert it before phase 3.
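For comparison, heap tables can already be physically reordered by an index with the existing CLUSTER command; what I am asking for here is an analogous (and for now hypothetical) ordering of the column data inside VCI. The btree index name below is illustrative:

```sql
-- Existing Postgres behavior: rewrite the heap in index order.
CREATE INDEX metadata_created_idx ON metadata (created);
CLUSTER metadata USING metadata_created_idx;

-- After bulk loads, re-cluster using the remembered index:
CLUSTER metadata;
```

CLUSTER is a one-time rewrite rather than a maintained ordering, which is why an index-side equivalent (per column, or per CRID permutation) would be valuable.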


EXPLAIN (ANALYSE, VERBOSE)
SELECT department_id, date_trunc('month', created), count(1) FROM metadata
WHERE created BETWEEN '2025-01-01' AND '2026-01-01' GROUP BY 1, 2;

------

 Custom Scan (VCI HashAggregate) (actual time=993.170..1002.744 rows=18114.00 loops=1)
   Output: department_id, (date_trunc('month'::text, created)), count(*)
   Group Key: metadata.department_id, date_trunc('month'::text, metadata.created)
   Buffers: shared hit=599753 read=26782
   ->  Custom Scan (VCI Scan) using metadata_vci on public.metadata (rows=6509488)
         Output: department_id, date_trunc('month'::text, created)
         Filter: ...
         Rows Removed by Filter: 9030822
         Buffers: shared hit=599753 read=26782
 Planning:
   Buffers: shared read=4
 Planning Time: 1.847 ms
 Execution Time: 1083.087 ms




EXPLAIN (ANALYSE, VERBOSE)
SELECT department_id, count(1)
FROM metadata WHERE created BETWEEN '2025-01-01' AND '2025-01-02' GROUP BY 1;

-------

 Custom Scan (VCI GroupAggregate) (actual time=439.137..439.182 rows=107.00 loops=1)
   Output: department_id, count(*)
   Group Key: metadata.department_id
   Buffers: shared hit=599753 read=26782
   ->  Custom Scan (VCI Sort) (actual time=439.130..439.144 rows=570 loops=1)
         Output: department_id
         Sort Key: metadata.department_id
         Sort Method: quicksort  Memory: 38kB
         Buffers: shared hit=599753 read=26782
         ->  Custom Scan (VCI Scan) using metadata_vci on public.metadata (rows=570)
               Output: department_id
               Filter: ...
               Rows Removed by Filter: 15362038
               Buffers: shared hit=599753 read=26782
 Planning Time: 0.545 ms
 Execution Time: 446.081 ms


Kind Regards,
Alexandre Felipe


On Tue, Feb 10, 2026 at 12:02 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
Hello,

I think we need to have a discussion about this patch set.  In my
opinion, this is going nowhere in its current form.  It's just not
acceptable to have a full-blown executor implementation in contrib/ that
operates based on hooks in heapam and the transaction machinery.  This
design doesn't work for us and it can't be accepted.

Which is to say, these continued postings of rebased with minor tweaks
here and there, appear somewhat pointless from my point of view.

So, IMO the discussion we need to have is about setting a development
direction so that this overall project takes a form that can be
accepted.  I already proposed upthread and in pgconf.dev last year that
this should be implemented as a new table AM (in src/backend/access/),
and then add appropriate executor support in the main executor code.
If we disagree on this, let's discuss the reasons until we find a
consensus, and then focus on how to use all this code in a way that
works with that design.

Now, I imagine that morphing all this code to become a table AM is a
huge undertaking, so it's going to require buy-in from the employers of
whoever gets to work on it, with (of course) no promise that the project
is going to be successful in the end.  In that light, it sounds quite
risky.

Overall, I think a good columnar store is a very important piece that
Postgres is missing.  This project appears to be a good way to get that,
so I'd be very happy if it gets done, so I estimate the reward to be
high.

This is just my personal opinion -- other Postgres hackers likely have
different ones.  Also, I do not speak on behalf of my employer.

Thanks,

--
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Industry suffers from the managerial dogma that for the sake of stability
and continuity, the company should be independent of the competence of
individual employees."                                      (E. Dijkstra)


