Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 - Mailing list pgsql-hackers

From Alexandre Felipe
Subject Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Date
Msg-id CAE8JnxPY-x0hSAQe8cf=qo+BoFYZtD707y_zdQRH3M-EQeGSmA@mail.gmail.com
In response to Re: [WIP]Vertical Clustered Index (columnar store extension) - take2  (Álvaro Herrera <alvherre@kurilemu.de>)
Responses RE: [WIP]Vertical Clustered Index (columnar store extension) - take2
List pgsql-hackers
Thank you for your attention Peter, Japin, and everyone involved in this work.

Alvaro,
Please review and let us know what you think

I reiterate what Álvaro said
> good columnar store is a very important piece that
> Postgres is missing.  This project appears to be a good way to get that,
> so I'd be very happy if it gets done, so I estimate the reward to be
> high.

And taking from his comment

>> So, IMO the discussion we need to have is about setting a development
>> direction so that this overall project takes a form that can be 
>> accepted.

DESIGN PRINCIPLES

1. {,un}pluggable:
 - Available to those who ask for it, unnoticeable to everyone else.
 - Core changes are kept to a minimum.
 - Absolutely no references to the extension in the core.
2. useful: 
 - It should be able to scan a table faster than any other method
3. simple: 
 - It should be easy to get more people onboard.
4. limited as necessary:
 - This is not all or nothing; it is about getting something.

Principle 1 is for reviewers: it is much more comfortable to accept a patch like that.
Principle 2 is for users: it will make the extension desirable.
Principle 3 is for contributors: open source that everyone is afraid to touch doesn't take us very far.
Principle 4 is for the core developers: it will be rewarding to see progress.

DESIGN PHASES

I don't know the details involved in each of these phases; I am pretty new to Postgres and just landed on this thread. Hopefully everyone here agrees that this project can be split into smaller milestones. Still, I am risking a proposal: a plan with three clear phases.


Phase 1: PREPARATORY WORK
This phase should be finished and committed first; forever rebasing is a huge waste of time, and I am not trying to be original here [1].
- 3.3.1 Ad-hoc hooks: these should be addressed as part of the preparatory work; as long as they are not on a hot path, there should be little resistance to getting them accepted.
- 3.3.2 Embedded code: should these be turned into additional hooks?
- Can this interface be simplified, reducing the number of hooks?


Phase 2: CORE IMPLEMENTATION
Add support for the most useful use case (scans with aggregations, in my opinion), and try to get it right.
> there is no need to be super fast. There is the need to be remarkably faster than it is now [2]
I would say that we should focus on fixed-size data types first, e.g. dates, timestamps, and numeric types (maybe geolocation?). Indexes should be built manually; no automatic ROS/WOS at this stage. Users can request that the index be updated manually via REINDEX (or any other preferred method). Admins could set up nightly cron jobs to keep it reasonably up to date.
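As a sketch, the manual-maintenance workflow could look like the following. The USING vci syntax is an assumption modeled on how index access methods are normally exposed, not a confirmed interface of the patch; the index name matches the one in the plans below:

```sql
-- Hypothetical: build a VCI on fixed-size columns only
-- (exact CREATE INDEX syntax to be confirmed against the patch).
CREATE INDEX metadata_vci ON metadata USING vci (department_id, created);

-- Refresh explicitly; no automatic ROS/WOS conversion at this stage.
REINDEX INDEX metadata_vci;
```

A nightly cron job running the REINDEX through psql would be enough to keep the index reasonably fresh between manual refreshes.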
Current issues listed in the /contrib/vci/README that should be addressed
- 8.1.1 DROP EXTENSION vci: dropping the extension should not force anyone to restart the server, say, to recover from a memory leak.
- 8.2.4 Performance Testing: it will be important to have at least one example showcasing the performance gains at this point. This will be the selling point, showing that it is useful enough to ship with version 19.
 - I think this should consider sorting, or at least clustering the outputs per page (note at the end).


Phase 3: OPPORTUNISTIC ROS UPDATE
It should run with low priority and in small chunks (configurable, of course); the focus of Postgres continues to be transactions. Chunks should be small enough that they don't impact transaction latency. Could it be more performant with larger chunks? In that case we should consider an approach that can be easily interrupted and won't impact the available working memory. The extension should work with the default Postgres parameters: currently the max_worker_processes default is 8, while VCI requires at least 10.
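Until the worker requirement is reduced, the workaround is a configuration bump; the value below simply reflects VCI's stated minimum and would need tuning per workload:

```
# postgresql.conf -- VCI currently needs more background workers
# than the stock default of 8 (changing this requires a restart).
max_worker_processes = 10
```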


Phase X: OPTIMIZE AND GENERALIZE
This item is where we should list the things that we MUST NOT ATTEMPT to get done during the previous phases. They are going to be necessary, but they must be built on top of a foundation yet to be built. At this point we could consider things like adding support for variable-length types, arrays, JSON, text, inverted indices, extensibility for custom types, etc. Compressed JSON is very promising: if you give developers the ability to write arbitrary JSON, they will use a very nice, readable, low-entropy format. Instead of TOAST tables, external types could be stored with VCIs.

Looking forward to the refinements.





Note:
I had different expectations: the first query uses two columns and filters over a 1-year period, the second query uses only one column and filters over a single day.
My first expectation was that if only one column is used, less data is accessed.
In my experience most queries care about a timestamp column, so it would be nice to be able to cluster the data by certain columns (in my case, the created column).
Maybe even clustering by multiple columns independently, mediated by CRID permutations. If you think this is easy enough, we could insert it before phase 3.
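For comparison, heap tables can already be physically reordered by an index with the existing CLUSTER command; what I am asking for here is an analogous (and for now hypothetical) ordering of the column data inside VCI. The btree index name below is illustrative:

```sql
-- Existing Postgres behavior: rewrite the heap in index order.
CREATE INDEX metadata_created_idx ON metadata (created);
CLUSTER metadata USING metadata_created_idx;

-- After bulk loads, re-cluster using the remembered index:
CLUSTER metadata;
```

CLUSTER is a one-time rewrite rather than a maintained ordering, which is why an index-side equivalent (per column, or per CRID permutation) would be valuable.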


EXPLAIN (ANALYSE, VERBOSE)
SELECT department_id, date_trunc('month', created), count(1) FROM metadata
WHERE created BETWEEN '2025-01-01' AND '2026-01-01' GROUP BY 1, 2;

------

 Custom Scan (VCI HashAggregate) (actual time=993.170..1002.744 rows=18114.00 loops=1)
   Output: department_id, (date_trunc('month'::text, created)), count(*)
   Group Key: metadata.department_id, date_trunc('month'::text, metadata.created)
   Buffers: shared hit=599753 read=26782
   ->  Custom Scan (VCI Scan) using metadata_vci on public.metadata (rows=6509488)
         Output: department_id, date_trunc('month'::text, created)
         Filter: ...
         Rows Removed by Filter: 9030822
         Buffers: shared hit=599753 read=26782
 Planning:
   Buffers: shared read=4
 Planning Time: 1.847 ms
 Execution Time: 1083.087 ms




EXPLAIN (ANALYSE, VERBOSE)
SELECT department_id, count(1)
FROM metadata WHERE created BETWEEN '2025-01-01' AND '2025-01-02' GROUP BY 1;

-------

 Custom Scan (VCI GroupAggregate) (actual time=439.137..439.182 rows=107.00 loops=1)
   Output: department_id, count(*)
   Group Key: metadata.department_id
   Buffers: shared hit=599753 read=26782
   ->  Custom Scan (VCI Sort) (actual time=439.130..439.144 rows=570 loops=1)
         Output: department_id
         Sort Key: metadata.department_id
         Sort Method: quicksort  Memory: 38kB
         Buffers: shared hit=599753 read=26782
         ->  Custom Scan (VCI Scan) using metadata_vci on public.metadata (rows=570)
               Output: department_id
               Filter: ...
               Rows Removed by Filter: 15362038
               Buffers: shared hit=599753 read=26782
 Planning Time: 0.545 ms
 Execution Time: 446.081 ms


Kind Regards,
Alexandre Felipe


On Tue, Feb 10, 2026 at 12:02 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
Hello,

I think we need to have a discussion about this patch set.  In my
opinion, this is going nowhere in its current form.  It's just not
acceptable to have a full-blown executor implementation in contrib/ that
operates based on hooks in heapam and the transaction machinery.  This
design doesn't work for us and it can't be accepted.

Which is to say, these continued postings of rebased with minor tweaks
here and there, appear somewhat pointless from my point of view.

So, IMO the discussion we need to have is about setting a development
direction so that this overall project takes a form that can be
accepted.  I already proposed upthread and in pgconf.dev last year that
this should be implemented as a new table AM (in src/backend/access/),
and then add appropriate executor support in the main executor code.
If we disagree on this, let's discuss the reasons until we find a
consensus, and then focus on how to use all this code in a way that
works with that design.

Now, I imagine that morphing all this code to become a table AM is a
huge undertaking, so it's going to require buy-in from the employers of
whoever gets to work on it, with (of course) no promise that the project
is going to be successful in the end.  In that light, it sounds quite
risky.

Overall, I think a good columnar store is a very important piece that
Postgres is missing.  This project appears to be a good way to get that,
so I'd be very happy if it gets done, so I estimate the reward to be
high.

This is just my personal opinion -- other Postgres hackers likely have
different ones.  Also, I do not speak on behalf of my employer.

Thanks,

--
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Industry suffers from the managerial dogma that for the sake of stability
and continuity, the company should be independent of the competence of
individual employees."                                      (E. Dijkstra)


