Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 - Mailing list pgsql-hackers
| From | Alexandre Felipe |
|---|---|
| Subject | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 |
| Date | |
| Msg-id | CAE8JnxPY-x0hSAQe8cf=qo+BoFYZtD707y_zdQRH3M-EQeGSmA@mail.gmail.com |
| In response to | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 (Álvaro Herrera <alvherre@kurilemu.de>) |
| Responses | RE: [WIP]Vertical Clustered Index (columnar store extension) - take2 |
| List | pgsql-hackers |
Álvaro,

I second what you said:
> so I'd be very happy if it gets done, so I estimate the reward to be
> high.
- 8.2.4 Performance Testing: it will be important to have at least one example that showcases the performance gains at this point. This will be the selling point, showing that the feature is worth shipping with version 19.
Maybe even clustering by multiple columns independently, mediated by CRID permutations. If you think this is easy enough, we could insert it before phase 3.
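For anyone wanting to reproduce this kind of test, the setup would look roughly like the following. This is only a sketch: the extension name and the `USING vci` index syntax are assumptions on my part, inferred from the index name `metadata_vci` that appears in the plans below.

```sql
-- Sketch only: extension name and index access method name are assumptions.
CREATE EXTENSION vci;

-- Columnar index covering the columns used by the aggregate queries below.
CREATE INDEX metadata_vci ON public.metadata USING vci (department_id, created);
```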
EXPLAIN (ANALYSE, VERBOSE)
SELECT department_id, date_trunc('month', created), count(1) FROM metadata
WHERE created BETWEEN '2025-01-01' AND '2026-01-01' group by 1, 2;
------
Custom Scan (VCI HashAggregate) (actual time=993.170..1002.744 rows=18114.00 loops=1)
Output: department_id, (date_trunc('month'::text, created)), count(*)
Group Key: metadata.department_id, date_trunc('month'::text, metadata.created)
Buffers: shared hit=599753 read=26782
-> Custom Scan (VCI Scan) using metadata_vci on public.metadata (rows=6509488)
Output: department_id, date_trunc('month'::text, created)
Filter: ...
Rows Removed by Filter: 9030822
Buffers: shared hit=599753 read=26782
Planning:
Buffers: shared read=4
Planning Time: 1.847 ms
Execution Time: 1083.087 ms
EXPLAIN (ANALYSE, VERBOSE)
SELECT department_id, count(1)
FROM metadata WHERE created BETWEEN '2025-01-01' AND '2025-01-02' group by 1;
-------
Custom Scan (VCI GroupAggregate) (actual time=439.137..439.182 rows=107.00 loops=1)
Output: department_id, count(*)
Group Key: metadata.department_id
Buffers: shared hit=599753 read=26782
-> Custom Scan (VCI Sort) (actual time=439.130..439.144 rows=570 loops=1)
Output: department_id
Sort Key: metadata.department_id
Sort Method: quicksort Memory: 38kB
Buffers: shared hit=599753 read=26782
-> Custom Scan (VCI Scan) using metadata_vci on public.metadata (rows=570)
Output: department_id
Filter: ...
Rows Removed by Filter: 15362038
Buffers: shared hit=599753 read=26782
Planning Time: 0.545 ms
Execution Time: 446.081 ms
Álvaro Herrera <alvherre@kurilemu.de> wrote:

> Hello,
>
> I think we need to have a discussion about this patch set. In my
> opinion, this is going nowhere in its current form. It's just not
> acceptable to have a full-blown executor implementation in contrib/ that
> operates based on hooks in heapam and the transaction machinery. This
> design doesn't work for us and it can't be accepted.
>
> Which is to say, these continued postings of rebased patches with minor
> tweaks here and there appear somewhat pointless from my point of view.
>
> So, IMO the discussion we need to have is about setting a development
> direction so that this overall project takes a form that can be
> accepted. I already proposed upthread and at pgconf.dev last year that
> this should be implemented as a new table AM (in src/backend/access/),
> with appropriate executor support then added in the main executor code.
> If we disagree on this, let's discuss the reasons until we find a
> consensus, and then focus on how to use all this code in a way that
> works with that design.
>
> Now, I imagine that morphing all this code to become a table AM is a
> huge undertaking, so it's going to require buy-in from the employers of
> whoever gets to work on it, with (of course) no promise that the project
> is going to be successful in the end. In that light, it sounds quite
> risky.
>
> Overall, I think a good columnar store is a very important piece that
> Postgres is missing. This project appears to be a good way to get that,
> so I'd be very happy if it gets done, so I estimate the reward to be
> high.
>
> This is just my personal opinion -- other Postgres hackers likely have
> different ones. Also, I do not speak on behalf of my employer.
>
> Thanks,
>
> --
> Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
> "Industry suffers from the managerial dogma that for the sake of stability
> and continuity, the company should be independent of the competence of
> individual employees." (E. Dijkstra)
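To make the table AM proposal concrete: with a table access method, rather than hooks, the columnar storage would be chosen per table through the standard `USING` clause of `CREATE TABLE`. The clause itself is ordinary PostgreSQL syntax; the AM name `vci_heap` below is purely hypothetical.

```sql
-- Hypothetical AM name; the USING clause is standard PostgreSQL syntax
-- for selecting a table access method at table-creation time.
CREATE TABLE metadata (
    department_id integer,
    created       timestamptz
) USING vci_heap;
```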