Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 - Mailing list pgsql-hackers
From | Jim Nasby |
---|---|
Subject | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 |
Date | |
Msg-id | CAMFBP2qL5GmhJSneErnXj_JXfLoH+g4p6ijyO+vvWd+xtU0QWg@mail.gmail.com |
In response to | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 (Tomas Vondra <tomas@vondra.me>) |
Responses |
Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
|
List | pgsql-hackers |
On Wed, Jun 4, 2025 at 1:16 PM Tomas Vondra <tomas@vondra.me> wrote:
On 6/4/25 19:59, Jim Nasby wrote:
>
>
> On Fri, May 23, 2025 at 4:29 PM Tomas Vondra <tomas@vondra.me> wrote:
>
> Also, Alvaro seemed to think TAM is the way to go, and in order to keep
> the OLTP performance he suggested to use both heap and VCI at the same
> time, in different "forks". I'm not sure how would that work, or if we
> can already do that - AFAIK we can't, because ForkNumber does not allow
> adding custom forks. We'd have to relax that, or invent some sort of
> federated TAM (that just multiplexes it to two TAMs). Maybe.
>
> But it's not like the IAM approach doesn't need to do this. The first
> patch had to add stuff to a lot of random places to make this work. And
> some of the places touch stuff that we don't expect indexes to worry
> about, like ALTER TABLE, etc.
>
>
> I suspect another option would be to handle this with table inheritance:
> have one child that is heap-based, a second that's VCI, and a background
> job to move data from heap to VCI (and vice-versa for updates and maybe
> deletes).
>
> Note that you could actually implement all that in user-space.
> Personally I'd much rather have a way to do pure VCI / column-store
> sooner and manage it myself than have to wait another release (or more)
> to get a complete solution...
I don't see how could this ever work with the optimizer, which assumes
scanning an inheritance hierarchy means scanning all parts. But this
would require making planner "smarter" to know it should scan only one
of the child relations. And I believe it's not possible to do that while
constructing scans for the heap/VCI parts, those places are not aware of
what other parts are being scanned etc.
Right; I was envisioning that one child would be a conventional heap that stored very recent data and another child would be columnar in nature. So you'd definitely want to always look at both children.
I am making an assumption (based on the comment about multiple forks) that we'd have some way to handle VCI without having an actual heap.
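To make the two-child layout concrete, here is a toy Python model of it. This is only a sketch, not PostgreSQL code: the class and method names (HeapChild, ColumnarChild, Parent, move_to_columnar) are hypothetical, all rows are assumed to share one schema, and the "background job" is just a method you call by hand. It shows the one property that matters for the planner discussion: a scan of the parent always reads both children.

```python
from dataclasses import dataclass, field


@dataclass
class HeapChild:
    """Row-oriented child holding recent rows (stand-in for a heap table)."""
    rows: list = field(default_factory=list)

    def insert(self, row: dict) -> None:
        self.rows.append(row)

    def scan(self):
        yield from self.rows


@dataclass
class ColumnarChild:
    """Column-oriented child: one list of values per column.

    Assumes every row has the same set of columns.
    """
    columns: dict = field(default_factory=dict)
    nrows: int = 0

    def append_batch(self, rows: list) -> None:
        for row in rows:
            for name, value in row.items():
                self.columns.setdefault(name, []).append(value)
            self.nrows += 1

    def scan(self):
        names = list(self.columns)
        for i in range(self.nrows):
            yield {name: self.columns[name][i] for name in names}


@dataclass
class Parent:
    """Hypothetical parent with one heap child and one columnar child."""
    heap: HeapChild = field(default_factory=HeapChild)
    columnar: ColumnarChild = field(default_factory=ColumnarChild)

    def insert(self, row: dict) -> None:
        # New rows always land in the heap child first.
        self.heap.insert(row)

    def move_to_columnar(self) -> int:
        """Stand-in for the background job: drain the heap in one batch."""
        batch, self.heap.rows = self.heap.rows, []
        self.columnar.append_batch(batch)
        return len(batch)

    def scan(self):
        # Like an inheritance scan: every child is always read.
        yield from self.columnar.scan()
        yield from self.heap.scan()
```

Updates and deletes of rows that have already moved are deliberately left out; that is the part that would need extra code in a user-space version.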
Sure, you could do this in "user-space" by constructing queries that
reference either the heap or VCI part. But then why put that into
inheritance tree at all? It certainly does not help with moving data
between the parts.
Right; I only brought it up because just having a working column-store would be a big win, even if you had to code something to deal with any DML that wasn't already batched up. Of course it would be better if it just did the RightThing(TM) out of the box... but the perfect can be the enemy of the good.
What I can imagine is "VCI" as a "proxy" TAM on top of heap, keeping the
columnar format in a separate fork. And using either that from custom
scans, or the heap as a fallback for cases not supported by VCI.
Yeah, there'd definitely need to be some kind of proxy... I'm just suggesting that we don't *have* to do that as a separate fork...
Of course I could also just be missing something :)
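For reference, the "proxy with heap fallback" idea discussed above can be sketched in a few lines of Python. Again this is only a toy under stated assumptions, not how a TAM or a fork works: ProxyTable and its methods are hypothetical names, the columnar copy is kept in sync on every insert, and "not supported by VCI" is reduced to "the scan asks for a column the columnar copy does not hold".

```python
class ProxyTable:
    """Toy proxy: the heap is the source of truth, a columnar copy covers some columns."""

    def __init__(self, columnar_columns):
        self.heap = []  # list of row dicts
        self.columnar = {c: [] for c in columnar_columns}  # column name -> values

    def insert(self, row):
        self.heap.append(row)
        # Assumption: the columnar copy is maintained synchronously.
        for name in self.columnar:
            self.columnar[name].append(row.get(name))

    def scan(self, wanted):
        """Return (path, rows); path says which storage served the scan."""
        if wanted and all(c in self.columnar for c in wanted):
            cols = [self.columnar[c] for c in wanted]
            return "columnar", [dict(zip(wanted, vals)) for vals in zip(*cols)]
        # Fallback: anything the columnar copy cannot answer goes to the heap.
        return "heap", [{c: r.get(c) for c in wanted} for r in self.heap]
```

Whether the columnar copy lives in a separate fork or somewhere else only changes where self.columnar is stored; the choice between the two paths in scan() stays the same.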