Re: Zedstore - compressed in-core columnar storage - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Zedstore - compressed in-core columnar storage
Date
Msg-id 20190415182209.itgnb3miuket3xsh@alap3.anarazel.de
Whole thread Raw
In response to Re: Zedstore - compressed in-core columnar storage  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2019-04-15 11:10:38 -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >> I think having a colstore in core is important not just for adoption,
> >> but also for testing and development of the executor / planner bits.
> 
> > Agreed.
> 
> TBH, I thought the reason we were expending so much effort on a tableam
> API was exactly so we *wouldn't* have to include such stuff in core.

I think it's mostly orthogonal. We need something like tableam to have
multiple types of storage options for tables - independent of whether
they are in core. And sure, we could have maybe reduced the effort a bit
here and there by e.g. not allowing AMs to be dynamlically loaded, or
writing fewer comments or such.


> There is a finite limit to how much stuff we can maintain as part of core.
> We should embrace the notion that Postgres is an extensible system, rather
> than build all the tooling for extension and then proceed to dump stuff
> into core anyway.

I totally agree that that's something we should continue to focus
on. I personally think we *already* *have* embraced that - pretty
heavily so.  And sometimes to the detriment of our users.

I think there's a pretty good case for e.g. *one* column store
in-core. For one there is a huge portion of existing postgres workloads
that benefit from them (often not for all tables, but some). Relatedly,
it's also one of the more frequent reasons why users can't migrate to
postgres / have to migrate off.  And from a different angle, there's
plenty planner and executor work to be done to make column stores fast -
and that can't really be done nicely outside of core; and doing the
improvements in core without a user there is both harder, less likely to
be accepted, and more likely to regress.


> >> If we have multiple candidates with sufficient code quality, then we may
> >> consider including both.
> 
> Dear god, no.

Yea, I don't see much point in that. Unless there's a pretty fundamental
reason why one columnar AM can't fullfill two different workloads
(e.g. by having options that define how things are laid out / compressed
/ whatnot), I think that'd be a *terrible* idea. By that logic we'd just
get a lot of AMs with a few differences in some workloads, and our users
would be unable to choose one, and all of them would suck.  I think one
such fundamental difference is e.g. the visibility management for an
in-line mvcc approach like heap, and an undo-based mvcc row-store (like
zheap) - it's very hard to imagine meaningful code savings by having
those combined into one AM. I'm sure we can find similar large
architectural issues for some types of columnar AMs - but I'm far far
from convinced that there's enough distinctive need for two different
approaches in postgres.  Without having heap historically and the desire
for on-disk compat, I can't quite see being convinced that we should
e.g. add a store like heap if we already had zheap.

I think it's perfectly reasonable to have in-core AMs try to optimize
~80% for a lot of different [sets of] workloads, even if a very
specialized AM could be optimized for it much further.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Zedstore - compressed in-core columnar storage
Next
From: Andres Freund
Date:
Subject: Re: Zedstore - compressed in-core columnar storage