Re: Zedstore - compressed in-core columnar storage - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Zedstore - compressed in-core columnar storage |
Date | |
Msg-id | 20190415182209.itgnb3miuket3xsh@alap3.anarazel.de |
In response to | Re: Zedstore - compressed in-core columnar storage (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Hi,

On 2019-04-15 11:10:38 -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >> I think having a colstore in core is important not just for adoption,
> >> but also for testing and development of the executor / planner bits.
>
> > Agreed.
>
> TBH, I thought the reason we were expending so much effort on a tableam
> API was exactly so we *wouldn't* have to include such stuff in core.

I think it's mostly orthogonal. We need something like tableam to have multiple types of storage options for tables - independent of whether they are in core. And sure, we could have maybe reduced the effort a bit here and there by e.g. not allowing AMs to be dynamically loaded, or writing fewer comments or such.

> There is a finite limit to how much stuff we can maintain as part of core.
> We should embrace the notion that Postgres is an extensible system, rather
> than build all the tooling for extension and then proceed to dump stuff
> into core anyway.

I totally agree that that's something we should continue to focus on. I personally think we *already* *have* embraced that - pretty heavily so. And sometimes to the detriment of our users.

I think there's a pretty good case for e.g. *one* column store in-core. For one, there is a huge portion of existing postgres workloads that benefit from them (often not for all tables, but some). Relatedly, it's also one of the more frequent reasons why users can't migrate to postgres / have to migrate off. And from a different angle, there's plenty of planner and executor work to be done to make column stores fast - and that can't really be done nicely outside of core; and doing the improvements in core without a user there is harder, less likely to be accepted, and more likely to regress.

> >> If we have multiple candidates with sufficient code quality, then we may
> >> consider including both.
>
> Dear god, no.

Yea, I don't see much point in that.
Unless there's a pretty fundamental reason why one columnar AM can't fulfill two different workloads (e.g. by having options that define how things are laid out / compressed / whatnot), I think that'd be a *terrible* idea. By that logic we'd just get a lot of AMs with a few differences in some workloads, and our users would be unable to choose one, and all of them would suck.

I think one such fundamental difference is e.g. the visibility management for an in-line mvcc approach like heap, and an undo-based mvcc row-store (like zheap) - it's very hard to imagine meaningful code savings by having those combined into one AM. I'm sure we can find similar large architectural issues for some types of columnar AMs - but I'm far, far from convinced that there's enough distinctive need for two different approaches in postgres.

Without having heap historically and the desire for on-disk compat, I can't quite see being convinced that we should e.g. add a store like heap if we already had zheap.

I think it's perfectly reasonable to have in-core AMs try to optimize ~80% for a lot of different [sets of] workloads, even if a very specialized AM could be optimized for it much further.

Greetings,

Andres Freund