Re: Columnar store as default for PostgreSQL 10? - Mailing list pgsql-general

From Alvaro Herrera
Subject Re: Columnar store as default for PostgreSQL 10?
Date
Msg-id 20160425142011.GA356690@alvherre.pgsql
In response to Columnar store as default for PostgreSQL 10?  (Bráulio Bhavamitra <brauliobo@gmail.com>)
Responses Re: Columnar store as default for PostgreSQL 10?  (Bráulio Bhavamitra <brauliobo@gmail.com>)
Re: Columnar store as default for PostgreSQL 10?  (Merlin Moncure <mmoncure@gmail.com>)
Re: Columnar store as default for PostgreSQL 10?  (Bruce Momjian <bruce@momjian.us>)
Re: Columnar store as default for PostgreSQL 10?  (Bráulio Bhavamitra <brauliobo@gmail.com>)
List pgsql-general
Bráulio Bhavamitra wrote:
> Hi all,
>
> I'm finally having performance issues with PostgreSQL when doing big
> analytics queries over almost the entire database of more than 100gb of
> data.
>
> And what I keep reading all over the web is many databases switching to
> columnar store (RedShift, Cassandra, cstore_fdw, etc) and having great
> performance on queries in general and giant boosts with big analytics
> queries.
>
> I wonder if there are any plans to move postgresql entirely to a columnar
> store (or at least make it an option), maybe for version 10?

This is a pretty interesting question.  I wrote an answer, then thought
it would make a good blog post, so it's at
http://blog.2ndquadrant.com/column-store-plans/
I reproduce it below.

Completely replacing the current row-based store wouldn't be a good
idea: it has served us extremely well and I’m pretty sure that replacing
it entirely with a columnar store would be disastrous performance-wise
for OLTP use cases.

That doesn't mean columnar stores are a bad idea in general -- because
they aren't. They just have a more limited use case than "the whole
database". For analytical queries on append-mostly data, a columnar
store is a much more appropriate representation than the regular
row-based store, but not all databases are analytical.
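
To make the difference concrete, here is a minimal sketch of the two
layouts in Python. The table, its column names and the helper functions
are all made up for illustration; this is not how PostgreSQL lays out
pages, only a toy model of why an analytical sum likes columns while a
single-row insert likes rows.

```python
from array import array

# Hypothetical three-column table: (order_id, customer_id, amount).
COLUMNS = ("order_id", "customer_id", "amount")
rows = [(1, 10, 25.0), (2, 11, 40.0), (3, 10, 15.5)]

# Row layout: all fields of one record are stored together.
row_store = list(rows)

# Column layout: each attribute is stored contiguously on its own.
column_store = {
    "order_id": array("q", (r[0] for r in rows)),
    "customer_id": array("q", (r[1] for r in rows)),
    "amount": array("d", (r[2] for r in rows)),
}

def total_amount_rows(store):
    # Walks every whole record even though only one field is needed.
    return sum(record[2] for record in store)

def total_amount_columns(store):
    # Reads only the "amount" column; the other columns are never touched.
    return sum(store["amount"])

def insert_into_rows(store, record):
    # One append places the whole record.
    store.append(record)

def insert_into_columns(store, record):
    # One append per attribute: a single logical insert becomes several writes.
    for name, value in zip(COLUMNS, record):
        store[name].append(value)
```

The aggregate over the column layout never looks at order_id or
customer_id, which is the kind of saving analytical queries benefit
from; the insert, on the other hand, has to touch every column, which
hints at why OLTP-style workloads tend to prefer rows.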

However, in order to attain interesting performance gains you need to do
a lot more than just change the underlying storage: you need to ensure
that the rest of the system can take advantage of the changed
representation, so that it can execute queries optimally; for instance,
you may want aggregates that operate in a SIMD mode rather than
one-value-at-a-time, as they do today. This, in itself, is a large
undertaking, and there are other challenges too.
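
As a rough illustration of the executor side, the sketch below contrasts
a one-value-at-a-time sum with one that consumes a batch of column
values per step. Python does not emit SIMD instructions, so this only
shows the shape of the interface; the function names and the batch size
are assumptions for the example, not anything taken from PostgreSQL or
from our patch.

```python
def sum_one_at_a_time(values):
    # Value-at-a-time style: one transition step per input value.
    state = 0.0
    steps = 0
    for v in values:
        state += v
        steps += 1
    return state, steps

def sum_in_batches(values, batch_size=4):
    # Batch style: each transition step receives a chunk of one column,
    # which is the form a SIMD-capable implementation could work on.
    state = 0.0
    steps = 0
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        state += sum(chunk)
        steps += 1
    return state, steps

amounts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
```

Both functions return the same total; the batched one simply takes far
fewer steps, and each step sees contiguous values of a single column,
which is what a columnar representation makes cheap to provide.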

As it turns out, there's a team at 2ndQuadrant working precisely on
these matters. We posted a patch last year, but it wasn't terribly
interesting -- it only made a single-digit percentage improvement in
TPC-H scores; not enough to bother the development community with (it
was a fairly invasive patch). We want more than that.

In our design, columnar or not is going to be an option: you're going to
be able to say "Dear server, for this table kindly set up columnar
storage for me, would you? Thank you very much." And then you’re going
to get a table which may be slower for regular usage but which will rock
for analytics. For most of your tables the current row-based store will
still likely be the best option, because row-based storage is much
better suited to the more general cases.

We don’t have a timescale yet. Stay tuned.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

