On Thu, May 23, 2019 at 01:08:42AM -0400, George Neuner wrote:
>On Tue, 21 May 2019 21:28:07 -0700, Lev Kokotov
><lev.kokotov@gmail.com> wrote:
>
>>Is it efficient to use Postgres as a column store by creating one
>table per
>>column?
>>
>>I would query it with something like `[...] UNION SELECT value AS <table>
>>FROM <table> WHERE value = <value> UNION [...]` to build a row.
>
>I think you mean JOIN.
>
>You'd need more than that: Postgresql uses MVCC for concurrency, so
>whenever you update any row in a table, the ordering of the rows
>within the table changes. And the JOIN operation inherently is
>unordered - you need to sort the result deliberately to control
>ordering.
>
>To emulate a column-store, at the very least you need a way to
>associate values from different "columns" that belong to the same
>"row" of the virtual table. IOW, every value in every "column" needs
>an explicit "row" identifier. E.g.,
>
> col1 = { rowid, value1 }, col2 = { rowid, value2 }, ...
>
>For performance you would need to have indexes on at least the rowid
>in each of the "column" tables.
>
>This is a bare minimum and can only work if the columns of your
>virtual table and the queries against it are application controlled or
>statically known. If you want to do something more flexible that will
>support ad hoc table modifications, elastically sized values (strings,
>bytes, arrays, JSON, XML), etc. this example is not suffice and the
>implementation can get very complicated very quickly
>
>
>Justin Pryzby was not joking when he said the performance could be
>awful ... at least as compared to a more normal row-oriented
>structure. Performance of a query that involves more than a handful
>of "columns", in general, will be horrible. It is up to you to decide
>whether some (maybe little) increase in performance in processing
>*single* columns will offset likely MASSIVE loss of performance in
>processing multiple columns.
>
Maybe take a look at this paper:
http://db.csail.mit.edu/projects/cstore/abadi-sigmod08.pdf
which essentially compares this approach to a "real" column store.
It certainly won't give you performance comparable to column store, it
adds quite a bit of overhead (disk space because of row headers, CPU
because of extra joins, etc.).
And it can't give you the column-store benefits - compression and/or
more efficient execution.
>
>>I'm thinking since Postgres stores tables in continuous blocks of
>16MB each
>>(I think that's the default page size?)
>
>Default page size is 8 KB. You'd have to recompile to change that,
>and it might break something - a whole lot of code depends on the
>knowing the size of storage pages.
>
>
Right. And the largest page size is 64kB. But 8kB is a pretty good
trade-off, in most cases.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services