Re: Use Postgres as a column store by creating one table per column - Mailing list pgsql-performance

From George Neuner
Subject Re: Use Postgres as a column store by creating one table per column
Date
Msg-id d871e346-bd77-1081-42b8-af31bc8ed6ed@comcast.net
In response to Use Postgres as a column store by creating one table per column  (Lev Kokotov <lev.kokotov@gmail.com>)
Responses Re: Use Postgres as a column store by creating one table per column
List pgsql-performance
On Tue, 21 May 2019 21:28:07 -0700, Lev Kokotov <lev.kokotov@gmail.com> 
wrote:

 >Is it efficient to use Postgres as a column store by creating one
 >table per column?
 >
 >I would query it with something like `[...] UNION SELECT value AS <table>
 >FROM <table> WHERE value = <value> UNION [...]` to build a row.

I think you mean JOIN.

You'd need more than that: PostgreSQL uses MVCC for concurrency, so 
whenever you update any row in a table, the physical ordering of rows 
within the table can change.  And the JOIN operation is inherently 
unordered - you need to sort the result deliberately to control ordering.

To emulate a column-store, at the very least you need a way to associate 
values from different "columns" that belong to the same "row" of the 
virtual table.  IOW, every value in every "column" needs an explicit 
"row" identifier.  E.g.,

    col1 = { rowid, value1 }, col2 = { rowid, value2 }, ...

For performance you would need to have indexes on at least the rowid in 
each of the "column" tables.
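As a purely illustrative sketch (the table and column names below are 
invented, not from the original post), that might look like:

    -- One table per "column", tied together by an explicit rowid.
    -- PRIMARY KEY gives you the index on rowid.
    CREATE TABLE col_name  (rowid bigint PRIMARY KEY, value text);
    CREATE TABLE col_price (rowid bigint PRIMARY KEY, value numeric);

    -- Reassembling a single virtual "row" already needs one JOIN per
    -- "column", plus an ORDER BY if you care about output order:
    SELECT n.rowid, n.value AS name, p.value AS price
      FROM col_name n
      JOIN col_price p USING (rowid)
     WHERE n.value = 'widget'
     ORDER BY n.rowid;

Every additional "column" you touch adds another join, which is where 
the cost piles up.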

This is a bare minimum and can only work if the columns of your virtual 
table and the queries against it are application controlled or 
statically known.  If you want to do something more flexible that will 
support ad hoc table modifications, elastically sized values (strings, 
bytes, arrays, JSON, XML), etc., this example will not suffice and the 
implementation can get very complicated very quickly.


Justin Pryzby was not joking when he said the performance could be awful 
... at least as compared to a more normal row-oriented structure.  
Performance of a query that involves more than a handful of "columns", 
in general, will be horrible.  It is up to you to decide whether some 
(maybe little) increase in performance in processing *single* columns 
will offset likely MASSIVE loss of performance in processing multiple 
columns.


 >I'm thinking since Postgres stores tables in continuous blocks of 16MB
 >each (I think that's the default page size?)

Default page size is 8 KB.  You'd have to recompile to change that, and 
it might break something - a whole lot of code depends on knowing the 
size of storage pages.


George



