Hi hackers,
Vertical (columnar) storage is the best fit for analytic workloads, which is why it is widely used in OLAP-oriented
databases such as Vertica, HyPer, KDB, ...
In Postgres we have the cstore extension, which cannot provide all the benefits of the vertical model because the
executor lacks support for vector operations.
The situation could change if we had a pluggable storage API with support for vectorized execution.
But the vertical model is not so good for updates and data loading (because data is mostly imported in horizontal format).
This is why most existing systems keep data in both formats (at least for some time).
I want to announce a new model, "diagonal storage", which combines the benefits of both approaches.
The idea is very simple: we first store column 1 of the first record, then column 2 of the second record, ... and so on
until we reach the last column.
After that we store the second column of the first record, the third column of the second record, ...
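To make the ordering concrete, here is a minimal sketch in Python (not taken from the patch). It assumes a square block of n records with n columns each, and it assumes that a diagonal wraps around to the first column once it passes the last one; the description above does not spell out either point. The names to_diagonal and from_diagonal are hypothetical.

```python
def to_diagonal(rows):
    """Flatten a square block of rows diagonal by diagonal.

    Assumption: the block has n records of n columns each, and diagonal k
    holds column (i + k) mod n of record i (wrap-around is assumed).
    """
    n = len(rows)
    if any(len(row) != n for row in rows):
        raise ValueError("expected a square block: n records of n columns")
    # Diagonal 0 is column 1 of record 1, column 2 of record 2, ...
    # Diagonal 1 is column 2 of record 1, column 3 of record 2, ...
    return [rows[i][(i + k) % n] for k in range(n) for i in range(n)]


def from_diagonal(flat, n):
    """Rebuild the n x n block of rows from its diagonal layout."""
    if len(flat) != n * n:
        raise ValueError("expected exactly n * n values")
    rows = [[None] * n for _ in range(n)]
    for pos, value in enumerate(flat):
        k, i = divmod(pos, n)  # k = diagonal number, i = record number
        rows[i][(i + k) % n] = value
    return rows


block = [
    ["a1", "a2", "a3"],
    ["b1", "b2", "b3"],
    ["c1", "c2", "c3"],
]
flat = to_diagonal(block)
```

For this 3x3 block, flat comes out as a1, b2, c3, a2, b3, c1, a3, b1, c2, and from_diagonal(flat, 3) returns the original rows.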
Profiling of TPC-H queries shows that the largest share of query execution time (about 17%) is spent in heap_deform_tuple.
The new format will significantly reduce the time spent on heap deforming, because each tile holds just one column of
any particular record.
Moreover, we can deform many tuples in parallel, which is especially efficient on quantum
computers.
Please find attached a patch with the first prototype implementation. It provides about a 3.14 times performance
improvement on most TPC-H queries.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company