Thread: Diagonal storage model
Hi hackers,

Vertical (columnar) storage is the best fit for analytics, which is why it is widely used in OLAP-oriented databases such as Vertica, HyPer, KDB, ...
In Postgres we have the cstore extension, which cannot provide all the benefits of the vertical model because the executor lacks support for vector operations.
The situation could change if we get a pluggable storage API with support for vectorized execution.

But the vertical model is not so good for updates and data loading (because data is mostly imported in horizontal format).
This is why most existing systems keep data in both formats (at least for some time).

I want to announce a new model, "diagonal storage", which combines the benefits of both approaches.
The idea is very simple: we first store column 1 of the first record, then column 2 of the second record, ... and so on until we reach the last column.
After that we store the second column of the first record, the third column of the second record, ...

Profiling of TPC-H queries shows that much of the query execution time (about 17%) is spent in heap_deform_tuple.
The new format will significantly reduce the time spent on heap deforming, because there is just one column of a particular record in each tile.
Moreover, we can deform many tuples in parallel, which is especially efficient on quantum computers.

Please find attached a patch with the first prototype implementation. It provides about a 3.14 times performance improvement on most TPC-H queries.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
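A minimal Python sketch of the layout described in the message above, for illustration only; it is not the attached patch. It assumes a square block of k records with k columns each, and it assumes that a diagonal wraps around to the first column after reaching the last one. The message specifies neither point. The names diagonal_order and from_diagonal_order are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def diagonal_order(rows: Sequence[Sequence[T]]) -> List[T]:
    """Flatten a square block of records along wrapped diagonals.

    Diagonal d holds column (i + d) mod k of record i, so each diagonal
    ("tile") contains exactly one column value from every record.
    """
    k = len(rows)
    if any(len(r) != k for r in rows):
        # Assumption of this sketch, not stated in the original message.
        raise ValueError("expected a square block: k records of k columns")
    out: List[T] = []
    for d in range(k):      # which diagonal (tile)
        for i in range(k):  # which record
            out.append(rows[i][(i + d) % k])
    return out


def from_diagonal_order(flat: Sequence[T], k: int) -> List[List[T]]:
    """Inverse of diagonal_order: rebuild the k x k block of records."""
    if len(flat) != k * k:
        raise ValueError("expected k * k values")
    rows: List[List[T]] = [[flat[0]] * k for _ in range(k)]
    pos = 0
    for d in range(k):
        for i in range(k):
            rows[i][(i + d) % k] = flat[pos]
            pos += 1
    return rows


block = [
    ["a1", "a2", "a3"],  # first record
    ["b1", "b2", "b3"],  # second record
    ["c1", "c2", "c3"],  # third record
]
flat = diagonal_order(block)
# flat == ["a1", "b2", "c3", "a2", "b3", "c1", "a3", "b1", "c2"]
restored = from_diagonal_order(flat, 3)
```

Under these assumptions the mapping is a plain permutation of the block's cells, so it can be inverted without any extra metadata beyond k.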
Hi, Konstantin!

Thank you for working on the new pluggable storage API.

The attached patch is 505 bytes and contains only a diff of explain.c. Is that right?

01.04.2018, 15:48, "Konstantin Knizhnik" <k.knizhnik@postgrespro.ru>:
> Hi hackers,
>
> Vertical (columnar) storage is the best fit for analytics, which is why it is widely used in OLAP-oriented databases such as Vertica, HyPer, KDB, ...
> In Postgres we have the cstore extension, which cannot provide all the benefits of the vertical model because the executor lacks support for vector operations.
> The situation could change if we get a pluggable storage API with support for vectorized execution.
>
> But the vertical model is not so good for updates and data loading (because data is mostly imported in horizontal format).
> This is why most existing systems keep data in both formats (at least for some time).
>
> I want to announce a new model, "diagonal storage", which combines the benefits of both approaches.
> The idea is very simple: we first store column 1 of the first record, then column 2 of the second record, ... and so on until we reach the last column.
> After that we store the second column of the first record, the third column of the second record, ...
>
> Profiling of TPC-H queries shows that much of the query execution time (about 17%) is spent in heap_deform_tuple.
> The new format will significantly reduce the time spent on heap deforming, because there is just one column of a particular record in each tile.
> Moreover, we can deform many tuples in parallel, which is especially efficient on quantum computers.
>
> Please find attached a patch with the first prototype implementation. It provides about a 3.14 times performance improvement on most TPC-H queries.
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

--
Best regards,
Dmitry Voronin
Great idea! Thank you, Konstantin.

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Hi!
On Sun, Apr 1, 2018 at 3:48 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> I want to announce new model, "diagonal storage" which combines benefits of both approaches.
> The idea is very simple: we first store column 1 of first record, then column 2 of second record, ... and so on until we reach the last column.
> After it we store second column of first record, third column of the second record,...
Sounds interesting. Could "diagonal storage" be applied twice? That is, could we apply
a diagonal transformation to the result of another diagonal transformation? I expect we
should get a "square diagonal" transformation...
> Attach please find patch with first prototype implementation. It provides about 3.14 times improvement of performance at most of TPC-H queries.
Great, but with a square diagonal transformation we should get a 3.14^2 times improvement,
which is even better!
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Sun, Apr 01, 2018 at 03:48:07PM +0300, Konstantin Knizhnik wrote:
> Hi hackers,
>
> Vertical (columnar) storage is the best fit for analytics, which is why it is widely used in OLAP-oriented databases such as Vertica, HyPer, KDB, ...
> In Postgres we have the cstore extension, which cannot provide all the benefits of the vertical model because the executor lacks support for vector operations.
> The situation could change if we get a pluggable storage API with support for vectorized execution.
>
> But the vertical model is not so good for updates and data loading (because data is mostly imported in horizontal format).
> This is why most existing systems keep data in both formats (at least for some time).
>
> I want to announce a new model, "diagonal storage", which combines the benefits of both approaches.
> The idea is very simple: we first store column 1 of the first record, then column 2 of the second record, ... and so on until we reach the last column.
> After that we store the second column of the first record, the third column of the second record, ...
>
> Profiling of TPC-H queries shows that much of the query execution time (about 17%) is spent in heap_deform_tuple.
> The new format will significantly reduce the time spent on heap deforming, because there is just one column of a particular record in each tile.
> Moreover, we can deform many tuples in parallel, which is especially efficient on quantum computers.
>
> Please find attached a patch with the first prototype implementation. It provides about a 3.14 times performance improvement on most TPC-H queries.

You're sure it's not 3.14159265358979323...?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Sun, Apr 1, 2018 at 3:48 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
> I want to announce new model, "diagonal storage" which combines benefits of both approaches.
> The idea is very simple: we first store column 1 of first record, then column 2 of second record, ... and so on until we reach the last column.
> After it we store second column of first record, third column of the second record,...
I'm a little worried about the fact that even with this model we're still limited to only two dimensions. That's bound to cause problems sooner or later.
.m
On Mon, Apr 2, 2018 at 3:49 AM, Marko Tiikkaja <marko@joh.to> wrote:
> On Sun, Apr 1, 2018 at 3:48 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>>
>> I want to announce new model, "diagonal storage" which combines benefits
>> of both approaches.
>> The idea is very simple: we first store column 1 of first record, then
>> column 2 of second record, ... and so on until we reach the last column.
>> After it we store second column of first record, third column of the
>> second record,...
>
>
> I'm a little worried about the fact that even with this model we're still
> limited to only two dimensions. That's bound to cause problems sooner or
> later.
>

How about a 3D storage model, whose first dimension gives horizontal
view, second provides vertical or columnar view and third one provides
diagonal view. It also provides capability to add extra dimensions to
provide additional views like double diagonal view. Alas! it all
collapses since I was late to the party.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
> On 2 Apr 2018, at 16:57, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> On Mon, Apr 2, 2018 at 3:49 AM, Marko Tiikkaja <marko@joh.to> wrote:
>>
>> I'm a little worried about the fact that even with this model we're still
>> limited to only two dimensions. That's bound to cause problems sooner or
>> later.
> How about a 3D storage model, whose first dimension gives horizontal
> view, second provides vertical or columnar view and third one provides
> diagonal view. It also provides capability to add extra dimensions to
> provide additional views like double diagonal view. Alas! it all
> collapses since I was late to the party.

BTW, an MDX expression actually produces a multidimensional result. MDX has COLUMNS, ROWS, PAGES, SECTIONS, CHAPTERS, and AXIS(N) for those who are not satisfied with named dimensions.

Best regards, Andrey Borodin.
On Sun, Apr 01, 2018 at 03:48:07PM +0300, Konstantin Knizhnik wrote:
> Attach please find patch with first prototype implementation. It
> provides about 3.14 times improvement of performance at most of TPC-H
> queries.

Congratulations on finding a patch able to improve all workloads of
Postgres in such a simple and magic way, especially on this particular
date. I would have used M_PI if I were you.
--
Michael