Re: In-Memory Columnar Store - Mailing list pgsql-hackers

From knizhnik
Subject Re: In-Memory Columnar Store
Date
Msg-id 52A88DFC.5020601@garret.ru
Whole thread Raw
In response to Re: In-Memory Columnar Store  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: In-Memory Columnar Store
List pgsql-hackers
Hello!

Implementation of IMCS itself took me about two months (with testing and 
writing documentation).
But huge part of the code was previously written by me for other 
projects, so I have reused them.
Most of the time I have spent in integration of this code with 
PostgreSQL (I was not so familiar with it before).

Certainly implementations of columnar store for Oracle (Oracle Database 
In-Memory Option), DB2 (BLU Acceleration), ... are more convenient for 
users: them can execute normal SQL queries and do not require users to 
learn new functions and approach. But it requires complete redesign of 
query engine (or providing alternative implementation). I was not able 
to do it.

This is why I try to provide advantages of vertical data representation 
(vector operation, parallel execution, data skipping) as well as 
advantages of fast access to in-memory data as standard PostgreSQL 
extension. There are obviously some limitations and queries look more 
complicated than in case of standard SQL...

But from the other side it is possible to write queries which are hardly 
to be expressed using standard SQL.
For example calculating split-adjusted prices can not be done in SQL 
without using stored procedures.
To make usage of IMCS functions as simple as possible I defined a larger 
number of various operators for most popular operations.
For example Volume-Weighted-Average-Price can be calculated just as:

select Volume//Close as VWAP from Quote_get();

It is even shore than analog SQL statement:

select sum(Close*Volume)/sum(Volume) as VWAP from Quote;


Concerning integration with PostgreSQL, there were several problems. 
Some of them seems to have no easy solution, but other are IMHO 
imperfections in PostgreSQL which I hope will be fixed sometime:

1. Calls in PL/pgSQL are very slow - about 1-2 micsroseconds at my 
computer. Just defining insertion per-row trigger with empty procedure 
increase time of insertion of 6 million records twice - from 7 till 15 
seconds. If trigger procedure is not empty, then time is increased 
proportionally number of performed calls.
In my case inserting data with propagation it in columnar store using 
trigger takes about 80 seconds. But if I first load data without 
triggers in PostgreSQL table and then
insert it in columnar store using load function (implemented in C), then 
time will be 7+9=16 seconds.

Certainly I realize that plpgsql is interpreted language. But for 
example also interpreted Python is able to do 100 times more calls per 
second.
Unfortunately profiler doesn;t show some bottleneck - looks like long 
calltime is caused by large overhead of initializing and resetting 
memory context and copying arguments data.

2. Inefficient implementation of expanding composite type columns using 
(foo()).* clause. In this case function foo() will be invoked as much 
times as there are fields in the returned composite type. Even in case 
of placing call in FROM list (thanks to lateral joins in 9.3), 
PostgreSQL still sometimes performs redundant calls which can be avoided 
using hack with adding "OFFSET 1" clause.

3. 256Gb limit for used shared memory segment size at Linux.

Concerning last problem - I have included in IMCS distributive much 
simpler patch which just set MAP_HUGETLB flags when
a) is it defined in system headers
b) requested memory size is larger than 256Gb

In this case right now PostgreSQL will just fail to start.
But certainly it is more correct to trigger this flag through 
configuration parameter, because large pages can minimize MMU overhead 
and so increase speed even if size of used memory is less than 256Gb 
(this is why Oracle is widely using it).



. Вызов функции занимает прядка 2 микросекунд. Т.е. если я напишу 
триггер с пустой процедурой, то вставка 6 миллионов объектов займёт 15 
секунд. Это при том, что без триггера вставка занимает всего 7 секунд...


On 12/11/2013 06:33 PM, Merlin Moncure wrote:
> On Mon, Dec 9, 2013 at 1:40 PM, knizhnik <knizhnik@garret.ru> wrote:
>> Hello!
>>
>> I want to annouce my implementation of In-Memory Columnar Store extension
>> for PostgreSQL:
>>
>>       Documentation: http://www.garret.ru/imcs/user_guide.html
>>       Sources: http://www.garret.ru/imcs-1.01.tar.gz
>>
>> Any feedbacks, bug reports and suggestions are welcome.
>>
>> Vertical representation of data is stored in PostgreSQL shared memory.
>> This is why it is important to be able to utilize all available physical
>> memory.
>> Now servers with Tb or more RAM are not something exotic, especially in
>> financial world.
>> But there is limitation in Linux with standard 4kb pages  for maximal size
>> of mapped memory segment: 256Gb.
>> It is possible to overcome this limitation either by creating multiple
>> segments - but it requires too much changes in PostgreSQL memory manager.
>> Or just set MAP_HUGETLB flag (assuming that huge pages were allocated in the
>> system).
>>
>> I found several messages related with MAP_HUGETLB flag, the most recent one
>> was from 21 of November:
>> http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org
>>
>> I wonder what is the current status of this patch?
> I looked over your extension.  I think it's a pretty amazing example
> of the postgres extension and type systems -- up there with postgis.
> Very well done.  How long did this take you to write?
>
> MAP_HUGETLB patch was marked 'returned with feedback'.
> https://commitfest.postgresql.org/action/patch_view?id=1308.  It seems
> likely to be revived, perhaps in time for 9.4.
>
> Honestly, I think your efforts here provide more argument for adding
> huge tbl support.
>
> merlin




pgsql-hackers by date:

Previous
From: "MauMau"
Date:
Subject: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?
Next
From: Tom Lane
Date:
Subject: Re: -d option for pg_isready is broken