Thread: In-Memory Columnar Store

In-Memory Columnar Store

From
knizhnik
Date:
Hello!

I want to announce my implementation of an In-Memory Columnar Store
extension for PostgreSQL:

     Documentation: http://www.garret.ru/imcs/user_guide.html
     Sources: http://www.garret.ru/imcs-1.01.tar.gz

Any feedback, bug reports and suggestions are welcome.

The vertical representation of the data is stored in PostgreSQL shared memory.
This is why it is important to be able to utilize all available physical
memory: servers with a terabyte or more of RAM are no longer exotic,
especially in the financial world. But Linux with standard 4kb pages limits
the maximal size of a mapped memory segment to 256Gb. It is possible to
overcome this limitation either by creating multiple segments - but that
requires too many changes in the PostgreSQL memory manager - or by simply
setting the MAP_HUGETLB flag (assuming that huge pages have been allocated
in the system).

I found several messages related to the MAP_HUGETLB flag; the most recent
one was from the 21st of November:
http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org

I wonder what the current status of this patch is.







Re: In-Memory Columnar Store

From
Merlin Moncure
Date:
On Mon, Dec 9, 2013 at 1:40 PM, knizhnik <knizhnik@garret.ru> wrote:
> Hello!
>
> I want to announce my implementation of an In-Memory Columnar Store
> extension for PostgreSQL:
>
>      Documentation: http://www.garret.ru/imcs/user_guide.html
>      Sources: http://www.garret.ru/imcs-1.01.tar.gz
>
> Any feedback, bug reports and suggestions are welcome.
>
> The vertical representation of the data is stored in PostgreSQL shared memory.
> This is why it is important to be able to utilize all available physical
> memory: servers with a terabyte or more of RAM are no longer exotic,
> especially in the financial world. But Linux with standard 4kb pages limits
> the maximal size of a mapped memory segment to 256Gb. It is possible to
> overcome this limitation either by creating multiple segments - but that
> requires too many changes in the PostgreSQL memory manager - or by simply
> setting the MAP_HUGETLB flag (assuming that huge pages have been allocated
> in the system).
>
> I found several messages related to the MAP_HUGETLB flag; the most recent
> one was from the 21st of November:
> http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org
>
> I wonder what the current status of this patch is.

I looked over your extension.  I think it's a pretty amazing example
of the postgres extension and type systems -- up there with postgis.
Very well done.  How long did this take you to write?

The MAP_HUGETLB patch was marked 'returned with feedback':
https://commitfest.postgresql.org/action/patch_view?id=1308.  It seems
likely to be revived, perhaps in time for 9.4.

Honestly, I think your efforts here provide one more argument for adding
huge TLB support.

merlin



Re: In-Memory Columnar Store

From
"ktm@rice.edu"
Date:
On Mon, Dec 09, 2013 at 11:40:41PM +0400, knizhnik wrote:
> Hello!
>
> I want to announce my implementation of an In-Memory Columnar Store
> extension for PostgreSQL:
>
>      Documentation: http://www.garret.ru/imcs/user_guide.html
>      Sources: http://www.garret.ru/imcs-1.01.tar.gz
>
> Any feedback, bug reports and suggestions are welcome.
>
> The vertical representation of the data is stored in PostgreSQL shared memory.
> This is why it is important to be able to utilize all available
> physical memory.

Hi,

This is very neat! The question I have, which applies to the matview
support as well, is "How can we transparently substitute usage of the
in-memory columnar store/matview in a SQL query?".

Regards,
Ken



Re: In-Memory Columnar Store

From
knizhnik
Date:
Hello!

The implementation of IMCS itself took me about two months (including
testing and writing documentation).
But a huge part of the code had been written by me previously for other
projects, so I was able to reuse it.
Most of the time was spent on integrating this code with
PostgreSQL (I was not very familiar with it before).

Certainly the columnar store implementations for Oracle (Oracle Database
In-Memory Option), DB2 (BLU Acceleration), ... are more convenient for
users: they can execute normal SQL queries and do not require users to
learn new functions and a new approach. But that requires a complete
redesign of the query engine (or providing an alternative implementation).
I was not able to do that.

This is why I try to provide the advantages of vertical data representation
(vector operations, parallel execution, data skipping) as well as the
advantages of fast access to in-memory data in the form of a standard
PostgreSQL extension. There are obviously some limitations, and the queries
look more complicated than their standard SQL counterparts...

But on the other hand, it is possible to write queries which can hardly
be expressed in standard SQL.
For example, calculating split-adjusted prices cannot be done in SQL
without using stored procedures.
To make the usage of IMCS functions as simple as possible, I defined a
large number of operators for the most popular operations.
For example, the Volume-Weighted Average Price can be calculated just as:

select Volume//Close as VWAP from Quote_get();

It is even shorter than the analogous SQL statement:

select sum(Close*Volume)/sum(Volume) as VWAP from Quote;


Concerning the integration with PostgreSQL, there were several problems.
Some of them seem to have no easy solution, but others are IMHO
imperfections in PostgreSQL which I hope will be fixed some day:

1. Calls in PL/pgSQL are very slow - about 1-2 microseconds per call on my
computer. Just defining a per-row insert trigger with an empty procedure
doubles the time needed to insert 6 million records - from 7 to 15
seconds. If the trigger procedure is not empty, then the time increases
proportionally to the number of performed calls.
In my case, inserting the data and propagating it to the columnar store
using a trigger takes about 80 seconds. But if I first load the data into a
PostgreSQL table without triggers and then insert it into the columnar
store using a load function (implemented in C), then the total time is
7+9=16 seconds.
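
(For reference, the empty trigger in this measurement was essentially the
following - a sketch, with illustrative table and function names:)

-- does nothing except return the row, yet it doubles the load time
create function empty_trigger() returns trigger as $$
begin
    return new;
end $$ language plpgsql;

create trigger quote_insert_trigger before insert on Quote
    for each row execute procedure empty_trigger();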

Certainly I realize that PL/pgSQL is an interpreted language. But
interpreted Python, for example, is able to do 100 times more calls per
second.
Unfortunately the profiler doesn't show any single bottleneck - it looks
like the long call time is caused by the large overhead of initializing and
resetting the memory context and copying argument data.

2. Inefficient implementation of expanding composite type columns using the
(foo()).* clause. In this case the function foo() will be invoked as many
times as there are fields in the returned composite type. Even when the
call is placed in the FROM list (thanks to lateral joins in 9.3),
PostgreSQL still sometimes performs redundant calls, which can be avoided
using a hack with an added "OFFSET 1" clause.
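
(A minimal illustration of the problem, using a hypothetical
composite-returning function rather than a real IMCS one:)

create type pair as (a int, b int);

create function make_pair(x int) returns pair as $$
begin
    raise notice 'make_pair called';   -- shows how many times we run
    return row(x, x + 1)::pair;
end $$ language plpgsql;

-- the notice is printed twice: the .* expansion calls the function per field
select (make_pair(1)).*;

-- 9.3: place the call in the FROM list; the function is invoked only once
select p.* from make_pair(1) p;

-- pre-9.3 workaround: an optimization fence (commonly written as OFFSET 0)
select (p).* from (select make_pair(1) as p offset 0) s;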

3. The 256Gb limit on the shared memory segment size on Linux.

Concerning the last problem - I have included in the IMCS distribution a
much simpler patch which just sets the MAP_HUGETLB flag when
a) it is defined in the system headers, and
b) the requested memory size is larger than 256Gb.

Right now PostgreSQL will simply fail to start in this case.
But it would certainly be more correct to control this flag through a
configuration parameter, because large pages can minimize MMU overhead and
so increase speed even if the size of the used memory is less than 256Gb
(this is why Oracle uses them widely).





On 12/11/2013 06:33 PM, Merlin Moncure wrote:
> On Mon, Dec 9, 2013 at 1:40 PM, knizhnik <knizhnik@garret.ru> wrote:
>> Hello!
>>
>> I want to announce my implementation of an In-Memory Columnar Store
>> extension for PostgreSQL:
>>
>>       Documentation: http://www.garret.ru/imcs/user_guide.html
>>       Sources: http://www.garret.ru/imcs-1.01.tar.gz
>>
>> Any feedback, bug reports and suggestions are welcome.
>>
>> The vertical representation of the data is stored in PostgreSQL shared memory.
>> This is why it is important to be able to utilize all available physical
>> memory: servers with a terabyte or more of RAM are no longer exotic,
>> especially in the financial world. But Linux with standard 4kb pages limits
>> the maximal size of a mapped memory segment to 256Gb. It is possible to
>> overcome this limitation either by creating multiple segments - but that
>> requires too many changes in the PostgreSQL memory manager - or by simply
>> setting the MAP_HUGETLB flag (assuming that huge pages have been allocated
>> in the system).
>>
>> I found several messages related to the MAP_HUGETLB flag; the most recent
>> one was from the 21st of November:
>> http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org
>>
>> I wonder what the current status of this patch is.
> I looked over your extension.  I think it's a pretty amazing example
> of the postgres extension and type systems -- up there with postgis.
> Very well done.  How long did this take you to write?
>
> The MAP_HUGETLB patch was marked 'returned with feedback':
> https://commitfest.postgresql.org/action/patch_view?id=1308.  It seems
> likely to be revived, perhaps in time for 9.4.
>
> Honestly, I think your efforts here provide one more argument for adding
> huge TLB support.
>
> merlin




Re: In-Memory Columnar Store

From
knizhnik
Date:
Hi,

It depends on what you mean by "transparently substitute".
If you want to be able to execute standard SQL queries against the columnar
store, then that seems to be impossible without rewriting the executor.
I provide another approach, based on calling standard functions which
perform manipulations not on scalar types but on timeseries.

For example, instead of the standard SQL

select sum(ClosePrice) from Quote;

I will have to write:

select cs_sum(ClosePrice) from Quote_get();

It looks similar, but it is not quite the same.
And for more complex queries the difference is larger.
For example, the query

select sum(score*volenquired)/sum(volenquired) from DbItem
group by (trader,desk,office);

can be written as

select agg_val,cs_cut(group_by,'c22c30c10') from
     (select (cs_project_agg(ss1.*)).* from
           (select (s1).sum/(s2).sum,(s1).groups from DbItem_get() q,
                 cs_hash_sum(q.score*q.volenquired, q.trader||q.desk||q.office) s1,
                 cs_hash_sum(q.volenquired, q.trader||q.desk||q.office) s2) ss1) ss2;

Looks too complex, doesn't it?
But the first two lines are only responsible for the reverse mapping: from
the vertical data representation back to normal horizontal tuples.
The good thing is that this query executes more than 1000 times
faster (with default PostgreSQL configuration parameters, except
shared_buffers, which was set large enough to fit all the data in memory).

On 12/11/2013 07:14 PM, ktm@rice.edu wrote:
> On Mon, Dec 09, 2013 at 11:40:41PM +0400, knizhnik wrote:
>> Hello!
>>
>> I want to announce my implementation of an In-Memory Columnar Store
>> extension for PostgreSQL:
>>
>>       Documentation: http://www.garret.ru/imcs/user_guide.html
>>       Sources: http://www.garret.ru/imcs-1.01.tar.gz
>>
>> Any feedback, bug reports and suggestions are welcome.
>>
>> The vertical representation of the data is stored in PostgreSQL shared memory.
>> This is why it is important to be able to utilize all available
>> physical memory.
> Hi,
>
> This is very neat! The question I have, which applies to the matview
> support as well, is "How can we transparently substitute usage of the
> in-memory columnar store/matview in a SQL query?".
>
> Regards,
> Ken




Re: In-Memory Columnar Store

From
Merlin Moncure
Date:
On Wed, Dec 11, 2013 at 10:08 AM, knizhnik <knizhnik@garret.ru> wrote:
> 1. Calls in PL/pgSQL are very slow - about 1-2 microseconds per call on my
> computer. Just defining a per-row insert trigger with an empty procedure
> doubles the time needed to insert 6 million records - from 7 to 15
> seconds. If the trigger procedure is not empty, then the time increases
> proportionally to the number of performed calls.
> In my case, inserting the data and propagating it to the columnar store
> using a trigger takes about 80 seconds. But if I first load the data into a
> PostgreSQL table without triggers and then insert it into the columnar
> store using a load function (implemented in C), then the total time is
> 7+9=16 seconds.

Yeah. For this problem, we unfortunately either have to try to
use standard SQL functions in such a way that supports inlining (this
is mostly a black art, and fragile), or move logic out of the function
and into the query via things like window functions, or just deal with
the performance hit.  Postgres-flavored SQL is pretty much the most
productive language on the planet AFAIC, but the challenge is always
performance, performance.
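
(As a sketch of what "supports inlining" means - the table and function
names here are illustrative, not from IMCS:)

-- a single-SELECT LANGUAGE sql function: the planner can inline it, so the
-- aggregate below is planned as if it were written sum(closeprice * volume)
create function weighted(p numeric, v numeric) returns numeric
as $$ select p * v $$ language sql immutable;

-- the plpgsql equivalent cannot be inlined; every row pays the per-call
-- interpreter overhead discussed above
create function weighted_plpgsql(p numeric, v numeric) returns numeric
as $$ begin return p * v; end $$ language plpgsql immutable;

select sum(weighted(closeprice, volume)) / sum(volume) from quote;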

Down the line, I am optimistic that per-call function overhead can be
optimized, probably by expanding what can be inlined somehow.  The
problem is that this would require cooperation from the language executors
that is not currently possible through the SPI interface, so I really
don't know.

> Certainly I realize that PL/pgSQL is an interpreted language. But
> interpreted Python, for example, is able to do 100 times more calls per
> second.
> Unfortunately the profiler doesn't show any single bottleneck - it looks
> like the long call time is caused by the large overhead of initializing
> and resetting the memory context and copying argument data.
>
> 2. Inefficient implementation of expanding composite type columns using the
> (foo()).* clause. In this case the function foo() will be invoked as many
> times as there are fields in the returned composite type. Even when the
> call is placed in the FROM list (thanks to lateral joins in 9.3),
> PostgreSQL still sometimes performs redundant calls, which can be avoided
> using a hack with an added "OFFSET 1" clause.

Yeah, this is a long-standing headache.  LATERAL mostly deals with it,
and most cases (even pre-9.3) can be worked around one way or
another.

> 3. The 256Gb limit on the shared memory segment size on Linux.

I figure this will be solved fairly soon. It's a nice problem to have.

merlin



Re: In-Memory Columnar Store

From
Kevin Grittner
Date:
"ktm@rice.edu" <ktm@rice.edu> wrote:

> The question I have, which applies to the matview support as
> well, is "How can we transparently substitute usage of the
> in-memory columnar store/matview in a SQL query?".

My take on that regarding matviews is:

(1)  It makes no sense to start work on this without a far more
sophisticated concept of matview "freshness" (or "staleness", as
some products prefer to call it).

(2)  Work on query rewrite to use sufficiently fresh matviews to
optimize the execution of a query and work on "freshness" tracking
are orthogonal to work on incremental maintenance.

I have no plans to work on either matview freshness or rewrite, as
there seem to be several years' worth of work to get incremental
maintenance up to a level matching other products.  I welcome
anyone else to take on those other projects.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: In-Memory Columnar Store

From
desmodemone
Date:
<div dir="ltr"><br /><div class="gmail_extra"><br /><br /><div class="gmail_quote">2013/12/9 knizhnik <span
dir="ltr"><<ahref="mailto:knizhnik@garret.ru" target="_blank">knizhnik@garret.ru</a>></span><br /><blockquote
class="gmail_quote"style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> Hello!<br
/><br/> I want to annouce my implementation of In-Memory Columnar Store extension for PostgreSQL:<br /><br />    
 Documentation:<a href="http://www.garret.ru/imcs/user_guide.html"
target="_blank">http://www.garret.ru/imcs/<u></u>user_guide.html</a><br/>      Sources: <a
href="http://www.garret.ru/imcs-1.01.tar.gz"target="_blank">http://www.garret.ru/imcs-1.<u></u>01.tar.gz</a><br /><br
/>Any feedbacks, bug reports and suggestions are welcome.<br /><br /> Vertical representation of data is stored in
PostgreSQLshared memory.<br /> This is why it is important to be able to utilize all available physical memory.<br />
Nowservers with Tb or more RAM are not something exotic, especially in financial world.<br /> But there is limitation
inLinux with standard 4kb pages  for maximal size of mapped memory segment: 256Gb.<br /> It is possible to overcome
thislimitation either by creating multiple segments - but it requires too much changes in PostgreSQL memory manager.<br
/>Or just set MAP_HUGETLB flag (assuming that huge pages were allocated in the system).<br /><br /> I found several
messagesrelated with MAP_HUGETLB flag, the most recent one was from 21 of November:<br /><a
href="http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org"
target="_blank">http://www.postgresql.org/<u></u>message-id/20131125032920.<u></u>GA23793@toroid.org</a><br/><br /> I
wonderwhat is the current status of this patch?<span class=""><font color="#888888"><br /><br /><br /><br /><br /><br
/><br/> -- <br /> Sent via pgsql-hackers mailing list (<a href="mailto:pgsql-hackers@postgresql.org"
target="_blank">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your subscription:<br /><a
href="http://www.postgresql.org/mailpref/pgsql-hackers"
target="_blank">http://www.postgresql.org/<u></u>mailpref/pgsql-hackers</a><br/></font></span></blockquote></div><br
/><br/></div><div class="gmail_extra">Hello,<br /></div><div class="gmail_extra">           excellent work! I begin to
dotesting and it's very fast, by the way I found a strange case of "endless" query with CPU a 100%  when the value used
asfilter does not exists:<br /><br /></div><div class="gmail_extra">I am testing with postgres 9.3.1 on debian  and I
useddefault value for the extension except memory ( 512mb )<br /><br /></div><div class="gmail_extra">how to recreate
thetest case :<br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">## create a table :<br /><br
/></div><divclass="gmail_extra">create table endless ( col1 int , col2 char(30) , col3 int ) ;<br /><br /></div><div
class="gmail_extra">##insert some values:<br /></div><div class="gmail_extra"><br />insert into endless values ( 1,
'ahahahaha',3);<br /><br />insert into endless values ( 2, 'ghghghghg', 4);<br /><br />## create the column store
objects:<br/><br />select cs_create('endless','col1','col2');<br />  cs_create <br />-----------<br /> <br />(1 row)<br
/><br/></div><div class="gmail_extra">## try and test column store :<br /></div><div class="gmail_extra"><br />select
cs_avg(col3)from  endless_get('ahahahaha');<br /> cs_avg <br /> --------<br />      3<br />(1 row)<br /><br />select
cs_avg(col3)from  endless_get('ghghghghg');<br /> cs_avg <br />--------<br />      4<br />(1 row)<br /><br /></div><div
class="gmail_extra">##now select with a value that does not exist :<br /></div><div class="gmail_extra"><br />select
cs_avg(col3)from  endless_get('testing');<br /><br /></div><div class="gmail_extra"># and now  start to loop on cpu and
seemsto never ends , I had to terminate backend<br /></div><div class="gmail_extra"><br /></div><div
class="gmail_extra">Bye<br/><br />Mat<br /></div></div> 

Re: In-Memory Columnar Store

From
knizhnik
Date:
<div class="moz-cite-prefix">Thank you very much for reporting the problem.<br /> And sorry for this bug and lack of
negativetests.<br /><br /> Attempt to access unexisted value cause autoloading of data from the table to columnar store
(becauseautoload property is enabled by default) <br /> and as far as this entry is not present in the table, the code
fallsinto infinite recursion.<br /> Patched version of IMCS is available at <a class="moz-txt-link-freetext"
href="http://www.garret.ru/imcs-1.01.tar.gz">http://www.garret.ru/imcs-1.01.tar.gz</a><br/><br /> I am going to place
IMCSunder version control now. Just looking for proper place for repository...<br /><br /><br /> On 12/12/2013 04:06
AM,desmodemone wrote:<br /></div><blockquote
cite="mid:CAEs9oFn920CSw_0k+TTa79cdF6zQC+TdHz3xwvbdXmXa_iEMZQ@mail.gmail.com"type="cite"><div dir="ltr"><br /><div
class="gmail_extra"><br/><br /><div class="gmail_quote">2013/12/9 knizhnik <span dir="ltr"><<a
href="mailto:knizhnik@garret.ru"moz-do-not-send="true" target="_blank">knizhnik@garret.ru</a>></span><br
/><blockquoteclass="gmail_quote" style="margin:0px 0px 0px             0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">Hello!<br /><br /> I want to annouce my implementation of In-Memory Columnar Store
extensionfor PostgreSQL:<br /><br />      Documentation: <a href="http://www.garret.ru/imcs/user_guide.html"
moz-do-not-send="true"target="_blank">http://www.garret.ru/imcs/user_guide.html</a><br />      Sources: <a
href="http://www.garret.ru/imcs-1.01.tar.gz"moz-do-not-send="true"
target="_blank">http://www.garret.ru/imcs-1.01.tar.gz</a><br/><br /> Any feedbacks, bug reports and suggestions are
welcome.<br/><br /> Vertical representation of data is stored in PostgreSQL shared memory.<br /> This is why it is
importantto be able to utilize all available physical memory.<br /> Now servers with Tb or more RAM are not something
exotic,especially in financial world.<br /> But there is limitation in Linux with standard 4kb pages  for maximal size
ofmapped memory segment: 256Gb.<br /> It is possible to overcome this limitation either by creating multiple segments -
butit requires too much changes in PostgreSQL memory manager.<br /> Or just set MAP_HUGETLB flag (assuming that huge
pageswere allocated in the system).<br /><br /> I found several messages related with MAP_HUGETLB flag, the most recent
onewas from 21 of November:<br /><a href="http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org"
moz-do-not-send="true"target="_blank">http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org</a><br
/><br/> I wonder what is the current status of this patch?<span class=""><font color="#888888"><br /><br /><br /><br
/><br/><br /><br /> -- <br /> Sent via pgsql-hackers mailing list (<a href="mailto:pgsql-hackers@postgresql.org"
moz-do-not-send="true"target="_blank">pgsql-hackers@postgresql.org</a>)<br /> To make changes to your subscription:<br
/><ahref="http://www.postgresql.org/mailpref/pgsql-hackers" moz-do-not-send="true"
target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></font></span></blockquote></div><br /><br
/></div><divclass="gmail_extra">Hello,<br /></div><div class="gmail_extra">           excellent work! I begin to do
testingand it's very fast, by the way I found a strange case of "endless" query with CPU a 100%  when the value used as
filterdoes not exists:<br /><br /></div><div class="gmail_extra">I am testing with postgres 9.3.1 on debian  and I used
defaultvalue for the extension except memory ( 512mb )<br /><br /></div><div class="gmail_extra">how to recreate the
testcase :<br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">## create a table :<br /><br
/></div><divclass="gmail_extra">create table endless ( col1 int , col2 char(30) , col3 int ) ;<br /><br /></div><div
class="gmail_extra">##insert some values:<br /></div><div class="gmail_extra"><br /> insert into endless values ( 1,
'ahahahaha',3);<br /><br /> insert into endless values ( 2, 'ghghghghg', 4);<br /><br /> ## create the column store
objects:<br/><br /> select cs_create('endless','col1','col2');<br />  cs_create <br /> -----------<br />  <br /> (1
row)<br/><br /></div><div class="gmail_extra">## try and test column store :<br /></div><div class="gmail_extra"><br />
selectcs_avg(col3) from  endless_get('ahahahaha');<br />  cs_avg <br /> --------<br />       3<br /> (1 row)<br /><br
/>select cs_avg(col3) from  endless_get('ghghghghg');<br />  cs_avg <br /> --------<br />       4<br /> (1 row)<br
/><br/></div><div class="gmail_extra">## now select with a value that does not exist :<br /></div><div
class="gmail_extra"><br/> select cs_avg(col3) from  endless_get('testing');<br /><br /></div><div class="gmail_extra">#
andnow  start to loop on cpu and seems to never ends , I had to terminate backend<br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra">Bye<br /><br /> Mat<br /></div></div></blockquote><br /> 

Re: In-Memory Columnar Store

From
Pavel Stehule
Date:
<p dir="ltr">it is interesting idea. For me, a significant information from comparation, so we do some significantly
wrong.Memory engine should be faster naturally, but I don't tkink it can be 1000x.<p dir="ltr">Yesterday we did a some
tests,that shows so for large tables (5G)a our hashing is not effective. Disabling hash join and using merge join
increasedspeed 2x<br /> Dne 9. 12. 2013 20:41 "knizhnik" <<a
href="mailto:knizhnik@garret.ru">knizhnik@garret.ru</a>>napsal(a):<br /> ><br /> > Hello!<br /> ><br />
>I want to annouce my implementation of In-Memory Columnar Store extension for PostgreSQL:<br /> ><br /> >    
 Documentation:<a href="http://www.garret.ru/imcs/user_guide.html">http://www.garret.ru/imcs/user_guide.html</a><br />
>     Sources: <a href="http://www.garret.ru/imcs-1.01.tar.gz">http://www.garret.ru/imcs-1.01.tar.gz</a><br />
><br/> > Any feedbacks, bug reports and suggestions are welcome.<br /> ><br /> > Vertical representation of
datais stored in PostgreSQL shared memory.<br /> > This is why it is important to be able to utilize all available
physicalmemory.<br /> > Now servers with Tb or more RAM are not something exotic, especially in financial world.<br
/>> But there is limitation in Linux with standard 4kb pages  for maximal size of mapped memory segment: 256Gb.<br
/>> It is possible to overcome this limitation either by creating multiple segments - but it requires too much
changesin PostgreSQL memory manager.<br /> > Or just set MAP_HUGETLB flag (assuming that huge pages were allocated
inthe system).<br /> ><br /> > I found several messages related with MAP_HUGETLB flag, the most recent one was
from21 of November:<br /> > <a
href="http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org">http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org</a><br
/>><br /> > I wonder what is the current status of this patch?<br /> ><br /> ><br /> ><br /> ><br />
><br/> ><br /> > -- <br /> > Sent via pgsql-hackers mailing list (<a
href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> > To make changes to your
subscription:<br/> > <a
href="http://www.postgresql.org/mailpref/pgsql-hackers">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/> 

Re: In-Memory Columnar Store

From
knizhnik
Date:
<div class="moz-cite-prefix">On 12/12/2013 11:42 AM, Pavel Stehule wrote:<br /></div><blockquote
cite="mid:CAFj8pRDv7dHivPVrCQFT3zz2a7-d9+nm6cwB-mQcxOWtkdrrDg@mail.gmail.com"type="cite"><p dir="ltr">it is interesting
idea.For me, a significant information from comparation, so we do some significantly wrong. Memory engine should be
fasternaturally, but I don't tkink it can be 1000x.</blockquote><br /> Sorry, but I didn't  fabricate this results:<br
/>Below is just snapshot from my computer:<br /><br /><br /> postgres=# select DbItem_load();<br />  dbitem_load <br />
-------------<br/>      9999998<br /> (1 row)<br /><br /> postgres=# \timing<br /> Timing is on.<br /> postgres=#
selectcs_used_memory();<br />  cs_used_memory <br /> ----------------<br />      4441894912<br /> (1 row)<br /><br />
postgres=#select agg_val,cs_cut(group_by,'c22c30c10') from <br />      (select (cs_project_agg(ss1.*)).* from <br />
          (select (s1).sum/(s2).sum,(s1).groups from DbItem_get() q, <br />                
cs_hash_sum(q.score*q.volenquired,q.trader||q.desk||q.office) s1,  <br />                  cs_hash_sum(q.volenquired,
q.trader||q.desk||q.office)s2) ss1) ss2;<br />      agg_val      |                          
cs_cut                          <br />
------------------+------------------------------------------------------------<br/>  1.50028393511844 | ("John
Coltrane","NewYork Corporates","New York")<br /> ....<br /> Time: <font color="#ff0000">506.125 ms</font><br /><br />
postgres=#select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office);<br /> ...<br />
Time:<font color="#ff0000">449328.645 ms</font><br /> postgres=# select sum(score*volenquired)/sum(volenquired) from
DbItemgroup by (trader,desk,office);<br /> ...<br /> Time: <font color="#ff0000">441530.689 ms</font><br /><br />
Pleasenotice that time of second execution is almost the same as first, although all data can fit in cache!<br /><br />
Certainlyit was intersting to me to understand the reason of such bad performance.<br /> And find out two things:<br
/><br/> 1. <br />      select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office);<br />
and<br/>      select sum(score*volenquired)/sum(volenquired) from DbItem group by trader,desk,office;<br /><br /> are
notthe same queries (it is hard to understand to C programmer:)<br /> And first one is executed significantly
slower.<br/><br /> 2. It is not enough to increase "shared_buffers" parameter in postgresql.conf.<br /> "work_mem" is
alsovery important. When I increased it to 1Gb from default 1Mb, then time of query execution is reduced to <br />
7107.146ms. So the real difference is ten times, not 1000 times.<br /><br /><br /><br /><br /><br /><blockquote
cite="mid:CAFj8pRDv7dHivPVrCQFT3zz2a7-d9+nm6cwB-mQcxOWtkdrrDg@mail.gmail.com"type="cite"><p dir="ltr">Yesterday we did
asome tests, that shows so for large tables (5G)a our hashing is not effective. Disabling hash join and using merge
joinincreased speed 2x<br /> Dne 9. 12. 2013 20:41 "knizhnik" <<a href="mailto:knizhnik@garret.ru"
moz-do-not-send="true">knizhnik@garret.ru</a>>napsal(a):<br /> ><br /> > Hello!<br /> ><br /> > I want
toannouce my implementation of In-Memory Columnar Store extension for PostgreSQL:<br /> ><br /> >    
 Documentation:<a href="http://www.garret.ru/imcs/user_guide.html"
moz-do-not-send="true">http://www.garret.ru/imcs/user_guide.html</a><br/> >      Sources: <a
href="http://www.garret.ru/imcs-1.01.tar.gz"moz-do-not-send="true">http://www.garret.ru/imcs-1.01.tar.gz</a><br />
><br/> > Any feedbacks, bug reports and suggestions are welcome.<br /> ><br /> > Vertical representation of
datais stored in PostgreSQL shared memory.<br /> > This is why it is important to be able to utilize all available
physicalmemory.<br /> > Now servers with Tb or more RAM are not something exotic, especially in financial world.<br
/>> But there is limitation in Linux with standard 4kb pages  for maximal size of mapped memory segment: 256Gb.<br
/>> It is possible to overcome this limitation either by creating multiple segments - but it requires too much
changesin PostgreSQL memory manager.<br /> > Or just set MAP_HUGETLB flag (assuming that huge pages were allocated
inthe system).<br /> ><br /> > I found several messages related with MAP_HUGETLB flag, the most recent one was
from21 of November:<br /> > <a href="http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org"
moz-do-not-send="true">http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org</a><br/> ><br /> >
Iwonder what is the current status of this patch?<br /> ><br /> ><br /> ><br /> ><br /> ><br /> ><br
/>> -- <br /> > Sent via pgsql-hackers mailing list (<a href="mailto:pgsql-hackers@postgresql.org"
moz-do-not-send="true">pgsql-hackers@postgresql.org</a>)<br/> > To make changes to your subscription:<br /> > <a
href="http://www.postgresql.org/mailpref/pgsql-hackers"
moz-do-not-send="true">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></blockquote><br /> 

Re: In-Memory Columnar Store

From
Merlin Moncure
Date:
On Thu, Dec 12, 2013 at 4:02 AM, knizhnik <knizhnik@garret.ru> wrote:
> On 12/12/2013 11:42 AM, Pavel Stehule wrote:
>
> It is an interesting idea. For me, the significant information from the
> comparison is that we are doing something significantly wrong. A memory
> engine should naturally be faster, but I don't think it can be 1000x.
>
>
> Sorry, but I didn't fabricate these results.
> Below is just a snapshot from my computer:
>
>
> postgres=# select DbItem_load();
>  dbitem_load
> -------------
>      9999998
> (1 row)
>
> postgres=# \timing
> Timing is on.
> postgres=# select cs_used_memory();
>  cs_used_memory
> ----------------
>      4441894912
> (1 row)
>
> postgres=# select agg_val,cs_cut(group_by,'c22c30c10') from
>      (select (cs_project_agg(ss1.*)).* from
>            (select (s1).sum/(s2).sum,(s1).groups from DbItem_get() q,
>                 cs_hash_sum(q.score*q.volenquired,
> q.trader||q.desk||q.office) s1,
>                  cs_hash_sum(q.volenquired, q.trader||q.desk||q.office) s2)
> ss1) ss2;
>      agg_val      |                           cs_cut
> ------------------+------------------------------------------------------------
>  1.50028393511844 | ("John Coltrane","New York Corporates","New York")
> ....
> Time: 506.125 ms
>
> postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem group
> by (trader,desk,office);
> ...
> Time: 449328.645 ms
> postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem group
> by (trader,desk,office);
> ...
> Time: 441530.689 ms
>
> Please notice that the time of the second execution is almost the same as
> the first, although all the data should fit in the cache!
>
> Certainly it was interesting to me to understand the reason for such bad
> performance, and I found out two things:
>
> 1.
>      select sum(score*volenquired)/sum(volenquired) from DbItem group by
> (trader,desk,office);
> and
>      select sum(score*volenquired)/sum(volenquired) from DbItem group by
> trader,desk,office;
>
> are not the same queries (this is hard for a C programmer to understand:)
> And the first one executes significantly slower.
>
> 2. It is not enough to increase the "shared_buffers" parameter in
> postgresql.conf.
> "work_mem" is also very important. When I increased it to 1Gb from the
> default 1Mb, the query execution time was reduced to 7107.146 ms. So the
> real difference is ten times, not 1000 times.

Yeah.  It's not fair to compare against an implementation that is
constrained to use only 1MB.  For analytics work, a huge work_mem is a
pretty typical setting.  A 10x improvement is believable considering
you've removed all MVCC overhead, locking, buffer management, etc., and
have a simplified data structure.

merlin



Re: In-Memory Columnar Store

From
knizhnik
Date:
On 12/12/2013 07:03 PM, Merlin Moncure wrote:
> On Thu, Dec 12, 2013 at 4:02 AM, knizhnik <knizhnik@garret.ru> wrote:
> Yeah.  It's not fair to compare against an implementation that is
> constrained to use only 1MB.  For analytics work, a huge work_mem is a
> pretty typical setting.  A 10x improvement is believable considering
> you've removed all MVCC overhead, locking, buffer management, etc., and
> have a simplified data structure.
>
> merlin
I agree that it is not a fair comparison. As an excuse I can say that I am
not an experienced PostgreSQL user, so I thought that setting
shared_buffers was enough to avoid disk access in PostgreSQL. Only after
getting such strange results did I start investigating how to properly
tune PostgreSQL parameters.

IMHO it is strange to see such small default values in the postgresql
configuration - PostgreSQL is not an embedded database, and nowadays even
mobile devices have several gigabytes of memory...
Also it would be nice to have one single switch for how much physical
memory PostgreSQL may use, and let PostgreSQL split it in an optimal way.
For example, I have no idea how to optimally split memory between
"shared_buffers", "temp_buffers", "work_mem" and "maintenance_work_mem".
PostgreSQL itself should do this work much better than an inexperienced
administrator.

And one of the possible values of such a parameter could be "auto": make it
possible to determine the available memory automatically (it is not a big
deal to check the amount of available RAM in the system). I know that
vendors of big databases never try to simplify the configuration and
tuning of their products, just because most of their profit comes from
consulting. But I think that this is not true for PostgreSQL.




Re: In-Memory Columnar Store

From
Merlin Moncure
Date:
On Thu, Dec 12, 2013 at 12:18 PM, knizhnik <knizhnik@garret.ru> wrote:
> IMHO it is strange to see such small default values in the postgresql
> configuration.

This (the low default work_mem) is because of three things:

1) Most queries do not really need a lot of work_mem.
2) work_mem stacks with each query using it -- so even with your 1MB
setting, 1000 connections get you a gigabyte.  Some conservatism is
therefore justified, although this setting tended to be much more
dangerous in the old days, when we measured memory in megabytes.
3) Postgres does not query the available physical memory for its default
settings, due to portability issues.  So we tend to tune to the "common
denominator".

merlin