Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL - Mailing list pgsql-hackers

From knizhnik
Subject Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date
Msg-id 52CEF60F.9070206@garret.ru
In response to Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL  (Jim Nasby <jim@nasby.net>)
List pgsql-hackers
On 01/09/2014 09:22 PM, Robert Haas wrote:
> On Wed, Jan 8, 2014 at 2:39 PM, knizhnik <knizhnik@garret.ru> wrote:
>> I wonder what is the intended use case of dynamic shared memory?
>> Is it primarily oriented toward PostgreSQL extensions, or will it also be
>> used in the PostgreSQL core?
> My main motivation is that I want to use it to support parallel query.
>   There is unfortunately quite a bit of work left to be done before we
> can make that a reality, but that's the goal.

I do not want to waste your time, but this topic is very interesting to
me, and I would be very grateful if you could say a few words about how
DSM can help to implement parallel query processing.
It seems to me that the main complexity is in the optimizer: it needs to
split the query plan into several subplans which can be executed
concurrently and then merge their partial results.
As far as I understand, it is not possible to use multithreading for
parallel query execution, because most of the PostgreSQL code is
non-reentrant. So we need to execute these subplans in several processes.
And unlike threads, the only efficient way of exchanging data between
processes is shared memory. So it is clear why we need shared memory
for parallel query execution. But why does it have to be dynamic? Why
can it not be preallocated at start time, like most of the other
resources used by PostgreSQL?
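
To make the question concrete, here is a minimal standalone sketch of the
pattern described above: several worker processes each compute a partial
result into shared memory, and the parent merges them. It is not PostgreSQL
code and does not use the DSM API; the function name parallel_sum and the use
of fork() with an anonymous shared mapping are assumptions chosen only for
illustration. Note that the shared region here is sized when the work starts
(one slot per worker), not at program startup.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on some libcs */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical illustration, not PostgreSQL code: sum data[0..n) using
 * nworkers child processes.  Each worker writes its partial sum into its
 * own slot of a shared mapping; the parent merges the slots afterwards.
 * Returns 0 on success, -1 on failure.
 */
int
parallel_sum(const int *data, size_t n, int nworkers, long *result)
{
    long   *partial;
    pid_t  *pids;
    size_t  chunk;
    int     i;
    int     started = 0;
    int     ok = 1;

    assert(nworkers > 0);

    /* Shared region sized at run time: one slot per worker. */
    partial = mmap(NULL, sizeof(long) * (size_t) nworkers,
                   PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (partial == MAP_FAILED)
        return -1;

    pids = malloc(sizeof(pid_t) * (size_t) nworkers);
    if (pids == NULL)
    {
        munmap(partial, sizeof(long) * (size_t) nworkers);
        return -1;
    }

    /* Split the input into nworkers contiguous chunks (the "subplans"). */
    chunk = (n + (size_t) nworkers - 1) / (size_t) nworkers;

    for (i = 0; i < nworkers; i++)
    {
        pid_t   pid = fork();

        if (pid < 0)
        {
            ok = 0;
            break;
        }
        if (pid == 0)
        {
            /* Worker: compute the partial result for chunk i. */
            size_t  lo = chunk * (size_t) i;
            size_t  hi;
            size_t  j;
            long    sum = 0;

            if (lo > n)
                lo = n;
            hi = lo + chunk;
            if (hi > n)
                hi = n;
            for (j = lo; j < hi; j++)
                sum += data[j];
            partial[i] = sum;
            _exit(0);
        }
        pids[started++] = pid;
    }

    /* Wait for every worker that was actually started. */
    for (i = 0; i < started; i++)
    {
        int     status;

        if (waitpid(pids[i], &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }

    /* Merge the partial results. */
    if (ok)
    {
        *result = 0;
        for (i = 0; i < nworkers; i++)
            *result += partial[i];
    }

    free(pids);
    munmap(partial, sizeof(long) * (size_t) nworkers);
    return ok ? 0 : -1;
}
```

Each worker writes only to its own slot, so this particular sketch needs no
lock at all; the parent reads the slots only after waitpid() has returned for
every worker.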

>
>> Maybe I am wrong, but I do not see any reason for the same extension to
>> create multiple DSM segments.
> Right.
>
>> And the total number of DSM segments is expected to be not very large
>> (<10). The same is true for the synchronization primitives (LWLocks, for
>> example) needed to synchronize access to these DSM segments. So I am not
>> sure that the ability to place locks in DSM is really so critical...
>> We could just reserve some space for LWLocks which can be used by
>> extensions, so that LWLockAssign() can be used without
>> RequestAddinLWLocks, or RequestAddinLWLocks can be used not only from
>> preloaded extensions.
> If you're doing all of this at postmaster startup time, that all works
> fine.  If you want to be able to load up an extension on the fly, then
> it doesn't.  You can only RequestAddinLWLocks() at postmaster start
> time, not afterwards, so currently any extension that wants to use
> lwlocks has to be loaded at postmaster startup time, or you're out of
> luck.
>
> Well.  Technically we reserve something like 3 extra lwlocks that
> could be assigned later.  But relying on those to be available is not
> very reliable, and also, 3 is not very many, considering that we have
> something north of 32k core lwlocks in the default configuration.

3 is definitely too small.
But you agreed with me that the number of DSM segments will not be very
large.
And if we do not need fine-grained locking (and IMHO it is not needed for
most extensions), then we need just a few locks (most likely one) per DSM
segment.
It means that if instead of 3 we reserve, let's say, 30 LWLocks, then it
will be enough for most extensions. And there will be almost no extra
resource overhead because, as you wrote, PostgreSQL has 32k locks in the
default configuration.

Certainly, if we need an independent lock for each page of DSM memory, then
there will be no other choice except placing the locks in the DSM segment
itself. But once again, I do not think that most extensions that need
shared memory will use such fine-grained locking.
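
For comparison, this is roughly what "placing the lock in the segment itself"
looks like outside PostgreSQL. The sketch below is only an illustration under
stated assumptions: SharedSegment, seg_lock, seg_unlock and run_locked_counter
are hypothetical names, and a C11 atomic_flag spinlock stands in for an LWLock
(it is a much simpler primitive and is not how LWLocks are implemented). The
point is only that the lock lives inside the shared mapping, so it is created
together with the segment and needs no slot reserved at startup.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on some libcs */
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical segment layout: the lock is stored next to the data. */
typedef struct
{
    atomic_flag lock;           /* stand-in for an LWLock, see lead-in */
    long        counter;        /* data protected by the lock */
} SharedSegment;

static void
seg_lock(SharedSegment *seg)
{
    /* Spin until the flag was previously clear; yield while waiting. */
    while (atomic_flag_test_and_set_explicit(&seg->lock,
                                             memory_order_acquire))
        sched_yield();
}

static void
seg_unlock(SharedSegment *seg)
{
    atomic_flag_clear_explicit(&seg->lock, memory_order_release);
}

/*
 * Fork nworkers processes; each increments the shared counter iters times
 * while holding the in-segment lock.  Returns the final counter value,
 * or -1 on failure.
 */
long
run_locked_counter(int nworkers, int iters)
{
    SharedSegment *seg;
    pid_t      *pids;
    int         i;
    int         started = 0;
    int         ok = 1;
    long        total;

    assert(nworkers > 0 && iters >= 0);

    seg = mmap(NULL, sizeof(SharedSegment), PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (seg == MAP_FAILED)
        return -1;

    pids = malloc(sizeof(pid_t) * (size_t) nworkers);
    if (pids == NULL)
    {
        munmap(seg, sizeof(SharedSegment));
        return -1;
    }

    /* The lock is initialized as part of setting up the segment. */
    atomic_flag_clear(&seg->lock);
    seg->counter = 0;

    for (i = 0; i < nworkers; i++)
    {
        pid_t   pid = fork();

        if (pid < 0)
        {
            ok = 0;
            break;
        }
        if (pid == 0)
        {
            int     k;

            for (k = 0; k < iters; k++)
            {
                seg_lock(seg);
                seg->counter++;
                seg_unlock(seg);
            }
            _exit(0);
        }
        pids[started++] = pid;
    }

    /* Wait for every worker that was actually started. */
    for (i = 0; i < started; i++)
    {
        int     status;

        if (waitpid(pids[i], &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }

    total = ok ? seg->counter : -1;

    free(pids);
    munmap(seg, sizeof(SharedSegment));
    return total;
}
```

With one lock per segment, as here, a single reserved lock would serve equally
well; the in-segment approach only becomes necessary when the number of locks
scales with the size of the segment.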





