Thread: Shared memory
Hi,
I'm currently investigating the feasibility of an alternative PL/Java implementation that would use shared memory to communicate between a JVM and the backend processes. I would very much like to make use of the routines provided in shmem.c but I'm a bit uncertain how to add a segment for my own use. The flow I have in mind is:

Initialization:
Initialization takes place when the first PL/Java function (or validator) of the first session since the postmaster was started is called. The initialization process will create a small segment that represents the JVM. It will also start the JVM, which in turn will attach to this segment. The JVM uses a small JNI library for this.

Session connect:
Connect takes place when the first PL/Java function (or validator) of a session is called (after initialization, of course, if it's the first session). The backend creates (or obtains, if I decide to pool them) a communication buffer of fixed size in shared memory. This buffer can only be used by this backend and the JVM. The backend notifies the JVM of its presence using the global segment created during initialization.

My questions are:

1. Do you see something right away that invalidates this approach?
2. Is using the shared memory functionality that the backend provides a good idea (I'm thinking shmem functions, critical sections, semaphores, etc.)? I'd rather depend on them than have conditional code for different operating systems.
3. Would it be better if the Postmaster allocated the global segment and started the JVM (based on some config parameter)?

All ideas and opinions are very welcome.

Kind Regards,
Thomas Hallgren
On Fri, Mar 24, 2006 at 11:51:30AM +0100, Thomas Hallgren wrote:
> Hi,
> I'm currently investigating the feasibility of an alternative PL/Java
> implementation that would use shared memory to communicate between a JVM
> and the backend processes. I would very much like to make use of the
> routines provided in shmem.c but I'm a bit uncertain how to add a segment
> for my own use.

I'm wondering if a better way to do it would be similar to the way X does it. The client connects to the X server via a pipe (tcp/ip or unix domain). This is handy because you can block on a pipe. The client then allocates a shared memory segment and sends a message to the server, who can then also connect to it.

The neat thing about this is that the client can put data in the shared memory segment, send one byte through the pipe and then block on a read. The JVM, which has a thread waiting on the other end, wakes up, processes the data, puts the result back, writes a byte to the pipe and waits. This wakes up the client, who can then read the result.

No locking, no semaphores; the standard UNIX semantics on pipes and sockets make sure everything works.

In practice you'd probably end up sending small responses exclusively via the pipe and only use the shared memory for larger blocks of data, but that's your choice. In X this is mostly used for image data and such.

> My questions are:
> 1. Do you see something right away that invalidates this approach?

Nothing direct, though a single segment just for finding the JVM seems a lot. A socket approach would work better, I think.

> 2. Is using the shared memory functionality that the backend provides a
> good idea (I'm thinking shmem functions, critical sections, semaphores,
> etc.)? I'd rather depend on them than have conditional code for different
> operating systems.

That I don't know. However, ISTM a lock-free approach is better wherever possible. If you can avoid the semaphores altogether...

> 3. Would it be better if the Postmaster allocated the global segment and
> started the JVM (based on some config parameter)?

I don't know about the segment, but the postmaster should start it, I think. I thought the tsearch guys had an approach using a co-process. I don't know how they start it up, but they connected via pipes.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
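[Editorial note: the X-style handshake described in this message can be sketched in Java. Everything below is an illustrative assumption, not PL/Java or backend code: a file-backed mapped buffer stands in for the SysV segment, a loopback socket stands in for the pipe, and both sides run as threads in one process so the sketch is self-contained. The protocol is the one described above: put data in shared memory, send one byte, block on a read.]

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class ShmPipeDemo {
    public static void main(String[] args) throws Exception {
        // The "segment": a file-backed mapped buffer both sides can see.
        File f = File.createTempFile("shm", ".seg");
        f.deleteOnExit();
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        raf.setLength(4096);
        final MappedByteBuffer shm = raf.getChannel()
                .map(FileChannel.MapMode.READ_WRITE, 0, 4096);

        // The "pipe": a loopback socket used only for one-byte wakeups.
        final ServerSocketChannel srv = ServerSocketChannel.open();
        srv.socket().bind(new InetSocketAddress("127.0.0.1", 0));
        int port = srv.socket().getLocalPort();

        Thread server = new Thread(new Runnable() {
            public void run() {
                try {
                    SocketChannel ch = srv.accept();
                    ByteBuffer one = ByteBuffer.allocate(1);
                    ch.read(one);                      // block until client signals
                    int n = shm.getInt(0);             // request is in shared memory
                    shm.putInt(0, n * 2);              // put the result back
                    one.flip();
                    ch.write(one);                     // wake the client
                    ch.close();
                } catch (Exception e) { throw new RuntimeException(e); }
            }
        });
        server.start();

        SocketChannel client = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", port));
        shm.putInt(0, 21);                             // data goes into shared memory
        ByteBuffer sig = ByteBuffer.wrap(new byte[]{1});
        client.write(sig);                             // one byte through the "pipe"
        sig.clear();
        client.read(sig);                              // block until server answers
        System.out.println("result=" + shm.getInt(0)); // -> result=42
        client.close();
        server.join();
        srv.close();
    }
}
```

As in the X case, no locks or semaphores are needed: the blocking read on the socket serializes access to the shared buffer.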
Martijn van Oosterhout wrote:
> On Fri, Mar 24, 2006 at 11:51:30AM +0100, Thomas Hallgren wrote:
>> I'm currently investigating the feasibility of an alternative PL/Java
>> implementation that would use shared memory to communicate between a JVM
>> and the backend processes. [...]
>
> I'm wondering if a better way to do it would be similar to the way X
> does it. [...] In practice you'd probably end up sending small responses
> exclusively via the pipe and only use the shared memory for larger blocks
> of data, but that's your choice. In X this is mostly used for image data
> and such.
>
Pipes could be used when the connection is initialized, that's for sure. Thanks for the suggestion. The only thing I need to solve is how to detect whether the JVM is present and start it up when it isn't. Either I require that it's there and generate an error when it isn't (analogous to what Apache would do if Tomcat is missing), or I treat the failure to obtain the pipe as an indication that it's not started yet.

>> My questions are:
>> 1. Do you see something right away that invalidates this approach?
>
> Nothing direct, though a single segment just for finding the JVM seems
> a lot. A socket approach would work better, I think.
>
For the initial setup, sure. But I think pipes might be too slow for the actual function calls. What I want is the absolute most efficient IPC mechanism that can be achieved. I'm thinking in terms of critical sections obtained using spinlocks and atomic exchange on memory, which perhaps migrate to a real semaphore when the spin goes on for too long. I will do some tests using pipes too. If the gain from other types of concurrency control turns out to be negligible, then I would agree that pipes are simpler and more elegant.

>> 2. Is using the shared memory functionality that the backend provides a
>> good idea (I'm thinking shmem functions, critical sections, semaphores,
>> etc.)? I'd rather depend on them than have conditional code for different
>> operating systems.
>
> That I don't know. However, ISTM a lock-free approach is better
> wherever possible. If you can avoid the semaphores altogether...
>
Lock free? I'm not sure I understand what you mean. I'll have to wait on something. Or are you referring to the pipe approach?

>> 3. Would it be better if the Postmaster allocated the global segment and
>> started the JVM (based on some config parameter)?
>
> I don't know about the segment, but the postmaster should start it. I
> thought the tsearch guys had an approach using a co-process. I don't
> know how they start it up, but they connected via pipes.
>
I'll check that out. Thanks for the tip.

> Hope this helps,
>
Your insights often do. Thanks a lot.

Regards,
Thomas Hallgren
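[Editorial note: the "spin first, migrate to a real semaphore when the spin goes on for too long" idea above can be sketched as follows. This is an assumption-laden illustration: the class name, the spin limit, and the use of a Java monitor as the blocking fallback are all invented here; a real shared-memory version would spin on a word in the segment and fall back to a SysV semaphore.]

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class HybridLock {
    private static final int SPIN_LIMIT = 1000; // arbitrary; tune for the platform
    private final AtomicBoolean held = new AtomicBoolean(false);

    public void lock() {
        // Fast path: atomic exchange in a bounded spin.
        for (int i = 0; i < SPIN_LIMIT; i++) {
            if (held.compareAndSet(false, true)) return;
        }
        // Slow path: block. The timed wait means a missed notify can
        // never hang us; we simply re-test the atomic flag.
        synchronized (this) {
            while (!held.compareAndSet(false, true)) {
                try {
                    wait(1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

    public void unlock() {
        held.set(false);
        synchronized (this) {
            notifyAll(); // wake any thread parked on the slow path
        }
    }

    // Tiny demo: four threads bump a counter 10,000 times each.
    private static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        final HybridLock lock = new HybridLock();
        Thread[] ts = new Thread[4];
        for (int t = 0; t < ts.length; t++) {
            ts[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < 10000; i++) {
                        lock.lock();
                        try {
                            counter++;
                        } finally {
                            lock.unlock();
                        }
                    }
                }
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        System.out.println("counter=" + counter); // -> counter=40000
    }
}
```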
Martijn,

I tried a Socket approach. Using the new IO stuff that arrived with Java 1.4 (SocketChannel etc.), the performance is really good. Especially on Linux, where an SMP machine shows a 1 to 1.5 ratio between one process doing ping-pong between two threads and two processes doing ping-pong using a socket. That's acceptable overhead indeed, and I don't think I'll be able to trim it much using a shared memory approach (the thread scenario uses Java monitor locks; that's the most efficient lightweight locking implementation I've come across).

One downside is that on a Windows box, the ratio between the threads and the processes scenario seems to be 1 to 5, which is a bit worse. I've heard that Solaris too is less efficient than Linux in this respect.

The real downside is that a call from SQL to PL/Java using the current in-process approach is really fast. It takes about 5 microseconds on my 2.8GHz i386 box. The overhead of an IPC call on that box is about 18 microseconds on Linux and 64 microseconds on Windows. That's an overhead of between 440% and 1300% due to context switching alone. Yet, for some applications, perhaps that overhead is acceptable? It should be weighed against the high memory consumption that the in-process approach undoubtedly results in (which in turn might lead to less optimal use of CPU caches and, if memory is insufficient, more time spent swapping).

Given those numbers, it would be interesting to hear what the community as a whole thinks about this.

Kind Regards,
Thomas Hallgren
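[Editorial note: a rough version of the ping-pong measurement described above, in the style of Java 1.4's SocketChannel API. The class name, round count, and output label are invented for illustration; the original test also compared against two threads synchronizing on a Java monitor, which is omitted here. Absolute numbers will vary wildly by OS and hardware, as the message notes.]

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class PingPongBench {
    public static void main(String[] args) throws Exception {
        final int ROUNDS = 10000;
        final ServerSocketChannel srv = ServerSocketChannel.open();
        srv.socket().bind(new InetSocketAddress("127.0.0.1", 0));
        int port = srv.socket().getLocalPort();

        // Echo peer: reads one byte, writes it back, ROUNDS times.
        Thread echo = new Thread(new Runnable() {
            public void run() {
                try {
                    SocketChannel ch = srv.accept();
                    ch.socket().setTcpNoDelay(true);
                    ByteBuffer b = ByteBuffer.allocate(1);
                    for (int i = 0; i < ROUNDS; i++) {
                        b.clear();
                        while (b.hasRemaining()) ch.read(b);
                        b.flip();
                        while (b.hasRemaining()) ch.write(b);
                    }
                    ch.close();
                } catch (Exception e) { throw new RuntimeException(e); }
            }
        });
        echo.start();

        SocketChannel ch = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", port));
        ch.socket().setTcpNoDelay(true); // don't let Nagle batch the pings
        ByteBuffer b = ByteBuffer.allocate(1);
        long start = System.nanoTime();
        for (int i = 0; i < ROUNDS; i++) {
            b.clear(); b.put((byte) 1); b.flip();
            while (b.hasRemaining()) ch.write(b);
            b.clear();
            while (b.hasRemaining()) ch.read(b);
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("avg-roundtrip-us=" + (elapsed / 1000.0 / ROUNDS));
        ch.close();
        echo.join();
        srv.close();
    }
}
```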
On Mon, Mar 27, 2006 at 10:57:21AM +0200, Thomas Hallgren wrote:
> Martijn,
>
> I tried a Socket approach. Using the new IO stuff that arrived with Java
> 1.4 (SocketChannel etc.), the performance is really good. Especially on
> Linux, where an SMP machine shows a 1 to 1.5 ratio between one process
> doing ping-pong between two threads and two processes doing ping-pong
> using a socket. That's acceptable overhead indeed, and I don't think I'll
> be able to trim it much using a shared memory approach (the thread
> scenario uses Java monitor locks; that's the most efficient lightweight
> locking implementation I've come across).

Yeah, it's fairly well known that the distinction between processes and threads on Linux is much smaller than on other OSes. Windows is pretty bad, which is why threading is much more popular there.

> The real downside is that a call from SQL to PL/Java using the current
> in-process approach is really fast. It takes about 5 microseconds on my
> 2.8GHz i386 box. The overhead of an IPC call on that box is about 18
> microseconds on Linux and 64 microseconds on Windows. That's an overhead
> of between 440% and 1300% due to context switching alone. Yet, for some
> applications,

<snip>

This might take some more measurements, but AIUI the main difference between in-process and out-of-process is that one has a JVM per connection, the other one JVM shared. In that case my thoughts are as follows:

- Overhead of starting the JVM. If you can start the JVM in the postmaster you might be able to avoid this. However, if you have to restart the JVM for each process, that's a cost.

- JIT overhead. For often-used classes JIT compiling can help a lot with speed. But if every class needs to be reinterpreted each time, maybe that costs more than your IPC.

- Memory overhead. You mentioned this already.

- Are you optimising for many short-lived connections or a few long-lived connections?

My gut feeling is that if someone creates a huge number of server-side Java functions, performance will be better with one always-running JVM with highly JIT-optimised code than with each JVM doing it from scratch. But this will obviously need to be tested.

One other thing is that separate processes give you the ability to parallelize. For example, if a Java function does an SPI query, it can receive and process results in parallel with the backend generating them. This may not be easy to achieve with an in-process JVM.

Incidentally, there are compilers these days that can compile Java to native code. Is this Java stuff set up in such a way that you can compile your classes to native and load them directly, for the real speed-freaks? In that case, maybe you should concentrate on reliability and flexibility and still have a way out for functions that *must* be high-performance.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Martijn van Oosterhout wrote:
> Yeah, it's fairly well known that the distinction between processes
> and threads on Linux is much smaller than on other OSes. Windows is
> pretty bad, which is why threading is much more popular there.
>
> [...]
>
> - Are you optimising for many short-lived connections or a few
> long-lived connections?
>
> My gut feeling is that if someone creates a huge number of server-side
> Java functions, performance will be better with one always-running JVM
> with highly JIT-optimised code than with each JVM doing it from scratch.
> But this will obviously need to be tested.
>
The use case with a huge number of short-lived connections is not feasible at all with PL/Java as it stands today. This is partly the reason for my current research. Another reason is that it's sometimes desirable to share resources between your connections. Dangerous perhaps, but an API that encourages separation and allows sharing in a controlled way might prove very beneficial.

The ideal use case for PL/Java is a client that utilizes a connection pool, and most servlet containers and EJB servers do. Scenarios where you have just a few, fairly long-lived clients are OK too.

> One other thing is that separate processes give you the ability to
> parallelize. For example, if a Java function does an SPI query, it can
> receive and process results in parallel with the backend generating
> them. This may not be easy to achieve with an in-process JVM.
>
It is fairly easy to achieve using threads. Only one thread at a time may of course execute an SPI query, but that's true when multiple processes are in place too, since the backend is single-threaded and since the logical thread in PL/Java must utilize the same backend as where the call originated (to maintain the transaction boundaries). Any result must also sooner or later be delivered using that same backend, which further limits the ability to parallelize.

> Incidentally, there are compilers these days that can compile Java to
> native code. Is this Java stuff set up in such a way that you can compile
> your classes to native and load them directly, for the real speed-freaks?
>
PL/Java can be used with GCJ, although I don't think the GCJ compiler outranks the JIT compiler in a modern JVM. It can only do static optimizations, whereas the JIT has runtime heuristics to base its optimizations on. In the test results I've seen so far, the GCJ compiler only gets the upper hand in very simple tests. The JIT-generated code is faster when things are more complicated. GCJ is great if you're using short-lived connections (less startup time and everything is optimized from the very start), but the native code that it produces still needs a JVM of some sort. No interpreter of course, but classes must be initialized, a garbage collector must be running, etc. The shared native code results in some gain in memory consumption, but it's not as significant as one might think.

> In that case, maybe you should concentrate on reliability and flexibility
> and still have a way out for functions that *must* be high-performance.
>
Given time and enough resources, I'd like to provide the best of two worlds and give the user a choice of whether or not the JVM should be external. Ideally, this should be controlled using configuration parameters so that it's easy to test which scenario works best. It's a lot of work though.

It very much comes down to your point "Are you optimising for many short-lived connections or a few long-lived connections?" If the use cases for the former are fairly few, then I'm not sure it's worth the effort. In my experience, that is the case. People tend to use connection pools nowadays. But that's me and my opinion. It would be great if more people were involved in this discussion.

Regards,
Thomas Hallgren
Thomas Hallgren <thomas@tada.se> writes:
> The real downside is that a call from SQL to PL/Java using the current
> in-process approach is really fast. It takes about 5 microseconds on my
> 2.8GHz i386 box. The overhead of an IPC call on that box is about 18
> microseconds on Linux and 64 microseconds on Windows. That's an overhead
> of between 440% and 1300% due to context switching alone. Yet, for
> some applications, perhaps that overhead is acceptable?

It's only that much difference? Given all the other advantages of separating the JVM from the backends, I'd say you should gladly pay that price.

regards, tom lane
Tom Lane wrote:
> It's only that much difference? Given all the other advantages of
> separating the JVM from the backends, I'd say you should gladly pay
> that price.
>
If I'm right, and the most common scenario is clients using connection pools, then it's very likely that you don't get any advantages at all. Paying for nothing with a 440% increase in calling time (at best) seems expensive :-)

Regards,
Thomas Hallgren
Thomas Hallgren <thomas@tada.se> writes:
> If I'm right, and the most common scenario is clients using connection
> pools, then it's very likely that you don't get any advantages at all.
> Paying for nothing with a 440% increase in calling time (at best) seems
> expensive :-)

You are focused too narrowly on a few performance numbers. In my mind the primary advantage is that it will *work*. I do not actually believe that you'll ever get the embedded-JVM approach to production-grade reliability, because of the fundamental problems with threading, error processing, etc.

regards, tom lane
Tom Lane wrote:
> You are focused too narrowly on a few performance numbers. In my mind
> the primary advantage is that it will *work*. I do not actually believe
> that you'll ever get the embedded-JVM approach to production-grade
> reliability, because of the fundamental problems with threading, error
> processing, etc.
>
My focus with PL/Java over the last year has been to make it a production-grade product, and I think I've succeeded pretty well. The current list of open bugs is practically empty. What fundamental problems are you thinking of that haven't been solved already?

Regards,
Thomas Hallgren
On Mon, 2006-03-27 at 18:27 +0200, Thomas Hallgren wrote:
> Tom Lane wrote:
>> It's only that much difference? Given all the other advantages of
>> separating the JVM from the backends, I'd say you should gladly pay
>> that price.
>
> If I'm right, and the most common scenario is clients using connection
> pools, then it's very likely that you don't get any advantages at all.
> Paying for nothing with a 440% increase in calling time (at best) seems
> expensive :-)

Just some thoughts from afar: DB2 supports in-process and out-of-process external function calls (UDFs) that it refers to as UNFENCED and FENCED procedures. For Java only, IBM have moved to supporting *only* FENCED procedures for Java functions, i.e. having a single JVM for all connections. Each connection's Java function runs as a thread on a single dedicated JVM-only process.

That approach definitely does increase the invocation time, but it significantly reduces the resources associated with the JVM, as well as allowing memory management to be more controllable (bliss...). So the overall picture could be more CPU and memory resources for each connection in the connection pool.

If you have a few small Java functions, centralisation would not be good, but if you have a whole application architecture with many connections executing reasonable chunks of code then this can be a win. In that environment we used Java for major database functions, with SQL functions for small extensions.

Also, the Java invocation time we should be celebrating is that by having Java in the database the Java<->DB time is much less than it would be if we had a Java stack sitting on another server.

Best Regards, Simon Riggs
Hi Simon,
Thanks for your input. All good points. I actually did some work using Java stored procedures on DB2 a while back, but I had managed to forget (or repress :-) ) all about the FENCED/NOT FENCED stuff. The current discussion definitely puts it in a different perspective. I think PL/Java has a pretty good 'NOT FENCED' implementation, as do many other PLs, but no PL has yet come up with a FENCED solution.

This FENCED/NOT FENCED terminology would be a good way to differentiate between the two approaches. Any chance of that syntax making it into the PostgreSQL grammar, should the need arise?

Some more comments inline:

Simon Riggs wrote:
> Just some thoughts from afar: DB2 supports in-process and out-of-process
> external function calls (UDFs) that it refers to as UNFENCED and FENCED
> procedures. For Java only, IBM have moved to supporting *only* FENCED
> procedures for Java functions, i.e. having a single JVM for all
> connections.
>
Are you sure about this? As I recall it, a FENCED stored procedure executed in a remote JVM of its own. A parameter could be used that either caused a new JVM to be instantiated for each stored procedure call or to be kept for the duration of the session. The former would yield really horrible performance but keep memory utilization at a minimum. The latter would get more acceptable performance but waste more memory (on par with PL/Java today).

> Each connection's Java function runs as a thread on a
> single dedicated JVM-only process.
>
If that were true, then different threads could share dirty session data. I wanted to do that using DB2 but found it impossible. That was a while back though.

> That approach definitely does increase the invocation time, but it
> significantly reduces the resources associated with the JVM, as well as
> allowing memory management to be more controllable (bliss...). So the
> overall picture could be more CPU and memory resources for each
> connection in the connection pool.
>
My very crude measurements indicate that the overhead of using a separate JVM is between 6-15MB of real memory per connection. Today, you get about 10MB/$ and servers configured with 4GB RAM or more are not uncommon.

I'm not saying that the overhead doesn't matter. Of course it does. But the time when you needed to be extremely conservative with memory usage has passed. It might be far less expensive to buy some extra memory than to invest in SMP architectures to minimize IPC overhead.

My point is, even fairly large app-servers (using connection pools with up to 200 simultaneous connections) can run using relatively inexpensive boxes, such as an AMD64-based server with 4GB RAM, and show very good throughput with the current implementation.

> If you have a few small Java functions, centralisation would not be good,
> but if you have a whole application architecture with many connections
> executing reasonable chunks of code then this can be a win.
>
One thing to remember is that a 'chunk of code' that executes in a remote JVM and uses JDBC will be hit by the IPC overhead on each interaction over the JDBC connection. I.e. the overhead is not just limited to the actual call of the UDF; it's also imposed on all database accesses that the UDF makes in turn.

> In that environment we used Java for major database functions, with SQL
> functions for small extensions.
>
My guess is that those major database functions did a fair amount of JDBC. Am I right?

> Also, the Java invocation time we should be celebrating is that by having
> Java in the database the Java<->DB time is much less than it would be if
> we had a Java stack sitting on another server.
>
I think the cases when you have a Tomcat or JBoss sitting on the same physical server as the actual database are very common, one major reason being that you don't want network overhead between the middle tier and the backend. Moving logic into the database instead of keeping it in the middle tier is often done to get rid of the last hurdle, the overhead of IPC.

Regards,
Thomas Hallgren
Thomas Hallgren <thomas@tada.se> writes:
> This FENCED/NOT FENCED terminology would be a good way to
> differentiate between the two approaches. Any chance of that syntax
> making it into the PostgreSQL grammar, should the need arise?

Of what value would it be to have it in the grammar? The behavior would be entirely internal to any particular PL in any case.

regards, tom lane
Tom Lane wrote:
> Thomas Hallgren <thomas@tada.se> writes:
>> This FENCED/NOT FENCED terminology would be a good way to
>> differentiate between the two approaches. Any chance of that syntax
>> making it into the PostgreSQL grammar, should the need arise?
>
> Of what value would it be to have it in the grammar? The behavior would
> be entirely internal to any particular PL in any case.
>
Not necessarily, but perhaps the term FENCED is incorrect for the concept that I have in mind. All languages that are implemented using a VM could benefit from the same remote UDF protocol: Java, C#, perhaps even Perl or Ruby. The flag that I'd like to have would control 'in-process' versus 'remote'. I'm not too keen on the term FENCED, since in the PL/Java case it will lead to poorer isolation. Multiple threads running in the same JVM will be able to share data, and a JVM crash will affect all connected sessions.

Then again, perhaps it's a bad idea to have this in the function declaration in the first place. A custom GUC parameter might be a better choice. It will not be possible to have some functions use the in-process approach and others execute remotely, but I doubt that will matter that much.

I'm still eager to hear what it is in the current PL/Java that you consider fundamentally unresolvable problems.

Regards,
Thomas Hallgren
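[Editorial note: a custom GUC of the kind proposed above might look something like the following in postgresql.conf. The class and parameter names are hypothetical, not existing PL/Java settings; note also that in the 8.x era a custom variable class had to be declared before its parameters could be set.]

```ini
# Hypothetical PL/Java settings (names invented for illustration)
custom_variable_classes = 'pljava'
pljava.vm = 'remote'             # 'in_process' or 'remote'
pljava.vm_startup = 'postmaster' # which process starts the external JVM
```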
On Tue, 2006-03-28 at 17:48 +0200, Thomas Hallgren wrote: > Simon Riggs wrote: > > Just some thoughts from afar: DB2 supports in-process and out-of-process > > external function calls (UDFs) that it refers to as UNFENCED and FENCED > > procedures. For Java only, IBM have moved to supporting *only* FENCED > > procedures for Java functions, i.e. having a single JVM for all > > connections. > > > Are you sure about this? Yes. > As I recall it a FENCED stored procedure executed in a remote JVM > of it's own. A parameter could be used that either caused a new JVM to be instantiated for > each stored procedure call or to be kept for the duration of the session. The former would > yield really horrible performance but keep memory utilization at a minimum. The latter would > get a more acceptable performance but waste more memory (in par with PL/Java today). In the previous release, yes. > > That approach definitely does increase the invocation time, but it > > significantly reduces the resources associated with the JVM, as well as > > allowing memory management to be more controllable (bliss...). So the > > overall picture could be more CPU and memory resources for each > > connection in the connection pool. > > > My very crude measurements indicate that the overhead of using a separate JVM is between > 6-15MB of real memory per connection. Today, you get about 10MB/$ and servers configured > with 4GB RAM or more are not uncommon. > > I'm not saying that the overhead doesn't matter. Of course it does. But the time when you > needed to be extremely conservative with memory usage has passed. It might be far less > expensive to buy some extra memory then to invest in SMP architectures to minimize IPC overhead. > > My point is, even fairly large app-servers (using connection pools with up to 200 > simultaneous connections) can run using relatively inexpensive boxes such as an AMD64 based > server with 4GB RAM and show very good throughput with the current implementation. 
Memory is cheap, memory bandwidth is not. All CPUs have limited cache resources, so the more memory you waste, the less efficient your CPUs will be. That affects the way you do things, sure. 1GB lookup table: no problem. 10MB wasted memory retrieval: lots of dead CPU time. > > If you have a few small Java functions centralisation would not be good, > > but if you have a whole application architecture with many connections > > executing reasonable chunks of code then this can be a win. > > > One thing to remember is that a 'chunk of code' that executes in a remote JVM and uses > JDBC will be hit by the IPC overhead on each interaction over the JDBC connection. I.e. the > overhead is not just limited to the actual call of the UDF, it's also imposed on all > database accesses that the UDF makes in turn. > > > > In that environment we used Java for major database functions, with SQL > > functions for small extensions. > > > My guess is that those major database functions did a fair amount of JDBC. Am I right? Not once I'd reviewed them... > > Also the Java invocation time we should be celebrating is that by having > > Java in the database the Java<->DB time is much less than it would be if > > we had a Java stack sitting on another server. > > > > I think the cases when you have a Tomcat or JBoss sitting on the same physical server as the > actual database are very common. One major reason being that you don't want network overhead > between the middle tier and the backend. Moving logic into the database instead of keeping > it in the middle tier is often done to get rid of the last hurdle, the overhead of IPC. I can see the performance argument for both, but supporting both, especially in a mix-and-match architecture, is much harder. Anyway, just trying to add some additional perspective. Best Regards, Simon Riggs
On 28-Mar-06, at 10:48 AM, Thomas Hallgren wrote: > Hi Simon, > Thanks for your input. All good points. I actually did some work > using Java stored procedures on DB2 a while back but I had managed > to forget (or repress :-) ) all about the FENCED/NOT FENCED stuff. > The current discussion definitely puts it in a different > perspective. I think PL/Java has a pretty good 'NOT FENCED' > implementation, as do many other PLs, but no PL has yet come up > with a FENCED solution. What exactly is a FENCED solution? If it is simply a remote connection to a single JVM, then pl-j already does that. > > This FENCED/NOT FENCED terminology would be a good way to > differentiate between the two approaches. Any chance of that syntax > making it into the PostgreSQL grammar, should the need arise? > > Some more comments inline: > > Simon Riggs wrote: >> Just some thoughts from afar: DB2 supports in-process and out-of- >> process >> external function calls (UDFs) that it refers to as UNFENCED and >> FENCED >> procedures. For Java only, IBM have moved to supporting *only* FENCED >> procedures for Java functions, i.e. having a single JVM for all >> connections. > > > Are you sure about this? As I recall it, a FENCED stored procedure > executed in a remote JVM of its own. A parameter could be used > that either caused a new JVM to be instantiated for each stored > procedure call or to be kept for the duration of the session. The > former would yield really horrible performance but keep memory > utilization at a minimum. The latter would get a more acceptable > performance but waste more memory (on par with PL/Java today). > > >> Each connection's Java function runs as a thread on a >> single dedicated JVM-only process. > If that were true, then different threads could share dirty session > data. I wanted to do that using DB2 but found it impossible. That > was a while back though. 
> >> That approach definitely does increase the invocation time, but it >> significantly reduces the resources associated with the JVM, as >> well as >> allowing memory management to be more controllable (bliss...). So the >> overall picture could be more CPU and memory resources for each >> connection in the connection pool. > My very crude measurements indicate that the overhead of using a > separate JVM is between 6-15MB of real memory per connection. > Today, you get about 10MB/$ and servers configured with 4GB RAM or > more are not uncommon. > > I'm not saying that the overhead doesn't matter. Of course it does. > But the time when you needed to be extremely conservative with > memory usage has passed. It might be far less expensive to buy some > extra memory than to invest in SMP architectures to minimize IPC > overhead. > > My point is, even fairly large app-servers (using connection pools > with up to 200 simultaneous connections) can run using relatively > inexpensive boxes such as an AMD64 based server with 4GB RAM and > show very good throughput with the current implementation. > > >> If you have a few small Java functions centralisation would not be >> good, >> but if you have a whole application architecture with many >> connections >> executing reasonable chunks of code then this can be a win. > One thing to remember is that a 'chunk of code' that executes in > a remote JVM and uses JDBC will be hit by the IPC overhead on each > interaction over the JDBC connection. I.e. the overhead is not just > limited to the actual call of the UDF, it's also imposed on all > database accesses that the UDF makes in turn. > > >> In that environment we used Java for major database functions, >> with SQL >> functions for small extensions. > My guess is that those major database functions did a fair amount > of JDBC. Am I right? 
> > >> Also the Java invocation time we should be celebrating is that by >> having >> Java in the database the Java<->DB time is much less than it would >> be if >> we had a Java stack sitting on another server. > > I think the cases when you have a Tomcat or JBoss sitting on the > same physical server as the actual database are very common. One > major reason being that you don't want network overhead between the > middle tier and the backend. Moving logic into the database instead > of keeping it in the middle tier is often done to get rid of the > last hurdle, the overhead of IPC. > > > Regards, > Thomas Hallgren > > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
On 28-Mar-06, at 12:11 PM, Thomas Hallgren wrote: > Tom Lane wrote: >> Thomas Hallgren <thomas@tada.se> writes: >> >>> This FENCED/NOT FENCED terminology would be a good way to >>> differentiate between the two approaches. Any chance of that syntax >>> making it into the PostgreSQL grammar, should the need arise? >>> >> >> Of what value would it be to have it in the grammar? The behavior >> would >> be entirely internal to any particular PL in any case. >> >> > Not necessarily but perhaps the term FENCED is incorrect for the > concept that I have in mind. > > All languages that are implemented using a VM could benefit from > the same remote UDF protocol. Java, C#, perhaps even Perl or Ruby. > The flag that I'd like to have would control 'in-process' versus > 'remote'. > > I'm not too keen on the term FENCED, since it, in the PL/Java case > will lead to poorer isolation. Multiple threads running in the same > JVM will be able to share data and a JVM crash will affect all > connected sessions. When was the last time you saw a JVM crash? These are very rare now. In any case, if it does fail, it's a JVM bug and can happen to any code running and take the server down if it is in process. > > Then again, perhaps it's a bad idea to have this in the function > declaration in the first place. A custom GUC parameter might be a > better choice. It will not be possible to have some functions use > the in-process approach and others to execute remotely but I doubt > that will matter that much. > > I'm still eager to hear what it is in the current PL/Java that you > consider fundamental unresolvable problems. > > Regards, > Thomas Hallgren
Dave Cramer wrote: > > What exactly is a FENCED solution ? If it is simply a remote > connection to a single JVM then pl-j already does that. The last time I tried to use pl-j (in order to build a mutual test platform), I didn't manage to make it compile due to missing artifacts, and it wasn't ported to Windows. Laszlo filed a JIRA bug on that but since then (August last year) I've seen no activity in the project. Is it still alive? Is anyone using it? Regards, Thomas Hallgren
The last time I talked to him, Laszlo said he was working on it again. Dave On 28-Mar-06, at 2:21 PM, Thomas Hallgren wrote: > Dave Cramer wrote: >> >> What exactly is a FENCED solution ? If it is simply a remote >> connection to a single JVM then pl-j already does that. > Last time I tried to use pl-j (in order to build a mutual test > platform), I didn't manage to make it compile due to missing > artifacts and it wasn't ported to Windows. Lazslo filed a JIRA bug > on that but since then (August last year) I've seen no activity in > the project. Is it still alive? Is anyone using it? > > Regards, > Thomas Hallgren > >
Dave Cramer wrote: > >> >> I'm not too keen on the term FENCED, since it, in the PL/Java case >> will lead to poorer isolation. Multiple threads running in the same >> JVM will be able to share data and a JVM crash will affect all >> connected sessions. > When was the last time you saw a JVM crash ? These are very rare now. I think that's somewhat dependent on what JVM you're using. For the commercial ones, BEA, IBM, and Sun, I fully agree. > In any case if it does fail, it's a JVM bug and can happen to any code > running and take the server down if it is in process. Crash is perhaps not the right word. My point concerned the level of isolation. Code that is badly written may have a serious impact on other threads in the same JVM. Let's say you cause an OutOfMemoryError or an endless loop. The former will render the JVM completely useless and the latter will cause low scheduling priority. If the same thing happens using an in-process JVM, the problem is isolated to that one session. Regards, Thomas Hallgren
Thomas Hallgren wrote: > Martijn, > > I tried a Socket approach. Using the new IO stuff that arrived with Java > 1.4 (SocketChannel etc.), the performance is really good. Especially on > Linux, where an SMP machine shows a 1 to 1.5 ratio between one process > doing ping-pong between two threads and two processes doing ping-pong > using a socket. That's acceptable overhead indeed and I don't think I'll > be able to trim it much using a shared memory approach (the thread > scenario uses Java monitor locks. That's the most efficient lightweight > locking implementation I've come across). > > One downside is that on a Windows box, the ratio between the threads and > the processes scenario seems to be 1 to 5, which is a bit worse. I've > heard that Solaris too is less efficient than Linux in this respect. > > The real downside is that a call from SQL to PL/Java using the current > in-process approach is really fast. It takes about 5 microseconds on my > 2.8GHz i386 box. The overhead of an IPC call on that box is about 18 > microseconds on Linux and 64 microseconds on Windows. That's an overhead of > between 440% and 1300% due to context switching alone. Yet, for some > applications, perhaps that overhead is acceptable? It should be compared > to the high memory consumption that the in-process approach undoubtedly > results in (which in turn might lead to less optimal use of CPU caches > and, if memory is insufficient, more time spent doing swapping). > > Given those numbers, it would be interesting to hear what the community > as a whole thinks about this. Assuming by "community" you mean developers not normally involved in hackers, then: 1) As a developer, the required debugging time increases greatly when one session can affect (or crash) all the other sessions. This in turn drives up the cost of development. 
Unless some guarantees could be had against this sort of intermittent runtime bugginess, I would be less likely to opt for PL/Java and expose myself to the potential cost overruns. 2) As a speed freak, I'm going to code things in C, not Java. So the appeal of Java must come from something other than speed, such as stability and faster development cycles. My opinion is that it all depends on whether you can hammer down a reliable solution that has the necessary stability guarantees. Splitting the middle, trying to get performance benefits at the cost of stability, would seem to make PL/Java a sort of lukewarm solution on the speed side, and a lukewarm solution on the stability side. I doubt I could get excited about it. mark
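[Editor's note: the SocketChannel ping-pong measurement Thomas describes earlier in the thread can be sketched roughly as below. This is a minimal sketch, not PL/Java code: both endpoints here are threads inside one process for brevity, so it will understate the cross-process context-switch cost he measured (his two-process numbers were ~18 µs on Linux, ~64 µs on Windows); the `PingPong` class and `measure` method names are my own.]

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class PingPong {

    // Bounce a single byte back and forth 'rounds' times over a loopback
    // socket and return the average round-trip time in nanoseconds.
    static long measure(int rounds) throws Exception {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port

        // Echo peer: stands in for the JVM-side thread serving one backend.
        Thread echo = new Thread(() -> {
            try (SocketChannel peer = server.accept()) {
                ByteBuffer buf = ByteBuffer.allocateDirect(1);
                for (int i = 0; i < rounds; i++) {
                    buf.clear();
                    if (peer.read(buf) < 0) return; // blocking read: the "ping"
                    buf.flip();
                    peer.write(buf);                // echo it back: the "pong"
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        echo.start();

        long perRound;
        try (SocketChannel client =
                 SocketChannel.open(new InetSocketAddress("127.0.0.1",
                     ((InetSocketAddress) server.getLocalAddress()).getPort()))) {
            ByteBuffer buf = ByteBuffer.allocateDirect(1);
            long start = System.nanoTime();
            for (int i = 0; i < rounds; i++) {
                buf.clear();
                buf.put((byte) 1).flip();
                client.write(buf);   // send the ping
                buf.clear();
                client.read(buf);    // block until the pong arrives
            }
            perRound = (System.nanoTime() - start) / rounds;
        }
        echo.join();
        server.close();
        return perRound;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("avg round trip: " + measure(10_000) + " ns");
    }
}
```

One byte per direction is enough to carry a "work is ready" signal; in the design discussed upthread, larger payloads would travel through shared memory while the socket provides the blocking rendezvous, exactly as in the X-server scheme Martijn described.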