Thread: Remote PL/Java, Summary

Remote PL/Java, Summary

From
Thomas Hallgren
Date:
Hi all,
And thanks for very good input regarding a remote alternative to PL/Java 
(thread titled "Shared Memory"). I'm convinced that such an alternative 
would be a great addition to PL/Java and increase the number of users. 
The work to create such a platform that has the stability and quality of 
today's PL/Java is significant (I really do think it is a 
production-grade product today). So significant in fact, that I'm 
beginning to think of a third alternative. An alternative that would 
combine the performance of using in-process calls with the benefits of 
sharing a JVM. The answer is of course to make the backend multi-threaded.

This question has been debated before and always promptly rejected. One 
major reason is of course that it will not bring any benefits over the 
current multi-process approach on a majority of the platforms where 
PostgreSQL is used. A process-switch is just as fast as a thread-switch 
on Linux based systems. Over the last year however, something has happened 
that certainly speaks in favor of multi-threading. PostgreSQL is 
getting widely adopted on Windows. On Windows, a process-switch is at 
least 5 times more expensive than a thread-switch. In order to do 
appropriate locking, PostgreSQL is forced to do a fair amount of 
switching during transaction processing so the gain in using a 
multi-threaded approach on Windows is probably significant. The same is 
true for other OSes where process-switching is relatively expensive.

There are other benefits as well. PostgreSQL would no longer need shared 
memory and semaphores and a lot more resources could be shared between 
backend processes. The one major drawback of a multi-threaded approach 
(the one that's been the main argument for the defenders of the current 
approach) is vulnerability. If one thread is messing things up, then the 
whole system will be brought to a halt (on the other hand, that can be 
said about the current shared-memory approach as well). The cure for 
this is to have a system that, to the extent possible, prevents this 
from happening. How would that be possible? Well, such systems are 
widely used today. Huge companies use them in mission critical 
applications all over the world. They are called Virtual Machines. Two 
types in particular are gaining more and more ground. The .NET based CLR 
and the Java VM.

Although there's an Open Source initiative called Mono that implements 
the CLR, I still don't see it as a viable alternative to create a 
production-grade multi-platform database. Microsoft's CLR is of course 
confined to Microsoft platforms. The Java VMs are however a different 
matter altogether. And with the java.nio.channels package that was 
introduced in Java 1.4 and the java.util.concurrent package from Java 
5.0, Java has taken major steps forward in being a very feasible 
platform for a database implementation. There's actually nothing 
stopping you from doing a high-performance MVCC system using Java today. 
A SQL parser would be based on JavaCC technology (the grammar is already 
written although it needs small adjustments to comply with the 
PostgreSQL dialect). Lots of technology is there out-of-the-box such as 
regular expressions, hash-maps, linked lists, etc. Not to forget an 
exceptionally great threading system, now providing atomic operations, 
semaphores, copy-on-write arrays etc. In short, everything that a 
database implementor could ever wish for.
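
As a rough illustration of the java.util.concurrent pieces named above (an atomic counter, a semaphore and a copy-on-write list), here is a minimal sketch. It is a toy under stated assumptions, not a database design: the class ConcurrencySketch, its fields and its methods are all hypothetical names made up for this example, and it uses lambda syntax from later Java releases for brevity.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// Toy illustration only: every name here is hypothetical and has nothing
// to do with the actual PostgreSQL or PL/Java sources.
public class ConcurrencySketch {
    // Atomic counter that hands out transaction ids without explicit locking.
    final AtomicLong nextXid = new AtomicLong(1);

    // Copy-on-write list: readers iterate over a stable snapshot while
    // writers append concurrently.
    final List<String> committed = new CopyOnWriteArrayList<>();

    // Semaphore that allows at most two threads into the commit section.
    final Semaphore commitSlots = new Semaphore(2);

    // Takes a fresh id, then records the payload while holding a commit slot.
    long runTransaction(String payload) throws InterruptedException {
        long xid = nextXid.getAndIncrement();
        commitSlots.acquire();
        try {
            committed.add(xid + ":" + payload);
        } finally {
            commitSlots.release();
        }
        return xid;
    }

    // Starts n threads in one JVM, waits for all of them, and returns
    // the number of recorded entries.
    int runWorkers(int n) {
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                try {
                    runTransaction("worker-" + id);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return committed.size();
    }

    public static void main(String[] args) {
        ConcurrencySketch sketch = new ConcurrencySketch();
        int total = sketch.runWorkers(8);
        System.out.println("committed entries: " + total);
        System.out.println("next xid: " + sketch.nextXid.get());
    }
}
```

The point of the sketch is only that all workers share one heap and coordinate through library primitives, with no shared-memory segment or OS-level semaphore set to configure.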

The third alternative for PL/Java, an approach that gets more viable 
every minute I think about it, is to implement the PostgreSQL backend 
completely in Java. I'm involved in the development of one of the 
commercial JVMs. I know that an enormous amount of resources are 
constantly devoted to performance optimizations. The days when a complex 
system written in C or C++ could outperform a JVM have passed. A static 
optimizer can only do so well. A JVM that collects heuristics, 
communicates with the CPU about cache usage etc., can be a great deal 
smarter on how the final machine code will be optimized, and 
re-optimized should the conditions change. It would be great if 
PostgreSQL could benefit from all this research.

If a commercial JVM is perceived as a problem, then combine^h^h^hpile 
the code with GNU gcj instead of gcc like today.

The list of advantages can be made a mile long. There's no point in 
listing everything here. From my own standpoint, I'm of course thinking 
first and foremost about the advantages with PL/Java. It will become the 
absolute most efficient PL of them all. Other languages, for which no 
good Java implementation exists (I'm thinking Jython for Python, etc.), 
can be implemented using JNI. The most common functions used by say, 
PL/Perl could probably be implemented as callbacks into the Java domain 
in order to make the changes in the respective PL minimal.

Opinions? Suggestions?

Kind Regards,
Thomas Hallgren




Re: Remote PL/Java, Summary

From
Andrew Dunstan
Date:

Thomas Hallgren wrote:

> Hi all,
> And thanks for very good input regarding a remote alternative to 
> PL/Java (thread titled "Shared Memory"). I'm convinced that such an 
> alternative would be a great addition to PL/Java and increase the 
> number of users. The work to create such a platform that has the 
> stability and quality of today's PL/Java is significant (I really do 
> think it is a production-grade product today). So significant in fact, 
> that I'm beginning to think of a third alternative. An alternative 
> that would combine the performance of using in-process calls with the 
> benefits of sharing a JVM. The answer is of course to make the backend 
> multi-threaded.
>
> This question has been debated before and always promptly rejected. 
> One major reason is of course that it will not bring any benefits over 
> the current multi-process approach on a majority of the platforms 
> where PostgreSQL is used. A process-switch is just as fast as a 
> thread-switch on Linux based systems. Over the last year however, 
> something has happened that certainly speaks in favor of 
> multi-threading. PostgreSQL is getting widely adopted on Windows. On 
> Windows, a process-switch is at least 5 times more expensive than a 
> thread-switch. In order to do appropriate locking, PostgreSQL is forced 
> to do a fair amount of switching during transaction processing so the 
> gain in using a multi-threaded approach on Windows is probably 
> significant. The same is true for other OSes where process-switching 
> is relatively expensive.
>
> There are other benefits as well. PostgreSQL would no longer need 
> shared memory and semaphores and a lot more resources could be shared 
> between backend processes. The one major drawback of a multi-threaded 
> approach (the one that's been the main argument for the defenders of 
> the current approach) is vulnerability. If one thread is messing 
> things up, then the whole system will be brought to a halt (on the 
> other hand, that can be said about the current shared-memory approach 
> as well). The cure for this is to have a system that, to the extent 
> possible, prevents this from happening. How would that be possible? 
> Well, such systems are widely used today. Huge companies use them in 
> mission critical applications all over the world. They are called 
> Virtual Machines. Two types in particular are gaining more and more 
> ground. The .NET based CLR and the Java VM.
>
> Although there's an Open Source initiative called Mono that implements 
> the CLR, I still don't see it as a viable alternative to create a 
> production-grade multi-platform database. Microsoft's CLR is of course 
> confined to Microsoft platforms. The Java VMs are however a different 
> matter altogether. And with the java.nio.channels package that was 
> introduced in Java 1.4 and the java.util.concurrent package from Java 
> 5.0, Java has taken major steps forward in being a very feasible 
> platform for a database implementation. There's actually nothing 
> stopping you from doing a high-performance MVCC system using Java 
> today. A SQL parser would be based on JavaCC technology (the grammar 
> is already written although it needs small adjustments to comply with 
> the PostgreSQL dialect). Lots of technology is there out-of-the-box 
> such as regular expressions, hash-maps, linked lists, etc. Not to 
> forget an exceptionally great threading system, now providing atomic 
> operations, semaphores, copy-on-write arrays etc. In short, everything 
> that a database implementor could ever wish for.
>
> The third alternative for PL/Java, an approach that gets more viable 
> every minute I think about it, is to implement the PostgreSQL backend 
> completely in Java. I'm involved in the development of one of the 
> commercial JVMs. I know that an enormous amount of resources are 
> constantly devoted to performance optimizations. The days when a 
> complex system written in C or C++ could outperform a JVM have passed. 
> A static optimizer can only do so well. A JVM that collects 
> heuristics, communicates with the CPU about cache usage etc., can be a 
> great deal smarter on how the final machine code will be optimized, 
> and re-optimized should the conditions change. It would be great if 
> PostgreSQL could benefit from all this research.
>
> If a commercial JVM is perceived as a problem, then combine^h^h^hpile 
> the code with GNU gcj instead of gcc like today.
>
> The list of advantages can be made a mile long. There's no point in 
> listing everything here. From my own standpoint, I'm of course 
> thinking first and foremost about the advantages with PL/Java. It will 
> become the absolute most efficient PL of them all. Other languages, 
> for which no good Java implementation exists (I'm thinking Jython for 
> Python, etc.), can be implemented using JNI. The most common functions 
> used by say, PL/Perl could probably be implemented as callbacks into 
> the Java domain in order to make the changes in the respective PL 
> minimal.
>
>

We already do use threads on Windows to a limited extent to do things 
like timers and pseudo-signal handling.

If this were a greenfields project then your arguments would have force. 
But for how long would you like to suspend Postgres development activity 
while we re-implement everything in Java? Not to mention the effort to 
recruit new developers to replace those who leave because they can't or 
don't want to be part of the effort.

For better or worse, PostgreSQL is written in C, and I can't see that 
changing.

It might be interesting to take a frozen code base for PostgreSQL and 
reimplement it in Java, and then run some comparisons, both for 
performance and crash stability. I just counted roughly 100k lines of 
source code, so a reimplementation effort would be distinctly non-trivial.

cheers

andrew


Re: Remote PL/Java, Summary

From
Andrew Dunstan
Date:

Andrew Dunstan wrote:

>
> We already do use threads on Windows to a limited extent to do things 
> like timers and pseudo-signal handling.
>
> If this were a greenfields project then your arguments would have 
> force. But for how long would you like to suspend Postgres development 
> activity while we re-implement everything in Java? Not to mention the 
> effort to recruit new developers to replace those who leave because 
> they can't or don't want to be part of the effort.
>
> For better or worse, PostgreSQL is written in C, and I can't see that 
> changing.
>
> It might be interesting to take a frozen code base for PostgreSQL and 
> reimplement it in Java, and then run some comparisons, both for 
> performance and crash stability. I just counted roughly 100k lines of 
> source code, so a reimplementation effort would be distinctly 
> non-trivial.
>

and a happy April 1 to you too, btw.

cheers

andrew


Re: Remote PL/Java, Summary

From
Thomas Hallgren
Date:
Andrew Dunstan wrote:
>
> and a happy April 1 to you too, btw.
>
;-)

- thomas