Remote PL/Java, Summary - Mailing list pgsql-hackers

From Thomas Hallgren
Subject Remote PL/Java, Summary
Msg-id 442E4FEC.9050304@tada.se
Responses Re: Remote PL/Java, Summary
List pgsql-hackers
Hi all,
Thanks for the very good input regarding a remote alternative to PL/Java 
(thread titled "Shared Memory"). I'm convinced that such an alternative 
would be a great addition to PL/Java and increase the number of users. 
The work to create such a platform with the stability and quality of 
today's PL/Java is significant (I really do think it is a 
production-grade product today). So significant, in fact, that I'm 
beginning to think of a third alternative. An alternative that would 
combine the performance of using in-process calls with the benefits of 
sharing a JVM. The answer is of course to make the backend multi-threaded.

This question has been debated before and always promptly rejected. One 
major reason is of course that it will not bring any benefits over the 
current multi-process approach on a majority of the platforms where 
PostgreSQL is used. A process-switch is just as fast as a thread-switch 
on Linux-based systems. Over the last year, however, something has happened 
that certainly speaks in favor of multi-threading: PostgreSQL is 
getting widely adopted on Windows. On Windows, a process-switch is at 
least 5 times more expensive than a thread-switch. To achieve 
appropriate locking, PostgreSQL is forced to do a fair amount of 
switching during transaction processing, so the gain from a 
multi-threaded approach on Windows is probably significant. The same is 
true for other operating systems where process-switching is relatively expensive.
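The switch-cost argument can be probed empirically, at least on the thread side. The following is a rough micro-benchmark sketch, not a measurement from this post: two Java threads hand a token back and forth through `java.util.concurrent.SynchronousQueue`, so each round trip requires two hand-offs between threads. The class name `HandoffBench` is made up for illustration, and the figures it prints depend heavily on OS, JVM and CPU. Comparing against process switches would need a separate two-process ping-pong (for example over a pipe), which is not shown here.

```java
import java.util.concurrent.SynchronousQueue;

// Rough micro-benchmark sketch (hypothetical class name). Two threads pass a
// token back and forth through SynchronousQueues, so every round trip needs
// two hand-offs between threads. Only the thread side is measured here.
final class HandoffBench {

    /** Returns the average nanoseconds per round trip over the given number of exchanges. */
    static double nanosPerRoundTrip(int rounds) throws InterruptedException {
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();

        // Echo thread: takes each token and immediately hands it back.
        Thread echo = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    pong.put(ping.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        echo.setDaemon(true); // never keeps the JVM alive if the main loop fails
        echo.start();

        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            ping.put(i);
            if (pong.take() != i) {
                throw new IllegalStateException("token mismatch");
            }
        }
        long elapsed = System.nanoTime() - start;
        echo.join();
        return (double) elapsed / rounds;
    }

    public static void main(String[] args) throws InterruptedException {
        nanosPerRoundTrip(10_000); // warm-up so the JIT has compiled the loops
        System.out.printf("~%.0f ns per round trip%n", nanosPerRoundTrip(100_000));
    }
}
```

Running it on Linux and on Windows with the same hardware would give a first indication of how thread hand-off costs compare. Since `SynchronousQueue` may spin briefly before parking a thread, a multi-core machine can report lower figures than a full context switch would cost.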

There are other benefits as well. PostgreSQL would no longer need shared 
memory and semaphores, and a lot more resources could be shared between 
backend processes. The one major drawback of a multi-threaded approach 
(the one that's been the main argument for the defenders of the current 
approach) is vulnerability. If one thread is messing things up, then the 
whole system will be brought to a halt (on the other hand, that can be 
said about the current shared-memory approach as well). The cure for 
this is to have a system that, to the extent possible, prevents this 
from happening. How would that be possible? Well, such systems are 
widely used today. Huge companies use them in mission critical 
applications all over the world. They are called Virtual Machines. Two 
types in particular are gaining more and more ground: the .NET-based CLR 
and the Java VM.
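To make the resource-sharing point concrete, here is a minimal sketch, assuming a thread-per-session backend running inside one JVM. All sessions share an ordinary heap object (a hypothetical plan cache) instead of a shared-memory segment guarded by semaphores; `ConcurrentHashMap.computeIfAbsent` ensures each entry is built at most once, however many sessions ask for it concurrently. `ThreadedBackend`, `SharedPlanCache` and the fake `"PLAN(...)"` strings are illustrative names, not PostgreSQL or PL/Java APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a thread-per-session backend in one JVM. All names here are
// hypothetical; the shared cache lives on the ordinary Java heap.
final class ThreadedBackend {

    /** A cache shared by every session thread; no shared memory or semaphores. */
    static final class SharedPlanCache {
        private final ConcurrentHashMap<String, String> plans = new ConcurrentHashMap<>();
        private final AtomicInteger built = new AtomicInteger();

        /** Returns the plan for a query; the (fake) planning step runs at most once per query. */
        String planFor(String sql) {
            return plans.computeIfAbsent(sql, q -> {
                built.incrementAndGet(); // counts how often planning really happened
                return "PLAN(" + q + ")";
            });
        }

        int timesBuilt() {
            return built.get();
        }
    }

    /** Runs the given number of concurrent sessions, each issuing every query once. */
    static SharedPlanCache run(int sessions, List<String> queries) throws Exception {
        SharedPlanCache cache = new SharedPlanCache();
        ExecutorService pool = Executors.newFixedThreadPool(sessions);
        try {
            List<Future<?>> done = new ArrayList<>();
            for (int s = 0; s < sessions; s++) {
                done.add(pool.submit(() -> {
                    for (String q : queries) {
                        cache.planFor(q);
                    }
                }));
            }
            for (Future<?> f : done) {
                f.get(); // rethrows (wrapped) any exception raised inside a session
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
        return cache;
    }

    public static void main(String[] args) throws Exception {
        SharedPlanCache cache = run(8, List.of("SELECT 1", "SELECT 2", "SELECT 1"));
        System.out.println("plans built: " + cache.timesBuilt()); // prints "plans built: 2"
    }
}
```

The sketch only covers sharing; it does nothing about the isolation concern raised above, which in this proposal is left to the VM's own safety guarantees.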

Although there's an Open Source initiative called Mono that implements 
the CLR, I still don't see it as a viable alternative to create a 
production-grade multi-platform database. Microsoft's CLR is of course 
confined to Microsoft platforms. Java VMs, however, are a different 
matter altogether. With the java.nio.channels package that was 
introduced in Java 1.4 and the java.util.concurrent package from Java 
5.0, Java has taken major steps forward in becoming a very feasible 
platform for a database implementation. There's actually nothing 
stopping you from doing a high-performance MVCC system using Java today. 
A SQL parser would be based on JavaCC technology (the grammar is already 
written although it needs small adjustments to comply with the 
PostgreSQL dialect). Lots of technology is there out-of-the-box such as 
regular expressions, hash-maps, linked lists, etc. Not to forget an 
exceptionally great threading system, now providing atomic operations, 
semaphores, copy-on-write arrays etc. In short, everything that a 
database implementor could ever wish for.
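As an illustration of how far `java.util.concurrent` alone gets you, here is a minimal MVCC key/value sketch. It is an assumption-laden toy, not PostgreSQL's design: snapshots are commit timestamps rather than PostgreSQL's xmin/xmax/in-progress lists, and there is no write-write conflict detection, no deletes and no cleanup of old versions. Reads walk immutable version chains held in a `ConcurrentHashMap` without locking; writes use the map's per-key atomic `compute`; only commit and snapshot acquisition take a short lock so that a snapshot never observes a half-published commit. `MvccStore` and `Tx` are hypothetical names, and the `record` requires Java 16 or later.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Minimal MVCC key/value sketch (hypothetical names, Java 16+ for records).
// Simplifications: commit-timestamp snapshots, no write-write conflict
// detection, no deletes, no cleanup of old versions.
final class MvccStore {

    /** Immutable version node: newest first, older versions reachable via prev. */
    private record Version(String value, long creatorXid, Version prev) {}

    private final ConcurrentHashMap<String, Version> chains = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<Long, Long> commitTs = new ConcurrentHashMap<>();
    private final AtomicLong nextXid = new AtomicLong(1);
    private final Object clockLock = new Object();
    private long commitClock = 0; // guarded by clockLock

    final class Tx {
        final long xid = nextXid.getAndIncrement();
        final long snapshot;

        Tx() {
            // Taken under the same lock as commit(), so every commit with a
            // timestamp <= snapshot is already recorded in commitTs.
            synchronized (clockLock) {
                snapshot = commitClock;
            }
        }

        /** Prepends a new version atomically; readers keep seeing the older ones. */
        void put(String key, String value) {
            chains.compute(key, (k, head) -> new Version(value, xid, head));
        }

        /** Returns the newest value visible to this transaction, or null. */
        String get(String key) {
            for (Version v = chains.get(key); v != null; v = v.prev()) {
                if (v.creatorXid() == xid) {
                    return v.value(); // a transaction always sees its own writes
                }
                Long ts = commitTs.get(v.creatorXid());
                if (ts != null && ts <= snapshot) {
                    return v.value(); // committed before this snapshot was taken
                }
            }
            return null;
        }

        void commit() {
            synchronized (clockLock) {
                commitTs.put(xid, ++commitClock);
            }
        }
        // Abort needs no work: versions of uncommitted transactions never become visible.
    }

    Tx begin() {
        return new Tx();
    }

    public static void main(String[] args) {
        MvccStore db = new MvccStore();
        Tx setup = db.begin();
        setup.put("k", "v1");
        setup.commit();

        Tx reader = db.begin();
        Tx writer = db.begin();
        writer.put("k", "v2");
        System.out.println(reader.get("k"));     // v1: writer has not committed
        writer.commit();
        System.out.println(reader.get("k"));     // v1: commit came after reader's snapshot
        System.out.println(db.begin().get("k")); // v2
    }
}
```

The `main` method walks through the classic case: a reader that started before a writer committed keeps seeing the old value, while a transaction started afterwards sees the new one.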

The third alternative for PL/Java, an approach that gets more viable 
every minute I think about it, is to implement the PostgreSQL backend 
completely in Java. I'm involved in the development of one of the 
commercial JVMs, and I know that an enormous amount of resources is 
constantly devoted to performance optimizations. The days when a complex 
system written in C or C++ could outperform a JVM have passed. A static 
optimizer can only do so well. A JVM that collects runtime profiling data, 
communicates with the CPU about cache usage, etc., can be a great deal 
smarter about how the final machine code is optimized, and 
re-optimized should conditions change. It would be great if 
PostgreSQL could benefit from all this research.

If a commercial JVM is perceived as a problem, then combine^h^h^hpile 
the code with GNU gcj instead of gcc like today.

The list of advantages can be made a mile long. There's no point in 
listing everything here. From my own standpoint, I'm of course thinking 
first and foremost about the advantages for PL/Java. It would become the 
most efficient PL of them all. Other languages, for which no 
good Java implementation exists (I'm thinking of Jython for Python, etc.), 
can be implemented using JNI. The most common functions used by, say, 
PL/Perl could probably be implemented as callbacks into the Java domain 
in order to make the changes in the respective PL minimal.

Opinions? Suggestions?

Kind Regards,
Thomas Hallgren



