Re: Remote PL/Java, Summary - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Remote PL/Java, Summary
Msg-id 442E7C5D.4050901@dunslane.net
In response to Remote PL/Java, Summary  (Thomas Hallgren <thomas@tada.se>)
Responses Re: Remote PL/Java, Summary
List pgsql-hackers

Thomas Hallgren wrote:

> Hi all,
> And thanks for very good input regarding a remote alternative to 
> PL/Java (thread titled "Shared Memory"). I'm convinced that such an 
> alternative would be a great addition to PL/Java and increase the 
> number of users. The work to create such a platform that has the 
> stability and quality of today's PL/Java is significant (I really do 
> think it is a production-grade product today). So significant in fact, 
> that I'm beginning to think of a third alternative. An alternative 
> that would combine the performance of using in-process calls with the 
> benefits of sharing a JVM. The answer is of course to make the backend 
> multi-threaded.
>
> This question has been debated before and always promptly rejected. 
> One major reason is of course that it will not bring any benefits over 
> the current multi-process approach on a majority of the platforms 
> where PostgreSQL is used. A process-switch is just as fast as a 
> thread-switch on Linux-based systems. Over the last year however, 
> something has happened that certainly speaks in favor of 
> multi-threading. PostgreSQL is getting widely adopted on Windows. On 
> Windows, a process-switch is at least 5 times more expensive than a 
> thread-switch. In order to do appropriate locking, PostgreSQL is forced 
> to do a fair amount of switching during transaction processing so the 
> gain in using a multi-threaded approach on Windows is probably 
> significant. The same is true for other OSes where process-switching 
> is relatively expensive.
>
> There are other benefits as well. PostgreSQL would no longer need 
> shared memory and semaphores and a lot more resources could be shared 
> between backend processes. The one major drawback of a multi-threaded 
> approach (the one that's been the main argument for the defenders of 
> the current approach) is vulnerability. If one thread is messing 
> things up, then the whole system will be brought to a halt (on the 
> other hand, that can be said about the current shared-memory approach 
> as well). The cure for this is to have a system that, to the extent 
> possible, prevents this from happening. How would that be possible? 
> Well, such systems are widely used today. Huge companies use them in 
> mission critical applications all over the world. They are called 
> Virtual Machines. Two types in particular are gaining more and more 
> ground. The .NET based CLR and the Java VM.
>
> Although there's an Open Source initiative called Mono that implements 
> the CLR, I still don't see it as a viable alternative to create a 
> production-grade multi-platform database. Microsoft's CLR is of course 
> confined to Microsoft platforms. The Java VMs are however a different 
> matter altogether. And with the java.nio.channels package that was 
> introduced in Java 1.4 and the java.util.concurrent package from Java 
> 5.0, Java has taken major steps forward in being a very feasible 
> platform for a database implementation. There's actually nothing 
> stopping you from doing a high-performance MVCC system using Java 
> today. A SQL parser would be based on JavaCC technology (the grammar 
> is already written although it needs small adjustments to comply with 
> the PostgreSQL dialect). Lots of technology is there out-of-the-box 
> such as regular expressions, hash-maps, linked lists, etc. Not to 
> forget an exceptionally great threading system, now providing atomic 
> operations, semaphores, copy-on-write arrays etc. In short, everything 
> that a database implementor could ever wish for.
>
> The third alternative for PL/Java, an approach that gets more viable 
> every minute I think about it, is to implement the PostgreSQL backend 
> completely in Java. I'm involved in the development of one of the 
> commercial JVMs. I know that an enormous amount of resources are 
> constantly devoted to performance optimizations. The days when a 
> complex system written in C or C++ could outperform a JVM have passed. 
> A static optimizer can only do so well. A JVM, that collects 
> heuristics, communicates with the CPU about cache usage etc., can be a 
> great deal smarter on how the final machine code will be optimized, 
> and re-optimized should the conditions change. It would be great if 
> PostgreSQL could benefit from all this research.
>
> If a commercial JVM is perceived as a problem, then combine^h^h^hpile 
> the code with GNU gcj instead of gcc like today.
>
> The list of advantages can be made a mile long. There's no point in 
> listing everything here. From my own standpoint, I'm of course 
> thinking first and foremost about the advantages with PL/Java. It will 
> become the absolute most efficient PL of them all. Other languages, 
> for which no good Java implementation exists (I'm thinking Jython for 
> Python, etc.), can be implemented using JNI. The most common functions 
> used by say, PL/Perl could probably be implemented as callbacks into 
> the Java domain in order to make the changes in the respective PL 
> minimal.
>
>
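
For readers who have not used the java.util.concurrent facilities named in the quoted message (atomic operations, semaphores, copy-on-write arrays), here is a minimal sketch of what they look like in use. It is only an illustration: the class name ConcurrencySketch, the run() helper and the limit of two concurrent workers are assumptions made up for the example, not anything taken from PL/Java or PostgreSQL, and the lambda syntax needs a newer Java than the 5.0 release discussed above.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

public class ConcurrencySketch {

    // Starts `workers` threads; each takes one id from a shared atomic
    // counter and records it. Returns the recorded ids (order not fixed).
    static List<Long> run(int workers) {
        // Hypothetical stand-in for a shared counter such as an id generator.
        AtomicLong nextId = new AtomicLong(0);
        // Hypothetical cap: at most two workers inside the guarded section.
        Semaphore slots = new Semaphore(2);
        // Copy-on-write list: safe for concurrent add() without extra locks.
        List<Long> recorded = new CopyOnWriteArrayList<>();

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    slots.acquire();
                    try {
                        // incrementAndGet() is a single atomic operation.
                        recorded.add(nextId.incrementAndGet());
                    } finally {
                        slots.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return recorded;
    }

    public static void main(String[] args) {
        System.out.println(run(8).size());
    }
}
```

Each worker ends up with a distinct id without any explicit lock, which is the kind of building block the quoted paragraph is referring to. Whether that translates into a workable database backend is a separate question.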

We already do use threads on Windows to a limited extent to do things 
like timers and pseudo-signal handling.

If this were a greenfield project then your arguments would have force. 
But for how long would you like to suspend Postgres development activity 
while we re-implement everything in Java? Not to mention the effort to 
recruit new developers to replace those who leave because they can't or 
don't want to be part of the effort.

For better or worse, PostgreSQL is written in C, and I can't see that 
changing.

It might be interesting to take a frozen code base for PostgreSQL and 
reimplement it in Java, and then run some comparisons, both for 
performance and crash stability. I just counted roughly 100k lines of 
source code, so a reimplementation effort would be distinctly non-trivial.

cheers

andrew

