Re: PostgreSQL in the press again - Mailing list pgsql-advocacy

From Christopher Browne
Subject Re: PostgreSQL in the press again
Msg-id m3u0rskj0u.fsf@knuth.knuth.cbbrowne.com
In response to PostgreSQL in the press again  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: PostgreSQL in the press again
Re: PostgreSQL in the press again
Re: PostgreSQL in the press again
List pgsql-advocacy
Oops! scrappy@postgresql.org ("Marc G. Fournier") was seen spray-painting on a wall:
> On Sat, 13 Nov 2004, Thomas Hallgren wrote:
>
>> Joshua D. Drake wrote:
>>> Yes but I believe even you would agree that there are programming
>>> languages that are better for certain tasks than others. The use
>>> of Java as a replication engine for PostgreSQL seems,
>>> well... incorrect.
>>
>> Marc G. Fournier wrote:
>>> We definitely concur with that, which is why we are re-writing it
>>> ...  going to Java, as Andrew has mentioned, was *not* a design
>>> decision that we made, but was made for us :(
>>>
>> Now I get really curious. Why would Java be a bad choice for a
>> replication engine? I would consider it an excellent choice,
>> provided of course that the people tasked with the implementation
>> had the right skills. C-JDBC for instance, is written in Java.
>
> Everyone obviously has their opinion, but in mine, Java just has
> toooooo large of a memory footprint ... I don't know enough about
> Java to know if this is something that is restricted to how
> eRServer/Java was coded or not, but by default, the damn thing takes
> something like 300MB of RAM for just the engine :(

The problem with Java is twofold:

1.  Naive system implementations wind up gratuitously using a lot of
    memory.

2.  The garbage collection system makes it particularly difficult to
    be aware of how the "memory life cycle" works, which helps keep
    developers naive for somewhat longer...

In the case of eRServer, the way the snapshot system was constructed
led to "gratuitous memory use," and that's not an obvious result of
either point 1 or point 2.

Someone could have made a C-based version of ERS that, by using
similar implementation strategies, would also use "gratuitously large"
amounts of memory.

In contrast, Slony-I happens to be _immensely_ more frugal in its use
of memory.  That is a matter of design, not of the language used.  The
"strategy" involves holding in memory only a buffer's worth (more or
less) of the data being loaded.  If there's a replication set
consisting of 80GB of data, you don't need to hold it all in RAM; you
just need to buffer a few hundred KB of it so that you're streaming
large enough blocks across the network to let the network connections
be used efficiently.  If the strategies of Slony-I had been
implemented in Java, the memory footprint would still be relatively
small.  The fact that Java has heftier libraries than C means that
Java apps will be somewhat bigger than C ones.
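To make the design point concrete, here is a small sketch contrasting the two strategies.  It is NOT Slony-I's code (Slony-I is written in C); the function names `snapshot_all` and `stream_in_batches`, the byte-string rows, the `send` callback standing in for a network connection, and the 256 KB buffer limit are all hypothetical, the limit chosen to echo the "few hundred KB" figure above.  Python is used deliberately: the memory profile comes from the algorithm, not the language.

```python
from typing import Callable, Iterable, List


def snapshot_all(rows: Iterable[bytes], send: Callable[[bytes], None]) -> int:
    """Naive approach (hypothetical): materialize the whole set, then ship it.

    The full list AND a joined copy are held in RAM at once, so peak
    memory grows with the size of the replication set.
    Returns the size in bytes of the materialized payload.
    """
    everything: List[bytes] = list(rows)  # entire data set held in memory
    payload = b"".join(everything)        # ...and copied a second time
    send(payload)
    return len(payload)


def stream_in_batches(rows: Iterable[bytes],
                      send: Callable[[bytes], None],
                      limit: int = 256 * 1024) -> int:
    """Bounded-buffer approach (hypothetical): hold only ~`limit` bytes.

    Rows are appended to a buffer; once it reaches `limit` bytes it is
    sent as one large block (keeping network writes efficient) and
    cleared.  Returns the largest number of bytes ever buffered, which
    stays below `limit` plus the size of the largest single row no
    matter how large the whole data set is.
    """
    buffer: List[bytes] = []
    buffered = 0
    peak = 0
    for row in rows:
        buffer.append(row)
        buffered += len(row)
        peak = max(peak, buffered)
        if buffered >= limit:
            send(b"".join(buffer))  # one large block per network write
            buffer.clear()
            buffered = 0
    if buffer:                      # flush the final, partial block
        send(b"".join(buffer))
    return peak


if __name__ == "__main__":
    # Hypothetical source: 100,000 rows of 100 bytes (about 10 MB),
    # produced lazily, as a database cursor would.
    source = (bytes([i % 256]) * 100 for i in range(100_000))
    blocks: List[bytes] = []
    peak = stream_in_batches(source, blocks.append)
    print(f"streamed {sum(map(len, blocks))} bytes in {len(blocks)} blocks, "
          f"peak buffer {peak} bytes")
```

With these assumed numbers, about 10 MB flows through while the buffer never holds much more than 256 KB; scaling the source to 80GB would leave the peak unchanged, whereas `snapshot_all` would need the whole 80GB (twice over) in RAM.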

But I wouldn't raise any "red flags" if a "Slony-Java" process
consumed 25MB whilst the C version only consumed 8MB.  Those are both
small enough sizes that they're not going to challenge JVM maximum
memory sizes.  On a couple of occasions, I saw eRServer "blow up" due to
the JVM not being configured to have enough memory space, and could
foresee situations where you couldn't set memory space high enough
:-(.

I'd expect a C++-based system to fall somewhere in between.  Between
exception handling, templates, and such, C++ adds a bit of "gratuitous
bloat," but not quite so much as in Java.  (Unless you use STL Way
Lots, but that's another story :-).)

But in all of this, the things that cause the _real_ bloat are
pessimal algorithmic design choices.  The things to _fix_ bloat are
algorithmic changes, not changes of language.

The things to "hate" about Java aren't about any of this.  It's more
like:

 - Java runs, in a "supportable" manner, on way fewer platforms than
   PostgreSQL

 - If you pick libraries that are functional enough to be useful,
   then you likely have to get a Sun JDK with pretty proprietary
   licensing

 - Due to licensing complexities, it's WAY more complex to deploy
   Java-based apps than C-based apps.  The average Linux or BSD
   distribution contains hundreds if not thousands of apps
   deployed in C; doing the same for Java has proved more than
   troublesome.
--
output = reverse("gro.mca" "@" "enworbbc")
http://www.ntlug.org/~cbbrowne/linux.html
"Using Java  as a general purpose application  development language is
like  going big  game  hunting  armed with  Nerf  weapons."
-- Author Unknown
