Thread: Opportunity for a Radical Change in Database Software
Hi,

In looking at current developments in computers, it seems we're nearing
a point where a fundamental change may be possible in databases: namely,
in-memory databases, which could lead to huge performance improvements.

A good starting point is memcached, since it provides proof that it's
possible to interconnect hundreds of machines into a huge memory
cluster, albeit with some reliability issues. For more info on
memcached, try: http://www.socialtext.net/memcached/index.cgi?faq

The sites that use it see incredible performance increases, but often at
the cost of not being able to provide versioned results that are
guaranteed to be accurate.

The big question, then, is: how would you create a distributed
in-memory database?

Another idea that may be workable: everyone knows the main problem with
a standard cluster is that every machine has to perform every write,
which leads to diminishing returns as the writes consume more and more
of each machine's resources. Would it be possible to create a clustered
environment where the master is the only machine that writes the data
to disk, while the others just serve cached data? Or perhaps it would
work better if the master role (or the master log entry) moved from
machine to machine, with each commit coinciding with a disk write on
that machine?

Any other ideas? It seems a problem worth pondering, since in-memory
databases are now possible.

Thanks,
Dan
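For a concrete sense of how a memcached-style cluster spreads data over
many machines, here is a minimal sketch in C. It is not memcached's
actual code: the server names are invented, and the simple modulo
placement stands in for the consistent hashing real clients typically
use (consistent hashing avoids remapping every key when a node is added
or lost).

    #include <stdio.h>
    #include <stdint.h>

    #define NSERVERS 4

    static const char *servers[NSERVERS] = {
        "cache01:11211", "cache02:11211",
        "cache03:11211", "cache04:11211"
    };

    /* FNV-1a: a simple, well-known string hash. */
    static uint32_t fnv1a(const char *s)
    {
        uint32_t h = 2166136261u;
        for (; *s; s++) {
            h ^= (uint8_t) *s;
            h *= 16777619u;
        }
        return h;
    }

    int main(void)
    {
        const char *keys[] = { "user:42", "session:abc", "page:/index" };
        for (int i = 0; i < 3; i++) {
            /* Key ownership is pure client-side arithmetic, with no
             * coordination among the servers. */
            uint32_t h = fnv1a(keys[i]);
            printf("%-14s -> %s\n", keys[i], servers[h % NSERVERS]);
        }
        return 0;
    }

That the mapping is plain client arithmetic is both why such clusters
scale to hundreds of machines and why a node failure silently drops that
node's share of the data -- the reliability issue mentioned above.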
I'd suggest looking at the source code of several of the in-memory
databases that already exist.

On 10/25/07, Dan <dss01Card-Offer@prestohosting.com> wrote:
> In looking at current developments in computers, it seems we're nearing
> a point where a fundamental change may be possible in databases:
> namely, in-memory databases, which could lead to huge performance
> improvements.
> [...]

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | jonah.harris@enterprisedb.com
Edison, NJ 08837                        | http://www.enterprisedb.com/
On Thu, Oct 25, 2007 at 08:05:24AM -0700, Dan wrote:
> In looking at current developments in computers, it seems we're nearing
> a point where a fundamental change may be possible in databases:
> namely, in-memory databases, which could lead to huge performance
> improvements.

I think there are a number of challenges in this area. Higher-end
machines are tending towards a NUMA architecture, where PostgreSQL's
single buffer pool becomes a liability. In some situations you might
want a smaller per-processor pool and an explicit copy to grab buffers
from processes on other CPUs.

I think reliability becomes the real issue, though: you can always
produce the wrong answer instantly; the trick is to get the right one...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
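A rough sketch of the per-node pool idea, in C. This is not PostgreSQL
code; the node count, pool size, and function names are invented for
illustration.

    #include <stdio.h>
    #include <string.h>

    #define NUM_NODES 4       /* NUMA nodes (hypothetical machine) */
    #define POOL_BUFS 8       /* buffers per node pool (tiny, for demo) */
    #define BLCKSZ    8192    /* block size, as in PostgreSQL */

    typedef struct { char data[BLCKSZ]; } Buffer;

    typedef struct
    {
        /* In a real system this would be allocated from the node's
         * local memory (e.g. numa_alloc_onnode), not a static array. */
        Buffer bufs[POOL_BUFS];
    } NodePool;

    static NodePool pools[NUM_NODES];

    /*
     * Rather than repeatedly touching a buffer in another node's pool
     * (paying the remote-memory penalty on every access), copy the
     * block into the local pool once and work on the local copy.
     */
    static Buffer *
    fetch_local_copy(int local_node, int local_buf,
                     int remote_node, int remote_buf)
    {
        Buffer *dst = &pools[local_node].bufs[local_buf];
        memcpy(dst, &pools[remote_node].bufs[remote_buf], BLCKSZ);
        return dst;
    }

    int main(void)
    {
        snprintf(pools[2].bufs[0].data, BLCKSZ, "block owned by node 2");
        Buffer *local = fetch_local_copy(0, 0, 2, 0);  /* node 0 copies */
        printf("%s\n", local->data);
        return 0;
    }

The trade-off is that local copies can go stale, so writes would need
some invalidation protocol -- exactly the complexity the single shared
pool avoids.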
On Oct 25, 2007, at 8:05 AM, Dan wrote:
> In looking at current developments in computers, it seems we're nearing
> a point where a fundamental change may be possible in databases:
> namely, in-memory databases, which could lead to huge performance
> improvements.
> ...
> The sites that use it see incredible performance increases, but often
> at the cost of not being able to provide versioned results that are
> guaranteed to be accurate.
>
> The big question, then, is: how would you create a distributed
> in-memory database?

Everything you are looking for is here:

http://web.mit.edu/dna/www/vldb07hstore.pdf

It is the latest Stonebraker et al. on massively distributed in-memory
OLTP architectures.

J. Andrew Rogers
* J. Andrew Rogers:

> Everything you are looking for is here:
>
> http://web.mit.edu/dna/www/vldb07hstore.pdf
>
> It is the latest Stonebraker et al. on massively distributed in-memory
> OLTP architectures.

"Ruby-on-Rails compiles into standard JDBC, but hides all the
complexity of that interface. Hence, H-Store plans to move from C++ to
Ruby-on-Rails as our stored procedure language."

This reads a bit strangely.
On Oct 27, 2007, at 2:20 PM, Florian Weimer wrote:
> "Ruby-on-Rails compiles into standard JDBC, but hides all the
> complexity of that interface. Hence, H-Store plans to move from C++ to
> Ruby-on-Rails as our stored procedure language."
>
> This reads a bit strangely.

Yeah, that's a bit of a "WTF?". Okay, a giant "WTF?". I could see using
Ruby as a stored procedure language, but Ruby-on-Rails seems like an
exercise in buzzword compliance. And Ruby is just about the slowest
language in its class, which, given the rest of the paper (serializing
all transactions and doing them strictly in memory), means you would be
bottlenecking your database node on the procedural language rather than
on the usual I/O considerations.

Most of the architectural material made a considerable amount of sense,
though I had quibbles with bits of it (I think the long history of the
design makes some decisions look silly in a world that is now multi-core
by default), and the Ruby-on-Rails part is obviously fungible.
Nonetheless, it is a good starting point for massively distributed
in-memory OLTP architectures and a good analysis of many aspects of
database design from that perspective; at least, I have not really seen
anything better. Personally, I prefer a slightly more conservative
approach that generalizes better in that space than what the paper
suggests.

Cheers,

J. Andrew Rogers
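To make the "serializing all transactions" point concrete, here is a toy
sketch of the execution model the paper describes (invented names and
data; not H-Store code): each partition runs transactions to completion,
one at a time, on a single thread against in-memory state, so no locks
or latches are needed.

    #include <stdio.h>

    #define NACCOUNTS 4

    static int balance[NACCOUNTS];   /* the partition's in-memory state */

    typedef struct { int from, to, amount; } Xact;

    /* Runs to completion; nothing else touches `balance` meanwhile. */
    static void run_xact(const Xact *x)
    {
        if (balance[x->from] >= x->amount) {
            balance[x->from] -= x->amount;
            balance[x->to]   += x->amount;
        }
    }

    int main(void)
    {
        balance[0] = 100;
        Xact queue[] = { {0, 1, 40}, {0, 2, 30}, {1, 2, 10} };

        /* The entire concurrency model is a loop: throughput is bounded
         * by per-transaction CPU time, hence the paper's assumption
         * that transactions finish in well under a millisecond. */
        for (int i = 0; i < 3; i++)
            run_xact(&queue[i]);

        for (int i = 0; i < NACCOUNTS; i++)
            printf("account %d: %d\n", i, balance[i]);
        return 0;
    }

Collapsing concurrency control to a loop is where the speed comes from,
and also why a slow stored-procedure language is so damaging here: one
slow transaction stalls the entire partition.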
J.,

I'd actually be curious what incremental changes you could see making
to PostgreSQL for better in-memory operation. Ideas?

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco
On Oct 28, 2007, at 2:54 PM, Josh Berkus wrote:
> I'd actually be curious what incremental changes you could see making
> to PostgreSQL for better in-memory operation. Ideas?

It would be difficult to make PostgreSQL really competitive for
in-memory operation, primarily because a contrary assumption pervades
the entire design; you would need to rip out a lot of its guts. I was
not even intending to suggest that it would be a good idea, or trivial,
to adapt PostgreSQL to in-memory operation, but since I am at least
somewhat familiar with the research, I thought I'd offer a useful link
that details the kinds of considerations involved.

That said, I have seriously considered the idea, since I have a major
project that requires that kind of capability and there is some utility
in using parts of PostgreSQL if possible, particularly since it was used
to prototype the project. In my specific case I also need to shoehorn in
a new type of access method that there is no conceptual support for, so
it will probably be easier to build a (mostly) new database engine
altogether.

Personally, if I were designing a distributed in-memory database, I
would use a somewhat more conservative set of assumptions than
Stonebraker, so that it would have more general applicability. For
example, his assumption of extremely short CPU time per transaction
(under one millisecond) is not even valid for some OLTP loads, never
mind the numerous uses that are not strictly OLTP-like but are
nonetheless built on relatively short transactions; in the Stonebraker
design, that much latency would be a pathology. Unfortunately, if you
remove that assumption, the design starts to unravel noticeably.
Nonetheless, there are other viable design paths that, while not
over-fitted to OLTP, could still offer large gains.

I think the market is ripe for a well-designed distributed in-memory
database, but making incremental changes to a solid disk-based engine
means starting from an architecture that is inferior for the purpose
and hard to get away from. It seems short-term expedient but long-term
bad engineering -- think MySQL.

Cheers,

J. Andrew Rogers
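As a back-of-envelope illustration of why the sub-millisecond assumption
matters (the numbers below are invented for the example): under serial
execution there is no overlap, so a single longer transaction adds its
full runtime to the latency of everything queued behind it.

    #include <stdio.h>

    int main(void)
    {
        double fast_ms = 0.5;   /* the paper's assumed OLTP transaction */
        double slow_ms = 50.0;  /* "short" by most standards, not here  */

        /* While the slow transaction runs, the partition does nothing
         * else: it displaces slow_ms / fast_ms fast transactions. */
        printf("one %.0f ms transaction displaces ~%.0f fast ones\n",
               slow_ms, slow_ms / fast_ms);
        printf("every queued transaction waits up to %.1f ms extra\n",
               slow_ms);
        return 0;
    }

A 50 ms transaction costs the partition a hundred fast transactions'
worth of capacity, which is the sense in which even modest latency is a
pathology in this design.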