Thread: Opportunity for a Radical Change in Database Software
Hi,

In looking at current developments in computers, it seems we're nearing
a point where a fundamental change may be possible in databases: namely,
in-memory databases, which could lead to huge performance improvements.

A good starting point is memcached, since it provides proof that it's
possible to interconnect hundreds of machines into a huge memory
cluster, albeit with some reliability issues. For more info on
memcached, try: http://www.socialtext.net/memcached/index.cgi?faq

The sites that use it see incredible performance increases, but often at
the cost of not being able to provide versioned results that are
guaranteed to be accurate.

The big question, then, is: how would you create a distributed
in-memory database?

Another idea that may be workable: everyone knows the main problem with
a standard cluster is that every machine has to perform every write,
which leads to diminishing returns as the writes consume more and more
of each machine's resources. Would it be possible to create a clustered
environment where the master is the only machine that writes the data
to disk, while the others just serve cached data? Or perhaps it would
work better if the master role (or the master log entry) moved from
machine to machine, with each commit coinciding with a disk write on
that machine?

Any other ideas? It seems a problem worth pondering, since in-memory
databases are now possible.

Thanks,
Dan
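For a concrete sense of how a memcached-style cluster spreads data over
many machines, here is a minimal sketch in C. It is not memcached's
actual code: the server names are invented, and the simple modulo
placement stands in for the consistent hashing real clients typically
use (consistent hashing avoids remapping every key when a node is added
or lost).

    #include <stdio.h>
    #include <stdint.h>

    #define NSERVERS 4

    static const char *servers[NSERVERS] = {
        "cache01:11211", "cache02:11211",
        "cache03:11211", "cache04:11211"
    };

    /* FNV-1a: a simple, well-known string hash. */
    static uint32_t fnv1a(const char *s)
    {
        uint32_t h = 2166136261u;
        for (; *s; s++) {
            h ^= (uint8_t) *s;
            h *= 16777619u;
        }
        return h;
    }

    int main(void)
    {
        const char *keys[] = { "user:42", "session:abc", "page:/index" };
        for (int i = 0; i < 3; i++) {
            /* Key ownership is pure client-side arithmetic, with no
             * coordination among the servers. */
            uint32_t h = fnv1a(keys[i]);
            printf("%-14s -> %s\n", keys[i], servers[h % NSERVERS]);
        }
        return 0;
    }

That the mapping is plain client arithmetic is both why such clusters
scale to hundreds of machines and why a node failure silently drops that
node's share of the data -- the reliability issue mentioned above.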
I'd suggest looking at the source code of several of the in-memory
databases that already exist.

On 10/25/07, Dan <dss01Card-Offer@prestohosting.com> wrote:
> In looking at current developments in computers, it seems we're nearing
> a point where a fundamental change may be possible in databases:
> namely, in-memory databases, which could lead to huge performance
> improvements.
> [...]

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | jonah.harris@enterprisedb.com
Edison, NJ 08837                        | http://www.enterprisedb.com/
On Thu, Oct 25, 2007 at 08:05:24AM -0700, Dan wrote:
> In looking at current developments in computers, it seems we're nearing
> a point where a fundamental change may be possible in databases:
> namely, in-memory databases, which could lead to huge performance
> improvements.

I think there are a number of challenges in this area. Higher-end
machines are tending towards a NUMA architecture, where PostgreSQL's
single buffer pool becomes a liability. In some situations you might
want a smaller per-processor pool and an explicit copy to grab buffers
from processes on other CPUs.

I think reliability becomes the real issue, though: you can always
produce the wrong answer instantly; the trick is to get the right one...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
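A rough sketch of the per-node pool idea, in C. This is not PostgreSQL
code; the node count, pool size, and function names are invented for
illustration.

    #include <stdio.h>
    #include <string.h>

    #define NUM_NODES 4       /* NUMA nodes (hypothetical machine) */
    #define POOL_BUFS 8       /* buffers per node pool (tiny, for demo) */
    #define BLCKSZ    8192    /* block size, as in PostgreSQL */

    typedef struct { char data[BLCKSZ]; } Buffer;

    typedef struct
    {
        /* In a real system this would be allocated from the node's
         * local memory (e.g. numa_alloc_onnode), not a static array. */
        Buffer bufs[POOL_BUFS];
    } NodePool;

    static NodePool pools[NUM_NODES];

    /*
     * Rather than repeatedly touching a buffer in another node's pool
     * (paying the remote-memory penalty on every access), copy the
     * block into the local pool once and work on the local copy.
     */
    static Buffer *
    fetch_local_copy(int local_node, int local_buf,
                     int remote_node, int remote_buf)
    {
        Buffer *dst = &pools[local_node].bufs[local_buf];
        memcpy(dst, &pools[remote_node].bufs[remote_buf], BLCKSZ);
        return dst;
    }

    int main(void)
    {
        snprintf(pools[2].bufs[0].data, BLCKSZ, "block owned by node 2");
        Buffer *local = fetch_local_copy(0, 0, 2, 0);  /* node 0 copies */
        printf("%s\n", local->data);
        return 0;
    }

The trade-off is that local copies can go stale, so writes would need
some invalidation protocol -- exactly the complexity the single shared
pool avoids.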
On Oct 25, 2007, at 8:05 AM, Dan wrote:
> In looking at current developments in computers, it seems we're nearing
> a point where a fundamental change may be possible in databases:
> namely, in-memory databases, which could lead to huge performance
> improvements.
> ...
> The sites that use it see incredible performance increases, but often
> at the cost of not being able to provide versioned results that are
> guaranteed to be accurate.
>
> The big question, then, is: how would you create a distributed
> in-memory database?

Everything you are looking for is here:

http://web.mit.edu/dna/www/vldb07hstore.pdf

It is the latest Stonebraker et al. on massively distributed in-memory
OLTP architectures.

J. Andrew Rogers
* J. Andrew Rogers:

> Everything you are looking for is here:
>
> http://web.mit.edu/dna/www/vldb07hstore.pdf
>
> It is the latest Stonebraker et al. on massively distributed in-memory
> OLTP architectures.

"Ruby-on-Rails compiles into standard JDBC, but hides all the
complexity of that interface. Hence, H-Store plans to move from C++ to
Ruby-on-Rails as our stored procedure language."

This reads a bit strangely.
On Oct 27, 2007, at 2:20 PM, Florian Weimer wrote:
> "Ruby-on-Rails compiles into standard JDBC, but hides all the
> complexity of that interface. Hence, H-Store plans to move from C++ to
> Ruby-on-Rails as our stored procedure language."
>
> This reads a bit strangely.

Yeah, that's a bit of a "WTF?". Okay, a giant "WTF?". I could see using
Ruby as a stored procedure language, but Ruby-on-Rails seems like an
exercise in buzzword compliance. And Ruby is just about the slowest
language in its class, which, given the rest of the paper (serializing
all transactions and doing them strictly in memory), means you would be
bottlenecking your database node on the procedural language rather than
on the usual I/O considerations.

Most of the architectural material made a considerable amount of sense,
though I had quibbles with bits of it (I think the long history of the
design makes some decisions look silly in a world that is now multi-core
by default), and the Ruby-on-Rails part is obviously fungible.
Nonetheless, it is a good starting point for massively distributed
in-memory OLTP architectures and a good analysis of many aspects of
database design from that perspective; at least, I have not really seen
anything better. Personally, I prefer a slightly more conservative
approach that generalizes better in that space than what the paper
suggests.

Cheers,

J. Andrew Rogers
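To make the "serializing all transactions" point concrete, here is a toy
sketch of the execution model the paper describes (invented names and
data; not H-Store code): each partition runs transactions to completion,
one at a time, on a single thread against in-memory state, so no locks
or latches are needed.

    #include <stdio.h>

    #define NACCOUNTS 4

    static int balance[NACCOUNTS];   /* the partition's in-memory state */

    typedef struct { int from, to, amount; } Xact;

    /* Runs to completion; nothing else touches `balance` meanwhile. */
    static void run_xact(const Xact *x)
    {
        if (balance[x->from] >= x->amount) {
            balance[x->from] -= x->amount;
            balance[x->to]   += x->amount;
        }
    }

    int main(void)
    {
        balance[0] = 100;
        Xact queue[] = { {0, 1, 40}, {0, 2, 30}, {1, 2, 10} };

        /* The entire concurrency model is a loop: throughput is bounded
         * by per-transaction CPU time, hence the paper's assumption
         * that transactions finish in well under a millisecond. */
        for (int i = 0; i < 3; i++)
            run_xact(&queue[i]);

        for (int i = 0; i < NACCOUNTS; i++)
            printf("account %d: %d\n", i, balance[i]);
        return 0;
    }

Collapsing concurrency control to a loop is where the speed comes from,
and also why a slow stored-procedure language is so damaging here: one
slow transaction stalls the entire partition.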
J.,

I'd actually be curious what incremental changes you could see making
to PostgreSQL for better in-memory operation. Ideas?

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco
On Oct 28, 2007, at 2:54 PM, Josh Berkus wrote:
> I'd actually be curious what incremental changes you could see making
> to PostgreSQL for better in-memory operation. Ideas?

It would be difficult to make PostgreSQL really competitive for
in-memory operation, primarily because a contrary assumption pervades
the entire design; you would need to rip out a lot of its guts. I was
not even intending to suggest that it would be a good idea, or trivial,
to adapt PostgreSQL to in-memory operation, but since I am at least
somewhat familiar with the research, I thought I'd offer a useful link
that details the kinds of considerations involved.

That said, I have seriously considered the idea, since I have a major
project that requires that kind of capability and there is some utility
in using parts of PostgreSQL if possible, particularly since it was used
to prototype the project. In my specific case I also need to shoehorn in
a new type of access method that there is no conceptual support for, so
it will probably be easier to build a (mostly) new database engine
altogether.

Personally, if I were designing a distributed in-memory database, I
would use a somewhat more conservative set of assumptions than
Stonebraker, so that it would have more general applicability. For
example, his assumption of extremely short CPU time per transaction
(under one millisecond) is not even valid for some OLTP loads, never
mind the numerous uses that are not strictly OLTP-like but are
nonetheless built on relatively short transactions; in the Stonebraker
design, that much latency would be a pathology. Unfortunately, if you
remove that assumption, the design starts to unravel noticeably.
Nonetheless, there are other viable design paths that, while not
over-fitted to OLTP, could still offer large gains.

I think the market is ripe for a well-designed distributed in-memory
database, but making incremental changes to a solid disk-based engine
means starting from an architecture that is inferior for the purpose
and hard to get away from. It seems short-term expedient but long-term
bad engineering -- think MySQL.

Cheers,

J. Andrew Rogers
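As a back-of-envelope illustration of why the sub-millisecond assumption
matters (the numbers below are invented for the example): under serial
execution there is no overlap, so a single longer transaction adds its
full runtime to the latency of everything queued behind it.

    #include <stdio.h>

    int main(void)
    {
        double fast_ms = 0.5;   /* the paper's assumed OLTP transaction */
        double slow_ms = 50.0;  /* "short" by most standards, not here  */

        /* While the slow transaction runs, the partition does nothing
         * else: it displaces slow_ms / fast_ms fast transactions. */
        printf("one %.0f ms transaction displaces ~%.0f fast ones\n",
               slow_ms, slow_ms / fast_ms);
        printf("every queued transaction waits up to %.1f ms extra\n",
               slow_ms);
        return 0;
    }

A 50 ms transaction costs the partition a hundred fast transactions'
worth of capacity, which is the sense in which even modest latency is a
pathology in this design.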