Re: Postgres as In-Memory Database? - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Postgres as In-Memory Database?
Date
Msg-id B45B0FA9-D76D-4B55-A837-00EDDE034969@gmail.com
Whole thread Raw
In response to Re: Postgres as In-Memory Database?  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-general
On 01 Apr 2014, at 4:20, Jeff Janes <jeff.janes@gmail.com> wrote:

> On Sunday, March 30, 2014, Stefan Keller <sfkeller@gmail.com> wrote:
> There seem to be two main things which make in-memory dbs special:
> 1. Index: Having all data in memory there is no need for a default index. A full-table scan "suddenly" becomes the
default.
>
> Surely not.  I would say that full table scans are *already* the default, deviated from only if it thinks an index
seemsto be better.  If you don't make an index, it can't seem to be better.  And I don't know of any higher-level
in-memorylanguage which fails to provide a way to do efficient searching into an in-memory structure, usually in the
formof hash tables or balanced trees.  If "let's seq scan everything as long as it is already in memory" is a good
idea,why would Perl, Java, Python, etc. (not to mention C libraries and the source code of PostgreSQL itself) provide
waysto do efficient searches in memory? 
>
> The caveat here is you can't make certain constraints without an index.  In theory you *could* have an unique
constraintwithout an index to support it, but if it were implemented you would probably rapidly learn that you don't
actuallywant to do that. 

That’s what I thought initially reading that line as well, but that’s not necessarily true for an in-memory database.

The keyword here is “default index”. The reasoning is probably along the lines that if all your data is in-memory, then
keepingit all sorted on the primary column(s) is relatively cheap (compared to doing so on disk). You could even split
offthe primary column(s) from the rest of the records and link back to them using pointers, which makes it easier to
keepthem in a (somewhat) balanced btree. 

When you do that, the data effectively becomes the primary key index and would theoretically also be usable as the
primarykey constraint. So you _do_ have an index, but it’s not a separate one; it’s part of your data. That frees up
memorythat you do not need to preserve for an index, which is probably rather a gain for an in-memory database. 

Apparently this works for HANA, but for an MVCC database like Postgres that’s a little more involved.

Of course, with databases like that you want them replicated on different hardware and kernels to reduce data-loss
risks.And that brings us closer to what ISTR Stonebreaker is doing these days. 

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



pgsql-general by date:

Previous
From: Rajeev rastogi
Date:
Subject: Re: Doubts on startup costs
Next
From: shetty65
Date:
Subject: Unattended Installation