Re: Postgres as In-Memory Database? - Mailing list pgsql-general

From Stefan Keller
Subject Re: Postgres as In-Memory Database?
Date
Msg-id CAFcOn2-=NueGYYKtWQVtPymXZYoFwBXKw=-7mg8XNMeSM04gGg@mail.gmail.com
In response to Re: Postgres as In-Memory Database?  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-general
Hi Jeff

I agree with most of your statements.

2014-04-01 4:20 GMT+02:00 Jeff Janes <jeff.janes@gmail.com>:
On Sunday, March 30, 2014, Stefan Keller <sfkeller@gmail.com> wrote:
Hi Jeff


2013/11/20 Jeff Janes <jeff.janes@gmail.com>

I don't know what you mean about enhancements in the buffer pool.  For an in-memory database, there shouldn't be a buffer pool in the first place, as it is *all* in memory.  

You are right: In-memory DBs are making buffer-pooling obsolete - except for making data persistent (see below).

I would be very reluctant to use any database engine which considered disk access obsolete.  There will always be a risk where data grows to exceed RAM, and where it would be inconvenient to expand RAM fast enough to accommodate it.  I've played those games enough with Perl and C in-memory systems.  You fight and squeeze to fit the data into RAM, then the data size grows 3% and all of your work is for naught.  You can buy more RAM, if you have the budget, and the RAM isn't back-ordered for 3 months because the factory that makes it had a fire, and if more RAM fits on your motherboard, and....

I think there are some use cases where you can plan and foresee how data increases.
 
 
Do you know why it is slow?  I'd give high odds that it would be a specific implementation detail in the code that is suboptimal, or maybe a design decision of PostGIS, rather than some high-level architectural decision of PostgreSQL.

Referring to the application is something you can always say - but that shouldn't prevent us from enhancing Postgres.

Postgres has been enhanced.  Now we need to change osm2pgsql to take advantage of those enhancements.  It defines indexes on the tables that are going to be bulk loaded with COPY, which defeats some recent optimizations made to COPY.  The creation of the indexes should be delayed until after the bulk load is done.
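
For illustration, a minimal sketch of that load order in SQL (the table and column names are made up, not what osm2pgsql actually uses):

    CREATE TABLE nodes (id bigint, lat integer, lon integer);
    COPY nodes FROM '/tmp/nodes.copy';        -- bulk load into the bare, index-free table
    CREATE INDEX nodes_id_idx ON nodes (id);  -- build the index only after the load is done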

A further enhancement to Postgres would be to automatically defer creation of the indexes when a table is truncated or created within a transaction, so that users get the benefit of the improvement.

These enhancements to osm2pgsql seem reasonable to me. I hope somebody has time to take care of them.

I have a fork of osm2pgsql on github which delays the index build until the COPY is done.  I'm not really motivated to convince anyone to merge it (as my interest is postgresql not osm itself), but if someone wants to pick it up, that is fine with me.  It helps somewhat, but it is not a game-changer because there are other bigger bottlenecks, at least for HDD based systems.

One of the bigger bottlenecks is building the GIN indexes on the way table at the end.  Setting maintenance_work_mem to huge values helps a lot, if you can find a safe setting for it considering the multiple index builds it might be doing (at that point in the load, osm2pgsql's node cache has been released, so there is substantial RAM to re-purpose).  It would be better for this use if PostgreSQL built the index using an external sort, rather than iterating over the table building maintenance_work_mem-sized chunks of red-black trees.  The problem there is that osm uses the GIN index in an odd way (the vast majority of nodes occur in exactly one way, with a minority occurring in more than one), and using a disk sort might not be ideal for the more common use cases where GIN is used, where a given token usually occurs in far more than one document.  So an improvement that only helps osm2pgsql and degrades other uses is unlikely to be adopted.
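
As a rough sketch of the maintenance_work_mem part (the value, table and column names are only placeholders; it assumes "nodes" is an array column):

    SET maintenance_work_mem = '4GB';   -- leave headroom for however many index
                                        -- builds may be running at the same time
    CREATE INDEX ways_nodes_idx ON ways USING gin (nodes);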
 
Another bottleneck is just the raw COPY into the node table.  When that is running against an unindexed table which was created in the same transaction, I see that osm2pgsql takes about 50% of a CPU to print a copy-stream, and postgresql uses about 50% of a CPU to parse that stream and insert into the table.  So they add up to about 1 CPU despite the fact that this is a multi-CPU machine.  They seem to be playing ping-pong with the pipe buffer when in theory they should each be able to run at almost full speed.  I don't know how to get them to stop playing ping-pong, but I have other use cases where this shows up, so a trade-off-free solution would be nifty.  I suspect that is more of a kernel issue than either postgresql or osm2pgsql.

You could do the COPY in parallel in multiple threads, but the problem there is that you can't use the "created in the same transaction" optimization to avoid WAL overhead.  There is no fix to this without changing PostgreSQL to accommodate it, but I have no clear idea how one would do that.  Importing a snapshot doesn't seem like it would be enough, as you can only import snapshots for reading, not for writing.
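
For reference, the single-transaction pattern mentioned above looks roughly like this; it can skip WAL for the copied data only when wal_level is 'minimal', and the names are again just illustrative:

    BEGIN;
    CREATE TABLE nodes (id bigint, lat integer, lon integer);
    COPY nodes FROM '/tmp/nodes.copy';   -- COPY into a table created in the same
                                         -- transaction can avoid writing WAL for the data
    COMMIT;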

Also, I'm not too sure how seriously to take the goal of optimizing osm2pgsql.  Development on it seems to be less than vigorous.  And its purpose is to create a database to be used, so wouldn't it make more sense to optimize the use, not the creation?  And if you do want to optimize the creation, the obvious way to do it is to create the export in a way more closely aligned to that need, rather than a generic export.

As long as a planet file import takes about 5 days, I think it is an issue.

 

In the meantime I have discussed with HANA users and thought about what makes in-memory DBs special and how to configure Postgres to be an in-memory DB.

There seem to be three main things which make in-memory DBs special:
1. Indexes: With all data in memory there is no need for an index by default. A full-table scan "suddenly" becomes the default.

Surely not.  I would say that full table scans are *already* the default, deviated from only if it thinks an index seems to be better.  If you don't make an index, it can't seem to be better.  And I don't know of any higher-level in-memory language which fails to provide a way to do efficient searching into an in-memory structure, usually in the form of hash tables or balanced trees.  If "let's seq scan everything as long as it is already in memory" is a good idea, why would Perl, Java, Python, etc. (not to mention C libraries and the source code of PostgreSQL itself) provide ways to do efficient searches in memory?
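
A quick way to see that behaviour, using a hypothetical "nodes" table:

    EXPLAIN SELECT * FROM nodes WHERE id = 42;
    -- without an index the planner has only one option, a sequential scan;
    -- after CREATE INDEX nodes_id_idx ON nodes (id) it will normally switch
    -- to an index scan for a selective predicate like this one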

The caveat here is that you can't enforce certain constraints without an index.  In theory you *could* have a unique constraint without an index to support it, but if it were implemented you would probably rapidly learn that you don't actually want to do that.

OK. But at least it seems obvious to me that there is a need for different index types (or configurations), since the currently implemented indexes are designed to minimize access to secondary storage.
 
 
2. Persistence: All updates/changes to the db are streamed to disk.

That is already done through the WAL files.  But the changes are also applied to the pages in memory, and those pages are then written to disk in a scattered manner once per checkpoint.  If you suppress checkpoints entirely, then every time the database is restarted it would have to replay the entire history of the database since creation from the WAL to recreate the in-memory image.  So, how should checkpoints be handled?
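
One way to watch this in practice is the background writer statistics view, which counts timed and requested checkpoints and the buffers they flushed:

    SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
    FROM pg_stat_bgwriter;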

I have to look up how HANA solves this, since they say it's one of their USPs.
At least they delay checkpoints and accept a "small data loss", that's for sure.
 
 
3. An increase in data simply needs to be compensated for with more memory (since memory has become cheap).

The size of the data seems to be increasing at the same rate as the RAM is getting cheaper, if not faster.  And the new RAM might need a new motherboard, and then a new power supply, and a new dev environment and new test environment, and then a new rack to mount them, and then a union electrician to wire up the new rack, and then a new data center to hold the next new rack.....
 
The same could be said for new hard drives, too, but I usually have more head room in my hard drives than in my RAM.


AFAIK one can configure Postgres for all these properties: One can force the use of full-table scan(?) and replicate to disk(?).

Yes, that can already be done.  Don't build an index, and delay checkpoints as long as you dare (1 hour is currently the max, but a simple hack can extend that; whether you will like the consequences is another matter).
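
As a concrete sketch of that configuration (the values are only illustrative; on 9.4+ you can use ALTER SYSTEM, on older versions edit postgresql.conf instead):

    ALTER SYSTEM SET checkpoint_timeout = '1h';    -- the current upper limit
    ALTER SYSTEM SET checkpoint_segments = '256';  -- pre-9.5 setting; lets plenty of
                                                   -- WAL accumulate between checkpoints
    SELECT pg_reload_conf();                       -- both settings take effect on reload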
 

What do you think about this?

I think there is no reason to think these changes will do much good.  In-memory databases usually run as libraries, not as client-server infrastructures.  Without that, you already have one hand behind your back.  Also, I think they usually map datatypes directly to hardware supported types, which is surely faster but would render PostgreSQL's flexible type system nonfunctional.  I don't know how they deal with visibility, but I have to assume they either compromise on correctness, or limit the amount of updating which is possible on the in-memory portion.  By the time you do that stuff, it isn't clear what would be left anymore of the things that make PostgreSQL be PostgreSQL.
 
Are there any show cases out there?

What did the HANA users have to say?  Seems like they would be in the best position to provide the test cases.

Yes, HANA or Oracle TimesTen or H-Store (VoltDB).
 

Cheers,

Jeff


Yours, Stefan
