Thread: Re: PostgreSQL as a local in-memory cache

From:
Chris Browne
Date:

"" <> writes:
> My question is how can I configure the database to run as quickly as
> possible if I don't care about data consistency or durability? That
> is, the data is updated so often and it can be reproduced fairly
> rapidly so that if there is a server crash or random particles from
> space mess up memory we'd just restart the machine and move on.

For such a scenario, I'd suggest you:

- Set up a filesystem that is memory-backed.  On Linux, RamFS or TmpFS
  are reasonable options for this.

- The complication would be that your "restart the machine and move
  on" needs to consist of quite a few steps (sketched below):

  - recreating the filesystem
  - fixing permissions as needed
  - running initdb to set up new PG instance
  - automating any needful fiddling with postgresql.conf, pg_hba.conf
  - starting up that PG instance
  - creating users, databases, schemas, ...
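
A minimal sketch of what those steps might look like, for
illustration (mount point, size, and names are all invented):

  #!/bin/sh
  # rebuild a throwaway PG cluster on a memory-backed filesystem
  PGRAM=/mnt/pgram                         # invented mount point
  mount -t tmpfs -o size=2G tmpfs $PGRAM
  chown postgres:postgres $PGRAM           # fix permissions
  su postgres -c "initdb -D $PGRAM/data"
  # (fiddling with postgresql.conf, pg_hba.conf would go here)
  su postgres -c "pg_ctl -D $PGRAM/data -l $PGRAM/pg.log start"
  su postgres -c "createuser -DRS appuser; createdb -O appuser appdb"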

When my desktop machine's not dead [as it is now :-(], I frequently
use this very kind of configuration to host databases where I'm doing
functionality testing on continually-freshly-created DBs and therefore
don't actually care if they get thrown away.

I have set up an "init.d"-style script which has an extra target to do
the database "init" in order to make the last few steps mentioned as
quick as possible.

  ~/dbs/pgsql-head.sh init

goes an extra mile, using sed to rewrite postgresql.conf to change
defaults.
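
Nothing fancy; think something on the order of (these particular
settings are just plausible examples, not what my script actually
touches):

  sed -i -e "s/^#fsync = on/fsync = off/" \
         -e "s/^#checkpoint_segments = 3/checkpoint_segments = 10/" \
      /mnt/pgram/data/postgresql.conf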

I expect that, if running on a ramdisk, you'd want to fiddle some of
the disk performance parameters in postgresql.conf.
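
The planner's cost settings are the obvious candidates, since the
usual penalty for random I/O disappears; something like this seems a
plausible starting point (untested guesses, not measured values):

  # postgresql.conf, for a cluster living entirely in RAM
  seq_page_cost = 1.0
  random_page_cost = 1.0       # random == sequential when it's all memory
  effective_cache_size = 1GB   # assumed; size to the RAM left over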

It's certainly worth trying out the ramdisk to see if it helps with
this case.  Note that all you'll lose is durability under conditions
of hardware outage - PostgreSQL will still care as much as always
about data consistency.

[Thinking about wilder possibilities...]

I wonder if this kind of installation "comes into its own" for more
realistic scenarios in the presence of streaming replication.  If you
know the WAL files have gotten to disk on another server, that's a
pretty good guarantee :-).
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS  apparently now  has a  team dedicated  to tracking  problems with
Linux  and publicizing them.   I guess  eventually they'll  figure out
this back fires... ;)" -- William Burrow <>

From:
Steve Wampler
Date:

Chris Browne wrote:
> "" <> writes:
>> My question is how can I configure the database to run as quickly as
>> possible if I don't care about data consistency or durability? That
>> is, the data is updated so often and it can be reproduced fairly
>> rapidly so that if there is a server crash or random particles from
>> space mess up memory we'd just restart the machine and move on.
>
> For such a scenario, I'd suggest you:
>
> - Set up a filesystem that is memory-backed.  On Linux, RamFS or TmpFS
>   are reasonable options for this.
>
> - The complication would be that your "restart the machine and move
>   on" needs to consist of quite a few steps:
>
>   - recreating the filesystem
>   - fixing permissions as needed
>   - running initdb to set up new PG instance
>   - automating any needful fiddling with postgresql.conf, pg_hba.conf
>   - starting up that PG instance
>   - creating users, databases, schemas, ...

Doesn't PG now support putting both WAL and user table files onto
file systems other than the one holding the PG config files and PG
'admin' tables?  Wouldn't doing so simplify the above considerably
by allowing just the WAL and user tables on the memory-backed file
systems?  I wouldn't think the performance impact of leaving
the rest of the stuff on disk would be that large.
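
Concretely, I'm thinking of something like this (paths invented):

  # user tables: a tablespace on the memory-backed filesystem
  psql -c "CREATE TABLESPACE ramspace LOCATION '/mnt/pgram/ts'"
  psql -c "CREATE TABLE cache (k int PRIMARY KEY, v text) TABLESPACE ramspace"
  # WAL: with the server stopped, relocate pg_xlog via a symlink
  mv $PGDATA/pg_xlog /mnt/pgram/pg_xlog
  ln -s /mnt/pgram/pg_xlog $PGDATA/pg_xlog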

Or does losing WAL files mandate a new initdb?

--
Steve Wampler -- 
The gods that smiled on your birth are now laughing out loud.

From:
Craig James
Date:

[oops, didn't hit "reply to list" first time, resending...]

On 6/15/10 9:02 AM, Steve Wampler wrote:
> Chris Browne wrote:
>> "" <> writes:
>>> My question is how can I configure the database to run as quickly as
>>> possible if I don't care about data consistency or durability? That
>>> is, the data is updated so often and it can be reproduced fairly
>>> rapidly so that if there is a server crash or random particles from
>>> space mess up memory we'd just restart the machine and move on.
>>
>> For such a scenario, I'd suggest you:
>>
>> - Set up a filesystem that is memory-backed. On Linux, RamFS or TmpFS
>> are reasonable options for this.
>>
>> - The complication would be that your "restart the machine and move
>> on" needs to consist of quite a few steps:
>>
>> - recreating the filesystem
>> - fixing permissions as needed
>> - running initdb to set up new PG instance
>> - automating any needful fiddling with postgresql.conf, pg_hba.conf
>> - starting up that PG instance
>> - creating users, databases, schemas, ...

How about this: Set up a database entirely on a RAM disk, then install a WAL-logging warm standby.  If the production
computer goes down, you bring the warm standby online, shut it down, and use tar(1) to recreate the database on the
production server when you bring it back online.  You have speed and you have near-100% backup.
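
In rough shell terms, the recovery half might look like this (hosts
and paths invented, of course):

  # on the standby, after bringing it online and shutting it down cleanly
  tar -C /var/lib/pgsql -czf /tmp/pgdata.tgz data
  # on the recovered production box: rebuild the ramdisk and unpack
  mount -t tmpfs -o size=2G tmpfs /mnt/pgram
  scp standby:/tmp/pgdata.tgz /tmp/ && tar -C /mnt/pgram -xzf /tmp/pgdata.tgz
  pg_ctl -D /mnt/pgram/data start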

Craig

From:
"jgardner@jonathangardner.net"
Date:

On Jun 15, 8:47 am, Chris Browne <cbbro...@acm.org> wrote:
> "jgard...@jonathangardner.net" <jgard...@jonathangardner.net> writes:
> > My question is how can I configure the database to run as quickly as
> > possible if I don't care about data consistency or durability? That
> > is, the data is updated so often and it can be reproduced fairly
> > rapidly so that if there is a server crash or random particles from
> > space mess up memory we'd just restart the machine and move on.
>
> For such a scenario, I'd suggest you:
>
> - Set up a filesystem that is memory-backed.  On Linux, RamFS or TmpFS
>   are reasonable options for this.
>

I had forgotten about this. I will try this out.

> - The complication would be that your "restart the machine and move
>   on" needs to consist of quite a few steps:
>
>   - recreating the filesystem
>   - fixing permissions as needed
>   - running initdb to set up new PG instance
>   - automating any needful fiddling with postgresql.conf, pg_hba.conf
>   - starting up that PG instance
>   - creating users, databases, schemas, ...
>

I'm going to have a system in place to create these databases when I
restart the service.

> ...
>
> I wonder if this kind of installation "comes into its own" for more
> realistic scenarios in the presence of streaming replication.  If you
> know the WAL files have gotten to disk on another server, that's a
> pretty good guarantee :-).
>

I have found that pre-computing values and storing them in a general
relational-type database without durability is an ideal way to improve
services that need to return calculated results quickly, once a simple
hash lookup is no longer sufficient. Perhaps PostgreSQL running in
this mode will be the ideal solution.

Nowadays, no one is really surprised that it takes 30 seconds or so to
replicate your data everywhere, but they do detest not getting answers
to their complicated queries immediately.


From:
Chris Browne
Date:

 (Steve Wampler) writes:
> Or does losing WAL files mandate a new initdb?

Losing WAL would mandate initdb, so I'd think this all fits into the
set of stuff worth putting onto ramfs/tmpfs.  Certainly it'll all be
significant to the performance focus.
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
"MS  apparently now  has a  team dedicated  to tracking  problems with
Linux  and publicizing them.   I guess  eventually they'll  figure out
this back fires... ;)" -- William Burrow <>

From:
Jaime Casanova
Date:

On Tue, Jun 15, 2010 at 12:37 PM, Chris Browne <> wrote:
>  (Steve Wampler) writes:
>> Or does losing WAL files mandate a new initdb?
>
> Losing WAL would mandate initdb, so I'd think this all fits into the
> set of stuff worth putting onto ramfs/tmpfs.  Certainly it'll all be
> significant to the performance focus.

why is that? isn't simply executing pg_resetxlog enough? especially
'cause the OP doesn't care about losing some transactions
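
i.e. after a crash, something like (sketch; assumes $PGDATA is set):

  pg_resetxlog $PGDATA      # add -f if it refuses
  pg_ctl -D $PGDATA start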

--
Jaime Casanova         www.2ndQuadrant.com
PostgreSQL support and training

From:
Josh Berkus
Date:

On 6/15/10 10:37 AM, Chris Browne wrote:
>  (Steve Wampler) writes:
>> Or does losing WAL files mandate a new initdb?
>
> Losing WAL would mandate initdb, so I'd think this all fits into the
> set of stuff worth putting onto ramfs/tmpfs.  Certainly it'll all be
> significant to the performance focus.

I'd like to see some figures about WAL on RAMfs vs. simply turning off
fsync and full_page_writes.  Per Gavin's tests, PostgreSQL is already
close to TokyoCabinet/MongoDB performance just with those turned off; I
wonder if actually having the WAL on a memory partition would make any
real difference in throughput.

I've seen a lot of call for this recently, especially since PostgreSQL
seems to be increasingly in use as a reporting server for Hadoop.  Might
be worth experimenting with just making WAL writing a no-op.  We'd also
want to disable checkpointing, of course.
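
For clarity, the "everything off" configuration under discussion is
just this (there's no supported knob yet for the no-op-WAL and
no-checkpoint part):

  # postgresql.conf: all durability traded away for speed
  fsync = off
  synchronous_commit = off
  full_page_writes = off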

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

From:
"jgardner@jonathangardner.net"
Date:

On Jun 15, 4:18 pm, j...@agliodbs.com (Josh Berkus) wrote:
> On 6/15/10 10:37 AM, Chris Browne wrote:
>
> I'd like to see some figures about WAL on RAMfs vs. simply turning off
> fsync and full_page_writes.  Per Gavin's tests, PostgreSQL is already
> close to TokyoCabinet/MongoDB performance just with those turned off; I
> wonder if actually having the WAL on a memory partition would make any
> real difference in throughput.
>
> I've seen a lot of call for this recently, especially since PostgreSQL
> seems to be increasingly in use as a reporting server for Hadoop.  Might
> be worth experimenting with just making wal writing a no-op.  We'd also
> want to disable checkpointing, of course.
>

My back-of-the-envelope experiment: Inserting single integers into a
table without indexes using a prepared query via psycopg2.

Python Script:
import psycopg2
from time import time

conn = psycopg2.connect(database='jgardner')
cursor = conn.cursor()
cursor.execute("CREATE TABLE test (data int not null)")
conn.commit()
cursor.execute("PREPARE ins AS INSERT INTO test VALUES ($1)")
conn.commit()
# count how many single-row INSERT + COMMIT cycles fit into one second
start = time()
tx = 0
while time() - start < 1.0:
    cursor.execute("EXECUTE ins(%s)", (tx,))
    conn.commit()
    tx += 1
print tx
cursor.execute("DROP TABLE test")
conn.commit()

Local disk, WAL on same FS:
* Default config => 90
* full_page_writes=off => 90
* synchronous_commit=off => 4,500
* fsync=off => 5,100
* fsync=off and synchronous_commit=off => 5,500
* fsync=off and full_page_writes=off => 5,150
* fsync=off, synchronous_commit=off and full_page_writes=off => 5,500

tmpfs, WAL on same tmpfs:
* Default config: 5,200
* full_page_writes=off => 5,200
* fsync=off => 5,250
* synchronous_commit=off => 5,200
* fsync=off and synchronous_commit=off => 5,450
* fsync=off and full_page_writes=off => 5,250
* fsync=off, synchronous_commit=off and full_page_writes=off => 5,500

NOTE: If I do one giant commit instead of lots of littler ones, I get
much better speeds for the slower cases, but I never exceed 5,500
which appears to be some kind of wall I can't break through.

If there's anything else I should tinker with, I'm all ears.


From:
Mark Kirkwood
Date:

On 16/06/10 18:30,  wrote:
> On Jun 15, 4:18 pm, j...@agliodbs.com (Josh Berkus) wrote:
>
>> On 6/15/10 10:37 AM, Chris Browne wrote:
>>
>> I'd like to see some figures about WAL on RAMfs vs. simply turning off
>> fsync and full_page_writes.  Per Gavin's tests, PostgreSQL is already
>> close to TokyoCabinet/MongoDB performance just with those turned off; I
>> wonder if actually having the WAL on a memory partition would make any
>> real difference in throughput.
>>
>> I've seen a lot of call for this recently, especially since PostgreSQL
>> seems to be increasingly in use as a reporting server for Hadoop.  Might
>> be worth experimenting with just making wal writing a no-op.  We'd also
>> want to disable checkpointing, of course.
>>
>>
> My back-of-the-envelope experiment: Inserting single integers into a
> table without indexes using a prepared query via psycopg2.
>
> Python Script:
> import psycopg2
> from time import time
> conn = psycopg2.connect(database='jgardner')
> cursor = conn.cursor()
> cursor.execute("CREATE TABLE test (data int not null)")
> conn.commit()
> cursor.execute("PREPARE ins AS INSERT INTO test VALUES ($1)")
> conn.commit()
> start = time()
> tx = 0
> while time() - start < 1.0:
>      cursor.execute("EXECUTE ins(%s)", (tx,));
>      conn.commit()
>      tx += 1
> print tx
> cursor.execute("DROP TABLE test");
> conn.commit();
>
> Local disk, WAL on same FS:
> * Default config =>  90
> * full_page_writes=off =>  90
> * synchronous_commit=off =>  4,500
> * fsync=off =>  5,100
> * fsync=off and synchronous_commit=off =>  5,500
> * fsync=off and full_page_writes=off =>  5,150
> * fsync=off, synchronous_commit=off and full_page_writes=off =>  5,500
>
> tmpfs, WAL on same tmpfs:
> * Default config: 5,200
> * full_page_writes=off =>  5,200
> * fsync=off =>  5,250
> * synchronous_commit=off =>  5,200
> * fsync=off and synchronous_commit=off =>  5,450
> * fsync=off and full_page_writes=off =>  5,250
> * fsync=off, synchronous_commit=off and full_page_writes=off =>  5,500
>
> NOTE: If I do one giant commit instead of lots of littler ones, I get
> much better speeds for the slower cases, but I never exceed 5,500
> which appears to be some kind of wall I can't break through.
>
> If there's anything else I should tinker with, I'm all ears.
>
>

Seeing some profiler output (e.g. oprofile) for the fastest case (and
maybe for them all later) might be informative about what limit is
being hit here.

regards

Mark

From:
"Pierre C"
Date:

Have you tried connecting using a UNIX socket instead of a TCP socket on
localhost?  On such very short queries, the TCP overhead is significant.

From:
"Pierre C"
Date:

> Have you tried connecting using a UNIX socket instead of a TCP socket on
> localhost?  On such very short queries, the TCP overhead is significant.

Actually UNIX sockets are the default for psycopg2, had forgotten that.

I get 7400 using UNIX sockets and 3000 using TCP (host="localhost")
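
For anyone reproducing this, the only difference between the two runs
is the host parameter of the connect call (database name is mine,
adjust to taste):

import psycopg2
# UNIX-domain socket: the default when no host is given
conn_unix = psycopg2.connect(database='peufeu')
# TCP to localhost: measurably slower for such tiny queries
conn_tcp = psycopg2.connect(database='peufeu', host='localhost')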

From:
Greg Smith
Date:

 wrote:
> NOTE: If I do one giant commit instead of lots of littler ones, I get
> much better speeds for the slower cases, but I never exceed 5,500
> which appears to be some kind of wall I can't break through.
>

That's usually about where I run into the upper limit on how many
statements Python can execute against the database per second.  Between
that and the GIL preventing better multi-core use, once you take the
disk out of the picture and become CPU-bound, it's hard to use Python
for load testing of small statements without the bottleneck being
Python itself.

I normally just write little performance test cases in the pgbench
scripting language, then I get multiple clients and (in 9.0) multiple
driver threads all for free.
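
For instance, the INSERT test from this thread could be recast as a
custom script roughly like this (untested sketch; table and database
names assumed):

  -- insert.sql
  \setrandom n 1 100000
  INSERT INTO test VALUES (:n);

and run with something like:

  pgbench -n -M prepared -f insert.sql -c 4 -j 4 -T 10 testdb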

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
   www.2ndQuadrant.us


From:
"Pierre C"
Date:

FYI I've tweaked this program a bit:

import psycopg2
from time import time

conn = psycopg2.connect(database='peufeu')
cursor = conn.cursor()
cursor.execute("CREATE TEMPORARY TABLE test (data int not null)")
conn.commit()
cursor.execute("PREPARE ins AS INSERT INTO test VALUES ($1)")
cursor.execute("PREPARE sel AS SELECT 1")
conn.commit()
start = time()
tx = 0
N = 100
d = 0
while d < 10:
    # batch N inserts per commit; only check the clock once per batch
    for n in xrange(N):
        cursor.execute("EXECUTE ins(%s)", (tx,))
        #~ conn.commit()
        #~ cursor.execute("EXECUTE sel")
    conn.commit()
    d = time() - start
    tx += N
print "result : %d tps" % (tx / d)
cursor.execute("DROP TABLE test")
conn.commit()

Results (Core 2 quad, Ubuntu 10.04, 64-bit):

SELECT 1 : 21000 queries/s (I'd say 50 us per query isn't bad!)
INSERT with commit every 100 inserts : 17800 inserts/s
INSERT with commit every INSERT : 7650 tps

fsync is on, but synchronous_commit is off.


From:
Alvaro Herrera
Date:

Excerpts from 's message of Wed Jun 16 02:30:30 -0400 2010:

> NOTE: If I do one giant commit instead of lots of littler ones, I get
> much better speeds for the slower cases, but I never exceed 5,500
> which appears to be some kind of wall I can't break through.
>
> If there's anything else I should tinker with, I'm all ears.

increase wal_buffers?
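
e.g. (the 8.4-era default is a tiny 64kB; changing it requires a
server restart):

  wal_buffers = 16MB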

--
Álvaro Herrera <>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:
Josh Berkus
Date:

> * fsync=off => 5,100
> * fsync=off and synchronous_commit=off => 5,500

Now, this *is* interesting ... why should synch_commit make a difference
if fsync is off?

Anyone have any ideas?

> tmpfs, WAL on same tmpfs:
> * Default config: 5,200
> * full_page_writes=off => 5,200
> * fsync=off => 5,250
> * synchronous_commit=off => 5,200
> * fsync=off and synchronous_commit=off => 5,450
> * fsync=off and full_page_writes=off => 5,250
> * fsync=off, synchronous_commit=off and full_page_writes=off => 5,500

So, in this test, it seems like having WAL on tmpfs doesn't make a
significant difference for everything == off.

I'll try running some tests on Amazon when I have a chance.  It would be
worthwhile to get figures without Python's "ceiling".

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

From:
Jonathan Gardner
Date:

On Wed, Jun 16, 2010 at 1:27 AM, Greg Smith <> wrote:
>
> I normally just write little performance test cases in the pgbench scripting
> language, then I get multiple clients and (in 9.0) multiple driver threads
> all for free.
>

See, this is why I love these mailing lists. I totally forgot about
pgbench. I'm going to dump my cheesy python script and play with that
for a while.

--
Jonathan Gardner


From:
Jonathan Gardner
Date:

On Wed, Jun 16, 2010 at 4:22 AM, Pierre C <> wrote:
>
> import psycopg2
> from time import time
> conn = psycopg2.connect(database='peufeu')
> cursor = conn.cursor()
> cursor.execute("CREATE TEMPORARY TABLE test (data int not null)")
> conn.commit()
> cursor.execute("PREPARE ins AS INSERT INTO test VALUES ($1)")
> cursor.execute("PREPARE sel AS SELECT 1")
> conn.commit()
> start = time()
> tx = 0
> N = 100
> d = 0
> while d < 10:
>        for n in xrange( N ):
>                cursor.execute("EXECUTE ins(%s)", (tx,));
>                #~ conn.commit()
>                #~ cursor.execute("EXECUTE sel" );
>        conn.commit()
>        d = time() - start
>        tx += N
> print "result : %d tps" % (tx / d)
> cursor.execute("DROP TABLE test");
> conn.commit();
>

I'm not surprised that Python add is so slow, but I am surprised that
I didn't remember it was... ;-)

--
Jonathan Gardner


From:
Jonathan Gardner
Date:

On Wed, Jun 16, 2010 at 12:00 PM, Josh Berkus <> wrote:
>
>> * fsync=off => 5,100
>> * fsync=off and synchronous_commit=off => 5,500
>
> Now, this *is* interesting ... why should synch_commit make a difference
> if fsync is off?
>
> Anyone have any ideas?
>

I may have stumbled upon this out of ignorance, but I thought I read
that synchronous_commit controls whether commits get lined up and
flushed together, or proceed as a free-for-all that may cause some
intermediate weirdness.

--
Jonathan Gardner


From:
Balkrishna Sharma
Date:

http://www.postgresql.org/docs/current/static/wal-async-commit.html

" the server waits for the transaction's WAL records to be flushed to permanent storage before returning a success indication to the client."

I think that even with fsync=off, whether the backend waits for the WAL write at commit time is still controlled by the synchronous_commit parameter.  Guessing here...


> Date: Wed, 16 Jun 2010 12:19:20 -0700
> Subject: Re: [PERFORM] PostgreSQL as a local in-memory cache
> From:
> To:
> CC:
>
> On Wed, Jun 16, 2010 at 12:00 PM, Josh Berkus <> wrote:
> >
> >> * fsync=off => 5,100
> >> * fsync=off and synchronous_commit=off => 5,500
> >
> > Now, this *is* interesting ... why should synch_commit make a difference
> > if fsync is off?
> >
> > Anyone have any ideas?
> >
>
> I may have stumbled upon this by my ignorance, but I thought I read
> that synchronous_commit controlled whether it tries to line up commits
> or has a more free-for-all that may cause some intermediate weirdness.
>
> --
> Jonathan Gardner
>
>
From:
Jonathan Gardner
Date:

On Wed, Jun 16, 2010 at 12:51 AM, Pierre C <> wrote:
>
> Have you tried connecting using a UNIX socket instead of a TCP socket on
> localhost ? On such very short queries, the TCP overhead is significant.
>

Unfortunately, this isn't an option for my use case. Carbonado only
supports TCP connections.


--
Jonathan Gardner


From:
Craig James
Date:

On 6/16/10 12:00 PM, Josh Berkus wrote:
>
>> * fsync=off =>  5,100
>> * fsync=off and synchronous_commit=off =>  5,500
>
> Now, this *is* interesting ... why should synch_commit make a difference
> if fsync is off?
>
> Anyone have any ideas?

I found that pgbench has "noise" of about 20% (I posted about this a couple days ago using data from 1000 identical
pgbenchruns).  Unless you make a bunch of runs and average them, a difference of 5,100 to 5,500 appears to be
meaningless.

Craig

>
>> tmpfs, WAL on same tmpfs:
>> * Default config: 5,200
>> * full_page_writes=off =>  5,200
>> * fsync=off =>  5,250
>> * synchronous_commit=off =>  5,200
>> * fsync=off and synchronous_commit=off =>  5,450
>> * fsync=off and full_page_writes=off =>  5,250
>> * fsync=off, synchronous_commit=off and full_page_writes=off =>  5,500
>
> So, in this test, it seems like having WAL on tmpfs doesn't make a
> significant difference for everything == off.
>
> I'll try running some tests on Amazon when I have a chance.  It would be
> worthwhile to get figures without Python's "ceiling".
>


From:
"Pierre C"
Date:

> I'm not surprised that Python add is so slow, but I am surprised that
> I didn't remember it was... ;-)

it's not the add(), it's the time.time()...