Re: WAL in RAM - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: WAL in RAM
Date
Msg-id 63e8ca87ee41b0b968c350518ef5597a.squirrel@sq.gransy.com
Whole thread Raw
In response to WAL in RAM  (Marcus Engene <mengpg2@engene.se>)
List pgsql-performance
Hi,

On 28 Říjen 2011, 17:28, Marcus Engene wrote:
> Hi list,
>
> Every now and then I have write peaks which causes annoying delay on my
> website. No particular reason it seems, just that laws of probability
> dictates that there will be peaks every now and then.
>
> Anyway, thinking of ways to make the peaks more bareable, I saw the new
> 9.1 feature to bypass WAL. Problems is mainly that some statistics
> tables ("x users clicked this link this month") clog the write cache,
> not more important writes. I could live with restoring a nightly dump of
> these tables and loose a days worth of logs.

Why do you think the write activity is related to WAL? Does that mean bulk
loading of data by users or what? Have you measured how many WAL segments
that creates? What triggers that write activity?

Write peaks usually mean a checkpoint is in progress, and that has nothing
to do with WAL. More precisely - it does not write data to WAL but to data
files, so moving WAL to a separate device won't help at all.

The common scenario is about this:

(1) The users do some changes (INSERT/UPDATE/DELETE) to the database, it's
written to the WAL (fsynced to the device). This is not a lot of work, as
writing to WAL is a sequential access and the actual modifications are
stored in the shared buffers (not forced to the disk yet).

(2) A checkpoint is triggered, i.e. either a checkpoint_timeout expires or
all available WAL segments are filled - this means all the dirty buffers
has to be actually written from shared buffers to the datafiles. This is a
PITA, as it's a rather random access.

Anyway there are options to tune the write performance - most notably
checkpoint_segments, checkpoint_completion_target, checkpoint_timeout.

> Though not keen on jumping over to early major versions an old idea of
> putting WAL in RAM came back. Not RAM in main memory but some thingie
> pretending to be a drive with proper battery backup.
>
> a) It seems to exist odd hardware with RAM modules and if lucky also
> battery
> b) Some drive manufactureres have done hybird ram-spindle drives
> (compare with possibly more common ssd-spindle hybrides).
>
> b) sounds slightly more appealing since it basically means I put
> everything on those drives and it magically is faster. The a)
> alternatives also seemed to be non ECC which is a no-no and disturbing.
>
> Does anyone here have any recommendations here?

The thing to look for when talking about WAL is a sequential write speed.
The solutions you've mentioned above are great for random access, but when
you need a sequential speed it's a waste of money (IMHO).

> Pricing is not very important but reliability is.

My recommendation is to find out what's wrong before buying anything. My
impression is that you don't know the actual cause, so you really don't
know what features should the device have.

If you're willing to spend money without proper analysis, send the money
to me - the result will be about the same. You'll spend money without
actually solving the issue, plus it will make me a bit happier.

Anyway my recommendation would be these:

(1) Find out what actually happens, i.e. check if it's a checkpoint issue,
or what is going on. Enable log_checkpoints etc.

(2) Try to tune the db a bit - not sure what version are you using or what
are the important values, but generally this is a good starting point for
write-heavy databases

checkpoint_segments = 64
checkpoint_completion_target = 0.9
checkpoint_timeout = 30 minutes

(3) Provide more details - Pg version, important config values, hardware
info etc.

Only if I knew what's wrong and if the above things did not help, I'd
consider buying a new hw. I'd probably go with one of those options:

(a) Move the WAL to a separate disk (not SSD), or maybe a RAID1/RAID10 of
such drives. Start with one, use more if needed and a controller with a
BBWC.

(b) Move the database to a SSD drive, leave the WAL on the original
location (not SSD). This might be signoficantly more expensive, especially
if you want to make it reliable (building RAID1 of SSDs or something like
that).

Tomas


pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: should i expected performance degradation over time
Next
From: "Tomas Vondra"
Date:
Subject: Re: WAL in RAM