Re: SSD + RAID - Mailing list pgsql-performance

From Laszlo Nagy
Subject Re: SSD + RAID
Date
Msg-id 4AFF9A18.80702@shopzeus.com
In response to Re: SSD + RAID  (Craig Ringer <craig@postnewspapers.com.au>)
Responses Re: SSD + RAID  (Craig Ringer <craig@postnewspapers.com.au>)
List pgsql-performance
> A change has been written to the WAL and fsync()'d, so Pg knows it's hit
> disk. It can now safely apply the change to the tables themselves, and
> does so, calling fsync() to tell the drive containing the tables to
> commit those changes to disk.
>
> The drive lies, returning success for the fsync when it's just cached
> the data in volatile memory. Pg carries on, shortly deleting the WAL
> archive the changes were recorded in or recycling it and overwriting it
> with new change data. The SSD is still merrily buffering data to write
> cache, and hasn't got around to writing your particular change yet.
>
All right. I believe you. In the current Pg implementation, I need to
turn off the disk cache.

But... I would like to ask some theoretical questions. This is just an
idea of mine, and I'm probably wrong.
Here is a scenario:

#1. user wants to change something, resulting in a write_to_disk(data) call
#2. data is written into the WAL and fsync()-ed
#3. at this point the write_to_disk(data) call CAN RETURN, the user can
continue his work (the WAL is already written, changes cannot be lost)
#4. Pg can continue writing data onto the disk, and fsync() it.
#5. Then WAL archive data can be deleted.
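
The five steps above could be sketched roughly as follows. This is only a
toy model under stated assumptions, not how PostgreSQL is implemented: the
class TinyStore, the file names wal.log and table.dat, and the checkpoint()
method are all hypothetical names made up for illustration. It also assumes
that fsync() is honest, which is exactly what a lying drive cache breaks.

```python
import os


class TinyStore:
    """Toy model of steps #1-#5 (hypothetical, not PostgreSQL's code)."""

    def __init__(self, directory):
        self.wal_path = os.path.join(directory, "wal.log")
        self.data_path = os.path.join(directory, "table.dat")
        # Changes that are safe in the WAL but not yet in the table file.
        self.pending = []

    def write_to_disk(self, data: bytes) -> None:
        # Steps #1 and #2: append the change to the WAL and fsync() it.
        with open(self.wal_path, "ab") as wal:
            wal.write(data + b"\n")
            wal.flush()
            os.fsync(wal.fileno())
        self.pending.append(data)
        # Step #3: returning here lets the caller continue working.

    def checkpoint(self) -> None:
        # Step #4: write the pending changes to the table file and fsync().
        with open(self.data_path, "ab") as table:
            for data in self.pending:
                table.write(data + b"\n")
            table.flush()
            os.fsync(table.fileno())
        # Step #5: only now is it safe to discard the WAL contents.
        with open(self.wal_path, "wb") as wal:
            wal.flush()
            os.fsync(wal.fileno())
        self.pending.clear()
```

Between write_to_disk() and checkpoint(), the change exists only in the WAL
file and in the in-memory pending list, which is the window between #3 and
#5.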

Now maybe I'm wrong, but between #3 and #5, the data to be written is
kept in memory. This is basically a write cache, implemented in OS
memory. We could really handle it like a write cache. E.g. everything
would remain the same, except that we add some latency. We can wait some
time after the last modification of a given block, and then write it out.

Is it possible to do? If so, then we can turn off the write cache for
all drives except the one holding the WAL, and write speed would still
remain the same. I don't think that any SSD drive has more than some
megabytes of write cache. The same amount of write cache could easily be
implemented in OS memory, and then Pg would always know what hit the disk.
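
A minimal sketch of such a delayed write cache kept in process memory might
look like the following. Everything here is an assumption for illustration:
the class name DelayedWriteCache, the block_size and delay parameters, and
the explicit now argument (used so the timing can be controlled) are
hypothetical and do not correspond to any PostgreSQL setting or API.

```python
import os
import time


class DelayedWriteCache:
    """Hypothetical in-memory write cache with a per-block idle delay."""

    def __init__(self, path, block_size=8192, delay=0.5):
        self.path = path
        self.block_size = block_size
        self.delay = delay
        # block number -> (block contents, time of last modification)
        self.dirty = {}

    def modify(self, block_no, data, now=None):
        # Keep the newest version in memory and restart the idle timer.
        stamp = time.monotonic() if now is None else now
        self.dirty[block_no] = (data, stamp)

    def flush_idle(self, now=None):
        # Write out only blocks untouched for at least `delay` seconds.
        now = time.monotonic() if now is None else now
        ready = sorted(n for n, (_, t) in self.dirty.items()
                       if now - t >= self.delay)
        if not ready:
            return []
        mode = "r+b" if os.path.exists(self.path) else "w+b"
        with open(self.path, mode) as f:
            for n in ready:
                f.seek(n * self.block_size)
                f.write(self.dirty[n][0])
            f.flush()
            # With the drive's own write cache off, a successful fsync()
            # means these blocks really are on stable storage.
            os.fsync(f.fileno())
        for n in ready:
            del self.dirty[n]
        return ready
```

A block that is modified again before its delay expires is written only
once, with its latest contents, which is where the saved writes would come
from.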

Thanks,

   Laci

