Re: Question: BlockSize > 8192 with FusionIO - Mailing list pgsql-performance

From	Scott Carey
Subject	Re: Question: BlockSize > 8192 with FusionIO
Date	January 5, 2011 02:49:36
Msg-id	42AF139A-0385-4226-B81C-9569FB64873E@richrelevance.com
In response to	Re: Question: BlockSize > 8192 with FusionIO (Merlin Moncure <mmoncure@gmail.com>)
List	pgsql-performance

On Jan 4, 2011, at 8:48 AM, Merlin Moncure wrote:

> On Mon, Jan 3, 2011 at 9:13 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>> Strange, John W wrote:
>>>
>>> Has anyone had a chance to recompile and try a larger blocksize
>>> than 8192 with pSQL 8.4.x?
>>
>> While I haven't done the actual experiment you're asking about, the problem
>> working against you here is how WAL data is used to protect against partial
>> database writes.  See the documentation for full_page_writes at
>> http://www.postgresql.org/docs/current/static/runtime-config-wal.html
>>  Because full size copies of the blocks have to get written there, attempts
>> to chunk writes into larger pieces end up requiring a correspondingly larger
>> volume of writes to protect against partial writes to those pages.  You
>> might get a nice efficiency gain on the read side, but the situation when
>> under a heavy write load (the main thing you have to be careful about with
>> these SSDs) is much less clear.
>
> most flash drives, especially mlc flash, use huge blocks anyways on
> physical level.  the numbers claimed here
> (http://www.fusionio.com/products/iodrive/)  (141k write iops) are
> simply not believable without write buffering.  i didn't see any note
> of how fault tolerance is maintained through the buffer (anyone
> know?).

Flash may have very large erase blocks -- 4k to 16M, but you can write to it at much smaller block sizes sequentially.

It has to erase a block in bulk, but it can write to an erased block piece by piece, sequentially (typically 512 or 4096 bytes, but some use 8k or 16k).

Older MLC NAND flash could be written to a couple of bytes at a time -- but drives today incorporate too much ECC and use larger chunks, so they no longer do that.  The minimum write size now is set by the ECC requirements and not the physical NAND flash requirements.

So, buffering isn't that big of a requirement with the current LBA-to-physical translations, which change all writes -- random or not -- to sequential writes in one erase block.  But performance when waiting for each write to complete will not be all that good, especially with MLC.  Turn off the buffer on an Intel SLC drive, for example, and write IOPS is cut by 1/3 or more -- to 'only' 1000 or so IOPS.
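
To illustrate the translation idea above, here is a toy sketch in Python. It is not how any particular drive's firmware works. The class name ToyFTL, the tiny erase-block size of 4 pages, and the LBA values are all made up for illustration, and garbage collection of stale pages is left out entirely. It only shows how writes to scattered logical addresses can land on consecutive physical pages.

```python
# Toy model of a log-structured LBA -> physical mapping (hypothetical,
# not any vendor's actual flash translation layer).
PAGES_PER_ERASE_BLOCK = 4  # assumed, unrealistically small for readability


class ToyFTL:
    def __init__(self, pages_per_block=PAGES_PER_ERASE_BLOCK):
        self.pages_per_block = pages_per_block
        self.mapping = {}    # LBA -> (erase block, page within block)
        self.next_page = 0   # next free physical page, in append order
        self.stale = set()   # physical pages superseded by a later write

    def write(self, lba):
        """Place a logical write on the next free physical page."""
        old = self.mapping.get(lba)
        if old is not None:
            # Flash cannot overwrite in place: mark the old copy stale.
            self.stale.add(old)
        loc = divmod(self.next_page, self.pages_per_block)
        self.mapping[lba] = loc
        self.next_page += 1
        return loc


ftl = ToyFTL()
# Scattered ("random") logical addresses, including a rewrite of LBA 12.
locations = [ftl.write(lba) for lba in (907, 12, 500, 12, 3)]
print(locations)
```

The physical locations come out in append order regardless of the logical addresses, and the rewrite of LBA 12 leaves a stale page behind that a real drive would have to reclaim later by erasing the whole block.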
