Thread: SAN and full_page_writes

SAN and full_page_writes

From
"Nikolas Everett"
Date:
I have the honor of configuring Postgres with its storage on a NetApp FAS3020 via fiber.

Does anyone know if the SAN protects me from breakage due to partial page writes?

If anyone has any SAN-specific Postgres knowledge, I'd love to hear your words of wisdom.

For reference:
[postgres@localhost bonnie]$ ~neverett/bonnie++-1.03a/bonnie++
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
localhost.lo 32104M 81299  94 149848  30 42747   8 45465  61 55528   4 495.5   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

Re: SAN and full_page_writes

From
"Nikolas Everett"
Date:
I seem to have answered my own question.  I'm sending the answer to the list in case someone else has the same question one day.

According to the NetApp documentation, it does protect me from partial page writes.  Thus, full_page_writes = off.


On Wed, Sep 3, 2008 at 12:03 PM, Nikolas Everett <nik9000@gmail.com> wrote:
[...]


Re: SAN and full_page_writes

From
Bruce Momjian
Date:
Nikolas Everett wrote:
> I seem to have answered my own question.  I'm sending the answer to the list
> in case someone else has the same question one day.
>
> According to the NetApp documentation, it does protect me from partial page
> writes.  Thus, full_page_writes = off.

Just for clarification, the NetApp must guarantee that the entire 8k
gets to disk, not just one of the 512-byte blocks that disks use
internally.
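
To illustrate the concern, here is a minimal Python sketch (not part of the original thread) of a torn page: an 8 KB page is written as sixteen 512-byte sectors, and a simulated crash stops the write partway through. The function names are hypothetical, and the sector-by-sector model is a simplifying assumption about how storage behaves.

```python
PAGE_SIZE = 8192      # PostgreSQL's default block size
SECTOR_SIZE = 512     # sector size mentioned above


def write_page(disk: bytearray, new_page: bytes, sectors_before_crash: int) -> None:
    """Copy new_page onto disk one sector at a time, stopping early
    to simulate a crash after `sectors_before_crash` sectors."""
    assert len(disk) == len(new_page) == PAGE_SIZE
    for i in range(sectors_before_crash):
        start = i * SECTOR_SIZE
        disk[start:start + SECTOR_SIZE] = new_page[start:start + SECTOR_SIZE]


def is_torn(disk: bytes, old_page: bytes, new_page: bytes) -> bool:
    """A page is torn if it matches neither the old nor the new version."""
    return disk != old_page and disk != new_page


old_page = bytes([0xAA]) * PAGE_SIZE
new_page = bytes([0xBB]) * PAGE_SIZE

disk = bytearray(old_page)
write_page(disk, new_page, sectors_before_crash=5)   # crash after 5 of 16 sectors
print(is_torn(bytes(disk), old_page, new_page))      # prints True
```

With full_page_writes on, PostgreSQL logs the whole page image to WAL the first time a page is modified after a checkpoint, so recovery can overwrite a torn page like this one.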

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: SAN and full_page_writes

From
"Nikolas Everett"
Date:
Thanks for pointing that out, Bruce.

NetApp has a 6-page PDF about NetApp and databases.  On page 4:

    As discussed above, reads and writes are unconditionally atomic to 64 KB.
    While reads or writes may fail for a number of reasons (out of space,
    permissions, etc.), the failure is always atomic to 64 KB. All possible
    error conditions are fully evaluated prior to committing any updates or
    returning any data to the database.

From the sound of it, I can turn off full_page_writes.

This document can be found at http://www.netapp.com/us/ by searching for hosting databases.

Thanks,

--Nik

On Sat, Sep 6, 2008 at 3:46 PM, Bruce Momjian <bruce@momjian.us> wrote:
[...]

Re: SAN and full_page_writes

From
Gregory Stark
Date:
"Nikolas Everett" <nik9000@gmail.com> writes:

> Thanks for pointing that out Bruce.
>
> NetApp has a 6 page PDF about NetApp and databases.  On page 4:

Skimming through this I think all 6 pages are critical. The sentence you quote
out of context pertains specifically to the NAS internal organization.

The previous pages discuss limitations of OSes, filesystems and especially NFS
clients which you may have to be concerned with as well.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

Re: SAN and full_page_writes

From
"Nikolas Everett"
Date:
Sorry about that.  I was having tunnel vision and pulled out the part that applied to me.  I also figured that the OS and file system information was superfluous, but on second look it may not be.  This bit:

    To satisfy the Durability requirement, all write operations must write through any OS
    cache to stable storage before they are reported as complete or otherwise made visible.
    Write-back caching behavior is prohibited, and data from failed writes must not appear in
    an OS cache.

    To satisfy the Serialization requirements, any OS cache must be fully coherent with the
    underlying storage. For instance, each write must invalidate any OS-cached copies of
    the data to be overwritten, on any and all hosts, prior to commitment. Multiple hosts may
    access the same storage concurrently under shared-disk clustering, such as that
    implemented by Oracle RAC and/or ASM.

Sounds kind of scary.  I think postgres forces the underlying OS and file system to do that stuff (sans the multi-host magic) using fsync.  Is that right?

It does look like there are some gotchas with NFS.

On Mon, Sep 8, 2008 at 10:16 AM, Gregory Stark <stark@enterprisedb.com> wrote:
[...]

Re: SAN and full_page_writes

From
Gregory Stark
Date:
"Nikolas Everett" <nik9000@gmail.com> writes:

> Sounds kind of scary.  I think postgres forces the underlying OS and file
> system to do that stuff (sans the multi-host magic) using fsync.  Is that
> right?

Yes, so you have to make sure that your filesystem really does support fsync
properly. I think most NFS implementations do that.

I was more concerned with:

    Network Appliance supports a number of NFS client implementations for use
    with databases. These clients provide write atomicity to at least 4 KB,
    and support synchronous writes when requested by the database. Typically,
    atomicity is guaranteed only to one virtual memory page, which may be as
    small as 4 KB. However, if the NFS client supports a direct I/O mode that
    completely bypasses the cache, then atomicity is guaranteed to the size
    specified by the “wsize” mount option, typically 32 KB.

    The failure of some NFS clients to assure write atomicity to a full
    database block means that the soft atomicity requirement is not always
    met. Some failures of the host system may result in a fractured database
    block on disk. In practice such failures are rare. When they happen no
    data is lost, but media recovery of the affected database block may be
    required.

That "media recovery" it's referring to sounds like precisely our WAL full
page writes...
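
As a rough illustration (not part of the original thread), the figures quoted from the NetApp document can be compared against PostgreSQL's default 8 KB block size. The helper name below is hypothetical, and the check assumes each database block is aligned so that it falls entirely inside one atomic write unit.

```python
KB = 1024
DB_BLOCK_SIZE = 8 * KB   # PostgreSQL's default page size


def block_write_is_atomic(atomic_unit: int, block_size: int = DB_BLOCK_SIZE) -> bool:
    """True if one database block fits inside a single atomic write unit
    (assumes the block is aligned to that unit)."""
    return atomic_unit >= block_size


# Atomicity figures quoted from the NetApp document in this thread.
cases = {
    "cached NFS client, 4 KB VM page": 4 * KB,
    "direct I/O, wsize = 32 KB": 32 * KB,
    "filer-internal writes, 64 KB": 64 * KB,
}

for name, unit in cases.items():
    verdict = "atomic" if block_write_is_atomic(unit) else "may be fractured"
    print(f"{name}: 8 KB block write {verdict}")
```

Only the first case falls short, which matches the document's warning about NFS clients that guarantee atomicity to a single 4 KB virtual memory page.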


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

Re: SAN and full_page_writes

From
"Nikolas Everett"
Date:
On Mon, Sep 8, 2008 at 10:59 AM, Gregory Stark <stark@enterprisedb.com> wrote:

> That "media recovery" it's referring to sounds like precisely our WAL full
> page writes...

That sounds right.

So the takeaway is that NetApp does its best to protect you from partial page writes, but comes up short on untuned NFS (see the document for how to tune it).  Otherwise you are protected, so long as your OS and file system implement fsync properly.
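
For readers unfamiliar with the pattern, here is a minimal Python sketch (not part of the original thread, and not PostgreSQL's actual code) of a write followed by fsync. The function name is hypothetical. Whether the fsync call really reaches stable storage depends on the OS, file system, and NFS client, which is exactly the caveat discussed above.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data, then ask the OS to flush it to stable storage before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:                       # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                      # returns only after the OS reports the flush done
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.bin")
    durable_write(path, b"\x00" * 8192)   # one 8 KB page
    print(os.path.getsize(path))          # prints 8192
```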

--Nik

Re: SAN and full_page_writes

From
Bruce Momjian
Date:
Nikolas Everett wrote:
> Thanks for pointing that out Bruce.
>
> NetApp has a 6 page PDF about NetApp and databases.  On page 4:
>
> As discussed above, reads and writes are unconditionally atomic to 64 KB.
> While reads or writes
> may fail for a number of reasons (out of space, permissions, etc.), the
> failure is always atomic to
> 64 KB. All possible error conditions are fully evaluated prior to committing
> any updates or
> returning any data to the database.

Well, that is certainly good news, and it is nice that they specified the
atomic size.
