Re: [HACKERS] PG on NFS may be just a bad idea - Mailing list pgsql-docs

From Bruce Momjian
Subject Re: [HACKERS] PG on NFS may be just a bad idea
Msg-id 200711042151.lA4LpcP29113@momjian.us
List pgsql-docs

Based on this analysis, I have added an NFS section to the tablespaces
portion of the documentation, and linked to it from 'Creating a database
cluster'.  Patch attached.

---------------------------------------------------------------------------

Tom Lane wrote:
> I spent a bit of time tonight poking at the issue reported here:
> http://archives.postgresql.org/pgsql-novice/2007-08/msg00123.php
>
> It turns out to be quite easy to reproduce, at least for me: start CVS
> HEAD on an NFS-mounted $PGDATA directory, and run the contrib regression
> tests ("make installcheck" in contrib/).  I see more than half of the
> DROP DATABASE commands complaining in exactly the way Miya describes.
> This failure rate might be an artifact of the particular environment
> (I tested NFS client = Fedora Core 6, server = HPUX 10.20 on a much
> slower machine) but the problem is clearly real.
>
> In the earlier thread I cited suggestions that this behavior comes from
> client programs holding files open longer than they should.  However,
> strace'ing this behavior shows no evidence at all that that is happening
> in Postgres.  I have an strace that shows conclusively that the bgwriter
> never opened any file in the target database at all, and all earlier
> backends exited before the one doing the DROP DATABASE began its dirty
> work, and yet:
>
> [pid 19211] 22:50:30.517077 rmdir("base/18193") = -1 ENOTEMPTY (Directory not empty)
> [pid 19211] 22:50:30.517863 write(2, "WARNING:  could not remove file "..., 79WARNING:  could not remove file or directory "base/18193": Directory not empty
> ) = 79
> [pid 19211] 22:50:30.517974 sendto(7, "N\0\0\0rSWARNING\0C01000\0Mcould not "..., 115, 0, NULL, 0) = 115
>
> After some googling I think that the damage may actually be getting done
> at the kernel level.  According to
> http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html
> it is fairly common for NFS clients to cache writes, meaning that the
> kernel itself may be holding an old write and not sending it to the NFS
> server until after the file deletion command has been sent.
>
> (I don't have the network-fu needed to prove that this is happening by
> sniffing the network traffic; anyone want to try?)
>
> If this is what's happening I'd claim it is a kernel bug, but seeing
> that I see it on FC6 and Miya sees it on Solaris 10, it would be a bug
> widespread enough that we'd not be likely to get it killed off soon.
>
> Maybe we need to actively discourage people from running Postgres
> against NFS-mounted data directories.  Shane Kerr's paper cited above
> mentions some other rather scary properties, including O_EXCL file
> creation not really working properly.
>
>             regards, tom lane

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/manage-ag.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/manage-ag.sgml,v
retrieving revision 2.55
retrieving revision 2.56
diff -c -r2.55 -r2.56
*** doc/src/sgml/manage-ag.sgml    4 Nov 2007 19:43:33 -0000    2.55
--- doc/src/sgml/manage-ag.sgml    4 Nov 2007 21:40:02 -0000    2.56
***************
*** 495,499 ****
--- 495,525 ----
     the old tablespace locations.)
    </para>

+   <sect2 id="manage-ag-tablespaces-nfs">
+    <title>Network File Systems</title>
+
+    <indexterm zone="manage-ag-tablespaces-nfs">
+     <primary>Network File Systems</primary>
+    </indexterm>
+    <indexterm><primary><acronym>NFS</></><see>Network File Systems</></>
+    <indexterm><primary>Network Attached Storage (<acronym>NAS</>)</><see>Network File Systems</></>
+
+    <para>
+     Many installations create tablespaces on network file systems.
+     Sometimes this is done directly via <acronym>NFS</>, or by using a
+     Network Attached Storage (<acronym>NAS</>) device that uses
+     <acronym>NFS</> internally.  <productname>PostgreSQL</> does nothing
+     special for <acronym>NFS</> file systems, meaning it assumes
+     <acronym>NFS</> behaves exactly like locally-connected drives.  If
+     client and server <acronym>NFS</> implementations have non-standard
+     semantics, this can cause reliability problems (see <ulink
+     url="http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html"></ulink>).
+     Specifically, delayed (asynchronous) writes to the <acronym>NFS</>
+     server can cause reliability problems; if possible, mount
+     <acronym>NFS</> file systems synchronously to avoid this.
+    </para>
+
+   </sect2>
+
   </sect1>
  </chapter>
Index: doc/src/sgml/runtime.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v
retrieving revision 1.382
retrieving revision 1.383
diff -c -r1.382 -r1.383
*** doc/src/sgml/runtime.sgml    1 Nov 2007 19:06:01 -0000    1.382
--- doc/src/sgml/runtime.sgml    4 Nov 2007 21:48:03 -0000    1.383
***************
*** 159,164 ****
--- 159,170 ----
     for the database cluster.  Normally this should be chosen to match the
     locale setting.  For details see <xref linkend="multibyte">.
    </para>
+
+   <para>
+    If using non-local (network) file systems, see <xref
+    linkend="manage-ag-tablespaces-nfs">.
+   </para>
+
   </sect1>

   <sect1 id="server-start">
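
Editorial sketch, not part of the patch: the sequence in the quoted strace
output (files removed, then rmdir() of the directory failing with ENOTEMPTY)
can be probed outside Postgres with a small script. The function name
probe_rmdir, the file name "datafile", and the 8 kB write size are
illustrative assumptions, not anything taken from the PostgreSQL source. On a
local file system the rmdir is expected to succeed; whether a given NFS
client and server pair ever fails this way is exactly what the script is
meant to help check, and a clean run does not prove the mount is safe.

```python
import errno
import os
import sys
import tempfile


def probe_rmdir(parent_dir):
    """Create a directory holding one file, delete both, report the outcome.

    Returns None if rmdir() succeeded, or the errno value if it failed.
    (Hypothetical helper for illustration only.)
    """
    workdir = tempfile.mkdtemp(prefix="rmdir_probe_", dir=parent_dir)
    path = os.path.join(workdir, "datafile")

    # O_EXCL requests exclusive creation; the paper cited in the thread
    # questions how reliable this is over NFS.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, b"x" * 8192)
    finally:
        os.close(fd)

    # Remove the file, then immediately try to remove its directory,
    # mirroring the unlink-then-rmdir order of DROP DATABASE.
    os.unlink(path)
    try:
        os.rmdir(workdir)
    except OSError as exc:
        # The directory is left in place so it can be inspected afterwards.
        return exc.errno
    return None


if __name__ == "__main__":
    # Pass a directory on the file system under test; defaults to the
    # system temporary directory.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    result = probe_rmdir(target)
    if result is None:
        print("rmdir succeeded")
    elif result == errno.ENOTEMPTY:
        print("rmdir failed: directory not empty")
    else:
        print("rmdir failed:", os.strerror(result))
```

To test a particular mount, pass a directory on it as the first argument and
run the script repeatedly, since the failure reported in the thread was
intermittent.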
