Re: [HACKERS] PG on NFS may be just a bad idea - Mailing list pgsql-docs
From | Bruce Momjian |
---|---|
Subject | Re: [HACKERS] PG on NFS may be just a bad idea |
Date | |
Msg-id | 200711042151.lA4LpcP29113@momjian.us |
List | pgsql-docs |
Based on this analysis, I have added an NFS section to the tablespaces
portion of the documentation, and linked to it from 'Creating a database
cluster'. Patch attached.

---------------------------------------------------------------------------

Tom Lane wrote:
> I spent a bit of time tonight poking at the issue reported here:
> http://archives.postgresql.org/pgsql-novice/2007-08/msg00123.php
>
> It turns out to be quite easy to reproduce, at least for me: start CVS
> HEAD on an NFS-mounted $PGDATA directory, and run the contrib regression
> tests ("make installcheck" in contrib/). I see more than half of the
> DROP DATABASE commands complaining in exactly the way Miya describes.
> This failure rate might be an artifact of the particular environment
> (I tested NFS client = Fedora Core 6, server = HPUX 10.20 on a much
> slower machine) but the problem is clearly real.
>
> In the earlier thread I cited suggestions that this behavior comes from
> client programs holding files open longer than they should. However,
> strace'ing this behavior shows no evidence at all that that is happening
> in Postgres. I have an strace that shows conclusively that the bgwriter
> never opened any file in the target database at all, and all earlier
> backends exited before the one doing the DROP DATABASE began its dirty
> work, and yet:
>
> [pid 19211] 22:50:30.517077 rmdir("base/18193") = -1 ENOTEMPTY (Directory not empty)
> [pid 19211] 22:50:30.517863 write(2, "WARNING: could not remove file "..., 79WARNING: could not remove file or directory"base/18193": Directory not empty
> ) = 79
> [pid 19211] 22:50:30.517974 sendto(7, "N\0\0\0rSWARNING\0C01000\0Mcould not "..., 115, 0, NULL, 0) = 115
>
> After some googling I think that the damage may actually be getting done
> at the kernel level. According to
> http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html
> it is fairly common for NFS clients to cache writes, meaning that the
> kernel itself may be holding an old write and not sending it to the NFS
> server until after the file deletion command has been sent.
>
> (I don't have the network-fu needed to prove that this is happening by
> sniffing the network traffic; anyone want to try?)
>
> If this is what's happening I'd claim it is a kernel bug, but seeing
> that I see it on FC6 and Miya sees it on Solaris 10, it would be a bug
> widespread enough that we'd not be likely to get it killed off soon.
>
> Maybe we need to actively discourage people from running Postgres
> against NFS-mounted data directories. Shane Kerr's paper cited above
> mentions some other rather scary properties, including O_EXCL file
> creation not really working properly.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Index: doc/src/sgml/manage-ag.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/manage-ag.sgml,v
retrieving revision 2.55
retrieving revision 2.56
diff -c -r2.55 -r2.56
*** doc/src/sgml/manage-ag.sgml 4 Nov 2007 19:43:33 -0000 2.55
--- doc/src/sgml/manage-ag.sgml 4 Nov 2007 21:40:02 -0000 2.56
***************
*** 495,499 ****
--- 495,525 ----
  the old tablespace locations.)
  </para>
+ <sect2 id="manage-ag-tablespaces-nfs">
+ <title>Network File Systems</title>
+
+ <indexterm zone="manage-ag-tablespaces-nfs">
+ <primary>Network File Systems</primary>
+ </indexterm>
+ <indexterm><primary><acronym>NFS</></><see>Network File Systems</></>
+ <indexterm><primary>Network Attached Storage (<acronym>NAS</>)</><see>Network File Systems</></>
+
+ <para>
+ Many installations create tablespace on network file systems.
+ Sometimes this is done directly via <acronym>NFS</>, or by using a
+ Network Attached Storage (<acronym>NAS</>) device that uses
+ <acronym>NFS</> internally. <productname>PostgreSQL</> does nothing
+ special for <acronym>NFS</> file systems, meaning it assumes
+ <acronym>NFS</> behaves exactly like locally-connected drives. If
+ client and server <acronym>NFS</> implementations have non-standard
+ semantics, this can cause reliability problems (see <ulink
+ url="http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html"></ulink>).
+ Specifically, delayed (asynchonous) writes to the <acronym>NFS</>
+ server can cause reliability problems; if possible, mount
+ <acronym>NFS</> file systems synchonously to avoid this.
+ </para>
+
+ </sect2>
+
+
  </sect1>

  </chapter>

Index: doc/src/sgml/runtime.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v
retrieving revision 1.382
retrieving revision 1.383
diff -c -r1.382 -r1.383
*** doc/src/sgml/runtime.sgml 1 Nov 2007 19:06:01 -0000 1.382
--- doc/src/sgml/runtime.sgml 4 Nov 2007 21:48:03 -0000 1.383
***************
*** 159,164 ****
--- 159,170 ----
  for the database cluster. Normally this should be chosen to match the
  locale setting. For details see <xref linkend="multibyte">.
  </para>
+
+ <para>
+ If using non-local (network) file systems, see <xref
+ linkend="manage-ag-tablespaces-nfs">.
+ </para>
+
+
  </sect1>

  <sect1 id="server-start">
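The advice in the patch to mount NFS file systems synchronously can be checked on a running client. The following is only a minimal sketch, not part of the patch or of PostgreSQL: it assumes a Linux client whose mount table uses the /proc/mounts layout (device, mount point, file system type, comma-separated options), and the function names find_mount and nfs_status are hypothetical. It reports whether a directory sits on an NFS mount and whether the "sync" option appears among its options. A "sync" result says nothing about how the server exports the file system, so treat it as a first check rather than proof of safe behavior.

```python
import os


def find_mount(path, mounts_text):
    """Return (fstype, options) for the longest mount point containing path.

    mounts_text is assumed to be in Linux /proc/mounts format. Mount points
    containing escaped characters (such as \\040 for a space) are not decoded.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        options = fields[3].split(",")
        # Match on whole path components so /mnt/pg does not claim /mnt/pgdata.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on ties, the later entry wins.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype, options)
    if best is None:
        return None
    return best[1], best[2]


def nfs_status(path, mounts_text):
    """Classify path as 'unknown', 'not-nfs', 'nfs-sync' or 'nfs-async'."""
    found = find_mount(path, mounts_text)
    if found is None:
        return "unknown"
    fstype, options = found
    if fstype not in ("nfs", "nfs4"):
        return "not-nfs"
    return "nfs-sync" if "sync" in options else "nfs-async"
```

To use it against a live system, read the contents of /proc/mounts into a string and pass it together with the data directory path, for example nfs_status(os.environ["PGDATA"], text). Taking the mount table as a string keeps the functions testable with sample data on machines that have no NFS mounts.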