Re: Strange issue with NFS mounted PGDATA on ugreen NAS - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Strange issue with NFS mounted PGDATA on ugreen NAS |
Date | |
Msg-id | CA+hUKGKU38SMdtwo3cUYs56SNwn-+JNwjYq-QF420g5n6o4wcA@mail.gmail.com |
In response to | Re: Strange issue with NFS mounted PGDATA on ugreen NAS (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Strange issue with NFS mounted PGDATA on ugreen NAS |
List | pgsql-hackers |
On Wed, Jan 1, 2025 at 6:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ISTM we used to disclaim responsibility for data integrity if you
> try to put PGDATA on NFS.  I looked at the current wording about
> NFS in runtime.sgml and was frankly shocked at how optimistic it is.
> Shouldn't we be saying something closer to "if it breaks you
> get to keep both pieces"?

I now suspect this specific readdir() problem is in FreeBSD's NFS client.  See below.

There have also been reports of missed files from (IIRC) Linux clients, without much analysis; those involved unspecified (possibly appliance/cloud) NFS and CIFS file servers.  That doesn't seem too actionable from here unless someone can come up with a repro, or at least some solid details to investigate.

The other issue I know of personally is NFS ENOSPC, which has some exciting disappearing-committed-data failure modes caused by lazy allocation in Linux's implementation (and possibly others), which I've written about before.  That one is not strictly an NFS-only issue, though: it's just really easy to hit that way.  I have a patch to fix it on our side, which I hope to re-post soon, independently of this thread really, as it's tangled up with quite a few other things...

> > Anyway, I'll write a patch to change rmtree() to buffer the names in
> > memory.  In theory there could be hundreds of gigabytes of pathnames,
> > so perhaps I should do it in batches; I'll look into that.
>
> This feels a lot like putting lipstick on a pig.

Hehe.  Yeah.  Abandoned.

I see this issue here with a FreeBSD client talking to a Debian server exporting BTRFS or XFS, even with readdirsize set high so that multi-request paging is not expected.

Looking at Wireshark and the NFS spec (disclaimer: I have never studied NFS at this level before, addito salis grano), what I see is a READDIR request with cookie=0 (good), which receives a response containing the whole directory listing with eof=1 set at the end (good).  But then FreeBSD unexpectedly (to me) sends *another* READDIR request with cookie=662.  That is a real cookie, received somewhere in the middle of the first response on the entry for "13816_fsm", which was followed by an entry for "13816_vm".  The second request gets a response that begins at "13816_vm" (correct on the server's part).  Then the client sends REMOVE (unlink) requests for some but not all of the files, including "13816_fsm" but not "13816_vm".  Then it sends yet another READDIR request with cookie=0 (meaning: start from the top), gets a non-empty directory listing, but immediately sends RMDIR, which unsurprisingly fails with NFS3ERR_NOTEMPTY.

So my best guess so far is that FreeBSD's NFS client is corrupting its directory cache when files are unlinked, and it's not the server's fault.  I don't see any obvious problem with the way the cookies work.  Seems like material for a minimised bug report elsewhere, and not our issue.
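For anyone curious what the abandoned approach would have looked like, here is a minimal sketch, not PostgreSQL's actual rmtree(): buffer every entry name in memory first, close the directory, and only then unlink, so the removals never race against an in-progress scan.  The function name remove_dir_contents is made up for illustration, error handling is trimmed, and subdirectory recursion is omitted.

```c
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Illustrative sketch only: collect all names before touching anything,
 * so no entry is removed while the directory is still being scanned.
 * Handles plain files only; realloc/strdup failures not checked.
 */
static bool
remove_dir_contents(const char *dirpath)
{
    DIR        *dir;
    struct dirent *de;
    char      **names = NULL;
    size_t      nnames = 0;
    size_t      maxnames = 0;
    bool        ok = true;

    if ((dir = opendir(dirpath)) == NULL)
        return false;

    /* Pass 1: buffer every entry name in memory. */
    while ((de = readdir(dir)) != NULL)
    {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (nnames == maxnames)
        {
            maxnames = maxnames > 0 ? maxnames * 2 : 64;
            names = realloc(names, maxnames * sizeof(char *));
        }
        names[nnames++] = strdup(de->d_name);
    }
    closedir(dir);

    /* Pass 2: unlink, with no directory scan in progress. */
    for (size_t i = 0; i < nnames; i++)
    {
        char        path[4096];

        snprintf(path, sizeof(path), "%s/%s", dirpath, names[i]);
        if (unlink(path) != 0)
            ok = false;
        free(names[i]);
    }
    free(names);

    return ok && rmdir(dirpath) == 0;
}
```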
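And for the suspected client bug itself, a hypothetical standalone repro of the access pattern at issue, unlinking entries while a readdir() scan of the same directory is still in progress and then calling rmdir(), might look like the sketch below.  Run it against a directory pre-populated with many files on an affected NFS mount; the ENOTEMPTY from rmdir() would be the symptom described above.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    DIR        *dir;
    struct dirent *de;

    if (argc != 2)
    {
        fprintf(stderr, "usage: %s <directory-on-nfs-mount>\n", argv[0]);
        return 1;
    }

    if ((dir = opendir(argv[1])) == NULL)
    {
        perror("opendir");
        return 1;
    }

    /* Unlink each entry while the directory scan is still in progress. */
    while ((de = readdir(dir)) != NULL)
    {
        char        path[4096];

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s", argv[1], de->d_name);
        if (unlink(path) != 0)
            fprintf(stderr, "unlink(%s): %s\n", path, strerror(errno));
    }
    closedir(dir);

    /* On an affected client this can fail with ENOTEMPTY. */
    if (rmdir(argv[1]) != 0)
    {
        perror("rmdir");
        return 1;
    }
    return 0;
}
```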