Re: Strange issue with NFS mounted PGDATA on ugreen NAS - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Strange issue with NFS mounted PGDATA on ugreen NAS
Date
Msg-id CA+hUKGKU38SMdtwo3cUYs56SNwn-+JNwjYq-QF420g5n6o4wcA@mail.gmail.com
In response to Re: Strange issue with NFS mounted PGDATA on ugreen NAS  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Strange issue with NFS mounted PGDATA on ugreen NAS
List pgsql-hackers
On Wed, Jan 1, 2025 at 6:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ISTM we used to disclaim responsibility for data integrity if you
> try to put PGDATA on NFS.  I looked at the current wording about
> NFS in runtime.sgml and was frankly shocked at how optimistic it is.
> Shouldn't we be saying something closer to "if it breaks you
> get to keep both pieces"?

I now suspect this specific readdir() problem is in FreeBSD's NFS
client.  See below.  There have also been reports of missed files from
(IIRC) Linux clients without much analysis, but that doesn't seem too
actionable from here unless someone can come up with a repro or at
least some solid details to investigate; those involved unspecified
(possibly appliance/cloud) NFS and CIFS file servers.

The other issue I know of personally is NFS ENOSPC, which has some
exciting disappearing-committed-data failure modes caused by lazy
allocation in Linux's implementation (and possibly others); I've
written about that before.  But that one is not strictly an NFS-only
issue, it's just really easy to hit that way, and I have a patch to
fix it on our side, which I hope to re-post soon.  Independently of
this thread, really, as it's tangled up with quite a few other
things...
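To make that failure mode concrete, here's a toy Python model (my own
simplification, not PostgreSQL or kernel code; class and parameter names
are invented for illustration) of how lazy allocation lets write()
succeed while ENOSPC only surfaces later, at fsync() time:

```python
# Toy model: an NFS-like client page cache over a server that allocates
# space lazily.  write() just dirties client memory; the server only
# discovers it is out of space when the dirty data is flushed.

class LazyAllocFile:
    def __init__(self, server_free_bytes):
        self.server_free = server_free_bytes
        self.dirty = []          # buffered writes not yet sent to server
        self.durable = b""       # what the server has actually stored

    def write(self, data):
        # Succeeds immediately: only the client's cache is touched.
        self.dirty.append(data)
        return len(data)

    def fsync(self):
        # Allocation happens here, so this is where ENOSPC first appears,
        # long after write() reported success.
        pending = b"".join(self.dirty)
        if len(pending) > self.server_free:
            raise OSError(28, "No space left on device")  # ENOSPC
        self.server_free -= len(pending)
        self.durable += pending
        self.dirty.clear()

f = LazyAllocFile(server_free_bytes=4)
assert f.write(b"12345678") == 8     # write() reports full success...
try:
    f.fsync()                        # ...but durability fails much later
    flushed = True
except OSError:
    flushed = False
assert not flushed and f.durable == b""
```

The point of the sketch: any code that treats a successful write() as a
promise of durability is exposed, because the error arrives on a later
fsync(), possibly after the caller has already moved on.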

> > Anyway, I'll write a patch to change rmtree() to buffer the names in
> > memory.  In theory there could be hundreds of gigabytes of pathnames,
> > so perhaps I should do it in batches; I'll look into that.
>
> This feels a lot like putting lipstick on a pig.

Hehe.  Yeah.  Abandoned.

I see this issue here with a FreeBSD client talking to a Debian server
exporting BTRFS or XFS, even with readdirsize set high so that
multi-request paging is not expected.  Looking at Wireshark and the
NFS spec (disclaimer: I have never studied NFS at this level before,
addito salis grano), what I see is a READDIR request with cookie=0
(good), which receives a response containing the whole directory
listing and a final entry marker eof=1 (good), but then FreeBSD
unexpectedly (to me) sends *another* READDIR request with cookie=662,
which is a real cookie that was received somewhere in the middle of
the first response on the entry for "13816_fsm", and that entry was
followed by an entry for "13816_vm".  The second request gets a
response that begins at "13816_vm" (correct on the server's part).
Then the client sends REMOVE (unlink) requests for some but not all of
the files, including "13816_fsm" but not "13816_vm".  Then it sends
yet another READDIR request with cookie=0 (meaning go from the top),
and gets a non-empty directory listing, but immediately sends RMDIR,
which unsurprisingly fails with NFS3ERR_NOTEMPTY.

So my best guess so far is that FreeBSD's NFS client must be
corrupting its directory cache when files are unlinked, and it's not
the server's fault.  I don't see any obvious problem with the way the
cookies work.  Seems like material for a minimised bug report
elsewhere, and not our issue.
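For anyone following along without the packet capture, here's a toy
Python model (my own simplification; NFSv3 cookies are really opaque
per-entry values, modelled here as 1-based positions) of the READDIR
exchange described above, showing that the server's reply to the
mid-stream cookie is correct, and that any trouble has to come from how
the client merges that tail into its cached view of the directory:

```python
# Toy model of NFSv3 READDIR cookie semantics: each entry carries a
# cookie, and a request with cookie=c returns the entries after c.

def readdir(entries, cookie=0):
    """Server side: (name, cookie) pairs after `cookie`, plus an eof flag."""
    tail = [(name, i + 1) for i, name in enumerate(entries) if i + 1 > cookie]
    return tail, True  # eof=1: this reply reaches the end of the directory

directory = ["13815", "13816_fsm", "13816_vm", "13817"]

# First request, cookie=0: the whole listing arrives with eof=1.
full, eof = readdir(directory)
assert eof and [name for name, _ in full] == directory

# The trace then shows the client re-reading from the cookie it saw on
# "13816_fsm", even though eof was already reached.  The server's answer
# is correct: only the entries after that cookie.
fsm_cookie = dict(full)["13816_fsm"]
tail, eof = readdir(directory, cookie=fsm_cookie)
assert eof and [name for name, _ in tail] == ["13816_vm", "13817"]

# A client that mishandles this tail when updating its directory cache
# (e.g. while entries are concurrently being unlinked) can end up with a
# view that omits some entries, consistent with the missed REMOVEs.
```

This matches the conclusion above: the cookies and the server replies
look fine in isolation; the suspect is the client-side cache.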


