On Tue, Nov 26, 2024 at 11:36 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think the idea was that this mechanism is equivalent to an EMFILE
> limit. But if you feel a need to make a distinction, this seems fine:
I think we should never, ever confuse an error return from a system
call with any other kind of problem that can happen. Not even the
"write() returns 0 => ENOSPC" convention.
AFAIK, the rationale for conflating failure cases like this is that
either both failures are unlikely or, at least, the case where errno
wasn't actually set is unlikely. But the problem is that when
something weird happens, that's exactly when you need a clear and
unambiguous error report. I've had multiple extremely painful support
experiences that were made painful precisely because I couldn't
determine exactly what really happened. Did a system call really
return an unlikely error code? Or was it the
not-a-real-error-code-but-we-faked-one case which is also not supposed
to happen?
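To make the distinction concrete, here is a minimal sketch in C of keeping the two cases apart, so that the report itself says which one occurred. The names write_status and write_all_or_report are hypothetical and invented for illustration; this is not code from the PostgreSQL tree, and it assumes a POSIX system.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical result codes: keep "the system call reported an error"
 * apart from "nothing failed, but not all of the data was written".
 */
typedef enum
{
    WRITE_OK,
    WRITE_SYSCALL_ERROR,    /* write() returned -1; errno is meaningful */
    WRITE_SHORT             /* write() wrote too little; errno was not set */
} write_status;

/*
 * Attempt a single write() and describe the outcome in msg.
 * The message text differs between the two failure cases, so a reader
 * of the log never has to guess whether errno was real or made up.
 */
static write_status
write_all_or_report(int fd, const void *buf, size_t len,
                    char *msg, size_t msglen)
{
    ssize_t n = write(fd, buf, len);

    if (n < 0)
    {
        /* Genuine system call failure: report the real errno. */
        snprintf(msg, msglen, "write failed: %s", strerror(errno));
        return WRITE_SYSCALL_ERROR;
    }
    if ((size_t) n != len)
    {
        /* No errno to report; say so instead of substituting ENOSPC. */
        snprintf(msg, msglen,
                 "short write: wrote %zd of %zu bytes, no error reported",
                 n, len);
        return WRITE_SHORT;
    }
    msg[0] = '\0';
    return WRITE_OK;
}
```

A caller can still treat both failures the same way for control flow; the point is only that the message text tells the person reading the log which of the two actually happened.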
I find this kind of thing maddening every time it happens, and it
happens to me more often than you might think, because it often
happens that other people are able to answer the normal questions and
they send me the weird ones. Let's say twice a year I spend a couple
of days sweating blood trying to determine the root cause of some
bizarre malfunction because the person who wrote the code couldn't be
bothered to take 2 minutes to make the errors distinguishable.
--
Robert Haas
EDB: http://www.enterprisedb.com