BufFileRead() error signalling - Mailing list pgsql-hackers

From Thomas Munro
Subject BufFileRead() error signalling
Date
Msg-id CA+hUKGJE04G=8TLK0DLypT_27D9dR8F1RQgNp0jK6qR0tZGWOw@mail.gmail.com
Whole thread Raw
Responses Re: BufFileRead() error signalling
Re: BufFileRead() error signalling
List pgsql-hackers
Hi,

As noted by Amit Khandhekar yesterday[1], BufFileLoad() silently eats
pread()'s error and makes them indistinguishable from EOF.

Some observations:

1.  BufFileRead() returns size_t (not ssize_t), so it's an
fread()-style interface, not a read()-style interface that could use
-1 to signal errors.  Unlike fread(), it doesn't seem to have anything
corresponding to feof()/ferror()/clearerr() that lets the caller
distinguish between EOF and error, so our hash and tuplestore spill
code simply trusts that if there is a 0 size read where it expects a
tuple boundary, it must be EOF.

2.  BufFileWrite() is the same, but just like fwrite(), a short write
must always mean error, so there is no problem here.

3.  The calling code assumes that unexpected short read or write sets
errno, which isn't the documented behaviour of fwrite() and fread(),
so there we're doing something a bit different (which is fine, just
pointing it out; we're sort of somewhere between the <stdio.h> and
<unistd.h> functions, in terms of error reporting).

I think the choices are: (1) switch to ssize_t and return -1, (2) add
at least one of BufFileEof(), BufFileError(), (3) have BufFileRead()
raise errors with elog().  I lean towards (2), and I doubt we need
BufFileClear() because the only thing we ever do in client code on
error is immediately burn the world down.

If we simply added an error flag to track if FileRead() had ever
signalled an error, we could change nodeHashjoin.c to do something
along these lines:

-    if (nread == 0)             /* end of file */
+    if (!BufFileError(file) && nread == 0)             /* end of file */

... and likewise for tuplestore.c:

-    if (nbytes != 0 || !eofOK)
+    if (BufFileError(file) || (nbytes == 0 && !eofOK))
         ereport(ERROR,

About the only advantage to the current design I can see if that you
can probably make your query finish faster by pulling out your temp
tablespace USB stick at the right time.  Or am I missing some
complication?

[1] https://www.postgresql.org/message-id/CAJ3gD9emnEys%3DR%2BT3OVt_87DuMpMfS4KvoRV6e%3DiSi5PT%2B9f3w%40mail.gmail.com



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Why is get_actual_variable_range()'s use of SnapshotNonVacuumablesafe during recovery?
Next
From: Thomas Munro
Date:
Subject: Re: BufFileRead() error signalling