Re: BufFileRead() error signalling - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: BufFileRead() error signalling
Date
Msg-id CAAKRu_Zd4PkFdG_a+1Yq0O3n_2ENfSR_6rQrsUZ26hYY4SJEQQ@mail.gmail.com
Whole thread Raw
In response to Re: BufFileRead() error signalling  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

On Mon, Jan 27, 2020 at 7:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
Rather than answering your actual question, I'd like to complain about this:

  if (BufFileRead(file, ptr, BLCKSZ) != BLCKSZ)
- elog(ERROR, "could not read temporary file: %m");
+ elog(ERROR, "could not read temporary file");

I recognize that your commit message explains this change by saying
that this code will now never be reached except as a result of a short
read, but I don't think that will necessarily be clear to future
readers of those code, or people who get the error message. It seems
like if we're going to do do this, the error messages ought to be
changed to complain about a short read rather than an inability to
read for unspecified reasons. However, I wonder why we don't make
BufFileRead() throw all of the errors including complaining about
short reads. I would like to confess my undying (and probably
unrequited) love for the following code from md.c:

                                         errmsg("could not read block
%u in file \"%s\": read only %d of %d bytes",


It would be cool to have the block number in more cases in error
messages. For example, in sts_parallel_scan_next()

/* Seek and load the chunk header. */
if (BufFileSeekBlock(accessor->read_file, read_page) != 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                  errmsg("could not read from shared tuplestore temporary file"),
                  errdetail_internal("Could not seek to next block.")));

I'm actually in favor of having the block number in this error
message. I think it would be helpful for multi-batch parallel
hashjoin. If a worker reading one SharedTuplestoreChunk encounters an
error and another worker on a different SharedTuplstoreChunk of the
same file does not, I would find it useful to know more about which
block it was on when the error was encountered.

--
Melanie Plageman

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Hash join not finding which collation to use for string hashing
Next
From: Alvaro Herrera
Date:
Subject: Re: standby apply lag on inactive servers