On Mon, Oct 29, 2018 at 3:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So there are a couple of things to complain about here with respect
> to the error message, regardless of the underlying bug:
I agree with you fully.
> Yeah, that works fine on Windows AFAIK. I also note that ENOENT
> isn't an error code that lseek() can deliver, anyway, since it works
> on an already-open FD. The failure here must be coming from opening
> the file.
Good point.
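
To make the distinction concrete, here is a minimal sketch, assuming
POSIX semantics. The names file_size_report_step and FailedStep are
hypothetical and not taken from the tree. It records which call
actually failed, so an ENOENT is attributed to open() rather than to
the lseek() that follows it:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: which step of the size probe failed. */
typedef enum
{
    STEP_NONE,
    STEP_OPEN,
    STEP_LSEEK
} FailedStep;

/*
 * Open "path" and return its size via lseek(SEEK_END).
 * On failure, return -1 and report the failing step and its errno.
 */
off_t
file_size_report_step(const char *path, FailedStep *failed, int *saved_errno)
{
    int         fd;
    off_t       size;

    *failed = STEP_NONE;
    *saved_errno = 0;

    fd = open(path, O_RDONLY);
    if (fd < 0)
    {
        /* A missing file shows up here, as ENOENT from open(). */
        *failed = STEP_OPEN;
        *saved_errno = errno;
        return (off_t) -1;
    }

    /* lseek() only sees the descriptor, never the path name. */
    size = lseek(fd, 0, SEEK_END);
    if (size < 0)
    {
        *failed = STEP_LSEEK;
        *saved_errno = errno;
    }

    close(fd);                  /* errno was saved above, so close() may clobber it */
    return size;
}
```

An error message built from the reported step names the call that
really failed, which is the complaint about the message above.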
> I'm a little inclined to suspect that the true cause here is workers
> not correctly computing the name of this temp file, which is what
> led me to complain about the error message. Although a weak spot in
> this theory is that it's not clear why they'd not fail later anyway,
> unless maybe this particular file never got touched by workers before.
There just isn't that much to get right there, though. Another weak
spot in that theory is that it seems unlikely that the first complaint
we'd hear would happen to be from a Windows user. I think that they're
very much in the minority, especially among early adopters.
> > I have a strong suspicion that going back to passing the size through
> > shared memory (i.e. partially reverting 445e31bdc74) would make the
> > problem go away, but I won't do that until I actually understand
> > what's going on.
>
> Sounds like papering over the bug ...
I may have been unclear. It would be papering over the bug if I went
ahead and did that now.
The advantage of getting the file size from shared memory is that it
doesn't leave it up to code like BufFileOpenShared() to find
everything through readdir() iteration, an approach that might not be
totally portable. With the size-in-shared-memory approach, we'll
reliably fail if any BufFile segment cannot be accounted for, which
seems more robust. I wouldn't be surprised if that actually was the
correct fix in the end.
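
As a rough illustration of why a known size is the more robust check,
here is a minimal sketch. Everything in it is hypothetical (the
SEGMENT_SIZE value, the function name, and the idea of handing in the
sizes of whatever segments were found); it is not how
BufFileOpenShared() is written. The point is only that discovery alone
cannot tell "two segments exist" from "a third segment went missing",
while a size passed in from elsewhere can:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed segment size (assumption for this sketch). */
#define SEGMENT_SIZE (((int64_t) 1) << 30)

/*
 * Return true only if the segments that were found add up to the size
 * the leader advertised. Every segment but the last must be full.
 */
bool
segments_accounted_for(int64_t expected_total,
                       const int64_t *found_sizes, int nfound)
{
    int64_t     total = 0;

    for (int i = 0; i < nfound; i++)
    {
        if (found_sizes[i] < 0 || found_sizes[i] > SEGMENT_SIZE)
            return false;
        /* A short segment in the middle means something is wrong. */
        if (i < nfound - 1 && found_sizes[i] != SEGMENT_SIZE)
            return false;
        total += found_sizes[i];
    }

    /* A missing trailing segment shows up as a size mismatch. */
    return total == expected_total;
}
```

With an expected size of two and a half segments, finding only the
first two full segments is reported as a failure instead of being
silently accepted as a smaller file.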
--
Peter Geoghegan