Re: logtape.c stats don't account for unused "prefetched" block numbers - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: logtape.c stats don't account for unused "prefetched" block numbers
Date
Msg-id dd630e6cec768004430d35fb92cc42cd8d0b7bfc.camel@j-davis.com
Whole thread Raw
In response to Re: logtape.c stats don't account for unused "prefetched" block numbers  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: logtape.c stats don't account for unused "prefetched" block numbers
List pgsql-hackers
On Sat, 2020-09-05 at 12:03 -0700, Peter Geoghegan wrote:
> The comments in ltsWriteBlock() added by the 2017 bugfix commit
> 7ac4a389a7d clearly say that the zero block writing stuff is only
> supposed to happen at the edge of a tape boundary, which ought to be
> rare -- see the comment block in ltsWriteBlock(). And yet the new
> preallocation stuff explicitly relies on that it writing zero blocks
> much more frequently. I'm concerned that that can result in increased
> and unnecessary I/O, especially for sorts, but also for hash aggs
> that
> spill. I'm also concerned that having preallocated-but-allocated
> blocks confuses the accounting used by
> trace_sort/LogicalTapeSetBlocks().

Preallocation showed significant gains for HashAgg, and BufFile doesn't
support sparse writes. So, for HashAgg, it seems like we should just
update the comment and consider it the price of using BufFile.

(Aside: is there a reason why BufFile doesn't support sparse writes, or
is it just a matter of implementation?)

For Sort, we can just disable preallocation.

> Separately, it's possible to make the
> trace_sort/LogicalTapeSetBlocks() instrumentation agree with the
> filesystem by replacing the use of nBlocksAllocated within
> LogicalTapeSetBlocks() with nBlocksWritten -- that seems to make the
> instrumentation correct without changing the current behavior at all.
> But I'm not ready to endorse that approach, since it's not quite
> clear
> what nBlocksAllocated and nBlocksWritten mean right now -- those two
> fields were both added by the aforementioned 2017 bugfix commit,
> which
> introduced the "allocated vs written" distinction in the first place

Right now, it seems nBlocksAllocated means "number of blocks returned
by ltsGetFreeBlock(), plus nHoleBlocks".

nBlocksWritten seems to mean "the logical size of the BufFile". The
BufFile can have holes in it after concatenation, but from the
perspective of logtape.c, nBlocksWritten seems like a better fit for
instrumentation purposes. So I'd be inclined to return (nBlocksWritten
- nHoleBlocks).

The only thing I can think of that would be better is if BufFile
tracked for itself the logical vs. physical size, which might be a good
improvement to make (and would mean that logtape.c wouldn't be
responsible for tracking the holes itself).

Thoughts?

Regards,
    Jeff Davis





pgsql-hackers by date:

Previous
From: Ian Barwick
Date:
Subject: Re: Improving connection scalability: GetSnapshotData()
Next
From: Peter Geoghegan
Date:
Subject: Re: Inconsistent Japanese name order in v13 contributors list