Here's v6, addressing most of the review comments. Most of this is
pretty simple / non-contentious. The remaining questions are in the last
two items (phs_len field, initializing the instrumentation pointers):
- 0001 tweaks WaitReadBuffers to return if a wait was needed. As pointed
out, this does not include some changes needed for io_uring, I don't
want to make this larger. So test with "worker".
- 0002 switches a couple explain tests to unaligned output. This makes
it easier to understand output changes with yaml/json/xml output.
- I kept the three scans in separate changes. It makes it easier to
review etc. and to me it seems easier to review. I'd keep it like this
for commit, but if someone thinks we should squash it into a single
commit that's possible too.
- I kept the IO option enabled by default. I think it makes sense to
keep this aligned with BUFFERS, but on the other hand the explain should
not be overwhelming. Various other explain options are similarly
helpful, and yet we left them OFF by default.
- Cleanup of the explain formatting code (merging appendString lines),
and so on. One question is whether to keep printing the information only
when (stats->prefetch_count > 0), especially in the non-text cases.
With the three scans this is not an issue, because the read stream is
initialized right away. But with index scans we delay that a bit, so I
wonder if this might cause some test instability. Probably not.
- I switched to %.2f format for printing the averages etc. instead of
%.3f. Not a huge change, we could probably go even to %.1f ...
- Replaced the ACCUMULATE_IO_STATS macro with a static inline.
- Renamed "stalls" to "waits".
- Fixed a couple obsolete comments, and the bug in dereferencing NULL
when the instrumentation was not initialized.
- Added the pscan_len length (returned by table_parallelscan_estimate)
to ParallelTableScanDesc, so that the workers can use this to calculate
the pointer of shared instrumentation without calling the _estimate
again (which I agree seems wrong and grotty).
This only matters for SeqScan and TidRangeScan, and it makes it quite a
bit cleaner. I don't see a better way. Right now the nodes set the new
field right after calling shm_toc_allocate, but I suppose it'd be more
correct to pass it to table_parallelscan_initialize() and set it from
there. Barring objections I'll adjust this for v7.
- I still don't understand how could the leader calculate the
instrumentation pointers for workers, so it's done the same way as in
BHS (except that we need to use the new phs_len field to calculate the
initial offset, of course).
regards
--
Tomas Vondra