Hi,
I worked on using the currently proposed streaming read API [1] in ANALYZE. The patch is attached. 0001 is the not yet merged streaming read API code changes that can be applied to the master, 0002 is the actual code.
The blocks to analyze are obtained by using the streaming read API now.
- Since streaming read API is already doing prefetch, I removed the #ifdef USE_PREFETCH code from acquire_sample_rows().
- Changed 'while (BlockSampler_HasMore(&bs))' to 'while (nblocks)' because the prefetch mechanism in the streaming read API will advance 'bs' before returning buffers.
- Removed BlockNumber and BufferAccessStrategy from the declaration of scan_analyze_next_block(), passing pgsr (PgStreamingRead) instead of them.
I c
ounted syscalls of analyzing ~5GB table. It can be seen that the patched version did ~1300 less read calls.
Patched:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
39.67 0.012128 0 29809 pwrite64
36.96 0.011299 0 28594 pread64
23.24 0.007104 0 27611 fadvise64
Master (21a71648d3):
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
38.94 0.016457 0 29816 pwrite64
36.79 0.015549 0 29850 pread64
23.91 0.010106 0 29848 fadvise64
--
Regards,
Nazir Bilal Yavuz
Microsoft