Use streaming read API in ANALYZE - Mailing list pgsql-hackers

From Nazir Bilal Yavuz
Subject Use streaming read API in ANALYZE
Msg-id CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM+6M+mJg@mail.gmail.com
List pgsql-hackers
Hi,

I worked on using the currently proposed streaming read API [1] in ANALYZE. The patch is attached. 0001 contains the not-yet-merged streaming read API changes in a form that can be applied to master; 0002 is the actual ANALYZE change.

The blocks to analyze are now obtained through the streaming read API. The changes are:

- Since the streaming read API already does prefetching, I removed the #ifdef USE_PREFETCH code from acquire_sample_rows().

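- Changed 'while (BlockSampler_HasMore(&bs))' to 'while (nblocks)', because the prefetch mechanism in the streaming read API advances 'bs' before it returns the corresponding buffers. BlockSampler_HasMore() can therefore report that no blocks are left while buffers are still waiting to be returned.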

- Removed the BlockNumber and BufferAccessStrategy arguments from the declaration of scan_analyze_next_block(); it now takes pgsr (PgStreamingRead) instead.
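
To illustrate why the loop condition had to change, here is a minimal toy model in C. It is not code from the patch and does not use the PostgreSQL API: ToySampler, ToyStream, LOOKAHEAD and all function names are hypothetical stand-ins for the block sampler and the streaming read object. The stream pulls block numbers from the sampler ahead of time, so a loop that stops when the sampler is exhausted leaves queued blocks unprocessed, while a loop driven by a remaining-block count drains them all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOOKAHEAD 4             /* hypothetical prefetch distance */

/* Toy stand-in for a block sampler: hands out blocks 0..nblocks-1. */
typedef struct ToySampler
{
    unsigned    next;
    unsigned    nblocks;
} ToySampler;

static bool
toy_sampler_has_more(const ToySampler *s)
{
    return s->next < s->nblocks;
}

static unsigned
toy_sampler_next(ToySampler *s)
{
    return s->next++;
}

/* Toy stand-in for a streaming read: keeps up to LOOKAHEAD blocks queued. */
typedef struct ToyStream
{
    ToySampler *sampler;
    unsigned    queue[LOOKAHEAD];
    size_t      head;
    size_t      count;
} ToyStream;

/* Return the next block in *blkno, or false once the stream is drained. */
static bool
toy_stream_next(ToyStream *st, unsigned *blkno)
{
    /* Look ahead: this advances the sampler past the block returned below. */
    while (st->count < LOOKAHEAD && toy_sampler_has_more(st->sampler))
    {
        st->queue[(st->head + st->count) % LOOKAHEAD] =
            toy_sampler_next(st->sampler);
        st->count++;
    }
    if (st->count == 0)
        return false;
    *blkno = st->queue[st->head];
    st->head = (st->head + 1) % LOOKAHEAD;
    st->count--;
    return true;
}

/* Old-style condition: stop as soon as the sampler reports no more blocks. */
static unsigned
process_while_sampler_has_more(unsigned nblocks)
{
    ToySampler  s = {0, nblocks};
    ToyStream   st = {&s, {0}, 0, 0};
    unsigned    processed = 0;
    unsigned    blkno;

    while (toy_sampler_has_more(&s))
    {
        if (!toy_stream_next(&st, &blkno))
            break;
        processed++;
    }
    return processed;
}

/* New-style condition: count down the number of blocks still expected. */
static unsigned
process_while_blocks_remain(unsigned nblocks)
{
    ToySampler  s = {0, nblocks};
    ToyStream   st = {&s, {0}, 0, 0};
    unsigned    processed = 0;
    unsigned    remaining = nblocks;
    unsigned    blkno;

    while (remaining > 0)
    {
        if (!toy_stream_next(&st, &blkno))
            break;
        processed++;
        remaining--;
    }
    return processed;
}
```

In this model, with 10 blocks and a lookahead of 4, the sampler-driven loop handles only 7 blocks because the last 3 are still sitting in the queue when the sampler runs dry; the count-driven loop handles all 10.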

I counted the syscalls made while analyzing a ~5GB table. The patched version made ~1300 fewer read calls (28594 vs. 29850 pread64 calls) and also fewer fadvise64 calls (27611 vs. 29848).

Patched:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 39.67    0.012128           0     29809           pwrite64
 36.96    0.011299           0     28594           pread64
 23.24    0.007104           0     27611           fadvise64

Master (21a71648d3):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 38.94    0.016457           0     29816           pwrite64
 36.79    0.015549           0     29850           pread64
 23.91    0.010106           0     29848           fadvise64



--
Regards,
Nazir Bilal Yavuz
Microsoft
