Re: Use streaming read API in ANALYZE - Mailing list pgsql-hackers
From | Nazir Bilal Yavuz |
---|---|
Subject | Re: Use streaming read API in ANALYZE |
Date | |
Msg-id | CAN55FZ2_4hgTRK=TujRj=J2VYk8Rm8=H1KCTCeKm8zYN1zd3Nw@mail.gmail.com Whole thread Raw |
In response to | Re: Use streaming read API in ANALYZE (Melanie Plageman <melanieplageman@gmail.com>) |
Responses |
Re: Use streaming read API in ANALYZE
Re: Use streaming read API in ANALYZE |
List | pgsql-hackers |
Hi, Thanks for the review! On Wed, 27 Mar 2024 at 23:15, Melanie Plageman <melanieplageman@gmail.com> wrote: > > On Tue, Mar 26, 2024 at 02:51:27PM +0300, Nazir Bilal Yavuz wrote: > > Hi, > > > > On Wed, 28 Feb 2024 at 14:42, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote: > > > > > > > > > The new version of the streaming read API [1] is posted. I updated the > > > streaming read API changes patch (0001), using the streaming read API > > > in ANALYZE patch (0002) remains the same. This should make it easier > > > to review as it can be applied on top of master > > > > > > > > > > The new version of the streaming read API is posted [1]. I rebased the > > patch on top of master and v9 of the streaming read API. > > > > There is a minimal change in the 'using the streaming read API in ANALYZE > > patch (0002)', I changed STREAMING_READ_FULL to STREAMING_READ_MAINTENANCE > > to copy exactly the same behavior as before. Also, some benchmarking > > results: > > > > I created a 22 GB table and set the size of shared buffers to 30GB, the > > rest is default. > > > > ╔═══════════════════════════╦═════════════════════╦════════════╗ > > ║ ║ Avg Timings in ms ║ ║ > > ╠═══════════════════════════╬══════════╦══════════╬════════════╣ > > ║ ║ master ║ patched ║ percentage ║ > > ╠═══════════════════════════╬══════════╬══════════╬════════════╣ > > ║ Both OS cache and ║ ║ ║ ║ > > ║ shared buffers are clear ║ 513.9247 ║ 463.1019 ║ %9.9 ║ > > ╠═══════════════════════════╬══════════╬══════════╬════════════╣ > > ║ OS cache is loaded but ║ ║ ║ ║ > > ║ shared buffers are clear ║ 423.1097 ║ 354.3277 ║ %16.3 ║ > > ╠═══════════════════════════╬══════════╬══════════╬════════════╣ > > ║ Shared buffers are loaded ║ ║ ║ ║ > > ║ ║ 89.2846 ║ 84.6952 ║ %5.1 ║ > > ╚═══════════════════════════╩══════════╩══════════╩════════════╝ > > > > Any kind of feedback would be appreciated. > > Thanks for working on this! > > A general review comment: I noticed you have the old streaming read > (pgsr) naming still in a few places (including comments) -- so I would > just make sure and update everywhere when you rebase in Thomas' latest > version of the read stream API. Done. > > > From c7500cc1b9068ff0b704181442999cd8bed58658 Mon Sep 17 00:00:00 2001 > > From: Nazir Bilal Yavuz <byavuz81@gmail.com> > > Date: Mon, 19 Feb 2024 14:30:47 +0300 > > Subject: [PATCH v3 2/2] Use streaming read API in ANALYZE > > > > --- a/src/backend/commands/analyze.c > > +++ b/src/backend/commands/analyze.c > > @@ -1102,6 +1102,26 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr) > > return stats; > > } > > > > +/* > > + * Prefetch callback function to get next block number while using > > + * BlockSampling algorithm > > + */ > > +static BlockNumber > > +pg_block_sampling_streaming_read_next(StreamingRead *stream, > > + void *user_data, > > + void *per_buffer_data) > > I don't think you need the pg_ prefix Done. > > > +{ > > + BlockSamplerData *bs = user_data; > > + BlockNumber *current_block = per_buffer_data; > > Why can't you just do BufferGetBlockNumber() on the buffer returned from > the read stream API instead of allocating per_buffer_data for the block > number? Done. > > > + > > + if (BlockSampler_HasMore(bs)) > > + *current_block = BlockSampler_Next(bs); > > + else > > + *current_block = InvalidBlockNumber; > > + > > + return *current_block; > > > I think we'd like to keep the read stream code in heapam-specific code. > Instead of doing streaming_read_buffer_begin() here, you could put this > in heap_beginscan() or initscan() guarded by > scan->rs_base.rs_flags & SO_TYPE_ANALYZE In the recent changes [1], heapam_scan_analyze_next_[block | tuple] are removed from tableam. They are directly called from heapam-specific code now. So, IMO, no need to do this now. v4 is rebased on top of v14 streaming read API changes. [1] 27bc1772fc814946918a5ac8ccb9b5c5ad0380aa -- Regards, Nazir Bilal Yavuz Microsoft
Attachment
pgsql-hackers by date: