Re: Use streaming read API in ANALYZE - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Use streaming read API in ANALYZE |
Date | |
Msg-id | 20240407222631.zew43rr74ajbadq6@liskov Whole thread Raw |
In response to | Re: Use streaming read API in ANALYZE (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Use streaming read API in ANALYZE
Re: Use streaming read API in ANALYZE |
List | pgsql-hackers |
On Sun, Apr 07, 2024 at 03:00:00PM -0700, Andres Freund wrote: > Hi, > > On 2024-04-07 16:59:26 -0400, Melanie Plageman wrote: > > From 1dc2343661f3edb3b1bc4307afb0e956397eb76c Mon Sep 17 00:00:00 2001 > > From: Melanie Plageman <melanieplageman@gmail.com> > > Date: Sun, 7 Apr 2024 14:55:22 -0400 > > Subject: [PATCH v10 1/3] Make heapam_scan_analyze_next_[tuple|block] static. > > > > 27bc1772fc81 removed the table AM callbacks scan_analyze_next_block and > > scan_analzye_next_tuple -- leaving their heap AM implementations only > > called by acquire_sample_rows(). > > Ugh, I don't think 27bc1772fc81 makes much sense. But that's unrelated to this > thread. I did raise that separately > https://www.postgresql.org/message-id/20240407214001.jgpg5q3yv33ve6y3%40awork3.anarazel.de > > Unless I seriously missed something, I see no alternative to reverting that > commit. Noted. I'll give up on this refactor then. Lots of churn for no gain. Attached v11 is just Bilal's v8 patch rebased to apply cleanly and with a few tweaks (I changed one of the loop conditions. All other changes are to comments and commit message). > > @@ -1206,11 +1357,13 @@ acquire_sample_rows(Relation onerel, int elevel, > > break; > > > > prefetch_block = BlockSampler_Next(&prefetch_bs); > > - PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_block); > > + PrefetchBuffer(scan->rs_base.rs_rd, MAIN_FORKNUM, prefetch_block); > > } > > } > > #endif > > > > + scan->rs_cbuf = InvalidBuffer; > > + > > /* Outer loop over blocks to sample */ > > while (BlockSampler_HasMore(&bs)) > > { > > I don't think it's good to move a lot of code *and* change how it is > structured in the same commit. Makes it much harder to actually see changes / > makes git blame harder to use / etc. Yep. > > From 90d115c2401567be65bcf64393a6d3b39286779e Mon Sep 17 00:00:00 2001 > > From: Melanie Plageman <melanieplageman@gmail.com> > > Date: Sun, 7 Apr 2024 15:28:32 -0400 > > Subject: [PATCH v10 2/3] Use streaming read API in ANALYZE > > > > The ANALYZE command prefetches and reads sample blocks chosen by a > > BlockSampler algorithm. Instead of calling Prefetch|ReadBuffer() for > > each block, ANALYZE now uses the streaming API introduced in b5a9b18cd0. > > > > Author: Nazir Bilal Yavuz > > Reviewed-by: Melanie Plageman > > Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com > > --- > > src/backend/commands/analyze.c | 89 ++++++++++------------------------ > > 1 file changed, 26 insertions(+), 63 deletions(-) > > That's a very nice demonstration of how this makes good prefetching easier... Agreed. Yay streaming read API and Bilal! > > From 862b7ac81cdafcda7b525e02721da14e46265509 Mon Sep 17 00:00:00 2001 > > From: Melanie Plageman <melanieplageman@gmail.com> > > Date: Sun, 7 Apr 2024 15:38:41 -0400 > > Subject: [PATCH v10 3/3] Obsolete BlockSampler_HasMore() > > > > A previous commit stopped using BlockSampler_HasMore() for flow control > > in acquire_sample_rows(). There seems little use now for > > BlockSampler_HasMore(). It should be sufficient to return > > InvalidBlockNumber from BlockSampler_Next() when BlockSample_HasMore() > > would have returned false. Remove BlockSampler_HasMore(). > > > > Author: Melanie Plageman, Nazir Bilal Yavuz > > Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com > > The justification here seems somewhat odd. Sure, the previous commit stopped > using BlockSampler_HasMore in acquire_sample_rows - but only because it was > moved to block_sampling_streaming_read_next()? It didn't stop using it. It stopped being useful. The reason it existed, as far as I can tell, was to use it as the while() loop condition in acquire_sample_rows(). I think it makes much more sense for BlockSampler_Next() to return InvalidBlockNumber when there are no more blocks -- not to assert you don't call it when there aren't any more blocks. I didn't want to change BlockSampler_Next() in the same commit as the streaming read user and we can't remove BlockSampler_HasMore() without changing BlockSampler_Next(). - Melanie
Attachment
pgsql-hackers by date: