Re: Use streaming read API in ANALYZE - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Use streaming read API in ANALYZE
Date
Msg-id 20240407222631.zew43rr74ajbadq6@liskov
Whole thread Raw
In response to Re: Use streaming read API in ANALYZE  (Andres Freund <andres@anarazel.de>)
Responses Re: Use streaming read API in ANALYZE
Re: Use streaming read API in ANALYZE
List pgsql-hackers
On Sun, Apr 07, 2024 at 03:00:00PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2024-04-07 16:59:26 -0400, Melanie Plageman wrote:
> > From 1dc2343661f3edb3b1bc4307afb0e956397eb76c Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman@gmail.com>
> > Date: Sun, 7 Apr 2024 14:55:22 -0400
> > Subject: [PATCH v10 1/3] Make heapam_scan_analyze_next_[tuple|block] static.
> > 
> > 27bc1772fc81 removed the table AM callbacks scan_analyze_next_block and
> > scan_analzye_next_tuple -- leaving their heap AM implementations only
> > called by acquire_sample_rows().
> 
> Ugh, I don't think 27bc1772fc81 makes much sense. But that's unrelated to this
> thread.  I did raise that separately
> https://www.postgresql.org/message-id/20240407214001.jgpg5q3yv33ve6y3%40awork3.anarazel.de
> 
> Unless I seriously missed something, I see no alternative to reverting that
> commit.

Noted. I'll give up on this refactor then. Lots of churn for no gain.
Attached v11 is just Bilal's v8 patch rebased to apply cleanly and with
a few tweaks (I changed one of the loop conditions. All other changes
are to comments and commit message).

> > @@ -1206,11 +1357,13 @@ acquire_sample_rows(Relation onerel, int elevel,
> >                  break;
> >  
> >              prefetch_block = BlockSampler_Next(&prefetch_bs);
> > -            PrefetchBuffer(scan->rs_rd, MAIN_FORKNUM, prefetch_block);
> > +            PrefetchBuffer(scan->rs_base.rs_rd, MAIN_FORKNUM, prefetch_block);
> >          }
> >      }
> >  #endif
> >  
> > +    scan->rs_cbuf = InvalidBuffer;
> > +
> >      /* Outer loop over blocks to sample */
> >      while (BlockSampler_HasMore(&bs))
> >      {
> 
> I don't think it's good to move a lot of code *and* change how it is
> structured in the same commit. Makes it much harder to actually see changes /
> makes git blame harder to use / etc.

Yep.

> > From 90d115c2401567be65bcf64393a6d3b39286779e Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman@gmail.com>
> > Date: Sun, 7 Apr 2024 15:28:32 -0400
> > Subject: [PATCH v10 2/3] Use streaming read API in ANALYZE
> >
> > The ANALYZE command prefetches and reads sample blocks chosen by a
> > BlockSampler algorithm. Instead of calling Prefetch|ReadBuffer() for
> > each block, ANALYZE now uses the streaming API introduced in b5a9b18cd0.
> >
> > Author: Nazir Bilal Yavuz
> > Reviewed-by: Melanie Plageman
> > Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
> > ---
> >  src/backend/commands/analyze.c | 89 ++++++++++------------------------
> >  1 file changed, 26 insertions(+), 63 deletions(-)
> 
> That's a very nice demonstration of how this makes good prefetching easier...

Agreed. Yay streaming read API and Bilal!

> > From 862b7ac81cdafcda7b525e02721da14e46265509 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman@gmail.com>
> > Date: Sun, 7 Apr 2024 15:38:41 -0400
> > Subject: [PATCH v10 3/3] Obsolete BlockSampler_HasMore()
> > 
> > A previous commit stopped using BlockSampler_HasMore() for flow control
> > in acquire_sample_rows(). There seems little use now for
> > BlockSampler_HasMore(). It should be sufficient to return
> > InvalidBlockNumber from BlockSampler_Next() when BlockSample_HasMore()
> > would have returned false. Remove BlockSampler_HasMore().
> > 
> > Author: Melanie Plageman, Nazir Bilal Yavuz
> > Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
> 
> The justification here seems somewhat odd. Sure, the previous commit stopped
> using BlockSampler_HasMore in acquire_sample_rows - but only because it was
> moved to block_sampling_streaming_read_next()?

It didn't stop using it. It stopped being useful. The reason it existed,
as far as I can tell, was to use it as the while() loop condition in
acquire_sample_rows(). I think it makes much more sense for
BlockSampler_Next() to return InvalidBlockNumber when there are no more
blocks -- not to assert you don't call it when there aren't any more
blocks.

I didn't want to change BlockSampler_Next() in the same commit as the
streaming read user and we can't remove BlockSampler_HasMore() without
changing BlockSampler_Next().

- Melanie

Attachment

pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Next
From: Daniel Gustafsson
Date:
Subject: Re: Cluster::restart dumping logs when stop fails