Home > mailing lists

Re: Use streaming read API in ANALYZE - Mailing list pgsql-hackers

From	Nazir Bilal Yavuz
Subject	Re: Use streaming read API in ANALYZE
Date	April 2, 2024 07:23:55
Msg-id	CAN55FZ2_4hgTRK=TujRj=J2VYk8Rm8=H1KCTCeKm8zYN1zd3Nw@mail.gmail.com Whole thread
In response to	Re: Use streaming read API in ANALYZE (Melanie Plageman <melanieplageman@gmail.com>)
Responses	Re: Use streaming read API in ANALYZE Re: Use streaming read API in ANALYZE
List	pgsql-hackers

Tree view

Hi,

Thanks for the review!

On Wed, 27 Mar 2024 at 23:15, Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Tue, Mar 26, 2024 at 02:51:27PM +0300, Nazir Bilal Yavuz wrote:
> > Hi,
> >
> > On Wed, 28 Feb 2024 at 14:42, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> > >
> > >
> > > The new version of the streaming read API [1] is posted. I updated the
> > > streaming read API changes patch (0001), using the streaming read API
> > > in ANALYZE patch (0002) remains the same. This should make it easier
> > > to review as it can be applied on top of master
> > >
> > >
> >
> > The new version of the streaming read API is posted [1]. I rebased the
> > patch on top of master and v9 of the streaming read API.
> >
> > There is a minimal change in the 'using the streaming read API in ANALYZE
> > patch (0002)', I changed STREAMING_READ_FULL to STREAMING_READ_MAINTENANCE
> > to copy exactly the same behavior as before. Also, some benchmarking
> > results:
> >
> > I created a 22 GB table and set the size of shared buffers to 30GB, the
> > rest is default.
> >
> > ╔═══════════════════════════╦═════════════════════╦════════════╗
> > ║                           ║  Avg Timings in ms  ║            ║
> > ╠═══════════════════════════╬══════════╦══════════╬════════════╣
> > ║                           ║  master  ║  patched ║ percentage ║
> > ╠═══════════════════════════╬══════════╬══════════╬════════════╣
> > ║     Both OS cache and     ║          ║          ║            ║
> > ║  shared buffers are clear ║ 513.9247 ║ 463.1019 ║    %9.9    ║
> > ╠═══════════════════════════╬══════════╬══════════╬════════════╣
> > ║   OS cache is loaded but  ║          ║          ║            ║
> > ║  shared buffers are clear ║ 423.1097 ║ 354.3277 ║    %16.3   ║
> > ╠═══════════════════════════╬══════════╬══════════╬════════════╣
> > ║ Shared buffers are loaded ║          ║          ║            ║
> > ║                           ║  89.2846 ║  84.6952 ║    %5.1    ║
> > ╚═══════════════════════════╩══════════╩══════════╩════════════╝
> >
> > Any kind of feedback would be appreciated.
>
> Thanks for working on this!
>
> A general review comment: I noticed you have the old streaming read
> (pgsr) naming still in a few places (including comments) -- so I would
> just make sure and update everywhere when you rebase in Thomas' latest
> version of the read stream API.

Done.

>
> > From c7500cc1b9068ff0b704181442999cd8bed58658 Mon Sep 17 00:00:00 2001
> > From: Nazir Bilal Yavuz <byavuz81@gmail.com>
> > Date: Mon, 19 Feb 2024 14:30:47 +0300
> > Subject: [PATCH v3 2/2] Use streaming read API in ANALYZE
> >
> > --- a/src/backend/commands/analyze.c
> > +++ b/src/backend/commands/analyze.c
> > @@ -1102,6 +1102,26 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr)
> >       return stats;
> >  }
> >
> > +/*
> > + * Prefetch callback function to get next block number while using
> > + * BlockSampling algorithm
> > + */
> > +static BlockNumber
> > +pg_block_sampling_streaming_read_next(StreamingRead *stream,
> > +                                                                       void *user_data,
> > +                                                                       void *per_buffer_data)
>
> I don't think you need the pg_ prefix

Done.

>
> > +{
> > +     BlockSamplerData *bs = user_data;
> > +     BlockNumber *current_block = per_buffer_data;
>
> Why can't you just do BufferGetBlockNumber() on the buffer returned from
> the read stream API instead of allocating per_buffer_data for the block
> number?

Done.

>
> > +
> > +     if (BlockSampler_HasMore(bs))
> > +             *current_block = BlockSampler_Next(bs);
> > +     else
> > +             *current_block = InvalidBlockNumber;
> > +
> > +     return *current_block;
>
>
> I think we'd like to keep the read stream code in heapam-specific code.
> Instead of doing streaming_read_buffer_begin() here, you could put this
> in heap_beginscan() or initscan() guarded by
>         scan->rs_base.rs_flags & SO_TYPE_ANALYZE

In the recent changes [1], heapam_scan_analyze_next_[block | tuple]
are removed from tableam. They are directly called from
heapam-specific code now. So, IMO, no need to do this now.

v4 is rebased on top of v14 streaming read API changes.

[1] 27bc1772fc814946918a5ac8ccb9b5c5ad0380aa

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment

pgsql-hackers by date:

From: "Zhijie Hou (Fujitsu)"
Date: 02 April 2024, 07:20:46
Subject: RE: Synchronizing slots from primary to standby

From: Daniel Gustafsson
Date: 02 April 2024, 07:24:04
Subject: Re: Reports on obsolete Postgres versions

Re: Use streaming read API in ANALYZE - Mailing list pgsql-hackers

Attachment

Previous

Next