Re: Use streaming read API in ANALYZE - Mailing list pgsql-hackers
From | Nazir Bilal Yavuz |
---|---|
Subject | Re: Use streaming read API in ANALYZE |
Date | |
Msg-id | CAN55FZ2OHDHQYRxg22rRVinUBgxKDYiqQKa2tjW4KNqPVCZ+7g@mail.gmail.com Whole thread Raw |
In response to | Re: Use streaming read API in ANALYZE (Melanie Plageman <melanieplageman@gmail.com>) |
Responses |
Re: Use streaming read API in ANALYZE
|
List | pgsql-hackers |
Hi, On Wed, 3 Apr 2024 at 23:44, Melanie Plageman <melanieplageman@gmail.com> wrote: > > > I've reviewed the patches inline below and attached a patch that has > some of my ideas on top of your patch. Thank you! > > > From 8d396a42186325f920d5a05e7092d8e1b66f3cdf Mon Sep 17 00:00:00 2001 > > From: Nazir Bilal Yavuz <byavuz81@gmail.com> > > Date: Wed, 3 Apr 2024 15:14:15 +0300 > > Subject: [PATCH v6] Use streaming read API in ANALYZE > > > > ANALYZE command gets random tuples using BlockSampler algorithm. Use > > streaming reads to get these tuples by using BlockSampler algorithm in > > streaming read API prefetch logic. > > --- > > src/include/access/heapam.h | 6 +- > > src/backend/access/heap/heapam_handler.c | 22 +++--- > > src/backend/commands/analyze.c | 85 ++++++++---------------- > > 3 files changed, 42 insertions(+), 71 deletions(-) > > > > diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h > > index a307fb5f245..633caee9d95 100644 > > --- a/src/include/access/heapam.h > > +++ b/src/include/access/heapam.h > > @@ -25,6 +25,7 @@ > > #include "storage/bufpage.h" > > #include "storage/dsm.h" > > #include "storage/lockdefs.h" > > +#include "storage/read_stream.h" > > #include "storage/shm_toc.h" > > #include "utils/relcache.h" > > #include "utils/snapshot.h" > > @@ -388,9 +389,8 @@ extern bool HeapTupleIsSurelyDead(HeapTuple htup, > > struct GlobalVisState *vistest); > > > > /* in heap/heapam_handler.c*/ > > -extern void heapam_scan_analyze_next_block(TableScanDesc scan, > > - BlockNumber blockno, > > - BufferAccessStrategy bstrategy); > > +extern bool heapam_scan_analyze_next_block(TableScanDesc scan, > > + ReadStream *stream); > > extern bool heapam_scan_analyze_next_tuple(TableScanDesc scan, > > TransactionId OldestXmin, > > double *liverows, double *deadrows, > > diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c > > index 0952d4a98eb..d83fbbe6af3 100644 > > --- a/src/backend/access/heap/heapam_handler.c > > +++ b/src/backend/access/heap/heapam_handler.c > > @@ -1054,16 +1054,16 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap, > > } > > > > /* > > - * Prepare to analyze block `blockno` of `scan`. The scan has been started > > - * with SO_TYPE_ANALYZE option. > > + * Prepare to analyze block returned from streaming object. If the block returned > > + * from streaming object is valid, true is returned; otherwise false is returned. > > + * The scan has been started with SO_TYPE_ANALYZE option. > > * > > * This routine holds a buffer pin and lock on the heap page. They are held > > * until heapam_scan_analyze_next_tuple() returns false. That is until all the > > * items of the heap page are analyzed. > > */ > > -void > > -heapam_scan_analyze_next_block(TableScanDesc scan, BlockNumber blockno, > > - BufferAccessStrategy bstrategy) > > +bool > > +heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream) > > { > > HeapScanDesc hscan = (HeapScanDesc) scan; > > > > @@ -1076,11 +1076,15 @@ heapam_scan_analyze_next_block(TableScanDesc scan, BlockNumber blockno, > > * doing much work per tuple, the extra lock traffic is probably better > > * avoided. > > Personally I think heapam_scan_analyze_next_block() should be inlined. > It only has a few lines. I would find it clearer inline. At the least, > there is no reason for it (or heapam_scan_analyze_next_tuple()) to take > a TableScanDesc instead of a HeapScanDesc. I agree. > > > */ > > - hscan->rs_cblock = blockno; > > - hscan->rs_cindex = FirstOffsetNumber; > > - hscan->rs_cbuf = ReadBufferExtended(scan->rs_rd, MAIN_FORKNUM, > > - blockno, RBM_NORMAL, bstrategy); > > + hscan->rs_cbuf = read_stream_next_buffer(stream, NULL); > > + if (hscan->rs_cbuf == InvalidBuffer) > > + return false; > > + > > LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE); > > + > > + hscan->rs_cblock = BufferGetBlockNumber(hscan->rs_cbuf); > > + hscan->rs_cindex = FirstOffsetNumber; > > + return true; > > } > > > /* > > diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c > > index 2fb39f3ede1..764520d5aa2 100644 > > --- a/src/backend/commands/analyze.c > > +++ b/src/backend/commands/analyze.c > > @@ -1102,6 +1102,20 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr) > > return stats; > > } > > > > +/* > > + * Prefetch callback function to get next block number while using > > + * BlockSampling algorithm > > + */ > > +static BlockNumber > > +block_sampling_streaming_read_next(ReadStream *stream, > > + void *user_data, > > + void *per_buffer_data) > > +{ > > + BlockSamplerData *bs = user_data; > > + > > + return BlockSampler_HasMore(bs) ? BlockSampler_Next(bs) : InvalidBlockNumber; > > I don't see the point of BlockSampler_HasMore() anymore. I removed it in > the attached and made BlockSampler_Next() return InvalidBlockNumber > under the same conditions. Is there a reason not to do this? There > aren't other callers. If the BlockSampler_Next() wasn't part of an API, > we could just make it the streaming read callback, but that might be > weird as it is now. I agree. There is no reason to have BlockSampler_HasMore() after streaming read API changes. > That and my other ideas in attached. Let me know what you think. I agree with your changes but I am not sure if others agree with all the changes you have proposed. So, I didn't merge 0001 and your ideas yet, instead I wrote a commit message, added some comments, changed -> 'if (bs->t >= bs->N || bs->m >= bs->n)' to 'if (K <= 0 || k <= 0)' and attached it as 0002. -- Regards, Nazir Bilal Yavuz Microsoft
Attachment
pgsql-hackers by date: