Re: Use streaming read API in ANALYZE - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Use streaming read API in ANALYZE |
Date | |
Msg-id | 20240403204420.jtzssefkzxtdzouo@liskov Whole thread Raw |
In response to | Re: Use streaming read API in ANALYZE (Nazir Bilal Yavuz <byavuz81@gmail.com>) |
Responses |
Re: Use streaming read API in ANALYZE
|
List | pgsql-hackers |
On Wed, Apr 03, 2024 at 10:25:01PM +0300, Nazir Bilal Yavuz wrote: > > I realized the same error while working on Jakub's benchmarking results. > > Cause: I was using the nblocks variable to check how many blocks will > be returned from the streaming API. But I realized that sometimes the > number returned from BlockSampler_Init() is not equal to the number of > blocks that BlockSampler_Next() will return as BlockSampling algorithm > decides how many blocks to return on the fly by using some random > seeds. > > There are a couple of solutions I thought of: > > 1- Use BlockSampler_HasMore() instead of nblocks in the main loop in > the acquire_sample_rows(): > > Streaming API uses this function to prefetch block numbers. > BlockSampler_HasMore() will reach to the end first as it is used while > prefetching, so it will start to return false while there are still > buffers to return from the streaming API. That will cause some buffers > at the end to not be processed. > > 2- Expose something (function, variable etc.) from the streaming API > to understand if the read is finished and there is no buffer to > return: > > I think this works but I am not sure if the streaming API allows > something like that. > > 3- Check every buffer returned from the streaming API, if it is > invalid stop the main loop in the acquire_sample_rows(): > > This solves the problem but there will be two if checks for each > buffer returned, > - in heapam_scan_analyze_next_block() to check if the returned buffer is invalid > - to break main loop in acquire_sample_rows() if > heapam_scan_analyze_next_block() returns false > One of the if cases can be bypassed by moving > heapam_scan_analyze_next_block()'s code to the main loop in the > acquire_sample_rows(). > > I implemented the third solution, here is v6. I've reviewed the patches inline below and attached a patch that has some of my ideas on top of your patch. > From 8d396a42186325f920d5a05e7092d8e1b66f3cdf Mon Sep 17 00:00:00 2001 > From: Nazir Bilal Yavuz <byavuz81@gmail.com> > Date: Wed, 3 Apr 2024 15:14:15 +0300 > Subject: [PATCH v6] Use streaming read API in ANALYZE > > ANALYZE command gets random tuples using BlockSampler algorithm. Use > streaming reads to get these tuples by using BlockSampler algorithm in > streaming read API prefetch logic. > --- > src/include/access/heapam.h | 6 +- > src/backend/access/heap/heapam_handler.c | 22 +++--- > src/backend/commands/analyze.c | 85 ++++++++---------------- > 3 files changed, 42 insertions(+), 71 deletions(-) > > diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h > index a307fb5f245..633caee9d95 100644 > --- a/src/include/access/heapam.h > +++ b/src/include/access/heapam.h > @@ -25,6 +25,7 @@ > #include "storage/bufpage.h" > #include "storage/dsm.h" > #include "storage/lockdefs.h" > +#include "storage/read_stream.h" > #include "storage/shm_toc.h" > #include "utils/relcache.h" > #include "utils/snapshot.h" > @@ -388,9 +389,8 @@ extern bool HeapTupleIsSurelyDead(HeapTuple htup, > struct GlobalVisState *vistest); > > /* in heap/heapam_handler.c*/ > -extern void heapam_scan_analyze_next_block(TableScanDesc scan, > - BlockNumber blockno, > - BufferAccessStrategy bstrategy); > +extern bool heapam_scan_analyze_next_block(TableScanDesc scan, > + ReadStream *stream); > extern bool heapam_scan_analyze_next_tuple(TableScanDesc scan, > TransactionId OldestXmin, > double *liverows, double *deadrows, > diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c > index 0952d4a98eb..d83fbbe6af3 100644 > --- a/src/backend/access/heap/heapam_handler.c > +++ b/src/backend/access/heap/heapam_handler.c > @@ -1054,16 +1054,16 @@ heapam_relation_copy_for_cluster(Relation OldHeap, Relation NewHeap, > } > > /* > - * Prepare to analyze block `blockno` of `scan`. The scan has been started > - * with SO_TYPE_ANALYZE option. > + * Prepare to analyze block returned from streaming object. If the block returned > + * from streaming object is valid, true is returned; otherwise false is returned. > + * The scan has been started with SO_TYPE_ANALYZE option. > * > * This routine holds a buffer pin and lock on the heap page. They are held > * until heapam_scan_analyze_next_tuple() returns false. That is until all the > * items of the heap page are analyzed. > */ > -void > -heapam_scan_analyze_next_block(TableScanDesc scan, BlockNumber blockno, > - BufferAccessStrategy bstrategy) > +bool > +heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream) > { > HeapScanDesc hscan = (HeapScanDesc) scan; > > @@ -1076,11 +1076,15 @@ heapam_scan_analyze_next_block(TableScanDesc scan, BlockNumber blockno, > * doing much work per tuple, the extra lock traffic is probably better > * avoided. Personally I think heapam_scan_analyze_next_block() should be inlined. It only has a few lines. I would find it clearer inline. At the least, there is no reason for it (or heapam_scan_analyze_next_tuple()) to take a TableScanDesc instead of a HeapScanDesc. > */ > - hscan->rs_cblock = blockno; > - hscan->rs_cindex = FirstOffsetNumber; > - hscan->rs_cbuf = ReadBufferExtended(scan->rs_rd, MAIN_FORKNUM, > - blockno, RBM_NORMAL, bstrategy); > + hscan->rs_cbuf = read_stream_next_buffer(stream, NULL); > + if (hscan->rs_cbuf == InvalidBuffer) > + return false; > + > LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE); > + > + hscan->rs_cblock = BufferGetBlockNumber(hscan->rs_cbuf); > + hscan->rs_cindex = FirstOffsetNumber; > + return true; > } > /* > diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c > index 2fb39f3ede1..764520d5aa2 100644 > --- a/src/backend/commands/analyze.c > +++ b/src/backend/commands/analyze.c > @@ -1102,6 +1102,20 @@ examine_attribute(Relation onerel, int attnum, Node *index_expr) > return stats; > } > > +/* > + * Prefetch callback function to get next block number while using > + * BlockSampling algorithm > + */ > +static BlockNumber > +block_sampling_streaming_read_next(ReadStream *stream, > + void *user_data, > + void *per_buffer_data) > +{ > + BlockSamplerData *bs = user_data; > + > + return BlockSampler_HasMore(bs) ? BlockSampler_Next(bs) : InvalidBlockNumber; I don't see the point of BlockSampler_HasMore() anymore. I removed it in the attached and made BlockSampler_Next() return InvalidBlockNumber under the same conditions. Is there a reason not to do this? There aren't other callers. If the BlockSampler_Next() wasn't part of an API, we could just make it the streaming read callback, but that might be weird as it is now. That and my other ideas in attached. Let me know what you think. - Melanie
Attachment
pgsql-hackers by date: