Re: Use streaming read API in ANALYZE - Mailing list pgsql-hackers
From | Nazir Bilal Yavuz |
---|---|
Subject | Re: Use streaming read API in ANALYZE |
Date | |
Msg-id | CAN55FZ3Q9FbGtszRXPf=0wH3-A0XWkjscEzEf=V7-rZku6DGMQ@mail.gmail.com Whole thread Raw |
In response to | Re: Use streaming read API in ANALYZE (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: Use streaming read API in ANALYZE
|
List | pgsql-hackers |
Hi, On Mon, 8 Apr 2024 at 04:21, Thomas Munro <thomas.munro@gmail.com> wrote: > > Pushed. Thanks Bilal and reviewers! I wanted to discuss what will happen to this patch now that 27bc1772fc8 is reverted. I am continuing this thread but I can create another thread if you prefer so. After the revert of 27bc1772fc8, acquire_sample_rows() became table-AM agnostic again. So, read stream changes might have to be pushed down now but there are a couple of roadblocks like Melanie mentioned [1] before. Quote from Melanie [1]: On Thu, 11 Apr 2024 at 19:19, Melanie Plageman <melanieplageman@gmail.com> wrote: > > I am working on pushing streaming ANALYZE into heap AM code, and I ran > into a few roadblocks. > > If we want ANALYZE to make the ReadStream object in heap_beginscan() > (like the read stream implementation of heap sequential and TID range > scans do), I don't see any way around changing the scan_begin table AM > callback to take a BufferAccessStrategy at the least (and perhaps also > the BlockSamplerData). > > read_stream_begin_relation() doesn't just save the > BufferAccessStrategy in the ReadStream, it uses it to set various > other things in the ReadStream object. callback_private_data (which in > ANALYZE's case is the BlockSamplerData) is simply saved in the > ReadStream, so it could be set later, but that doesn't sound very > clean to me. > > As such, it seems like a cleaner alternative would be to add a table > AM callback for creating a read stream object that takes the > parameters of read_stream_begin_relation(). But, perhaps it is a bit > late for such additions. If we do not want to add a new table AM callback like Melanie mentioned, it is pretty much required to pass BufferAccessStrategy and BlockSamplerData to the initscan(). > It also opens us up to the question of whether or not sequential scan > should use such a callback instead of making the read stream object in > heap_beginscan(). > > I am happy to write a patch that does any of the above. But, I want to > raise these questions, because perhaps I am simply missing an obvious > alternative solution. I wonder the same, I could not think of any alternative solution to this problem. Another quote from Melanie [2] in the same thread: On Thu, 11 Apr 2024 at 20:48, Melanie Plageman <melanieplageman@gmail.com> wrote: > > I will also say that, had this been 6 months ago, I would probably > suggest we restructure ANALYZE's table AM interface to accommodate > read stream setup and to address a few other things I find odd about > the current code. For example, I think creating a scan descriptor for > the analyze scan in acquire_sample_rows() is quite odd. It seems like > it would be better done in the relation_analyze callback. The > relation_analyze callback saves some state like the callbacks for > acquire_sample_rows() and the Buffer Access Strategy. But at least in > the heap implementation, it just saves them in static variables in > analyze.c. It seems like it would be better to save them in a useful > data structure that could be accessed later. We have access to pretty > much everything we need at that point (in the relation_analyze > callback). I also think heap's implementation of > table_beginscan_analyze() doesn't need most of > heap_beginscan()/initscan(), so doing this instead of something > ANALYZE specific seems more confusing than helpful. If we want to implement ANALYZE specific counterparts of heap_beginscan()/initscan(); we may think of passing BufferAccessStrategy and BlockSamplerData to them. Also, there is an ongoing(?) discussion about a few problems / improvements about the acquire_sample_rows() mentioned at the end of the 'Table AM Interface Enhancements' thread [3]. Should we wait for these discussions to be resolved or can we resume working on this patch? Any kind of feedback would be appreciated. [1] https://www.postgresql.org/message-id/CAAKRu_ZxU6hucckrT1SOJxKfyN7q-K4KU1y62GhDwLBZWG%2BROg%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAAKRu_YkphAPNbBR2jcLqnxGhDEWTKhYfLFY%3D0R_oG5LHBH7Gw%40mail.gmail.com [3] https://www.postgresql.org/message-id/flat/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com -- Regards, Nazir Bilal Yavuz Microsoft
pgsql-hackers by date: