Thread: Using read stream in autoprewarm
Hi, I am working on using the read stream in autoprewarm. I observed ~10% performance gain with this change. The patch is attached. The downside of the read stream approach is that a new read stream object needs to be created for each database, relation and fork. I was wondering if this would cause a regression but it did not (at least depending on results of my testing). Another downside could be the code getting complicated. For the testing, - I created 50 databases with each of them having 50 tables and the size of the tables are 520KB. - patched: 51157 ms - master: 56769 ms - I created 5 databases with each of them having 1 table and the size of the tables are 3GB. - patched: 32679 ms - master: 36706 ms I put debugging message with timing information in autoprewarm_database_main() function, then run autoprewarm 100 times (by restarting the server) and cleared the OS cache before each restart. Also, I ensured that the block number of the buffer returning from the read stream API is correct. I am not sure if that much testing is enough for this kind of change. Any feedback would be appreciated. -- Regards, Nazir Bilal Yavuz Microsoft
Attachment
> On 8 Aug 2024, at 11:32, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote: > > Any feedback would be appreciated. I've took a look into the patch. It seems to me that you add new block numbers to the read stream until you have buffers.So when there are no more buffers you will still have some queued blocks. Maybe can you change the logic so that number of free buffers must be enough to allocate all blocks in look-ahead distance? Thanks! Best regards, Andrey Borodin.
Dear Nazir,
At first A quick look it looks good. I will take a closer look at it tomorrow. Could you please let me know about the performance tests and graphics?
Best regards, Stepan Neretin!
Hi, Thanks for looking into this! On Thu, 31 Oct 2024 at 21:18, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote: > > > On 8 Aug 2024, at 11:32, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote: > > > > Any feedback would be appreciated. > > I've took a look into the patch. It seems to me that you add new block numbers to the read stream until you have buffers.So when there are no more buffers you will still have some queued blocks. > Maybe can you change the logic so that number of free buffers must be enough to allocate all blocks in look-ahead distance? I see what you mean. When the have_free_buffer() function returns false in the callback, there are still queued blocks in the stream although there are no free buffers in the buffer pool. I think the best way to solve this is to get the number of free buffers in the buffer pool by 'BufferStrategyControl.lastFreeBuffer - BufferStrategyControl.firstFreeBuffer' and then compare it with 'stream->pending_read_nblocks'. When the 'stream->pending_read_nblocks == number_of_free_buffers_in_buffer_pool', end the stream. The problem with that is stream->pending_read_nblocks isn't public, also I am not sure whether 'BufferStrategyControl.lastFreeBuffer - BufferStrategyControl.firstFreeBuffer' is safe to use. -- Regards, Nazir Bilal Yavuz Microsoft
Hi, Thanks for looking into this! On Thu, 31 Oct 2024 at 22:02, Stepan Neretin <sndcppg@gmail.com> wrote: > > At first A quick look it looks good. I will take a closer look at it tomorrow. Could you please let me know about the performancetests and graphics? Sorry but I didn't understand what you mean by performance tests and graphics. Do you want something else than the information in the first email [1]? [1] postgr.es/m/CAN55FZ3n8Gd%2BhajbL%3D5UkGzu_aHGRqnn%2BxktXq2fuds%3D1AOR6Q%40mail.gmail.com -- Regards, Nazir Bilal Yavuz Microsoft
> On 1 Nov 2024, at 12:51, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote: > > am not > sure whether 'BufferStrategyControl.lastFreeBuffer - > BufferStrategyControl.firstFreeBuffer' is safe to use. Ugh... it will work. But it seems to me too dirty hack. There's no scalable way to know size of a free list. Let's just comment that we might read some more buffers if database does not fit into memory? Alternatively we can count size of a free list on the start. Best regards, Andrey Borodin.
Hi, Newer reviewer here, trying to understand more about the read stream API. On Tuesday, November 26th, 2024 at 11:07 AM, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote: > Any feedback would be appreciated. I've executed the same test of 5 databases with each of them having 1 table of 3GB of size and I've got very similar results. I've also tested using a single database with 4 tables with ~60GB of size and the results compared with master was more closer but still an improvement. Note that I've also increased the default shared_buffers to 7GB to see how it works with large buffer pools. - patched: 5.4259 s - master: 5.53186 s Not to much to say about the code, I'm currently learning more about the read stream API and Postgresql hacking itself. Just some minor points and questions about the patches. v2-0002-Count-free-buffers-at-the-start-of-the-autoprewar.patch --- a/src/backend/storage/buffer/freelist.c +/* + * get_number_of_free_buffers -- a lockless way to get the number of free + * buffers in buffer pool. + * + * Note that result continuosly changes as free buffers are moved out by other + * operations. + */ +int +get_number_of_free_buffers(void) typo on continuosly -> continuously v2-0001-Use-read-stream-in-autoprewarm.patch + bool *rs_have_free_buffer = per_buffer_data; + + + *rs_have_free_buffer = true; + Not sure if I understand why this variable is needed, it seems that it is only written and never read? Just as comparison, the block_range_read_stream_cb callback used on pg_prewarm seems to not use the per_buffer_data parameter. -- Matheus Alcantara EDB: https://www.enterprisedb.com
On Wednesday, November 27th, 2024 at 11:19 AM, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote: > > v2-0001-Use-read-stream-in-autoprewarm.patch > > + bool *rs_have_free_buffer = per_buffer_data; > > + > > + > > + *rs_have_free_buffer = true; > > + > > > > Not sure if I understand why this variable is needed, it seems that it is only > > written and never read? Just as comparison, the block_range_read_stream_cb > > callback used on pg_prewarm seems to not use the per_buffer_data parameter. > > > Actually, it is read in the main loop of the > autoprewarm_database_main() function: > > /* There are no free buffers left in shared buffers, break the loop. */ > else if (!(*rs_have_free_buffer)) > break; > > apw_read_stream_next_block() callback function sets > rs_have_free_buffer's value to false when there is no free buffer left > in the shared buffers. And the code above terminates the main loop in > the autoprewarm_database_main() function when it is set to false. > > block_range_read_stream_cb() callback is used when the callback only > needs to loop over the block numbers. However, for the autoprewarm > case; the callback function needs to do additional checks so another > callback and the use of this variable are required. Ohh, I see, thanks very much for the explanation. > v3 is attached. Thanks. I don't know if there is another way that this patch could be tested? Looking forward on other reviews on this. -- Matheus Alcantara EDB: https://www.enterprisedb.com
On Wed, 27 Nov 2024 at 19:20, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote: > > Hi, > > Thank you for looking into this! > > On Wed, 27 Nov 2024 at 16:50, Matheus Alcantara <mths.dev@pm.me> wrote: > > I've executed the same test of 5 databases with each of them having 1 table of > > 3GB of size and I've got very similar results. > > > > I've also tested using a single database with 4 tables with ~60GB of size and > > the results compared with master was more closer but still an improvement. Note > > that I've also increased the default shared_buffers to 7GB to see how it works > > with large buffer pools. > > - patched: 5.4259 s > > - master: 5.53186 s > > Thanks for the testing. > > > Not to much to say about the code, I'm currently learning more about the read > > stream API and Postgresql hacking itself. Just some minor points and questions > > about the patches. > > > > > > v2-0002-Count-free-buffers-at-the-start-of-the-autoprewar.patch > > --- a/src/backend/storage/buffer/freelist.c > > +/* > > + * get_number_of_free_buffers -- a lockless way to get the number of free > > + * buffers in buffer pool. > > + * > > + * Note that result continuosly changes as free buffers are moved out by other > > + * operations. > > + */ > > +int > > +get_number_of_free_buffers(void) > > > > typo on continuosly -> continuously > > Done. > > > v2-0001-Use-read-stream-in-autoprewarm.patch > > + bool *rs_have_free_buffer = per_buffer_data; > > + > > + > > + *rs_have_free_buffer = true; > > + > > > > Not sure if I understand why this variable is needed, it seems that it is only > > written and never read? Just as comparison, the block_range_read_stream_cb > > callback used on pg_prewarm seems to not use the per_buffer_data parameter. > > Actually, it is read in the main loop of the > autoprewarm_database_main() function: > > /* There are no free buffers left in shared buffers, break the loop. */ > else if (!(*rs_have_free_buffer)) > break; > > apw_read_stream_next_block() callback function sets > rs_have_free_buffer's value to false when there is no free buffer left > in the shared buffers. And the code above terminates the main loop in > the autoprewarm_database_main() function when it is set to false. > > block_range_read_stream_cb() callback is used when the callback only > needs to loop over the block numbers. However, for the autoprewarm > case; the callback function needs to do additional checks so another > callback and the use of this variable are required. > > v3 is attached. > > -- > Regards, > Nazir Bilal Yavuz > Microsoft Hi! > + old_blk = &(p->block_info[p->pos - 1]); > + cur_blk = &(p->block_info[p->pos]); Should we Assert(p->pos > 0 && p->pos < *something*) Patch tested with no regression. -- Best regards, Kirill Reshke
On Fri, 29 Nov 2024 at 16:19, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote: > v4 is attached. Hi! I feel like we are ready to mark this as RFC, WDYT? -- Best regards, Kirill Reshke
> On 2 Dec 2024, at 16:16, Kirill Reshke <reshkekirill@gmail.com> wrote: > > I feel like we are ready to mark this as RFC, WDYT? +1 Best regards, Andrey Borodin.
Hi, On Mon, 2 Dec 2024 at 16:30, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote: > > > On 2 Dec 2024, at 16:16, Kirill Reshke <reshkekirill@gmail.com> wrote: > > > > I feel like we are ready to mark this as RFC, WDYT? > > +1 Done. -- Regards, Nazir Bilal Yavuz Microsoft