Re: Using read stream in autoprewarm - Mailing list pgsql-hackers
From | Nazir Bilal Yavuz |
---|---|
Subject | Re: Using read stream in autoprewarm |
Date | |
Msg-id | CAN55FZ2ZSBLvfSF0wMAwNAkjZO6sMZrDhAKB2vZsBG0DnrVV=A@mail.gmail.com Whole thread Raw |
In response to | Re: Using read stream in autoprewarm (Melanie Plageman <melanieplageman@gmail.com>) |
Responses |
Re: Using read stream in autoprewarm
|
List | pgsql-hackers |
Hi, On Wed, 2 Apr 2025 at 01:36, Melanie Plageman <melanieplageman@gmail.com> wrote: > > On Tue, Apr 1, 2025 at 8:50 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote: > > > > I am attaching v8, which is an updated version of the v7. I tried to > > get rid of these local variables and refactored code to make logic > > more straightforward instead of going back and forth. > > > > 0001 and 0002 are v8. 0003 is another refactoring attempt to make code > > more straightforward. I did not squash 0003 to previous patches as you > > might not like it. > > I looked at the code on your github branch that has all three of these > squashed together. Thank you! > I think our approaches are converging. I like that you are > fast-forwarding to the next filenumber or fork number explicitly when > there is a bad relation or fork. I've changed my version (see newest > one attached) to do the fast-forwarding inline instead of in a > separate function like yours (the function didn't save many LOC and > actually may have added to cognitive overhead). > > Compared to my version, I think you avoided one level of loop nesting with your > > if (!rel) > else if (smgrexists(RelationGetSmgr(rel), blk->forknum)) > else > > but for starters, I don't think you can do this: > > else if (smgrexists(RelationGetSmgr(rel), blk->forknum)) > > because you didn't check if you have a legal forknum first You are right, I missed that. I think smgrexists() should return NULL if the forknum is invalid but it is not a topic for this thread. > And, I actually kind of prefer the explicitly nested structure > > loop through all relations > loop through all forks > loop through all buffers I prefer this as well. We know when we opened the relation, so we do not need to close it in two places like I did. > While in the old structure, I liked your > autoprewarm_prewarm_relation() function, but I think it is nicer > inlined like in my version. It makes the loop through all buffers > explicit too. Yes, I liked your approach. > I know you mentioned off-list that you don't like the handling of > global objects in my version, but I prefer doing it this way (even > though we have to check for in the loop condition) to having to set > the current database once we reach non-shared objects. It feels too > fiddly. This way seems less error prone. Looking at this version, what > do you think? Could we do it better? I think there might be a problem with that approach. Let's say that we are able to open relation when database oid = 0 and filenumber = 18. Then we are trying to find a valid fork now. We couldn't find a valid fork immediately, so we continued looping. Then database oid is changed from 0 to let's say 1 but filenumber remains the same. We are still in the valid fork loop, so relation remains from the database oid = 0. Isn't that wrong? > Let me know what you think of this version. I think it is the best of > both our approaches. I've separated it into two commits -- the first > does all the refactoring without using the read stream API and the > second one uses the read stream API. Some comments, 0001: ReadBufferExtended() can be called in its own minimal loop, otherwise we end up doing unnecessary checks for each ReadBufferExtended() call. This is not a problem when the 0002 is applied. 0002: We don't skip blocks whose blocknum is more than nblocks_in_fork. We can add that either to a stream callback like you did before or after the read_stream_end. I prefer stream callback because of the reason below [1]. > On another topic, what are the minimal places we need to call > have_free_buffers() (in this version)? I haven't even started looking > at the last patch you've been sending that is about checking the > freelist. I'll have to do that next. I think its current places are good enough. We may add one after the read_stream_end if we want to handle blk->blocknum >= nblocks_in_fork after the read stream finishes. If we handle that in the stream callback then no need to add have_free_buffers() [1]. Other than these comments, I think the current structure looks good. -- Regards, Nazir Bilal Yavuz Microsoft
pgsql-hackers by date: