Home > mailing lists

Re: Using read stream in autoprewarm - Mailing list pgsql-hackers

From	Nazir Bilal Yavuz
Subject	Re: Using read stream in autoprewarm
Date	April 2, 2025 13:26:42
Msg-id	CAN55FZ2ZSBLvfSF0wMAwNAkjZO6sMZrDhAKB2vZsBG0DnrVV=A@mail.gmail.com Whole thread
In response to	Re: Using read stream in autoprewarm (Melanie Plageman <melanieplageman@gmail.com>)
Responses	Re: Using read stream in autoprewarm
List	pgsql-hackers

Tree view

Hi,

On Wed, 2 Apr 2025 at 01:36, Melanie Plageman <melanieplageman@gmail.com> wrote:
>
> On Tue, Apr 1, 2025 at 8:50 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> >
> > I am attaching v8, which is an updated version of the v7. I tried to
> > get rid of these local variables and refactored code to make logic
> > more straightforward instead of going back and forth.
> >
> > 0001 and 0002 are v8. 0003 is another refactoring attempt to make code
> > more straightforward. I did not squash 0003 to previous patches as you
> > might not like it.
>
> I looked at the code on your github branch that has all three of these
> squashed together.

Thank you!

> I think our approaches are converging. I like that you are
> fast-forwarding to the next filenumber or fork number explicitly when
> there is a bad relation or fork. I've changed my version (see newest
> one attached) to do the fast-forwarding inline instead of in a
> separate function like yours (the function didn't save many LOC and
> actually may have added to cognitive overhead).
>
> Compared to my version, I think you avoided one level of loop nesting with your
>
>         if (!rel)
>         else if (smgrexists(RelationGetSmgr(rel), blk->forknum))
>         else
>
> but for starters, I don't think you can do this:
>
>         else if (smgrexists(RelationGetSmgr(rel), blk->forknum))
>
> because you didn't check if you have a legal forknum first

You are right, I missed that. I think smgrexists() should return NULL
if the forknum is invalid but it is not a topic for this thread.

> And, I actually kind of prefer the explicitly nested structure
>
> loop through all relations
>       loop through all forks
>             loop through all buffers

I prefer this as well. We know when we opened the relation, so we do
not need to close it in two places like I did.

> While in the old structure, I liked your
> autoprewarm_prewarm_relation() function, but I think it is nicer
> inlined like in my version. It makes the loop through all buffers
> explicit too.

Yes, I liked your approach.

> I know you mentioned off-list that you don't like the handling of
> global objects in my version, but I prefer doing it this way (even
> though we have to check for in the loop condition) to having to set
> the current database once we reach non-shared objects. It feels too
> fiddly. This way seems less error prone. Looking at this version, what
> do you think? Could we do it better?

I think there might be a problem with that approach. Let's say that we
are able to open relation when database oid = 0 and filenumber = 18.
Then we are trying to find a valid fork now. We couldn't find a valid
fork immediately, so we continued looping. Then database oid is
changed from 0 to let's say 1 but filenumber remains the same. We are
still in the valid fork loop, so relation remains from the database
oid = 0. Isn't that wrong?

> Let me know what you think of this version. I think it is the best of
> both our approaches. I've separated it into two commits -- the first
> does all the refactoring without using the read stream API and the
> second one uses the read stream API.

Some comments,

0001:

ReadBufferExtended() can be called in its own minimal loop, otherwise
we end up doing unnecessary checks for each ReadBufferExtended() call.
This is not a problem when the 0002 is applied.

0002:

We don't skip blocks whose blocknum is more than nblocks_in_fork. We
can add that either to a stream callback like you did before or after
the read_stream_end. I prefer stream callback because of the reason
below [1].

> On another topic, what are the minimal places we need to call
> have_free_buffers() (in this version)? I haven't even started looking
> at the last patch you've been sending that is about checking the
> freelist. I'll have to do that next.

I think its current places are good enough. We may add one after the
read_stream_end if we want to handle blk->blocknum >= nblocks_in_fork
after the read stream finishes. If we handle that in the stream
callback then no need to add have_free_buffers() [1].

Other than these comments, I think the current structure looks good.

--
Regards,
Nazir Bilal Yavuz
Microsoft

pgsql-hackers by date:

From: Ashutosh Bapat
Date: 02 April 2025, 13:06:33
Subject: Re: Test to dump and restore objects left behind by regression

From: Amit Kapila
Date: 02 April 2025, 13:39:34
Subject: Re: Fix 035_standby_logical_decoding.pl race conditions

Re: Using read stream in autoprewarm - Mailing list pgsql-hackers

Previous

Next