Re: Using read stream in autoprewarm - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Using read stream in autoprewarm
Date
Msg-id CAAKRu_b5SOxZ46zV61O+Mt4bjJpU_x-vCipRuAAA9zPm7nWO3g@mail.gmail.com
Whole thread Raw
In response to Re: Using read stream in autoprewarm  (Nazir Bilal Yavuz <byavuz81@gmail.com>)
Responses Re: Using read stream in autoprewarm
List pgsql-hackers
On Wed, Apr 2, 2025 at 1:20 PM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> On Wed, 2 Apr 2025 at 18:54, Melanie Plageman <melanieplageman@gmail.com> wrote:
> >
> > On Wed, Apr 2, 2025 at 6:26 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> > >
> I don't have an example code right now. But what I mean is we may call
> ReadBufferExtended() in a loop for the blocks in the same fork. We
> don't need to call smgrexists() and RelationGetNumberOfBlocksInFork()
> for each block, we will call these for each fork not for each block.
> However, like I said before, this is not important when the read
> stream code is applied.

Ah, you are so right. That was totally messed up in the last version.
I've fixed it in attached v10. I think having it correct in the 0002
patch makes it easier to understand how the read stream callback is
replacing it.

> > > We don't skip blocks whose blocknum is more than nblocks_in_fork. We
> > > can add that either to a stream callback like you did before or after
> > > the read_stream_end. I prefer stream callback because of the reason
> > > below [1].
> >
> > Yep, I also thought we had to have that logic, but because we sort by
> > db,rel,fork,blkno, I think blocks with blocknumber >= nblocks_in_fork
> > will be last and so we just want to move on to the next fork.
>
> I agree that they will be the last, but won't we end up creating a
> read stream object for each block?

Ah, yes, you are right. That would have been really broken. I think I
fixed it. See attached. Now we'll only do that for the first block if
it is invalid (which is probably okay IMO).

> > I was also wondering about the other patch in your earlier set that
> > set stop_idx from get_number_of_free_buffers(). Could you tell me more
> > about that? What does it do and why is it needed with the read stream
> > but wasn't needed before?
>
> In the read stream code, we use callbacks to create bigger I/Os. These
> I/Os aren't processed until the io_combine_limit or we hit not
> sequential blocknum. In other words, when the have_free_buffer()
> function returns false in the callback; there are still queued blocks
> in the stream, although there are no free buffers in the buffer pool.
> We can end up creating I/Os bigger than free buffers in the shared
> buffers.
>
> To solve that a bit, we try to get a number of free buffers in the
> shared buffers. So, we try to minimize the problem above by using the
> actual free buffer count. That optimization has problems like if other
> processes fill shared buffers at the same time while the read stream
> is running, then this optimization will not work well.

Hmm. Yea, I do find it confusing that it will get so easily out of
date. Let's circle back to this after getting the other patches to a
good place (but before committing all of this).

- Melanie

Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Fix slot synchronization with two_phase decoding enabled
Next
From: Matthias van de Meent
Date:
Subject: Re: Incorrect result of bitmap heap scan.