Re: Using read stream in autoprewarm - Mailing list pgsql-hackers

From Nazir Bilal Yavuz
Subject Re: Using read stream in autoprewarm
Date
Msg-id CAN55FZ32QnQM41RMRU7a8T6A7anycWPr9PbK0dozOfgz9SxRPA@mail.gmail.com
Whole thread Raw
In response to Re: Using read stream in autoprewarm  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Using read stream in autoprewarm
List pgsql-hackers
Hi,

Thank you for looking into this!

On Sat, 29 Mar 2025 at 23:10, Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Fri, Nov 29, 2024 at 6:20 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> >
> > v4 is attached.
>
> I've started looking at this version. What I don't like about this
> structure is that we are forced to increment the position in the array
> of BlockInfoRecords in both the callback and the main loop in
> autoprewarm_database_main(). There isn't a way around it because we
> have to return control to the user when we encounter a new relation
> (we can't close the relation and destroy the read stream in the
> callback). And in the main loop in autoprewarm_database_main(), we may
> fail to open the next relation and then need to keep advancing the
> position in the array of BlockInfoRecords.
>
> It isn't just that we have to advance the position in both places --
> we also have to have a special case for the first block. All in all,
> given that in the current read stream API, a single stream must only
> concern itself with a single relation, fork combo, I think there is no
> elegant way to deal with this in autoprewarm.
>
> One alternative is to loop through the array of BlockInfoRecords and
> get the start and end positions of the blocks in the arary for a
> single relation/fork combo. Then we could make the read stream and
> pass those two positions and the array as callback_private_data. That
> would mean we loop through the whole array twice, but I wonder if the
> improvement in clarity is worth it?

I think this is a good alternative. I will work on this and try to
propose a patch.

> Some review feedback on your v4: I don't think we need the
> rs_have_free_buffer per_buffer_data. We can just check
> have_free_buffers() in both the callback and main loop in
> autoprewarm_database_main(). I also think you want a comment about why
> first_block is needed. And I think you need to guard the
> read_stream_end() after the loop -- what if we never made a read
> stream because we errored out for all the block's relations or
> something?

All of these are addressed. One extra thing I noticed is we were not
checking if blocknum < number_of_block_in_relation at the first_block
case in the stream callback, this is fixed now.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment

pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: speedup COPY TO for partitioned table.
Next
From: Magnus Hagander
Date:
Subject: Re: pg_stat_database.checksum_failures vs shared relations