Re: Use read streams in CREATE DATABASE command when the strategy is wal_log - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Use read streams in CREATE DATABASE command when the strategy is wal_log
Date
Msg-id 20240716121935.fa.nmisch@google.com
Whole thread Raw
In response to Re: Use read streams in CREATE DATABASE command when the strategy is wal_log  (Nazir Bilal Yavuz <byavuz81@gmail.com>)
Responses Re: Use read streams in CREATE DATABASE command when the strategy is wal_log
List pgsql-hackers
On Tue, Jul 16, 2024 at 02:11:20PM +0300, Nazir Bilal Yavuz wrote:
> On Fri, 12 Jul 2024 at 02:52, Noah Misch <noah@leadboat.com> wrote:
> > On Tue, Apr 16, 2024 at 02:12:19PM +0300, Nazir Bilal Yavuz wrote:
> > > --- a/src/backend/storage/aio/read_stream.c
> > > +++ b/src/backend/storage/aio/read_stream.c
> > > @@ -549,7 +549,7 @@ read_stream_begin_relation(int flags,
> > >       {
> > >               stream->ios[i].op.rel = rel;
> > >               stream->ios[i].op.smgr = RelationGetSmgr(rel);
> > > -             stream->ios[i].op.smgr_persistence = 0;
> > > +             stream->ios[i].op.smgr_persistence = rel->rd_rel->relpersistence;
> >
> > Does the following comment in ReadBuffersOperation need an update?
> >
> >         /*
> >          * The following members should be set by the caller.  If only smgr is
> >          * provided without rel, then smgr_persistence can be set to override the
> >          * default assumption of RELPERSISTENCE_PERMANENT.
> >          */
> 
> I believe it does not need to be updated but I renamed
> 'ReadBuffersOperation.smgr_persistence' as
> 'ReadBuffersOperation.persistence'. So, this comment is updated as
> well. I think that rename suits better because persistence does not
> need to come from smgr, it could come from relation, too. Do you think
> it is a good idea? If it is, does it need a separate commit?

The rename is good.  I think the comment implies "persistence" is unused when
rel!=NULL.  That implication is true before the patch but false after the
patch.

> > > @@ -4667,19 +4698,31 @@ RelationCopyStorageUsingBuffer(RelFileLocator srclocator,
> >
> > >       /* Iterate over each block of the source relation file. */
> > >       for (blkno = 0; blkno < nblocks; blkno++)
> > >       {
> > >               CHECK_FOR_INTERRUPTS();
> > >
> > >               /* Read block from source relation. */
> > > -             srcBuf = ReadBufferWithoutRelcache(srclocator, forkNum, blkno,
> > > -                                                                                RBM_NORMAL, bstrategy_src,
> > > -                                                                                permanent);
> > > +             srcBuf = read_stream_next_buffer(src_stream, NULL);
> > >               LockBuffer(srcBuf, BUFFER_LOCK_SHARE);
> >
> > I think this should check for read_stream_next_buffer() returning
> > InvalidBuffer.  pg_prewarm doesn't, but the other callers do, and I think the
> > other callers are a better model.  LockBuffer() doesn't check the
> > InvalidBuffer case, so let's avoid the style of using a
> > read_stream_next_buffer() return value without checking.
> 
> There is an assert in the LockBuffer which checks for the
> InvalidBuffer. If that is not enough, we may add an if check for
> InvalidBuffer but what should we do in this case? It should not
> happen, so erroring out may be a good idea.

I like this style from read_stream_reset():

    while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
    {
        ...
    }

That is, don't iterate over block numbers.  Drain the stream until empty.  If
the stream returns a number of blocks higher or lower than we expected, we
won't detect that, and that's okay.  It's not a strong preference, so I'm open
to arguments against that from you or others.  A counterargument could be that
read_stream_reset() doesn't know the buffer count, so it has no choice.  The
counterargument could say that callers knowing the block count should use the
pg_prewarm() style, and others should use the read_stream_reset() style.



pgsql-hackers by date:

Previous
From: Yasir
Date:
Subject: Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows
Next
From: Andrew Dunstan
Date:
Subject: Re: ActiveState Perl is not valid anymore to build PG17 on the Windows 10/11 platforms, So Documentation still suggesting it should be updated