Re: refactoring relation extension and BufferAlloc(), faster COPY - Mailing list pgsql-hackers

From Andres Freund
Subject Re: refactoring relation extension and BufferAlloc(), faster COPY
Date
Msg-id 20230330010342.rxyr5wm6szkt42wq@awork3.anarazel.de
Whole thread Raw
In response to Re: refactoring relation extension and BufferAlloc(), faster COPY  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: refactoring relation extension and BufferAlloc(), faster COPY  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

On 2023-03-29 20:51:04 -0400, Melanie Plageman wrote:
> > > +    if (returnCode == 0)
> > > +        return 0;
> > > +
> > > +    /* for compatibility with %m printing etc */
> > > +    errno = returnCode;
> > > +
> > > +    /*
> > > +    * Return in cases of a "real" failure, if fallocate is not supported,
> > > +    * fall through to the FileZero() backed implementation.
> > > +    */
> > > +    if (returnCode != EINVAL && returnCode != EOPNOTSUPP)
> > > +        return returnCode;
> > >
> > > I'm pretty sure you can just delete the below if statement
> > >
> > > +    if (returnCode == 0 ||
> > > +        (returnCode != EINVAL && returnCode != EINVAL))
> > > +        return returnCode;
> >
> > Hm. I don't see how - wouldn't that lead us to call FileZero(), even if
> > FileFallocate() succeeded or failed (rather than not being supported)?
> 
> Uh...I'm confused...maybe my eyes aren't working. If returnCode was 0,
> you already would have returned and if returnCode wasn't EINVAL, you
> also already would have returned.
> Not to mention (returnCode != EINVAL && returnCode != EINVAL) contains
> two identical operands.

I'm afraid it was not your eyes that weren't working...


> > >  void
> > > -BufferCheckOneLocalPin(Buffer buffer)
> > > +BufferCheckWePinOnce(Buffer buffer)
> > >
> > > This name is weird. Who is we?
> >
> > The current backend. I.e. the function checks the current backend pins the
> > buffer exactly once, rather that *any* backend pins it once.
> >
> > I now see that BufferIsPinned() is named, IMO, misleadingly, more generally,
> > even though it also just applies to pins by the current backend.
> 
> Maybe there is a way to use "self" instead of a pronoun? But, if you
> feel quite strongly about a pronoun, I think "we" implies more than one
> backend, so "I" would be better.

I have no strong feelings around this in any form :)




> > > @@ -4709,8 +4704,6 @@ TerminateBufferIO(BufferDesc *buf, bool
> > > clear_dirty, uint32 set_flag_bits)
> > >  {
> > >      uint32        buf_state;
> > >
> > > I noticed that the comment above TermianteBufferIO() says
> > >  * TerminateBufferIO: release a buffer we were doing I/O on
> > >  *    (Assumptions)
> > >  *    My process is executing IO for the buffer
> > >
> > > Can we still say this is an assumption? What about when it is being
> > > cleaned up after being called from AbortBufferIO()
> >
> > That hasn't really changed - it was already called by AbortBufferIO().
> >
> > I think it's still correct, too. We must have marked the IO as being in
> > progress to get there.
> 
> Oh, no I meant the "my process is executing IO for the buffer" --
> couldn't another backend clear IO_IN_PROGRESS (i.e. not the original one
> who set it on all the victim buffers)?

No. Or at least not yet ;) - with AIO we will... Only the IO issuing backend
currently is allowed to reset IO_IN_PROGRESS.


> > > +    /* Should the smgr size cache be cleared? */
> > > +    EB_CLEAR_SIZE_CACHE = (1 << 4),
> > > +
> > > +    /* internal flags follow */
> > >
> > > I don't understand what this comment means ("internal flags follow")
> >
> > Hm - just that the flags defined subsequently are for internal use, not for
> > callers to specify.
> 
> If EB_LOCK_TARGET is the only one of these, it might be more clear, for
> now, to just say "an internal flag" or "for internal use" above
> EB_LOCK_TARGET, since it is the only one.

I am quite certain it won't be the only one...


> > > -        /* Without FSM, always fall out of the loop and extend */
> > > -        if (!use_fsm)
> > > -            break;
> > > +        if (bistate
> > > +            && bistate->next_free != InvalidBlockNumber
> > > +            && bistate->next_free <= bistate->last_free)
> > > +        {
> > > +            /*
> > > +            * We bulk extended the relation before, and there are still some
> > > +            * unused pages from that extension, so we don't need to look in
> > > +            * the FSM for a new page. But do record the free space from the
> > > +            * last page, somebody might insert narrower tuples later.
> > > +            */
> > >
> > > Why couldn't we have found out that we bulk-extended before and get the
> > > block from there up above the while loop?
> >
> > I'm not quite sure I follow - above the loop there might still have been space
> > on the prior page? We also need the ability to loop if the space has been used
> > since.
> >
> > I guess there's an argument for also checking above the loop, but I don't
> > think that'd currently ever be reachable.
> 
> My idea was that directly below this code in RelationGetBufferForTuple():
> 
>     if (bistate && bistate->current_buf != InvalidBuffer)
>         targetBlock = BufferGetBlockNumber(bistate->current_buf);
>     else
>         targetBlock = RelationGetTargetBlock(relation);
> 
> We could check bistate->next_free instead of checking the freespace map first.
> 
> But, perhaps we couldn't hit this because we would have already have set
> current_buf if we had a next_free?

Correct. I think it might be worth doing a larger refactoring of that function
at some point not too far away...

It's definitely somewhat sad that we spend time locking the buffer, recheck
pins etc, for the callers like heap_multi_insert() and heap_update() that
already know that the page is full. But that seems like independent enough
that I'd not tackle it now.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: refactoring relation extension and BufferAlloc(), faster COPY
Next
From: Peter Smith
Date:
Subject: Re: Simplify some codes in pgoutput