Re: refactoring relation extension and BufferAlloc(), faster COPY - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: refactoring relation extension and BufferAlloc(), faster COPY
Date
Msg-id 56577df0-f36d-20ee-301a-5d022980d619@iki.fi
In response to Re: refactoring relation extension and BufferAlloc(), faster COPY  (Andres Freund <andres@anarazel.de>)
Responses Re: refactoring relation extension and BufferAlloc(), faster COPY  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 27/02/2023 23:45, Andres Freund wrote:
> On 2023-02-27 18:06:22 +0200, Heikki Linnakangas wrote:
>> We also do this in freespace.c and visibilitymap.c:
>>
>>      /* Extend as needed. */
>>      while (fsm_nblocks_now < fsm_nblocks)
>>      {
>>          PageSetChecksumInplace((Page) pg.data, fsm_nblocks_now);
>>
>>          smgrextend(reln, FSM_FORKNUM, fsm_nblocks_now,
>>                     pg.data, false);
>>          fsm_nblocks_now++;
>>       }
>>
>> We could use the new smgrzeroextend function here. But it would be better to
>> go through the buffer cache, because after this, the last block, at
>> 'fsm_nblocks', will be read with ReadBuffer() and modified.
> 
> I doubt it's a particularly crucial thing to optimize.

Yeah, it won't make any practical difference to performance. I'm thinking 
more about whether we can make this consistent with the other places 
where we extend a relation.

> But, uh, isn't this code racy? Because this doesn't go through shared buffers,
> there's no IO_IN_PROGRESS interlocking against a concurrent reader. We know
> that writing pages isn't atomic vs readers. So another connection could
> see the new relation size, but a read might return a
> partially written state of the page. Which then would cause checksum
> failures. And even worse, I think it could lead to losing a write, if the
> concurrent connection writes out a page.

fsm_readbuf and vm_readbuf check the relation size first, with 
smgrnblocks(), before trying to read the page. So to have a problem, 
smgrnblocks() would have to already return the new size while smgrread() 
did not yet return the new contents. I don't think that's possible, but 
I'm not sure.
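As a simplified, self-contained model of that check-then-read ordering (the 
model_* names and the in-memory "relation" below are invented stand-ins, not 
the real PostgreSQL functions), the pattern is: consult the block count 
first, and only issue a read for a block below that count:

```c
#include <assert.h>
#include <string.h>

/*
 * Toy model of the fsm_readbuf()/vm_readbuf() pattern: the reader checks
 * the relation size (as smgrnblocks() would report it) before reading a
 * page, so a block is only ever read after the size check has seen it.
 */
#define BLCKSZ		8192
#define MAX_BLOCKS	16

static char relation[MAX_BLOCKS][BLCKSZ];
static int	nblocks = 0;		/* what smgrnblocks() would return */

static int
model_smgrnblocks(void)
{
	return nblocks;
}

static void
model_smgrread(int blkno, char *dst)
{
	memcpy(dst, relation[blkno], BLCKSZ);
}

/*
 * Returns 1 and fills *page if the block exists; returns 0 if the block
 * is beyond EOF, in which case the caller would initialize a fresh page
 * instead (as fsm_readbuf does with PageInit).
 */
static int
model_readbuf(int blkno, char *page)
{
	if (blkno >= model_smgrnblocks())
		return 0;				/* beyond EOF: don't issue a read */
	model_smgrread(blkno, page);
	return 1;
}
```

The race Andres describes would require the size check to pass while the 
page contents are still only partially written, which this ordering alone 
does not rule out, hence the "not sure" above.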

>> We could use BulkExtendSharedRelationBuffered() to extend the relation and
>> keep the last page locked, but the BulkExtendSharedRelationBuffered()
>> signature doesn't allow that. It can return the first N pages locked, but
>> there's no way to return the *last* page locked.
> 
> We can't rely on bulk extending a, potentially, large number of pages in one
> go anyway (since we might not be allowed to pin that many pages). So I don't
> think requiring locking the last page is a really viable API.
> 
> I think for this case I'd just use the ExtendRelationTo() API we were
> discussing nearby. Compared to the cost of reducing syscalls / filesystem
> overhead to extend the relation, the cost of the buffer mapping lookup doesn't
> seem significant. That's different in e.g. the hio.c case, because there we
> need a buffer with free space, and concurrent activity could otherwise fill up
> the buffer before we can lock it again.

Works for me.
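For reference, a toy model of the direction agreed on here (one call that 
extends the relation up to a target block and hands that block back, rather 
than a bare smgrextend() loop) might look like the following. The 
model_extend_to name and in-memory "relation" are invented stand-ins; the 
actual ExtendRelationTo() signature was still under discussion at this 
point:

```c
#include <assert.h>
#include <string.h>

/*
 * Toy model of an "extend to a target block" API: append zeroed pages
 * until the requested block exists, then return that block number, the
 * way a buffer-manager-based ExtendRelationTo()-style call would return
 * the last page for the caller to modify.
 */
#define BLCKSZ		8192
#define MAX_BLOCKS	64

static char relation[MAX_BLOCKS][BLCKSZ];
static int	nblocks = 0;

static int
model_extend_to(int blkno)
{
	assert(blkno < MAX_BLOCKS);
	while (nblocks <= blkno)
	{
		/* each new page starts out zero-filled */
		memset(relation[nblocks], 0, BLCKSZ);
		nblocks++;
	}
	return blkno;
}
```

Going through one extend-to-target call (instead of a page-at-a-time write 
loop) is what lets the real implementation batch the filesystem extension 
while still giving the caller the last page to lock and modify.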

- Heikki



