On 27/02/2023 23:45, Andres Freund wrote:
> On 2023-02-27 18:06:22 +0200, Heikki Linnakangas wrote:
>> We also do this in freespace.c and visibilitymap.c:
>>
>> /* Extend as needed. */
>> while (fsm_nblocks_now < fsm_nblocks)
>> {
>> PageSetChecksumInplace((Page) pg.data, fsm_nblocks_now);
>>
>> smgrextend(reln, FSM_FORKNUM, fsm_nblocks_now,
>> pg.data, false);
>> fsm_nblocks_now++;
>> }
>>
>> We could use the new smgrzeroextend function here. But it would be better to
>> go through the buffer cache, because after this, the last block, at
>> 'fsm_nblocks', will be read with ReadBuffer() and modified.
>
> I doubt it's a particularly crucial thing to optimize.
Yeah, it won't make any practical difference to performance. I'm more
thinking if we can make this more consistent with other places where we
extend a relation.
> But, uh, isn't this code racy? Because this doesn't go through shared buffers,
> there's no IO_IN_PROGRESS interlocking against a concurrent reader. We know
> that writing pages isn't atomic vs readers. So another connection could
> connection could see the new relation size, but a read might return a
> partially written state of the page. Which then would cause checksum
> failures. And even worse, I think it could lead to loosing a write, if the
> concurrent connection writes out a page.
fsm_readbuf and vm_readbuf check the relation size first, with
smgrnblocks(), before trying to read the page. So to have a problem, the
smgrnblocks() would have to already return the new size, but the
smgrread() would not return the new contents. I don't think that's
possible, but not sure.
>> We could use BulkExtendSharedRelationBuffered() to extend the relation and
>> keep the last page locked, but the BulkExtendSharedRelationBuffered()
>> signature doesn't allow that. It can return the first N pages locked, but
>> there's no way to return the *last* page locked.
>
> We can't rely on bulk extending a, potentially, large number of pages in one
> go anyway (since we might not be allowed to pin that many pages). So I don't
> think requiring locking the last page is a really viable API.
>
> I think for this case I'd just just use the ExtendRelationTo() API we were
> discussing nearby. Compared to the cost of reducing syscalls / filesystem
> overhead to extend the relation, the cost of the buffer mapping lookup does't
> seem significant. That's different in e.g. the hio.c case, because there we
> need a buffer with free space, and concurrent activity could otherwise fill up
> the buffer before we can lock it again.
Works for me.
- Heikki