Re: hio.c does visibilitymap_pin()/IO while holding buffer lock - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: hio.c does visibilitymap_pin()/IO while holding buffer lock
Date
Msg-id 0c11032a-d91b-8875-ab7b-4554776390c5@enterprisedb.com
Whole thread Raw
In response to hio.c does visibilitymap_pin()/IO while holding buffer lock  (Andres Freund <andres@anarazel.de>)
Responses Re: hio.c does visibilitymap_pin()/IO while holding buffer lock
List pgsql-hackers
On 3/25/23 03:57, Andres Freund wrote:
> Hi,
> 
> Starting with
> 
> commit 7db0cd2145f2bce84cac92402e205e4d2b045bf2
> Author: Tomas Vondra <tomas.vondra@postgresql.org>
> Date:   2021-01-17 22:11:39 +0100
> 
>     Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE
> 

That's a bummer :-(

> RelationGetBufferForTuple does
> 
>     /*
>      * The page is empty, pin vmbuffer to set all_frozen bit.
>      */
>     if (options & HEAP_INSERT_FROZEN)
>     {
>         Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
>         visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
>     }
> 
> while holding a buffer lock. visibilitymap_pin() reads pages, if vmbuffer
> doesn't already point to the right block.
> 
> 
> The lock ordering rules are to lock VM pages *before* locking heap pages.
> 
> 
> I think the reason this hasn't yet bitten us badly, is that INSERT_FROZEN
> effectively requires that the relation is access exclusive locked.  There
> shouldn't be other backends locking multiple buffers in the relation (bgwriter
> / checkpointer can lock a single buffer at a time, but that's it).
>

Right. Still, it seems a bit fragile ...

> 
> I see roughly two ways forward:
> 
> 1) We add a comment explaining why it's safe to violate lock ordering rules in
>    this one situation
> 

Possible, although I feel uneasy about just documenting a broken rule.
Would be better to maintain the locking order.

> 2) Change relevant code so that we only return a valid vmbuffer if we could do
>    so without blocking / IO and, obviously, skip updating the VM if we
>    couldn't get the buffer.
> 

I don't recall the exact details about the vm locking/pinning, but can't
we just ensure we actually follow the proper locking order? I mean, this
only deals with new pages, requested at line ~624:

  buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);

Can't we ensure we actually lock the vm buffer too in ReadBufferBI,
before calling ReadBufferExtended? Or am I confused and that's not
possible for some reason?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: meson/msys2 fails with plperl/Strawberry
Next
From: Tom Lane
Date:
Subject: Re: About the constant-TRUE clause in reconsider_outer_join_clauses