Hi,

On 2019-01-28 14:10:55 -0800, Andres Freund wrote:
> So, I'd pushed the latest version. And longfin came back with an
> interesting error:
>
> ERROR: page 135 of relation "pg_class" should be empty but is not
>
> The only way I can currently imagine this happening is that there's a
> concurrent vacuum that discovers the page is empty, enters it into the
> FSM (which now isn't happening under an extension lock anymore), and
> then a separate backend starts to actually use that buffer. That seems
> tight but possible. Seems we need to re-add the
> LockRelationForExtension/UnlockRelationForExtension() logic :(

Hm, but thinking about this, isn't this a risk independent of this
change? The FSM isn't WAL logged, and given its tree-ish structure it's
possible to "rediscover" free space (e.g. FreeSpaceMapVacuum()). ISTM
that after a crash the FSM might point to free space that doesn't
actually exist, and that is then rediscovered after independent
changes. Not sure if that's actually a realistic issue.

I'm inclined to put back the

    LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    LockRelationForExtension(onerel, ExclusiveLock);
    UnlockRelationForExtension(onerel, ExclusiveLock);
    LockBufferForCleanup(buf);
    if (PageIsNew(page))

dance regardless, just to get the buildfarm to green?

But I do wonder if we should just make hio.c cope with this instead.
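
To make the "cope with this" idea concrete, here is a minimal standalone sketch, not actual hio.c code. It assumes a toy page model in which a zeroed pd_upper means "new"; every Toy*/toy_* name is hypothetical and only mimics the shape of PageIsNew()/PageInit(). The point is that the inserter initializes a still-new page handed out by the FSM instead of raising a "should be empty but is not" style error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 8192

/* Toy stand-in for a page header; pd_upper == 0 marks a "new" page. */
typedef struct ToyPage
{
    size_t pd_lower;    /* offset to start of free space */
    size_t pd_upper;    /* offset to end of free space */
} ToyPage;

/* Hypothetical analogue of PageIsNew(). */
static bool
toy_page_is_new(const ToyPage *page)
{
    return page->pd_upper == 0;
}

/* Hypothetical analogue of PageInit(): all space after the header is free. */
static void
toy_page_init(ToyPage *page)
{
    page->pd_lower = sizeof(ToyPage);
    page->pd_upper = TOY_PAGE_SIZE;
}

static size_t
toy_page_free_space(const ToyPage *page)
{
    assert(page->pd_upper >= page->pd_lower);
    return page->pd_upper - page->pd_lower;
}

/*
 * Inserter-side check (assumption: the caller already holds an exclusive
 * lock on the buffer). A page that the FSM handed out while it was still
 * new is initialized here rather than treated as an error. Returns true
 * if the page can hold len bytes.
 */
static bool
toy_prepare_page_for_insert(ToyPage *page, size_t len)
{
    if (toy_page_is_new(page))
        toy_page_init(page);

    return toy_page_free_space(page) >= len;
}
```

In this toy model it no longer matters whether vacuum or the inserter sees the new page first, since initialization happens wherever the page is first used under its lock. Whether that holds for the real buffer and WAL interactions is exactly the open question.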
Greetings,
Andres Freund