Re: New FSM patch - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: New FSM patch
Date
Msg-id 48BF9737.2050206@enterprisedb.com
Whole thread Raw
In response to Re: New FSM patch  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: New FSM patch  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Thanks for the review!

Simon Riggs wrote:
> Can I check some aspects of this related to Hot Standby? Some of them
> sound obvious, but worth double checking.
> 
> * There will be no need to read FSM by any normal operation of a
> read-only transaction, so locking correctness considerations can
> possibly be ignored during recovery.

Correct.

A HOT prune operation doesn't currently update the FSM, but if it did, 
that would be a case of a read-only transaction updating the FSM. But we 
can't prune anyway in a hot standby.

>  pg_freespacemap exists though:
> would we need to prevent that from executing during recovery, or will
> the FSM be fully readable? i.e. does redo take appropriate locks already
> (I don't see any Cleanup locks being required).

pg_freespacemap, the contrib module? Yeah, the FSM should be fully readable.

During normal operation, when a bottom level page is updated, and the 
update needs to be bubbled up, the upper level page is pinned and locked 
before the lock on the lower level page is released. That interlocking 
is not done during WAL replay, and the lock on the lower level page is 
released before locking the upper page. It's not required during WAL 
replay, as there's no concurrent updates to the FSM.

> * FSM will be continuously maintained during recovery, so FSM will now
> be correct and immediately available when recovery completes?

Correct,

> * There are no cases where a screwed-up FSM will crash either recovery
> (FATAL+) or halt normal operation (PANIC)?

Hmm, there's no explicit elog(FATAL/PANIC) calls, but if the FSM is 
really corrupt, you can probably get a segfault. That should be fixable 
by adding more sanity checks, though.

> * incomplete action cleanup is fairly cheap and doesn't rely on the FSM
> being searchable to correct the error? This last is a hard one...

Correct.

> Do we have the concept of a invalid/corrupt FSM?  What happens if the
> logic goes wrong and we have a corrupt page? Will that mean we can't> complete actions against the heap?

Some scenarios I can think of:

Scenario: The binary tree on a page is corrupt, so that the value of an 
upper node is < Max(leftchild, rightchild).
Consequence: Searchers won't see the free space below that node, and 
will look elsewhere.

Scenario: The binary tree on a page is corrupt, so that the value of an 
upper node is > Max(leftchild, rightchild).
Consequence: Searchers will notice the corruption while trying to 
traverse down that path, and throw an elog(WARNING) in search_avail(). 
fsm_search will retry the search, and will in worst case go into an 
infinite loop. That's obviously not good. We could automatically fix the 
upper nodes of the tree, but that would wipe evidence that would be 
useful in debugging.

Scenario: An upper level page is corrupt, claiming that there's no free 
space on a lower level page, while there actually is. (the opposite, 
where an upper level page claims that there *is* free space on a lower 
level page, while there actually isn't, is now normal. The searcher will 
update the upper level page in that case)
Consequence: A searcher won't see that free space, and will look elsewhere.

Scenario: An upper level page is corrupt, claiming that there is free 
space on a lower level page that doesn't exist.
Consequence: Searchers will elog(ERROR), trying to read the non-existent 
FSM page.

The 3rd scenario would lead to heap inserts/updates failing. We could 
avoid that by checking that the page exists with 
RelationGetNumberOfBlocks(), but I'm not sure if it's worth the cost.

> Are there really any changes to these files?
> src/include/storage/bufmgr.h
> src/include/postmaster/bgwriter.h

Hmm, apparently not.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Dimitri Fontaine
Date:
Subject: Re: [PATCH] Cleanup of GUC units code
Next
From: Simon Riggs
Date:
Subject: StartupCLOG