massive FPI_FOR_HINT load after promote - Mailing list pgsql-hackers

From Alvaro Herrera
Subject massive FPI_FOR_HINT load after promote
Msg-id 20200810225637.GA2424@alvherre.pgsql
Responses Re: massive FPI_FOR_HINT load after promote  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Re: massive FPI_FOR_HINT load after promote  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
Last week, James reported to us that after promoting a replica, some
seqscan was taking a huge amount of time; on investigation he saw that
the seqscan was generating FPI_FOR_HINT WAL records at a high rate.
Looking closely at the generated traffic, HEAP_XMIN_COMMITTED was being
set on some tuples.

Now this may seem obvious to some as a drawback of the current system,
but I was taken by surprise.  The problem is simply that when a seqscan
examines a page, we run HeapTupleSatisfiesVisibility on each tuple in
isolation, and for each tuple we call SetHintBits().  The FPI is emitted
only for the first tuple; by the time we get to the second tuple, the
page is already dirty, so there's no need to emit another FPI.  But the
FPI we sent carried the bit for the first tuple only ... so the standby
will not have the bit set for any subsequent tuple.  After promotion,
the former standby has to set the bits on all those tuples itself,
unless you happened to dirty the page again later for other reasons.
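
To make the sequence above concrete, here is a toy model in C.  It is
not PostgreSQL code: ToyPage, toy_set_hint, toy_seqscan and
toy_count_hinted are hypothetical names, the "FPI" is just a copy of the
flag array to a second struct, and a single page of NTUPLES tuples stands
in for the whole table.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NTUPLES 8				/* hypothetical number of tuples on the page */

/* Toy heap page: one "xmin committed" hint flag per tuple. */
typedef struct ToyPage
{
	bool		hinted[NTUPLES];
	bool		dirty;			/* already dirtied since the last checkpoint? */
} ToyPage;

/*
 * Set the hint on one tuple of the primary's page.  If the page was still
 * clean, ship a full-page image to the standby; that image reflects the
 * page as it looks right now.  Returns true if an image was shipped.
 */
bool
toy_set_hint(ToyPage *primary, ToyPage *standby, int tupno)
{
	assert(tupno >= 0 && tupno < NTUPLES);

	primary->hinted[tupno] = true;

	if (!primary->dirty)
	{
		memcpy(standby->hinted, primary->hinted, sizeof(primary->hinted));
		primary->dirty = true;
		return true;
	}

	/* Page already dirty: nothing is sent, the standby never sees this bit. */
	return false;
}

/* Visit the tuples one at a time; return the number of images shipped. */
int
toy_seqscan(ToyPage *primary, ToyPage *standby)
{
	int			fpis = 0;

	for (int i = 0; i < NTUPLES; i++)
	{
		if (toy_set_hint(primary, standby, i))
			fpis++;
	}
	return fpis;
}

/* Count the tuples whose hint flag is set on the given page. */
int
toy_count_hinted(const ToyPage *page)
{
	int			n = 0;

	for (int i = 0; i < NTUPLES; i++)
	{
		if (page->hinted[i])
			n++;
	}
	return n;
}
```

Starting from two zeroed pages, toy_seqscan() ships exactly one image;
afterwards the primary has all NTUPLES flags set and the standby copy
has only the first.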

So if you have some table where tuples gain hint bits in bulk, the
pages are rarely modified afterwards, and you promote before those
pages are frozen, then you may end up with a massive number of pages
that need hinting after the promote, which can become troublesome.

Attached is a TAP file that reproduces the problem.  It always fails,
but in the log file you can see the tuples in the primary are all hinted
committed, while on the standby only the first one is hinted committed.



One simple idea to try to forestall this problem would be to modify the
algorithm so that all tuples are scanned and hinted if the page is going
to be dirtied -- then send a single FPI setting bits for all tuples,
instead of just on the first tuple.
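
A minimal sketch of that idea, again as a toy model in C rather than
PostgreSQL code.  ToyPage and toy_hint_whole_page are hypothetical
names, and the "hintable" array is an assumption standing in for "the
inserting transaction is known to have committed".

```c
#include <stdbool.h>
#include <string.h>

#define NTUPLES 8				/* hypothetical number of tuples on the page */

/* Toy heap page: which tuples could be hinted, and which already are. */
typedef struct ToyPage
{
	bool		hintable[NTUPLES];
	bool		hinted[NTUPLES];
	bool		dirty;			/* already dirtied since the last checkpoint? */
} ToyPage;

/*
 * Hint every eligible tuple on the page first, and only then ship the
 * full-page image, so the image carries all the bits at once.
 * Returns the number of images shipped (0 or 1).
 */
int
toy_hint_whole_page(ToyPage *primary, ToyPage *standby)
{
	bool		changed = false;

	for (int i = 0; i < NTUPLES; i++)
	{
		if (primary->hintable[i] && !primary->hinted[i])
		{
			primary->hinted[i] = true;
			changed = true;
		}
	}

	if (changed && !primary->dirty)
	{
		memcpy(standby->hinted, primary->hinted, sizeof(primary->hinted));
		primary->dirty = true;
		return 1;
	}
	return 0;
}
```

In this model the standby copy ends up with the same flags as the
primary after a single image, and a second call on the same page ships
nothing.  The sketch ignores what scanning every tuple up front would
cost in the real code.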

-- 
Álvaro Herrera
