Re: massive FPI_FOR_HINT load after promote - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: massive FPI_FOR_HINT load after promote |
Date | |
Msg-id | 20200811174159.GA9374@alvherre.pgsql Whole thread Raw |
In response to | Re: massive FPI_FOR_HINT load after promote (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>) |
Responses |
Re: massive FPI_FOR_HINT load after promote
|
List | pgsql-hackers |
On 2020-Aug-11, Masahiko Sawada wrote: > On Tue, 11 Aug 2020 at 07:56, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > So if you have some table where tuples gain hint bits in bulk, and > > rarely modify the pages afterwards, and promote before those pages are > > frozen, then you may end up with a massive amount of pages that will > > need hinting after the promote, which can become troublesome. > > Did the case you observed not use hot standby? I thought the impact of > this issue could be somewhat alleviated in hot standby cases since > read queries on the hot standby can set hint bits. Oh, interesting, I didn't know that. However, it's not 100% true: the standby can set the bit in shared buffers, but it does not mark the page dirty. So when the page is evicted, those bits that were set are lost. That's not great. See MarkBufferDirtyHint: /* * If we need to protect hint bit updates from torn writes, WAL-log a * full page image of the page. This full page image is only necessary * if the hint bit update is the first change to the page since the * last checkpoint. * * We don't check full_page_writes here because that logic is included * when we call XLogInsert() since the value changes dynamically. */ if (XLogHintBitIsNeeded() && (pg_atomic_read_u32(&bufHdr->state) & BM_PERMANENT)) { /* * If we must not write WAL, due to a relfilenode-specific * condition or being in recovery, don't dirty the page. We can * set the hint, just not dirty the page as a result so the hint * is lost when we evict the page or shutdown. * * See src/backend/storage/page/README for longer discussion. */ if (RecoveryInProgress() || RelFileNodeSkippingWAL(bufHdr->tag.rnode)) return; > > One simple idea to try to forestall this problem would be to modify the > > algorithm so that all tuples are scanned and hinted if the page is going > > to be dirtied -- then send a single FPI setting bits for all tuples, > > instead of just on the first tuple. > > This idea seems good to me but I'm concerned a bit that the > probability of concurrent processes writing FPI for the same page > might get higher since concurrent processes could set hint bits at the > same time. If it's true, I wonder if we can advertise hint bits are > being updated to prevent concurrent FPI writes for the same page. Hmm, a very good point. Sounds like we would need to obtain an exclusive lock on the page .. but that would be very problematic. I don't have a concrete proposal to solve this problem ATM, but it's more and more looking like it's a serious problem. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: