Page freezing, FSM, and WAL replay - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Page freezing, FSM, and WAL replay |
Date | |
Msg-id | 20180802172857.5skoexsilnjvgruk@alvherre.pgsql Whole thread Raw |
Responses |
Re: Page freezing, FSM, and WAL replay
|
List | pgsql-hackers |
We recently had a customer report a very strange problem, involving a very large insert-only table: without explanation, insertions would stall for several seconds, causing application timeout and process accumulation and other nastiness. After some investigation, we narrowed this down to happening immediately after the first VACUUM on the table right after a standby got promoted. It wasn't at first obvious what the connection between these factors was, but eventually we realized that VACUUM must have been skipping a bunch of pages because they had been marked all-frozen previously, so the FSM was not updated with the correct freespace figures for those pages. The FSM pages had been transmitted as full-page images on WAL before the promotion (because wal_log_hints), so they contained optimistic numbers on amount of free space coming from the previous master. (Because this only happens on the first change to that FSM page after a checkpoint, it's quite likely that one page every few thousand or so contains optimistic figures while the others remain all zeroes, or something like that.) Before VACUUM, nothing too bad would happen, because the upper layers of the FSM would not know about those optimistic numbers. But when VACUUM does FreeSpaceMapVacuum, it propagates those numbers upwards; as soon as that happens, inserters looking for pages would be told about those pages (wrongly catalogued to contain sufficient free space), go to insert there, and fail because there isn't actually any freespace; ask FSM for another page, lather, rinse, repeat until all those pages are all catalogued correctly by FSM, at which point things continue normally. (There are many processes doing this chase-up concurrently and it seems a pretty contentious process, about which see last paragraph; it can be seen in pg_xlogdump that it takes several seconds for things to settle). After considering several possible solutions, I propose to have heap_xlog_visible compute free space for any page being marked frozen; Pavan adds to that to have heap_xlog_clean compute free space for all pages also. This means that if we later promote this standby and VACUUM skips all-frozen pages, their FSM numbers are going to be up-to-date anyway. Patch attached. Now, it's possible that the problem occurs for all-visible pages not just all-frozen. I haven't seen that one, maybe there's some reason why it cannot. But fixing both things together is an easy change in the proposed patch: just do it on xlrec->flags != 0 rather than checking for the specific all-frozen flag. (This problem seems to be made worse by the fact that RecordAndGetPageWithFreeSpace (or rather fsm_set_and_search) holds exclusive lock on the FSM page for the whole duration of update plus search. So when there are many inserters, they all race to the update process. Maybe it'd be less terrible if we would release exclusive after the update and grab shared lock for the search in fsm_set_and_search, but we still have to have the exclusive for the update, so the contention point remains. Maybe there's not sufficient improvement to make a practical difference, so I'm not proposing changing this.) -- Álvaro Herrera
Attachment
pgsql-hackers by date: