BRIN FSM vacuuming questions - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | BRIN FSM vacuuming questions |
Date | |
Msg-id | 5801.1522429460@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: BRIN FSM vacuuming questions
|
List | pgsql-hackers |
I noticed that several places in brin_pageops.c do this sort of thing: if (extended) { Assert(BlockNumberIsValid(newblk)); RecordPageWithFreeSpace(idxrel, newblk, freespace); FreeSpaceMapVacuum(idxrel); } ie they put *one* page into the index's free space map and then invoke a scan of the index's entire FSM to ensure that that info is bubbled up to the top immediately. I wonder how carefully this was thought through. We don't generally think that it's worth doing an FSM vacuum for a single page worth of free space. We could make these operations somewhat cheaper, in the wake of 851a26e26, by doing - FreeSpaceMapVacuum(idxrel); + FreeSpaceMapVacuumRange(idxrel, newblk, newblk + 1); so that only the relevant subtree of the FSM gets scanned. But it seems at least as plausible to just drop the FreeSpaceMapVacuum calls entirely and let the next VACUUM fix up the map, like the entire rest of the system does. This is particularly true because the code is inconsistent: some places that do RecordPageWithFreeSpace follow it up with an immediate FreeSpaceMapVacuum, and some don't. Or, perhaps, there's actually some method behind this madness and there's good reason to think that FreeSpaceMapVacuum calls in these specific places are worth the cost? If so, a comment documenting the reasoning would be appropriate. I'm also finding this code in brin_page_cleanup a bit implausible: /* Measure free space and record it */ freespace = br_page_get_freespace(page); if (freespace > GetRecordedFreeSpace(idxrel, BufferGetBlockNumber(buf))) { RecordPageWithFreeSpace(idxrel, BufferGetBlockNumber(buf), freespace); return true; } because of the fact that GetRecordedFreeSpace will report a rounded-down result owing to the limited resolution of the FSM's numbers. If we assume that the result of br_page_get_freespace is always maxaligned, then by my math there's a 75% chance (on maxalign-8 machines; more if maxalign is 4) that this will think there's a discrepancy even when there is none. It seems questionable to me that it's worth checking GetRecordedFreeSpace at all. If it really is, we'd need to get the recorded free space, invoke RecordPageWithFreeSpace, and then see if the result of GetRecordedFreeSpace has changed in order to decide whether anything really changed. But that seems overly complicated, and again unlike anything done elsewhere. Also, the result value is used to decide whether FreeSpaceMapVacuum is needed, but I'm unconvinced that it's a good idea to skip said call just because the leaf FSM values did not change; that loses the opportunity to clean up any upper-level FSM errors that may exist. In short, I'd suggest making this RecordPageWithFreeSpace call unconditional, dropping the result value of brin_page_cleanup, and doing FreeSpaceMapVacuum unconditionally too, back in brin_vacuum_scan. I don't see a reason for brin to be unlike the rest of the system here. Lastly, I notice that brin_vacuum_scan's loop is coded like for (blkno = 0; blkno < RelationGetNumberOfBlocks(idxrel); blkno++) { ... } which means we're incurring an lseek kernel call for *each iteration* of the loop. Surely that's pretty inefficient. Is there an intention here to be sure we examine every page even in the face of concurrent extensions? If so, we could code it like this: nblocks = RelationGetNumberOfBlocks(idxrel); for (blkno = 0; blkno < nblocks; blkno++) { ... if (blkno == nblocks-1) nblocks = RelationGetNumberOfBlocks(idxrel); } but I'd just as soon skip the recheck unless there's a really strong reason for it. Even with that, there's no guarantee somebody hasn't added another page before we ever get out of the function, so I'm unclear on what the point is. regards, tom lane
pgsql-hackers by date: