heapam_index_build_range_scan's anyvisible - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | heapam_index_build_range_scan's anyvisible |
Date | |
Msg-id | CA+TgmoYgwjmmjK24Qxb_vWAu8_Hh7gfVFcr3+R7ocdLvYOWJXg@mail.gmail.com Whole thread Raw |
Responses |
Re: heapam_index_build_range_scan's anyvisible
Re: heapam_index_build_range_scan's anyvisible |
List | pgsql-hackers |
I spent some time today studying heapam_index_build_range_scan and quickly reached the conclusion that it's kind of a mess. At heart it's pretty simple: loop over all the table, check each tuple against any qual, and pass the visible ones to the callback. However, in an attempt to make it cater to various needs slightly outside of its original design purpose, various warts have been added, and there are enough of them now that I at least find it fairly difficult to understand. One of those warts is anyvisible, which I gather was added in support of BRIN. I first spent some time looking at how the 'anyvisible' flag affects the behavior of the function. AFAICS, setting the flag to true results in three behavior changes: 1. The elog(WARNING, ...) calls about a concurrent insert/delete calls in progress can't be reached. 2. In some cases, reltuples += 1 might not occur where it would've happened otherwise. 3. If we encounter a HOT-updated which was deleted by our own transaction, we index it instead of skipping it. Change #2 doesn't matter because the only caller that passes anyvisible = true seems to be BRIN, and BRIN ignores the return value. I initially thought that change #1 must not matter either, because function has comments in several places saying that the caller must hold ShareLock or better. And I thought change #3 must also not matter, because as the comments explain, this function is used to build indexes, and if our CREATE INDEX command commits, then any deletions that it has already performed will commit too, so the fact that we haven't indexed the now-deleted tuples will be fine. Then I realized that brin_summarize_new_values() is calling this function *without* ShareLock and for *not* for the purpose of creating a new index but rather for the purpose of updating an existing index, which means #1 and #3 do matter after all. But I think it's kind of confusing because anyvisible doesn't change anything about which tuples are visible. SnapshotAny is already making "any" tuple "visible." This flag really means "caller is holding a lower-than-normal lock level and is not inserting into a brand new relfilnode". There may be more than just a cosmetic problem here, because the comments say: * It might look unsafe to use this information across buffer * lock/unlock. However, we hold ShareLock on the table so no * ordinary insert/update/delete should occur; and we hold pin on the * buffer continuously while visiting the page, so no pruning * operation can occur either. In the BRIN case that doesn't apply; I don't know whether this is safe in that case for some other reason. I also note that amcheck's bt_check_every_level can also call this without ShareLock. It doesn't need to set anyvisible because passing a snapshot bypasses the WARNINGs anyway, but it might have whatever problem the above comment is thinking about. Also, it's just cosmetic, but this comment definitely needs updating: /* * We could possibly get away with not locking the buffer here, * since caller should hold ShareLock on the relation, but let's * be conservative about it. (This remark is still correct even * with HOT-pruning: our pin on the buffer prevents pruning.) */ LockBuffer(hscan->rs_cbuf, BUFFER_LOCK_SHARE); One more thing. Assuming that there are no live bugs here, or that we fix them, another possible simplification would be to remove the anyvisible = true flag and have BRIN pass SnapshotNonVacuumable. SnapshotNonVacuumable returns true when HeapTupleSatisfiesVacuum doesn't return HEAPTUPLE_DEAD, so I think we'd get exactly the same behavior (again, modulo reltuples, which doesn't matter). heap_getnext() would perform functionally the same check as the bespoke code internally, and just wouldn't return the dead tuples in the first place. There's an assertion that would trip, but we could probably just change it. BRIN's callback might also get a different value for tupleIsAlive in some cases, but it ignores that value anyway. So to summarize: 1. Is this function really safe with < ShareLock? Both BRIN and amcheck think so, but the function itself isn't sure. If yes, we need to adapt the comments. If no, we need to think about some other fix. 2. anyvisible is a funny name given what the flag really does. Maybe we can simplify by replacing it with SnapshotNonVacuumable(). Otherwise maybe we should rename the flag. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: