amcheck support for BRIN indexes - Mailing list pgsql-hackers
From | Arseniy Mukhin |
---|---|
Subject | amcheck support for BRIN indexes |
Date | |
Msg-id | CAE7r3MKZkMwuEeFU5M73Jk3yp+2CBr41a3tfvb3CHym+ysAVEA@mail.gmail.com Whole thread Raw |
List | pgsql-hackers |
Hi, It's not for v18, just wanted to share with the community and register it in the upcoming Commitfest 2025-07. Here is the patch with amcheck support for BRIN indexes. Patch uses amcheck common infrastructure that was introduced in [1]. It works and I deleted all the code that I copied from btree check at first. Great. During the check we hold ShareUpdateExclusiveLock, so we don't block regular reads/inserts/updates and the same time range summarizations/desummarizations are impossible during the check which simplifies check logic. While checking we do ereport(ERROR) on the first issue, the same way as btree, gin checks do. There are two parts: First part is 'index structure check': 1) Meta page check 2) Revmap check. Walk revmap and check every valid revmap item points to the index tuple with the expected range blkno, and index tuple is consistent with the tuple description. Also it's not documented, but known from the brin code that for every empty range we should have allnulls = true, hasnulls = false. So this is also checked here. 3) Regular pages check. Walk regular pages and check that every index tuple has a corresponding revmap item that points to it. We don't check index tuple structures here, because it was already done in 2 steps. Regular pages check is optional. Orphan index tuple errors in this check doesn't necessary mean that index is corrupted, but AFAIS brin is not supposed to have such orphan index tuples now, so if we encounter one than probably something wrong with the index. Second part is 'all consistent check': We check all heap tuples are consistent with the index. It's the most expensive part and it's optional. Here we call consistent functions for every heap tuple. Also here we check that fields like 'has_nulls', 'all_nulls', 'empty_range' are consistent with what we see in the heap. There are two patch files: 0001-brin-refactoring.patch It's just two tiny changes in the brin code. 1) For index tuple structure check we need to know how on-disk index tuples look like. Function that lets you get it 'brtuple_disk_tupdesc' is internal. This patch makes it extern. 2) Create macros for BRIN_MAX_PAGES_PER_RANGE. For the meta page check. 0002-amechek-brin-support.patch It's check functionality itself + regression tests + amcheck extension updates. Some open questions: 1) How to get the correct strategy number for the scan key in "all heap consistent" check. The consistent function wants a scan key, and to generate it we have to pick a strategy number. We can't just use the same strategy number for all indexes because its meaning differs from opclass to opclass (for example equal strategy number is 1 in Bloom and 3 in min_max). We also can't pick an operator and use it for every check, because opclasses don't have any requirements about what operators they should support. The solution was to let user to define check operator (parameter consistent_operator_names). It's an array as we can have a multicolumn index. We can use '=' as default check operator, because AFAIS '=' operator is supported by all core opclasses except 'box_inclusion_ops', and IMHO it's the most obvious candidate for such a check. So if param 'consistent_operator_names' is an empty array (param default value), then for all attributes we use operator '='. In most cases operator '=' does the job and you don't need to worry about consistent_operator_names param. Not sure about it, what do you think? 2) The current implementation of "all heap consistent" uses the scan of the entire heap. So if we have a lot of unsummarized ranges, we do a lot of wasted work because we can't use the tuples that belong to the unsummarized ranges. Instead of having one scan for the entire heap, we can walk the revmap, take only the summarized ranges, and scan only the pages that belong to those ranges. So we have one scan per range. This approach helps us avoid touching those heap tuples that we can't use for index checks. But I'm not sure if we should to worry about that because every autovacuum summarizes all the unsummarized ranges, so don't expect a lot of unsummarized ranges on average. TODO list: 1) add TAP tests 2) update SGML docs for amcheck (think it's better to do after patch is reviewed and more or less finalized) 3) pg_amcheck integration Big thanks to Tomas Vondra for the first patch idea and initial review. [1] https://www.postgresql.org/message-id/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru Best regards, Arseniy Mukhin
Attachment
pgsql-hackers by date: