Re: Amcheck verification of GiST and GIN - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Amcheck verification of GiST and GIN
Date
Msg-id CAH2-WzmUuuyZJkAnDUnP_TmfQMsa1JeO5yimcAeb5qczGB-0kg@mail.gmail.com
Whole thread Raw
In response to Re: Amcheck verification of GiST and GIN  (Andrey Borodin <amborodin86@gmail.com>)
Responses Re: Amcheck verification of GiST and GIN
List pgsql-hackers
On Sun, Feb 5, 2023 at 4:45 PM Andrey Borodin <amborodin86@gmail.com> wrote:
> Here's v24 == (v23 + a step for pg_amcheck). There's a lot of
> shotgun-style changes, but I hope next index types will be easy to add
> now.

Some feedback on the GiST patch:

* You forgot to initialize GistCheckState.heaptuplespresent to 0.

It might be better to allocate GistCheckState dynamically, using
palloc0(). That's future proof. "Simple and obvious" is usually the
most important goal for managing memory in amcheck code. It can be a
little inefficient if that makes it simpler.

* ISTM that gist_index_check() should allow the caller to omit a
"heapallindexed" argument by specifying "DEFAULT FALSE", for
consistency with bt_index_check().

(Actually there are two versions of bt_index_check(), with
overloading, but that's just because of the way that the extension
evolved over time).

* What's the point in having a custom memory context that is never reset?

I believe that gistgetadjusted() will leak memory here, so there is a
need for some kind of high level strategy for managing memory. The
strategy within verify_nbtree.c is to call MemoryContextReset() right
after every loop iteration within bt_check_level_from_leftmost() --
which is pretty much once every call to bt_target_page_check(). That
kind of approach is obviously not going to suffer any memory leaks.

Again, "simple and obvious" is good for memory management in amcheck.

* ISTM that it would be clearer if the per-page code within
gist_check_parent_keys_consistency() was broken out into its own
function -- a little like bt_target_page_check()..

That way the control flow would be easier to understand when looking
at the code at a high level.

* ISTM that gist_refind_parent() should throw an error about
corruption in the event of a parent page somehow becoming a leaf page.

Obviously this is never supposed to happen, and likely never will
happen, even with corruption. But it seems like a good idea to make
the most conservative possible assumption by throwing an error. If it
never happens anyway, then the fact that we handle it with an error
won't matter -- so the error is harmless. If it does happen then we'll
want to hear about it as soon as possible -- so the error is useful.

* I suggest using c99 style variable declarations in loops.

Especially for things like "for (OffsetNumber offset =
FirstOffsetNumber; ... ; ... )".

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: improving user.c error messages
Next
From: Andres Freund
Date:
Subject: Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?