Thread: amcheck verification for GiST and GIN

amcheck verification for GiST and GIN

From
Grigory Kryachko
Date:

Hello! My name is Grigory Kryachko, I decided to join the efforts with Andrey Borodin in his working on amcheck.

Here is the patch which I (with Andrey as my advisor) built on the top of the last patch from this thread: https://commitfest.postgresql.org/25/1800/ .
It adds an ability to verify validity  of GIN index. It is not polished yet, but it works and we wanted to show it to you so you can give us some feedback, and also let you know about this work if you have any plans of writing something like that yourselves, so that you do not redo what is already done.

In the mentioned above thread there was an issue with right type of lock, we have not addressed it yet. Right now I am primarily interested in feedback about GIN part.
Attachment

Re: amcheck verification for GiST and GIN

From
Peter Geoghegan
Date:
On Wed, May 27, 2020 at 10:11 AM Grigory Kryachko <gskryachko@gmail.com> wrote:
> Here is the patch which I (with Andrey as my advisor) built on the top of the last patch from this thread:
https://commitfest.postgresql.org/25/1800/. 
> It adds an ability to verify validity  of GIN index. It is not polished yet, but it works and we wanted to show it to
youso you can give us some feedback, and also let you know about this work if you have any plans of writing something
likethat yourselves, so that you do not redo what is already done. 

Can you rebase this patch, please?

Also suggest breaking out the series into distinct patch files using
"git format-patch master".

--
Peter Geoghegan



Re: amcheck verification for GiST and GIN

From
Heikki Linnakangas
Date:
On 07/08/2020 00:33, Peter Geoghegan wrote:
> On Wed, May 27, 2020 at 10:11 AM Grigory Kryachko <gskryachko@gmail.com> wrote:
>> Here is the patch which I (with Andrey as my advisor) built on the top of the last patch from this thread:
https://commitfest.postgresql.org/25/1800/.
 
>> It adds an ability to verify validity  of GIN index. It is not polished yet, but it works and we wanted to show it
toyou so you can give us some feedback, and also let you know about this work if you have any plans of writing
somethinglike that yourselves, so that you do not redo what is already done.
 
> 
> Can you rebase this patch, please?
> 
> Also suggest breaking out the series into distinct patch files using
> "git format-patch master".

I rebased the GIN parts of this patch, see attached. I also ran pgindent 
and made some other tiny cosmetic fixes, but I didn't review the patch, 
only rebased it in the state it was.

I was hoping that this would be useful to track down the bug we're 
discussing here: 
https://www.postgresql.org/message-id/CAJYBUS8aBQQL22oHsAwjHdwYfdB_NMzt7-sZxhxiOdEdn7cOkw%40mail.gmail.com. 
But now that I look what checks this performs, I doubt this will catch 
the kind of corruption that's happened there. I suspect it's more subtle 
than an inconsistencies between parent and child pages, because only a 
few rows are affected. But doesn't hurt to try.

- Heikki

Attachment

Re: amcheck verification for GiST and GIN

From
Nikolay Samokhvalov
Date:
Hello,

First of all, thank you all -- Andrey, Peter, Heikki and others -- for this work, GIN support in amcheck is *really* needed, especially for OS upgrades such as from Ubuntu 16.04 (which is EOL now) to 18.04 or 20.04

I was trying to check a bunch of GINs on some production after switching from Ubuntu 16.04 to 18.04 and got many errors. So decided to check for 16.04 first (that is still used on prod for that DB), without any OS/glibc changes.

On 16.04, I still saw errors and it was not really expected because this should mean that production is corrupted too. So, REINDEX should fix it. But it didn't -- see output below. I cannot give data and thinking how to create a synthetic demo of this. Any suggestions?

And is this a sign that the tool is wrong rather that we have a real corruption cases? (I assume if we did, we would see no errors after REINDEXing -- of course, if GIN itself doesn't have bugs).

Env: Ubuntu 16.04 (so, glibc 2.27), Postgres 12.7, patch from Heikki slightly adjusted to work with PG12 (

Before reindex:


INFO:  [2021-07-29 17:44:42.525+00] Processing 4/29: index: index_XXX_trigram (index relpages: 117935; heap tuples: ~379793)...

ERROR: index "index_XXX_trigram" has wrong tuple order, block 65754, offset 232


test=# reindex index index_XXX_trigram;

REINDEX



After REINDEX:


INFO:  [2021-07-29 18:01:23.339+00] Processing 4/29: index: index_XXX_trigram (index relpages: 70100; heap tuples: ~379793)...

ERROR: index "index_XXX_trigram" has wrong tuple order, block 70048, offset 253



On Thu, Jul 15, 2021 at 00:03 Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 07/08/2020 00:33, Peter Geoghegan wrote:
> On Wed, May 27, 2020 at 10:11 AM Grigory Kryachko <gskryachko@gmail.com> wrote:
>> Here is the patch which I (with Andrey as my advisor) built on the top of the last patch from this thread: https://commitfest.postgresql.org/25/1800/ .
>> It adds an ability to verify validity  of GIN index. It is not polished yet, but it works and we wanted to show it to you so you can give us some feedback, and also let you know about this work if you have any plans of writing something like that yourselves, so that you do not redo what is already done.
>
> Can you rebase this patch, please?
>
> Also suggest breaking out the series into distinct patch files using
> "git format-patch master".

I rebased the GIN parts of this patch, see attached. I also ran pgindent
and made some other tiny cosmetic fixes, but I didn't review the patch,
only rebased it in the state it was.

I was hoping that this would be useful to track down the bug we're
discussing here:
https://www.postgresql.org/message-id/CAJYBUS8aBQQL22oHsAwjHdwYfdB_NMzt7-sZxhxiOdEdn7cOkw%40mail.gmail.com.
But now that I look what checks this performs, I doubt this will catch
the kind of corruption that's happened there. I suspect it's more subtle
than an inconsistencies between parent and child pages, because only a
few rows are affected. But doesn't hurt to try.

- Heikki

Re: amcheck verification for GiST and GIN

From
Heikki Linnakangas
Date:
On 29/07/2021 21:34, Nikolay Samokhvalov wrote:
> I was trying to check a bunch of GINs on some production after switching 
> from Ubuntu 16.04 to 18.04 and got many errors. So decided to check for 
> 16.04 first (that is still used on prod for that DB), without any 
> OS/glibc changes.
> 
> On 16.04, I still saw errors and it was not really expected because this 
> should mean that production is corrupted too. So, REINDEX should fix it. 
> But it didn't -- see output below. I cannot give data and thinking how 
> to create a synthetic demo of this. Any suggestions?
> 
> And is this a sign that the tool is wrong rather that we have a real 
> corruption cases? (I assume if we did, we would see no errors after 
> REINDEXing -- of course, if GIN itself doesn't have bugs).
> 
> Env: Ubuntu 16.04 (so, glibc 2.27), Postgres 12.7, patch from Heikki 
> slightly adjusted to work with PG12 (
> https://gitlab.com/postgres/postgres/-/merge_requests/5 
> <https://gitlab.com/postgres/postgres/-/merge_requests/5>) snippet used 
> to run amcheck:
> https://gitlab.com/-/snippets/2001962 
> <https://gitlab.com/-/snippets/2001962> (see file #3)

Almost certainly the tool is wrong. We went back and forth a few times 
with Pawel, fixing various bugs in the amcheck patch at this thread: 
https://www.postgresql.org/message-id/9fdbb584-1e10-6a55-ecc2-9ba8b5dca1cf%40iki.fi. 
Can you try again with the latest patch version from that thread, 
please? That's v5-0001-Amcheck-for-GIN-13stable.patch.

- Heikki



Re: amcheck verification for GiST and GIN

From
Nikolay Samokhvalov
Date:
Thank you, v5 didn't find any issues at all. One thing: for my 29 indexes, the tool generated output 3.5 GiB. I guess many INFO messages should be downgraded to something like DEBUG1?

On Fri, Jul 30, 2021 at 2:35 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 29/07/2021 21:34, Nikolay Samokhvalov wrote:
> I was trying to check a bunch of GINs on some production after switching
> from Ubuntu 16.04 to 18.04 and got many errors. So decided to check for
> 16.04 first (that is still used on prod for that DB), without any
> OS/glibc changes.
>
> On 16.04, I still saw errors and it was not really expected because this
> should mean that production is corrupted too. So, REINDEX should fix it.
> But it didn't -- see output below. I cannot give data and thinking how
> to create a synthetic demo of this. Any suggestions?
>
> And is this a sign that the tool is wrong rather that we have a real
> corruption cases? (I assume if we did, we would see no errors after
> REINDEXing -- of course, if GIN itself doesn't have bugs).
>
> Env: Ubuntu 16.04 (so, glibc 2.27), Postgres 12.7, patch from Heikki
> slightly adjusted to work with PG12 (
> https://gitlab.com/postgres/postgres/-/merge_requests/5
> <https://gitlab.com/postgres/postgres/-/merge_requests/5>) snippet used
> to run amcheck:
> https://gitlab.com/-/snippets/2001962
> <https://gitlab.com/-/snippets/2001962> (see file #3)

Almost certainly the tool is wrong. We went back and forth a few times
with Pawel, fixing various bugs in the amcheck patch at this thread:
https://www.postgresql.org/message-id/9fdbb584-1e10-6a55-ecc2-9ba8b5dca1cf%40iki.fi.
Can you try again with the latest patch version from that thread,
please? That's v5-0001-Amcheck-for-GIN-13stable.patch.

- Heikki