Re: Failure in contrib test _int on loach - Mailing list pgsql-hackers

From Anastasia Lubennikova
Subject Re: Failure in contrib test _int on loach
Date
Msg-id 00873b28-8d7e-72ef-bb8f-0a7f5dfc64b4@postgrespro.ru
In response to Re: Failure in contrib test _int on loach  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Failure in contrib test _int on loach  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
List pgsql-hackers
05.04.2019 18:01, Tom Lane writes:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> On Fri, Apr 5, 2019 at 2:02 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>>> This is a strange failure:
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=loach&dt=2019-04-05%2005%3A15%3A00
>>> [ wrong answers from queries using a GIST index ]
>> There are a couple of other recent instances of this failure, on
>> francolin and whelk.
> Yeah.  Given three failures in a couple of days, we can reasonably
> guess that the problem was introduced within a day or two prior to
> the first one.  Looking at what's touched GIST in that time frame,
> suspicion has to fall heavily on 9155580fd5fc2a0cbb23376dfca7cd21f59c2c7b.
>
> If I had to bet, I'd bet that there's something wrong with the
> machinations described in the commit message:
>      
>      For GiST, the LSN-NSN interlock makes this a little tricky. All pages must
>      be marked with a valid (i.e. non-zero) LSN, so that the parent-child
>      LSN-NSN interlock works correctly. We now use magic value 1 for that during
>      index build. Change the fake LSN counter to begin from 1000, so that 1 is
>      safely smaller than any real or fake LSN. 2 would've been enough for our
>      purposes, but let's reserve a bigger range, in case we need more special
>      values in the future.
>
> I'll go add this as an open issue.
>
>             regards, tom lane
>

Hi,
I had already noticed the same failure on our company buildfarm and
started investigating it.

You are right, the "Generate less WAL during GiST, GIN and SP-GiST
index build" patch is to blame.
Because it uses GistBuildLSN, some pages are not linked correctly,
so an index scan misses some entries that a seqscan finds.
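
To make the symptom concrete, here is a toy model of the parent-child
LSN-NSN interlock mentioned in the commit message. It is only a sketch and
not a diagnosis of the actual bug: the Page class, make_tree(), and the LSN
values 1000 and 1001 are made up for illustration, and the real logic in the
GiST code is more involved. It shows how a split page stamped with an NSN
that is too small can hide its right sibling from an index scan, while a
sequential scan still sees every entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    """Hypothetical, heavily simplified stand-in for a GiST page."""
    lsn: int                                  # LSN of the last change to the page
    nsn: int = 0                              # LSN stamped on the page when it was split
    rightlink: Optional[int] = None           # block number of the right sibling
    items: List[int] = field(default_factory=list)     # leaf entries
    children: List[int] = field(default_factory=list)  # downlinks (internal page)


def index_scan(pages: Dict[int, Page], root: int) -> List[int]:
    """Walk downlinks; follow a rightlink only if the interlock says so."""
    found: List[int] = []
    # Each stack entry: (block number, parent LSN seen when the downlink was read)
    stack = [(root, None)]
    while stack:
        blkno, parent_lsn = stack.pop()
        page = pages[blkno]
        # Interlock: the page was split after we read its parent,
        # so the parent may lack a downlink to the right sibling.
        if (parent_lsn is not None and page.nsn > parent_lsn
                and page.rightlink is not None):
            stack.append((page.rightlink, parent_lsn))
        found.extend(page.items)
        for child in page.children:
            stack.append((child, page.lsn))
    return sorted(found)


def seq_scan(pages: Dict[int, Page]) -> List[int]:
    """Read every page regardless of links, as a heap scan would see all rows."""
    return sorted(item for page in pages.values() for item in page.items)


def make_tree(split_nsn: int) -> Dict[int, Page]:
    """Root (block 0) has a downlink to block 1 only; block 1 was split into 1 and 2."""
    return {
        0: Page(lsn=1000, children=[1]),
        1: Page(lsn=split_nsn, nsn=split_nsn, rightlink=2, items=[10, 20]),
        2: Page(lsn=split_nsn, items=[30, 40]),
    }


if __name__ == "__main__":
    for nsn in (1001, 1):
        tree = make_tree(nsn)
        print(f"nsn={nsn}: index scan {index_scan(tree, 0)}, seqscan {seq_scan(tree)}")
```

With split_nsn=1001 the scan follows the rightlink and both scans agree.
With split_nsn=1 (a value smaller than the parent's LSN in this model) the
index scan never visits block 2 and returns fewer entries than the seqscan.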

Attached is a patch with a test that reproduces the bug
deterministically, on every run, rather than at random.
I'm now trying to find a way to fix the issue.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


