Re: BUG #17774: Assert triggered on brin_minmax_multi.c - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #17774: Assert triggered on brin_minmax_multi.c
Date
Msg-id 5a8acd6d-79ba-1017-e1df-46dd948347f1@enterprisedb.com
Whole thread Raw
In response to Re: BUG #17774: Assert triggered on brin_minmax_multi.c  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: BUG #17774: Assert triggered on brin_minmax_multi.c  (Dmitry Dolgov <9erthalion6@gmail.com>)
List pgsql-bugs

On 2/9/23 04:53, John Naylor wrote:
> 
> On Wed, Feb 8, 2023 at 11:05 PM Tom Lane <tgl@sss.pgh.pa.us
> <mailto:tgl@sss.pgh.pa.us>> wrote:
>>
>> Dmitry Dolgov <9erthalion6@gmail.com <mailto:9erthalion6@gmail.com>>
> writes:
>> >> On Wed, Feb 08, 2023 at 10:03:01AM -0500, Tom Lane wrote:
>> >> I'd believe this argument more readily if the calculation weren't being
>> >> done in float arithmetic.  Since it is, you're at the mercy of roundoff
>> >> error ... and that small negative delta could certainly pass for
>> >> roundoff error.
>>
>> > Hmm...yeah, good point. In both the reproducer I've posted and the
>> > backtrace from the thread the delta is indeed rather small.
>>
>> I bet also it only fails when dealing with IPv6 addresses.
>> With 32-bit IPv4 addresses, a float8 would have enough mantissa
>> bits that the calculation wouldn't become imprecise.
> 
> Addresses from different families are treated as a distance of one, so I
> don't think inserting a single IPv6 address would get this far, and in
> fact I still get a crash in an empty table by taking out the IPv6
> address from Dmitry's example:
> 
> insert into brin_test values('127.0.0.1/0' <http://127.0.0.1/0'>);
> insert into brin_test values('0.0.0.0/12' <http://0.0.0.0/12'>);
> 
> Adding some debug calls shows it starts off negative from the start:
> 
> =# insert into brin_test values('127.0.0.1/0' <http://127.0.0.1/0'>);
> INSERT 0 1
> =#  insert into brin_test values('0.0.0.0/12' <http://0.0.0.0/12'>);
> NOTICE:  idx: 3 before div: -1.000000
> NOTICE:  idx: 3 after div: -0.003906
> NOTICE:  idx: 2 before div: -0.003906
> NOTICE:  idx: 2 after div: -0.000015
> NOTICE:  idx: 1 before div: -0.000015
> NOTICE:  idx: 1 after div: -0.000000
> NOTICE:  idx: 0 before div: -0.000000
> NOTICE:  idx: 0 after div: -0.000000
> 
> ...so something else is wrong here but I haven't dug deeper yet.
> 

I believe the bug is pretty trivial - the code applies the netmask
incorrectly, so that with 127.0.0.1/0 it ends with 0.0.0.1, and because
it assumes 0.0.0.1 < 0.0.0.0 it ends with negative delta.

In particular, the issue is that the code does this:

    lena = ip_bits(ipa);     -- 0
    len = ip_addrsize(ipa);  -- 4

    for (for (i = 0; i < len; i++)
    {
      nbits = lena - (i * 8);
      ...
      mask = (0xFF << (8 - nbits));
      ...
    }

But for 127.0.0.1/0 we get lena=0, so for i>0 nbits gets negative, and
the shift is probably going to do something silly (not sure what
exactly, but AFAICS it's undefined behavior).

Attached is a fixup that resolves this failure for me. I need to look a
bit closer if there are some other issues (e.g. with the float rounding
errors, etc.).

Good thing is this is this won't break anything without the assert - we
may pick incorrect ranges to merge, so the index is a bit less
efficient, but still valid/correct.


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment

pgsql-bugs by date:

Previous
From: Andrey Lizenko
Date:
Subject: Re: BUG #17784: broken URL in 15.2 release notes
Next
From: "Jonathan S. Katz"
Date:
Subject: Re: BUG #17784: broken URL in 15.2 release notes