Thread: Crash in BRIN minmax-multi indexes

Crash in BRIN minmax-multi indexes

From
Jaime Casanova
Date:
Hi,

Just found $SUBJECT involving time with time zone and a subselect. I
still don't have narrowed to the exact table/index minimal schema but
if you run this query on the regression database it will creash.

```
update public.brintest_multi set
  timetzcol = (select tz from generate_series('2021-01-01'::timestamp
with time zone, '2021-01-31', '5 days') tz limit 1)
;
```

attached a backtrace. Let me know if you need extra information.

--
Jaime Casanova
Director de Servicios Profesionales
SYSTEMGUARDS - Consultores de PostgreSQL

Attachment

Re: Crash in BRIN minmax-multi indexes

From
Zhihong Yu
Date:
Hi,
In build_distances():

        a1 = eranges[i].maxval;
        a2 = eranges[i + 1].minval;

It seems there was overlap between the successive ranges, leading to
delta = -6785000000

FYI

On Wed, Mar 31, 2021 at 10:30 AM Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
Hi,

Just found $SUBJECT involving time with time zone and a subselect. I
still don't have narrowed to the exact table/index minimal schema but
if you run this query on the regression database it will creash.

```
update public.brintest_multi set
  timetzcol = (select tz from generate_series('2021-01-01'::timestamp
with time zone, '2021-01-31', '5 days') tz limit 1)
;
```

attached a backtrace. Let me know if you need extra information.

--
Jaime Casanova
Director de Servicios Profesionales
SYSTEMGUARDS - Consultores de PostgreSQL

Re: Crash in BRIN minmax-multi indexes

From
Tomas Vondra
Date:
On 3/31/21 8:20 PM, Zhihong Yu wrote:
> Hi,
> In build_distances():
> 
>         a1 = eranges[i].maxval;
>         a2 = eranges[i + 1].minval;
> 
> It seems there was overlap between the successive ranges, leading to
> delta = -6785000000
> 

I've been unable to reproduce this, so far :-( How exactly did you
manage to reproduce it?


The thing is - how could there be an overlap? The way we build expanded
ranges that should not be possible, I think. Can you print the ranges at
the end of fill_expanded_ranges? That should shed some light on this.


FWIW I suspect those asserts on delta may be a bit problematic due to
rounding errors. And I found one issue in the inet distance function,
because apparently

test=# select '10.2.14.243/24'::inet < '10.2.14.231/24'::inet;
 ?column?
----------
 f
(1 row)

but the delta formula currently ignores the mask. But that's a separate
issue.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Crash in BRIN minmax-multi indexes

From
Tomas Vondra
Date:
Hi,

I think I found the issue - it's kinda obvious, really. We need to
consider the timezone, because the "time" parts alone may be sorted
differently. The attached patch should fix this, and it also fixes a
similar issue in the inet data type.

As for why the regression tests did not catch this, it's most likely
because the data is likely generated in "nice" ordering, or something
like that. I'll see if I can tweak the ordering to trigger these issues
reliably, and I'll do a bit more randomized testing.

There's also the question of rounding errors, which I think might cause
random assert failures (but in practice it's harmless, in the worst case
we'll merge the ranges a bit differently).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: Crash in BRIN minmax-multi indexes

From
Zhihong Yu
Date:
Hi,
For inet data type fix:

+       unsigned char a = addra[i];
+       unsigned char b = addrb[i];
+
+       if (i >= lena)
+           a = 0;
+
+       if (i >= lenb)
+           b = 0;

Should the length check precede the addra[i] ?
Something like:

       unsigned char a;
       if (i >= lena) a = 0;
       else a = addra[i];

Cheers

On Wed, Mar 31, 2021 at 3:25 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
Hi,

I think I found the issue - it's kinda obvious, really. We need to
consider the timezone, because the "time" parts alone may be sorted
differently. The attached patch should fix this, and it also fixes a
similar issue in the inet data type.

As for why the regression tests did not catch this, it's most likely
because the data is likely generated in "nice" ordering, or something
like that. I'll see if I can tweak the ordering to trigger these issues
reliably, and I'll do a bit more randomized testing.

There's also the question of rounding errors, which I think might cause
random assert failures (but in practice it's harmless, in the worst case
we'll merge the ranges a bit differently).


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Crash in BRIN minmax-multi indexes

From
Tomas Vondra
Date:
On 4/1/21 12:53 AM, Zhihong Yu wrote:
> Hi,
> For inet data type fix:
> 
> +       unsigned char a = addra[i];
> +       unsigned char b = addrb[i];
> +
> +       if (i >= lena)
> +           a = 0;
> +
> +       if (i >= lenb)
> +           b = 0;
> 
> Should the length check precede the addra[i] ?
> Something like:
> 
>        unsigned char a;
>        if (i >= lena) a = 0;
>        else a = addra[i];
> 

I don't think that makes any difference. We know the bytes are there, we
just want to ignore / reset them in some cases.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Crash in BRIN minmax-multi indexes

From
Jaime Casanova
Date:
On Wed, Mar 31, 2021 at 5:25 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> I think I found the issue - it's kinda obvious, really. We need to
> consider the timezone, because the "time" parts alone may be sorted
> differently. The attached patch should fix this, and it also fixes a
> similar issue in the inet data type.
>

ah! yeah! obvious... if you say so ;)

> As for why the regression tests did not catch this, it's most likely
> because the data is likely generated in "nice" ordering, or something
> like that. I'll see if I can tweak the ordering to trigger these issues
> reliably, and I'll do a bit more randomized testing.
>
> There's also the question of rounding errors, which I think might cause
> random assert failures (but in practice it's harmless, in the worst case
> we'll merge the ranges a bit differently).
>
>

I can confirm this fixes the crash in the query I showed and the original case.

-- 
Jaime Casanova
Director de Servicios Profesionales
SYSTEMGUARDS - Consultores de PostgreSQL



Re: Crash in BRIN minmax-multi indexes

From
Zhihong Yu
Date:
Hi,
-       delta += (float8) addrb[i] - (float8) addra[i];
-       delta /= 256;
...
+       delta /= 255;

May I know why the divisor was changed ?

Thanks

On Wed, Mar 31, 2021 at 3:25 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
Hi,

I think I found the issue - it's kinda obvious, really. We need to
consider the timezone, because the "time" parts alone may be sorted
differently. The attached patch should fix this, and it also fixes a
similar issue in the inet data type.

As for why the regression tests did not catch this, it's most likely
because the data is likely generated in "nice" ordering, or something
like that. I'll see if I can tweak the ordering to trigger these issues
reliably, and I'll do a bit more randomized testing.

There's also the question of rounding errors, which I think might cause
random assert failures (but in practice it's harmless, in the worst case
we'll merge the ranges a bit differently).


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Crash in BRIN minmax-multi indexes

From
Tomas Vondra
Date:
On 4/1/21 3:22 AM, Zhihong Yu wrote:
> Hi,
> -       delta += (float8) addrb[i] - (float8) addra[i];
> -       delta /= 256;
> ...
> +       delta /= 255;
> 
> May I know why the divisor was changed ?
> 

Yeah, that's a mistake, it should remain 256. Consider two subtractions

1.1.2.255 - 1.1.1.0 = [0, 0, 1, 255]

1.1.2.255 - 1.1.0.255 = [0, 0, 2, 0]

With the divisor being 255 those would be the same (2 * 256), but we
want the first one to be a bit smaller. It's also consistent with how
inet does subtractions:

test=# select '1.1.2.255'::inet - '1.1.0.255'::inet;
 ?column?
----------
      512
(1 row)

test=# select '1.1.2.255'::inet - '1.1.1.0'::inet;
 ?column?
----------
      511
(1 row)

So I'll keep the 256.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Crash in BRIN minmax-multi indexes

From
Thomas Munro
Date:
On Thu, Apr 1, 2021 at 11:25 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> As for why the regression tests did not catch this, it's most likely
> because the data is likely generated in "nice" ordering, or something
> like that. I'll see if I can tweak the ordering to trigger these issues
> reliably, and I'll do a bit more randomized testing.

For what little it's worth now that you've cracked it, I can report
that make check blows up somewhere in here on a 32 bit system with
--with-blocksize=32 :-)



Re: Crash in BRIN minmax-multi indexes

From
Jaime Casanova
Date:
On Wed, Mar 31, 2021 at 6:19 PM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
>
> On Wed, Mar 31, 2021 at 5:25 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > Hi,
> >
> > I think I found the issue - it's kinda obvious, really. We need to
> > consider the timezone, because the "time" parts alone may be sorted
> > differently. The attached patch should fix this, and it also fixes a
> > similar issue in the inet data type.
> >
>
> ah! yeah! obvious... if you say so ;)
>
> > As for why the regression tests did not catch this, it's most likely
> > because the data is likely generated in "nice" ordering, or something
> > like that. I'll see if I can tweak the ordering to trigger these issues
> > reliably, and I'll do a bit more randomized testing.
> >
> > There's also the question of rounding errors, which I think might cause
> > random assert failures (but in practice it's harmless, in the worst case
> > we'll merge the ranges a bit differently).
> >
> >
>
> I can confirm this fixes the crash in the query I showed and the original case.
>

But I found another, but similar issue.

```
update public.brintest_multi set
  intervalcol = (select pg_catalog.avg(intervalcol) from public.brintest_bloom)
;
```

BTW, i can reproduce just by executing "make installcheck" and
immediately execute that query

--
Jaime Casanova
Director de Servicios Profesionales
SYSTEMGUARDS - Consultores de PostgreSQL

Attachment

Re: Crash in BRIN minmax-multi indexes

From
Zhihong Yu
Date:
Hi,
Can you try this patch ?

Thanks

diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 70109960e8..25d6d2e274 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2161,7 +2161,7 @@ brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
     delta = 24L * 3600L * delta;

     /* and add the time part */
-    delta += result->time / (float8) 1000000.0;
+    delta += (result->time + result->zone * USECS_PER_SEC) / (float8) 1000000.0;

     Assert(delta >= 0);

On Wed, Mar 31, 2021 at 11:25 PM Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
On Wed, Mar 31, 2021 at 6:19 PM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
>
> On Wed, Mar 31, 2021 at 5:25 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > Hi,
> >
> > I think I found the issue - it's kinda obvious, really. We need to
> > consider the timezone, because the "time" parts alone may be sorted
> > differently. The attached patch should fix this, and it also fixes a
> > similar issue in the inet data type.
> >
>
> ah! yeah! obvious... if you say so ;)
>
> > As for why the regression tests did not catch this, it's most likely
> > because the data is likely generated in "nice" ordering, or something
> > like that. I'll see if I can tweak the ordering to trigger these issues
> > reliably, and I'll do a bit more randomized testing.
> >
> > There's also the question of rounding errors, which I think might cause
> > random assert failures (but in practice it's harmless, in the worst case
> > we'll merge the ranges a bit differently).
> >
> >
>
> I can confirm this fixes the crash in the query I showed and the original case.
>

But I found another, but similar issue.

```
update public.brintest_multi set
  intervalcol = (select pg_catalog.avg(intervalcol) from public.brintest_bloom)
;
```

BTW, i can reproduce just by executing "make installcheck" and
immediately execute that query

--
Jaime Casanova
Director de Servicios Profesionales
SYSTEMGUARDS - Consultores de PostgreSQL

Re: Crash in BRIN minmax-multi indexes

From
Tomas Vondra
Date:
On 4/1/21 3:09 PM, Zhihong Yu wrote:
> Hi,
> Can you try this patch ?
> 
> Thanks
> 
> diff --git a/src/backend/access/brin/brin_minmax_multi.c
> b/src/backend/access/brin/brin_minmax_multi.c
> index 70109960e8..25d6d2e274 100644
> --- a/src/backend/access/brin/brin_minmax_multi.c
> +++ b/src/backend/access/brin/brin_minmax_multi.c
> @@ -2161,7 +2161,7 @@ brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
>      delta = 24L * 3600L * delta;
> 
>      /* and add the time part */
> -    delta += result->time / (float8) 1000000.0;
> +    delta += (result->time + result->zone * USECS_PER_SEC) / (float8)
> 1000000.0;
> 

That won't work, because Interval does not have a "zone" field, so this
won't even compile.

The problem is that interval comparisons convert the value using 30 days
per month (see interval_cmp_value), but the formula in this function
uses 31. So either we can tweak that (seems to fix it for me), or maybe
just switch to interval_cmp_value directly.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Crash in BRIN minmax-multi indexes

From
Zhihong Yu
Date:
Hi, Tomas:
Thanks for the correction.

I think switching to interval_cmp_value() would be better (with a comment explaining why).

Cheers

On Thu, Apr 1, 2021 at 6:23 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
On 4/1/21 3:09 PM, Zhihong Yu wrote:
> Hi,
> Can you try this patch ?
>
> Thanks
>
> diff --git a/src/backend/access/brin/brin_minmax_multi.c
> b/src/backend/access/brin/brin_minmax_multi.c
> index 70109960e8..25d6d2e274 100644
> --- a/src/backend/access/brin/brin_minmax_multi.c
> +++ b/src/backend/access/brin/brin_minmax_multi.c
> @@ -2161,7 +2161,7 @@ brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
>      delta = 24L * 3600L * delta;
>
>      /* and add the time part */
> -    delta += result->time / (float8) 1000000.0;
> +    delta += (result->time + result->zone * USECS_PER_SEC) / (float8)
> 1000000.0;
>

That won't work, because Interval does not have a "zone" field, so this
won't even compile.

The problem is that interval comparisons convert the value using 30 days
per month (see interval_cmp_value), but the formula in this function
uses 31. So either we can tweak that (seems to fix it for me), or maybe
just switch to interval_cmp_value directly.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Crash in BRIN minmax-multi indexes

From
Jaime Casanova
Date:
On Thu, Apr 01, 2021 at 03:22:59PM +0200, Tomas Vondra wrote:
> On 4/1/21 3:09 PM, Zhihong Yu wrote:
> > Hi,
> > Can you try this patch ?
> > 
> > Thanks
> > 
> > diff --git a/src/backend/access/brin/brin_minmax_multi.c
> > b/src/backend/access/brin/brin_minmax_multi.c
> > index 70109960e8..25d6d2e274 100644
> > --- a/src/backend/access/brin/brin_minmax_multi.c
> > +++ b/src/backend/access/brin/brin_minmax_multi.c
> > @@ -2161,7 +2161,7 @@ brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
> >      delta = 24L * 3600L * delta;
> > 
> >      /* and add the time part */
> > -    delta += result->time / (float8) 1000000.0;
> > +    delta += (result->time + result->zone * USECS_PER_SEC) / (float8)
> > 1000000.0;
> > 
> 
> That won't work, because Interval does not have a "zone" field, so this
> won't even compile.
> 
> The problem is that interval comparisons convert the value using 30 days
> per month (see interval_cmp_value), but the formula in this function
> uses 31. So either we can tweak that (seems to fix it for me), or maybe
> just switch to interval_cmp_value directly.
> 

Changing to using month of 30 days on the formula fixed it.

and I found another issue, this time involves autovacuum which makes it
a little more complicated to reproduce.

Currently the only stable way to reproduce it is using pgbench:

pgbench -i postgres
psql -c "CREATE INDEX ON pgbench_history USING brin (tid int4_minmax_multi_ops);" postgres
pgbench -c2 -j2 -T 300 -n postgres

Attached a backtrace

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

Attachment

Re: Crash in BRIN minmax-multi indexes

From
Tomas Vondra
Date:
On 4/4/21 7:25 AM, Jaime Casanova wrote:
> ...
> Changing to using month of 30 days on the formula fixed it.
> 

I've pushed fixes for all the bugs reported in this thread so far
(mostly distance calculations, ...), and one bug (swapped operator
parameters in one place) I discovered while working on the fixes.

> and I found another issue, this time involves autovacuum which makes it
> a little more complicated to reproduce.
> 
> Currently the only stable way to reproduce it is using pgbench:
> 
> pgbench -i postgres
> psql -c "CREATE INDEX ON pgbench_history USING brin (tid int4_minmax_multi_ops);" postgres
> pgbench -c2 -j2 -T 300 -n postgres
> 

Fixed and pushed too.

Turned out to be a silly bug in forgetting to remember the number of
ranges after deduplication, which sometimes resulted in assert failure.
It's a bit hard to trigger because concurrency / good timing is needed
while summarizing the range, requiring a call to "union" function. Just
running the pgbench did not trigger the issue for me, I had to manually
call the brin_summarize_new_values().

For the record, I did a lot of testing with data randomized in various
ways - the scripts are available here:

https://github.com/tvondra/brin-randomized-tests

It was focused on discovering issues in the distance functions, and
comparing results with/without the index. Maybe the next step should be
adding some changes to the data, which might trigger more issues like
this one.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Crash in BRIN minmax-multi indexes

From
Tomas Vondra
Date:
BTW, for the inet data type, I considered simply calling the "minus"
function, but that does not work because of this strange behavior:


int4=# select '10.1.1.102/32'::inet > '10.1.1.142/24'::inet;
 ?column?
----------
 t
(1 row)

int4=# select '10.1.1.102/32'::inet - '10.1.1.142/24'::inet;
 ?column?
----------
      -40
(1 row)


That is, (a>b) but then (a-b) < 0. AFAICS it's due to comparator
considering the mask, while the minus ignores it. I find it a bit
strange, but I assume it's intentional.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Crash in BRIN minmax-multi indexes

From
Jaime Casanova
Date:
On Sun, Apr 04, 2021 at 07:52:50PM +0200, Tomas Vondra wrote:
> On 4/4/21 7:25 AM, Jaime Casanova wrote:
> > 
> > pgbench -i postgres
> > psql -c "CREATE INDEX ON pgbench_history USING brin (tid int4_minmax_multi_ops);" postgres
> > pgbench -c2 -j2 -T 300 -n postgres
> > 
> 
> Fixed and pushed too.
> 
> Turned out to be a silly bug in forgetting to remember the number of
> ranges after deduplication, which sometimes resulted in assert failure.
> It's a bit hard to trigger because concurrency / good timing is needed
> while summarizing the range, requiring a call to "union" function. Just
> running the pgbench did not trigger the issue for me, I had to manually
> call the brin_summarize_new_values().
> 
> For the record, I did a lot of testing with data randomized in various
> ways - the scripts are available here:
> 
> https://github.com/tvondra/brin-randomized-tests
> 
> It was focused on discovering issues in the distance functions, and
> comparing results with/without the index. Maybe the next step should be
> adding some changes to the data, which might trigger more issues like
> this one.
> 

Just found one more ocurrance of this one with this index while an
autovacuum was running:

"""
CREATE INDEX bt_f8_heap_seqno_idx 
    ON public.bt_f8_heap 
    USING brin (seqno float8_minmax_multi_ops);
"""

Attached is a backtrace.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

Attachment

Re: Crash in BRIN minmax-multi indexes

From
Tomas Vondra
Date:
On 9/29/22 08:53, Jaime Casanova wrote:
> ...
> 
> Just found one more ocurrance of this one with this index while an
> autovacuum was running:
> 
> """
> CREATE INDEX bt_f8_heap_seqno_idx 
>     ON public.bt_f8_heap 
>     USING brin (seqno float8_minmax_multi_ops);
> """
> Attached is a backtrace.

Thanks for the report!

I think I see the issue - brin_minmax_multi_union does not realize the
two summaries could have just one range each, and those can overlap so
that merge_overlapping_ranges combines them into a single one.

This is harmless, except that the assert int build_distances is overly
strict. Not sure if we should just remove the assert or only compute the
distances with (neranges>1).

Do you happen to have the core dump? It'd be useful to look at ranges_a
and ranges_b, to confirm this is indeed what's happening.

If not, how reproducible is it?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Crash in BRIN minmax-multi indexes

From
Jaime Casanova
Date:
On Mon, Oct 03, 2022 at 07:53:34PM +0200, Tomas Vondra wrote:
> On 9/29/22 08:53, Jaime Casanova wrote:
> > ...
> > 
> > Just found one more ocurrance of this one with this index while an
> > autovacuum was running:
> > 
> > """
> > CREATE INDEX bt_f8_heap_seqno_idx 
> >     ON public.bt_f8_heap 
> >     USING brin (seqno float8_minmax_multi_ops);
> > """
> > Attached is a backtrace.
> 
> Thanks for the report!
> 
> I think I see the issue - brin_minmax_multi_union does not realize the
> two summaries could have just one range each, and those can overlap so
> that merge_overlapping_ranges combines them into a single one.
> 
> This is harmless, except that the assert int build_distances is overly
> strict. Not sure if we should just remove the assert or only compute the
> distances with (neranges>1).
> 
> Do you happen to have the core dump? It'd be useful to look at ranges_a
> and ranges_b, to confirm this is indeed what's happening.
> 

I do have it.

(gdb) p *ranges_a
$4 = {
  typid = 701,
  colloid = 0,
  attno = 0,
  cmp = 0x0,
  nranges = 0,
  nsorted = 1,
  nvalues = 1,
  maxvalues = 32,
  target_maxvalues = 32,
  values = 0x55d2ea1987c8
}
(gdb) p *ranges_b
$5 = {
  typid = 701,
  colloid = 0,
  attno = 0,
  cmp = 0x0,
  nranges = 0,
  nsorted = 1,
  nvalues = 1,
  maxvalues = 32,
  target_maxvalues = 32,
  values = 0x55d2ea196da8
}

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL



Re: Crash in BRIN minmax-multi indexes

From
Tomas Vondra
Date:
On 10/3/22 21:25, Jaime Casanova wrote:
> On Mon, Oct 03, 2022 at 07:53:34PM +0200, Tomas Vondra wrote:
>> On 9/29/22 08:53, Jaime Casanova wrote:
>>> ...
>>>
>>> Just found one more ocurrance of this one with this index while an
>>> autovacuum was running:
>>>
>>> """
>>> CREATE INDEX bt_f8_heap_seqno_idx 
>>>     ON public.bt_f8_heap 
>>>     USING brin (seqno float8_minmax_multi_ops);
>>> """
>>> Attached is a backtrace.
>>
>> Thanks for the report!
>>
>> I think I see the issue - brin_minmax_multi_union does not realize the
>> two summaries could have just one range each, and those can overlap so
>> that merge_overlapping_ranges combines them into a single one.
>>
>> This is harmless, except that the assert int build_distances is overly
>> strict. Not sure if we should just remove the assert or only compute the
>> distances with (neranges>1).
>>
>> Do you happen to have the core dump? It'd be useful to look at ranges_a
>> and ranges_b, to confirm this is indeed what's happening.
>>
> 
> I do have it.
> 
> (gdb) p *ranges_a
> $4 = {
>   typid = 701,
>   colloid = 0,
>   attno = 0,
>   cmp = 0x0,
>   nranges = 0,
>   nsorted = 1,
>   nvalues = 1,
>   maxvalues = 32,
>   target_maxvalues = 32,
>   values = 0x55d2ea1987c8
> }
> (gdb) p *ranges_b
> $5 = {
>   typid = 701,
>   colloid = 0,
>   attno = 0,
>   cmp = 0x0,
>   nranges = 0,
>   nsorted = 1,
>   nvalues = 1,
>   maxvalues = 32,
>   target_maxvalues = 32,
>   values = 0x55d2ea196da8
> }
> 

Thanks. That mostly confirms my theory. I'd bet that this

(gdb) p ranges_a->values[0]
(gdb) p ranges_b->values[0]

will print the same value. I've been able to reproduce this, but it's
pretty difficult to get the timing right (and it requires table with
just a single value in that BRIN range).

I'll get this fixed in a couple days. Considering the benign nature of
this issue (unnecessary assert) I'm not going to rush.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Crash in BRIN minmax-multi indexes

From
Jaime Casanova
Date:
On Mon, Oct 03, 2022 at 10:29:38PM +0200, Tomas Vondra wrote:
> On 10/3/22 21:25, Jaime Casanova wrote:
> > On Mon, Oct 03, 2022 at 07:53:34PM +0200, Tomas Vondra wrote:
> >> On 9/29/22 08:53, Jaime Casanova wrote:
> >>> ...
> >>>
> >>> Just found one more ocurrance of this one with this index while an
> >>> autovacuum was running:
> >>>
> >>> """
> >>> CREATE INDEX bt_f8_heap_seqno_idx 
> >>>     ON public.bt_f8_heap 
> >>>     USING brin (seqno float8_minmax_multi_ops);
> >>> """
> >>> Attached is a backtrace.
> >>
> >> Thanks for the report!
> >>
> >> I think I see the issue - brin_minmax_multi_union does not realize the
> >> two summaries could have just one range each, and those can overlap so
> >> that merge_overlapping_ranges combines them into a single one.
> >>
> >> This is harmless, except that the assert int build_distances is overly
> >> strict. Not sure if we should just remove the assert or only compute the
> >> distances with (neranges>1).
> >>
> >> Do you happen to have the core dump? It'd be useful to look at ranges_a
> >> and ranges_b, to confirm this is indeed what's happening.
> >>
> > 
> > I do have it.
> > 
> > (gdb) p *ranges_a
> > $4 = {
> >   typid = 701,
> >   colloid = 0,
> >   attno = 0,
> >   cmp = 0x0,
> >   nranges = 0,
> >   nsorted = 1,
> >   nvalues = 1,
> >   maxvalues = 32,
> >   target_maxvalues = 32,
> >   values = 0x55d2ea1987c8
> > }
> > (gdb) p *ranges_b
> > $5 = {
> >   typid = 701,
> >   colloid = 0,
> >   attno = 0,
> >   cmp = 0x0,
> >   nranges = 0,
> >   nsorted = 1,
> >   nvalues = 1,
> >   maxvalues = 32,
> >   target_maxvalues = 32,
> >   values = 0x55d2ea196da8
> > }
> > 
> 
> Thanks. That mostly confirms my theory. I'd bet that this
> 
> (gdb) p ranges_a->values[0]
> (gdb) p ranges_b->values[0]
> 
> will print the same value. 
> 

you're right, they are same value

(gdb) p ranges_a->values[0]
$1 = 4679532294229561068
(gdb) p ranges_b->values[0]
$2 = 4679532294229561068

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL



Re: Crash in BRIN minmax-multi indexes

From
Justin Pryzby
Date:
On Mon, Oct 03, 2022 at 10:29:38PM +0200, Tomas Vondra wrote:
> I'll get this fixed in a couple days. Considering the benign nature of
> this issue (unnecessary assert) I'm not going to rush.

Is this still an outstanding issue ?

-- 
Justin



Re: Crash in BRIN minmax-multi indexes

From
Tomas Vondra
Date:
On 11/9/22 00:13, Justin Pryzby wrote:
> On Mon, Oct 03, 2022 at 10:29:38PM +0200, Tomas Vondra wrote:
>> I'll get this fixed in a couple days. Considering the benign nature of
>> this issue (unnecessary assert) I'm not going to rush.
> 
> Is this still an outstanding issue ?
> 

Yes. I'll get it fixed, but it's harmless in practice (without asserts),
and I've been focusing on the other issue with broken NULL-handling in
BRIN indexes.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Crash in BRIN minmax-multi indexes

From
Tomas Vondra
Date:
I finally pushed this fix.

In the end I both relaxed the assert a little bit to allow calling
build_distances for a single range, and added a bail out so that the
caller gets regular NULL and not whatever palloc(0) produces.

Thanks again for the report!


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company