Thread: Crash in BRIN minmax-multi indexes
Hi,

Just found $SUBJECT involving time with time zone and a subselect. I haven't narrowed it down to a minimal table/index schema yet, but if you run this query on the regression database it will crash.

```
update public.brintest_multi set
  timetzcol = (select tz from generate_series('2021-01-01'::timestamp with time zone, '2021-01-31', '5 days') tz limit 1)
;
```

Attached is a backtrace. Let me know if you need extra information.

--
Jaime Casanova
Director de Servicios Profesionales
SYSTEMGUARDS - Consultores de PostgreSQL
Attachment
Hi,
In build_distances():
a1 = eranges[i].maxval;
a2 = eranges[i + 1].minval;
It seems there was an overlap between successive ranges, leading to
delta = -6785000000
FYI
On Wed, Mar 31, 2021 at 10:30 AM Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
Hi,
Just found $SUBJECT involving time with time zone and a subselect. I
haven't narrowed it down to a minimal table/index schema yet, but
if you run this query on the regression database it will crash.
```
update public.brintest_multi set
timetzcol = (select tz from generate_series('2021-01-01'::timestamp
with time zone, '2021-01-31', '5 days') tz limit 1)
;
```
attached a backtrace. Let me know if you need extra information.
--
Jaime Casanova
Director de Servicios Profesionales
SYSTEMGUARDS - Consultores de PostgreSQL
On 3/31/21 8:20 PM, Zhihong Yu wrote:
> Hi,
> In build_distances():
>
> a1 = eranges[i].maxval;
> a2 = eranges[i + 1].minval;
>
> It seems there was an overlap between successive ranges, leading to
> delta = -6785000000
>

I've been unable to reproduce this, so far :-( How exactly did you manage to reproduce it?

The thing is - how could there be an overlap? The way we build expanded ranges, that should not be possible, I think. Can you print the ranges at the end of fill_expanded_ranges? That should shed some light on this.

FWIW I suspect those asserts on delta may be a bit problematic due to rounding errors.

And I found one issue in the inet distance function, because apparently

test=# select '10.2.14.243/24'::inet < '10.2.14.231/24'::inet;
 ?column?
----------
 f
(1 row)

but the delta formula currently ignores the mask. But that's a separate issue.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

I think I found the issue - it's kinda obvious, really. We need to consider the timezone, because the "time" parts alone may be sorted differently. The attached patch should fix this, and it also fixes a similar issue in the inet data type.

As for why the regression tests did not catch this, it's most likely because the data is likely generated in "nice" ordering, or something like that. I'll see if I can tweak the ordering to trigger these issues reliably, and I'll do a bit more randomized testing.

There's also the question of rounding errors, which I think might cause random assert failures (but in practice it's harmless, in the worst case we'll merge the ranges a bit differently).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
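To illustrate the idea (a sketch of the approach described above, not the contents of the attached patch): a timetz distance needs to be computed on the UTC-normalized value, i.e. with the zone offset folded into the time part, the same way timetz comparisons normalize before comparing. The function and variable names here are invented for the example.

```
#include "postgres.h"
#include "datatype/timestamp.h"     /* USECS_PER_SEC */
#include "utils/date.h"             /* TimeTzADT */

/* Sketch only: distance between two "time with time zone" values,
 * normalized to UTC so the result orders the same way the timetz
 * comparator does. */
static double
timetz_distance_sketch(const TimeTzADT *ta, const TimeTzADT *tb)
{
    /* zone is an offset in seconds; fold it into the microsecond time */
    double  a = (double) ta->time + (double) ta->zone * USECS_PER_SEC;
    double  b = (double) tb->time + (double) tb->zone * USECS_PER_SEC;

    return b - a;       /* caller passes values in ascending timetz order */
}
```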
Hi,
For inet data type fix:
+ unsigned char a = addra[i];
+ unsigned char b = addrb[i];
+
+ if (i >= lena)
+ a = 0;
+
+ if (i >= lenb)
+ b = 0;
Should the length check precede the addra[i] ?
Something like:
unsigned char a;
if (i >= lena) a = 0;
else a = addra[i];
Cheers
On Wed, Mar 31, 2021 at 3:25 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
Hi,
I think I found the issue - it's kinda obvious, really. We need to
consider the timezone, because the "time" parts alone may be sorted
differently. The attached patch should fix this, and it also fixes a
similar issue in the inet data type.
As for why the regression tests did not catch this, it's most likely
because the data is likely generated in "nice" ordering, or something
like that. I'll see if I can tweak the ordering to trigger these issues
reliably, and I'll do a bit more randomized testing.
There's also the question of rounding errors, which I think might cause
random assert failures (but in practice it's harmless, in the worst case
we'll merge the ranges a bit differently).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 4/1/21 12:53 AM, Zhihong Yu wrote:
> Hi,
> For inet data type fix:
>
> + unsigned char a = addra[i];
> + unsigned char b = addrb[i];
> +
> + if (i >= lena)
> + a = 0;
> +
> + if (i >= lenb)
> + b = 0;
>
> Should the length check precede the addra[i] ?
> Something like:
>
> unsigned char a;
> if (i >= lena) a = 0;
> else a = addra[i];
>

I don't think that makes any difference. We know the bytes are there, we just want to ignore / reset them in some cases.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Mar 31, 2021 at 5:25 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> I think I found the issue - it's kinda obvious, really. We need to
> consider the timezone, because the "time" parts alone may be sorted
> differently. The attached patch should fix this, and it also fixes a
> similar issue in the inet data type.
>

ah! yeah! obvious... if you say so ;)

> As for why the regression tests did not catch this, it's most likely
> because the data is likely generated in "nice" ordering, or something
> like that. I'll see if I can tweak the ordering to trigger these issues
> reliably, and I'll do a bit more randomized testing.
>
> There's also the question of rounding errors, which I think might cause
> random assert failures (but in practice it's harmless, in the worst case
> we'll merge the ranges a bit differently).
>

I can confirm this fixes the crash in the query I showed and the original case.

--
Jaime Casanova
Director de Servicios Profesionales
SYSTEMGUARDS - Consultores de PostgreSQL
Hi,
- delta += (float8) addrb[i] - (float8) addra[i];
- delta /= 256;
...
+ delta /= 255;
May I know why the divisor was changed ?
Thanks
On Wed, Mar 31, 2021 at 3:25 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
Hi,
I think I found the issue - it's kinda obvious, really. We need to
consider the timezone, because the "time" parts alone may be sorted
differently. The attached patch should fix this, and it also fixes a
similar issue in the inet data type.
As for why the regression tests did not catch this, it's most likely
because the data is likely generated in "nice" ordering, or something
like that. I'll see if I can tweak the ordering to trigger these issues
reliably, and I'll do a bit more randomized testing.
There's also the question of rounding errors, which I think might cause
random assert failures (but in practice it's harmless, in the worst case
we'll merge the ranges a bit differently).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 4/1/21 3:22 AM, Zhihong Yu wrote:
> Hi,
> - delta += (float8) addrb[i] - (float8) addra[i];
> - delta /= 256;
> ...
> + delta /= 255;
>
> May I know why the divisor was changed ?
>

Yeah, that's a mistake, it should remain 256. Consider two subtractions

1.1.2.255 - 1.1.1.0   = [0, 0, 1, 255]
1.1.2.255 - 1.1.0.255 = [0, 0, 2, 0]

With the divisor being 255 those would be the same (2 * 256), but we want the first one to be a bit smaller. It's also consistent with how inet does subtractions:

test=# select '1.1.2.255'::inet - '1.1.0.255'::inet;
 ?column?
----------
      512
(1 row)

test=# select '1.1.2.255'::inet - '1.1.1.0'::inet;
 ?column?
----------
      511
(1 row)

So I'll keep the 256.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
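To see why the divisor matters, here is a small standalone illustration (ordinary C, not the backend code) of the positional weighting the quoted `delta /= 256` loop implies: dividing by 256 after each byte treats the difference vector as a base-256 fraction, so [0, 0, 1, 255] ends up slightly smaller than [0, 0, 2, 0], matching the inet subtractions above, whereas dividing by 255 collapses the two.

```
#include <stdio.h>

/* Weight a per-byte difference vector as a base-N fraction, processing
 * bytes from least to most significant. */
static double
weighted_delta(const int diff[4], double divisor)
{
    double delta = 0.0;

    for (int i = 3; i >= 0; i--)
    {
        delta += diff[i];
        delta /= divisor;
    }
    return delta;
}

int
main(void)
{
    int d1[4] = {0, 0, 1, 255};   /* 1.1.2.255 - 1.1.1.0   */
    int d2[4] = {0, 0, 2, 0};     /* 1.1.2.255 - 1.1.0.255 */

    /* with 256 the first is smaller (they scale to 511 vs 512) ... */
    printf("div 256: %.12f vs %.12f\n",
           weighted_delta(d1, 256.0), weighted_delta(d2, 256.0));

    /* ... with 255 the two collapse to the same value */
    printf("div 255: %.12f vs %.12f\n",
           weighted_delta(d1, 255.0), weighted_delta(d2, 255.0));
    return 0;
}
```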
On Thu, Apr 1, 2021 at 11:25 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> As for why the regression tests did not catch this, it's most likely
> because the data is likely generated in "nice" ordering, or something
> like that. I'll see if I can tweak the ordering to trigger these issues
> reliably, and I'll do a bit more randomized testing.

For what little it's worth now that you've cracked it, I can report that make check blows up somewhere in here on a 32 bit system with --with-blocksize=32 :-)
On Wed, Mar 31, 2021 at 6:19 PM Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
>
> On Wed, Mar 31, 2021 at 5:25 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > Hi,
> >
> > I think I found the issue - it's kinda obvious, really. We need to
> > consider the timezone, because the "time" parts alone may be sorted
> > differently. The attached patch should fix this, and it also fixes a
> > similar issue in the inet data type.
> >
>
> ah! yeah! obvious... if you say so ;)
>
> > As for why the regression tests did not catch this, it's most likely
> > because the data is likely generated in "nice" ordering, or something
> > like that. I'll see if I can tweak the ordering to trigger these issues
> > reliably, and I'll do a bit more randomized testing.
> >
> > There's also the question of rounding errors, which I think might cause
> > random assert failures (but in practice it's harmless, in the worst case
> > we'll merge the ranges a bit differently).
> >
>
> I can confirm this fixes the crash in the query I showed and the original case.
>

But I found another, similar issue.

```
update public.brintest_multi set
  intervalcol = (select pg_catalog.avg(intervalcol) from public.brintest_bloom)
;
```

BTW, I can reproduce it just by executing "make installcheck" and then immediately running that query.

--
Jaime Casanova
Director de Servicios Profesionales
SYSTEMGUARDS - Consultores de PostgreSQL
Attachment
Hi,
Can you try this patch ?
Thanks
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 70109960e8..25d6d2e274 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2161,7 +2161,7 @@ brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
delta = 24L * 3600L * delta;
/* and add the time part */
- delta += result->time / (float8) 1000000.0;
+ delta += (result->time + result->zone * USECS_PER_SEC) / (float8) 1000000.0;
Assert(delta >= 0);
On Wed, Mar 31, 2021 at 11:25 PM Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
On Wed, Mar 31, 2021 at 6:19 PM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
>
> On Wed, Mar 31, 2021 at 5:25 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > Hi,
> >
> > I think I found the issue - it's kinda obvious, really. We need to
> > consider the timezone, because the "time" parts alone may be sorted
> > differently. The attached patch should fix this, and it also fixes a
> > similar issue in the inet data type.
> >
>
> ah! yeah! obvious... if you say so ;)
>
> > As for why the regression tests did not catch this, it's most likely
> > because the data is likely generated in "nice" ordering, or something
> > like that. I'll see if I can tweak the ordering to trigger these issues
> > reliably, and I'll do a bit more randomized testing.
> >
> > There's also the question of rounding errors, which I think might cause
> > random assert failures (but in practice it's harmless, in the worst case
> > we'll merge the ranges a bit differently).
> >
> >
>
> I can confirm this fixes the crash in the query I showed and the original case.
>
But I found another, similar issue.
```
update public.brintest_multi set
intervalcol = (select pg_catalog.avg(intervalcol) from public.brintest_bloom)
;
```
BTW, I can reproduce it just by executing "make installcheck" and
then immediately running that query.
--
Jaime Casanova
Director de Servicios Profesionales
SYSTEMGUARDS - Consultores de PostgreSQL
On 4/1/21 3:09 PM, Zhihong Yu wrote:
> Hi,
> Can you try this patch ?
>
> Thanks
>
> diff --git a/src/backend/access/brin/brin_minmax_multi.c
> b/src/backend/access/brin/brin_minmax_multi.c
> index 70109960e8..25d6d2e274 100644
> --- a/src/backend/access/brin/brin_minmax_multi.c
> +++ b/src/backend/access/brin/brin_minmax_multi.c
> @@ -2161,7 +2161,7 @@ brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
> delta = 24L * 3600L * delta;
>
> /* and add the time part */
> - delta += result->time / (float8) 1000000.0;
> + delta += (result->time + result->zone * USECS_PER_SEC) / (float8)
> 1000000.0;
>

That won't work, because Interval does not have a "zone" field, so this won't even compile.

The problem is that interval comparisons convert the value using 30 days per month (see interval_cmp_value), but the formula in this function uses 31. So either we can tweak that (seems to fix it for me), or maybe just switch to interval_cmp_value directly.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
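For illustration, a sketch of the normalization being described (mirroring interval_cmp_value: 30 days per month, 24 hours per day); the function name is invented for the example, it is not taken from the patch.

```
#include "postgres.h"
#include "datatype/timestamp.h"   /* Interval, USECS_PER_DAY */

/* Sketch only: collapse an Interval into a single microsecond count using
 * the same 30-days-per-month convention as interval_cmp_value(), so that
 * a distance built from it cannot contradict the interval comparator. */
static double
interval_to_usecs_sketch(const Interval *iv)
{
    double days = (double) iv->month * 30.0 + (double) iv->day;

    return days * (double) USECS_PER_DAY + (double) iv->time;
}
```

The distance between two intervals would then simply be the difference of the two normalized values, scaled to whatever unit the opclass prefers.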
Hi, Tomas:
Thanks for the correction.
I think switching to interval_cmp_value() would be better (with a comment explaining why).
Cheers
On Thu, Apr 1, 2021 at 6:23 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
On 4/1/21 3:09 PM, Zhihong Yu wrote:
> Hi,
> Can you try this patch ?
>
> Thanks
>
> diff --git a/src/backend/access/brin/brin_minmax_multi.c
> b/src/backend/access/brin/brin_minmax_multi.c
> index 70109960e8..25d6d2e274 100644
> --- a/src/backend/access/brin/brin_minmax_multi.c
> +++ b/src/backend/access/brin/brin_minmax_multi.c
> @@ -2161,7 +2161,7 @@ brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
> delta = 24L * 3600L * delta;
>
> /* and add the time part */
> - delta += result->time / (float8) 1000000.0;
> + delta += (result->time + result->zone * USECS_PER_SEC) / (float8)
> 1000000.0;
>
That won't work, because Interval does not have a "zone" field, so this
won't even compile.
The problem is that interval comparisons convert the value using 30 days
per month (see interval_cmp_value), but the formula in this function
uses 31. So either we can tweak that (seems to fix it for me), or maybe
just switch to interval_cmp_value directly.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Apr 01, 2021 at 03:22:59PM +0200, Tomas Vondra wrote:
> On 4/1/21 3:09 PM, Zhihong Yu wrote:
> > Hi,
> > Can you try this patch ?
> >
> > Thanks
> >
> > diff --git a/src/backend/access/brin/brin_minmax_multi.c
> > b/src/backend/access/brin/brin_minmax_multi.c
> > index 70109960e8..25d6d2e274 100644
> > --- a/src/backend/access/brin/brin_minmax_multi.c
> > +++ b/src/backend/access/brin/brin_minmax_multi.c
> > @@ -2161,7 +2161,7 @@ brin_minmax_multi_distance_interval(PG_FUNCTION_ARGS)
> > delta = 24L * 3600L * delta;
> >
> > /* and add the time part */
> > - delta += result->time / (float8) 1000000.0;
> > + delta += (result->time + result->zone * USECS_PER_SEC) / (float8)
> > 1000000.0;
> >
>
> That won't work, because Interval does not have a "zone" field, so this
> won't even compile.
>
> The problem is that interval comparisons convert the value using 30 days
> per month (see interval_cmp_value), but the formula in this function
> uses 31. So either we can tweak that (seems to fix it for me), or maybe
> just switch to interval_cmp_value directly.
>

Changing the formula to use 30-day months fixed it.

And I found another issue, this time involving autovacuum, which makes it a little more complicated to reproduce. Currently the only stable way to reproduce it is using pgbench:

pgbench -i postgres
psql -c "CREATE INDEX ON pgbench_history USING brin (tid int4_minmax_multi_ops);" postgres
pgbench -c2 -j2 -T 300 -n postgres

Attached is a backtrace.

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
Attachment
On 4/4/21 7:25 AM, Jaime Casanova wrote:
> ...
> Changing the formula to use 30-day months fixed it.
>

I've pushed fixes for all the bugs reported in this thread so far (mostly distance calculations, ...), and one bug (swapped operator parameters in one place) I discovered while working on the fixes.

> And I found another issue, this time involving autovacuum, which makes it
> a little more complicated to reproduce.
>
> Currently the only stable way to reproduce it is using pgbench:
>
> pgbench -i postgres
> psql -c "CREATE INDEX ON pgbench_history USING brin (tid int4_minmax_multi_ops);" postgres
> pgbench -c2 -j2 -T 300 -n postgres
>

Fixed and pushed too.

Turned out to be a silly bug - forgetting to update the number of ranges after deduplication, which sometimes resulted in an assert failure. It's a bit hard to trigger because concurrency / good timing is needed while summarizing the range, requiring a call to the "union" function. Just running the pgbench did not trigger the issue for me, I had to manually call brin_summarize_new_values().

For the record, I did a lot of testing with data randomized in various ways - the scripts are available here:

https://github.com/tvondra/brin-randomized-tests

It was focused on discovering issues in the distance functions, and comparing results with/without the index. Maybe the next step should be adding some changes to the data, which might trigger more issues like this one.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
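As an illustration of the kind of bug described (a hypothetical helper, not the committed diff): a deduplication pass over a sorted array must hand the reduced count back to the caller, otherwise later code keeps using the stale, larger count. The function name, the eqFn parameter, and the calling convention here are all invented for the example.

```
#include "postgres.h"
#include "fmgr.h"

/* Hypothetical sketch: deduplicate a sorted array of Datums using an
 * equality support function and return the new number of entries.
 * The caller must remember the returned count - keeping the old one
 * is exactly the kind of mistake described above. */
static int
dedup_sorted_values(Datum *values, int nvalues, FmgrInfo *eqFn, Oid colloid)
{
    int n = 0;

    for (int i = 0; i < nvalues; i++)
    {
        /* keep a value only if it differs from the last one we kept */
        if (n == 0 ||
            !DatumGetBool(FunctionCall2Coll(eqFn, colloid,
                                            values[n - 1], values[i])))
            values[n++] = values[i];
    }

    return n;
}
```

The point is simply that the count travels with the array, e.g. something like `ranges->nvalues = dedup_sorted_values(ranges->values, ranges->nvalues, eqFn, colloid);` at the call site.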
BTW, for the inet data type, I considered simply calling the "minus" function, but that does not work because of this strange behavior:

int4=# select '10.1.1.102/32'::inet > '10.1.1.142/24'::inet;
 ?column?
----------
 t
(1 row)

int4=# select '10.1.1.102/32'::inet - '10.1.1.142/24'::inet;
 ?column?
----------
      -40
(1 row)

That is, (a > b) but then (a - b) < 0. AFAICS it's due to the comparator considering the mask, while the minus ignores it. I find it a bit strange, but I assume it's intentional.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Apr 04, 2021 at 07:52:50PM +0200, Tomas Vondra wrote:
> On 4/4/21 7:25 AM, Jaime Casanova wrote:
> >
> > pgbench -i postgres
> > psql -c "CREATE INDEX ON pgbench_history USING brin (tid int4_minmax_multi_ops);" postgres
> > pgbench -c2 -j2 -T 300 -n postgres
> >
>
> Fixed and pushed too.
>
> Turned out to be a silly bug - forgetting to update the number of
> ranges after deduplication, which sometimes resulted in an assert failure.
> It's a bit hard to trigger because concurrency / good timing is needed
> while summarizing the range, requiring a call to the "union" function. Just
> running the pgbench did not trigger the issue for me, I had to manually
> call brin_summarize_new_values().
>
> For the record, I did a lot of testing with data randomized in various
> ways - the scripts are available here:
>
> https://github.com/tvondra/brin-randomized-tests
>
> It was focused on discovering issues in the distance functions, and
> comparing results with/without the index. Maybe the next step should be
> adding some changes to the data, which might trigger more issues like
> this one.
>

Just found one more occurrence of this one with this index while an autovacuum was running:

"""
CREATE INDEX bt_f8_heap_seqno_idx
  ON public.bt_f8_heap
  USING brin (seqno float8_minmax_multi_ops);
"""

Attached is a backtrace.

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
Attachment
On 9/29/22 08:53, Jaime Casanova wrote:
> ...
>
> Just found one more occurrence of this one with this index while an
> autovacuum was running:
>
> """
> CREATE INDEX bt_f8_heap_seqno_idx
>   ON public.bt_f8_heap
>   USING brin (seqno float8_minmax_multi_ops);
> """
> Attached is a backtrace.

Thanks for the report!

I think I see the issue - brin_minmax_multi_union does not realize the two summaries could have just one range each, and those can overlap so that merge_overlapping_ranges combines them into a single one.

This is harmless, except that the assert in build_distances is overly strict. Not sure if we should just remove the assert or only compute the distances when (neranges > 1).

Do you happen to have the core dump? It'd be useful to look at ranges_a and ranges_b, to confirm this is indeed what's happening. If not, how reproducible is it?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Oct 03, 2022 at 07:53:34PM +0200, Tomas Vondra wrote:
> On 9/29/22 08:53, Jaime Casanova wrote:
> > ...
> >
> > Just found one more occurrence of this one with this index while an
> > autovacuum was running:
> >
> > """
> > CREATE INDEX bt_f8_heap_seqno_idx
> >   ON public.bt_f8_heap
> >   USING brin (seqno float8_minmax_multi_ops);
> > """
> > Attached is a backtrace.
>
> Thanks for the report!
>
> I think I see the issue - brin_minmax_multi_union does not realize the
> two summaries could have just one range each, and those can overlap so
> that merge_overlapping_ranges combines them into a single one.
>
> This is harmless, except that the assert in build_distances is overly
> strict. Not sure if we should just remove the assert or only compute the
> distances when (neranges > 1).
>
> Do you happen to have the core dump? It'd be useful to look at ranges_a
> and ranges_b, to confirm this is indeed what's happening.
>

I do have it.

(gdb) p *ranges_a
$4 = {
  typid = 701,
  colloid = 0,
  attno = 0,
  cmp = 0x0,
  nranges = 0,
  nsorted = 1,
  nvalues = 1,
  maxvalues = 32,
  target_maxvalues = 32,
  values = 0x55d2ea1987c8
}
(gdb) p *ranges_b
$5 = {
  typid = 701,
  colloid = 0,
  attno = 0,
  cmp = 0x0,
  nranges = 0,
  nsorted = 1,
  nvalues = 1,
  maxvalues = 32,
  target_maxvalues = 32,
  values = 0x55d2ea196da8
}

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
On 10/3/22 21:25, Jaime Casanova wrote:
> On Mon, Oct 03, 2022 at 07:53:34PM +0200, Tomas Vondra wrote:
>> On 9/29/22 08:53, Jaime Casanova wrote:
>>> ...
>>>
>>> Just found one more occurrence of this one with this index while an
>>> autovacuum was running:
>>>
>>> """
>>> CREATE INDEX bt_f8_heap_seqno_idx
>>>   ON public.bt_f8_heap
>>>   USING brin (seqno float8_minmax_multi_ops);
>>> """
>>> Attached is a backtrace.
>>
>> Thanks for the report!
>>
>> I think I see the issue - brin_minmax_multi_union does not realize the
>> two summaries could have just one range each, and those can overlap so
>> that merge_overlapping_ranges combines them into a single one.
>>
>> This is harmless, except that the assert in build_distances is overly
>> strict. Not sure if we should just remove the assert or only compute the
>> distances when (neranges > 1).
>>
>> Do you happen to have the core dump? It'd be useful to look at ranges_a
>> and ranges_b, to confirm this is indeed what's happening.
>>
>
> I do have it.
>
> (gdb) p *ranges_a
> $4 = {
>   typid = 701,
>   colloid = 0,
>   attno = 0,
>   cmp = 0x0,
>   nranges = 0,
>   nsorted = 1,
>   nvalues = 1,
>   maxvalues = 32,
>   target_maxvalues = 32,
>   values = 0x55d2ea1987c8
> }
> (gdb) p *ranges_b
> $5 = {
>   typid = 701,
>   colloid = 0,
>   attno = 0,
>   cmp = 0x0,
>   nranges = 0,
>   nsorted = 1,
>   nvalues = 1,
>   maxvalues = 32,
>   target_maxvalues = 32,
>   values = 0x55d2ea196da8
> }
>

Thanks. That mostly confirms my theory. I'd bet that this

(gdb) p ranges_a->values[0]
(gdb) p ranges_b->values[0]

will print the same value.

I've been able to reproduce this, but it's pretty difficult to get the timing right (and it requires a table with just a single value in that BRIN range).

I'll get this fixed in a couple days. Considering the benign nature of this issue (unnecessary assert) I'm not going to rush.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Oct 03, 2022 at 10:29:38PM +0200, Tomas Vondra wrote:
> On 10/3/22 21:25, Jaime Casanova wrote:
> > On Mon, Oct 03, 2022 at 07:53:34PM +0200, Tomas Vondra wrote:
> >> On 9/29/22 08:53, Jaime Casanova wrote:
> >>> ...
> >>>
> >>> Just found one more occurrence of this one with this index while an
> >>> autovacuum was running:
> >>>
> >>> """
> >>> CREATE INDEX bt_f8_heap_seqno_idx
> >>>   ON public.bt_f8_heap
> >>>   USING brin (seqno float8_minmax_multi_ops);
> >>> """
> >>> Attached is a backtrace.
> >>
> >> Thanks for the report!
> >>
> >> I think I see the issue - brin_minmax_multi_union does not realize the
> >> two summaries could have just one range each, and those can overlap so
> >> that merge_overlapping_ranges combines them into a single one.
> >>
> >> This is harmless, except that the assert in build_distances is overly
> >> strict. Not sure if we should just remove the assert or only compute the
> >> distances when (neranges > 1).
> >>
> >> Do you happen to have the core dump? It'd be useful to look at ranges_a
> >> and ranges_b, to confirm this is indeed what's happening.
> >>
> >
> > I do have it.
> >
> > (gdb) p *ranges_a
> > $4 = {
> >   typid = 701,
> >   colloid = 0,
> >   attno = 0,
> >   cmp = 0x0,
> >   nranges = 0,
> >   nsorted = 1,
> >   nvalues = 1,
> >   maxvalues = 32,
> >   target_maxvalues = 32,
> >   values = 0x55d2ea1987c8
> > }
> > (gdb) p *ranges_b
> > $5 = {
> >   typid = 701,
> >   colloid = 0,
> >   attno = 0,
> >   cmp = 0x0,
> >   nranges = 0,
> >   nsorted = 1,
> >   nvalues = 1,
> >   maxvalues = 32,
> >   target_maxvalues = 32,
> >   values = 0x55d2ea196da8
> > }
> >
>
> Thanks. That mostly confirms my theory. I'd bet that this
>
> (gdb) p ranges_a->values[0]
> (gdb) p ranges_b->values[0]
>
> will print the same value.
>

You're right, they are the same value:

(gdb) p ranges_a->values[0]
$1 = 4679532294229561068
(gdb) p ranges_b->values[0]
$2 = 4679532294229561068

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL
On Mon, Oct 03, 2022 at 10:29:38PM +0200, Tomas Vondra wrote:
> I'll get this fixed in a couple days. Considering the benign nature of
> this issue (unnecessary assert) I'm not going to rush.

Is this still an outstanding issue?

--
Justin
On 11/9/22 00:13, Justin Pryzby wrote:
> On Mon, Oct 03, 2022 at 10:29:38PM +0200, Tomas Vondra wrote:
>> I'll get this fixed in a couple days. Considering the benign nature of
>> this issue (unnecessary assert) I'm not going to rush.
>
> Is this still an outstanding issue?
>

Yes. I'll get it fixed, but it's harmless in practice (without asserts), and I've been focusing on the other issue with broken NULL-handling in BRIN indexes.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
I finally pushed this fix.

In the end I both relaxed the assert a little bit, to allow calling build_distances for a single range, and added a bail-out so that the caller gets a regular NULL and not whatever palloc(0) produces.

Thanks again for the report!

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
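For readers following the thread, a sketch of the shape of that fix, in the context of brin_minmax_multi.c (based on the description above and the snippets quoted earlier in the thread; the details, and the DistanceValue bookkeeping, may differ from the committed code):

```
/* Sketch of the described fix - not the literal committed code. */
static DistanceValue *
build_distances(FmgrInfo *distanceFn, Oid colloid,
                ExpandedRange *eranges, int neranges)
{
    DistanceValue *distances;

    /* relaxed: a union of two single-range summaries may merge into one range */
    Assert(neranges >= 1);

    /* bail out early - with a single range there are no gaps to measure */
    if (neranges == 1)
        return NULL;

    distances = (DistanceValue *) palloc0(sizeof(DistanceValue) * (neranges - 1));

    for (int i = 0; i < neranges - 1; i++)
    {
        Datum a1 = eranges[i].maxval;
        Datum a2 = eranges[i + 1].minval;

        /* length of the gap between consecutive ranges, per the opclass */
        distances[i].index = i;
        distances[i].value = DatumGetFloat8(FunctionCall2Coll(distanceFn,
                                                              colloid, a1, a2));
    }

    return distances;
}
```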