Thread: Segmentation fault - PostgreSQL 17.0
Sent: Saturday, November 9, 2024 1:01:25 PM
To: Ľuboslav Špilák <lspilak@microstep-hdo.sk>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: Re: Segmentation fault - PostgreSQL 17.0
On 11/8/24 18:22, Ľuboslav Špilák wrote:
> Hello.
>
> I am trying PostgreSQL 17 partitioned tables and checking BRIN indexes,
> and I get a *Segmentation fault* error.
> We upgraded the db from PostgreSQL 12 to 17.0 using the pg_upgrade binary.
> Everything seems to be OK. Only this select is a problem.
>
> select * from brin_page_items(
> get_raw_page('test1_table_2022q3_timeseries_id_time_idx',2),
> 'test1_table_2022q3_timeseries_id_time_idx'
> )
>
> Welcome to *Ubuntu 20.04.6 LTS* (GNU/Linux 5.4.0-200-generic x86_64)
> *PostgreSQL 17.0* (Ubuntu 17.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu,
> compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
>
>
> 2024-11-08 18:12:20.861 CET [12350] LOG: starting PostgreSQL 17.0
> (Ubuntu 17.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc
> (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
> 2024-11-08 18:12:20.864 CET [12350] LOG: listening on IPv4 address
> "0.0.0.0", port 5432
> 2024-11-08 18:12:20.867 CET [12350] LOG: could not create IPv6 socket
> for address "::": Address family not supported by protocol
> 2024-11-08 18:12:20.868 CET [12350] LOG: listening on Unix socket "/
> var/run/postgresql/.s.PGSQL.5432"
> 2024-11-08 18:12:20.878 CET [12357] LOG: database system was shut down
> at 2024-11-08 18:12:19 CET
> 2024-11-08 18:12:20.890 CET [12350] LOG: database system is ready to
> accept connections
> 2024-11-08 18:12:41.055 CET [12350] LOG: *server process (PID 12376)
> was terminated by signal 11: Segmentation fault*
> 2024-11-08 18:12:41.055 CET [12350] DETAIL: Failed process was running:
> select *
> from brin_page_items(
> get_raw_page('test1_table_2022q3_timeseries_id_time_idx',2),
> 'test1_table_2022q3_timeseries_id_time_idx'
> )
> 2024-11-08 18:12:41.055 CET [12350] LOG: terminating any other active
> server processes
> 2024-11-08 18:12:41.058 CET [12350] LOG: all server processes
> terminated; reinitializing
> 2024-11-08 18:12:41.276 CET [12379] LOG: database system was
> interrupted; last known up at 2024-11-08 18:12:20 CET
> 2024-11-08 18:12:41.293 CET [12382] postgres@xtimeseries FATAL: the
> database system is in recovery mode
> 2024-11-08 18:12:41.319 CET [12383] postgres@xtimeseries FATAL: the
> database system is in recovery mode
> 2024-11-08 18:12:41.346 CET [12384] postgres@xtimeseries FATAL: the
> database system is in recovery mode
> 2024-11-08 18:12:41.364 CET [12379] LOG: database system was not
> properly shut down; automatic recovery in progress
> 2024-11-08 18:12:41.371 CET [12379] LOG: redo starts at FE49/7A5FBCD0
> 2024-11-08 18:12:41.371 CET [12379] LOG: invalid record length at
> FE49/7A5FBD08: expected at least 24, got 0
> 2024-11-08 18:12:41.371 CET [12379] LOG: redo done at FE49/7A5FBCD0
> system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> 2024-11-08 18:12:41.371 CET [12385] postgres@xtimeseries FATAL: the
> database system is in recovery mode
> 2024-11-08 18:12:41.384 CET [12380] LOG: checkpoint starting: end-of-
> recovery immediate wait
> 2024-11-08 18:12:41.401 CET [12386] postgres@xtimeseries FATAL: the
> database system is not yet accepting connections
> 2024-11-08 18:12:41.401 CET [12386] postgres@xtimeseries DETAIL:
> Consistent recovery state has not been yet reached.
> 2024-11-08 18:12:41.408 CET [12380] LOG: checkpoint complete: wrote 2
> buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.017
> s, sync=0.002 s, total=0.026 s; sync files=3, longest=0.001 s,
> average=0.001 s; distance=0 kB, estimate=0 kB; lsn=FE49/7A5FBD08, redo
> lsn=FE49/7A5FBD08
> 2024-11-08 18:12:41.413 CET [12350] LOG: database system is ready to
> accept connections
>
> What could be wrong?
Hard to say, really. It would be interesting to see the backtrace from
the crash.
Considering you're able to trigger the issue easily, it shouldn't be too
difficult to attach GDB to a backend before running the query.
Alternatively, you can enable core files, and generate the backtrace
from that.
Presumably the index is a simple BRIN minmax index? Or what opclass does
it use? Any special parameters? Is the index working otherwise,
producing correct results?
regards
--
Tomas Vondra
The sender of this e-mail message does not promise nor conclude any contract on behalf of the company MicroStep HDO s.r.o., as our company enters into any contract exclusively in writing. If you have been sent this email in error, please notify the sender and delete this email.
On Sat, Nov 9, 2024 at 7:01 AM Tomas Vondra <tomas@vondra.me> wrote:
> Considering you're able to trigger the issue easily, it shouldn't be too
> difficult to attach GDB to a backend before running the query.
> Alternatively, you can enable core files, and generate the backtrace
> from that.

This query involves the use of a pageinspect function that accepts a raw
page image. There are some sanity checks of the page, but those are
quite lightweight. It's really not that hard to imagine it segfaulting
on a page image that passes those checks by mistake, but is nevertheless
not a valid BRIN page.

In any case this should be easy to debug: save the page image that the
function segfaults on, verify that it doesn't contain confidential
information, and then post it here. See:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#contrib.2Fpageinspect_page_dump

--
Peter Geoghegan
Sent: Saturday, November 9, 2024 5:07:48 PM
To: Ľuboslav Špilák <lspilak@microstep-hdo.sk>; pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
Subject: Re: Segmentation fault - PostgreSQL 17.0
On 11/9/24 14:02, Ľuboslav Špilák wrote:
> Ahoj/Hello.
>
> On the migrated db:
> in the postgres or public schema (I'm not sure now) I created a table
> with one column, int8 - cas (unixtime).
> Then I created a BRIN index on that column (by cas/unixtime),
> inserted only one row,
> and then vacuumed the table.
>
So is this a tiny single-row table? Did you create it on PG17, or before
running pg_upgrade?
> I want to check the BRIN index with these functions:
>
> |brin_metapage_info .. ok|
> |brin_revmap_data .. ok|
> |brin_page_items .. sigsegv|
>
> This happens repeatedly on my migrated db.
>
> On Monday I could try to create a new cluster / empty database and try
> the same again.
>
> I must google it to know how:
> "attach GDB to a backend before running the query.
> Alternatively, you can enable core files, and generate the backtrace "
>
There are wiki pages [1] and [2] with instructions on how to do this.
But in short: connect to the DB, get the PID using

  SELECT pg_backend_pid();

attach gdb to that backend

  gdb -p $PID

hit 'c' to continue running the program, and run the crashing query in
the client. The gdb session will stop on the segfault, and you'll be
able to get a backtrace with 'bt'.
> Does this function work correctly in your PG17 db?
>
It works for me, yes. This is what I tried:
create table t (a bigint);
insert into t values (1);
create index on t using brin (a);
select * from brin_metapage_info(get_raw_page('t_a_idx', 0));
   magic    | version | pagesperrange | lastrevmappage
------------+---------+---------------+----------------
 0xA8109CFA |       1 |           128 |              1
(1 row)
select * from brin_revmap_data(get_raw_page('t_a_idx', 1));
pages
-------
(2,1)
(0,0)
(0,0)
...
select * from brin_page_items(get_raw_page('t_a_idx', 2), 't_a_idx');
 itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty |  value
------------+--------+--------+----------+----------+-------------+-------+----------
          1 |      0 |      1 | f        | f        | f           | f     | {1 .. 1}
(1 row)
But this is just a very simple test.
regards
--
Tomas Vondra
On 11/9/24 17:35, Ľuboslav Špilák wrote:
> Hello.
>
> After pg_upgrade there were 200 timeseries tables in the xtimeseries
> database, each about 2 GB, with two indexes: one btree, and a second
> BRIN index on two columns (timeseries_id, time).
>
> I created a copy of one table and tried changing the table to a
> partitioned table, partitioned by time, one quarter per partition.
>
> Analyzed and vacuumed the table.
>
> Then I figured out it sigsegvs on the brin_page_items function,
> repeatedly.
>
> So THEN I tried creating a new test table on PG17. I don't know now if
> I created the table in a different database or only in a different
> schema, but in the same db cluster. A test table with one column, with
> one BRIN index on that column. I inserted only one row and vacuumed
> this test table. I tried again the sequence of three BRIN functions to
> check the BRIN index. The third function, brin_page_items, caused a
> sigsegv again.

I'm a bit confused about what exactly the failing cases are. But if
you're observing crashes even on tables/indexes created on PG17 after
the upgrade, it's unlikely related to the upgrade.

Please, get the backtrace when you have access to the system.

Is there anything special about the system? Which repository are you
using for postgres packages?

regards

--
Tomas Vondra
On 11/11/24 10:30, Ľuboslav Špilák wrote:
> Hello.
>
> After creating a new database cluster (5433) in PostgreSQL 17 there was
> no problem with calling the function
> select * from brin_page_items(
> get_raw_page(
>
> On the pg_upgraded cluster I got this backtrace on sigsegv. Is this
> helpful or do I need to include any more information?

Could you maybe try on a completely new 17.0 cluster, not one that went
through pg_upgrade? I don't think pg_upgrade should cause anything like
this, but it'd be good to conclusively rule that out by reproducing the
issue on a fresh cluster.

> (gdb) c
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00005627752205d5 in heap_compute_data_size
> (tupleDesc=tupleDesc@entry=0x5627a1df38c0,
> values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
> at ./build/../src/backend/access/common/heaptuple.c:234
> 234     ./build/../src/backend/access/common/heaptuple.c: No such file
> or directory.
> (gdb) bt
> #0  0x00005627752205d5 in heap_compute_data_size
> (tupleDesc=tupleDesc@entry=0x5627a1df38c0,
> values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
> at ./build/../src/backend/access/common/heaptuple.c:234

This is ... weird. heap_compute_data_size literally didn't change for
the last 9 years, so it's the same for 12 and 17. Line 234 is this:

Size
heap_compute_data_size(TupleDesc tupleDesc,
                       const Datum *values,
                       const bool *isnull)
{
    Size        data_length = 0;
    int         i;
    int         numberOfAttributes = tupleDesc->natts;

    for (i = 0; i < numberOfAttributes; i++)
    {
        Datum       val;
        Form_pg_attribute atti;

        if (isnull[i])
            continue;

        val = values[i];
        atti = TupleDescAttr(tupleDesc, i);

        if (ATT_IS_PACKABLE(atti) &&
            VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))

I wonder which of the conditions triggers the segfault: whether the one
accessing the attribute info (atti), or the one checking the pointer. It
has to be the first, because we're dealing with int8, and that's not a
varlena type, so it's not packable.

So my guess would be atti is some bogus pointer, with garbage. Could you
please print variables "i", "numberOfAttributes" and then also the
contents of tupleDesc and atti?

print i
print numberOfAttributes
print *tupleDesc
print *atti

regards

--
Tomas Vondra
On 11/11/24 09:25, Ľuboslav Špilák wrote:
> Hello.
>
> I am sending you the dump file from the command:
>
> postgres@hdoppxendb1:~$ PGOPTIONS="-c search_path=\"XEN_TS\"" \
>     psql -XAt -d "xtimeseries" -c "SELECT
>     encode(get_raw_page('test_idxbrin', 2),'base64')" \
>     | base64 -d > dump_block_2.page
>
> The steps for preparing the table and index are:
>
> CREATE TABLE test (
>     cas int8 NULL
> );
>
> CREATE INDEX test_idxbrin ON test USING brin (cas) WITH
> (pages_per_range='32');

It took me a while to get this working. It was failing for me with

  ERROR:  column "cas" does not exist

because the spaces in CREATE TABLE are actually not regular spaces, but
EN SPACES (U+2002), which we just consider not-whitespace, and include
them in the column name. Presumably they were added by the mail client.

> insert into test values (123)
>
> analyse test
>
> vacuum test
>
> CREATE extension pageinspect;
>
> SELECT brin_page_type(get_raw_page('test_idxbrin', 0));
>
> select * from "XEN_TS".brin_metapage_info(get_raw_page('test_idxbrin',0));
>
> select * from brin_revmap_data(get_raw_page('test_idxbrin',1)) limit 1000;
>
> select *
> from brin_page_items(
>     get_raw_page('test_idxbrin',2),
>     'test_idxbrin'
> );
>
> The last select returns this error:
>
> SQL Error [57P03]: FATAL: the database system is not yet accepting
> connections
> Detail: Consistent recovery state has not been yet reached.
>
> I am working on getting the backtrace.

Well, all of this works just fine for me :-( I even tried on a cluster
that went through the same PG12 -> PG17 pg_upgrade, but all of that
works. Even reading the page works fine:

test=# select lo_import('/tmp/dump_block_2.page');
 lo_import
-----------
     16443
(1 row)

test=# select * from brin_page_items(lo_get(16443), 'test_idxbrin');
 itemoffset | blknum | attnum | allnulls | hasnulls | placeholder | empty |    value
------------+--------+--------+----------+----------+-------------+-------+--------------
          1 |      0 |      1 | f        | f        | f           | f     | {123 .. 123}
(1 row)

Not sure what's going on. Can you maybe share which exact Ubuntu version
and packages you use? Is there anything special about the system? Do you
use extensions?

regards

--
Tomas Vondra
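The hidden-character problem described above can also be checked from SQL once the table exists. This is a sketch (it assumes the table is named test, as in the steps above); the POSIX regex matches any column name containing a character outside printable ASCII, such as an EN SPACE (U+2002) pasted in by a mail client:

```sql
-- List columns of "test" whose names contain non-printable-ASCII
-- characters (e.g. an EN SPACE glued onto the column name).
SELECT attnum, attname, quote_ident(attname) AS quoted
FROM pg_attribute
WHERE attrelid = 'test'::regclass
  AND attnum > 0
  AND NOT attisdropped
  AND attname::text ~ '[^ -~]';
```

If this returns a row, the column name picked up invisible characters and the CREATE TABLE statement should be re-typed by hand.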
On 11/11/24 15:22, Ľuboslav Špilák wrote:
> Hello.
>
>> Could you maybe try on a completely new 17.0 cluster, not one that went
>> through pg_upgrade? I don't think pg_upgrade should cause anything like
>> this, but it'd be good to conclusively rule that out by reproducing the
>> issue on a fresh cluster.
>
> We can't reproduce the problem on a completely new 17.0 cluster.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00005627752205d5 in heap_compute_data_size
> (tupleDesc=tupleDesc@entry=0x5627a1e1eea0,
> values=values@entry=0x7fff4744a450, isnull=isnull@entry=0x7fff4744a448)
> at ./build/../src/backend/access/common/heaptuple.c:234
> 234     ./build/../src/backend/access/common/heaptuple.c: No such file
> or directory.
>
> (gdb) print i
> $1 = 6
> (gdb) print numberOfAttributes
> $2 = <optimized out>
> (gdb) print *tupleDesc
> $3 = {natts = 7, tdtypeid = 2249, tdtypmod = 0, tdrefcount = -1,
> constr = 0x0, attrs = 0x5627a1e1eeb8}
> (gdb) print *atti
> $4 = {attrelid = 0, attname = {data = "value", '\000' <repeats 58
> times>}, atttypid = 25, attlen = -1, attnum = 7, attcacheoff = -1,
> atttypmod = -1, attndims = 0, attbyval = false,
> attalign = 105 'i', attstorage = 120 'x', attcompression = 0 '\000',
> attnotnull = false, atthasdef = false, atthasmissing = false,
> attidentity = 0 '\000', attgenerated = 0 '\000',
> attisdropped = false, attislocal = true, attinhcount = 0,
> attcollation = 100}

OK, this is really weird: the index you created clearly has just 1
attribute, but this descriptor says there are 7. Which means it likely
accesses garbage outside the actual BRIN tuple, and it's not surprising
it crashes on that.

That tuple descriptor however looks sane, so my guess is you actually
have multiple indexes with the same relname, in different schemas, and
this finds the wrong one first. That would also explain why it only
happens on an upgraded cluster: the new one won't have the other
indexes, of course.

What does

  SELECT * FROM pg_class WHERE relname = 'test_idxbrin';

say? My bet is it'll return multiple rows, one of which will have 7
attributes.

If this is the case, it's not a bug. As Peter explained, there are some
basic sanity checks, but there's not enough info to check everything. If
you pass a page as bytea with a mismatching index, a segfault is
expected (even if unfortunate). It's a power tool: if you hold it wrong,
you may get injured.

One solution is to use the fully qualified name of the index, including
the schema. Or always set the search_path.

regards

--
Tomas Vondra
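The fully-qualified-name suggestion, applied to the failing query, would look like this (a sketch; "XEN_TS" is the schema used earlier in the thread, quoted because it is mixed-case):

```sql
-- Qualify both the relation read by get_raw_page() and the regclass
-- argument of brin_page_items(), so they cannot resolve to different
-- indexes via search_path.
SELECT *
FROM brin_page_items(
    get_raw_page('"XEN_TS".test_idxbrin', 2),
    '"XEN_TS".test_idxbrin'::regclass
);
```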
On 11/11/24 16:20, Ľuboslav Špilák wrote:
> Hello.
>
> I had a similarly created table in a different schema, so there were
> truly 2 rows in the given select (but the 2nd one was created to test
> the problem), so even after removing one of them the problem still
> persists.
>
> select * from pg_class where relname = 'test_idxbrin';
>
> "oid","relname","relnamespace","reltype","reloftype","relowner","relam","relfilenode","reltablespace","relpages","reltuples","relallvisible","reltoastrelid","relhasindex","relisshared","relpersistence","relkind","relnatts","relchecks","relhasrules","relhastriggers","relhassubclass","relrowsecurity","relforcerowsecurity","relispopulated","relreplident","relispartition","relrewrite","relfrozenxid","relminmxid","relacl","reloptions","relpartbound"
> 1128187015,test_idxbrin,2200,0,0,10,3580,1128187015,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},
> 1128178819,test_idxbrin,16830,0,0,10,3580,1128178819,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},
>
> So we removed one of the tables with this index and now this select
> returned one row:
>
> select * from pg_class where relname = 'test_idxbrin';
>
> "oid","relname","relnamespace","reltype","reloftype","relowner","relam","relfilenode","reltablespace","relpages","reltuples","relallvisible","reltoastrelid","relhasindex","relisshared","relpersistence","relkind","relnatts","relchecks","relhasrules","relhastriggers","relhassubclass","relrowsecurity","relforcerowsecurity","relispopulated","relreplident","relispartition","relrewrite","relfrozenxid","relminmxid","relacl","reloptions","relpartbound"
> 1128178819,test_idxbrin,16830,0,0,10,3580,1128178819,0,3,0.0,0,0,false,false,p,i,1,0,false,false,false,false,false,true,n,false,0,"0","0",,{pages_per_range=32},
>
> Then we called the problematic function again and it crashed.

Ah, I see. I've been looking at this assuming the descriptor is for the
index, when in fact it's for the result, which actually has more
attributes (so my comment about the index having just 1 attribute was
misguided).

But now I noticed an interesting thing: if I print the descriptor in
heap_compute_data_size, I get this:

(gdb) p *tupleDesc
$1 = {natts = 8, tdtypeid = 2249, tdtypmod = 0, tdrefcount = -1,
constr = 0x0, attrs = 0xb2d29b0}

There are 8 attributes, not 7 (which is what you get). Well, the reason
is likely pretty simple: I'd bet you have pageinspect at version 1.11
(or older), which didn't know about empty ranges. 1.12 added that, and
the C code dutifully fills the field in. But the descriptor is derived
from the function signature, and that doesn't have that attribute. So it
tries to interpret 0 (= false) as a pointer, and that just segfaults.

If you do \dx (or select * from pg_extension), what version do you get
for pageinspect? And if you do "\df brin_page_items", does it have
"empty" as one of the output arguments?

You can try "alter extension pageinspect update" to update the function
signatures, etc. That should make the segfault go away.

I can reproduce this by installing pageinspect 1.11 and running the
brin_page_items() query. What a stupid bug, I should have thought about
this when adding the "empty" field.

Thanks for the report!

regards

--
Tomas Vondra
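To spell out the check and fix from this message as runnable statements (a sketch; the commands are the ones named above, and 1.12 is the pageinspect version that added the "empty" column):

```sql
-- Compare the installed extension version against what's available;
-- a pre-1.12 pageinspect lacks the "empty" output column of
-- brin_page_items(), mismatching the PG17 shared library.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pageinspect';

SELECT default_version, installed_version
FROM pg_available_extensions
WHERE name = 'pageinspect';

-- Update the SQL-level function signatures to match the library:
ALTER EXTENSION pageinspect UPDATE;
```

After the update, "\df brin_page_items" should list "empty" among the output columns and the query should no longer crash.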