Re: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY
Date
Msg-id 883b7a3c-84ac-4825-a73e-6772c81bc1e1@vondra.me
In response to RE: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY  (Ana Almeida <Ana.Almeida@timestamp.pt>)
Responses RE: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY
List pgsql-bugs
On 3/18/26 15:54, Ana Almeida wrote:
> Hello Jim,
> 
> I didn’t notice that the error showed the schema and table name. For
> confidentiality reasons, could you please not share the schema and table
> name if this is released as a bug?
> 
> Here is the information:
> 
>  
> 
>                                              Table "myschema.mytable"
>
>        Column       |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
> --------------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
>  id                 | bigint                      |           | not null |         | plain    |             |              |
>  axxxxxx            | character varying(32)       |           | not null |         | extended |             |              |
>  bxx                | text                        |           | not null |         | extended |             |              |
>  cxxxxxxx           | text                        |           | not null |         | extended |             |              |
>  dxxxxxxxx          | text                        |           |          |         | extended |             |              |
>  lag_val            | text                        |           |          |         | extended |             |              |
>  exxxxxxxxxx        | text                        |           |          |         | extended |             |              |
>  fxxxxxxxxxxxxx     | text                        |           |          |         | extended |             |              |
>  gxxxxxxxxxxxx      | text                        |           |          |         | extended |             |              |
>  hxxxxxx            | numeric                     |           | not null |         | main     |             |              |
>  ixxxxxxxxxxxxxx    | numeric                     |           |          |         | main     |             |              |
>  jxxxxxxxxxxxxxx    | numeric                     |           |          |         | main     |             |              |
>  kxxxxxx            | integer                     |           |          |         | plain    |             |              |
>  lxxxxxxxxxxxx      | integer                     |           | not null |         | plain    |             |              |
>  mxxxxxxxxxxxxxx    | timestamp without time zone |           |          |         | plain    |             |              |
>  nxxxxxxxxxxxxx     | timestamp without time zone |           |          |         | plain    |             |              |
>  oxxxxxxxxxxxx      | timestamp without time zone |           |          |         | plain    |             |              |
>  pxxxxxxxxxxx       | timestamp without time zone |           | not null |         | plain    |             |              |
>  qr_mydb_id         | bigint                      |           |          |         | plain    |             |              |
>  qxxxxxx            | character varying(100)      |           |          |         | extended |             |              |
> Indexes:
>     "mytable_pkey" PRIMARY KEY, btree (id)
>     "idx_lag_val" btree (lag_val)
>     "idx_mytable_qr_mydb" btree (qr_mydb_id)
> Foreign-key constraints:
>     "fk__mytable__qr_mydb" FOREIGN KEY (qr_mydb_id) REFERENCES myschema.qr_mydb(id)
> Access method: heap
> Options: autovacuum_enabled=true, toast.autovacuum_enabled=true
> 
>  
> 
> Just another note, before we also had the error below in the same
> reindex command. The database didn’t crash when that error happened but
> the reindex failed. After that, we recreated the table.
> 
>  
> 
> ERROR:  could not open file "base/179146/184526.4" (target block
> 808464432): previous segment is only 99572 blocks
> 


So what was the sequence of events, exactly? You got this "could not
open file" error during REINDEX CONCURRENTLY, you recreated the table
and then it crashed on some later REINDEX CONCURRENTLY?

How did you recreate the table? Did you reload it from a backup or
something else?

>
> We haven’t been able to reproduce the errors again.
> 

That suggests it might have been some sort of data corruption, but it's
just a guess. Have you checked the server log for any messages
suggesting e.g. storage or memory issues?
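
As a rough sanity check on the corruption guess, the numbers in the
earlier "could not open file" error can be unpacked with a few lines of
Python. This is only a sketch: it assumes the default 8 kB block size and
1 GB segment size, and it assumes the "previous segment" in the message is
segment 3 (the one before the ".4" file). The function name and dictionary
keys are made up for illustration.

```python
BLOCK_SIZE = 8192                 # assumption: default BLCKSZ
SEGMENT_BYTES = 1 << 30           # assumption: default 1 GB segment files
BLOCKS_PER_SEGMENT = SEGMENT_BYTES // BLOCK_SIZE   # 131072 blocks


def describe_block(target_block, last_segment_no, last_segment_blocks):
    """Summarize a suspicious block number from the error message."""
    raw = target_block.to_bytes(4, "little")
    return {
        "hex": hex(target_block),
        # The same four bytes interpreted as text.
        "ascii": raw.decode("ascii", errors="replace"),
        # Segment file the block would live in, if it were real.
        "expected_segment": target_block // BLOCKS_PER_SEGMENT,
        # Relation size implied by the short last segment.
        "relation_blocks": last_segment_no * BLOCKS_PER_SEGMENT
        + last_segment_blocks,
    }


# Values taken from the error message quoted above.
info = describe_block(808464432, last_segment_no=3, last_segment_blocks=99572)
for key, value in info.items():
    print(f"{key}: {value}")
```

Under those assumptions the target block is 0x30303030, which is the
four-character ASCII string "0000", and it lies far beyond the roughly
492788 blocks the relation seems to have. A block number that looks like
text would be consistent with a TID having been overwritten by unrelated
data, though it does not say where that happened.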

Per the backtrace you shared in the previous message, the segfault
happened here:

  #0  0x00000000005d67a8 validate_index_callback (postgres)
  #1  0x00000000005738bd btvacuumpage (postgres)
  #2  0x0000000000573d8a btvacuumscan (postgres)
  #3  0x0000000000573f00 btbulkdelete (postgres)
  ...

This is very heavily exercised code, so I'm somewhat skeptical that a bug
would go unnoticed for very long. It's possible, of course. But
validate_index_callback doesn't do all that much - it just writes the
TID value to a tuplesort / temporary file.
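
To illustrate how little the callback does, here is a rough Python model
of it. This is not the PostgreSQL source: the real function is C, the
list below merely stands in for the tuplesort, and the packing of a TID
(block number, offset number) into one 64-bit integer is my
understanding of the encoding rather than a quote from the code. The
ValidateState class is hypothetical.

```python
def itemptr_encode(block, offset):
    """Pack (block number, offset number) into a single integer."""
    assert 0 <= block < 2**32 and 0 <= offset < 2**16
    return (block << 16) | offset


def itemptr_decode(encoded):
    """Inverse of itemptr_encode."""
    return encoded >> 16, encoded & 0xFFFF


class ValidateState:
    """Hypothetical stand-in for the state passed to the callback."""

    def __init__(self):
        self.encoded_tids = []   # plays the role of the tuplesort
        self.itups = 0           # number of index tuples seen


def validate_index_callback(tid, state):
    # Record the TID; nothing else is inspected or dereferenced here
    # apart from the TID itself and the state object.
    state.encoded_tids.append(itemptr_encode(*tid))
    state.itups += 1
    return False  # never ask the bulk-delete scan to remove the tuple


state = ValidateState()
validate_index_callback((99572, 7), state)
validate_index_callback((0, 1), state)
print(state.itups, [itemptr_decode(e) for e in state.encoded_tids])
```

If the model is roughly right, a crash inside the callback would most
likely involve either the TID pointer handed in by the btree scan or the
state pointer, which is what a better backtrace could help distinguish.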

It seems you have the core saved in a file:

> Storage: /var/lib/systemd/coredump/core.postgres.26.0a32...

Can you try getting a better backtrace using gdb? It might
tell us if there's a bogus pointer or something like that. Or maybe not,
chances are the compiler optimized out some of the variables, but it's worth
a try.
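
Something along these lines should do it. The small Python helper below
only assembles the gdb command line; both paths are placeholders and need
to be replaced with the actual postgres binary and the core file on your
system. If the core is stored compressed, it may need to be decompressed
first, and debug symbols for the postgres package would make the output
far more useful.

```python
import shlex


def gdb_backtrace_command(postgres_binary, core_file):
    """Build a non-interactive gdb invocation that prints a full backtrace."""
    return [
        "gdb", "--batch",
        "-ex", "bt full",          # backtrace including local variables
        "-ex", "info registers",   # register contents at the crash
        postgres_binary,
        core_file,
    ]


# Placeholder paths, not the real locations.
cmd = gdb_backtrace_command("/path/to/postgres", "/path/to/core")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed
to subprocess.run() to capture the output into a file.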


regards

-- 
Tomas Vondra



