Re: Segfault while creating logical replication slots on active DB 14.6-1 + 15.1-1 - Mailing list pgsql-bugs
From | Alex Richman |
---|---|
Subject | Re: Segfault while creating logical replication slots on active DB 14.6-1 + 15.1-1 |
Date | |
Msg-id | CAMnUB3oJ5rjxxrtB6vc15RdyqDeVtsfMn1aEehzWATw56VNtow@mail.gmail.com Whole thread Raw |
In response to | Re: Segfault while creating logical replication slots on active DB 14.6-1 + 15.1-1 (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Segfault while creating logical replication slots on active DB 14.6-1 + 15.1-1
|
List | pgsql-bugs |
Apologies for the delay (and happy christmas/new years).
Please find included a full backtrace[1] of a sample of this crash, replicated on postgres 15.1-1 in the same environment described in my original email. Included as a gist due to the length but lmk if it should be pasted in full for posterity. I've also added the python script[2] used to replicate, if that's relevant.
Unfortunately we have not been able to reproduce this in a clean room environment, however we can note a few additional things:
- This has occurred over multiple distinct servers with different data sets, though similar write loads. Suggesting it's not a specific server with data corruption.
- Disabling pg_repack, autovacuum, automatic reindexing, has no effect, the bug can still occur
- Running the same script on a read-only logical replica does not hit the bug
- As above, if the server is idle (no write traffic), then it does not hit the bug
- The bug occurs roughly 1 in every 10 executions of the create replication slot, so is not 100% consistent.
- We're fairly confident that this did not occur pre 14.5-1, and started occurring in 14.6-1 & 15.1-1.
So we would assume that there is some concurrent write traffic from our web tier that sometimes causes a segfault in the logical replication slot creation.
Please let me know if you need any more information.
Thanks,
- Alex.
On Thu, 15 Dec 2022 at 03:06, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Dec 15, 2022 at 4:35 AM Alex Richman <alexrichman@onesignal.com> wrote:
>
> We've noticed a segfault bug while creating logical replication slots. We've seen this on 14.6-1 and 15.1-1 (both installed from postgres apt-archive repos to debian 11 hosts). We're confident that this behaviour is not present in 14.5-1.
>
> We can only reproduce this if the server is under load (reads/writes from clients, perhaps also pg_repacks, reindexes etc). If the server is idle then we cannot reproduce it.
>
> Dmesg logs:
> 211935.284834] postgres[2297089]: segfault at 10 ip 000055907a33dd97 sp 00007fff6ebef068 error 4 in postgres[559079e4d000+51a000]
>
> Postgres logs:
> 2022-12-14 22:33:58 UTC 2297089STATEMENT: SELECT pg_create_logical_replication_slot('replica_b7fa97cc_28', 'pgoutput', false);
> 2022-12-14 22:33:58 UTC 778310LOG: server process (PID 2297089) was terminated by signal 11: Segmentation fault
> 2022-12-14 22:33:58 UTC 778310DETAIL: Failed process was running: SELECT pg_create_logical_replication_slot('replica_b7fa97cc_28', 'pgoutput', false);
> 2022-12-14 22:33:58 UTC 778310LOG: terminating any other active server processes
>
> Journal logs:
> Dec 14 22:33:58 c7a2da82-x60-postgres-persistence-onesignal kernel: postgres[2297089]: segfault at 10 ip 000055907a33dd97 sp 00007fff6ebef068 error 4 in postgres[559079e4d000+51a000]
> Dec 14 22:33:58 c7a2da82-x60-postgres-persistence-onesignal kernel: Code: 48 83 c4 08 49 89 c0 5b 41 5c 4c 89 c0 41 5d 5d c3 66 90 83 e3 02 75 b8 e9 71 18 b9 ff e9 d3 18 b9 ff 90 48 89 fe 48 8b 7f f8 <48> 8b 47 >
>
> Let me know if you need any more info.
>
It is difficult to diagnose this without more information because
there is no clear call stack or a reproducible scenario. Will it be
possible for you to get a reproducible test case or at least call
stack to proceed?
--
With Regards,
Amit Kapila.
pgsql-bugs by date: