Thread: BUG #18047: ODBC to PG long transaction causes PANIC
The following bug has been logged on the website:

Bug reference:      18047
Logged by:          Justin Tocci
Email address:      justin@tocci.org
PostgreSQL version: 15.3
Operating system:   FreeBSD
Description:

On the server, everything is going nicely:

Aug 2 19:41:12 bvc postgres[20276]: [7-24]
Aug 2 19:41:12 bvc postgres[20276]: [7-25]
Aug 2 19:41:12 bvc postgres[20276]: [7-26]

Then the server crashes...

Aug 2 19:41:12 bvc envelope.exe[47831]: FILE: common_websocket.c:408:WS_sendFrame_step2() ERROR === CERR: 35 (Resource temporarily unavailable) === write() failed
...
Aug 2 19:41:12 bvc envelope.exe[47831]: FILE: common_websocket.c:435:WS_sendFrame_step2() ERROR === CERR: 35 (Resource temporarily unavailable) === should never get to EAGAIN with libev
Aug 2 19:41:12 bvc postgres[10161]: [5-1] 2023-08-02 19:41:12.502 CDT [10161] PANIC: could not open file "pg_wal/000000010000000000000006": No such file or directory
Aug 2 19:41:12 bvc envelope.exe[47831]: FILE: common_client.c:326:client_notify_cb() ERROR === Lost postgresql connection: server closed the connection unexpectedly
        This probably means the server terminated abnormally before or while processing the request.
Aug 2 19:41:12 bvc syslogd: last message repeated 3 times
Aug 2 19:41:12 bvc postgres[13049]: [7-1] 2023-08-02 19:41:12.627 CDT [13049] FATAL: the database system is in recovery mode

OK, we got to recovery mode...

Aug 2 19:41:12 bvc envelope.exe[47831]: FILE: common_client.c:241:cnxn_reset_cb() ERROR === Connect failed: connection to server on socket "/tmp/.s.PGSQL.5434" failed: FATAL: the database system is in recovery mode
Aug 2 19:41:12 bvc postgres[13568]: [7-1] 2023-08-02 19:41:12.628 CDT [13568] FATAL: the database system is in recovery mode
Aug 2 19:41:12 bvc postgres[13623]: [7-1] 2023-08-02 19:41:12.628 CDT [13623] FATAL: the database system is in recovery mode
Aug 2 19:41:12 bvc envelope.exe[47831]: FILE: common_client.c:241:cnxn_reset_cb() ERROR === Connect failed: connection to server on socket "/tmp/.s.PGSQL.5434" failed: FATAL: the database system is in recovery mode
Aug 2 19:41:12 bvc syslogd: last message repeated 1 times
Aug 2 19:41:12 bvc postgres[15492]: [8-1] 2023-08-02 19:41:12.638 CDT [15492] FATAL: the database system is in recovery mode
Aug 2 19:41:12 bvc envelope.exe[47831]: FILE: common_client.c:241:cnxn_reset_cb() ERROR === Connect failed: connection to server on socket "/tmp/.s.PGSQL.5434" failed: FATAL: the database system is in recovery mode
Aug 2 19:41:12 bvc postgres[14032]: [15-1] 2023-08-02 19:41:12.838 CDT [14032] PANIC: could not open file "pg_wal/000000010000000000000006": No such file or directory

Oh no! A PANIC! Apparently we could not recover.

From the client I get nothing useful:

FATAL common_client.c:cnxn_cb:
FATAL common_client.c:client_notify_cb: PQresetStart failed
cannot start connect: connection to server on socket "/tmp/.s.PGSQL.5434" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?
last message repeated 1 times

The quickest way to recover is to su - postgres:

pg_resetwal -f /opt/env.d/5434/data
exit
service postgresql restart 5434

and we are back in business.

I suspect any long script will be a problem, but I happen to have a schema that crashes the server for me. It's here: https://pastebin.com/Gr6gvPHa (it is long; it's a Sage 100 table schema). I'd say every one to four runs of this script, it crashes the server. We have tried running the script various ways and the result is the same, but I am happy to take suggestions.
A fourth run, with:

psql -p 5434 -1f sage.sql

crashed and gave me the same output as above.

Aug 2 20:10:12 bvc postgres[99759]: [5-1] 2023-08-02 20:10:12.687 CDT [99759] ERROR: syntax error at or near "dsafasdfds" at character 1
Aug 2 20:10:12 bvc postgres[99759]: [5-2] 2023-08-02 20:10:12.687 CDT [99759] STATEMENT: dsafasdfds
Aug 2 20:10:13 bvc postgres[20]: [5-1] 2023-08-02 20:10:13.981 CDT [20] ERROR: syntax error at or near "dsafasdfds" at character 1
Aug 2 20:10:13 bvc postgres[20]: [5-2] 2023-08-02 20:10:13.981 CDT [20] STATEMENT: dsafasdfds
Aug 2 20:10:14 bvc postgres[710]: [5-1] 2023-08-02 20:10:14.823 CDT [710] ERROR: syntax error at or near "dsafasdfds" at character 1
Aug 2 20:10:14 bvc postgres[710]: [5-2] 2023-08-02 20:10:14.823 CDT [710] STATEMENT: dsafasdfds
Aug 2 20:10:15 bvc postgres[88314]: [5-1] 2023-08-02 20:10:15.040 CDT [88314] PANIC: could not open file "pg_wal/000000010000000000000008": No such file or directory
Hi,

On Thu, Aug 03, 2023 at 12:08:08AM -0500, justin tocci wrote:
> found it! had to set security.bsd.hardlink_check_gid=0 to give postgres the
> ability to create hard links. please disregard.

Not sure how it was related, especially to the missing WAL part. In any case,
the full postgres log on the server side should give you an explanation.

> quickest way to recover is to su - postgres:
> pg_resetwal -f /opt/env.d/5434/data
> exit
> service postgresql restart 5434
>
> and we are back in business.

Just to be clear, this is *NOT* a way to recover; this is a way to
irremediably corrupt your data.

If you care about your data you should do a fresh initdb and recover from
your latest backup, or, if you have no backup, try to extract the data from
what's left on your current instance, bearing in mind that you can't assume
that any visible row is supposed to be visible, or that the rows you expect
are still there.
Hi Julien!

I eventually gave up and went back to pg14. Same problem (big clue), but this
time initdb gave me a "could not link" and "operation not permitted" error
just before the unhelpful WAL error.

I had posted the logs from the postgres server. Do you mean something other
than that by 'full postgres log'? Very interested to know if I missed
something.

You are correct, 'quickest way' was for a test server that was already
unstable. Not something you would want to do in production. Thank you for
pointing that out. I'd hate to have someone reset their WAL on my
recommendation.

Best Regards,

Justin Tocci
817-503-9545

On 8/3/2023 12:14 AM, Julien Rouhaud wrote:
> Hi,
>
> On Thu, Aug 03, 2023 at 12:08:08AM -0500, justin tocci wrote:
>
>> found it! had to set security.bsd.hardlink_check_gid=0 to give postgres the
>> ability to create hard links. please disregard.
>
> Not sure how it was related, especially to the missing WAL part. In any case
> the full postgres log on the server side should give you an explanation.
>
>> quickest way to recover is to su - postgres:
>> pg_resetwal -f /opt/env.d/5434/data
>> exit
>> service postgresql restart 5434
>>
>> and we are back in business.
>
> Just to be clear, this is *NOT* a way to recover, this is a way to irremediably
> corrupt your data.
>
> If you care about your data you should do a fresh initdb and recover from your
> latest backup, or if you have no backup try to extract the data from what's
> left on your current instance, with the fact that you can't assume that any
> visible row is supposed to be visible and that expected rows are not there
> anymore.
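For anyone who wants to confirm the FreeBSD side of this without running
initdb, the failing primitive can be exercised directly. A minimal standalone
sketch (the file paths are hypothetical; run it as the postgres user against
a file owned by a group that user is not a member of, with
security.bsd.hardlink_check_gid=1):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	/*
	 * Hypothetical paths: src should be owned by a group the calling
	 * user does not belong to, so the sysctl check fires.
	 */
	const char *src = "/tmp/linktest.src";
	const char *dst = "/tmp/linktest.dst";

	if (link(src, dst) != 0)
	{
		/*
		 * With security.bsd.hardlink_check_gid=1 and a mismatched
		 * group, link() fails with EPERM ("Operation not permitted"),
		 * matching the initdb error reported above.
		 */
		fprintf(stderr, "link(%s, %s): %s\n", src, dst, strerror(errno));
		return 1;
	}
	printf("link succeeded\n");
	return 0;
}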
On Thu, Aug 3, 2023 at 5:08 PM justin tocci <justin@tocci.org> wrote:
> found it! had to set security.bsd.hardlink_check_gid=0 to give postgres the
> ability to create hard links.

I didn't look closely, but it's surprising to me that we didn't log that
link() failed in 15, or cope with it better, or that the gids didn't match in
the first place. But anyway, for the record, this link() usage is removed in
16 (now in beta).

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=dac1ff3
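For context, the idiom being removed is, very roughly, "install a file under
a new name with link(), then unlink the old name", which refuses to silently
overwrite an existing target but depends on hard-link permission; plain
rename() replaces the target atomically and never calls link(), so the
FreeBSD sysctl can't veto it. A simplified sketch of the two strategies (not
the actual PostgreSQL source; the paths are hypothetical stand-ins for a
recycled WAL segment and its new name):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Pre-16 style: link() then unlink(). Fails with EEXIST instead of
 * overwriting an existing target -- and fails with EPERM when
 * security.bsd.hardlink_check_gid forbids the link.
 */
static int
install_via_link(const char *oldpath, const char *newpath)
{
	if (link(oldpath, newpath) != 0)
		return -1;
	return unlink(oldpath);
}

/*
 * 16 style: plain rename(), which atomically replaces any existing
 * target and needs no hard-link permission.
 */
static int
install_via_rename(const char *oldpath, const char *newpath)
{
	return rename(oldpath, newpath);
}

int
main(void)
{
	const char *oldpath = "/tmp/walsim.old";
	const char *newpath = "/tmp/walsim.new";

	if (install_via_link(oldpath, newpath) != 0)
		fprintf(stderr, "link-style install: %s\n", strerror(errno));
	else if (install_via_rename(newpath, oldpath) != 0)
		fprintf(stderr, "rename-style install: %s\n", strerror(errno));
	else
		printf("both installs succeeded\n");
	return 0;
}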