Thread: BUG #18047: ODBC to PG long transaction causes PANIC

BUG #18047: ODBC to PG long transaction causes PANIC

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18047
Logged by:          Justin Tocci
Email address:      justin@tocci.org
PostgreSQL version: 15.3
Operating system:   FreeBSD
Description:

On the server, everything is going along nicely:
Aug  2 19:41:12 bvc postgres[20276]: [7-24]
Aug  2 19:41:12 bvc postgres[20276]: [7-25]
Aug  2 19:41:12 bvc postgres[20276]: [7-26]
Then the server crashes....
Aug  2 19:41:12 bvc envelope.exe[47831]: FILE:
common_websocket.c:408:WS_sendFrame_step2() ERROR  === CERR: 35 (Resource
temporarily unavailable) === write() failed
...
Aug  2 19:41:12 bvc envelope.exe[47831]: FILE:
common_websocket.c:435:WS_sendFrame_step2() ERROR  === CERR: 35 (Resource
temporarily unavailable) === should never get to EAGAIN with libev
Aug  2 19:41:12 bvc postgres[10161]: [5-1] 2023-08-02 19:41:12.502 CDT
[10161] PANIC:  could not open file "pg_wal/000000010000000000000006": No
such file or directory
Aug  2 19:41:12 bvc envelope.exe[47831]: FILE:
common_client.c:326:client_notify_cb() ERROR  === Lost postgresql
connection: server closed the connection unexpectedly    This probably means
the server terminated abnormally    before or while processing the
request.
Aug  2 19:41:12 bvc syslogd: last message repeated 3 times
Aug  2 19:41:12 bvc postgres[13049]: [7-1] 2023-08-02 19:41:12.627 CDT
[13049] FATAL:  the database system is in recovery mode

OK, we got to recovery mode...

Aug  2 19:41:12 bvc envelope.exe[47831]: FILE:
common_client.c:241:cnxn_reset_cb() ERROR  === Connect failed: connection to
server on socket "/tmp/.s.PGSQL.5434" failed: FATAL:  the database system is
in recovery mode
Aug  2 19:41:12 bvc postgres[13568]: [7-1] 2023-08-02 19:41:12.628 CDT
[13568] FATAL:  the database system is in recovery mode
Aug  2 19:41:12 bvc postgres[13623]: [7-1] 2023-08-02 19:41:12.628 CDT
[13623] FATAL:  the database system is in recovery mode
Aug  2 19:41:12 bvc envelope.exe[47831]: FILE:
common_client.c:241:cnxn_reset_cb() ERROR  === Connect failed: connection to
server on socket "/tmp/.s.PGSQL.5434" failed: FATAL:  the database system is
in recovery mode
Aug  2 19:41:12 bvc syslogd: last message repeated 1 times
Aug  2 19:41:12 bvc postgres[15492]: [8-1] 2023-08-02 19:41:12.638 CDT
[15492] FATAL:  the database system is in recovery mode
Aug  2 19:41:12 bvc envelope.exe[47831]: FILE:
common_client.c:241:cnxn_reset_cb() ERROR  === Connect failed: connection to
server on socket "/tmp/.s.PGSQL.5434" failed: FATAL:  the database system is
in recovery mode
Aug  2 19:41:12 bvc postgres[14032]: [15-1] 2023-08-02 19:41:12.838 CDT
[14032] PANIC:  could not open file "pg_wal/000000010000000000000006": No
such file or directory

Oh no! A PANIC! Apparently we could not recover.

From the client I get nothing useful:
FATAL
common_client.c:cnxn_cb: FATAL
common_client.c:client_notify_cb: PQresetStart failed
cannot start connect: connection to server on socket "/tmp/.s.PGSQL.5434"
failed: No such file or directory
    Is the server running locally and accepting connections on that socket?
last message repeated 1 times

The quickest way to recover is to su - postgres and run:
pg_resetwal -f /opt/env.d/5434/data
exit
service postgresql restart 5434

and we are back in business.

I suspect any long script will be a problem, but I happen to have the schema
that crashes the server for me. It's here: https://pastebin.com/Gr6gvPHa
but it is long; it's a Sage 100 table schema. I would say every one to four
times I run this script it crashes the server. We have tried this script
various ways and the result is the same, but I am happy to take suggestions.


The fourth run with psql -p 5434 -1f sage.sql crashed and gave me the same
output as above:

Aug  2 20:10:12 bvc postgres[99759]: [5-1] 2023-08-02 20:10:12.687 CDT
[99759] ERROR:  syntax error at or near "dsafasdfds" at character 1
Aug  2 20:10:12 bvc postgres[99759]: [5-2] 2023-08-02 20:10:12.687 CDT
[99759] STATEMENT:  dsafasdfds
Aug  2 20:10:13 bvc postgres[20]: [5-1] 2023-08-02 20:10:13.981 CDT [20]
ERROR:  syntax error at or near "dsafasdfds" at character 1
Aug  2 20:10:13 bvc postgres[20]: [5-2] 2023-08-02 20:10:13.981 CDT [20]
STATEMENT:  dsafasdfds
Aug  2 20:10:14 bvc postgres[710]: [5-1] 2023-08-02 20:10:14.823 CDT [710]
ERROR:  syntax error at or near "dsafasdfds" at character 1
Aug  2 20:10:14 bvc postgres[710]: [5-2] 2023-08-02 20:10:14.823 CDT [710]
STATEMENT:  dsafasdfds
Aug  2 20:10:15 bvc postgres[88314]: [5-1] 2023-08-02 20:10:15.040 CDT
[88314] PANIC:  could not open file "pg_wal/000000010000000000000008": No
such file or directory
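
For anyone trying to reproduce this, the invocation above amounts to roughly
the following sketch. The loop count is only illustrative of "every one to
four runs", sage.sql is the Sage 100 schema from the pastebin link, and the
database name is left to default as in the original command:

# Re-run the schema load in a single transaction until a backend PANICs.
for i in 1 2 3 4; do
    psql -p 5434 -1 -f sage.sql || break   # -1 = --single-transaction
done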


Re: BUG #18047: ODBC to PG long transaction causes PANIC

From
Julien Rouhaud
Date:
Hi,
On Thu, Aug 03, 2023 at 12:08:08AM -0500, justin tocci wrote:

> found it! had to set security.bsd.hardlink_check_gid=0 to give postgres the
> ability to create hard links.  please disregard.

Not sure how it was related, especially to the missing WAL part.  In any case,
the full postgres log on the server side should give you an explanation.

> quickest way to recover is to su - postgres:
> pg_resetwal -f /opt/env.d/5434/data
> exit
> service postgresql restart 5434
>
> and we are back in business.

Just to be clear, this is *NOT* a way to recover; it is a way to irremediably
corrupt your data.

If you care about your data, you should do a fresh initdb and recover from your
latest backup. If you have no backup, try to extract the data from what's left
on your current instance, keeping in mind that you can't assume that any row
that is visible is supposed to be visible, or that rows you expect to be there
haven't disappeared.
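
As a rough sketch of that recommended path (the data directory and port are
taken from this report, the service invocation mirrors the restart command
quoted above, and the backup file name is only a placeholder):

# Stop the damaged cluster and set it aside for later inspection.
service postgresql stop 5434
mv /opt/env.d/5434/data /opt/env.d/5434/data.broken

# Create a fresh cluster as the postgres user and start it.
su - postgres -c 'initdb -D /opt/env.d/5434/data'
service postgresql start 5434

# Restore the latest logical backup, assuming a plain-SQL dump (placeholder path).
su - postgres -c 'psql -p 5434 -f /path/to/latest_backup.sql postgres'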



Re: BUG #18047: ODBC to PG long transaction causes PANIC

From
Justin Tocci
Date:
Hi Julien!

I eventually gave up and went back to PG 14. Same problem (big clue), but
this time initdb gave me a "could not link" / "operation not permitted" error
just before the unhelpful WAL error.

I had posted the logs from the postgres server. Do you mean something other
than that by 'full postgres log'? Very interested to know if I missed something.

You are correct; the 'quickest way' was for a test server that was not stable
to begin with. Not something you would want to do in production. Thank you
for pointing that out. I'd hate to have someone reset their WAL on my
recommendation.

Best Regards,

Justin Tocci
817-503-9545

On 8/3/2023 12:14 AM, Julien Rouhaud wrote:
> Hi,
> On Thu, Aug 03, 2023 at 12:08:08AM -0500, justin tocci wrote:
>
>> found it! had to set security.bsd.hardlink_check_gid=0 to give postgres the
>> ability to create hard links.  please disregard.
> Not sure how it was related, especially to the missing WAL part.  In any case,
> the full postgres log on the server side should give you an explanation.
>
>> quickest way to recover is to su - postgres:
>> pg_resetwal -f /opt/env.d/5434/data
>> exit
>> service postgresql restart 5434
>>
>> and we are back in business.
> Just to be clear, this is *NOT* a way to recover; it is a way to irremediably
> corrupt your data.
>
> If you care about your data, you should do a fresh initdb and recover from your
> latest backup. If you have no backup, try to extract the data from what's left
> on your current instance, keeping in mind that you can't assume that any row
> that is visible is supposed to be visible, or that rows you expect to be there
> haven't disappeared.



Re: BUG #18047: ODBC to PG long transaction causes PANIC

From
Thomas Munro
Date:
On Thu, Aug 3, 2023 at 5:08 PM justin tocci <justin@tocci.org> wrote:
> found it! had to set security.bsd.hardlink_check_gid=0 to give postgres the ability to create hard links.

I didn't look closely but it's surprising to me that we didn't log
that link() failed in 15, or cope with it better, or that the gids
didn't match in the first place.  But anyway, for the record, this
link() usage is removed in 16 (now in beta).

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=dac1ff3
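
For anyone hitting the same thing, a quick way to check whether this FreeBSD
restriction is what is biting. The exact ownership mismatch was not shown in
this thread, so treat this as a diagnostic sketch; the data path is the one
from the report:

sysctl security.bsd.hardlink_check_gid     # 1 = unprivileged users cannot hard-link files of groups they are not in
id postgres                                # groups the postgres user belongs to
ls -ln /opt/env.d/5434/data/pg_wal | head  # numeric uid/gid on the WAL segments

# The workaround used earlier in this thread; it is system-wide, so weigh the
# security trade-off (add to /etc/sysctl.conf to persist across reboots).
sysctl security.bsd.hardlink_check_gid=0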