Thread: BUG #17421: Core dump in ECPGdo() when calling PostgreSQL API from 32-bit client for RHEL8

BUG #17421: Core dump in ECPGdo() when calling PostgreSQL API from 32-bit client for RHEL8

From

PG Bug reporting form

Date:

27 February 2022, 08:21:32

The following bug has been logged on the website:

Bug reference:      17421
Logged by:          Masayuki Hirose
Email address:      hirose.masay-01@jp.fujitsu.com
PostgreSQL version: 12.1
Operating system:   Red Hat Enterprise Linux
Description:

Core dump in ECPGdo() when calling PostgreSQL API from 32-bit client for
RHEL8

Hello, I have encountered a core dump in ECPGdo().
Have you ever seen this error?
--------
(gdb) where
#0  0xf7163dc7 in strlen () from /usr/lib/libc.so.6
#1  0x08169faf in dopr ()
#2  0x08169c19 in pg_vsnprintf ()
#3  0x08169c6b in pg_snprintf ()
#4  0x08153ec7 in ecpg_raise_backend ()
#5  0x081540bb in ecpg_check_Pqresult ()
#6  0x0814da6a in ecpg_autostart_transaction ()
#7  0x0814ebc6 in ecpg_do ()
#8  0x0814ec79 in ECPGdo ()
#9  0xf7f96e40 in TJVvDatabaseAPI::_ExecSQL (this=0x9fb73f0, ctxnum=0,
--------

I am using the 32-bit client for RHEL8 and it calls PostgreSQL API, thus I
encountered this error.

Regards,
Masa

Re: BUG #17421: Core dump in ECPGdo() when calling PostgreSQL API from 32-bit client for RHEL8

From

Michael Paquier

Date:

27 February 2022, 15:20:57

On Sun, Feb 27, 2022 at 05:21:32AM +0000, PG Bug reporting form wrote:
> Have you ever seen this error?

No such issue has been reported AFAIK.  If you really are on 12.1, you
may want to update to the latest version of 12.X and retry if the
error still shows up.

> --------
> (gdb) where
> #0  0xf7163dc7 in strlen () from /usr/lib/libc.so.6
> #1  0x08169faf in dopr ()
> #2  0x08169c19 in pg_vsnprintf ()
> #3  0x08169c6b in pg_snprintf ()
> #4  0x08153ec7 in ecpg_raise_backend ()
> #5  0x081540bb in ecpg_check_Pqresult ()
> #6  0x0814da6a in ecpg_autostart_transaction ()
> #7  0x0814ebc6 in ecpg_do ()
> #8  0x0814ec79 in ECPGdo ()
> #9  0xf7f96e40 in TJVvDatabaseAPI::_ExecSQL (this=0x9fb73f0, ctxnum=0,
> --------
>
> I am using the 32-bit client for RHEL8 and it calls PostgreSQL API, thus I
> encountered this error.

Hm.  Could you isolate that in a self-contained test case?  Based on
this trace, it looks like "message" is NULL, which may be possible
because pqInternalNotice() missed something?  I would not bet on
errorMessage being NULL, but there may be holes..
--
Michael

Re: BUG #17421: Core dump in ECPGdo() when calling PostgreSQL API from 32-bit client for RHEL8

From

Tom Lane

Date:

27 February 2022, 19:30:58

Michael Paquier <michael@paquier.xyz> writes:
> Hm.  Could you isolate that in a self-contained test case?  Based on
> this trace, it looks like "message" is NULL, which may be possible
> because pqInternalNotice() missed something?  I would not bet on
> errorMessage being NULL, but there may be holes..

Yeah.  It seems likely that this is a longstanding ecpglib bug
that was previously masked by platform snprintfs not crashing
on printf("%s", NULL).  If so, it's masked again in 12.8 and
later (cf 3779ac62d), but it's still a bug in that ecpg won't
print anything useful when this edge condition --- whatever it
is --- happens.  So, could we see a test case?

            regards, tom lane
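
To illustrate the failure mode discussed above: in C, passing a NULL pointer to a "%s" conversion is undefined behavior, so some snprintf implementations print a placeholder while others crash inside strlen(). The following is a minimal sketch of a defensive wrapper, not ecpglib code. The names str_or_placeholder and format_error, and the "(no message)" placeholder text, are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: never hand a NULL pointer to a %s conversion. */
static const char *
str_or_placeholder(const char *s)
{
    return s ? s : "(no message)";
}

/*
 * Hypothetical helper: formats "<message> on line <line>" into buf,
 * the same shape as the sqlca message copy shown in the backtrace path.
 * Returns whatever snprintf returns.
 */
static int
format_error(char *buf, size_t buflen, const char *message, int line)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%s on line %d",
                    str_or_placeholder(message), line);
}
```

Calling format_error() with a NULL message then yields a readable string instead of dereferencing NULL. A guard like this only hides the symptom; the caller still needs a meaningful message to report.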

RE: BUG #17421: Core dump in ECPGdo() when calling PostgreSQL API from 32-bit client for RHEL8

From

"hirose.masay-01@fujitsu.com"

Date:

05 March 2022, 09:45:21

Hi Tom and Michael,
>Michael Paquier <michael@paquier.xyz> writes:
>> Hm.  Could you isolate that in a self-contained test case?  Based on
>> this trace, it looks like "message" is NULL, which may be possible
>> because pqInternalNotice() missed something?  I would not bet on
>> errorMessage being NULL, but there may be holes..
>
>Yeah.  It seems likely that this is a longstanding ecpglib bug that was previously masked by platform snprintfs not
>crashing on printf("%s", NULL).  If so, it's masked again in 12.8 and later (cf 3779ac62d), but it's still a bug in that
>ecpg won't print anything useful when this edge condition --- whatever it is --- happens.  So, could we see a test case?
>
>            regards, tom lane
My test case to reproduce the issue is:
1. The client connects Postgres Database and issues SQL continuously.
2. Switch the Database role from Active to Standby.
The Database is mirrored by the Mirroring Controller between two clustered servers. The Mirroring Controller may be the
original feature added by the enterprise.
Please let me know if you have any comments or advice.
Regards,

Re: BUG #17421: Core dump in ECPGdo() when calling PostgreSQL API from 32-bit client for RHEL8

From

Michael Paquier

Date:

05 March 2022, 13:29:23

On Sat, Mar 05, 2022 at 06:45:21AM +0000, hirose.masay-01@fujitsu.com wrote:
> My test case to reproduce the issue is:
> 1. The client connects Postgres Database and issues SQL continuously.
> 2. Switch the Database role from Active to Standby.
> The Database is mirrored by the Mirroring Controller between two
> clustered servers. the Mirroring Controller may be the original
> feature added by the enterprise.

A self-contained test case enters in the category of an ECPG script
that we could use to reproduce the problem.  Personally, I have no
idea what kind of application stack you are using, and I don't know
TJVvDatabaseAPI, which I suspect is a proprietary solution for
something related to databases.  The information you are providing
here is not enough for one to know how to reproduce this problem.
--
Michael

Re: BUG #17421: Core dump in ECPGdo() when calling PostgreSQL API from 32-bit client for RHEL8

From

Tom Lane

Date:

05 March 2022, 18:19:02

Michael Paquier <michael@paquier.xyz> writes:
> On Sat, Mar 05, 2022 at 06:45:21AM +0000, hirose.masay-01@fujitsu.com wrote:
>> My test case to reproduce the issue is:
>> 1. The client connects Postgres Database and issues SQL continuously.
>> 2. Switch the Database role from Active to Standby.
>> The Database is mirrored by the Mirroring Controller between two
>> clustered servers. the Mirroring Controller may be the original
>> feature added by the enterprise. 

> A self-contained test case enters in the category of an ECPG script
> that we could use to reproduce the problem.  Personally, I have no
> idea what kind of application stack you are using, and I don't know
> TJVvDatabaseAPI, which I suspect is a proprietary solution for
> something related to databases.  The information you are providing
> here is not enough for one to know how to reproduce this problem.

"Switch from active to standby" isn't even possible in community
Postgres, so there are definitely moving parts in this recipe that
we are not responsible for or familiar with.  Perhaps the problem
can be reproduced with just stock Postgres, but nobody here is
going to expend the effort to try to build a reproducer from this
amount of information.

We have a wiki page offering advice about creating
actionable problem reports:

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

            regards, tom lane

RE: BUG #17421: Core dump in ECPGdo() when calling PostgreSQL API from 32-bit client for RHEL8

From

"hirose.masay-01@fujitsu.com"

Date:

06 April 2022, 20:30:40

Hi,
Sorry for the late response. Here are my test results with client 12.10 (previously 12.1), along with some insight
from a code trace.
I tested the same scenario to reproduce the issue using the PostgreSQL client 12.10.
The issue was reproduced in the same way as with the previous version, 12.1.
The frequency is as follows :
 Client 12.1 : DB server down 30 times -> Core dump 7 times
 Client 12.10: DB server down 30 times -> Core dump 6 times

I cannot pin down the exact conditions that reproduce the issue, but it appears to occur when the client issues an
SQL commit at the same moment the Database server goes down.
Next, I inserted debug logging into ecpg_raise_backend() to find out where the core is generated.
When the core is NOT generated - Case(a) - ecpg_raise_backend() processes the error information (sqlca) normally. In
this case, my application retried the transaction.
When the core is generated - Case(b) - ecpg_raise_backend() does not process the error information (sqlca) correctly
and generates the core file instead.
The difference between Case(a) and Case(b) is as follows:
In Case(a), the client detects the communication error with the Database server 20 seconds after it issued the SQL
commit. When ecpg_raise_backend() is called, "status" in the PGconn context is "CONNECTION_BAD(1)".
In Case(b), the core dump case, the client detects the communication error with the Database server 2 minutes after
it issued the SQL commit. When ecpg_raise_backend() is called, "status" in the PGconn context is "CONNECTION_OK(0)".
Please check the following log messages and the code trace in ecpg_raise_backend().
When "status" in the PGconn context is "CONNECTION_OK(0)" - Case(b) - the core dump can occur in the snprintf() call
in ecpg_raise_backend(). In this scenario, "message" is NULL and remains unchanged until snprintf() is called.

Please note that Japanese log messages have been replaced with "xxx".

Case(a) normal
--------------
Client log:
...
20:20:18     [...]: ecpg_execute on line 941: using PQexec
20:20:18     [...]: ecpg_process_output on line 941: OK: INSERT 0 1
20:20:18     [...]: ECPGtrans on line 716: action "commit work"; connection "0"
20:20:39     [...]: ECPGnoticeReceiver: xxx (18563)
20:20:39     [...]: raising sqlcode 0
20:20:39     [...]: ecpg_check_PQresult on line 941: bad response - server closed the connection unexpectedly
20:20:39        This probably means the server terminated abnormally
20:20:39        before or while processing the request.
20:20:39     [...]: raise_backend start: conn - server closed the connection unexpectedly
20:20:39        This probably means the server terminated abnormally
20:20:39        before or while processing the request.
20:20:39     [...]: raise_backend start: conn - 1
20:20:39     [...]: raise_backend: result not NULL
20:20:39     [...]: raise_backend: sqlstate NULL
20:20:39     [...]: raise_backend: message NULL
20:20:39     [...]: raise_backend: sqlstate INTERNAL_ERROR
20:20:39     [...]: ecpg_raise_backend sqlstate: 57P02 messase: the connection to the server was lost errno: 0
20:20:39     [...]: raising sqlstate 57P02 (sqlcode -400): the connection to the server was lost on line 941
20:20:39     [...]: ECPGtrans on line 768: action "rollback work"; connection "0"
20:20:39     [...]: ecpg_check_PQresult on line 768: no result - no connection to the server
20:20:39     [...]: raise_backend start: conn - no connection to the server
20:20:39     [...]: raise_backend start: conn - 1
20:20:39     [...]: raise_backend: result NULL
20:20:39     [...]: raise_backend: sqlstate INTERNAL_ERROR
20:20:39     [...]: ecpg_raise_backend sqlstate: 57P02 messase: the connection to the server was lost errno: 0
20:20:39     [...]: raising sqlstate 57P02 (sqlcode -400): the connection to the server was lost on line 768
20:20:39     [...]: ecpg_finish: connection 0 closed

DB server log:
...
20:20:31.227 ...  PostgreSQL JDBC Driver) WARNING:  57P02: xxx (...)
...


Case(b) core dump
--------------

Client log:
...
18:38:17     [...]: ecpg_execute on line 561: using PQexec
18:38:17     [...]: ecpg_process_output on line 561: OK: CLOSE CURSOR
18:38:17     [...]: deallocate_one on line 562: name stmid
18:38:17     [...]: ECPGtrans on line 716: action "commit work"; connection "0"
18:40:15     [...]: ECPGnoticeReceiver: xxx (...)
18:40:15     [...]: raising sqlcode 0
18:40:15     [...]: ecpg_check_PQresult on line 941: bad response - server closed the connection unexpectedly
18:40:15        This probably means the server terminated abnormally
18:40:15        before or while processing the request.
18:40:15     [...]: raise_backend start: conn - server closed the connection unexpectedly
18:40:15        This probably means the server terminated abnormally
18:40:15        before or while processing the request.
18:40:15     [...]: raise_backend start: conn - 0
18:40:15     [...]: raise_backend: result not NULL
18:40:15     [...]: raise_backend: sqlstate NULL
18:40:15     [...]: raise_backend: message NULL
18:40:15     [...]: raise_backend: sqlstate INTERNAL_ERROR
18:40:16     // No subsequent log messages due to the core dump //

DB server log:
...
18:38:17.952 ...  PostgreSQL JDBC Driver) WARNING:  57P02: xxx (...)
...

The ecpg_raise_backend() code with debug logging added:

void
ecpg_raise_backend(int line, PGresult *result, PGconn *conn, int compat)
{
    struct sqlca_t *sqlca = ECPGget_sqlca();
    char       *sqlstate;
    char       *message;

    /* debug */
    ecpg_log("raise_backend start: conn - %s", PQerrorMessage(conn));
    ecpg_log("raise_backend start: conn - %d\n", PQstatus(conn));

    ...

    if (result)
    {
        sqlstate = PQresultErrorField(result, PG_DIAG_SQLSTATE);
        /* debug */
        ecpg_log("raise_backend: result not NULL\n");
        /* debug end */
        if (sqlstate == NULL)
        {    /* debug */
            ecpg_log("raise_backend: sqlstate NULL\n");
            /* debug end */
            sqlstate = ECPG_SQLSTATE_ECPG_INTERNAL_ERROR;
        }    /* debug */
        message = PQresultErrorField(result, PG_DIAG_MESSAGE_PRIMARY);
        /* debug */
        if (message == NULL)
        {
            ecpg_log("raise_backend: message NULL\n");
        }
        /* debug end */
    }
    else
    {
        /* debug */
        ecpg_log("raise_backend: result NULL\n");
        /* debug end */
        sqlstate = ECPG_SQLSTATE_ECPG_INTERNAL_ERROR;
        message = PQerrorMessage(conn);
    }

    if (strcmp(sqlstate, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR) == 0)
    {
        /*
         * we might get here if the connection breaks down, so let's check for
         * this instead of giving just the generic internal error
         */
        /* debug */
        ecpg_log("raise_backend: sqlstate INTERNAL_ERROR\n");
        /* debug end */
        if (PQstatus(conn) == CONNECTION_BAD)
        {
            sqlstate = "57P02";
            message = ecpg_gettext("the connection to the server was lost");
        }
    }

    /* Debug start */
    ecpg_log("ecpg_raise_backend sqlstate: %s messase: %s errno: %d\n", sqlstate, message, errno);
    /* Debug end */

    /* copy error message */
    snprintf(sqlca->sqlerrm.sqlerrmc, sizeof(sqlca->sqlerrm.sqlerrmc), "%s on line %d", message, line);
    sqlca->sqlerrm.sqlerrml = strlen(sqlca->sqlerrm.sqlerrmc);

    ...

Regards,
Masa

RE: BUG #17421: Core dump in ECPGdo() when calling PostgreSQL API from 32-bit client for RHEL8

From

"hirose.masay-01@fujitsu.com"

Date:

06 June 2022, 12:50:16

Hello,
I could not find the root cause of this case. Instead, I propose a workaround fix for ecpglib so that it does not
generate a core file in snprintf().
I expect the fix to avoid most of the irregular cases. Could you review the fix and include it in the latest sources
of both Postgres 12.x and 14.x?
I explain the problem and the fix below:
[Problem]
The potential problem is no value is set to "message" after PQresultErrorField() is called in ecpg_raise_backend().
src\interfaces\ecpg\ecpglib\error.c
----------------------------
    if (result)
    {
        sqlstate = PQresultErrorField(result, PG_DIAG_SQLSTATE);
        if (sqlstate == NULL)
            sqlstate = ECPG_SQLSTATE_ECPG_INTERNAL_ERROR;
        message = PQresultErrorField(result, PG_DIAG_MESSAGE_PRIMARY);
    }
----------------------------

As an example, another function with similar code is ECPGnoticeReceiver() in connect.c.
There, the value "empty message text" is assigned to "message" just in case. I propose adding the same code to
ecpg_raise_backend() as in ECPGnoticeReceiver().
src\interfaces\ecpg\ecpglib\connect.c
----------------------------
static void
ECPGnoticeReceiver(void *arg, const PGresult *result)
{
    char       *sqlstate = PQresultErrorField(result, PG_DIAG_SQLSTATE);
    char       *message = PQresultErrorField(result, PG_DIAG_MESSAGE_PRIMARY);

    if (sqlstate == NULL)
        sqlstate = ECPG_SQLSTATE_ECPG_INTERNAL_ERROR;

    if (message == NULL)        /* Shouldn't happen, but need to be sure */
        message = ecpg_gettext("empty message text");
----------------------------

[Fix]
The fix is to add the 2 lines marked with a change bar (|) below in ecpg_raise_backend():
src\interfaces\ecpg\ecpglib\error.c (version12.1 line:237)
----------------------------
    if (result)
    {
        sqlstate = PQresultErrorField(result, PG_DIAG_SQLSTATE);
        if (sqlstate == NULL)
            sqlstate = ECPG_SQLSTATE_ECPG_INTERNAL_ERROR;
        message = PQresultErrorField(result, PG_DIAG_MESSAGE_PRIMARY);
|        if (message == NULL)
|            message = ecpg_gettext("empty message text");
    }
----------------------------

[Test result]
With the fix, I confirmed that snprintf() no longer generates a core and that the sqlstate "YE000" is returned as expected.

Regards,
Masa

Re: BUG #17421: Core dump in ECPGdo() when calling PostgreSQL API from 32-bit client for RHEL8

From

Tom Lane

Date:

06 June 2022, 18:01:52

"hirose.masay-01@fujitsu.com" <hirose.masay-01@fujitsu.com> writes:
> [Problem]
> The potential problem is no value is set to "message" after PQresultErrorField() is called in ecpg_raise_backend().

Ah-hah.  You're right, and this explains the symptoms exactly, because
libpq-generated error results don't contain broken-down fields, so we'd
get a null precisely in cases such as lost connection.

> |        if (message == NULL)
> |            message = ecpg_gettext("empty message text");

No, that'd be pretty unhelpful.  The best response is to substitute
PQerrorMessage(conn) in such cases.  We do that in, for example,
postgres_fdw.

(I don't feel a need to change ECPGnoticeReceiver, because that only
deals with NOTICE results which should always have such a field, and
PQerrorMessage wouldn't be relevant to a non-ERROR result anyway.)

Will fix, thanks for the report!

            regards, tom lane

diff --git a/src/interfaces/ecpg/ecpglib/error.c b/src/interfaces/ecpg/ecpglib/error.c
index cd6c6a6819..26fdcdb69e 100644
--- a/src/interfaces/ecpg/ecpglib/error.c
+++ b/src/interfaces/ecpg/ecpglib/error.c
@@ -229,18 +229,17 @@ ecpg_raise_backend(int line, PGresult *result, PGconn *conn, int compat)
         return;
     }

-    if (result)
-    {
-        sqlstate = PQresultErrorField(result, PG_DIAG_SQLSTATE);
-        if (sqlstate == NULL)
-            sqlstate = ECPG_SQLSTATE_ECPG_INTERNAL_ERROR;
-        message = PQresultErrorField(result, PG_DIAG_MESSAGE_PRIMARY);
-    }
-    else
-    {
+    /*
+     * PQresultErrorField will return NULL if "result" is NULL, or if there is
+     * no such field, which will happen for libpq-generated errors.  Fall back
+     * to PQerrorMessage in such cases.
+     */
+    sqlstate = PQresultErrorField(result, PG_DIAG_SQLSTATE);
+    if (sqlstate == NULL)
         sqlstate = ECPG_SQLSTATE_ECPG_INTERNAL_ERROR;
+    message = PQresultErrorField(result, PG_DIAG_MESSAGE_PRIMARY);
+    if (message == NULL)
         message = PQerrorMessage(conn);
-    }

     if (strcmp(sqlstate, ECPG_SQLSTATE_ECPG_INTERNAL_ERROR) == 0)
     {
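
The selection logic in the patch above can be modeled without libpq. In this sketch, fake_result, fake_conn, result_field and pick_message are hypothetical stand-ins for PGresult, PGconn, PQresultErrorField and the patched code path. Only the fall-back-on-NULL pattern is taken from the thread; everything else is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for PGresult and PGconn; these are not libpq types. */
struct fake_result
{
    const char *primary_message;    /* NULL models a libpq-generated error */
};

struct fake_conn
{
    const char *error_message;      /* models the connection-level error text */
};

/* Models the lookup: a NULL result or a missing field both yield NULL. */
static const char *
result_field(const struct fake_result *res)
{
    return res ? res->primary_message : NULL;
}

/* Prefer the result's own message; otherwise fall back to the connection's. */
static const char *
pick_message(const struct fake_result *res, const struct fake_conn *conn)
{
    const char *message;

    assert(conn != NULL);
    message = result_field(res);
    if (message == NULL)
        message = conn->error_message;
    return message;
}
```

With this shape, a server-generated error keeps its own primary message, while a missing field or a NULL result both resolve to the connection-level text, so the value later passed to "%s" is not NULL as long as the connection supplies a message.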

RE: BUG #17421: Core dump in ECPGdo() when calling PostgreSQL API from 32-bit client for RHEL8

From

"hirose.masay-01@fujitsu.com"

Date:

07 June 2022, 04:14:50

Hello,
Thanks a lot for the review and the patch. I will test with the patch to verify that my issue is fixed.
Kindly let me know if you have an expected release month for the fix.
Regards,
Masa