Thread: BUG #17906: Segmentation fault and database crash during procedure call

BUG #17906: Segmentation fault and database crash during procedure call

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17906
Logged by:          Václav Pink
Email address:      vaclav.pink@tietoevry.com
PostgreSQL version: 13.10
Operating system:   running on linux server RHEL 7.9 (x64)
Description:

Good afternoon gentlemans, 

we are fighting more than 3 weeks with error: server process (PID 10028) was
terminated by signal 11: Segmentation fault
This error occur when I call db procedure (whole procedure is below).
Database is in cluster, version of Postgresql is 13.10 and it's runnig on
linux server RHEL 7.9 (x64).
We can't find reason why it fails and whole database crashes.
We will be thankful of any comments, hints, suggestions how to fix it.

If I can provide any other info what can help you, please let me know.

Thank you very much!

Best regards
Vaclav Pink

details from log:
2023-04-21 08:15:58 UTC [10028]: [8-1] user=dm_admin,db=dm_data CONTEXT: 
PL/pgSQL function dm_dev.get_domains(character varying,character
varying,character varying,character varying,jsonb) line 93 at RAISE
2023-04-21 08:15:58 UTC [8848]: [10-1] user=,db= LOG:  server process (PID
10028) was terminated by signal 11: Segmentation fault
2023-04-21 08:15:58 UTC [8848]: [11-1] user=,db= DETAIL:  Failed process was
running: CALL dm_dev.get_domains (
                'te_mf',
            null,
                null,
                null,
                null
                );
2023-04-21 08:15:58 UTC [8848]: [12-1] user=,db= LOG:  terminating any other
active server processes
2023-04-21 08:15:58 UTC [20494]: [3-1] user=snowmirror,db=dm_data WARNING: 
terminating connection because of crash of another server process
2023-04-21 08:15:58 UTC [20494]: [4-1] user=snowmirror,db=dm_data DETAIL: 
The postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.

/var/log/messages : 

Apr 21 10:15:58 tfiubb4dmrepo1 kernel: postmaster[10028]: segfault at
2e19000 ip 00007fbac32f857f sp 00007fff722d8c48 error 4 in
libc-2.17.so[7fbac31a2000+1c4000]


procedure definiton:

CREATE OR REPLACE PROCEDURE dm_dev.get_domains(
    user_name character varying,
    query character varying,
    INOUT response_code character varying,
    INOUT response_message character varying,
    INOUT response_body jsonb)
LANGUAGE 'plpgsql'

AS $BODY$ 

DECLARE
  user_internal boolean;
  user_pii boolean;
  user_domain_path character varying (40);
  user_fixed_filter1 character varying (200);
  user_fixed_filter2 character varying (200);
  user_fixed_filter3 character varying (200);
  user_domain character varying (40);
  record_count integer;
  dd_count integer;
  r_data_row jsonb;
  sql_command character varying (2000) DEFAULT '';   
  hlp_object json;  

BEGIN

CREATE UNLOGGED TABLE IF NOT EXISTS record_temp_unlog (
  id serial,
  unique_call_id character varying     (100),
  r_data jsonb
 );

  CREATE UNLOGGED TABLE IF NOT EXISTS record_temp_unlog2 (
  id serial,
  unique_call_id character varying     (100),
  r_data jsonb
 );

SELECT internal, pii, domain, domain_path, fixed_filter1, fixed_filter2,
fixed_filter3
  INTO user_internal, user_pii, user_domain, user_domain_path,
user_fixed_filter1, user_fixed_filter2, user_fixed_filter3 
  FROM dm.apim_users
 WHERE username = user_name; 

  response_code := '200'; 
  response_message := 'Returns list of ServiceNow domains'; 

sql_command := sql_command || 
'INSERT INTO record_temp_unlog (r_data)
SELECT row_to_json (r)  
  FROM (
SELECT sys_id AS "id",
           name AS "name",
           active AS active,
           type AS type,
           json_build_object( ''id'', parent_id, ''name'', parent_name) AS
parent,
           sys_domain_path AS domain_path,
           full_name AS full_name,
           level AS level,
           sn_instance_code AS sn_instance_code,
           sn_instance AS sn_instance_label,
           case when sys_domain = ' ||''''|| user_domain ||'''' || ' then true
else false end as domain_flag
    FROM dm.domains dom
   WHERE dom.sys_domain_path LIKE ' || ''''  || user_domain_path ||'%'
||''''    
     || ' AND ' || COALESCE(user_fixed_filter1, ' 1 = 1 ') 
     || ' AND ' || COALESCE(user_fixed_filter2, ' 1 = 1 ') 
     || ' AND ' || COALESCE(user_fixed_filter3, ' 1 = 1 ')       
     || ' LIMIT 10 ) r'; 

RAISE NOTICE 'sql_string (%)', sql_command;  
EXECUTE sql_command;

SELECT COUNT(*)
  INTO record_count
  FROM record_temp_unlog;

SELECT MAX(order_by) 
  INTO dd_count
  FROM dm.data_dictionary
 WHERE entity = 'domains'; 

response_body := '{}'::jsonb;

BEGIN 
   FOR i IN 1..record_count LOOP  
     SELECT r_data
       INTO r_data_row
       FROM record_temp_unlog
      WHERE id = i; 
            
      RAISE NOTICE 'pocet radku (%)', r_data_row::text;  
      
      BEGIN 
        FOR j IN 1..dd_count LOOP   
        
        SELECT CASE 
                    WHEN (r_data_row ->> 'domain_flag')::boolean IS TRUE
THEN r_data_row
                    WHEN (dd.internal IS TRUE AND user_internal IS FALSE) OR (dd.pii IS
TRUE AND user_pii IS FALSE) THEN jsonb_set_lax(r_data_row,
string_to_array(dd.name, ','), NULL, false, 'use_json_null') 
                    ELSE r_data_row                    
            END          
        INTO r_data_row
        FROM dm.data_dictionary AS dd 
       WHERE dd.entity = 'domains' 
         AND dd.order_by = j;
     
      r_data_row := jsonb_set_lax(r_data_row,
string_to_array(r_data_row->>'domain_flag', ','), NULL, false,
'use_json_null'); 
        END LOOP;
      END;
    
    INSERT INTO record_temp_unlog2 (id, r_data)
    VALUES (i, r_data_row::jsonb);
    
    RAISE NOTICE 'pocet radku po smycce (%)', r_data_row::text;

         
   END LOOP;
END;
--$do$;

SELECT array_to_json(array_agg(jsonb_strip_nulls(r_data) ORDER BY
r_data->>'id' ))
  INTO response_body
  FROM record_temp_unlog2;    

END $BODY$;


PG Bug reporting form <noreply@postgresql.org> writes:
> we are fighting more than 3 weeks with error: server process (PID 10028) was
> terminated by signal 11: Segmentation fault
> This error occur when I call db procedure (whole procedure is below).

Can't help you with such an incomplete bug report.  If you could reduce
the problem to a self-contained test case, we'd be happy to look at it.
See

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

            regards, tom lane



Re: BUG #17906: Segmentation fault and database crash during procedure call

From
Michael Paquier
Date:
On Fri, Apr 21, 2023 at 10:09:00AM -0400, Tom Lane wrote:
> Can't help you with such an incomplete bug report.  If you could reduce
> the problem to a self-contained test case, we'd be happy to look at it.
> See

FYI, an isolated test case would require to know what's behind
the definitions of the tables apim_users, data_dictionary and domains
which are used as part of the procedure you are seeing to fail.
Likely this would require a sample of the data that fails.

Being able to get a backtrace of the crash could provide hints, though
without a data sample that may be difficult.  If you send a sample of
data, also make sure to mask anything sensitive.
--
Michael

Attachment
Good afternoon guys,

Our DB admin tryed reproduce the issue in OCP (single node, cluster similar to original environment), but everything
wasOK.  

Regarding the tables which are used in the procedure -

apim_users - basic table at this time with cca 10 rows, 10 columns, datatypes varchar and 2 boolean

data_dictionary - small table with cca 10 columns and 150 rows, procedure use only 11 rows (cut by where condition) -
datatypesvarchar and 2 boolean 

domains - materialized view with 11 columns and cca 3500 rows. Datatypes varchar.

I'm sorry for not so much details about data, but we have very strict rules...

And regarding logs and info - Db admin sayed that nothing more from crash time, only messages which I sent earlier.


If something come on your mind, where can be problem, what can solve the issue, it will be very helpful.

Thank you very much.

Vaclav


-----Original Message-----
From: Michael Paquier <michael@paquier.xyz>
Sent: Monday, May 1, 2023 9:41 AM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Vaclav Pink <vaclav.pink@tietoevry.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #17906: Segmentation fault and database crash during procedure call

On Fri, Apr 21, 2023 at 10:09:00AM -0400, Tom Lane wrote:
> Can't help you with such an incomplete bug report.  If you could
> reduce the problem to a self-contained test case, we'd be happy to look at it.
> See

FYI, an isolated test case would require to know what's behind the definitions of the tables apim_users,
data_dictionaryand domains which are used as part of the procedure you are seeing to fail. 
Likely this would require a sample of the data that fails.

Being able to get a backtrace of the crash could provide hints, though without a data sample that may be difficult.  If
yousend a sample of data, also make sure to mask anything sensitive. 
--
Michael



Vaclav Pink <vaclav.pink@tietoevry.com> writes:
> I'm sorry for not so much details about data, but we have very strict rules...

It's unlikely that the specific data involved is significant.
See if you can build a publishable test case using dummy data.

            regards, tom lane