Thread: BUG #17906: Segmentation fault and database crash during procedure call
BUG #17906: Segmentation fault and database crash during procedure call
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 17906 Logged by: Václav Pink Email address: vaclav.pink@tietoevry.com PostgreSQL version: 13.10 Operating system: running on linux server RHEL 7.9 (x64) Description: Good afternoon gentlemans, we are fighting more than 3 weeks with error: server process (PID 10028) was terminated by signal 11: Segmentation fault This error occur when I call db procedure (whole procedure is below). Database is in cluster, version of Postgresql is 13.10 and it's runnig on linux server RHEL 7.9 (x64). We can't find reason why it fails and whole database crashes. We will be thankful of any comments, hints, suggestions how to fix it. If I can provide any other info what can help you, please let me know. Thank you very much! Best regards Vaclav Pink details from log: 2023-04-21 08:15:58 UTC [10028]: [8-1] user=dm_admin,db=dm_data CONTEXT: PL/pgSQL function dm_dev.get_domains(character varying,character varying,character varying,character varying,jsonb) line 93 at RAISE 2023-04-21 08:15:58 UTC [8848]: [10-1] user=,db= LOG: server process (PID 10028) was terminated by signal 11: Segmentation fault 2023-04-21 08:15:58 UTC [8848]: [11-1] user=,db= DETAIL: Failed process was running: CALL dm_dev.get_domains ( 'te_mf', null, null, null, null ); 2023-04-21 08:15:58 UTC [8848]: [12-1] user=,db= LOG: terminating any other active server processes 2023-04-21 08:15:58 UTC [20494]: [3-1] user=snowmirror,db=dm_data WARNING: terminating connection because of crash of another server process 2023-04-21 08:15:58 UTC [20494]: [4-1] user=snowmirror,db=dm_data DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. /var/log/messages : Apr 21 10:15:58 tfiubb4dmrepo1 kernel: postmaster[10028]: segfault at 2e19000 ip 00007fbac32f857f sp 00007fff722d8c48 error 4 in libc-2.17.so[7fbac31a2000+1c4000] procedure definiton: CREATE OR REPLACE PROCEDURE dm_dev.get_domains( user_name character varying, query character varying, INOUT response_code character varying, INOUT response_message character varying, INOUT response_body jsonb) LANGUAGE 'plpgsql' AS $BODY$ DECLARE user_internal boolean; user_pii boolean; user_domain_path character varying (40); user_fixed_filter1 character varying (200); user_fixed_filter2 character varying (200); user_fixed_filter3 character varying (200); user_domain character varying (40); record_count integer; dd_count integer; r_data_row jsonb; sql_command character varying (2000) DEFAULT ''; hlp_object json; BEGIN CREATE UNLOGGED TABLE IF NOT EXISTS record_temp_unlog ( id serial, unique_call_id character varying (100), r_data jsonb ); CREATE UNLOGGED TABLE IF NOT EXISTS record_temp_unlog2 ( id serial, unique_call_id character varying (100), r_data jsonb ); SELECT internal, pii, domain, domain_path, fixed_filter1, fixed_filter2, fixed_filter3 INTO user_internal, user_pii, user_domain, user_domain_path, user_fixed_filter1, user_fixed_filter2, user_fixed_filter3 FROM dm.apim_users WHERE username = user_name; response_code := '200'; response_message := 'Returns list of ServiceNow domains'; sql_command := sql_command || 'INSERT INTO record_temp_unlog (r_data) SELECT row_to_json (r) FROM ( SELECT sys_id AS "id", name AS "name", active AS active, type AS type, json_build_object( ''id'', parent_id, ''name'', parent_name) AS parent, sys_domain_path AS domain_path, full_name AS full_name, level AS level, sn_instance_code AS sn_instance_code, sn_instance AS sn_instance_label, case when sys_domain = ' ||''''|| user_domain ||'''' || ' then true else false end as domain_flag FROM dm.domains dom WHERE dom.sys_domain_path LIKE ' || '''' || user_domain_path ||'%' ||'''' || ' AND ' || COALESCE(user_fixed_filter1, ' 1 = 1 ') || ' AND ' || COALESCE(user_fixed_filter2, ' 1 = 1 ') || ' AND ' || COALESCE(user_fixed_filter3, ' 1 = 1 ') || ' LIMIT 10 ) r'; RAISE NOTICE 'sql_string (%)', sql_command; EXECUTE sql_command; SELECT COUNT(*) INTO record_count FROM record_temp_unlog; SELECT MAX(order_by) INTO dd_count FROM dm.data_dictionary WHERE entity = 'domains'; response_body := '{}'::jsonb; BEGIN FOR i IN 1..record_count LOOP SELECT r_data INTO r_data_row FROM record_temp_unlog WHERE id = i; RAISE NOTICE 'pocet radku (%)', r_data_row::text; BEGIN FOR j IN 1..dd_count LOOP SELECT CASE WHEN (r_data_row ->> 'domain_flag')::boolean IS TRUE THEN r_data_row WHEN (dd.internal IS TRUE AND user_internal IS FALSE) OR (dd.pii IS TRUE AND user_pii IS FALSE) THEN jsonb_set_lax(r_data_row, string_to_array(dd.name, ','), NULL, false, 'use_json_null') ELSE r_data_row END INTO r_data_row FROM dm.data_dictionary AS dd WHERE dd.entity = 'domains' AND dd.order_by = j; r_data_row := jsonb_set_lax(r_data_row, string_to_array(r_data_row->>'domain_flag', ','), NULL, false, 'use_json_null'); END LOOP; END; INSERT INTO record_temp_unlog2 (id, r_data) VALUES (i, r_data_row::jsonb); RAISE NOTICE 'pocet radku po smycce (%)', r_data_row::text; END LOOP; END; --$do$; SELECT array_to_json(array_agg(jsonb_strip_nulls(r_data) ORDER BY r_data->>'id' )) INTO response_body FROM record_temp_unlog2; END $BODY$;
PG Bug reporting form <noreply@postgresql.org> writes: > we are fighting more than 3 weeks with error: server process (PID 10028) was > terminated by signal 11: Segmentation fault > This error occur when I call db procedure (whole procedure is below). Can't help you with such an incomplete bug report. If you could reduce the problem to a self-contained test case, we'd be happy to look at it. See https://wiki.postgresql.org/wiki/Guide_to_reporting_problems regards, tom lane
Re: BUG #17906: Segmentation fault and database crash during procedure call
From
Michael Paquier
Date:
On Fri, Apr 21, 2023 at 10:09:00AM -0400, Tom Lane wrote: > Can't help you with such an incomplete bug report. If you could reduce > the problem to a self-contained test case, we'd be happy to look at it. > See FYI, an isolated test case would require to know what's behind the definitions of the tables apim_users, data_dictionary and domains which are used as part of the procedure you are seeing to fail. Likely this would require a sample of the data that fails. Being able to get a backtrace of the crash could provide hints, though without a data sample that may be difficult. If you send a sample of data, also make sure to mask anything sensitive. -- Michael
Attachment
Good afternoon guys, Our DB admin tryed reproduce the issue in OCP (single node, cluster similar to original environment), but everything wasOK. Regarding the tables which are used in the procedure - apim_users - basic table at this time with cca 10 rows, 10 columns, datatypes varchar and 2 boolean data_dictionary - small table with cca 10 columns and 150 rows, procedure use only 11 rows (cut by where condition) - datatypesvarchar and 2 boolean domains - materialized view with 11 columns and cca 3500 rows. Datatypes varchar. I'm sorry for not so much details about data, but we have very strict rules... And regarding logs and info - Db admin sayed that nothing more from crash time, only messages which I sent earlier. If something come on your mind, where can be problem, what can solve the issue, it will be very helpful. Thank you very much. Vaclav -----Original Message----- From: Michael Paquier <michael@paquier.xyz> Sent: Monday, May 1, 2023 9:41 AM To: Tom Lane <tgl@sss.pgh.pa.us> Cc: Vaclav Pink <vaclav.pink@tietoevry.com>; pgsql-bugs@lists.postgresql.org Subject: Re: BUG #17906: Segmentation fault and database crash during procedure call On Fri, Apr 21, 2023 at 10:09:00AM -0400, Tom Lane wrote: > Can't help you with such an incomplete bug report. If you could > reduce the problem to a self-contained test case, we'd be happy to look at it. > See FYI, an isolated test case would require to know what's behind the definitions of the tables apim_users, data_dictionaryand domains which are used as part of the procedure you are seeing to fail. Likely this would require a sample of the data that fails. Being able to get a backtrace of the crash could provide hints, though without a data sample that may be difficult. If yousend a sample of data, also make sure to mask anything sensitive. -- Michael
Vaclav Pink <vaclav.pink@tietoevry.com> writes: > I'm sorry for not so much details about data, but we have very strict rules... It's unlikely that the specific data involved is significant. See if you can build a publishable test case using dummy data. regards, tom lane