Thread: BUG #16369: Segmentation Faults and Data Corruption with Generated Columns
BUG #16369: Segmentation Faults and Data Corruption with Generated Columns
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16369 Logged by: Cameron Ezell Email address: cameron.ezell@clearcapital.com PostgreSQL version: 12.2 Operating system: CentOS 8, Red Hat 8, Mac OS X 10.14.6 Description: It seems that there are a few bugs that are throwing segmentation faults and causing data corruption. These issues keep appearing for our team when using tables that contain generated columns introduced in PostgreSQL 12. This has been tested on 12.2 & 12.1 on CentOS 8 as well as 12.1 on MacOS 10.14.6: select version(); -- PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit CREATE SCHEMA if not exists test;commit; CREATE TABLE if not exists test.bug_report ( id bigint generated by default as identity, hostname varchar, hostname_short varchar GENERATED ALWAYS AS (split_part(hostname, '.', 1)) STORED, device text, mount text, used_space_bytes bigint, used_space_gb numeric GENERATED ALWAYS AS (ROUND(used_space_bytes / 1073741824.0,2)) STORED, avail_space_bytes bigint, avail_space_gb numeric GENERATED ALWAYS AS (ROUND(avail_space_bytes / 1073741824.0,2)) STORED, inserted_dts timestamp with time zone NOT NULL DEFAULT clock_timestamp(), inserted_by text NOT NULL DEFAULT session_user );commit; -- No problems on the following insert INSERT INTO test.bug_report(hostname, device, mount, used_space_bytes, avail_space_bytes) VALUES ('123456789', 'devtmpfs', '/dev', 0, 6047076131313); select * from test.bug_report;commit; -- On CentOS 8, this bug is triggered with a hostname with 10+ characters. On MacOS 10.14.6, 19+ characters. INSERT INTO test.bug_report(hostname, device, mount, used_space_bytes, avail_space_bytes) VALUES ('12345678901234567890', 'devtmpfs', '/dev', 0, 6047076131313);commit; -- This should immediately crash the postgres service -- Inserting some strings below that character threshold will insert just fine, but a select statement on the table will now throw an error INSERT INTO test.bug_report(hostname, device, mount, used_space_bytes, avail_space_bytes) VALUES ('abc', 'devtmpfs', '/dev', 0, 6047076131313);commit; select * from test.bug_report;commit; --ERROR: XX000: invalid memory alloc request size 18446744073709551613 --LOCATION: palloc, mcxt.c:934 -- Once the "hostname_short" column no longer references any other column, I am unable to reproduce this error CREATE TABLE if not exists test.bug_report2 ( id bigint generated by default as identity, hostname varchar, hostname_short varchar GENERATED ALWAYS AS ('static_string') STORED, device text, mount text, used_space_bytes bigint, used_space_gb numeric GENERATED ALWAYS AS (ROUND(used_space_bytes / 1073741824.0,2)) STORED, avail_space_bytes bigint, avail_space_gb numeric GENERATED ALWAYS AS (ROUND(avail_space_bytes / 1073741824.0,2)) STORED, inserted_dts timestamp with time zone NOT NULL DEFAULT clock_timestamp(), inserted_by text NOT NULL DEFAULT session_user );commit; INSERT INTO test.bug_report2(hostname, device, mount, used_space_bytes, avail_space_bytes) VALUES ('12345678901234567890', 'devtmpfs', '/dev', 0, 6047076131313);commit; select * from test.bug_report2;commit; INSERT INTO test.bug_report2(hostname, device, mount, used_space_bytes, avail_space_bytes) VALUES ('abc', 'devtmpfs', '/dev', 0, 6047076131313);commit; select * from test.bug_report2;commit; -- simply referencing another column in the generated column will cause a crash CREATE TABLE if not exists test.bug_report3 ( id bigint generated by default as identity, hostname varchar, hostname_short varchar GENERATED ALWAYS AS (hostname) STORED, device text, mount text, used_space_bytes bigint, used_space_gb numeric GENERATED ALWAYS AS (ROUND(used_space_bytes / 1073741824.0,2)) STORED, avail_space_bytes bigint, avail_space_gb numeric GENERATED ALWAYS AS (ROUND(avail_space_bytes / 1073741824.0,2)) STORED, inserted_dts timestamp with time zone NOT NULL DEFAULT clock_timestamp(), inserted_by text NOT NULL DEFAULT session_user );commit; -- immediate crash INSERT INTO test.bug_report3(hostname, device, mount, used_space_bytes, avail_space_bytes) VALUES ('12345678901234567890', 'devtmpfs', '/dev', 0, 6047076131313);commit; -- no crash on insert INSERT INTO test.bug_report3(hostname, device, mount, used_space_bytes, avail_space_bytes) VALUES ('abc', 'devtmpfs', '/dev', 0, 6047076131313);commit; -- error thrown on select select * from test.bug_report3;commit; --ERROR: XX000: invalid memory alloc request size 18446744073709551613 --LOCATION: palloc, mcxt.c:934
Re: BUG #16369: Segmentation Faults and Data Corruption withGenerated Columns
From
David Rowley
Date:
On Thu, 16 Apr 2020 at 08:53, PG Bug reporting form <noreply@postgresql.org> wrote: > -- On CentOS 8, this bug is triggered with a hostname with 10+ characters. > On MacOS 10.14.6, 19+ characters. > INSERT INTO test.bug_report(hostname, device, mount, used_space_bytes, > avail_space_bytes) VALUES ('12345678901234567890', 'devtmpfs', '/dev', 0, > 6047076131313);commit; > -- This should immediately crash the postgres service --LOCATION: palloc, mcxt.c:934 Thanks for the report. This is certainly a bug. I slightly simplified the test case to become: drop table if exists crash; CREATE TABLE crash ( id bigint generated by default as identity, hostname varchar, hostname_short varchar GENERATED ALWAYS AS (hostname) STORED, device text, mount text, used_space_bytes bigint, used_space_gb bigint GENERATED ALWAYS AS (used_space_bytes) STORED, avail_space_bytes bigint, avail_space_gb bigint GENERATED ALWAYS AS (avail_space_bytes) STORED, inserted_dts timestamp with time zone NOT NULL DEFAULT clock_timestamp(), inserted_by text NOT NULL DEFAULT session_user ); INSERT INTO crash (hostname, device, mount, used_space_bytes, avail_space_bytes) VALUES ('12345678901234567', 'devtmpfs', '/dev', 0, 6047076131313); -- no crash INSERT INTO crash (hostname, device, mount, used_space_bytes, avail_space_bytes) VALUES ('123456789012345678', 'devtmpfs', '/dev', 0, 6047076131313); -- crash The crash occurs during ExecComputeStoredGenerated() when calculating the hostname_short column. When we call ExecEvalExpr() for that column, the eval function just returns the value of the Datum in the "hostname" column, which is stored in "slot". This happens to be a pointer into the tuple. This results in a crash because, later in that function, we do ExecClearTuple(slot), which frees the memory for that tuple. I suppose it normally works because generally the calculation will be using some SQL function which will return some calculated value which is allocated, but in this case, we just return the pointer to the memory in the tuple slot. I'd say the fix should be to just datumCopy() the result of the calculation. i.e the attached. David
Attachment
David Rowley <dgrowleyml@gmail.com> writes: > I'd say the fix should be to just datumCopy() the result of the > calculation. i.e the attached. That looks like it'll fail on a null result. regards, tom lane
Re: BUG #16369: Segmentation Faults and Data Corruption withGenerated Columns
From
David Rowley
Date:
On Thu, 16 Apr 2020 at 11:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > I'd say the fix should be to just datumCopy() the result of the > > calculation. i.e the attached. > > That looks like it'll fail on a null result. Thanks. I'll have another go then.
Attachment
Re: BUG #16369: Segmentation Faults and Data Corruption withGenerated Columns
From
Michael Paquier
Date:
On Thu, Apr 16, 2020 at 12:40:24PM +1200, David Rowley wrote: > On Thu, 16 Apr 2020 at 11:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> That looks like it'll fail on a null result. > > Thanks. I'll have another go then. Shouldn't you have more regression test cases here? I guess one for the non-NULL case, and one for the NULL case? -- Michael
Attachment
Re: BUG #16369: Segmentation Faults and Data Corruption withGenerated Columns
From
Peter Eisentraut
Date:
On 2020-04-16 07:30, Michael Paquier wrote: > On Thu, Apr 16, 2020 at 12:40:24PM +1200, David Rowley wrote: >> On Thu, 16 Apr 2020 at 11:52, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> That looks like it'll fail on a null result. >> >> Thanks. I'll have another go then. > > Shouldn't you have more regression test cases here? I guess one for > the non-NULL case, and one for the NULL case? I've tried creating a smaller test case, but it's difficult. I haven't fully understood the reason why it sometimes crashes and sometimes not. The smallest I could come up with so far is something like this: CREATE TABLE crash ( hostname varchar, hostname_short varchar GENERATED ALWAYS AS (hostname) STORED, device text, mount text, used_space_bytes bigint, avail_space_bytes bigint ); INSERT INTO crash (hostname, device, mount, used_space_bytes, avail_space_bytes) VALUES (repeat('1234567890', 100), 'devtmpfs', '/dev', 0, 6047076131313); Would this be easier to reproduce with one of the memory-cleaning #defines enabled? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: BUG #16369: Segmentation Faults and Data Corruption withGenerated Columns
From
David Rowley
Date:
On Fri, 17 Apr 2020 at 21:04, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > I've tried creating a smaller test case, but it's difficult. I haven't > fully understood the reason why it sometimes crashes and sometimes not. > > The smallest I could come up with so far is something like this: > > CREATE TABLE crash ( > hostname varchar, > hostname_short varchar GENERATED ALWAYS AS (hostname) STORED, > device text, > mount text, > used_space_bytes bigint, > avail_space_bytes bigint > ); > INSERT INTO crash (hostname, device, mount, used_space_bytes, > avail_space_bytes) VALUES (repeat('1234567890', 100), 'devtmpfs', > '/dev', 0, 6047076131313); > > Would this be easier to reproduce with one of the memory-cleaning > #defines enabled? I've done a bit more work on this and have something close. The case I came up with for it was: +CREATE TABLE gtest_varlena (a varchar, b varchar GENERATED ALWAYS AS (a) STORED); +INSERT INTO gtest_varlena (a) VALUES('01234567890123456789'); +INSERT INTO gtest_varlena (a) VALUES(NULL); +SELECT * FROM gtest_varlena ORDER BY a; + a | b +----------------------+---------------------- + 01234567890123456789 | 01234567890123456789 + | +(2 rows) + +DROP TABLE gtest_varlena; With that, on an unpatched server, there's no crash, but there's clearly junk in the b column.
Re: BUG #16369: Segmentation Faults and Data Corruption withGenerated Columns
From
Terry Schmitt
Date:
I've tried creating a smaller test case, but it's difficult. I haven't
fully understood the reason why it sometimes crashes and sometimes not.
The smallest I could come up with so far is something like this:
For context, I work with Cameron, who submitted this bug.
I totally agree with Peter on this. We tried to simplify the test case on our end, but the symptoms would change in odd ways. Changing the column order would change the symptoms slightly and at times the columns with DEFAULT values would be corrupt. It also seemed like string text based columns would behave differently than integers.
Re: BUG #16369: Segmentation Faults and Data Corruption withGenerated Columns
From
David Rowley
Date:
On Thu, 16 Apr 2020 at 17:30, Michael Paquier <michael@paquier.xyz> wrote: > Shouldn't you have more regression test cases here? I guess one for > the non-NULL case, and one for the NULL case? Yeah. I've just pushed a fix that included a fairly simple regression test. David
Re: BUG #16369: Segmentation Faults and Data Corruption withGenerated Columns
From
Michael Paquier
Date:
On Sat, Apr 18, 2020 at 02:13:17PM +1200, David Rowley wrote: > On Thu, 16 Apr 2020 at 17:30, Michael Paquier <michael@paquier.xyz> wrote: > > Shouldn't you have more regression test cases here? I guess one for > > the non-NULL case, and one for the NULL case? > > Yeah. I've just pushed a fix that included a fairly simple regression test. Thanks for the commit, David! I was wondering if it would have been better to add a "\pset null", but that's a nit and the result looks fine anyway. -- Michael