Thread: BUG #15489: Segfault on DELETE

BUG #15489: Segfault on DELETE

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15489
Logged by:          Kanwei Li
Email address:      kanwei@gmail.com
PostgreSQL version: 11.0
Operating system:   Debian 9
Description:

We started seeing a segfault crash on our postgresql 11 server instance
today when attempting to delete certain rows in the database:

2018-11-06 21:02:07.553 UTC [60606] LOG:  server process (PID 66881) was
terminated by signal 11: Segmentation fault
2018-11-06 21:02:07.553 UTC [60606] DETAIL:  Failed process was running:
        delete from integration_account
        where partner_id = 24

Attempting to delete certain rows was causing this segfault, and attempting
to delete other rows did not. There didn't seem to be a pattern, and because
this was on production we couldn't risk playing around too much.

Doing a SELECT on the rows that couldn't be deleted worked fine. There
didn't seem to be data corruption since all the data could be read. However,
attempting to DELETE certain rows would crash it. pg_dump also worked
fine.

What fixed it was performing a VACUUM ANALYZE on the database. After that,
the deletes worked again.

I'm sorry I can no longer list steps to reproduce this, since the VACUUM
fixed it, but I figured I should report it in case others have seen it, or
if anyone can maybe guess what the problem is.


Re: BUG #15489: Segfault on DELETE

From
Amit Langote
Date:
On 2018/11/07 14:01, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      15489
> Logged by:          Kanwei Li
> Email address:      kanwei@gmail.com
> PostgreSQL version: 11.0
> Operating system:   Debian 9
> Description:        
> 
> We started seeing a segfault crash on our postgresql 11 server instance
> today when attempting to delete certain rows in the database:
> 
> 2018-11-06 21:02:07.553 UTC [60606] LOG:  server process (PID 66881) was
> terminated by signal 11: Segmentation fault
> 2018-11-06 21:02:07.553 UTC [60606] DETAIL:  Failed process was running:
>         delete from integration_account
>         where partner_id = 24
> 
> Attempting to delete certain rows was causing this segfault, and attempting
> to delete other rows did not. There didn't seem to be a pattern, and because
> this was on production we couldn't risk playing around too much.
> 
> Doing a SELECT on the rows that couldn't be deleted worked fine. There
> didn't seem to be data corruption since all the data could be read. However,
> attempting to DELETE certain rows would crash it. pg_dump also worked
> fine.

Are there any triggers defined on integration_account?  Also, has there
recently been any ALTER TABLE ADD/DROP COLUMN activity on that table?

PG 11.1, to be released later this week, fixes a bug that could cause a
segmentation fault when running triggers (including, but not limited to,
DELETE triggers).
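
For readers who want to answer these two questions on their own system, here is a minimal sketch. It assumes the table from the report is visible on the search path as integration_account. The queries are an illustration only and were not posted in this thread; they read the standard pg_trigger and pg_constraint catalogs.

```sql
-- List every trigger on the table, including the internal ones that
-- implement foreign-key constraints (tgisinternal = true).
SELECT tgname, tgisinternal, tgenabled
FROM pg_trigger
WHERE tgrelid = 'integration_account'::regclass;

-- List foreign keys in other tables that reference this table; each one
-- installs internal triggers that fire on DELETE and UPDATE of this table.
SELECT conname, conrelid::regclass AS referencing_table
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'integration_account'::regclass;
```

A table with no user-defined triggers can still show rows in the first query if another table references it through a foreign key.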

> What fixed it was performing a VACUUM ANALYZE on the database. After that,
> the deletes worked again.

Hmm, that's a bit mysterious to me if your case is really hitting the bug
I'm suspecting.

Thanks,
Amit



Re: BUG #15489: Segfault on DELETE

From
Michael Paquier
Date:
On Wed, Nov 07, 2018 at 02:34:12PM +0900, Amit Langote wrote:
> Are there any triggers defined on integration_account?  Also, has there
> recently been any ALTER TABLE ADD/DROP COLUMN activity on that table?
>
> PG 11.1, to be released later this week, fixes a bug that could cause a
> segmentation fault when running triggers (including, but not limited to,
> DELETE triggers).

The point is that without more information about the schema, enough to
build a reproducible test case from the ground up, or even better a
self-contained test case, there is not much we can do except guess at
what has been happening here.
--
Michael

Re: Re: BUG #15489: Segfault on DELETE

From
Frederico Costa Galvão
Date:
I stumbled upon this issue yesterday, and trying to reduce and pinpoint 
it, I managed to get to this:

//start
CREATE TABLE a (
     id bigint
);

INSERT INTO a (id) VALUES (1); -- this id's value doesn't matter

ALTER TABLE ONLY a
     ADD CONSTRAINT a_pkey PRIMARY KEY (id);

CREATE TABLE b (
     a_id bigint
);

ALTER TABLE ONLY b
     ADD CONSTRAINT b_a_id_fkey FOREIGN KEY (a_id) REFERENCES a(id);

ALTER TABLE a ADD x BOOLEAN NOT NULL DEFAULT FALSE; -- or TRUE, doesn't matter

-- VACUUM FULL ANALYZE a; -- uncomment this to fix the bug

DELETE FROM a;
//end

This was the bare minimum I could get to reproduce the segfault in a
portable way. It seems to involve foreign keys pointing to tables that
have gone through the new no-table-rewrite handling of ADD COLUMN for
NOT NULL columns with non-volatile default values.
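
As a rough way to see whether a table is in that state, the sketch below inspects the catalog for the test table "a" above. It relies on the atthasmissing and attmissingval columns that PostgreSQL 11 added to pg_attribute. The queries are an assumption about how one might check, not part of the original report.

```sql
-- Run after the ALTER TABLE ... ADD x step of the test case above.
-- A column added without a table rewrite is flagged with atthasmissing,
-- and its stored default is kept in attmissingval.
SELECT attname, atthasmissing, attmissingval
FROM pg_attribute
WHERE attrelid = 'a'::regclass
  AND attnum > 0
  AND NOT attisdropped;

-- Plain reads still work: the stored default is filled in for x even
-- though the tuple on disk was written before the column existed.
SELECT id, x FROM a;
```

This matches the original report, where SELECT and pg_dump worked on rows that could not be deleted.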

Also, VACUUM ANALYZE itself didn't fix the corrupted data: it needs to 
be FULL.

I'm on <Xubuntu 16.04 x86_64>, with <psql (PostgreSQL) 11.0 (Ubuntu 
11.0-1.pgdg16.04+2)>.
I have some simple custom settings on postgresql.conf that I don't think 
are related to the issue, but I'm willing to provide if needed.

---

Frederico Costa Galvão



Re: BUG #15489: Segfault on DELETE

From
Tom Lane
Date:
Frederico Costa Galvão <frederico.costa.galvao@gmail.com> writes:
> I stumbled upon this issue yesterday, and trying to reduce and pinpoint
> it, I managed to get to this:

Yeah, this looks like the expand_tuple bug: you've got a foreign-key
trigger and a tuple that doesn't match the table rowtype anymore.
This example doesn't crash for me in HEAD or 11.1.
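
A quick way to check whether a given server already runs a release containing the fix is sketched below. It is meaningful only on the 11 branch, and the column alias is just an illustrative name.

```sql
-- Full version string, e.g. to confirm the minor release and packaging.
SELECT version();

-- server_version_num is 110000 for 11.0 and 110001 for 11.1.
SELECT current_setting('server_version_num')::int >= 110001
       AS has_11_1_or_later;
```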

            regards, tom lane


Re: BUG #15489: Segfault on DELETE

From
Amit Langote
Date:
Thanks Frederico for your reply.

On 2018/11/08 10:10, Frederico Costa Galvão wrote:
> I stumbled upon this issue yesterday, and trying to reduce and pinpoint
> it, I managed to get to this:
> 
> //start
> CREATE TABLE a (
>     id bigint
> );
> 
> INSERT INTO a (id) VALUES (1); -- this id's value doesn't matter
> 
> ALTER TABLE ONLY a
>     ADD CONSTRAINT a_pkey PRIMARY KEY (id);
> 
> CREATE TABLE b (
>     a_id bigint
> );
> 
> ALTER TABLE ONLY b
>     ADD CONSTRAINT b_a_id_fkey FOREIGN KEY (a_id) REFERENCES a(id);
> 
> ALTER TABLE a ADD x BOOLEAN NOT NULL DEFAULT FALSE; -- or TRUE, doesn't matter

There it is.  These steps are similar to the ones I used to track down a
bug that's now fixed in 11.1.

https://www.postgresql.org/message-id/9cb4aa1c-12ba-59c3-fd75-545fa90fb92f%40lab.ntt.co.jp

The bug was that the foreign key trigger did not get a proper
representation of the tuple being deleted, one that accounts for the
newly added column.

> -- VACUUM FULL ANALYZE a; -- uncomment this to fix the bug

Ah, VACUUM FULL will rewrite the tuples such that they're not hit by the
aforementioned bug.
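
For anyone who cannot upgrade from 11.0 right away, the workaround might look like the sketch below, using table "a" from the test case. Note that VACUUM FULL takes an ACCESS EXCLUSIVE lock on the table while it runs and cannot be run inside a transaction block. The follow-up catalog query is only a suggested sanity check.

```sql
-- Rewrite every tuple of "a" so that each one physically contains the
-- newly added column.
VACUUM FULL ANALYZE a;

-- After the rewrite, atthasmissing should no longer be set for x,
-- because no on-disk tuple depends on the stored default anymore.
SELECT attname, atthasmissing
FROM pg_attribute
WHERE attrelid = 'a'::regclass
  AND attnum > 0
  AND NOT attisdropped;
```

Upgrading to 11.1 remains the actual fix; the rewrite only removes the tuples that trigger the crash.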

So, if OP can tell that this is what happened in their case too, then 11.1
will have fixed the issue.

Thanks,
Amit



Re: BUG #15489: Segfault on DELETE

From
Frederico Galvão
Date:
I'm happy I could help, and I'm even happier to see you guys were 10 steps ahead of me and already fixed it for 11.1, which I'm definitely looking forward to.

--
Frederico Costa Galvão
Computer Engineer - Universidade Federal de Goiás
PontoGet Inovação Web
Tippz Mobile

Re: BUG #15489: Segfault on DELETE

From
Kanwei Li
Date:
I did do a VACUUM FULL on that one particular table as well, so that may have been the command that fixed it, yes.

I have just upgraded to PG 11.1 and will report if I see this again. Thanks all!

Kanwei
