Thread: backend crash

backend crash

From
kjonca@fastmail.com (Kamil Jońca)
Date:
I have problem with (pseudo) random backend crashes.
They seems to happen at one query, but not every time.

table:

--8<---------------cut here---------------start------------->8---
CREATE TABLE digitalocean.log (
    ts timestamp with time zone DEFAULT now() NOT NULL,
    actions jsonb NOT NULL,
    account text NOT NULL
);

ALTER TABLE ONLY digitalocean.log
    ADD CONSTRAINT log_ts_key UNIQUE (ts);
--8<---------------cut here---------------end--------------->8---







query:
--8<---------------cut here---------------start------------->8---
with x as (select jsonb_array_elements (actions->'actions')->'id' as id ,ts  from digitalocean.log where
account='onet')select max(x.id::bigint) from x;
 
--8<---------------cut here---------------end--------------->8---





stacktrace from core:

--8<---------------cut here---------------start------------->8---
sudo gdb -c core /usr/lib/postgresql/12/bin/postgres    
GNU gdb (Debian 9.1-3) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/postgresql/12/bin/postgres...
Reading symbols from /usr/lib/debug/.build-id/20/7804ddc4a915f7f080ecec71883176fde6d69e.debug...
[New LWP 1820750]

warning: .dynamic section for "/usr/lib/x86_64-linux-gnu/libgnutls.so.30" is not at the expected address (wrong library
orversion mismatch?)
 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: parallel worker for PID 1820747                                     '.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f562e80555b in __GI_abort () at abort.c:79
#2  0x000055f675632af4 in ExceptionalCondition (conditionName=conditionName@entry=0x55f6758523f8 "!(result->tdrefcount
==-1)", errorType=errorType@entry=0x55f67568a016 "FailedAssertion", 
 
    fileName=fileName@entry=0x55f675852008
"/build/postgresql-12-kUaQt8/postgresql-12-12.2/build/../src/backend/utils/cache/typcache.c",
lineNumber=lineNumber@entry=2621)at ./build/../src/backend/utils/error/assert.c:54
 
#3  0x000055f6756309ae in find_or_make_matching_shared_tupledesc (tupdesc=<optimized out>) at
./build/../src/backend/utils/cache/typcache.c:2621
#4  0x000055f675632191 in assign_record_type_typmod (tupDesc=<optimized out>, tupDesc@entry=0x55f6768ac950) at
./build/../src/backend/utils/cache/typcache.c:1815
#5  0x000055f675383328 in BlessTupleDesc (tupdesc=0x55f6768ac950) at ./build/../src/backend/executor/execTuples.c:2056
#6  BlessTupleDesc (tupdesc=tupdesc@entry=0x55f6768ac950) at ./build/../src/backend/executor/execTuples.c:2052
#7  0x000055f6755804fe in elements_worker_jsonb (fcinfo=0x55f6768ab788, as_text=false, funcname=<optimized out>) at
./build/../src/backend/utils/adt/jsonfuncs.c:2030
#8  0x000055f67537f57f in ExecMakeFunctionResultSet (fcache=0x55f67689ee90, econtext=econtext@entry=0x55f67689df48,
argContext=0x55f6768ad000,isNull=0x55f67689e5f8, isDone=isDone@entry=0x55f6768ab078) at
./build/../src/backend/executor/execSRF.c:606
#9  0x000055f6753a2b38 in ExecProjectSRF (node=node@entry=0x55f67689dc18, continuing=continuing@entry=false) at
./build/../src/backend/executor/nodeProjectSet.c:175
#10 0x000055f6753a2c1d in ExecProjectSet (pstate=0x55f67689dc18) at
./build/../src/backend/executor/nodeProjectSet.c:105
#11 0x000055f6753a3642 in ExecProcNode (node=0x55f67689dc18) at ./build/../src/include/executor/executor.h:239
#12 ExecResult (pstate=0x55f67689daa0) at ./build/../src/backend/executor/nodeResult.c:115
#13 0x000055f67538858c in ExecProcNode (node=0x55f67689daa0) at ./build/../src/include/executor/executor.h:239
#14 fetch_input_tuple (aggstate=aggstate@entry=0x55f67689d340) at ./build/../src/backend/executor/nodeAgg.c:406
#15 0x000055f67538a348 in agg_retrieve_direct (aggstate=0x55f67689d340) at
./build/../src/backend/executor/nodeAgg.c:1747
#16 ExecAgg (pstate=0x55f67689d340) at ./build/../src/backend/executor/nodeAgg.c:1562
#17 0x000055f67537499c in ExecProcNode (node=0x55f67689d340) at ./build/../src/include/executor/executor.h:239
#18 ExecutePlan (execute_once=<optimized out>, dest=0x55f676846288, direction=<optimized out>, numberTuples=0,
sendTuples=<optimizedout>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55f67689d340,
estate=0x55f67689d098)at ./build/../src/backend/executor/execMain.c:1646
 
#19 standard_ExecutorRun (queryDesc=0x55f676847950, direction=<optimized out>, count=0, execute_once=<optimized out>)
at./build/../src/backend/executor/execMain.c:364
 
#20 0x000055f675379837 in ParallelQueryMain (seg=seg@entry=0x55f6767f9948, toc=toc@entry=0x7f562343f000) at
./build/../src/backend/executor/execParallel.c:1399
#21 0x000055f67522d70f in ParallelWorkerMain (main_arg=<optimized out>) at
./build/../src/backend/access/transam/parallel.c:1431
#22 0x000055f6754644d7 in StartBackgroundWorker () at ./build/../src/backend/postmaster/bgworker.c:834
#23 0x000055f675472504 in do_start_bgworker (rw=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:5770
#24 maybe_start_bgworkers () at ./build/../src/backend/postmaster/postmaster.c:5996
#25 0x000055f675472f92 in sigusr1_handler (postgres_signal_arg=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:5167
#26 <signal handler called>
#27 0x00007f562e8d4fd6 in __GI___select (nfds=nfds@entry=5, readfds=readfds@entry=0x7ffc8caa4420,
writefds=writefds@entry=0x0,exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffc8caa4380) at
../sysdeps/unix/sysv/linux/select.c:41
#28 0x000055f675473844 in ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1668
#29 0x000055f67547550d in PostmasterMain (argc=5, argv=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:1377
#30 0x000055f6751872b4 in main (argc=5, argv=0x55f67679dc40) at ./build/../src/backend/main/main.c:228
(gdb) quit
--8<---------------cut here---------------end--------------->8---


--8<---------------cut here---------------start------------->8---
 SELECT version();
                                                   version                                                    
-------------------------------------------------------------------------------------------------------------
PostgreSQL 12.2 (Debian 12.2-3.pgdg+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 9.3.0-7) 9.3.0, 64-bit
--8<---------------cut here---------------end--------------->8---

-- 
http://stopstopnop.pl/stop_stopnop.pl_o_nas.html



Re: backend crash

From
Tomas Vondra
Date:
On Mon, Apr 13, 2020 at 09:25:35AM +0200, Kamil Jońca wrote:
>
>I have problem with (pseudo) random backend crashes.
>They seems to happen at one query, but not every time.
>
>table:
>
>--8<---------------cut here---------------start------------->8---
>CREATE TABLE digitalocean.log (
>    ts timestamp with time zone DEFAULT now() NOT NULL,
>    actions jsonb NOT NULL,
>    account text NOT NULL
>);
>
>ALTER TABLE ONLY digitalocean.log
>    ADD CONSTRAINT log_ts_key UNIQUE (ts);
>--8<---------------cut here---------------end--------------->8---
>
>query:
>--8<---------------cut here---------------start------------->8---
>with x as (select jsonb_array_elements (actions->'actions')->'id' as id ,ts  from digitalocean.log where
account='onet')select max(x.id::bigint) from x;
 
>--8<---------------cut here---------------end--------------->8---
>

I've tried to reproduce this, but unfortunately it does not fail for me.
Chances are this depends on data. Can you prepare a small data sample
that would trigger the issue?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: backend crash

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Mon, Apr 13, 2020 at 09:25:35AM +0200, Kamil Jońca wrote:
>> I have problem with (pseudo) random backend crashes.
>> They seems to happen at one query, but not every time.

> I've tried to reproduce this, but unfortunately it does not fail for me.
> Chances are this depends on data. Can you prepare a small data sample
> that would trigger the issue?

This is evidently a parallel-query worker, so it's unlikely you'd ever
get anywhere near this crash with a "small" data sample; for starters
there'd have to be enough data to prompt use of a parallel plan.

I doubt that the crash is very data-specific though.  Maybe just fill
the table with a lot of dummy data?

            regards, tom lane



Re: backend crash

From
Tomas Vondra
Date:
On Mon, Apr 13, 2020 at 11:01:01AM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On Mon, Apr 13, 2020 at 09:25:35AM +0200, Kamil Jońca wrote:
>>> I have problem with (pseudo) random backend crashes.
>>> They seems to happen at one query, but not every time.
>
>> I've tried to reproduce this, but unfortunately it does not fail for me.
>> Chances are this depends on data. Can you prepare a small data sample
>> that would trigger the issue?
>
>This is evidently a parallel-query worker, so it's unlikely you'd ever
>get anywhere near this crash with a "small" data sample; for starters
>there'd have to be enough data to prompt use of a parallel plan.
>
>I doubt that the crash is very data-specific though.  Maybe just fill
>the table with a lot of dummy data?
>

I did that, of course, and the plan looks like this:

                                                                      QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------
  Finalize Aggregate  (cost=315448.35..315448.36 rows=1 width=8) (actual time=18685.398..18685.400 rows=1 loops=1)
    ->  Gather  (cost=315447.52..315448.33 rows=8 width=8) (actual time=18683.862..18687.698 rows=8 loops=1)
          Workers Planned: 8
          Workers Launched: 7
          ->  Partial Aggregate  (cost=314447.52..314447.53 rows=1 width=8) (actual time=18669.777..18669.781 rows=1
loops=8)
                ->  Result  (cost=0.00..299447.50 rows=12500000 width=40) (actual time=0.160..17844.793 rows=250000
loops=8)
                      ->  ProjectSet  (cost=0.00..80697.50 rows=12500000 width=32) (actual time=0.145..14854.132
rows=250000loops=8)
 
                            ->  Parallel Seq Scan on log  (cost=0.00..16947.50 rows=125000 width=81) (actual
time=0.026..343.471rows=125000 loops=8)
 
                                  Filter: (account = 'onet'::text)
  Planning Time: 0.223 ms
  Execution Time: 18687.862 ms
(11 rows)

but still no crashes. I've even built from the exact 12.2 commit to make
sure the bug was not accidentally fixed by some commit since then.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: backend crash

From
kjonca@fastmail.com (Kamil Jońca)
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

> On Mon, Apr 13, 2020 at 11:01:01AM -0400, Tom Lane wrote:
>>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>> On Mon, Apr 13, 2020 at 09:25:35AM +0200, Kamil Jońca wrote:
>>>> I have problem with (pseudo) random backend crashes.
>>>> They seems to happen at one query, but not every time.
>>
>>> I've tried to reproduce this, but unfortunately it does not fail for me.
>>> Chances are this depends on data. Can you prepare a small data sample
>>> that would trigger the issue?
>>
>>This is evidently a parallel-query worker, so it's unlikely you'd ever
>>get anywhere near this crash with a "small" data sample; for starters
>>there'd have to be enough data to prompt use of a parallel plan.
>>
>>I doubt that the crash is very data-specific though.  Maybe just fill
>>the table with a lot of dummy data?
>>
>
> I did that, of course, and the plan looks like this:
>

1. Did you try run this many times? As I wrote this does not crash every
time. I can reproduce this on my data by running query 1000 times in
loop.
2. I can try to prepare test data, but it will be rather long
process. Stay tuned :/
KJ
--
http://stopstopnop.pl/stop_stopnop.pl_o_nas.html



Re: backend crash

From
Tomas Vondra
Date:
On Mon, Apr 13, 2020 at 06:02:34PM +0200, Kamil Jońca wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>
>> On Mon, Apr 13, 2020 at 11:01:01AM -0400, Tom Lane wrote:
>>>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>>> On Mon, Apr 13, 2020 at 09:25:35AM +0200, Kamil Jońca wrote:
>>>>> I have problem with (pseudo) random backend crashes.
>>>>> They seems to happen at one query, but not every time.
>>>
>>>> I've tried to reproduce this, but unfortunately it does not fail for me.
>>>> Chances are this depends on data. Can you prepare a small data sample
>>>> that would trigger the issue?
>>>
>>>This is evidently a parallel-query worker, so it's unlikely you'd ever
>>>get anywhere near this crash with a "small" data sample; for starters
>>>there'd have to be enough data to prompt use of a parallel plan.
>>>
>>>I doubt that the crash is very data-specific though.  Maybe just fill
>>>the table with a lot of dummy data?
>>>
>>
>> I did that, of course, and the plan looks like this:
>>
>
>1. Did you try run this many times? As I wrote this does not crash every
>time. I can reproduce this on my data by running query 1000 times in
>loop.

I did run it about 100x, I'll reduce the amount of data and let it run.

How often it crashes? What execution plan does it use?

>2. I can try to prepare test data, but it will be rather long
>process. Stay tuned :/

OK

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: backend crash

From
Tomas Vondra
Date:
On Mon, Apr 13, 2020 at 06:19:13PM +0200, Tomas Vondra wrote:
>On Mon, Apr 13, 2020 at 06:02:34PM +0200, Kamil Jońca wrote:
>>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>
>>>On Mon, Apr 13, 2020 at 11:01:01AM -0400, Tom Lane wrote:
>>>>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>>>>On Mon, Apr 13, 2020 at 09:25:35AM +0200, Kamil Jońca wrote:
>>>>>>I have problem with (pseudo) random backend crashes.
>>>>>>They seems to happen at one query, but not every time.
>>>>
>>>>>I've tried to reproduce this, but unfortunately it does not fail for me.
>>>>>Chances are this depends on data. Can you prepare a small data sample
>>>>>that would trigger the issue?
>>>>
>>>>This is evidently a parallel-query worker, so it's unlikely you'd ever
>>>>get anywhere near this crash with a "small" data sample; for starters
>>>>there'd have to be enough data to prompt use of a parallel plan.
>>>>
>>>>I doubt that the crash is very data-specific though.  Maybe just fill
>>>>the table with a lot of dummy data?
>>>>
>>>
>>>I did that, of course, and the plan looks like this:
>>>
>>
>>1. Did you try run this many times? As I wrote this does not crash every
>>time. I can reproduce this on my data by running query 1000 times in
>>loop.
>
>I did run it about 100x, I'll reduce the amount of data and let it run.
>
>How often it crashes? What execution plan does it use?
>

Also, presumably you're building from source, right? What configure
options are you using?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: backend crash

From
kjonca@fastmail.com (Kamil Jońca)
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

[...]
>>
>
> Also, presumably you're building from source, right? What configure
> options are you using?

No. It is installed from

http://apt.postgresql.org/pub/repos/apt/ sid-pgdg main

-- 
http://stopstopnop.pl/stop_stopnop.pl_o_nas.html



Re: backend crash

From
Tomas Vondra
Date:
On Mon, Apr 13, 2020 at 06:48:05PM +0200, Kamil Jońca wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>
>[...]
>>>
>>
>> Also, presumably you're building from source, right? What configure
>> options are you using?
>
>No. It is installed from
>
>http://apt.postgresql.org/pub/repos/apt/ sid-pgdg main
>

I'm not familiar with the debian packages, but I'd not expect them to
have asserts enabled - that's only for development builds, really. Are
you sure you're not running some other binary accidentally?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: backend crash

From
kjonca@fastmail.com (Kamil Jońca)
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

[...]
>
> I'm not familiar with the debian packages, but I'd not expect them to
> have asserts enabled - that's only for development builds, really. Are
> you sure you're not running some other binary accidentally?

I am sure I did not compile postgres  :)
I am sure I use packaged binaries, the only doubt is if I use debian
packaged or postgresql packaged debs, but "Debian 12.2-3.pgdg+1"
suggests it is packaged by postgresql team;

KJ

-- 
http://stopstopnop.pl/stop_stopnop.pl_o_nas.html



Re: backend crash

From
Magnus Hagander
Date:


On Mon, Apr 13, 2020 at 7:08 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Mon, Apr 13, 2020 at 06:48:05PM +0200, Kamil Jońca wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>
>[...]
>>>
>>
>> Also, presumably you're building from source, right? What configure
>> options are you using?
>
>No. It is installed from
>
>http://apt.postgresql.org/pub/repos/apt/ sid-pgdg main
>

I'm not familiar with the debian packages, but I'd not expect them to
have asserts enabled - that's only for development builds, really. Are
you sure you're not running some other binary accidentally?

Debian "sid" is the "unstable" distribution, for development. Unless things has changed since https://www.postgresql.org/message-id/20170610202102.vz5l4eudghktvwgj%40msg.df7cb.de, they're built with asserts on by default for that distribution, but not for the regular versions.

--

Re: backend crash

From
Tomas Vondra
Date:
On Mon, Apr 13, 2020 at 07:20:18PM +0200, Magnus Hagander wrote:
>On Mon, Apr 13, 2020 at 7:08 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
>wrote:
>
>> On Mon, Apr 13, 2020 at 06:48:05PM +0200, Kamil Jońca wrote:
>> >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> >
>> >[...]
>> >>>
>> >>
>> >> Also, presumably you're building from source, right? What configure
>> >> options are you using?
>> >
>> >No. It is installed from
>> >
>> >http://apt.postgresql.org/pub/repos/apt/ sid-pgdg main
>> >
>>
>> I'm not familiar with the debian packages, but I'd not expect them to
>> have asserts enabled - that's only for development builds, really. Are
>> you sure you're not running some other binary accidentally?
>>
>
>Debian "sid" is the "unstable" distribution, for development. Unless things
>has changed since
>https://www.postgresql.org/message-id/20170610202102.vz5l4eudghktvwgj%40msg.df7cb.de,
>they're built with asserts on by default for that distribution, but not for
>the regular versions.
>

Ah, right - that explains it. Thanks.


-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: backend crash

From
Thomas Munro
Date:
On Mon, Apr 13, 2020 at 10:59 PM Kamil Jońca <kjonca@fastmail.com> wrote:
> #2  0x000055f675632af4 in ExceptionalCondition (conditionName=conditionName@entry=0x55f6758523f8
"!(result->tdrefcount== -1)", errorType=errorType@entry=0x55f67568a016 "FailedAssertion", 
>     fileName=fileName@entry=0x55f675852008
"/build/postgresql-12-kUaQt8/postgresql-12-12.2/build/../src/backend/utils/cache/typcache.c",
lineNumber=lineNumber@entry=2621)at ./build/../src/backend/utils/error/assert.c:54 
> #3  0x000055f6756309ae in find_or_make_matching_shared_tupledesc (tupdesc=<optimized out>) at
./build/../src/backend/utils/cache/typcache.c:2621

This code is asserting that a TupleDesc that it found in
CurrentSession->shared_record_table is not reference counted.  That's
because TupleDesc objects stored there stick around for the lifetime
of the session (= leader backend lifetime).  So the question is ...
how did that thing manage to get some other value there, and what was
it?  What does "frame 3", "print *result" show?