Thread: backend crash
I have problem with (pseudo) random backend crashes. They seems to happen at one query, but not every time. table: --8<---------------cut here---------------start------------->8--- CREATE TABLE digitalocean.log ( ts timestamp with time zone DEFAULT now() NOT NULL, actions jsonb NOT NULL, account text NOT NULL ); ALTER TABLE ONLY digitalocean.log ADD CONSTRAINT log_ts_key UNIQUE (ts); --8<---------------cut here---------------end--------------->8--- query: --8<---------------cut here---------------start------------->8--- with x as (select jsonb_array_elements (actions->'actions')->'id' as id ,ts from digitalocean.log where account='onet')select max(x.id::bigint) from x; --8<---------------cut here---------------end--------------->8--- stacktrace from core: --8<---------------cut here---------------start------------->8--- sudo gdb -c core /usr/lib/postgresql/12/bin/postgres GNU gdb (Debian 9.1-3) 9.1 Copyright (C) 2020 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/lib/postgresql/12/bin/postgres... Reading symbols from /usr/lib/debug/.build-id/20/7804ddc4a915f7f080ecec71883176fde6d69e.debug... [New LWP 1820750] warning: .dynamic section for "/usr/lib/x86_64-linux-gnu/libgnutls.so.30" is not at the expected address (wrong library orversion mismatch?) [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `postgres: parallel worker for PID 1820747 '. Program terminated with signal SIGABRT, Aborted. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f562e80555b in __GI_abort () at abort.c:79 #2 0x000055f675632af4 in ExceptionalCondition (conditionName=conditionName@entry=0x55f6758523f8 "!(result->tdrefcount ==-1)", errorType=errorType@entry=0x55f67568a016 "FailedAssertion", fileName=fileName@entry=0x55f675852008 "/build/postgresql-12-kUaQt8/postgresql-12-12.2/build/../src/backend/utils/cache/typcache.c", lineNumber=lineNumber@entry=2621)at ./build/../src/backend/utils/error/assert.c:54 #3 0x000055f6756309ae in find_or_make_matching_shared_tupledesc (tupdesc=<optimized out>) at ./build/../src/backend/utils/cache/typcache.c:2621 #4 0x000055f675632191 in assign_record_type_typmod (tupDesc=<optimized out>, tupDesc@entry=0x55f6768ac950) at ./build/../src/backend/utils/cache/typcache.c:1815 #5 0x000055f675383328 in BlessTupleDesc (tupdesc=0x55f6768ac950) at ./build/../src/backend/executor/execTuples.c:2056 #6 BlessTupleDesc (tupdesc=tupdesc@entry=0x55f6768ac950) at ./build/../src/backend/executor/execTuples.c:2052 #7 0x000055f6755804fe in elements_worker_jsonb (fcinfo=0x55f6768ab788, as_text=false, funcname=<optimized out>) at ./build/../src/backend/utils/adt/jsonfuncs.c:2030 #8 0x000055f67537f57f in ExecMakeFunctionResultSet (fcache=0x55f67689ee90, econtext=econtext@entry=0x55f67689df48, argContext=0x55f6768ad000,isNull=0x55f67689e5f8, isDone=isDone@entry=0x55f6768ab078) at ./build/../src/backend/executor/execSRF.c:606 #9 0x000055f6753a2b38 in ExecProjectSRF (node=node@entry=0x55f67689dc18, continuing=continuing@entry=false) at ./build/../src/backend/executor/nodeProjectSet.c:175 #10 0x000055f6753a2c1d in ExecProjectSet (pstate=0x55f67689dc18) at ./build/../src/backend/executor/nodeProjectSet.c:105 #11 0x000055f6753a3642 in ExecProcNode (node=0x55f67689dc18) at ./build/../src/include/executor/executor.h:239 #12 ExecResult (pstate=0x55f67689daa0) at ./build/../src/backend/executor/nodeResult.c:115 #13 0x000055f67538858c in ExecProcNode (node=0x55f67689daa0) at ./build/../src/include/executor/executor.h:239 #14 fetch_input_tuple (aggstate=aggstate@entry=0x55f67689d340) at ./build/../src/backend/executor/nodeAgg.c:406 #15 0x000055f67538a348 in agg_retrieve_direct (aggstate=0x55f67689d340) at ./build/../src/backend/executor/nodeAgg.c:1747 #16 ExecAgg (pstate=0x55f67689d340) at ./build/../src/backend/executor/nodeAgg.c:1562 #17 0x000055f67537499c in ExecProcNode (node=0x55f67689d340) at ./build/../src/include/executor/executor.h:239 #18 ExecutePlan (execute_once=<optimized out>, dest=0x55f676846288, direction=<optimized out>, numberTuples=0, sendTuples=<optimizedout>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55f67689d340, estate=0x55f67689d098)at ./build/../src/backend/executor/execMain.c:1646 #19 standard_ExecutorRun (queryDesc=0x55f676847950, direction=<optimized out>, count=0, execute_once=<optimized out>) at./build/../src/backend/executor/execMain.c:364 #20 0x000055f675379837 in ParallelQueryMain (seg=seg@entry=0x55f6767f9948, toc=toc@entry=0x7f562343f000) at ./build/../src/backend/executor/execParallel.c:1399 #21 0x000055f67522d70f in ParallelWorkerMain (main_arg=<optimized out>) at ./build/../src/backend/access/transam/parallel.c:1431 #22 0x000055f6754644d7 in StartBackgroundWorker () at ./build/../src/backend/postmaster/bgworker.c:834 #23 0x000055f675472504 in do_start_bgworker (rw=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:5770 #24 maybe_start_bgworkers () at ./build/../src/backend/postmaster/postmaster.c:5996 #25 0x000055f675472f92 in sigusr1_handler (postgres_signal_arg=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:5167 #26 <signal handler called> #27 0x00007f562e8d4fd6 in __GI___select (nfds=nfds@entry=5, readfds=readfds@entry=0x7ffc8caa4420, writefds=writefds@entry=0x0,exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffc8caa4380) at ../sysdeps/unix/sysv/linux/select.c:41 #28 0x000055f675473844 in ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1668 #29 0x000055f67547550d in PostmasterMain (argc=5, argv=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:1377 #30 0x000055f6751872b4 in main (argc=5, argv=0x55f67679dc40) at ./build/../src/backend/main/main.c:228 (gdb) quit --8<---------------cut here---------------end--------------->8--- --8<---------------cut here---------------start------------->8--- SELECT version(); version ------------------------------------------------------------------------------------------------------------- PostgreSQL 12.2 (Debian 12.2-3.pgdg+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 9.3.0-7) 9.3.0, 64-bit --8<---------------cut here---------------end--------------->8--- -- http://stopstopnop.pl/stop_stopnop.pl_o_nas.html
On Mon, Apr 13, 2020 at 09:25:35AM +0200, Kamil Jońca wrote: > >I have problem with (pseudo) random backend crashes. >They seems to happen at one query, but not every time. > >table: > >--8<---------------cut here---------------start------------->8--- >CREATE TABLE digitalocean.log ( > ts timestamp with time zone DEFAULT now() NOT NULL, > actions jsonb NOT NULL, > account text NOT NULL >); > >ALTER TABLE ONLY digitalocean.log > ADD CONSTRAINT log_ts_key UNIQUE (ts); >--8<---------------cut here---------------end--------------->8--- > >query: >--8<---------------cut here---------------start------------->8--- >with x as (select jsonb_array_elements (actions->'actions')->'id' as id ,ts from digitalocean.log where account='onet')select max(x.id::bigint) from x; >--8<---------------cut here---------------end--------------->8--- > I've tried to reproduce this, but unfortunately it does not fail for me. Chances are this depends on data. Can you prepare a small data sample that would trigger the issue? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Mon, Apr 13, 2020 at 09:25:35AM +0200, Kamil Jońca wrote: >> I have problem with (pseudo) random backend crashes. >> They seems to happen at one query, but not every time. > I've tried to reproduce this, but unfortunately it does not fail for me. > Chances are this depends on data. Can you prepare a small data sample > that would trigger the issue? This is evidently a parallel-query worker, so it's unlikely you'd ever get anywhere near this crash with a "small" data sample; for starters there'd have to be enough data to prompt use of a parallel plan. I doubt that the crash is very data-specific though. Maybe just fill the table with a lot of dummy data? regards, tom lane
On Mon, Apr 13, 2020 at 11:01:01AM -0400, Tom Lane wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> On Mon, Apr 13, 2020 at 09:25:35AM +0200, Kamil Jońca wrote: >>> I have problem with (pseudo) random backend crashes. >>> They seems to happen at one query, but not every time. > >> I've tried to reproduce this, but unfortunately it does not fail for me. >> Chances are this depends on data. Can you prepare a small data sample >> that would trigger the issue? > >This is evidently a parallel-query worker, so it's unlikely you'd ever >get anywhere near this crash with a "small" data sample; for starters >there'd have to be enough data to prompt use of a parallel plan. > >I doubt that the crash is very data-specific though. Maybe just fill >the table with a lot of dummy data? > I did that, of course, and the plan looks like this: QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=315448.35..315448.36 rows=1 width=8) (actual time=18685.398..18685.400 rows=1 loops=1) -> Gather (cost=315447.52..315448.33 rows=8 width=8) (actual time=18683.862..18687.698 rows=8 loops=1) Workers Planned: 8 Workers Launched: 7 -> Partial Aggregate (cost=314447.52..314447.53 rows=1 width=8) (actual time=18669.777..18669.781 rows=1 loops=8) -> Result (cost=0.00..299447.50 rows=12500000 width=40) (actual time=0.160..17844.793 rows=250000 loops=8) -> ProjectSet (cost=0.00..80697.50 rows=12500000 width=32) (actual time=0.145..14854.132 rows=250000loops=8) -> Parallel Seq Scan on log (cost=0.00..16947.50 rows=125000 width=81) (actual time=0.026..343.471rows=125000 loops=8) Filter: (account = 'onet'::text) Planning Time: 0.223 ms Execution Time: 18687.862 ms (11 rows) but still no crashes. I've even built from the exact 12.2 commit to make sure the bug was not accidentally fixed by some commit since then. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Mon, Apr 13, 2020 at 11:01:01AM -0400, Tom Lane wrote: >>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >>> On Mon, Apr 13, 2020 at 09:25:35AM +0200, Kamil Jońca wrote: >>>> I have problem with (pseudo) random backend crashes. >>>> They seems to happen at one query, but not every time. >> >>> I've tried to reproduce this, but unfortunately it does not fail for me. >>> Chances are this depends on data. Can you prepare a small data sample >>> that would trigger the issue? >> >>This is evidently a parallel-query worker, so it's unlikely you'd ever >>get anywhere near this crash with a "small" data sample; for starters >>there'd have to be enough data to prompt use of a parallel plan. >> >>I doubt that the crash is very data-specific though. Maybe just fill >>the table with a lot of dummy data? >> > > I did that, of course, and the plan looks like this: > 1. Did you try run this many times? As I wrote this does not crash every time. I can reproduce this on my data by running query 1000 times in loop. 2. I can try to prepare test data, but it will be rather long process. Stay tuned :/ KJ -- http://stopstopnop.pl/stop_stopnop.pl_o_nas.html
On Mon, Apr 13, 2020 at 06:02:34PM +0200, Kamil Jońca wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > >> On Mon, Apr 13, 2020 at 11:01:01AM -0400, Tom Lane wrote: >>>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >>>> On Mon, Apr 13, 2020 at 09:25:35AM +0200, Kamil Jońca wrote: >>>>> I have problem with (pseudo) random backend crashes. >>>>> They seems to happen at one query, but not every time. >>> >>>> I've tried to reproduce this, but unfortunately it does not fail for me. >>>> Chances are this depends on data. Can you prepare a small data sample >>>> that would trigger the issue? >>> >>>This is evidently a parallel-query worker, so it's unlikely you'd ever >>>get anywhere near this crash with a "small" data sample; for starters >>>there'd have to be enough data to prompt use of a parallel plan. >>> >>>I doubt that the crash is very data-specific though. Maybe just fill >>>the table with a lot of dummy data? >>> >> >> I did that, of course, and the plan looks like this: >> > >1. Did you try run this many times? As I wrote this does not crash every >time. I can reproduce this on my data by running query 1000 times in >loop. I did run it about 100x, I'll reduce the amount of data and let it run. How often it crashes? What execution plan does it use? >2. I can try to prepare test data, but it will be rather long >process. Stay tuned :/ OK -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 13, 2020 at 06:19:13PM +0200, Tomas Vondra wrote: >On Mon, Apr 13, 2020 at 06:02:34PM +0200, Kamil Jońca wrote: >>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> >>>On Mon, Apr 13, 2020 at 11:01:01AM -0400, Tom Lane wrote: >>>>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >>>>>On Mon, Apr 13, 2020 at 09:25:35AM +0200, Kamil Jońca wrote: >>>>>>I have problem with (pseudo) random backend crashes. >>>>>>They seems to happen at one query, but not every time. >>>> >>>>>I've tried to reproduce this, but unfortunately it does not fail for me. >>>>>Chances are this depends on data. Can you prepare a small data sample >>>>>that would trigger the issue? >>>> >>>>This is evidently a parallel-query worker, so it's unlikely you'd ever >>>>get anywhere near this crash with a "small" data sample; for starters >>>>there'd have to be enough data to prompt use of a parallel plan. >>>> >>>>I doubt that the crash is very data-specific though. Maybe just fill >>>>the table with a lot of dummy data? >>>> >>> >>>I did that, of course, and the plan looks like this: >>> >> >>1. Did you try run this many times? As I wrote this does not crash every >>time. I can reproduce this on my data by running query 1000 times in >>loop. > >I did run it about 100x, I'll reduce the amount of data and let it run. > >How often it crashes? What execution plan does it use? > Also, presumably you're building from source, right? What configure options are you using? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: [...] >> > > Also, presumably you're building from source, right? What configure > options are you using? No. It is installed from http://apt.postgresql.org/pub/repos/apt/ sid-pgdg main -- http://stopstopnop.pl/stop_stopnop.pl_o_nas.html
On Mon, Apr 13, 2020 at 06:48:05PM +0200, Kamil Jońca wrote: >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > >[...] >>> >> >> Also, presumably you're building from source, right? What configure >> options are you using? > >No. It is installed from > >http://apt.postgresql.org/pub/repos/apt/ sid-pgdg main > I'm not familiar with the debian packages, but I'd not expect them to have asserts enabled - that's only for development builds, really. Are you sure you're not running some other binary accidentally? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: [...] > > I'm not familiar with the debian packages, but I'd not expect them to > have asserts enabled - that's only for development builds, really. Are > you sure you're not running some other binary accidentally? I am sure I did not compile postgres :) I am sure I use packaged binaries, the only doubt is if I use debian packaged or postgresql packaged debs, but "Debian 12.2-3.pgdg+1" suggests it is packaged by postgresql team; KJ -- http://stopstopnop.pl/stop_stopnop.pl_o_nas.html
On Mon, Apr 13, 2020 at 7:08 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Mon, Apr 13, 2020 at 06:48:05PM +0200, Kamil Jońca wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>
>[...]
>>>
>>
>> Also, presumably you're building from source, right? What configure
>> options are you using?
>
>No. It is installed from
>
>http://apt.postgresql.org/pub/repos/apt/ sid-pgdg main
>
I'm not familiar with the debian packages, but I'd not expect them to
have asserts enabled - that's only for development builds, really. Are
you sure you're not running some other binary accidentally?
Debian "sid" is the "unstable" distribution, for development. Unless things has changed since https://www.postgresql.org/message-id/20170610202102.vz5l4eudghktvwgj%40msg.df7cb.de, they're built with asserts on by default for that distribution, but not for the regular versions.
On Mon, Apr 13, 2020 at 07:20:18PM +0200, Magnus Hagander wrote: >On Mon, Apr 13, 2020 at 7:08 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> >wrote: > >> On Mon, Apr 13, 2020 at 06:48:05PM +0200, Kamil Jońca wrote: >> >Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> > >> >[...] >> >>> >> >> >> >> Also, presumably you're building from source, right? What configure >> >> options are you using? >> > >> >No. It is installed from >> > >> >http://apt.postgresql.org/pub/repos/apt/ sid-pgdg main >> > >> >> I'm not familiar with the debian packages, but I'd not expect them to >> have asserts enabled - that's only for development builds, really. Are >> you sure you're not running some other binary accidentally? >> > >Debian "sid" is the "unstable" distribution, for development. Unless things >has changed since >https://www.postgresql.org/message-id/20170610202102.vz5l4eudghktvwgj%40msg.df7cb.de, >they're built with asserts on by default for that distribution, but not for >the regular versions. > Ah, right - that explains it. Thanks. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 13, 2020 at 10:59 PM Kamil Jońca <kjonca@fastmail.com> wrote: > #2 0x000055f675632af4 in ExceptionalCondition (conditionName=conditionName@entry=0x55f6758523f8 "!(result->tdrefcount== -1)", errorType=errorType@entry=0x55f67568a016 "FailedAssertion", > fileName=fileName@entry=0x55f675852008 "/build/postgresql-12-kUaQt8/postgresql-12-12.2/build/../src/backend/utils/cache/typcache.c", lineNumber=lineNumber@entry=2621)at ./build/../src/backend/utils/error/assert.c:54 > #3 0x000055f6756309ae in find_or_make_matching_shared_tupledesc (tupdesc=<optimized out>) at ./build/../src/backend/utils/cache/typcache.c:2621 This code is asserting that a TupleDesc that it found in CurrentSession->shared_record_table is not reference counted. That's because TupleDesc objects stored there stick around for the lifetime of the session (= leader backend lifetime). So the question is ... how did that thing manage to get some other value there, and what was it? What does "frame 3", "print *result" show?