terminate called after throwing an instance of 'std::bad_alloc' - Mailing list pgsql-hackers
From | Justin Pryzby
---|---
Subject | terminate called after throwing an instance of 'std::bad_alloc'
Date |
Msg-id | 20201001021609.GC8476@telsasoft.com
Responses | Re: terminate called after throwing an instance of 'std::bad_alloc'
 | Re: terminate called after throwing an instance of 'std::bad_alloc'
 | Re: terminate called after throwing an instance of 'std::bad_alloc'
List | pgsql-hackers
A VM crashed which is now running PG13.0 on centos7:

Sep 30 19:40:08 database7 abrt-hook-ccpp: Process 17905 (postgres) of user 26 killed by SIGABRT - dumping core

Core was generated by `postgres: telsasoft ts 192.168.122.11(34608) SELECT '.

Unfortunately, the filesystem wasn't large enough and the corefile is truncated.

The first badness in our logfiles looks like this; this is the very head of the logfile:

|[pryzbyj@database7 ~]$ sudo gzip -dc /var/log/postgresql/crash-postgresql-2020-09-30_194000.log.gz |head
|[sudo] password for pryzbyj:
|terminate called after throwing an instance of 'std::bad_alloc'
|  what():  std::bad_alloc
|< 2020-09-30 19:40:09.653 ADT >LOG: checkpoint starting: time
|< 2020-09-30 19:40:17.002 ADT >LOG: checkpoint complete: wrote 74 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=7.331 s, sync=0.006 s, total=7.348 s; sync files=25, longest=0.002 s, average=0.000 s; distance=295 kB, estimate=4183 kB
|< 2020-09-30 19:40:22.642 ADT >LOG: server process (PID 17905) was terminated by signal 6: Aborted
|< 2020-09-30 19:40:22.642 ADT >DETAIL: Failed process was running: --BEGIN SQL
|	SELECT * FROM

I was able to grep the filesystem to find what looks like the preceding logfile (which our script had immediately rotated in its attempt to be helpful).

|< 2020-09-30 19:39:09.096 ADT >LOG: checkpoint starting: time
|< 2020-09-30 19:39:12.640 ADT >LOG: checkpoint complete: wrote 35 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=3.523 s, sync=0.006 s, total=3.544 s; sync files=14, longest=0.002 s, average=0.000 s; distance=103 kB, estimate=4615 kB

This seemed familiar, and I found it had happened 18 months ago on a different server:

|terminate called after throwing an instance of 'std::bad_alloc'
|  what():  std::bad_alloc
|< 2019-02-01 15:36:54.434 CST >LOG: server process (PID 13557) was terminated by signal 6: Aborted
|< 2019-02-01 15:36:54.434 CST >DETAIL: Failed process was running:
|	SELECT *, '' as n
|	FROM a, b WHERE
|	a.id = b.id AND
|	b.c IS NOT NULL
|	ORDER BY b.c LIMIT 9

I have a log of pg_settings, which shows that this server ran v11.1 until 2019-02-14, when it was upgraded to v11.2. Since v11.2 included a fix for a bug I reported involving JIT and wide tables, I probably saw this crash and dismissed it, even though the tables named "a" and "b" here have only ~30 columns combined.

The query that crashed in 2019 actually processes a small queue, and runs every 5 sec. The query that crashed today is a "day"-level query which runs every 15 min, so it ran 70-some times today with no issue.

Our DBs use postgis, and today's crashing query JOINs to the table with geometry columns, but does not use them at all. The 2019 query doesn't even include the geometry table. I'm not sure these are even the same crash, but if they are, I think it's maybe a JIT issue and not postgis (??).

I've had JIT disabled since 2019, due to no performance benefit for us, but I've been re-enabling it during upgrades and transitions, and instead disabling jit_tuple_deforming (since this performs badly for columns with high attnums). So maybe this will recur before too long.
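For reference, the settings change I mean is roughly the following; this is only a sketch using the standard v11+ GUC names (jit, jit_tuple_deforming), not a copy of our actual config:

	-- see which JIT-related settings are currently in effect
	SELECT name, setting FROM pg_settings WHERE name LIKE 'jit%';

	-- keep JIT enabled, but turn off tuple deforming
	ALTER SYSTEM SET jit = on;
	ALTER SYSTEM SET jit_tuple_deforming = off;
	SELECT pg_reload_conf();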