Thread: segfault in geqo on experimental gcc animal
Hi,

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=moonjelly&dt=2019-11-09%2010%3A17%3A06 shows a failure, including a backtrace:

======-=-====== stack trace: pgsql.build/src/test/regress/tmp_check/data/core ======-=-======
[New LWP 42902]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: fabien regression [local] SELECT '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000006d962b in gimme_tour (root=root@entry=0x1cfb4b0, edge_table=edge_table@entry=0x1d3afc0, new_gene=<optimized out>, num_gene=5) at geqo_erx.c:209
209 remove_gene(root, new_gene[i - 1], edge_table[(int) new_gene[i - 1]], edge_table);
#0 0x00000000006d962b in gimme_tour (root=root@entry=0x1cfb4b0, edge_table=edge_table@entry=0x1d3afc0, new_gene=<optimized out>, num_gene=5) at geqo_erx.c:209
#1 0x00000000006da0a8 in geqo (root=0x1cfb4b0, number_of_rels=<optimized out>, initial_rels=<optimized out>) at geqo_main.c:190
#2 0x00000000006de084 in make_one_rel (root=root@entry=0x1cfb4b0, joinlist=joinlist@entry=0x1d0a868) at allpaths.c:227
#3 0x0000000000701d19 in query_planner (root=root@entry=0x1cfb4b0, qp_callback=qp_callback@entry=0x702300 <standard_qp_callback>, qp_extra=qp_extra@entry=0x7ffd46b55a60) at planmain.c:269
#4 0x0000000000706844 in grouping_planner () at planner.c:2054
#5 0x00000000007093c7 in subquery_planner (glob=glob@entry=0x1cfb418, parse=parse@entry=0x1cd77b8, parent_root=parent_root@entry=0x0, hasRecursion=hasRecursion@entry=false, tuple_fraction=tuple_fraction@entry=0) at planner.c:1014
#6 0x000000000070a803 in standard_planner (parse=0x1cd77b8, cursorOptions=256, boundParams=<optimized out>) at planner.c:406
#7 0x00000000007cb1dc in pg_plan_query (querytree=0x1cd77b8, cursorOptions=256, boundParams=0x0) at postgres.c:873
#8 0x00000000007cb2be in pg_plan_queries (querytrees=0x1cfb3c0, cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0) at postgres.c:963
#9 0x00000000007cb618 in exec_simple_query () at postgres.c:1154
#10 0x00000000007cd384 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x1c23058, dbname=<optimized out>, username=<optimized out>) at postgres.c:4278
#11 0x000000000074b574 in BackendRun (port=0x1c1c650) at postmaster.c:4498
#12 BackendStartup (port=0x1c1c650) at postmaster.c:4189
#13 ServerLoop () at postmaster.c:1727
#14 0x000000000074c34d in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x1bf35b0) at postmaster.c:1400
#15 0x0000000000491f41 in main (argc=8, argv=0x1bf35b0) at main.c:210
$1 = {si_signo = 11, si_errno = 0, si_code = 1, _sifields = {_pad = {30650304, -12, 0 <repeats 26 times>}, _kill = {si_pid = 30650304, si_uid = 4294967284}, _timer = {si_tid = 30650304, si_overrun = -12, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 30650304, si_uid = 4294967284, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 30650304, si_uid = 4294967284, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0xfffffff401d3afc0, _addr_lsb = 0, _addr_bnd = {_lower = 0x0, _upper = 0x0}}, _sigpoll = {si_band = -51508957248, si_fd = 0}}}

I don't think there's been any relevant code changes since the last success.

last success:
2019-11-09 09:20:28.346 CET [28785:1] LOG: starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.0.0 20191102 (experimental), 64-bit

first failure:
2019-11-09 11:19:36.277 CET [42512:1] LOG: starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.0.0 20191109 (experimental), 64-bit

so it sure looks like a gcc upgrade caused the failure. But it's not clear whether it's a compiler bug, or some undefined behaviour that triggers the bug.

Fabien, any chance to either bisect or get a bit more information on the backtrace?

Greetings,

Andres Freund
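As a side note on reading the dump above, the faulting address can be compared with the `edge_table` pointer from frame #0 by plain arithmetic. The sketch below is not from the original report. The two hexadecimal constants are copied from the backtrace, while `ASSUMED_EDGE_SIZE = 24` is an assumption about the element size of `edge_table` (four `int` genes plus two `int` counters) that the thread itself does not state. Under that assumption the bad access would correspond to an array index equal to the smallest 32-bit integer, which hints at a garbage gene value rather than a bad pointer, but this is only an inference.

```python
# Values copied from the gdb output in the report.
EDGE_TABLE = 0x1D3AFC0        # edge_table argument of gimme_tour (frame #0)
SI_ADDR = 0xFFFFFFF401D3AFC0  # _sigfault.si_addr from the siginfo dump

# ASSUMPTION (not stated in the thread): each edge_table element is 24 bytes.
ASSUMED_EDGE_SIZE = 24


def to_signed64(value):
    """Interpret a 64-bit unsigned value as a two's-complement signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


# Byte distance between the faulting address and the start of edge_table.
offset = to_signed64(SI_ADDR) - EDGE_TABLE

# Element index that would produce this offset, under the size assumption.
index = offset // ASSUMED_EDGE_SIZE

print("offset in bytes:", offset)
print("offset / 2**32:", offset // (1 << 32))
print("implied index:", index, "(equals -2**31)" if index == -(1 << 31) else "")
```

The signed view of `si_addr` also matches the `si_band` value printed in the same union, which is a useful sanity check that the constants were copied correctly.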
Hello Andres,

> I don't think there's been any relevant code changes since the last
> success.
>
> last success:
> 2019-11-09 09:20:28.346 CET [28785:1] LOG: starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.0.0 20191102 (experimental), 64-bit
>
> first failure:
> 2019-11-09 11:19:36.277 CET [42512:1] LOG: starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 10.0.0 20191109 (experimental), 64-bit
>
> so it sure looks like a gcc upgrade caused the failure. But it's not
> clear whether it's a compiler bug, or some undefined behaviour that
> triggers the bug.
>
> Fabien, any chance to either bisect or get a bit more information on the
> backtrace?

There is a promising "keep_error_builds" option in the buildfarm settings, but it does not seem to be used anywhere in the scripts. Well, I can probably relaunch by hand.

However, given the experimental nature of the setup, I think that the most probable cause is a newly introduced gcc bug, so I'd suggest waiting to see whether the issue persists before spending time on it, and, if it does, investigating further in order to report a bug to either gcc or pg, depending.

Also, I'll recompile gcc before the next weekly builds.

--
Fabien.
>> so it sure looks like a gcc upgrade caused the failure. But it's not
>> clear whether it's a compiler bug, or some undefined behaviour that
>> triggers the bug.
>>
>> Fabien, any chance to either bisect or get a bit more information on
>> the backtrace?
>
> There is a promising "keep_error_builds" option in buildfarm settings,
> but it does not seem to be used anywhere in the scripts. Well, I can
> probably relaunch by hand.
>
> However, given the experimental nature of the setup, I think that the
> most probable cause is a newly introduced gcc bug, so I'd suggest to
> wait to check whether the issue persists before spending time on that,
> and if it persists to investigate further to either report a bug to gcc
> or pg, depending.
>
> Also, I'll recompile gcc before the next weekly builds.

I did some manual testing. All the versions I tested failed miserably (I tested master, 12, 11, 10, 9.6…).

There is a high probability that it is a newly introduced gcc bug; however, pg is not a nice self-contained test case to submit to gcc for debugging :-(

I suggest ignoring it for the time being, and if the problem persists I'll try to find out which gcc commit caused the regression.

--
Fabien.
Hello,

I did a (slow) bisection of the gcc sources, which determined that gcc r277979 was the culprit. I then started a bug report, which showed that the issue had already been reported this morning by Martin Liška, including a nice example isolated from the sources. See:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92506

--
Fabien.
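The search over gcc revisions described above can be sketched as a binary search between a known-good and a known-bad revision. This is only an illustration and not the procedure actually used on moonjelly: the function name `first_bad_revision`, the bounds 277700 and 278000, and the stand-in predicate are all hypothetical. In a real run the predicate would build gcc at the given revision, compile PostgreSQL with it, and report whether the regression tests crash, which is what makes each step slow.

```python
def first_bad_revision(good, bad, is_bad):
    """Return the first revision in (good, bad] for which is_bad() is true.

    Assumes is_bad(good) is false, is_bad(bad) is true, and that every
    revision after the first bad one is also bad.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid   # the culprit is mid or earlier
        else:
            good = mid  # the culprit is after mid
    return bad


tested = []


def stand_in_is_bad(revision):
    """Hypothetical stand-in for 'build gcc at this revision and run the tests'."""
    tested.append(revision)
    return revision >= 277979  # culprit revision named in the thread


# The bounds are hypothetical; the thread only gives compiler build dates.
culprit = first_bad_revision(277700, 278000, stand_in_is_bad)
print("first bad revision:", culprit)
print("revisions tested:", len(tested))
```

With a window of a few hundred revisions, fewer than ten builds are needed, which matters when each build takes a long time.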
Hi.

Yep, I periodically build the PostgreSQL package in openSUSE with the latest GCC, which is how I identified the problem and isolated it to a simple test case. I would expect a fix today or tomorrow.

See you,
Martin

On Thu, 14 Nov 2019 at 16:46, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> Hello,
>
> I did a (slow) dichotomy on gcc sources which determined that gcc r277979
> was the culprit, then I started a bug report which showed that the issue
> was already reported this morning by Martin Liška, including a nice
> example isolated from sources. See:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92506
>
> --
> Fabien.
> Yep, I build periodically PostgreSQL package in openSUSE with the latest
> GCC and so that I identified that and isolated to a simple test-case. I
> would expect a fix today or tomorrow.

Indeed, the reported gcc issue seems to be fixed by gcc r278259. I'm updating gcc on moonjelly to check whether this solves the pg compilation woes.

--
Fabien.
Yes, after the revision I see other failing tests like:

...
select_having ... ok 16 ms
subselect ... FAILED 92 ms
union ... FAILED 77 ms
case ... ok 32 ms
join ... FAILED 239 ms
aggregates ... FAILED 136 ms
transactions ... ok 59 ms
...

I'm going to investigate that and will inform you guys.

Martin

On Fri, 15 Nov 2019 at 11:56, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
> > Yep, I build periodically PostgreSQL package in openSUSE with the latest
> > GCC and so that I identified that and isolated to a simple test-case. I
> > would expect a fix today or tomorrow.
>
> Indeed, the gcc issue reported seems fixed by gcc r278259. I'm updating
> moonjelly gcc to check if this solves pg compilation woes.
>
> --
> Fabien.
> Yes, after the revision I see other failing tests like:

Indeed, I can confirm there are still 18/195 failures with the updated gcc.

> I'm going to investigate that and will inform you guys.

Great, thanks!

--
Fabien.
Heh, it's me who now breaks postgresql build:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92529

Martin

On Fri, 15 Nov 2019 at 13:01, Fabien COELHO <fabien.coelho@mines-paristech.fr> wrote:
>
> > Yes, after the revision I see other failing tests like:
>
> Indeed, I can confirm there are still 18/195 fails with the updated gcc.
>
> > I'm going to investigate that and will inform you guys.
>
> Great, thanks!
>
> --
> Fabien.
Hello.

The issue is resolved now and tests are fine for me.

Martin

On Fri, 15 Nov 2019 at 13:11, Martin Liška <marxin.liska@gmail.com> wrote:
>
> Heh, it's me who now breaks postgresql build:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92529
>
> Martin
Hello Martin,

> The issue is resolved now and tests are fine for me.

I recompiled gcc trunk and moonjelly is back to green.

Thanks!

--
Fabien.