[sqlsmith] crashes in RestoreSnapshot on hot standby - Mailing list pgsql-hackers

From Andreas Seltenreich
Subject [sqlsmith] crashes in RestoreSnapshot on hot standby
Date
Msg-id 87r3bes70d.fsf@credativ.de
Whole thread Raw
Responses Re: [sqlsmith] crashes in RestoreSnapshot on hot standby
List pgsql-hackers
Running sqlsmith on a streaming slave (master as of f8c5855) is
inconspicuous as long as the master is idle.  As soon as I start it on
the master as well, the standby frequently crashes in RestoreSnapshot.
It doesn't seem to be specific to the queries, as they don't trigger a
crash when re-run.

Backtraces always look like the ones below.

regards,
Andreas

**** BEGIN BACKTRACE OF CORE FILE ./slave/postgres.9826@.core ON doombat ****
Core was generated by `postgres: smith regression [local] SELECT                         '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:167
167    ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Datei oder Verzeichnis nicht gefunden.
(gdb) bt
#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:167
#1  0x0000000000822032 in RestoreSnapshot (start_address=start_address@entry=0x7f2701d5a110 <error: Cannot access
memoryat address 0x7f2701d5a110>) at snapmgr.c:2020
 
#2  0x00000000004a934a in heap_beginscan_parallel (relation=0x2060a90,
parallel_scan=parallel_scan@entry=0x7f2701d5a0f8)at heapam.c:1657
 
#3  0x00000000005fbedf in ExecSeqScanInitializeDSM (node=0x1f5b470, pcxt=0x221af88) at nodeSeqscan.c:327
#4  0x00000000005dd0ad in ExecParallelInitializeDSM (planstate=planstate@entry=0x1f5b470, d=d@entry=0x7ffd4ba200d0) at
execParallel.c:245
#5  0x00000000005dd425 in ExecInitParallelPlan (planstate=0x1f5b470, estate=estate@entry=0x1f5ab28, nworkers=2) at
execParallel.c:477
#6  0x00000000005ef4a4 in ExecGather (node=node@entry=0x1f5b048) at nodeGather.c:159
#7  0x00000000005dda48 in ExecProcNode (node=node@entry=0x1f5b048) at execProcnode.c:515
#8  0x00000000005f4b30 in ExecLimit (node=node@entry=0x1f5acd0) at nodeLimit.c:91
#9  0x00000000005dd9d8 in ExecProcNode (node=node@entry=0x1f5acd0) at execProcnode.c:531
#10 0x00000000005fef7c in ExecSetParamPlan (node=<optimized out>, econtext=0x1f5c138) at nodeSubplan.c:999
#11 0x00000000005e28b5 in ExecEvalParamExec (exprstate=<optimized out>, econtext=<optimized out>, isNull=0x22045b0 "",
isDone=<optimizedout>) at execQual.c:1135
 
#12 0x00000000005deb6d in ExecMakeFunctionResultNoSets (fcache=0x2204200, econtext=0x1f5c138, isNull=0x2203d98 "",
isDone=<optimizedout>) at execQual.c:2015
 
#13 0x00000000005de29a in ExecEvalCoalesce (coalesceExpr=<optimized out>, econtext=0x1f5c138, isNull=0x2203d98 "",
isDone=<optimizedout>) at execQual.c:3446
 
#14 0x00000000005deb6d in ExecMakeFunctionResultNoSets (fcache=0x22039e8, econtext=0x1f5c138, isNull=0x7ffd4ba203df "",
isDone=<optimizedout>) at execQual.c:2015
 
#15 0x00000000005e4939 in ExecQual (qual=<optimized out>, econtext=econtext@entry=0x1f5c138,
resultForNull=resultForNull@entry=0'\000') at execQual.c:5269
 
#16 0x00000000005faef1 in ExecResult (node=node@entry=0x1f5c020) at nodeResult.c:82
#17 0x00000000005ddbf8 in ExecProcNode (node=node@entry=0x1f5c020) at execProcnode.c:392
#18 0x00000000005d9c1f in ExecutePlan (dest=0x1ebb7d0, direction=<optimized out>, numberTuples=0, sendTuples=<optimized
out>,operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1f5c020, estate=0x1f5ab28) at
execMain.c:1567
#19 standard_ExecutorRun (queryDesc=0x1f5a718, direction=<optimized out>, count=0) at execMain.c:338
#20 0x00000000006f7238 in PortalRunSelect (portal=portal@entry=0x1d13be8, forward=forward@entry=1 '\001', count=0,
count@entry=9223372036854775807,dest=dest@entry=0x1ebb7d0) at pquery.c:946
 
#21 0x00000000006f875e in PortalRun (portal=0x1d13be8, count=9223372036854775807, isTopLevel=<optimized out>,
dest=0x1ebb7d0,altdest=0x1ebb7d0, completionTag=0x7ffd4ba20840 "") at pquery.c:787
 
#22 0x00000000006f6003 in exec_simple_query (query_string=<optimized out>) at postgres.c:1094
#23 PostgresMain (argc=30489576, argv=0x1ecfb08, dbname=0x1cf5a00 "regression", username=0x1ecfc20 "\b\373\354\001") at
postgres.c:4074
#24 0x000000000046ca67 in BackendRun (port=0x1d17b50) at postmaster.c:4262
#25 BackendStartup (port=0x1d17b50) at postmaster.c:3936
#26 ServerLoop () at postmaster.c:1693
#27 0x0000000000690ab7 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1cf45e0) at postmaster.c:1301
#28 0x000000000046d9cd in main (argc=3, argv=0x1cf45e0) at main.c:228
(gdb) p debug_query_string
$1 = 0x1d68a78 "select  \n  sample_0.j as c0\nfrom \n  public.testjsonb as sample_0 tablesample system (8) \nwhere
cast(coalesce(pg_catalog.char_length(\n     cast((select comment from public.room limit 1 offset 20)\n         as
text)),\n   pg_catalog.pg_trigger_depth()) as integer) <> 3"
 

**** BEGIN BACKTRACE OF CORE FILE ./slave/postgres.8104@.core ON marbit ****
Core was generated by `postgres: bgworker: parallel worker for PID 2610                  '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:167
167    ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: Datei oder Verzeichnis nicht gefunden.
(gdb) bt
#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:167
#1  0x0000000000822032 in RestoreSnapshot (start_address=0x7f7b1ee4fa58 <error: Cannot access memory at address
0x7f7b1ee4fa58>)at snapmgr.c:2020
 
#2  0x00000000004e6142 in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1010
#3  0x0000000000683b62 in StartBackgroundWorker () at bgworker.c:726
#4  0x000000000068ec32 in do_start_bgworker (rw=0x22a8900) at postmaster.c:5535
#5  maybe_start_bgworker () at postmaster.c:5709
#6  0x000000000068f686 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:4971
#7  <signal handler called>
#8  0x00007f7b1e3f3ac3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
#9  0x000000000046c381 in ServerLoop () at postmaster.c:1657
#10 0x0000000000690ab7 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x22825e0) at postmaster.c:1301
#11 0x000000000046d9cd in main (argc=3, argv=0x22825e0) at main.c:228



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: WIP: About CMake v2
Next
From: Robert Haas
Date:
Subject: Re: fixing consider_parallel for upper planner rels