Thread: BUG #18483: Segmentation fault in tests modules
The following bug has been logged on the website: Bug reference: 18483 Logged by: Alexander Kozhemyakin Email address: a.kozhemyakin@postgrespro.ru PostgreSQL version: 17beta1 Operating system: ubuntu-22.04 Description: first case echo "select * from test_predtest($$ select x, x is not true from booleans ; select x is false, x from booleans $$);" >> src/test/modules/test_predtest/sql/test_predtest.sql make check -C src/test/modules/test_predtest/ Core was generated by `postgres: postgres contrib_regression [local] SELECT '. Program terminated with signal SIGSEGV, Segmentation fault. #0 test_predtest (fcinfo=<optimized out>) at test_predtest.c:143 143 if (list_length(cplan->stmt_list) != 1) (gdb) bt #0 test_predtest (fcinfo=<optimized out>) at test_predtest.c:143 #1 0x0000562c7d026a8d in ExecMakeTableFunctionResult (setexpr=0x562c7edce758, econtext=0x562c7edce628, argContext=<optimized out>, expectedDesc=0x562c7edcf468, randomAccess=false) at execSRF.c:234 #2 0x0000562c7d039181 in FunctionNext (node=0x562c7edce418) at nodeFunctionscan.c:94 #3 0x0000562c7d01d46a in ExecProcNode (node=0x562c7edce418) at ../../../src/include/executor/executor.h:274 #4 ExecutePlan (execute_once=<optimized out>, dest=0x562c7edde5d8, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x562c7edce418, estate=0x562c7edce1f0) at execMain.c:1646 #5 standard_ExecutorRun (queryDesc=0x562c7ecdaea0, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363 #6 0x0000562c7d1edc1f in PortalRunSelect (portal=0x562c7ed2ee70, forward=<optimized out>, count=0, dest=<optimized out>) at pquery.c:924 #7 0x0000562c7d1ef2e0 in PortalRun (portal=portal@entry=0x562c7ed2ee70, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x562c7edde5d8, altdest=altdest@entry=0x562c7edde5d8, qc=0x7ffd128b3b00) at pquery.c:768 #8 0x0000562c7d1eb239 in exec_simple_query (query_string=0x562c7ecb09d0 "select * from test_predtest($$ select x, x is not true from booleans ; select x is false, x from booleans $$);") at postgres.c:1274 #9 0x0000562c7d1eca1e in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4680 #10 0x0000562c7d1e7743 in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at backend_startup.c:105 #11 0x0000562c7d145352 in postmaster_child_launch (child_type=child_type@entry=B_BACKEND, startup_data=startup_data@entry=0x7ffd128b3fa0 "", startup_data_len=startup_data_len@entry=4, client_sock=client_sock@entry=0x7ffd128b3fc0) at launch_backend.c:265 #12 0x0000562c7d14922e in BackendStartup (client_sock=0x7ffd128b3fc0) at postmaster.c:3593 #13 ServerLoop () at postmaster.c:1674 #14 0x0000562c7d14afa0 in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x562c7ecaaf10) at postmaster.c:1372 #15 0x0000562c7ce39ec7 in main (argc=8, argv=0x562c7ecaaf10) at main.c:197 second case echo "CREATE EXTENSION test_tidstore; SELECT do_set_block_offsets(1, ARRAY[0]::int2[]);" > src/test/modules/test_tidstore/sql/test_tidstore.sql make check -C src/test/modules/test_tidstore Core was generated by `postgres: postgres contrib_regression [local] SELECT '. Program terminated with signal SIGSEGV, Segmentation fault. #0 TidStoreLockExclusive (ts=0x0) at tidstore.c:300 300 if (TidStoreIsShared(ts)) (gdb) bt #0 TidStoreLockExclusive (ts=0x0) at tidstore.c:300 #1 0x00007f84c48aa8a1 in do_set_block_offsets (fcinfo=<optimized out>) at test_tidstore.c:176 #2 0x00005598b1917af5 in ExecInterpExpr (state=0x5598b28c1f98, econtext=0x5598b28c1d68, isnull=<optimized out>) at execExprInterp.c:764 #3 0x00005598b19495a4 in ExecEvalExprSwitchContext (isNull=0x7fff0096cde7, econtext=0x5598b28c1d68, state=0x5598b28c1f98) at ../../../src/include/executor/executor.h:356 #4 ExecProject (projInfo=0x5598b28c1f90) at ../../../src/include/executor/executor.h:390 #5 ExecResult (pstate=<optimized out>) at nodeResult.c:135 #6 0x00005598b191b3c2 in ExecProcNode (node=0x5598b28c1c60) at ../../../src/include/executor/executor.h:274 #7 ExecutePlan (execute_once=<optimized out>, dest=0x5598b28ac728, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x5598b28c1c60, estate=0x5598b28c1a48) at execMain.c:1646 #8 standard_ExecutorRun (queryDesc=0x5598b280afc8, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363 #9 0x00005598b1abe53f in PortalRunSelect (portal=0x5598b285cf48, forward=<optimized out>, count=0, dest=<optimized out>) at pquery.c:924 #10 0x00005598b1abf912 in PortalRun (portal=portal@entry=0x5598b285cf48, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x5598b28ac728, altdest=altdest@entry=0x5598b28ac728, qc=0x7fff0096d060) at pquery.c:768 #11 0x00005598b1abbe2c in exec_simple_query (query_string=0x5598b27e0b18 "SELECT do_set_block_offsets(1, ARRAY[0]::int2[]);") at postgres.c:1274 #12 0x00005598b1abd90d in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4680 #13 0x00005598b1ab85b5 in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at backend_startup.c:105 #14 0x00005598b1a29d38 in postmaster_child_launch (child_type=child_type@entry=B_BACKEND, startup_data=startup_data@entry=0x7fff0096d490 "", startup_data_len=startup_data_len@entry=4, client_sock=client_sock@entry=0x7fff0096d4b0) at launch_backend.c:265 #15 0x00005598b1a2d859 in BackendStartup (client_sock=0x7fff0096d4b0) at postmaster.c:3593 #16 ServerLoop () at postmaster.c:1674 #17 0x00005598b1a2f501 in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x5598b27db030) at postmaster.c:1372 #18 0x00005598b176e617 in main (argc=8, argv=0x5598b27db030) at main.c:197
On Tue, May 28, 2024 at 03:15:06AM +0000, PG Bug reporting form wrote: Thanks for the report! My apologies for the late reply but most people have been busy with pgconf.dev last week. > second case > echo "CREATE EXTENSION test_tidstore; > SELECT do_set_block_offsets(1, ARRAY[0]::int2[]);" > > src/test/modules/test_tidstore/sql/test_tidstore.sql > make check -C src/test/modules/test_tidstore > > Core was generated by `postgres: postgres contrib_regression [local] SELECT > '. > Program terminated with signal SIGSEGV, Segmentation fault. > #0 TidStoreLockExclusive (ts=0x0) at tidstore.c:300 > 300 if (TidStoreIsShared(ts)) > (gdb) bt > #0 TidStoreLockExclusive (ts=0x0) at tidstore.c:300 > #1 0x00007f84c48aa8a1 in do_set_block_offsets (fcinfo=<optimized out>) at Yes, something is wrong here. I have added an open item to track that. John and Sawada-san (added in CC), could you look at that? The item has been assigned to both of you. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > Thanks for the report! My apologies for the late reply but most > people have been busy with pgconf.dev last week. The test_predtest problem is mine, will fix it in a bit. Didn't look at the test_tidstore issue. regards, tom lane
On Fri, Jun 07, 2024 at 04:32:56PM -0400, Tom Lane wrote: > The test_predtest problem is mine, will fix it in a bit. Thanks. -- Michael
Attachment
On Tue, Jun 4, 2024 at 5:40 AM Michael Paquier <michael@paquier.xyz> wrote: > Yes, something is wrong here. I have added an open item to track > that. > > John and Sawada-san (added in CC), could you look at that? The item > has been assigned to both of you. I've been sick after the conference, but I'll take a look in the near future.
Hello Michael and Tom, 08.06.2024 01:29, Michael Paquier wrote: > On Fri, Jun 07, 2024 at 04:32:56PM -0400, Tom Lane wrote: >> The test_predtest problem is mine, will fix it in a bit. > Thanks. May I ask you about the project policy regarding such test modules failures? Do you think all of these should be fixed or it's ok to have some server-crashing modules in the tree? I'm asking because I collected a number of such defects but didn't dare to report them yet. For example: echo "CREATE EXTENSION test_shm_mq; SELECT test_shm_mq_pipelined(64, '', 1, 0); " > src/test/modules/test_shm_mq/sql/test_shm_mq.sql make -s check -C src/test/modules/test_shm_mq/ leads to: 1087 Assert(handle->slot < max_worker_processes); (gdb) bt #0 GetBackgroundWorkerPid (handle=0x7f7f7f7f00000000, ...) at bgworker.c:1087 #1 0x000055cab243b2a6 in shm_mq_counterparty_gone (...) at shm_mq.c:1193 ... Best regards, Alexander
Alexander Lakhin <exclusion@gmail.com> writes: > May I ask you about the project policy regarding such test modules > failures? > Do you think all of these should be fixed or it's ok to have some > server-crashing modules in the tree? I don't think there's a one-size-fits-all policy for them. Since we don't intend these modules to be installed, they don't really need to work in any cases beyond the ones exercised in the tests. Still, if it's easy to stop a misbehavior then we probably should. The test_predtest problem was just a missing check-for-null so I definitely regard that as worth fixing. These other ones would have to be investigated enough to figure out the size of the fix before I'd venture an opinion. regards, tom lane
Hello Tom, 08.06.2024 18:07, Tom Lane wrote: > Alexander Lakhin <exclusion@gmail.com> writes: >> May I ask you about the project policy regarding such test modules >> failures? >> Do you think all of these should be fixed or it's ok to have some >> server-crashing modules in the tree? > I don't think there's a one-size-fits-all policy for them. Since > we don't intend these modules to be installed, they don't really > need to work in any cases beyond the ones exercised in the tests. > Still, if it's easy to stop a misbehavior then we probably should. > The test_predtest problem was just a missing check-for-null so > I definitely regard that as worth fixing. These other ones would > have to be investigated enough to figure out the size of the fix > before I'd venture an opinion. Thank you for the explanation! Then maybe you would like to take a look at the rest of my collection and determine which sizes fit these issues: 2) echo " select test_enc_conversion('\x8bc68bcf8b', 'gb18030', 'gb18030', false); " >> src/test/regress/sql/conversion.sql TESTS="conversion" make -s check-tests produces (under Valgrind): ==00:00:00:05.947 3320530== Invalid read of size 1 ==00:00:00:05.947 3320530== at 0x6F21E5: pg_gb18030_mblen (wchar.c:1000) ==00:00:00:05.947 3320530== by 0x6F2BA0: pg_encoding_mblen (wchar.c:2072) ==00:00:00:05.947 3320530== by 0x6B4EAF: report_invalid_encoding (mbutils.c:1700) ==00:00:00:05.947 3320530== by 0x4867D3F: test_enc_conversion (regress.c:1154) ==00:00:00:05.947 3320530== by 0x3B9796: ExecInterpExpr (execExprInterp.c:764) If I understand correctly, the defect is inside test_enc_conversion(), not in the core code. 3) echo " CREATE EXTENSION test_tidstore; SELECT test_is_full(); " >src/test/modules/test_tidstore/sql/test_tidstore.sql make -s check -C src/test/modules/test_tidstore leads to: Program terminated with signal SIGSEGV, Segmentation fault. #0 TidStoreMemoryUsage (ts=0x0) at tidstore.c:553 553 if (TidStoreIsShared(ts)) (gdb) bt #0 TidStoreMemoryUsage (ts=0x0) at tidstore.c:553 #1 0x00007f2d9c715cb7 in test_is_full (fcinfo=<optimized out>) at test_tidstore.c:308 #2 0x000055707450a302 in ExecInterpExpr (state=0x557074faecc8, econtext=0x557074faea98, isnull=0x7ffdd50b2747) at execExprInterp.c:740 #3 0x0000557074506c49 in ExecInterpExprStillValid (state=0x557074faecc8, econtext=0x557074faea98, isNull=0x7ffdd50b2747) at execExprInterp.c:1915 I guess, test_is_full() is missing a check-for-null too. 4) echo " CREATE TABLE tov(id int not null, ov oidvector); INSERT INTO tov (id, ov) VALUES (1, '1 2 3'); SELECT id, make_tuple_indirect(tov)::text FROM tov; " >> src/test/regress/sql/indirect_toast.sql TESTS="indirect_toast" make -s check-tests produces: ERROR: invalid memory alloc request size 18446744073608493557 I guess, make_tuple_indirect() could be improved with: /* only work on existing, not-null varlenas */ if (TupleDescAttr(tupdesc, i)->attisdropped || nulls[i] || - TupleDescAttr(tupdesc, i)->attlen != -1) + TupleDescAttr(tupdesc, i)->attlen != -1 || + TupleDescAttr(tupdesc, i)->attstorage == TYPSTORAGE_PLAIN) Best regards, Alexander
Hi, Sorry for the late reply. On Wed, May 29, 2024 at 1:41 AM PG Bug reporting form <noreply@postgresql.org> wrote: > > The following bug has been logged on the website: > > Bug reference: 18483 > Logged by: Alexander Kozhemyakin > Email address: a.kozhemyakin@postgrespro.ru > PostgreSQL version: 17beta1 > Operating system: ubuntu-22.04 > Description: > > second case > echo "CREATE EXTENSION test_tidstore; > SELECT do_set_block_offsets(1, ARRAY[0]::int2[]);" > > src/test/modules/test_tidstore/sql/test_tidstore.sql > make check -C src/test/modules/test_tidstore Since the test script was modified so that it doesn't create tidstore before setting TIDs, the server crashed with SEGV. I remember that we used to check that the tidstore is already created before use, but we removed that check because it would not be very helpful; if a developer wiped out the create call then the test crashes pretty obviously[1]. Do we want to revive that check as raising an error is better than a SEGV? Regards, [1] https://www.postgresql.org/message-id/CANWCAZbJAxuw6HC%2BTc3p90mCZMKCaHknfrfzxMK5G0P6qp2PwQ%40mail.gmail.com -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Sun, Jun 09, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote: > 2) > echo " > select test_enc_conversion('\x8bc68bcf8b', 'gb18030', 'gb18030', false); > " >> src/test/regress/sql/conversion.sql > TESTS="conversion" make -s check-tests > > produces (under Valgrind): > ==00:00:00:05.947 3320530== Invalid read of size 1 > ==00:00:00:05.947 3320530== at 0x6F21E5: pg_gb18030_mblen (wchar.c:1000) > ==00:00:00:05.947 3320530== by 0x6F2BA0: pg_encoding_mblen (wchar.c:2072) > ==00:00:00:05.947 3320530== by 0x6B4EAF: report_invalid_encoding (mbutils.c:1700) > ==00:00:00:05.947 3320530== by 0x4867D3F: test_enc_conversion (regress.c:1154) > ==00:00:00:05.947 3320530== by 0x3B9796: ExecInterpExpr (execExprInterp.c:764) > > If I understand correctly, the defect is inside test_enc_conversion(), not > in the core code. Interesting, I cannot see this one, did you use a specific option with valgrind? > 3) > echo " > CREATE EXTENSION test_tidstore; > SELECT test_is_full(); > " >src/test/modules/test_tidstore/sql/test_tidstore.sql > make -s check -C src/test/modules/test_tidstore > > I guess, test_is_full() is missing a check-for-null too. Yeah, this one is simple enough that it should be fixed. Even if it is not in the scope of the test, that's a bit surprising. An undefined tid store means that it should not be full, but returning an error may make more sense if there's nothing. > 4) > I guess, make_tuple_indirect() could be improved with: > /* only work on existing, not-null varlenas */ > if (TupleDescAttr(tupdesc, i)->attisdropped || > nulls[i] || > - TupleDescAttr(tupdesc, i)->attlen != -1) > + TupleDescAttr(tupdesc, i)->attlen != -1 || > + TupleDescAttr(tupdesc, i)->attstorage == TYPSTORAGE_PLAIN) Fun. True enough that this function is only here for toastable values, so that makes sense. -- Michael
Attachment
On Mon, Jun 10, 2024 at 9:33 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Hi, > > Sorry for the late reply. > > On Wed, May 29, 2024 at 1:41 AM PG Bug reporting form > <noreply@postgresql.org> wrote: > > > > The following bug has been logged on the website: > > > > Bug reference: 18483 > > Logged by: Alexander Kozhemyakin > > Email address: a.kozhemyakin@postgrespro.ru > > PostgreSQL version: 17beta1 > > Operating system: ubuntu-22.04 > > Description: > > > > second case > > echo "CREATE EXTENSION test_tidstore; > > SELECT do_set_block_offsets(1, ARRAY[0]::int2[]);" > > > src/test/modules/test_tidstore/sql/test_tidstore.sql > > make check -C src/test/modules/test_tidstore > > Since the test script was modified so that it doesn't create tidstore > before setting TIDs, the server crashed with SEGV. I remember that we > used to check that the tidstore is already created before use, but we > removed that check because it would not be very helpful; if a > developer wiped out the create call then the test crashes pretty > obviously[1]. Do we want to revive that check as raising an error is > better than a SEGV? > I've attached the patch to fix the issue. I'm going to push it, barring any objections. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, Jun 10, 2024 at 03:23:27PM +0900, Masahiko Sawada wrote: > I've attached the patch to fix the issue. I'm going to push it, > barring any objections. Raising an error if there's nothing when calling these functions sounds good to me. Thanks for the quick workaround. -- Michael
Attachment
Hello Michael, 10.06.2024 08:50, Michael Paquier wrote: > On Sun, Jun 09, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote: >> 2) >> echo " >> select test_enc_conversion('\x8bc68bcf8b', 'gb18030', 'gb18030', false); >> " >> src/test/regress/sql/conversion.sql >> TESTS="conversion" make -s check-tests >> >> produces (under Valgrind): >> ==00:00:00:05.947 3320530== Invalid read of size 1 >> ==00:00:00:05.947 3320530== at 0x6F21E5: pg_gb18030_mblen (wchar.c:1000) >> ==00:00:00:05.947 3320530== by 0x6F2BA0: pg_encoding_mblen (wchar.c:2072) >> ==00:00:00:05.947 3320530== by 0x6B4EAF: report_invalid_encoding (mbutils.c:1700) >> ==00:00:00:05.947 3320530== by 0x4867D3F: test_enc_conversion (regress.c:1154) >> ==00:00:00:05.947 3320530== by 0x3B9796: ExecInterpExpr (execExprInterp.c:764) >> >> If I understand correctly, the defect is inside test_enc_conversion(), not >> in the core code. > Interesting, I cannot see this one, did you use a specific option with > valgrind? Thanks for your interest to it! My Valgrind-testing procedure is: git apply .../install-vrunner.patch # use the attached CPPFLAGS="-Og -DUSE_VALGRIND" ./configure -q && make -j8 -s echo " select test_enc_conversion('\x8bc68bcf8b', 'gb18030', 'gb18030', false); " >> src/test/regress/sql/conversion.sql TESTS="conversion" make -s check-tests not ok 1 - conversion 3189 ms # (test process exited with exit code 2) Best regards, Alexander
Attachment
On Mon, Jun 10, 2024 at 11:00:00AM +0300, Alexander Lakhin wrote: > 10.06.2024 08:50, Michael Paquier wrote: >> On Sun, Jun 09, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote: >>> 2) >>> echo " >>> select test_enc_conversion('\x8bc68bcf8b', 'gb18030', 'gb18030', false); >>> " >> src/test/regress/sql/conversion.sql >>> TESTS="conversion" make -s check-tests >>> >>> produces (under Valgrind): >>> ==00:00:00:05.947 3320530== Invalid read of size 1 >>> ==00:00:00:05.947 3320530== at 0x6F21E5: pg_gb18030_mblen (wchar.c:1000) >>> ==00:00:00:05.947 3320530== by 0x6F2BA0: pg_encoding_mblen (wchar.c:2072) >>> ==00:00:00:05.947 3320530== by 0x6B4EAF: report_invalid_encoding (mbutils.c:1700) >>> ==00:00:00:05.947 3320530== by 0x4867D3F: test_enc_conversion (regress.c:1154) >>> ==00:00:00:05.947 3320530== by 0x3B9796: ExecInterpExpr (execExprInterp.c:764) So, valgrind is noisy here because the incomplete byte sequence \x8b passed as input of the function after a valid byte sequence causes pg_encoding_mblen() to return 2, which is 1 byte passed the end of the actual string. report_invalid_encoding() is careful to not show any data passed in its input. I am not really sure that this is worth bothering with here, TBH. As far as I know this is related to the fact that we want test_enc_conversion() to be flexible enough to accept any kind of input, working as a cheap shortcut to call these internal encoding routines. What do you think? > My Valgrind-testing procedure is: > git apply .../install-vrunner.patch # use the attached > CPPFLAGS="-Og -DUSE_VALGRIND" ./configure -q && make -j8 -s Thanks. I was missing the -DUSE_VALGRIND. -- Michael
Attachment
11.06.2024 02:10, Michael Paquier wrote: > So, valgrind is noisy here because the incomplete byte sequence > \x8b passed as input of the function after a valid byte sequence > causes pg_encoding_mblen() to return 2, which is 1 byte passed the end > of the actual string. report_invalid_encoding() is careful to not > show any data passed in its input. I am not really sure that this is > worth bothering with here, TBH. As far as I know this is related to > the fact that we want test_enc_conversion() to be flexible enough to > accept any kind of input, working as a cheap shortcut to call these > internal encoding routines. What do you think? I think that maybe this case illustrates an API shortcoming. The report_invalid_encoding function takes len as an argument, but then passes mbstr as a null-terminated string to pg_encoding_mblen(): /* * report_invalid_encoding: complain about invalid multibyte character * * note: len is remaining length of string, not length of character; * len must be greater than zero, as we always examine the first byte. */ void report_invalid_encoding(int encoding, const char *mbstr, int len) { int l = pg_encoding_mblen(encoding, mbstr); ... Maybe the comment should say something about null-termination expected? I could not find other report_invalid_encoding() callers that pass not null-terminated strings to it, so maybe leave test_enc_conversion() as-is, just keep in mind that it is an imperfect example of report_invalid_encoding() usage and it might require some modifications for fuzz testing. Best regards, Alexander
On Mon, Jun 10, 2024 at 02:50:33PM +0900, Michael Paquier wrote: > On Sun, Jun 09, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote: >> 4) >> I guess, make_tuple_indirect() could be improved with: >> /* only work on existing, not-null varlenas */ >> if (TupleDescAttr(tupdesc, i)->attisdropped || >> nulls[i] || >> - TupleDescAttr(tupdesc, i)->attlen != -1) >> + TupleDescAttr(tupdesc, i)->attlen != -1 || >> + TupleDescAttr(tupdesc, i)->attstorage == TYPSTORAGE_PLAIN) > > Fun. True enough that this function is only here for toastable > values, so that makes sense. This one was indeed simple enough, so fixed it for now. -- Michael