Thread: BUG #17830: Incorrect memory access in trgm_regexp
The following bug has been logged on the website:

Bug reference:      17830
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 15.2
Operating system:   Ubuntu 22.04
Description:

When the following script is executed:

CREATE EXTENSION pg_trgm;
CREATE TABLE t(t text);
CREATE INDEX t_idx_gin ON t USING gin (t gin_trgm_ops);
SELECT * FROM t WHERE t ~ '.*$x';

valgrind detects an invalid memory read:

==00:00:00:04.044 873608== Invalid read of size 4
==00:00:00:04.044 873608==    at 0x486B907: packGraph (trgm_regexp.c:2070)
==00:00:00:04.044 873608==    by 0x486C41E: createTrgmNFAInternal (trgm_regexp.c:621)
==00:00:00:04.044 873608==    by 0x486C5CA: createTrgmNFA (trgm_regexp.c:558)
==00:00:00:04.044 873608==    by 0x4865EEC: gin_extract_query_trgm (trgm_gin.c:115)
==00:00:00:04.044 873608==    by 0x718F04: FunctionCall7Coll (fmgr.c:1293)
==00:00:00:04.044 873608==    by 0x6B6EEB: gincost_pattern (selfuncs.c:7193)
==00:00:00:04.044 873608==    by 0x6B7132: gincost_opexpr (selfuncs.c:7281)
==00:00:00:04.044 873608==    by 0x6BF1C3: gincostestimate (selfuncs.c:7563)
==00:00:00:04.044 873608==    by 0x4AAD17: cost_index (costsize.c:588)
==00:00:00:04.044 873608==    by 0x4F640D: create_index_path (pathnode.c:1028)
==00:00:00:04.044 873608==    by 0x4B6D0E: build_index_paths (indxpath.c:1033)
==00:00:00:04.044 873608==    by 0x4B6F1D: get_index_paths (indxpath.c:748)
==00:00:00:04.044 873608==  Address 0x108eab00 is 560 bytes inside a recently re-allocated block of size 8,192 alloc'd
==00:00:00:04.044 873608==    at 0x4848899: malloc (vg_replace_malloc.c:381)
==00:00:00:04.044 873608==    by 0x73A844: AllocSetContextCreateInternal (aset.c:469)
==00:00:00:04.044 873608==    by 0x74C43F: tuplesort_begin_common (tuplesort.c:868)
==00:00:00:04.044 873608==    by 0x752F75: tuplesort_begin_index_btree (tuplesort.c:1217)
==00:00:00:04.044 873608==    by 0x26A40A: _bt_spools_heapscan (nbtsort.c:477)
==00:00:00:04.044 873608==    by 0x26BE6B: btbuild (nbtsort.c:329)
==00:00:00:04.044 873608==    by 0x2E135F: index_build (index.c:3021)
==00:00:00:04.044 873608==    by 0x2E2F12: index_create (index.c:1252)
==00:00:00:04.044 873608==    by 0x30983D: create_toast_table (toasting.c:324)
==00:00:00:04.044 873608==    by 0x309A2F: CheckAndCreateToastTable (toasting.c:88)
==00:00:00:04.044 873608==    by 0x309A9D: NewRelationCreateToastTable (toasting.c:75)
==00:00:00:04.044 873608==    by 0x5C942B: ProcessUtilitySlow (utility.c:1199)
==00:00:00:04.044 873608==

The invalid access occurs in the line:
    while (j < arcsCount && arcs[j].sourceState == i)
here arcsCount == 1 even when arcs contains no elements, due to the
assignment above:
    arcsCount = (p2 - arcs) + 1;

PG Bug reporting form <noreply@postgresql.org> writes:
> When the following script is executed:
> CREATE EXTENSION pg_trgm;
> CREATE TABLE t(t text);
> CREATE INDEX t_idx_gin ON t USING gin (t gin_trgm_ops);
> SELECT * FROM t WHERE t ~ '.*$x';
> valgrind detects an invalid memory read:
> ...
> The invalid access occurs in the line:
> while (j < arcsCount && arcs[j].sourceState == i)
> here arcsCount == 1 even when arcs contains no elements, due to the
> assignment above:
> arcsCount = (p2 - arcs) + 1;

Yeah, that de-duplication code is incorrectly assuming that the NFA has
more than zero arcs, which it doesn't because the regex compiler saw
that the pattern is unsatisfiable.

Thanks for the report!

regards, tom lane
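
The failure mode discussed above can be illustrated with a minimal sketch. It assumes a simplified Arc struct and a hypothetical dedup_sorted_arcs helper; neither is pg_trgm code, and this is not necessarily how the committed fix is written. The point is only that the "(last - first) + 1" count is valid when the array has at least one element, so an empty input needs its own early return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an arc record; not the pg_trgm struct. */
typedef struct
{
    int source_state;
    int color;
    int target_state;
} Arc;

/* Two arcs are duplicates when all three fields match. */
int
arc_equal(const Arc *a, const Arc *b)
{
    return a->source_state == b->source_state &&
           a->color == b->color &&
           a->target_state == b->target_state;
}

/*
 * Remove adjacent duplicates from an already-sorted array, in place.
 * Returns the number of unique elements, which is 0 for an empty array.
 */
size_t
dedup_sorted_arcs(Arc *arcs, size_t n)
{
    Arc    *last;   /* last known non-duplicate */
    size_t  i;

    assert(arcs != NULL || n == 0);

    /* Without this guard, (last - arcs) + 1 below would report 1 arc. */
    if (n == 0)
        return 0;

    last = arcs;
    for (i = 1; i < n; i++)
    {
        if (!arc_equal(&arcs[i], last))
        {
            last++;
            *last = arcs[i];
        }
    }
    return (size_t) (last - arcs) + 1;
}
```

A caller that then loops with a condition like "j < count && arcs[j].source_state == i" never touches arcs[0] when count is 0, which is the read that valgrind flagged in the report.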

11.03.2023 19:39, Tom Lane wrote:
> Yeah, that de-duplication code is incorrectly assuming that the
> NFA has more than zero arcs, which it doesn't because the regex
> compiler saw that the pattern is unsatisfiable.
>
> Thanks for the report!

I've retested trgm_regexp with all regular expressions presented in
src/test/regress/sql/regex.sql and
src/test/modules/test_regex/sql/test_regex.sql and found no new anomalies.

Thank you for the fix!

Best regards,
Alexander