Thread: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements
BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18947 Logged by: Alexander Kozhemyakin Email address: a.kozhemyakin@postgrespro.ru PostgreSQL version: 18beta1 Operating system: Ubuntu 24.04 Description: Hi, I encountered an assertion failure when running the following query on the master branch (commit 112e40b8) when building with --enable-cassert reproduce initdb -D data echo " shared_preload_libraries = 'pg_stat_statements' " > data/postgresql.auto.conf pg_ctl -D data -l logfile start psql <<EOF (values(1)) limit 1; EOF bt (gdb) #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x00007b9ecba4527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x00007b9ecba288ff in __GI_abort () at ./stdlib/abort.c:79 #5 0x00005ee635e0da6f in ExceptionalCondition (conditionName=conditionName@entry=0x7b9ecc55f97d "len_to_wrt >= 0", fileName=fileName@entry=0x7b9ecc55f000 "/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c", lineNumber=lineNumber@entry=2929) at /home/postgres/test/postgres/src/backend/utils/error/assert.c:66 #6 0x00007b9ecc55c5a6 in generate_normalized_query (query_len_p=0x7fff53b9a348, query_loc=1, query=0x5ee6575d0df1 "values(1)) limit 1;", jstate=0x5ee657606e88) at /home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:2929 #7 pgss_store (query=0x5ee6575d0df1 "values(1)) limit 1;", queryId=<optimized out>, query_location=<optimized out>, query_len=<optimized out>, kind=kind@entry=PGSS_INVALID, total_time=total_time@entry=0, rows=rows@entry=0, bufusage=0x0, walusage=0x0, jitusage=0x0, jstate=0x5ee657606e88, parallel_workers_to_launch=0, parallel_workers_launched=0) at /home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:1350 #8 0x00007b9ecc55c6ca in pgss_post_parse_analyze (pstate=0x5ee6575d1a98, query=0x5ee6575d1ba8, jstate=0x5ee657606e88) at /home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:868 #9 0x00005ee6359f46cb in parse_analyze_fixedparams (parseTree=parseTree@entry=0x5ee6575d1a18, sourceText=sourceText@entry=0x5ee6575d0df0 "(values(1)) limit 1;", paramTypes=paramTypes@entry=0x0, numParams=numParams@entry=0, queryEnv=queryEnv@entry=0x0) at /home/postgres/test/postgres/src/backend/parser/analyze.c:128 #10 0x00005ee635cb71a0 in pg_analyze_and_rewrite_fixedparams (queryEnv=0x0, numParams=0, paramTypes=0x0, query_string=<optimized out>, parsetree=<optimized out>) at /home/postgres/test/postgres/src/backend/tcop/postgres.c:682 #11 exec_simple_query (query_string=0x5ee6575d0df0 "(values(1)) limit 1;") at /home/postgres/test/postgres/src/backend/tcop/postgres.c:1189 #12 0x00005ee635cb8eb0 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at /home/postgres/test/postgres/src/backend/tcop/postgres.c:4766 #13 0x00005ee635cb3293 in BackendMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at /home/postgres/test/postgres/src/backend/tcop/backend_startup.c:124 #14 0x00005ee635c021e2 in postmaster_child_launch (child_type=<optimized out>, child_slot=1, startup_data=startup_data@entry=0x7fff53b9a9d0, startup_data_len=startup_data_len@entry=24, client_sock=client_sock@entry=0x7fff53b9a9f0) at /home/postgres/test/postgres/src/backend/postmaster/launch_backend.c:290 #15 0x00005ee635c06014 in BackendStartup (client_sock=0x7fff53b9a9f0) at /home/postgres/test/postgres/src/backend/postmaster/postmaster.c:3580 #16 ServerLoop () at /home/postgres/test/postgres/src/backend/postmaster/postmaster.c:1702 #17 0x00005ee635c07b0e in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x5ee65758a0b0) at /home/postgres/test/postgres/src/backend/postmaster/postmaster.c:1400 #18 0x00005ee6358d3f4e in main (argc=3, argv=0x5ee65758a0b0) at /home/postgres/test/postgres/src/backend/main/main.c:227
On Sat, Jun 7, 2025 at 9:10 PM PG Bug reporting form <noreply@postgresql.org> wrote: > > The following bug has been logged on the website: > > Bug reference: 18947 > Logged by: Alexander Kozhemyakin > Email address: a.kozhemyakin@postgrespro.ru > PostgreSQL version: 18beta1 > Operating system: Ubuntu 24.04 > Description: > > Hi, I encountered an assertion failure when running the following query on > the master branch (commit 112e40b8) when building with --enable-cassert > reproduce > initdb -D data > echo " > shared_preload_libraries = 'pg_stat_statements' > " > data/postgresql.auto.conf > pg_ctl -D data -l logfile start > psql <<EOF > (values(1)) limit 1; > EOF > bt > (gdb) > #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized > out>) at ./nptl/pthread_kill.c:44 > #1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at > ./nptl/pthread_kill.c:78 > #2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at > ./nptl/pthread_kill.c:89 > #3 0x00007b9ecba4527e in __GI_raise (sig=sig@entry=6) at > ../sysdeps/posix/raise.c:26 > #4 0x00007b9ecba288ff in __GI_abort () at ./stdlib/abort.c:79 > #5 0x00005ee635e0da6f in ExceptionalCondition > (conditionName=conditionName@entry=0x7b9ecc55f97d "len_to_wrt >= 0", > fileName=fileName@entry=0x7b9ecc55f000 > "/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c", > lineNumber=lineNumber@entry=2929) > at /home/postgres/test/postgres/src/backend/utils/error/assert.c:66 > #6 0x00007b9ecc55c5a6 in generate_normalized_query > (query_len_p=0x7fff53b9a348, query_loc=1, query=0x5ee6575d0df1 "values(1)) > limit 1;", jstate=0x5ee657606e88) at > /home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:2929 > #7 pgss_store (query=0x5ee6575d0df1 "values(1)) limit 1;", > queryId=<optimized out>, query_location=<optimized out>, > query_len=<optimized out>, kind=kind@entry=PGSS_INVALID, > total_time=total_time@entry=0, rows=rows@entry=0, bufusage=0x0, > walusage=0x0, jitusage=0x0, jstate=0x5ee657606e88, > parallel_workers_to_launch=0, parallel_workers_launched=0) at > /home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:1350 > #8 0x00007b9ecc55c6ca in pgss_post_parse_analyze (pstate=0x5ee6575d1a98, > query=0x5ee6575d1ba8, jstate=0x5ee657606e88) at > /home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:868 > #9 0x00005ee6359f46cb in parse_analyze_fixedparams > (parseTree=parseTree@entry=0x5ee6575d1a18, > sourceText=sourceText@entry=0x5ee6575d0df0 "(values(1)) limit 1;", > paramTypes=paramTypes@entry=0x0, numParams=numParams@entry=0, > queryEnv=queryEnv@entry=0x0) > at /home/postgres/test/postgres/src/backend/parser/analyze.c:128 > #10 0x00005ee635cb71a0 in pg_analyze_and_rewrite_fixedparams (queryEnv=0x0, > numParams=0, paramTypes=0x0, query_string=<optimized out>, > parsetree=<optimized out>) at > /home/postgres/test/postgres/src/backend/tcop/postgres.c:682 > #11 exec_simple_query (query_string=0x5ee6575d0df0 "(values(1)) limit 1;") > at /home/postgres/test/postgres/src/backend/tcop/postgres.c:1189 > #12 0x00005ee635cb8eb0 in PostgresMain (dbname=<optimized out>, > username=<optimized out>) at > /home/postgres/test/postgres/src/backend/tcop/postgres.c:4766 > #13 0x00005ee635cb3293 in BackendMain (startup_data=<optimized out>, > startup_data_len=<optimized out>) at > /home/postgres/test/postgres/src/backend/tcop/backend_startup.c:124 > #14 0x00005ee635c021e2 in postmaster_child_launch (child_type=<optimized > out>, child_slot=1, startup_data=startup_data@entry=0x7fff53b9a9d0, > startup_data_len=startup_data_len@entry=24, > client_sock=client_sock@entry=0x7fff53b9a9f0) at > /home/postgres/test/postgres/src/backend/postmaster/launch_backend.c:290 > #15 0x00005ee635c06014 in BackendStartup (client_sock=0x7fff53b9a9f0) at > /home/postgres/test/postgres/src/backend/postmaster/postmaster.c:3580 > #16 ServerLoop () at > /home/postgres/test/postgres/src/backend/postmaster/postmaster.c:1702 > #17 0x00005ee635c07b0e in PostmasterMain (argc=argc@entry=3, > argv=argv@entry=0x5ee65758a0b0) at > /home/postgres/test/postgres/src/backend/postmaster/postmaster.c:1400 > #18 0x00005ee6358d3f4e in main (argc=3, argv=0x5ee65758a0b0) at > /home/postgres/test/postgres/src/backend/main/main.c:227 I didn't get time to debug, I might look into it tomorrow if someone doesn't do it before that, but I am able to reproduce the issue and seems like problem is with respect to computing the quer_loc in below loop, and due to that 'quer_loc' is becoming bigger than query_len and hitting the assert. for (i = 0; i < jstate->clocations_count; i++) { ... quer_loc = off + tok_len; } len_to_wrt = query_len - quer_loc; -- Regards, Dilip Kumar Google
Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements
From
Michael Paquier
Date:
On Sat, Jun 07, 2025 at 10:19:53PM +0530, Dilip Kumar wrote: > I didn't get time to debug, I might look into it tomorrow if someone > doesn't do it before that, but I am able to reproduce the issue and > seems like problem is with respect to computing the quer_loc in below > loop, and due to that 'quer_loc' is becoming bigger than query_len and > hitting the assert. The loop in charge of assigning the constants in the queries needs to be smarter in terms of the new locations assigned to sub-queries, here the VALUES one. 499edb09741b is the origin of the regression, so open item assigned to me (will look into it later), adding Anthonin in CC. -- Michael
Attachment
On Sun, Jun 8, 2025 at 8:09 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Sat, Jun 07, 2025 at 10:19:53PM +0530, Dilip Kumar wrote: > > I didn't get time to debug, I might look into it tomorrow if someone > > doesn't do it before that, but I am able to reproduce the issue and > > seems like problem is with respect to computing the quer_loc in below > > loop, and due to that 'quer_loc' is becoming bigger than query_len and > > hitting the assert. > > The loop in charge of assigning the constants in the queries needs to > be smarter in terms of the new locations assigned to sub-queries, here > the VALUES one. > > 499edb09741b is the origin of the regression, so open item assigned to > me (will look into it later), adding Anthonin in CC. Thanks, I was looking into the commit pointed by you, I found this change problematic, transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree) { Query *result; + /* Store RawStmt's length and location in pstate */ + pstate->p_stmt_len = parseTree->stmt_len; + pstate->p_stmt_location = parseTree->stmt_location; + /* We're at top level, so allow SELECT INTO */ result = transformOptionalSelectInto(pstate, parseTree->stmt); - result->stmt_location = parseTree->stmt_location; - result->stmt_len = parseTree->stmt_len; So here before this commit we were setting the 'query->stmt_len' as parseTree->stmt_len from the Top level statement, but after this fix, we are setting the length of the select statement, in query->stmt_len, because now we are doing that in transformOptionalSelectInto->transformStmt->setQueryLocationAndLength so that is just transferring the select statement and has length of that. Due to that in generate_normalized_query() we are getting 'quer_loc' bigger than the 'query_len'. -- Regards, Dilip Kumar Google
On 2025/06/08 11:39, Michael Paquier wrote: > On Sat, Jun 07, 2025 at 10:19:53PM +0530, Dilip Kumar wrote: >> I didn't get time to debug, I might look into it tomorrow if someone >> doesn't do it before that, but I am able to reproduce the issue and >> seems like problem is with respect to computing the quer_loc in below >> loop, and due to that 'quer_loc' is becoming bigger than query_len and >> hitting the assert. > > The loop in charge of assigning the constants in the queries needs to > be smarter in terms of the new locations assigned to sub-queries, here > the VALUES one. > > 499edb09741b is the origin of the regression, so open item assigned to > me (will look into it later), adding Anthonin in CC. The issue seems to be that the commit missed handling cases where a clause follows a SELECT query wrapped in parentheses. If the following clause includes a constant (e.g., LIMIT 1 or FETCH FIRST 1 ROW ONLY), it can trigger the reported assertion failure. If it doesn't include a constant (e.g., FETCH FIRST ROW ONLY), the assertion failure doesn't occur, but only the inner SELECT query (inside the parentheses) appears in pg_stat_statements, i.e., the "FETCH FIRST ROW ONLY" part is missing from pg_stat_statements.query. + SelectStmt *n = (SelectStmt *) $2; + + /* + * As SelectStmt's location starts at the SELECT keyword, + * we need to track the length of the SelectStmt within + * parentheses to be able to extract the relevant part + * of the query. Without this, the RawStmt's length would + * be used and would include the closing parenthesis. + */ + n->stmt_len = @3 - @2; Shouldn't this part be skipped in cases where the SELECT with parens is followed by a clause? At least in those cases, this logic doesn't seem appropriate. Regards, -- Fujii Masao NTT DATA Japan Corporation
Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements
From
Michael Paquier
Date:
On Mon, Jun 09, 2025 at 07:07:42PM +0900, Fujii Masao wrote: > Shouldn't this part be skipped in cases where the SELECT with parens is > followed by a clause? At least in those cases, this logic doesn't seem > appropriate. Touché. That means messing with any parent Node that includes or touches select_with_parens, select_no_parens (JSON_ARRAY has one) or SelectStmt (few of these, like in CTAS). That's not cool in the long term because we would expect any new node that englobes a SelectStmt to be able to set up their inner length and location as they should, as far as I get it. I was wondering first if we could have done something with the top-level statement depending on the nesting level of PGSS, but that's not going to fly high, for example with cases like this one (any case that has an inner SELECT): copy ((select 1) union ((select 1)) fetch first 1 row only) to stdout Even with that in mind, a second thing I was wondering is to compile the length within select_with_parens, still there seems to be an extra parenthesis pending at the end of some of the strings. Perhaps I'm missing a simpler concept, of course. Anyway, this is the second bug that we have in this area, and this one is worse. Now that we are in the middle of beta, this is gathering the signs that we should revert the change from the tree for the moment and potentially revisit the whole in v19 with these edge cases handled. So attached is a patch doing a revert of: 499edb09741b, nested statement tracking. 06450c7b8c70, follow-up fix for 499edb09741b. There was also 6b652e6ce85a for the addition of query ID for CREATE TABLE AS and DECLARE, and this is still OK as a standalone change. Its tests are updated with an extra entry added in the non-top-level case, with the query string matching the top-level string as a result of the attached. If you have comments, feel free. -- Michael
Attachment
Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements
From
Michael Paquier
Date:
On Tue, Jun 10, 2025 at 08:14:40PM +0900, Michael Paquier wrote: > Hmm. I am wondering whether this would be enough; these are not the > only places where we use select_no_parens and select_with_parens. I'd > be curious to see what you think a fully-bloomed fix should look like. By the way, this report also means that none of the problematic query patterns are run in the main regression test suite, or we would have noticed something immediately when running the recovery test 027_stream_regress.pl that loads PGSS. So we have a problem in test coverage with query normalization patterns. But we also have a second, older, problem in terms of coverage regarding the parentheses patterns that are assumed as supported in the parser. Even if we finish by reverting the nested query tracking support for now (let's settle things before beta2 not after it), we should definitely improve the main regression test suite in this area to make sure that reintroducing it safe. -- Michael
Attachment
On 2025/06/11 7:39, Michael Paquier wrote: > On Tue, Jun 10, 2025 at 08:14:40PM +0900, Michael Paquier wrote: >> Hmm. I am wondering whether this would be enough; these are not the >> only places where we use select_no_parens and select_with_parens. I'd >> be curious to see what you think a fully-bloomed fix should look like. > > By the way, this report also means that none of the problematic query > patterns are run in the main regression test suite, or we would have > noticed something immediately when running the recovery test > 027_stream_regress.pl that loads PGSS. > > So we have a problem in test coverage with query normalization > patterns. But we also have a second, older, problem in terms of > coverage regarding the parentheses patterns that are assumed as > supported in the parser. Even if we finish by reverting the nested > query tracking support for now (let's settle things before beta2 not > after it), we should definitely improve the main regression test suite > in this area to make sure that reintroducing it safe. Sounds reasonable to me. Also I agree with reverting the commit and revisiting the feature in v19. Regards, -- Fujii Masao NTT DATA Japan Corporation
Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements
From
Michael Paquier
Date:
On Wed, Jun 11, 2025 at 11:04:36AM +0200, Anthonin Bonnefoy wrote: > Here's a v2 for the attempted fix. This is using the mentioned approach: > - SelectStmt's location is set to Select's position when the node if built > - If there are surrounding parentheses, we include them and move the > start of the statement to the outermost '(' > > The Statement length is only set for: > - COPY: We use the PreparableStatement's surrounding '()' to set both > location and length (setting location should be redundant here though) > - CTAS: we set the statement's length if there's a tailing 'WITH (NO) DATA'. > - Other than those, the statement length will fallback to use the top > RawStmt's remaining length. @@ -3417,9 +3418,9 @@ CopyStmt: COPY opt_binary qualified_name opt_column_list [...] - updatePreparableStmtEnd($3, @2, @4 - @2); + updatePreparableStmtEnd($3, @3, @4 - @3); For the case of COPY, it is possible to get a much better report by setting the location and length to not include the first parenthesis, like that. > I haven't done it for Views and JSON_ARRAY. Their nested queries > aren't currently tracked and reported by pgss so there's no way to > test those. I have to admit that it is inconsistent to set a location in their respective inner nodes while we don't use updateSelectStmtEnd() to adjust the length based on the location assigned by the parser. There's a risk of having a code path in the future looking at the location later on based on the parsed state and taking a wrong decision because the location and the length do not match. Such cases could be easier to reach outside of core, for example imagine an extension looking at a ViewStmt where we set the location but not a matching length. We would consider as inner query a string that begins at the beginning of the location until the end of the string because the length is not set to match with what the parser finds. > This could be done but there's no available way to test > this is correct AFAIK? Yes, perhaps we should think harder about this part. There are two patterns among a few others like UNION, which can also be used in views. Anyway, I am going to pull the plug on this one today, because we need to take a decision with beta2 approaching fast but it's mostly me being not fully confident that this is right as-is. If we're wrong in one path, that's an out-of-bound memory write waiting for us in PGSS for the query normalization, as this report proves, and that would be bad. The first step would be to improve the main regression test suite for all these grammar "parenthesis" patterns we have never had coverage for in core. I am wondering if it is actually possible to reach an inconsistent state even in older stable branches because we've never tested all these grammar patterns. Perhaps I'm wrong, but we would be more confident with all these extra patterns stressed first by 027_stream_regress.pl which forces queries to have some normalization applied. -- Michael
Attachment
Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements
From
Anthonin Bonnefoy
Date:
On Fri, Jun 13, 2025 at 1:51 AM Michael Paquier <michael@paquier.xyz> wrote: > - About the tests we could do to validate more in depth the locations > and lengths assigned to the parsed node without relying on PGSS and > EXPLAIN, which will support never support the full range of things > like views. I was wondering about the addition of a test module that > plugs into one or more hooks (only the post-parse one should be > enough), where we simply print the Nodes generated to a string using > the facility in src/backend/nodes/print.c. It would be then possible > to filter the output generated with some regex magic to print the > fields we want to check. That would work for the CREATE VIEW case, > for example, and it could be used for other things than just the > statement length and/or locations set in a gIven Node. Agree on the tests. Having a test module that dumps the parse tree in an usable way for the tests was definitely something I had in mind. Relying on PGSS to test parse behaviour definitely has its limit.