Thread: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18947
Logged by:          Alexander Kozhemyakin
Email address:      a.kozhemyakin@postgrespro.ru
PostgreSQL version: 18beta1
Operating system:   Ubuntu 24.04
Description:

Hi, I encountered an assertion failure when running the following query on
the master branch (commit 112e40b8) when building with --enable-cassert
reproduce
initdb -D data
echo "
shared_preload_libraries = 'pg_stat_statements'
" > data/postgresql.auto.conf
pg_ctl -D data -l logfile start
psql <<EOF
(values(1)) limit 1;
EOF
bt
(gdb)
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized
out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at
./nptl/pthread_kill.c:89
#3  0x00007b9ecba4527e in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4  0x00007b9ecba288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00005ee635e0da6f in ExceptionalCondition
(conditionName=conditionName@entry=0x7b9ecc55f97d "len_to_wrt >= 0",
fileName=fileName@entry=0x7b9ecc55f000
"/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c",
lineNumber=lineNumber@entry=2929)
    at /home/postgres/test/postgres/src/backend/utils/error/assert.c:66
#6  0x00007b9ecc55c5a6 in generate_normalized_query
(query_len_p=0x7fff53b9a348, query_loc=1, query=0x5ee6575d0df1 "values(1))
limit 1;", jstate=0x5ee657606e88) at
/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:2929
#7  pgss_store (query=0x5ee6575d0df1 "values(1)) limit 1;",
queryId=<optimized out>, query_location=<optimized out>,
query_len=<optimized out>, kind=kind@entry=PGSS_INVALID,
total_time=total_time@entry=0, rows=rows@entry=0, bufusage=0x0,
walusage=0x0, jitusage=0x0, jstate=0x5ee657606e88,
    parallel_workers_to_launch=0, parallel_workers_launched=0) at
/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:1350
#8  0x00007b9ecc55c6ca in pgss_post_parse_analyze (pstate=0x5ee6575d1a98,
query=0x5ee6575d1ba8, jstate=0x5ee657606e88) at
/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:868
#9  0x00005ee6359f46cb in parse_analyze_fixedparams
(parseTree=parseTree@entry=0x5ee6575d1a18,
sourceText=sourceText@entry=0x5ee6575d0df0 "(values(1)) limit 1;",
paramTypes=paramTypes@entry=0x0, numParams=numParams@entry=0,
queryEnv=queryEnv@entry=0x0)
    at /home/postgres/test/postgres/src/backend/parser/analyze.c:128
#10 0x00005ee635cb71a0 in pg_analyze_and_rewrite_fixedparams (queryEnv=0x0,
numParams=0, paramTypes=0x0, query_string=<optimized out>,
parsetree=<optimized out>) at
/home/postgres/test/postgres/src/backend/tcop/postgres.c:682
#11 exec_simple_query (query_string=0x5ee6575d0df0 "(values(1)) limit 1;")
at /home/postgres/test/postgres/src/backend/tcop/postgres.c:1189
#12 0x00005ee635cb8eb0 in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at
/home/postgres/test/postgres/src/backend/tcop/postgres.c:4766
#13 0x00005ee635cb3293 in BackendMain (startup_data=<optimized out>,
startup_data_len=<optimized out>) at
/home/postgres/test/postgres/src/backend/tcop/backend_startup.c:124
#14 0x00005ee635c021e2 in postmaster_child_launch (child_type=<optimized
out>, child_slot=1, startup_data=startup_data@entry=0x7fff53b9a9d0,
startup_data_len=startup_data_len@entry=24,
client_sock=client_sock@entry=0x7fff53b9a9f0) at
/home/postgres/test/postgres/src/backend/postmaster/launch_backend.c:290
#15 0x00005ee635c06014 in BackendStartup (client_sock=0x7fff53b9a9f0) at
/home/postgres/test/postgres/src/backend/postmaster/postmaster.c:3580
#16 ServerLoop () at
/home/postgres/test/postgres/src/backend/postmaster/postmaster.c:1702
#17 0x00005ee635c07b0e in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x5ee65758a0b0) at
/home/postgres/test/postgres/src/backend/postmaster/postmaster.c:1400
#18 0x00005ee6358d3f4e in main (argc=3, argv=0x5ee65758a0b0) at
/home/postgres/test/postgres/src/backend/main/main.c:227


On Sat, Jun 7, 2025 at 9:10 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      18947
> Logged by:          Alexander Kozhemyakin
> Email address:      a.kozhemyakin@postgrespro.ru
> PostgreSQL version: 18beta1
> Operating system:   Ubuntu 24.04
> Description:
>
> Hi, I encountered an assertion failure when running the following query on
> the master branch (commit 112e40b8) when building with --enable-cassert
> reproduce
> initdb -D data
> echo "
> shared_preload_libraries = 'pg_stat_statements'
> " > data/postgresql.auto.conf
> pg_ctl -D data -l logfile start
> psql <<EOF
> (values(1)) limit 1;
> EOF
> bt
> (gdb)
> #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized
> out>) at ./nptl/pthread_kill.c:44
> #1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at
> ./nptl/pthread_kill.c:78
> #2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at
> ./nptl/pthread_kill.c:89
> #3  0x00007b9ecba4527e in __GI_raise (sig=sig@entry=6) at
> ../sysdeps/posix/raise.c:26
> #4  0x00007b9ecba288ff in __GI_abort () at ./stdlib/abort.c:79
> #5  0x00005ee635e0da6f in ExceptionalCondition
> (conditionName=conditionName@entry=0x7b9ecc55f97d "len_to_wrt >= 0",
> fileName=fileName@entry=0x7b9ecc55f000
> "/home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c",
> lineNumber=lineNumber@entry=2929)
>     at /home/postgres/test/postgres/src/backend/utils/error/assert.c:66
> #6  0x00007b9ecc55c5a6 in generate_normalized_query
> (query_len_p=0x7fff53b9a348, query_loc=1, query=0x5ee6575d0df1 "values(1))
> limit 1;", jstate=0x5ee657606e88) at
> /home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:2929
> #7  pgss_store (query=0x5ee6575d0df1 "values(1)) limit 1;",
> queryId=<optimized out>, query_location=<optimized out>,
> query_len=<optimized out>, kind=kind@entry=PGSS_INVALID,
> total_time=total_time@entry=0, rows=rows@entry=0, bufusage=0x0,
> walusage=0x0, jitusage=0x0, jstate=0x5ee657606e88,
>     parallel_workers_to_launch=0, parallel_workers_launched=0) at
> /home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:1350
> #8  0x00007b9ecc55c6ca in pgss_post_parse_analyze (pstate=0x5ee6575d1a98,
> query=0x5ee6575d1ba8, jstate=0x5ee657606e88) at
> /home/postgres/test/postgres/contrib/pg_stat_statements/pg_stat_statements.c:868
> #9  0x00005ee6359f46cb in parse_analyze_fixedparams
> (parseTree=parseTree@entry=0x5ee6575d1a18,
> sourceText=sourceText@entry=0x5ee6575d0df0 "(values(1)) limit 1;",
> paramTypes=paramTypes@entry=0x0, numParams=numParams@entry=0,
> queryEnv=queryEnv@entry=0x0)
>     at /home/postgres/test/postgres/src/backend/parser/analyze.c:128
> #10 0x00005ee635cb71a0 in pg_analyze_and_rewrite_fixedparams (queryEnv=0x0,
> numParams=0, paramTypes=0x0, query_string=<optimized out>,
> parsetree=<optimized out>) at
> /home/postgres/test/postgres/src/backend/tcop/postgres.c:682
> #11 exec_simple_query (query_string=0x5ee6575d0df0 "(values(1)) limit 1;")
> at /home/postgres/test/postgres/src/backend/tcop/postgres.c:1189
> #12 0x00005ee635cb8eb0 in PostgresMain (dbname=<optimized out>,
> username=<optimized out>) at
> /home/postgres/test/postgres/src/backend/tcop/postgres.c:4766
> #13 0x00005ee635cb3293 in BackendMain (startup_data=<optimized out>,
> startup_data_len=<optimized out>) at
> /home/postgres/test/postgres/src/backend/tcop/backend_startup.c:124
> #14 0x00005ee635c021e2 in postmaster_child_launch (child_type=<optimized
> out>, child_slot=1, startup_data=startup_data@entry=0x7fff53b9a9d0,
> startup_data_len=startup_data_len@entry=24,
> client_sock=client_sock@entry=0x7fff53b9a9f0) at
> /home/postgres/test/postgres/src/backend/postmaster/launch_backend.c:290
> #15 0x00005ee635c06014 in BackendStartup (client_sock=0x7fff53b9a9f0) at
> /home/postgres/test/postgres/src/backend/postmaster/postmaster.c:3580
> #16 ServerLoop () at
> /home/postgres/test/postgres/src/backend/postmaster/postmaster.c:1702
> #17 0x00005ee635c07b0e in PostmasterMain (argc=argc@entry=3,
> argv=argv@entry=0x5ee65758a0b0) at
> /home/postgres/test/postgres/src/backend/postmaster/postmaster.c:1400
> #18 0x00005ee6358d3f4e in main (argc=3, argv=0x5ee65758a0b0) at
> /home/postgres/test/postgres/src/backend/main/main.c:227

I didn't get time to debug, I might look into it tomorrow if someone
doesn't do it before that, but I am able to reproduce the issue and
seems like problem is with respect to computing the quer_loc in below
loop, and due to that 'quer_loc' is becoming bigger than query_len and
hitting the assert.

for (i = 0; i < jstate->clocations_count; i++)
{
 ...
quer_loc = off + tok_len;
}

len_to_wrt = query_len - quer_loc;


--
Regards,
Dilip Kumar
Google



On Sat, Jun 07, 2025 at 10:19:53PM +0530, Dilip Kumar wrote:
> I didn't get time to debug, I might look into it tomorrow if someone
> doesn't do it before that, but I am able to reproduce the issue and
> seems like problem is with respect to computing the quer_loc in below
> loop, and due to that 'quer_loc' is becoming bigger than query_len and
> hitting the assert.

The loop in charge of assigning the constants in the queries needs to
be smarter in terms of the new locations assigned to sub-queries, here
the VALUES one.

499edb09741b is the origin of the regression, so open item assigned to
me (will look into it later), adding Anthonin in CC.
--
Michael

Attachment
On Sun, Jun 8, 2025 at 8:09 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Jun 07, 2025 at 10:19:53PM +0530, Dilip Kumar wrote:
> > I didn't get time to debug, I might look into it tomorrow if someone
> > doesn't do it before that, but I am able to reproduce the issue and
> > seems like problem is with respect to computing the quer_loc in below
> > loop, and due to that 'quer_loc' is becoming bigger than query_len and
> > hitting the assert.
>
> The loop in charge of assigning the constants in the queries needs to
> be smarter in terms of the new locations assigned to sub-queries, here
> the VALUES one.
>
> 499edb09741b is the origin of the regression, so open item assigned to
> me (will look into it later), adding Anthonin in CC.

Thanks, I was looking into the commit pointed by you,  I found this
change problematic,

transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree)
 {
        Query      *result;

+       /* Store RawStmt's length and location in pstate */
+       pstate->p_stmt_len = parseTree->stmt_len;
+       pstate->p_stmt_location = parseTree->stmt_location;
+
        /* We're at top level, so allow SELECT INTO */
        result = transformOptionalSelectInto(pstate, parseTree->stmt);

-       result->stmt_location = parseTree->stmt_location;
-       result->stmt_len = parseTree->stmt_len;

So here before this commit we were setting the 'query->stmt_len' as
parseTree->stmt_len from the Top level statement, but after this fix,
we are setting the length of the select statement, in query->stmt_len,
because now we are doing that in
transformOptionalSelectInto->transformStmt->setQueryLocationAndLength
so that is just transferring the select statement and has length of
that.  Due to that in generate_normalized_query() we are getting
'quer_loc' bigger than the 'query_len'.

--
Regards,
Dilip Kumar
Google




On 2025/06/08 11:39, Michael Paquier wrote:
> On Sat, Jun 07, 2025 at 10:19:53PM +0530, Dilip Kumar wrote:
>> I didn't get time to debug, I might look into it tomorrow if someone
>> doesn't do it before that, but I am able to reproduce the issue and
>> seems like problem is with respect to computing the quer_loc in below
>> loop, and due to that 'quer_loc' is becoming bigger than query_len and
>> hitting the assert.
> 
> The loop in charge of assigning the constants in the queries needs to
> be smarter in terms of the new locations assigned to sub-queries, here
> the VALUES one.
> 
> 499edb09741b is the origin of the regression, so open item assigned to
> me (will look into it later), adding Anthonin in CC.

The issue seems to be that the commit missed handling cases where
a clause follows a SELECT query wrapped in parentheses.

If the following clause includes a constant (e.g., LIMIT 1 or
FETCH FIRST 1 ROW ONLY), it can trigger the reported assertion failure.
If it doesn't include a constant (e.g., FETCH FIRST ROW ONLY),
the assertion failure doesn't occur, but only the inner SELECT query (inside
the parentheses) appears in pg_stat_statements, i.e.,
the "FETCH FIRST ROW ONLY" part is missing from pg_stat_statements.query.

+                    SelectStmt *n = (SelectStmt *) $2;
+
+                    /*
+                     * As SelectStmt's location starts at the SELECT keyword,
+                     * we need to track the length of the SelectStmt within
+                     * parentheses to be able to extract the relevant part
+                     * of the query.  Without this, the RawStmt's length would
+                     * be used and would include the closing parenthesis.
+                     */
+                    n->stmt_len = @3 - @2;

Shouldn't this part be skipped in cases where the SELECT with parens is
followed by a clause? At least in those cases, this logic doesn't seem
appropriate.

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation




On Mon, Jun 09, 2025 at 07:07:42PM +0900, Fujii Masao wrote:
> Shouldn't this part be skipped in cases where the SELECT with parens is
> followed by a clause? At least in those cases, this logic doesn't seem
> appropriate.

Touché.  That means messing with any parent Node that includes or
touches select_with_parens, select_no_parens (JSON_ARRAY has one) or
SelectStmt (few of these, like in CTAS).  That's not cool in the long
term because we would expect any new node that englobes a SelectStmt
to be able to set up their inner length and location as they should,
as far as I get it.  I was wondering first if we could have done
something with the top-level statement depending on the nesting level
of PGSS, but that's not going to fly high, for example with cases like
this one (any case that has an inner SELECT):
copy ((select 1) union ((select 1)) fetch first 1 row only) to stdout

Even with that in mind, a second thing I was wondering is to compile
the length within select_with_parens, still there seems to be an extra
parenthesis pending at the end of some of the strings.  Perhaps I'm
missing a simpler concept, of course.

Anyway, this is the second bug that we have in this area, and this one
is worse.  Now that we are in the middle of beta, this is gathering
the signs that we should revert the change from the tree for the
moment and potentially revisit the whole in v19 with these edge cases
handled.  So attached is a patch doing a revert of:
499edb09741b, nested statement tracking.
06450c7b8c70, follow-up fix for 499edb09741b.

There was also 6b652e6ce85a for the addition of query ID for CREATE
TABLE AS and DECLARE, and this is still OK as a standalone change.
Its tests are updated with an extra entry added in the non-top-level
case, with the query string matching the top-level string as a result
of the attached.

If you have comments, feel free.
--
Michael

Attachment
On Tue, Jun 10, 2025 at 08:14:40PM +0900, Michael Paquier wrote:
> Hmm.  I am wondering whether this would be enough; these are not the
> only places where we use select_no_parens and select_with_parens.  I'd
> be curious to see what you think a fully-bloomed fix should look like.

By the way, this report also means that none of the problematic query
patterns are run in the main regression test suite, or we would have
noticed something immediately when running the recovery test
027_stream_regress.pl that loads PGSS.

So we have a problem in test coverage with query normalization
patterns.  But we also have a second, older, problem in terms of
coverage regarding the parentheses patterns that are assumed as
supported in the parser.  Even if we finish by reverting the nested
query tracking support for now (let's settle things before beta2 not
after it), we should definitely improve the main regression test suite
in this area to make sure that reintroducing it safe.
--
Michael

Attachment

On 2025/06/11 7:39, Michael Paquier wrote:
> On Tue, Jun 10, 2025 at 08:14:40PM +0900, Michael Paquier wrote:
>> Hmm.  I am wondering whether this would be enough; these are not the
>> only places where we use select_no_parens and select_with_parens.  I'd
>> be curious to see what you think a fully-bloomed fix should look like.
> 
> By the way, this report also means that none of the problematic query
> patterns are run in the main regression test suite, or we would have
> noticed something immediately when running the recovery test
> 027_stream_regress.pl that loads PGSS.
> 
> So we have a problem in test coverage with query normalization
> patterns.  But we also have a second, older, problem in terms of
> coverage regarding the parentheses patterns that are assumed as
> supported in the parser.  Even if we finish by reverting the nested
> query tracking support for now (let's settle things before beta2 not
> after it), we should definitely improve the main regression test suite
> in this area to make sure that reintroducing it safe.

Sounds reasonable to me.

Also I agree with reverting the commit and revisiting the feature in v19.

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation




On Wed, Jun 11, 2025 at 11:04:36AM +0200, Anthonin Bonnefoy wrote:
> Here's a v2 for the attempted fix. This is using the mentioned approach:
> - SelectStmt's location is set to Select's position when the node if built
> - If there are surrounding parentheses, we include them and move the
> start of the statement to the outermost '('
>
> The Statement length is only set for:
> - COPY: We use the PreparableStatement's surrounding '()'  to set both
> location and length (setting location should be redundant here though)
> - CTAS: we set the statement's length if there's a tailing 'WITH (NO) DATA'.
> - Other than those, the statement length will fallback to use the top
> RawStmt's remaining length.

@@ -3417,9 +3418,9 @@ CopyStmt: COPY opt_binary qualified_name opt_column_list
[...]
-     updatePreparableStmtEnd($3, @2, @4 - @2);
+     updatePreparableStmtEnd($3, @3, @4 - @3);

For the case of COPY, it is possible to get a much better report by
setting the location and length to not include the first parenthesis,
like that.

> I haven't done it for Views and JSON_ARRAY. Their nested queries
> aren't currently tracked and reported by pgss so there's no way to
> test those.

I have to admit that it is inconsistent to set a location in their
respective inner nodes while we don't use updateSelectStmtEnd() to
adjust the length based on the location assigned by the parser.
There's a risk of having a code path in the future looking at the
location later on based on the parsed state and taking a wrong
decision because the location and the length do not match.  Such cases
could be easier to reach outside of core, for example imagine an
extension looking at a ViewStmt where we set the location but not a
matching length.  We would consider as inner query a string that
begins at the beginning of the location until the end of the string
because the length is not set to match with what the parser finds.

> This could be done but there's no available way to test
> this is correct AFAIK?

Yes, perhaps we should think harder about this part.  There are two
patterns among a few others like UNION, which can also be used in
views.

Anyway, I am going to pull the plug on this one today, because we need
to take a decision with beta2 approaching fast but it's mostly me
being not fully confident that this is right as-is.  If we're wrong in
one path, that's an out-of-bound memory write waiting for us in PGSS
for the query normalization, as this report proves, and that would be
bad.

The first step would be to improve the main regression test suite for
all these grammar "parenthesis" patterns we have never had coverage
for in core.  I am wondering if it is actually possible to reach an
inconsistent state even in older stable branches because we've never
tested all these grammar patterns.  Perhaps I'm wrong, but we would be
more confident with all these extra patterns stressed first by
027_stream_regress.pl which forces queries to have some normalization
applied.
--
Michael

Attachment
On Fri, Jun 13, 2025 at 1:51 AM Michael Paquier <michael@paquier.xyz> wrote:
> - About the tests we could do to validate more in depth the locations
> and lengths assigned to the parsed node without relying on PGSS and
> EXPLAIN, which will support never support the full range of things
> like views.  I was wondering about the addition of a test module that
> plugs into one or more hooks (only the post-parse one should be
> enough), where we simply print the Nodes generated to a string using
> the facility in src/backend/nodes/print.c.  It would be then possible
> to filter the output generated with some regex magic to print the
> fields we want to check.  That would work for the CREATE VIEW case,
> for example, and it could be used for other things than just the
> statement length and/or locations set in a gIven Node.

Agree on the tests. Having a test module that dumps the parse tree in
an usable way for the tests was definitely something I had in mind.
Relying on PGSS to test parse behaviour definitely has its limit.